[R] Overlaying two png?

2010-10-15 Thread steven mosher
I have a program that creates a PNG file using RgoogleMaps with an extent
(lonmin,lonmax,latmin,latmax).
I also have a contour plot of the same location, with the same extent, in a
PNG file of the same size (height/width).

I'm looking for a way to make the contour semi-transparent and overlay it on
the Google map (hybrid map).

Since I have 7000 of these to do, an automated process is desired (grin).

Any pointers in the right direction?
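
One possible direction, sketched here under assumptions rather than taken from the thread: if both images are read into arrays of the same dimensions, a semi-transparent overlay is just a weighted average of the two. The function name blend_images, the file names, and the use of the png package (readPNG/writePNG) are illustrative choices, and the real-file branch assumes both PNGs are RGB or RGBA.

```r
# Blend two RGB images stored as height x width x 3 arrays with values in [0, 1].
# alpha is the opacity of the overlay (0 = invisible, 1 = fully opaque).
blend_images <- function(base, overlay, alpha = 0.5) {
  stopifnot(identical(dim(base), dim(overlay)), alpha >= 0, alpha <= 1)
  alpha * overlay + (1 - alpha) * base
}

# Tiny 2 x 2 stand-ins for the map and the contour plot
map_img     <- array(0.2, dim = c(2, 2, 3))
contour_img <- array(1.0, dim = c(2, 2, 3))
blended <- blend_images(map_img, contour_img, alpha = 0.4)

# With real files (hypothetical names), assuming the 'png' package is installed
if (requireNamespace("png", quietly = TRUE) &&
    file.exists("map.png") && file.exists("contour.png")) {
  map_img     <- png::readPNG("map.png")[, , 1:3]      # drop any alpha channel
  contour_img <- png::readPNG("contour.png")[, , 1:3]
  png::writePNG(blend_images(map_img, contour_img, 0.4), "overlay.png")
}
```

For the 7000 images, the same call could sit inside a loop over a vector of file names.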

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Beginner question on bar plot

2010-10-15 Thread steven mosher
I've read a number of examples on doing a multiple bar plot, but can't seem
to grasp how they work or how to get my data into the proper form.

I have two variables holding the same factor.

The variables were created using a cut command. The following simulates that:

A <- 1:100
B <- 1:100
 A[30:60] <- 43
 Acut <- cut(A,breaks=c(0,10,45,120),labels=c("low","med","high"))
 Bcut <- cut(B,breaks=c(0,10,45,120),labels=c("low","med","high"))

What I want to do is create a barplot with 3 groups of side-by-side bars:

group 1 = low, and the two bars would be the count for Acut and the count
for Bcut
group 2 = med, and the two bars again would be the counts for this factor
level in Acut and Bcut
group 3 = high, and like the above two.
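
A minimal sketch of one way to get there, assuming base graphics: tabulate each factor, stack the two tables into a matrix with one row per variable, and let barplot() draw the columns side by side. The object names lev and counts are made up for the illustration.

```r
A <- 1:100
B <- 1:100
A[30:60] <- 43
lev <- c("low", "med", "high")
Acut <- cut(A, breaks = c(0, 10, 45, 120), labels = lev)
Bcut <- cut(B, breaks = c(0, 10, 45, 120), labels = lev)

# One row per variable, one column per factor level
counts <- rbind(Acut = table(Acut), Bcut = table(Bcut))

# beside = TRUE draws the bars of each column next to each other,
# giving three groups (low, med, high) of two bars each
barplot(counts, beside = TRUE, legend.text = TRUE, ylab = "count")
```

The matrix layout is what matters: barplot() treats each column as a group and each row as a bar within the group.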



Re: [R] Reading in a tab delimitated file

2010-10-27 Thread steven mosher
If the data in the rest of the file looks like this, then read.fwf will
work (depending on which vars you want to pull).

widths <- c(18, 14, 1)

E-CBIL-28-raw-cel-1435145228.cel1

would pull 3 vars: E-CBIL-28-raw-cel-; 1435145228.cel; 1

widths <- c(32, 1)

would pull 2 vars: E-CBIL-28-raw-cel-1435145228.cel; 1

Each entry in widths is the width of one field, counted in characters.

You can set it differently, and assign colnames and column classes as well.

But the fields must be fixed width.
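
As a rough illustration of the fixed-width approach, here is a sketch that writes two of the sample lines (as they appear in the archive, without the tab) to a temporary file and reads them back. The column names and the temporary file are assumptions for the example; a real tab-separated file would need an extra width for the tab character.

```r
sample_lines <- c("E-CBIL-28-raw-cel-1435145228.cel1",
                  "E-CBIL-28-raw-cel-1435145451.cel2")
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# 32 characters of sample ID followed by 1 character of disease code
pd <- read.fwf(tmp, widths = c(32, 1),
               col.names = c("SampleID", "Disease"),
               colClasses = "character")
unlink(tmp)
```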





On Tue, Oct 26, 2010 at 5:35 AM, amindlessbrain
jillianrowe91...@gmail.com wrote:


 Hi all,

 I have a total newbie question, but I could really use some help.

 I need to read in this file:

 SampleIDDisease
 E-CBIL-28-raw-cel-1435145228.cel1
 E-CBIL-28-raw-cel-1435145451.cel2
 E-CBIL-28-raw-cel-1435145479.cel2
 E-CBIL-28-raw-cel-1435145132.cel3
 E-CBIL-28-raw-cel-1435145417.cel3
 E-CBIL-28-raw-cel-1435145301.cel2
 E-CBIL-28-raw-cel-1435145558.cel1
 E-CBIL-28-raw-cel-1435145073.cel3
 E-CBIL-28-raw-cel-1435145196.cel2
 E-CBIL-28-raw-cel-1435145511.cel1
 E-CBIL-28-raw-cel-1435145336.cel3
 E-CBIL-28-raw-cel-1435145260.cel2
 E-CBIL-28-raw-cel-1435145167.cel2
 E-CBIL-28-raw-cel-1435145387.cel3
 E-CBIL-28-raw-cel-1435145099.cel3

 (I'm not sure why the disease column isn't showing up as a tab here, but it
 is separated by \t in my file.)

 I've tried several variations on these:

 pd <- read.AnnotatedDataFrame("new_treat.txt", header = TRUE, sep = "\t",
 row.names = "SampleID", colClasses = c(Disease = "character"))

 And I keep on getting this error:

 Error in read.table(filename, sep = sep, header = header, quote = quote,  :
  more columns than column names

 Any help would be very very very appreciated!

 Thanks!





Re: [R] Best IDE for R

2010-10-27 Thread steven mosher
Thanks for the pointer,

After looking at the many folders of R code I have, I decided it was time to
start working in an IDE and also to get my stuff under version control
(for my own sanity).

I'll have a look at Geany. For version control, I'm not so sure.

On Wed, Oct 27, 2010 at 12:14 PM, Liviu Andronic landronim...@gmail.com wrote:

 On Wed, Oct 27, 2010 at 9:05 PM, Jonathan P Daily jda...@usgs.gov wrote:
  I can second using Geany as an IDE.
 
 Great, finally a soulmate! :) More seriously, I think Geany is
 under-appreciated and virtually unknown in the R community.


  Another large plus for it is that it is cross platform (I work in both
 Windows and Linux), cross environment (I also code in Python/Sage), very
 customizable, and even has a version on PortableApps for windows so you can
 take a customized version around on a USB stick with ease.
 
 One important feature that the Windows version of Geany lacks is the
 integrated virtual terminal emulator. This is mainly because the VTE
 port to Windows was never finalised (although the patch is well in
 their bugtracker). One possibility is to use Geany in a VMware virtual
 Linux machine on Windows.

 Regards
 Liviu


  --
  Jonathan P. Daily
  Technician - USGS Leetown Science Center
  11649 Leetown Road
  Kearneysville WV, 25430
  (304) 724-4480
  Is the room still a room when its empty? Does the room,
  the thing itself have purpose? Or do we, what's the word... imbue it.
  - Jubal Early, Firefly
 
 
  From: Liviu Andronic landronim...@gmail.com
  To: Lee Hachadoorian lee.hachadooria...@gmail.com
  Cc: r-h...@stat.math.ethz.ch
  Date: 10/27/2010 02:45 PM
  Subject: Re: [R] Best IDE for R
  Sent by: r-help-boun...@r-project.org


  On Wed, Oct 27, 2010 at 4:05 PM, Lee Hachadoorian
  lee.hachadooria...@gmail.com wrote:
   For an R-enabled text editor, I would suggest Tinn-R for Windows or
   RGedit (a gedit plugin) for Linux/Gnome-desktop. Since both are just text
   editors, they will work with whatever version of R you have installed
   (criteria 1).
  
   RGedit is pretty spare: basically just console integration and keyboard
   shortcuts to send code (current line, selection, defined blocks) to the
   console. Criteria 1 Y 2 basic 3 N 4 N
  
  For Linux and Mac, I usually suggest Geany [1] as an alternative to
  Gedit. Geany is an intuitive IDE that can send commands to rterm in
  the integrated virtual terminal emulator. It provides various features
  for project management, source highlighting, code folding, etc.
 
  Regards
  Liviu
 
  [1] http://www.r-bloggers.com/integrating-r-with-geany/
 
 
 





Re: [R] doubt in climate variability analysis in R! - code

2010-10-31 Thread steven mosher
OK, I downloaded it and showed you how to get your data out, how to read it
into a raster brick, how to plot the data, and how to get the mean rainfall
of every day. There is lots more you can do.

There is a bad bit of data in the last time step.

Check my blog.

In the future, what you should do is write code to emulate your problem. For
example, in your problem you had created a ncdf file with a 3D matrix of
65,69,2192.
You should just do a subset of that, and show the code to create a ncdf with
random numbers in it.

Creating working code that emulates your problem is key if you want help.
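
A small sketch of what such an emulation might look like. To stay free of extra packages it uses a plain base R array of random numbers as a stand-in for the ncdf file; the dimensions, the value range and the object names are made up for the illustration.

```r
set.seed(42)                             # makes the random data reproducible
n_lon <- 6; n_lat <- 7; n_days <- 10     # small stand-in for 65 x 69 x 2192

# Fake daily rainfall: lon x lat x day, values between 0 and 50
rain <- array(runif(n_lon * n_lat * n_days, min = 0, max = 50),
              dim = c(n_lon, n_lat, n_days))

# Mean rainfall over the whole grid for every day
daily_mean <- apply(rain, 3, mean)
```

Anyone on the list can paste this in and see the same structure as the real data, which is the point of a reproducible example.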

Off list for the rest.

On Sun, Oct 31, 2010 at 10:21 AM, govin...@msu.edu wrote:



 I am sorry, i think the link was broken..! here is the correct one!!!

 http://www.4shared.com/file/4zV0g3JR/RF_80-85.html




[R] Strings from different locale

2010-11-01 Thread steven mosher
I'm doing some test processing of a csv file that appears to use a different
locale from my machine.

I get the following warning:

 input string 1 is invalid in this locale

My locale is US. Is this simply a matter of changing my locale to 'all'
locales?

I don't know what locale the string is in. Is there a way to detect this or
translate it?
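
One way this could be approached in base R, sketched under the assumption (a guess, not something the thread establishes) that the file is in Latin-1: check whether a string is valid UTF-8 with validUTF8() and translate it with iconv(). The sample bytes are made up for the illustration.

```r
# The bytes of "cafe" with an accented e, as encoded in Latin-1.
# The last byte, 0xE9, is not valid UTF-8 on its own.
x <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))

validUTF8(x)                       # is the string valid UTF-8?

# Translate, assuming the source encoding really is Latin-1
fixed <- iconv(x, from = "latin1", to = "UTF-8")
validUTF8(fixed)
```

When reading the file itself, read.csv() accepts a fileEncoding argument (for example fileEncoding = "latin1") that does the same translation on input.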



Re: [R] spliting first 10 words in a string

2010-11-02 Thread steven mosher
 Thanks David.

  Matevz, maybe I can help explain by doing a very simple and brute force
approach, as opposed to the way David did it. But you should learn his methods.

I will just do a subset of your problem, and if you understand how it works
then you should be able to get something done and then make it more elegant.

First, I simplify the problem by separating out the sentence column.

You can do this with your data frame by simply doing this:

MySentence <- data.frame(sentence=yourbigDF$Opis,stringsAsFactors=FALSE)

So I take your original data.frame (yourbigDF) and I just create a copy of
that one column,
 $Opis

Later we can merge the two back together after I add 10 columns for the
words.


Let's make some dummy data with just 10 rows:



 sentence <- "this is a sentence with ten words or maybe more than ten words"
 sentV <- rep(sentence,10)
# now I just made 10 rows of the same sentence
# NEXT, because I am going to create 10 new columns of 10 rows, I create
# 10 vectors; each is named and each has 10 elements, one for each row.
# they have NO DATA in them

 first=second=third=fourth=fifth=sixth=seventh=eighth=ninth=tenth <- vector(length=10)

# Next I create a dataframe with Sentence in the first column and 10 blank
# columns.
# NOTE I use stringsAsFactors=FALSE

 DF <- data.frame(Sentence=sentence,first,second,third,fourth,fifth,sixth,seventh,eighth,ninth,tenth,stringsAsFactors=FALSE)

# This is what it would look like (the first row)
DF[1,]

                                                        Sentence first second third fourth fifth sixth seventh eighth ninth tenth
1 this is a sentence with ten words or maybe more than ten words FALSE  FALSE FALSE  FALSE FALSE FALSE   FALSE  FALSE FALSE FALSE

Next, I will show you how to assign the first ten words to the 10 blank
columns:

DF[1,2:11] <- strsplit(DF[1,1], " ")[[1]][1:10]

# DF[1,2:11] selects columns 2-11 of the first row
# strsplit returns the words; [1:10] keeps the first 10 and they are
# placed in columns 2-11

If you want to do this the slow way you can just loop through your dataframe
row by row, or you can probably use apply.

Make more sense?
 DF[1,2:11] <- strsplit(DF[1,1], " ")[[1]][1:10]
 DF[1,]
                                                        Sentence first second third   fourth fifth sixth seventh eighth ninth tenth
1 this is a sentence with ten words or maybe more than ten words  this     is     a sentence  with   ten   words     or maybe  more
 DF[1,"first"]
[1] "this"
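
The row-by-row assignment above can also be done for all rows at once. The following is only a sketch of that idea, not the method from the thread: the helper name first_words, the column names word1..word10 and the second sample sentence are invented for the example.

```r
sentences <- c("this is a sentence with ten words or maybe more than ten words",
               "a short one")

# Return a matrix with one row per sentence and n columns of words;
# sentences with fewer than n words are padded with NA
first_words <- function(x, n = 10) {
  pieces <- strsplit(x, " ", fixed = TRUE)
  out <- t(vapply(pieces, function(w) w[seq_len(n)], character(n)))
  colnames(out) <- paste0("word", seq_len(n))
  out
}

DF <- data.frame(Sentence = sentences, first_words(sentences),
                 stringsAsFactors = FALSE)
```

Indexing a character vector past its end returns NA, which is what pads the short sentence here.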

On Tue, Nov 2, 2010 at 12:22 PM, David Winsemius dwinsem...@comcast.net wrote:


 On Nov 2, 2010, at 3:01 PM, Matevž Pavlič wrote:

  Hi all,

 Thanks for all the help. I managed to do it with what Gaj suggested (Excel
 :().

 The last solution from David is also great; I just don't understand why R
  put the words in 14 columns and three rows?


 Because the maximum number of words was 14 and the fill argument was TRUE.
 There were three rows because there were three items in the supplied
 character vector.


  I would like it to put just the first 10 words in the source field into 10
 different destination fields, but in the same row. And so on... is that
 possible?


 I don't know what a destination field might be. Those are not R data types.

 This would trim the extra columns (in this example set to those greater
 than 8) by adding a lot of "NULL"s to the end of a colClasses specification,
 at the expense of a warning message which can be ignored:

  read.table(textConnection(words), fill=T, colClasses = c(rep("character",
 8), rep("NULL", 30) ) , stringsAsFactors=FALSE )

    V1    V2    V3      V4    V5    V6    V7      V8
 1   I  have     a columnn  with  text  that     has
 2   I would  like      to split these words      in
 3 but  just first     ten words    in   the string.
 Warning message:
 In read.table(textConnection(words), fill = T, colClasses =
 c(rep("character",  :
  cols = 14 != length(data) = 38


 If you want to assign the first column to a variable then just:
  first8 <- read.table(textConnection(words), fill=T, colClasses =
 c(rep("character", 8), rep("NULL", 30) ) , stringsAsFactors=FALSE)
  var1 <- first8[[1]]
  var1
 [1] "I"   "I"   "but"

 --
 David.



 Thank you, m
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of David Winsemius
 Sent: Tuesday, November 02, 2010 3:47 PM
 To: Gaj Vidmar
 Cc: r-h...@stat.math.ethz.ch
 Subject: Re: [R] spliting first 10 words in a string


 On Nov 2, 2010, at 6:24 AM, Gaj Vidmar wrote:

  Though forbidden in this list, in Excel it's just (literally!)
 five clicks
 away!
 (with the column in question selected)
 Data - Text to Columns - Delimited - tick Space - Finish
 Pa je! (~Voila in Slovenian)
 (then import back to R, keeping only the first 10 columns if so
 desired)


 You could do the same thing without needing to leave R. Just
 read.table( textConnection(..), header=FALSE, fill=TRUE)

  read.table(textConnection(words), fill=T)

   V1V2V3  V4V5V6V7  V8   V9
 V10  V11   V12 V13 V14
 1   I  have a columnn  with  text  that   

Re: [R] splitting First 10 words in a string

2010-11-02 Thread steven mosher
 That's easy: you are confusing the dummy code I sent.

 Do this:

 lit <- read.csv("litologija.csv", sep=";", dec=".")
sent <- data.frame(sentence=lit$Opis,stringsAsFactors=FALSE)
first=second=third=fourth=fifth=sixth=seventh=eighth=ninth=tenth <- vector(length=nrow(sent))

I put the length of the vector to 10 just to do a dummy problem.

Then do this:

for(j in 1:nrow(sent)) {

  sent[j,2:11] <- strsplit(sent[j,1], " ")[[1]][1:10]

}


That will get you a result the crude brute force way.

Try that.

Then you can learn the sapply way. But first you need to learn R data
structures.





On Tue, Nov 2, 2010 at 1:47 PM, Matevž Pavlič matevz.pav...@gi-zrmk.si wrote:

 Hi Steven,



 Thank you for the help. I get an error though when I do this:

 lit <- read.csv("litologija.csv", sep=";", dec=".")

 sent <- data.frame(sentence=lit$Opis,stringsAsFactors=FALSE)

 str(sent)

 sentV <- rep(sent,10)

 str(sentV)

 first=second=third=fourth=fifth=sixth=seventh=eighth=ninth=tenth <- vector(length=10)

 DF <- data.frame(Sentence=sent,first,second,third,fourth,fifth,sixth,seventh,eighth,ninth,tenth,stringsAsFactors=FALSE)

 »Error in data.frame(Sentence = sent, first, second, third, fourth, fifth,  :

 arguments imply differing number of rows: 22928, 10«

 What am I doing wrong?

 Thanks, m








Re: [R] splitting First 10 words in a string

2010-11-02 Thread steven mosher
Line should be:

first=second=third=fourth=fifth=sixth=seventh=eighth=ninth=tenth <- vector(length=nrow(sent))

Sorry, cut and paste error.

Re: [R] spliting first 10 words in a string

2010-11-02 Thread steven mosher
Just merge the data.frames back together.

Use merge() or cbind().

cbind() will be easier:

DF1 <- data.frame(x,y,z)
DF2 <- data.frame(DF1$x) # copy a column
Then you added columns to DF2.

Just put them back together:

DF3 <- cbind(DF2,DF1$y,DF1$z)

If you spend more time with R you will be able to do things like this
elegantly, but for now this way will work and you will learn a bit about R.
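
A small runnable sketch of the same idea, with made-up data standing in for x, y and z. It selects the remaining columns with DF1[c("y", "z")] instead of DF1$y so that the original column names are kept; that selection style is a choice for the example, not part of the original advice.

```r
DF1 <- data.frame(x = c("a b", "c d"), y = 1:2, z = c(TRUE, FALSE),
                  stringsAsFactors = FALSE)

DF2 <- data.frame(x = DF1$x, stringsAsFactors = FALSE)  # copy a column
DF2$first <- c("a", "c")                                # a column added later

# Put the pieces back together; rows must be in the same order in both
DF3 <- cbind(DF2, DF1[c("y", "z")])
```

cbind() matches rows purely by position, so it only works while neither data.frame has been reordered or filtered.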

As for counting instances of a string, I might suggest looking at the table
command:

k <- c("all", "but", "all")
 table(k)
k
all but
  2   1

So you can do a table for each column in your dataframe.
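
Doing a table for every column can be sketched with lapply(). The helper name count_words is hypothetical, and the sample values are the first few words visible in the str() output quoted further down in this message.

```r
words_df <- data.frame(
  V1 = c("HUMUS", "SLABO", "MALO", "SLABO"),
  V2 = c("IN", "GRANULIRAN", "PREPEREL", "VEZAN"),
  stringsAsFactors = FALSE)

# One frequency table per column, most frequent word first;
# empty strings and NA (rows with fewer words) are dropped before counting
count_words <- function(df) {
  lapply(df, function(col) {
    col <- col[!is.na(col) & col != ""]
    sort(table(col), decreasing = TRUE)
  })
}

counts <- count_words(words_df)
counts$V1
```

The result is a named list, so counts$V3, counts$V4 and so on give the table for each field.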

On Tue, Nov 2, 2010 at 12:53 PM, Matevž Pavlič
matevz.pav...@gi-zrmk.si wrote:

 Hi,

 OK, I got this now. At least I think so. I got a data.frame with 15 fields;
 all other words have been truncated, which is what I want. But I have that
 in a separate data.frame from the one it was in before (it would be nice if
 it were in the same one ...)

 'data.frame':   22801 obs. of  15 variables:
  $ V1 : chr  "HUMUS" "SLABO" "MALO" "SLABO" ...
  $ V2 : chr  "IN" "GRANULIRAN" "PREPEREL" "VEZAN" ...
  $ V3 : chr  "HUMUSNA" "PEŠČEN" "MELJAST" ",KONGLOMERAT," ...
  $ V4 : chr  "GLINA" "PROD" "PROD" "P0ROZEN," ...
  $ V5 : chr  "Z" "DO" "DO" "S" ...
  $ V6 : chr  "MALO" "r" "r" "PLASTMI" ...
  $ V7 : chr  "PODA," "=" "=" "GFs," ...
  $ V8 : chr  "LAHKO" "8Q" "60mm," "SIVORJAV" ...
  $ V9 : chr  "GNETNA," "mm," "S" "" ...
  $ V10: chr  "RJAVA" "S" "PRODNIKI," "" ...
  $ V11: chr  "" "PRODNIKI" "MALO" "" ...
  $ V12: chr  "" "DO" "PEŠČEN" "" ...
  $ V13: chr  "" "R" "S" "" ...
  $ V14: chr  "" "=" "TANKIMI" "" ...

 Now, I have another problem. Is it possible to count which word occurs
 most often in each field (V1, V2, V3, ...), which one is the second, and so
 on? Ideally I would create a table for each field (V1, V2, V3, ...) with the
 word and the number of occurrences in that field (column).
 I suppose it could be done in SQL, but since I saw what R can do, I
 guess this can be done here too?

 Thanks, m

 -Original Message-
 From: David Winsemius [mailto:dwinsem...@comcast.net]
 Sent: Tuesday, November 02, 2010 8:23 PM
 To: Matevž Pavlič
 Cc: Gaj Vidmar; r-h...@stat.math.ethz.ch
 Subject: Re: [R] spliting first 10 words in a string


 On Nov 2, 2010, at 3:01 PM, Matevž Pavlič wrote:

  Hi all,
 
  Thanks for all the help. I managed to do it with what Gaj suggested
  (Excel :().
 
  The last solution from David is also great; I just don't understand why
  R put the words in 14 columns and three rows?

 Because the maximum number of words was 14 and the fill argument was TRUE.
 There were three rows because there were three items in the supplied
 character vector.

  I would like it to put just the first 10 words in the source field into 10
  different destination fields, but in the same row. And so on... is that
  possible?

 I don't know what a destination field might be. Those are not R data types.

 This would trim the extra columns (in this example set to those greater
 than 8) by adding a lot of "NULL"s to the end of a colClasses specification,
 at the expense of a warning message which can be ignored:

   read.table(textConnection(words), fill=T, colClasses =
 c(rep("character", 8), rep("NULL", 30) ) , stringsAsFactors=FALSE )
    V1    V2    V3      V4    V5    V6    V7      V8
 1   I  have     a columnn  with  text  that     has
 2   I would  like      to split these words      in
 3 but  just first     ten words    in   the string.
 Warning message:
 In read.table(textConnection(words), fill = T, colClasses =
 c(rep("character",  :
   cols = 14 != length(data) = 38


 If you want to assign the first column to a variable then just:
   first8 <- read.table(textConnection(words), fill=T, colClasses =
 c(rep("character", 8), rep("NULL", 30) ) , stringsAsFactors=FALSE)
   var1 <- first8[[1]]
   var1
 [1] "I"   "I"   "but"

 --
 David.

 
  Thank you, m
  -Original Message-
  From: r-help-boun...@r-project.org
  [mailto:r-help-boun...@r-project.org
  ] On Behalf Of David Winsemius
  Sent: Tuesday, November 02, 2010 3:47 PM
  To: Gaj Vidmar
  Cc: r-h...@stat.math.ethz.ch
  Subject: Re: [R] spliting first 10 words in a string
 
 
  On Nov 2, 2010, at 6:24 AM, Gaj Vidmar wrote:
 
  Though forbidden in this list, in Excel it's just (literally!) five
  clicks away!
  (with the column in question selected) Data - Text to Columns -
  Delimited - tick Space - Finish Pa je! (~Voila in Slovenian) (then
  import back to R, keeping only the first 10 columns if so
  desired)
 
  You could do the same thing without needing to leave R. Just
  read.table( textConnection(..), header=FALSE, fill=TRUE)
 
  read.table(textConnection(words), fill=T)
     V1    V2    V3      V4    V5    V6    V7      V8       V9     V10      V11   V12 V13 V14
  1   I  have     a columnn  with  text  that     has    quite       a      few words  in it.
  2   I would  like      to split these words      in separate columns
  3 but  just first     ten words    in   the string.       Is    that possible    in  R?
 
 
  Regards,
  Assist. Prof. Gaj Vidmar, PhD
  University Rehabilitattion Institute, Republic 

[R] Reverting to previous version

2010-11-12 Thread steven mosher
R 2.12 is not functioning for me on the Mac. What is the most painless way of
reverting?



Re: [R] how to know if a file exists on a remote server?

2010-11-30 Thread steven mosher
 I would use RCurl.

 If you have, for example, the URL of an FTP site, you can simply do a
getURL() and the contents will be returned. That call will return data that
can be coerced into a data.frame that will look like a directory structure
listing the file names.

If you need code just ask, but the RCurl docs are pretty good.



On Tue, Nov 30, 2010 at 8:10 AM, Baoqiang Cao bqcaom...@gmail.com wrote:

 Hi,

 I'd like to download some data files from a remote server, the problem
 here is that some of the files actually don't exist, which I don't
 know before try. Just wondering if a function in R could tell me if a
 file exists on a remote server? I searched this mailing list and after
 read severals mails, still clueless.  Any help will be highly
 appreciated.

 B.C.



Re: [R] how to know if a file exists on a remote server?

2010-11-30 Thread steven mosher
Using RCurl:

library(RCurl)

getFtpList <- function(ftp) {
  # The structure returned depends on the FTP site, as there are
  # various formats for directory listings depending on the server
  # and the OS. You will need to play with this.
  # Have a look at the FTP site with your browser first and adjust accordingly.
  # Some formats only return 4 columns.
  # column 1 = permission string (first character marks the entry type)
  # column 2 = link count (usually 1 for a file)
  # column 3 = owner
  # column 4 = group
  # column 5 = file size
  # column 6 = month
  # column 7 = day
  # column 8 = time (or year)
  # column 9 = file name
  txt <- getURL(ftp)

  con <- textConnection(txt)
  dir <- read.table(con, as.is = TRUE)
  close(con)  # close only this connection, not every open one
  out <- data.frame(Dir = ftp, Filename = dir[, ncol(dir)], Size = dir[, 5],
                    Month = dir[, 6], Day = dir[, 7], Time = dir[, 8],
                    stringsAsFactors = FALSE)
  return(out)
}
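
To see what the parsing step does without touching a server, here is a minimal sketch that runs on a made-up listing. The listing text, parse_listing() and remote_file_exists() are hypothetical names invented for this illustration; a real server may lay out its columns differently, so treat this as a starting point only.

```r
# Hypothetical sample of a 9-column Unix-style FTP listing (made up).
listing <- paste(
  "-rw-r--r--   1 ftp  ftp   1024 Nov 30 08:10 alpha.txt",
  "-rw-r--r--   1 ftp  ftp  20480 Oct 15  2009 beta.csv",
  sep = "\n")

# Parse the listing text into a data frame, assuming 9 columns.
parse_listing <- function(txt) {
  dir <- read.table(text = txt, as.is = TRUE)
  data.frame(Filename = dir[, ncol(dir)], Size = dir[, 5],
             Month = dir[, 6], Day = dir[, 7], Time = dir[, 8],
             stringsAsFactors = FALSE)
}

# A file "exists" if its name shows up in the parsed listing.
remote_file_exists <- function(name, files) name %in% files$Filename

files <- parse_listing(listing)
remote_file_exists("alpha.txt", files)
remote_file_exists("gamma.txt", files)
```

With a real site you would pass the string returned by getURL() to the parser instead of the sample text.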



Re: [R] how to know if a file exists on a remote server?

2010-11-30 Thread steven mosher
No problem. You can also get the directory with the curl option dirlistonly.

See the example code in the package. This will depend on the version of
libcurl that you have.

If you have an older version, my code will get you the directory.

From the RCurl examples:

# The files within a directory.
url = 'ftp://ftp.wcc.nrcs.usda.gov/data/snow/snow_course/table/history/idaho/'
filenames = getURL(url, ftp.use.epsv = FALSE, dirlistonly = TRUE)

# Deal with newlines as \n or \r\n. (BDR)
# Or alternatively, instruct libcurl to change \n's to \r\n's for us with crlf = TRUE
# filenames = getURL(url, ftp.use.epsv = FALSE, ftplistonly = TRUE, crlf = TRUE)
filenames = paste(url, strsplit(filenames, "\r*\n")[[1]], sep = "")
con = getCurlHandle(ftp.use.epsv = FALSE)
contents = sapply(filenames[1:5], getURL, curl = con)
names(contents) = filenames[1:length(contents)]
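
Once you have a names-only listing, checking which of your wanted files exist is just a set lookup. A small sketch, where the listing string stands in for what getURL(url, dirlistonly = TRUE) would return and all file names are made up:

```r
# Stand-in for the string a names-only directory listing would return.
listing <- "a.txt\r\nb.txt\nc.csv\r\n"

# Split on \n or \r\n, as in the RCurl example.
available <- strsplit(listing, "\r*\n")[[1]]

# Hypothetical list of files we would like to download.
wanted  <- c("a.txt", "missing.txt", "c.csv")
present <- wanted[wanted %in% available]
absent  <- setdiff(wanted, available)
present
absent
```

Only the names in present are worth requesting; the ones in absent can be skipped or logged.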


On Tue, Nov 30, 2010 at 9:56 AM, Baoqiang Cao bqcaom...@gmail.com wrote:

 Thanks Steven!
 It is excellent code indeed!



Re: [R] how to know if a file exists on a remote server?

2010-11-30 Thread steven mosher
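Here:

getFtpList <- function(ftp) {
  # 9-column listing:
  # column 1 = permission string (first character marks the entry type)
  # column 2 = link count (usually 1 for a file)
  # column 3 = owner
  # column 4 = group
  # column 5 = file size
  # column 6 = month
  # column 7 = day
  # column 8 = time (or year)
  # column 9 = file name
  txt <- getURL(ftp)

  con <- textConnection(txt)
  dir <- read.table(con, as.is = TRUE)
  close(con)  # close only this connection, not every open one

  if (ncol(dir) == 9) {
    out <- data.frame(Dir = ftp, Filename = dir[, ncol(dir)], Size = dir[, 5],
                      Month = dir[, 6], Day = dir[, 7], Time = dir[, 8],
                      stringsAsFactors = FALSE)
  } else if (ncol(dir) == 4) {
    # 4-column listing: this code takes column 1 as the date, 2 as the time, 3 as the size
    out <- data.frame(Dir = ftp, Filename = dir[, ncol(dir)], Size = dir[, 3],
                      Month = dir[, 1], Time = dir[, 2],
                      stringsAsFactors = FALSE)
  } else {
    stop("unrecognised directory listing format: ", ncol(dir), " columns")
  }
  return(out)
}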

On Tue, Nov 30, 2010 at 9:56 AM, Baoqiang Cao bqcaom...@gmail.com wrote:

 Thanks Steven!
 It is excellent code indeed!



Re: [R] how to know if a file exists on a remote server?

2010-11-30 Thread steven mosher
Study tryCatch().

Also, be aware that even with RCurl you may find the file there and
then fail or lose the connection.

Worse still, you may get a corrupt file on download. So there is a lot
of checking to do to make bulletproof code that downloads files.
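
A rough sketch of the tryCatch() idea, so a loop over many files keeps going when one of them is missing. safe_download() and its fetch argument are names made up for this illustration (fetch defaults to download.file and is only there so the function can be exercised without a network). It checks the return code and that a non-empty file arrived, which is still far from a full integrity check.

```r
# Hypothetical helper: returns TRUE on success, FALSE on any failure,
# and never throws, so it is safe to call inside a loop.
safe_download <- function(url, destfile, fetch = utils::download.file) {
  ok <- tryCatch({
    status <- suppressWarnings(fetch(url, destfile, quiet = TRUE))
    # success means: zero status, file present, file not empty
    identical(as.integer(status), 0L) && file.exists(destfile) &&
      file.info(destfile)$size > 0
  }, error = function(e) FALSE)
  if (!ok && file.exists(destfile)) unlink(destfile)  # drop partial files
  ok
}

# Usage sketch (URLs and paths are placeholders):
# for (u in urls) if (!safe_download(u, file.path("data", basename(u)))) message("skipped ", u)
```

Verifying file contents (sizes from the listing, checksums if the server publishes them) would have to be added on top of this.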





On Tue, Nov 30, 2010 at 3:16 PM, Baoqiang Cao bqcaom...@gmail.com wrote:

 Hi Georg,

 Your code does work, I mean, it doesn't give me any error message,
 which is critical for me because I need to use it in a loop and I
 don't know how to catch error messages. Before your message, I was
 using download.file but the loop stopped because of the error
 message when a file doesn't exist. So I guess the option
 method="wget" made the difference.

 To summarize (in case it is useful to others), there are (at least)
 two ways to download files:

 1) Georg Ruß:
 v = download.file(url, destf, method="wget")
 if (v != 0) {
   # download.file failed
 }
 # no error message though

 2) Henrique Dallazuanna and Steven Mosher both suggested using RCurl.
 Here is example code from Henrique for checking if a file exists on
 a server:

 library(RCurl)
 h = basicHeaderGatherer()
 Lines <- getURI("http://www.pdb.org/pdb/files/2J0S.1001",
                 headerfunction = h$update)
 h$value()[['status']]

 If the status is 404, the file was not found. If it exists, the status
 should be 200.

 What a productive day!

 BC
 On Tue, Nov 30, 2010 at 1:34 PM, Georg Ruß resea...@georgruss.de wrote:
  On 30/11/10 10:10:07, Baoqiang Cao wrote:
  I'd like to download some data files from a remote server, the problem
  here is that some of the files actually don't exist, which I don't
  know before try. Just wondering if a function in R could tell me if a
  file exists on a remote server?
 
  Hi Baoqiang,
 
  try downloading the file with R's download.file() function. Then you
  should examine the returned value.
 
  Citing a part of ?download.file below:
 
  Value:
  An (invisible) integer code, ‘0’ for success and non-zero for
  failure.  For the ‘wget’ and ‘lynx’ methods this is the status
  code returned by the external program.  The ‘internal’ method can
  return ‘1’, but will in most cases throw an error.
 
  So if you call your download via

  v <- download.file(url, destfile, method="wget")

  and v is not equal to zero, then the file is likely to be non-existent
  (at least the download failed). Note: the method "internal" doesn't really
  change the value of v, I just tried that. With "wget" it returns 0 for
  success and 2048 (or some other value) for non-success.
 
  Regards,
  Georg.
  --
  Research Assistant
  Otto-von-Guericke-Universität Magdeburg
  resea...@georgruss.de
  http://research.georgruss.de
 



[R] Summing over intervals

2010-07-15 Thread steven mosher
Given an M x N matrix, I want to take the means of the rows in the
following fashion:

m <- matrix(seq(1,80), ncol=20, nrow=4)
result <- matrix(NA, nrow=4, ncol=20/5)
result[,1] <- apply(m[,1:5], 1, mean)
result[,2] <- apply(m[,6:10], 1, mean)
result[,3] <- apply(m[,11:15], 1, mean)
result[,4] <- apply(m[,16:20], 1, mean)
result
     [,1] [,2] [,3] [,4]
[1,]    9   29   49   69
[2,]   10   30   50   70
[3,]   11   31   51   71
[4,]   12   32   52   72

So I want the mean of every successive 5 values in a row.

As the column dimension is wide, I can't write it with multiple apply
calls as above.
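
One way this could be generalised, as a sketch only: view the matrix as a 3-d array of (rows x block width x number of blocks) and average over the middle dimension. block_row_means() is a made-up helper name, and it assumes the number of columns is an exact multiple of the block width.

```r
# Hypothetical helper: mean of every successive `width` columns, per row.
block_row_means <- function(m, width) {
  stopifnot(ncol(m) %% width == 0)
  # Column-major storage means consecutive columns fall into the same block.
  blocks <- array(m, dim = c(nrow(m), width, ncol(m) / width))
  apply(blocks, c(1, 3), mean)
}

m <- matrix(seq(1, 80), ncol = 20, nrow = 4)
result <- block_row_means(m, 5)
result
```

For the 4 x 20 example this gives the same 4 x 4 result as the four apply calls, and the same function works for any width that divides the column count.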



Re: [R] Summing over intervals

2010-07-15 Thread steven mosher
Eik and Patrick,

Thanks I will give those a try.


On Thu, Jul 15, 2010 at 8:15 AM, Patrick J Rogers pjrog...@ucsd.edu wrote:

 Hi Steven,

 You can just cut the matrix up into a 5-column matrix and use apply as
 normal:

 m2 <- matrix(as.vector(t(m)), ncol=5, byrow=TRUE)
 result <- matrix(apply(m2, 1, mean), ncol=ncol(m)/ncol(m2), byrow=TRUE)
 result

 --
 Patrick Rogers
 Dept. of Political Science
 University of California, San Diego



[R] Summing by index

2010-07-30 Thread steven mosher
# build a sample data frame illustrating the problem
ids <- c(rep(1234,5), rep(5436,3), rep(7864,4))
years <- c(seq(1990,1994,by=1), seq(1991,1993,by=1), seq(1990,1993,by=1))
data <- seq(14,25,by=1)
data[6] <- NA
DF <- data.frame(Id=ids, Year=years, Data=data)
DF
     Id Year Data
1  1234 1990   14
2  1234 1991   15
3  1234 1992   16
4  1234 1993   17
5  1234 1994   18
6  5436 1991   NA
7  5436 1992   20
8  5436 1993   21
9  7864 1990   22
10 7864 1991   23
11 7864 1992   24
12 7864 1993   25

# The result wanted is a sum of DF$Data by DF$Id: collect the sum of $Data
# for each $Id.
# The result would take the form
#  Id, sum for each Id
# Try using by()
result <- by(DF$Data, INDICES=Data$Id, FUN=sum, na.rm=T)
Error in names(IND) <- deparse(substitute(INDICES))[1L] :
  'names' attribute [1] must be the same length as the vector [0]
idx <- as.list(Data$Id)


idx2 <- list(1234,1234,1234,1234,1234,5436,5436,5436,7864,7864,7864,7864)
result <- by(DF$Data, INDICES=idx, FUN=sum, na.rm=T)
result
[1] 215
result <- by(DF$Data, INDICES=idx2, FUN=sum, na.rm=T)
Error in tapply(1L:12L, list(1234, 1234, 1234, 1234, 1234, 5436, 5436,  :
  arguments must have same length
idx
list()
idx[1]
[[1]]
NULL

 idx2
[[1]]
[1] 1234

[[2]]
[1] 1234

[[3]]
[1] 1234

[[4]]
[1] 1234

[[5]]
[1] 1234

[[6]]
[1] 5436

[[7]]
[1] 5436

[[8]]
[1] 5436

[[9]]
[1] 7864

[[10]]
[1] 7864

[[11]]
[1] 7864

[[12]]
[1] 7864

aggregate(DF$Data, by=idx2, sum, na.rm=T)
Error in aggregate.data.frame(as.data.frame(x), ...) :
  arguments must have same length



The instruction that INDICES must have the same length is confusing me.
The number of distinct indices will always be less than the number of rows
because the indices are repeated; we want to sum over the multiple instances
of each index to collect the sum by index. I'm confused.
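
For what it is worth, here is a sketch of how the "same length" requirement can be read: the grouping argument is one label per row (a vector as long as the data), or a list holding such vectors, not a list with one element per row. Passing a list of 12 scalars is treated as 12 separate grouping factors of length 1, hence the error. The variable names below are chosen only for this illustration.

```r
ids  <- c(rep(1234, 5), rep(5436, 3), rep(7864, 4))
vals <- seq(14, 25, by = 1)
vals[6] <- NA
DF <- data.frame(Id = ids, Data = vals)

# One group label per row: DF$Id has the same length as DF$Data.
sums <- tapply(DF$Data, DF$Id, sum, na.rm = TRUE)
DF2  <- data.frame(Id = as.numeric(names(sums)), Sums = as.vector(sums))

# aggregate() wants a list of such vectors (here a list of one).
agg <- aggregate(DF$Data, by = list(Id = DF$Id), FUN = sum, na.rm = TRUE)
DF2
agg
```

Both calls return one row per distinct Id, with the NA in the 5436 group dropped from the sum.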



Re: [R] Summing by index

2010-07-30 Thread steven mosher
ha. that was a stupid mistake. Thanks.

On Fri, Jul 30, 2010 at 11:46 AM, David Winsemius dwinsem...@comcast.net wrote:

 On Jul 30, 2010, at 2:41 PM, steven mosher wrote:

  # Try using BY
  result<-by(DF$Data,INDICES=Data$Id,FUN=sum,na.rm=T)

 Try instead:

 result<-by(DF$Data,INDICES=DF$Id,FUN=sum,na.rm=T)

 --
 David.


 David Winsemius, MD
 West Hartford, CT





Re: [R] Summing by index

2010-07-30 Thread steven mosher
Thanks again, David.

To finish out the example:
DF
     Id Year Data
1  1234 1990   14
2  1234 1991   15
3  1234 1992   16
4  1234 1993   17
5  1234 1994   18
6  5436 1991   NA
7  5436 1992   20
8  5436 1993   21
9  7864 1990   22
10 7864 1991   23
11 7864 1992   24
12 7864 1993   25

result <- by(DF$Data, INDICES=DF$Id, FUN=sum, na.rm=T)

id <- as.numeric(unlist(names(result)))
sums <- unlist(result[])
DF2 <- data.frame(Id=id, Sums=sums)
DF2
    Id Sums
1 1234   80
2 5436   41
3 7864   94

Thanks again.



Re: [R] Summing by index

2010-07-30 Thread steven mosher
very slick Thx.

On Fri, Jul 30, 2010 at 12:44 PM, Wu Gong w...@mtmail.mtsu.edu wrote:


 Hi,

 R has a built-in function, see ?rowsum

 rowsum(DF$Data,DF$Id,na.rm=T)

 -
 A R learner.


[R] Data frame reordering to time series

2010-08-07 Thread steven mosher
Given a data frame (or it could be a matrix if I choose),
the data consists of an ID, a year, and data for all 12 months.
Missing values are an issue, and so are missing years.

Id <- c(rep(67543,4), rep(12345,3), rep(89765,5))
Years <- c(seq(1989,1992,by=1), 1991,1993,1994, seq(1991,1995,by=1))
Values2 <- c(12,NA,34,21,NA,65,23,NA,13,NA,13,14)
Values <- c(12,14,34,21,54,65,23,12,13,13,13,14)

Data <- data.frame(Index=Id, Year=Years, Jan=Values, Feb=Values/2, Mar=Values2,
                   Apr=Values2, Jun=Values, July=Values/3, Aug=Values2, Sep=Values,
                   Oct=Values, Nov=Values, Dec=Values2)
 Data
   Index Year Jan  Feb Mar Apr Jun  July Aug Sep Oct Nov Dec
1  67543 1989  12  6.0  12  12  12  4.00  12  12  12  12  12
2  67543 1990  14  7.0  NA  NA  14  4.67  NA  14  14  14  NA
3  67543 1991  34 17.0  34  34  34 11.33  34  34  34  34  34
4  67543 1992  21 10.5  21  21  21  7.00  21  21  21  21  21
5  12345 1991  54 27.0  NA  NA  54 18.00  NA  54  54  54  NA
6  12345 1993  65 32.5  65  65  65 21.67  65  65  65  65  65
7  12345 1994  23 11.5  23  23  23  7.67  23  23  23  23  23
8  89765 1991  12  6.0  NA  NA  12  4.00  NA  12  12  12  NA
9  89765 1992  13  6.5  13  13  13  4.33  13  13  13  13  13
10 89765 1993  13  6.5  NA  NA  13  4.33  NA  13  13  13  NA
11 89765 1994  13  6.5  13  13  13  4.33  13  13  13  13  13
12 89765 1995  14  7.0  14  14  14  4.67  14  14  14  14  14


The goal is to return a time series object for each ID. Alternatively one
could return a matrix that I can turn into a time series.
The final structure would be something like this (done in matrix form for
illustration):

        1989.0  1989.083  ...  1991.0  1991.083  ...  1995.0  ...
67543       12       6.0  ...      34      17.0  ...      NA  ...
12345       NA        NA  ...      54      27.0  ...      NA  ...

Basically the time series will have patches at the front, middle and end
where you may have years of NA.
They must be column ordered by time and aligned so that averages for all
series can be computed per month.

Now I have looping code to do this, where I loop through all the IDs and map
each row of data into the correct column, and create column names based on
the data and row names based on the ID, but it's painfully
slow. Any wizardry would help.
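
A sketch of one loop-light way to build such a matrix: compute a (row, column) position for every data row and fill with matrix indexing, so the only loop runs over the month columns rather than over IDs. The toy data and the to_wide() helper are made up for illustration, with two "months" only to keep the output small.

```r
# Toy data in the same shape: one row per (Index, Year), one column per month.
Data <- data.frame(
  Index = c(67543, 67543, 12345, 12345),
  Year  = c(1989, 1990, 1989, 1991),   # 12345 has no 1990 row
  Jan   = c(1, 3, 5, 7),
  Feb   = c(2, NA, 6, 8))

# Hypothetical helper: IDs in rows, time-ordered (year, month) in columns.
to_wide <- function(Data, months = names(Data)[-(1:2)]) {
  ids   <- sort(unique(Data$Index))
  years <- seq(min(Data$Year), max(Data$Year))
  nm    <- length(months)
  out <- matrix(NA_real_, nrow = length(ids), ncol = length(years) * nm,
                dimnames = list(as.character(ids),
                                paste(rep(years, each = nm), months)))
  row  <- match(Data$Index, ids)
  col0 <- (match(Data$Year, years) - 1) * nm   # column offset of each year
  for (k in seq_len(nm)) {
    out[cbind(row, col0 + k)] <- Data[[months[k]]]
  }
  out
}

out <- to_wide(Data)
out
colMeans(out, na.rm = TRUE)   # per-month average across all series
```

Years that are absent for an ID simply stay NA, so the columns line up across series.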



Re: [R] Data frame reordering to time series

2010-08-07 Thread steven mosher
Thanks Gabor, I probably should have done an example with fewer columns.

I will rework the example and post it so the next person who has this issue
can have a clear example with a solution.



On Sat, Aug 7, 2010 at 5:04 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:


 Your email came out a bit garbled so it's not clear what you want to
 get out, but this code will produce a multivariate ts series, i.e. an
 mts series, with one column for each series:

 f <- function(x) ts(c(t(x[-(1:2)])), freq = 12, start = x$Year[1])
 do.call(cbind, by(Data, Data$Index, f))




Re: [R] Data frame reordering to time series

2010-08-07 Thread steven mosher
Very Slick.

Gabor this is a Huge speed up for me. Thanks. ha, Now I want to rewrite a
bunch of working code.




Id <- c(rep(67543,4), rep(12345,3), rep(89765,5))
Years <- c(seq(1989,1992,by=1), 1991,1993,1994, seq(1991,1995,by=1))
Values2 <- c(12,NA,34,21,NA,65,23,NA,13,NA,13,14)
Values <- c(12,14,34,21,54,65,23,12,13,13,13,14)

Data <- data.frame(Index=Id, Year=Years, Jan=Values, Feb=Values/2, Mar=Values2, Apr=Values2, Jun=Values)
 Data
   Index Year Jan  Feb Mar Apr Jun
1  67543 1989  12  6.0  12  12  12
2  67543 1990  14  7.0  NA  NA  14
3  67543 1991  34 17.0  34  34  34
4  67543 1992  21 10.5  21  21  21
5  12345 1991  54 27.0  NA  NA  54
6  12345 1993  65 32.5  65  65  65
7  12345 1994  23 11.5  23  23  23
8  89765 1991  12  6.0  NA  NA  12
9  89765 1992  13  6.5  13  13  13
10 89765 1993  13  6.5  NA  NA  13
11 89765 1994  13  6.5  13  13  13
12 89765 1995  14  7.0  14  14  14

#  Gabor's solution

f <- function(x) ts(c(t(x[-(1:2)])), freq = 12, start = x$Year[1])
do.call(cbind, by(Data, Data$Index, f))
         12345 67543 89765
Jan 1989    NA  12.0    NA
Feb 1989    NA   6.0    NA
Mar 1989    NA  12.0    NA
Apr 1989    NA  12.0    NA
May 1989    NA  12.0    NA
Jun 1989    NA  14.0    NA
Jul 1989    NA   7.0    NA
Aug 1989    NA    NA    NA
Sep 1989    NA    NA    NA
Oct 1989    NA  14.0    NA
Nov 1989    NA  34.0    NA
Dec 1989    NA  17.0    NA
Jan 1990    NA  34.0    NA
Feb 1990    NA  34.0    NA
Mar 1990    NA  34.0    NA
Apr 1990    NA  21.0    NA
May 1990    NA  10.5    NA
Jun 1990    NA  21.0    NA
Jul 1990    NA  21.0    NA
Aug 1990    NA  21.0    NA
Sep 1990    NA    NA    NA
Oct 1990    NA    NA    NA
Nov 1990    NA    NA    NA
Dec 1990    NA    NA    NA
Jan 1991  54.0    NA  12.0
Feb 1991  27.0    NA   6.0
...
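
Note that this idiom lays each ID's rows end to end, so the calendar labels are only right when every row carries all 12 months and the years for an ID are consecutive. A small sketch with made-up data that meets both assumptions (the Index values, years and numbers are invented):

```r
# Made-up data: 12 month columns, consecutive years within each Index.
Data <- data.frame(
  Index = c(1, 1, 2),
  Year  = c(2000, 2001, 2001),
  matrix(1:36, nrow = 3, byrow = TRUE, dimnames = list(NULL, month.abb)))

# Flatten each ID's rows in time order and start the series at its first year.
f <- function(x) ts(c(t(x[-(1:2)])), frequency = 12, start = x$Year[1])

# cbind on ts objects aligns them on the union of their time ranges.
series <- do.call(cbind, by(Data, Data$Index, f))
series
```

Here the second series starts a year later, so its first 12 months come out as NA while the first series runs through both years.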


Re: [R] Data frame reordering to time series

2010-08-08 Thread steven mosher
In the real data the months are all complete, but the years can be missing.
Years can be missing up front, in the middle, or at the end, but if a year
is present then every month has a value or NA.

To create a regular R ts I had to plow through the data frame, collect a
year, and calculate an index to put it into the final time series.

I had tried zoo out and it handled the irregularly spaced data, but a large
data structure of zoo objects had stumped me, especially since I need to do
matching and selecting of the zoo objects.

In the real data, there are about 7000 time series of 1500 months, and those
7000
get averaged and combined in different ways
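
For the "whole years missing" case described above, here is a minimal base R sketch of one way to avoid the row-by-row loop. It is not the code from this thread: the toy data, the helper name to_ts, and the assumption that every present year has all twelve monthly columns (named as in month.abb) are illustrative only. Each ID's missing years are padded with NA rows via merge(), and the rows are then flattened into a monthly ts.

```r
# Toy data (hypothetical): one row per (Index, Year), twelve monthly columns.
months <- as.data.frame(matrix(1:36, nrow = 3, byrow = TRUE))
names(months) <- month.abb
Data <- cbind(data.frame(Index = c(1, 1, 2), Year = c(1990, 1992, 1991)), months)

# Pad missing years with NA rows, then flatten row-wise into a monthly ts.
to_ts <- function(x) {
  full <- data.frame(Year = seq(min(x$Year), max(x$Year)))
  x <- merge(full, x[c("Year", month.abb)], by = "Year", all.x = TRUE)
  ts(c(t(as.matrix(x[month.abb]))), frequency = 12, start = c(x$Year[1], 1))
}

series <- lapply(split(Data, Data$Index), to_ts)
m <- do.call(cbind, series)  # cbind on ts objects aligns them on a common time axis
```

Here series 1 has no 1991 row, so its 1991 months come out as NA in m, and series 2 is NA outside 1991.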


On Sat, Aug 7, 2010 at 8:45 PM, Gabor Grothendieck
ggrothendi...@gmail.comwrote:

 On Sat, Aug 7, 2010 at 9:18 PM, steven mosher mosherste...@gmail.com
 wrote:
  Very Slick.
  Gabor this is a Huge speed up for me. Thanks. ha, Now I want to rewrite a
  bunch of working code.
 
 
 
  Id <- c(rep(67543,4),rep(12345,3),rep(89765,5))
  Years <- c(seq(1989,1992,by =1),1991,1993,1994,seq(1991,1995,by=1))
  Values2 <- c(12,NA,34,21,NA,65,23,NA,13,NA,13,14)
  Values <- c(12,14,34,21,54,65,23,12,13,13,13,14)
 
  Data <- data.frame(Index=Id,Year=Years,Jan=Values,Feb=Values/2,Mar=Values2,Apr=Values2,Jun=Values)
   Data
 Index Year Jan  Feb Mar Apr Jun
  1  67543 1989  12  6.0  12  12  12
  2  67543 1990  14  7.0  NA  NA  14
  3  67543 1991  34 17.0  34  34  34
  4  67543 1992  21 10.5  21  21  21
  5  12345 1991  54 27.0  NA  NA  54
  6  12345 1993  65 32.5  65  65  65
  7  12345 1994  23 11.5  23  23  23
  8  89765 1991  12  6.0  NA  NA  12
  9  89765 1992  13  6.5  13  13  13
  10 89765 1993  13  6.5  NA  NA  13
  11 89765 1994  13  6.5  13  13  13
  12 89765 1995  14  7.0  14  14  14
  #  Gabor's solution
   f <- function(x) ts(c(t(x[-(1:2)])), freq = 12, start = x$Year[1])
   do.call(cbind, by(Data, Data$Index, f))
   12345 67543 89765


 The original data had consecutive months in each series (actually
 there was a missing 1992 in one case but I assumed that was an
 inadvertent omission and the actual data was complete); however, here
 we have missing 6 month chunks in addition.  That makes the series
 non-consecutive so to solve that we could either apply this to the
 data (after putting the missing 1992 year back in):

 Data <- cbind(Data, NA, NA, NA, NA, NA, NA)

 or we could use a time series class that can handle irregularly spaced
 data:

 library(zoo)
 f <- function(x) {
    dat <- x[-(1:2)]
    tim <- as.yearmon(outer(x$Year, seq(0, length = ncol(dat))/12, "+"))
    zoo(c(as.matrix(dat)), tim)
 }
 do.call(cbind, by(Data, Data$Index, f))

 The last line is  unchanged from before.  This code will also handle
 the original situation correctly even if the missing 1992 is truly
 missing.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data frame reordering to time series

2010-08-08 Thread steven mosher
Ok,

I'm a bit confused by what you mean by regularly spaced.
After I do the do.call I do get a data structure with all the times present,
and every time has an NA or a data value.

Steve

On Sun, Aug 8, 2010 at 2:46 AM, Gabor Grothendieck
ggrothendi...@gmail.comwrote:

 If there are missing years and you want to get a regularly spaced
 series out then use the zoo version of f (rather than the ts version of f)
 and if this is the last statement (same as before but assigning
 it to the variable z):

   z <- do.call(cbind, by(Data, Data$Index, f))

 then to get a regularly spaced ts object just do this:

   as.ts(z)

 or

   as.zooreg(as.ts(z))

 to create a regularly spaced zooreg object.
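
 A small sketch of the conversion described above, assuming the zoo package is installed. The three toy observations are made up for illustration; March 1990 is deliberately left out so the series has a gap but still has an underlying monthly regularity.

```r
library(zoo)  # assumes the zoo package is installed

# Monthly observations with March 1990 missing (toy values).
z <- zoo(c(1, 2, 3), as.yearmon(1990 + c(0, 1, 3) / 12))

# as.ts() fills the gap with NA and returns a regular monthly series.
r <- as.ts(z)

# Converting back gives a regularly spaced zooreg object.
zr <- as.zooreg(r)
```

 The point of the round trip is that r and zr have one entry per month between the first and last observation, with NA where the original zoo series had no entry.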




Re: [R] Data frame reordering to time series

2010-08-08 Thread steven mosher
Thanks again,

They worked for me as well. I did a simpler example with fewer years just to
show that it worked (shortened here for display):

f <- function(x) {
   dat <- x[-(1:2)]
   tim <- as.yearmon(outer(x$Year, seq(0, length = ncol(dat))/12, "+"))
   zoo(c(as.matrix(dat)), tim)
}
g <- do.call(cbind, by(Data, Data$Index, f))
g
 X12345 X34567 X56789
Jan 1989 NA  3  6
Feb 1989 NA  3  6
Mar 1989 NA  3  6
Apr 1989 NA  3  6
May 1989 NA  3  6
Jun 1989 NA  3  6
Jul 1989 NA  3  6
Aug 1989 NA  3  6
Sep 1989 NA  3  6
Oct 1989 NA  3  6
Nov 1989 NA  3  6
Dec 1989 NA  3  6
Jan 1990  2  4  6
Feb 1990  2  4  6
Mar 1990  2  4  6
Apr 1990  2  4  6
May 1990  2  4  6
Jun 1990  2  4  6
Jul 1990  2  4  6
Aug 1990  2  4  6
Sep 1990  2  4  6
Oct 1990  2  4  6
Nov 1990  2  4  6
Dec 1990  2  4  6
Jan 1991 NA  5 NA

.

z <- as.zooreg(as.ts(g))
 z
 X12345 X34567 X56789
1989(1)  NA  3  6
1989(2)  NA  3  6
1989(3)  NA  3  6
1989(4)  NA  3  6
1989(5)  NA  3  6
1989(6)  NA  3  6
1989(7)  NA  3  6
1989(8)  NA  3  6
1989(9)  NA  3  6
1989(10) NA  3  6
1989(11) NA  3  6
1989(12) NA  3  6
1990(1)   2  4  6
1990(2)   2  4  6
1990(3)   2  4  6
1990(4)   2  4  6
1990(5)   2  4  6
1990(6)   2  4  6
1990(7)   2  4  6
1990(8)   2  4  6
1990(9)   2  4  6
1990(10)  2  4  6
1990(11)  2  4  6
1990(12)  2  4  6
1991(1)  NA  5 NA
1991(2)  NA  5 NA
1991(3)  NA  5 NA
1991(4)  NA  5 NA
1991(5)  NA  5 NA
1991(6)  NA  5 NA
1991(7)  NA  5 NA
1991(8)  NA  5 NA
1991(9)  NA  5 NA
1991(10) NA  5 NA
1991(11) NA  5 NA
1991(12) NA  5 NA
1992(1)   2 NA NA
1992(2)   2 NA NA


***
The interesting thing is the change of the index labels from month names to
the (1)... notation.



On Sun, Aug 8, 2010 at 8:55 AM, Gabor Grothendieck
ggrothendi...@gmail.comwrote:


 regularly spaced means that every observation is one month later than
 the prior.  If there are missing 6 month chunks or missing entire
 years then the observations are not regularly spaced since there are
 some months not present.

 It works for me:

  Id <- c(rep(67543,4),rep(12345,3),rep(89765,5))
  Years <- c(seq(1989,1992,by =1),1991,1993,1994,seq(1991,1995,by=1))
  Values2 <- c(12,NA,34,21,NA,65,23,NA,13,NA,13,14)
  Values <- c(12,14,34,21,54,65,23,12,13,13,13,14)
 
  Data <- data.frame(Index=Id,Year=Years,Jan=Values,Feb=Values/2,Mar=Values2,Apr=Values2,Jun=Values,July=Values/3,Aug=Values2,Sep=Values,
                     Oct=Values,Nov=Values,Dec=Values2)
 
  library(zoo)
  f <- function(x) {
     dat <- x[-(1:2)]
     tim <- as.yearmon(outer(x$Year, seq(0, length = ncol(dat))/12, "+"))
     zoo(c(as.matrix(dat)), tim)
  }
  do.call(cbind, by(Data, Data$Index, f))
          X12345 X67543 X89765
 Jan 1989     NA  12.00     NA
 Feb 1989     NA   6.00     NA
 Mar 1989     NA  12.00     NA
 Apr 1989     NA  12.00     NA
 May 1989     NA  12.00     NA
 Jun 1989     NA   4.00     NA
 Jul 1989     NA  12.00     NA
 Aug 1989     NA  12.00     NA
 Sep 1989     NA  12.00     NA
 Oct 1989     NA  12.00     NA
 Nov 1989     NA  12.00     NA
 Jan 1990     NA  14.00     NA
 Feb 1990     NA   7.00     NA
 Mar 1990     NA     NA     NA
 Apr 1990     NA     NA     NA
 May 1990     NA  14.00     NA
 Jun 1990     NA   4.67     NA
 Jul 1990     NA     NA     NA
 Aug 1990     NA  14.00     NA
 Sep 1990     NA  14.00     NA
 Oct 1990     NA  14.00     NA
 Nov 1990     NA     NA     NA
 Jan 1991  54.00  34.00  12.00
 Feb 1991  27.00  17.00   6.00
 Mar 1991     NA  34.00     NA
 Apr 1991     NA  34.00     NA
 May 1991  54.00  34.00  12.00
 Jun 1991  18.00  11.33   4.00
 Jul 1991     NA  34.00     NA
 Aug 1991  54.00  34.00  12.00
 Sep 1991  54.00  34.00  12.00
 Oct 1991  54.00  34.00  12.00
 Nov 1991

[R] nested 'by'

2010-08-09 Thread steven mosher
Assume a data frame or matrix with two columns representing variables that
you want to aggregate over.
You want to calculate column means, by year, for each Id.



example <- data.frame(id=c(rep(12345,5),rep(54321,6),rep(45678,7)),Year=rep(seq(1900,1902,by=1),6),
x=seq(1,18,by=1),y=seq(18,1,by=-1))
 example
  id Year  x  y
1  12345 1900  1 18
2  12345 1901  2 17
3  12345 1902  3 16
4  12345 1900  4 15
5  12345 1901  5 14
6  54321 1902  6 13
7  54321 1900  7 12
8  54321 1901  8 11
9  54321 1902  9 10
10 54321 1900 10  9
11 54321 1901 11  8
12 45678 1902 12  7
13 45678 1900 13  6
14 45678 1901 14  5
15 45678 1902 15  4
16 45678 1900 16  3
17 45678 1901 17  2
18 45678 1902 18  1

 result <- by(example[,3:4], example$id, by(example[,3:4],
example$Year,colMeans, na.rm=T))
Error in FUN(X[[1L]], ...) : could not find function FUN


desired result should look like:
 id  Year  meanx mean y
1  12345 1900   ......
2  12345 1901   ...
3  12345 1902   ...
4  54321 1900
5  54321 1901
6  54321 1902
7 45678 1900
8 45678 1901
9 45678 1902



Re: [R] nested 'by'

2010-08-09 Thread steven mosher
That works.

Thanks

On Mon, Aug 9, 2010 at 7:55 AM, Henrique Dallazuanna www...@gmail.comwrote:

 Try this:

 aggregate(example[c('x', 'y')], example[c('id', 'Year')], 'mean')
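
 For reference, the nested call in the question fails because the third argument of by() must be a function, and the inner by() call is evaluated to a result object instead. A short sketch of the same aggregate() idea using the formula interface, run on the example data from the question (the final sort is only there to group the rows by id, then Year):

```r
example <- data.frame(
  id   = c(rep(12345, 5), rep(54321, 6), rep(45678, 7)),
  Year = rep(seq(1900, 1902, by = 1), 6),
  x    = seq(1, 18, by = 1),
  y    = seq(18, 1, by = -1)
)

# One row per (id, Year) with the mean of x and y in each group.
res <- aggregate(cbind(x, y) ~ id + Year, data = example, FUN = mean)
res <- res[order(res$id, res$Year), ]  # sort by id, then Year
res
```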




 --
 Henrique Dallazuanna
 Curitiba-Paraná-Brasil
 25° 25' 40 S 49° 16' 22 O




[R] sweep and zoo objects

2010-08-11 Thread steven mosher
rc <- list(c(123,321,234,543,654,768,986,987,246,284),
           c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"))

# The matrix has rownames that are used as identifiers and columns
# of time: one year's worth of data. That's the native format.

test <- matrix(seq(1,120, by=1), nrow=10,dimnames=rc)
 test
  Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
123   1  11  21  31  41  51  61  71  81  91 101 111
321   2  12  22  32  42  52  62  72  82  92 102 112
234   3  13  23  33  43  53  63  73  83  93 103 113
543   4  14  24  34  44  54  64  74  84  94 104 114
654   5  15  25  35  45  55  65  75  85  95 105 115
768   6  16  26  36  46  56  66  76  86  96 106 116
986   7  17  27  37  47  57  67  77  87  97 107 117
987   8  18  28  38  48  58  68  78  88  98 108 118
246   9  19  29  39  49  59  69  79  89  99 109 119
284  10  20  30  40  50  60  70  80  90 100 110 120

# The desired result would be a merged zoo object with the row names used as
# the colnames of the multiple zoo series

test2 <- matrix(test,nrow=12,  byrow=F)
g <- zoo(test2[,1],frequency=12)
MYZOO <- merge(g,test2[,2:10])

# The result MYZOO is a zoo object, but we've lost the row names in the
# transformation of the matrix

# So
colnames(MYZOO) <- row.names(test)

# fixes that problem. Is there a more elegant way to do this?

# Now this zoo object needs to be swept out of a much longer zoo object
# with the same column names. The 'sweep' function is "-".

sweep() works normally by sweeping out a vector from an array (by column or
by row):


sweep(x, MARGIN, STATS, FUN="-", check.margin=TRUE, ...)


So in my example x would be a long yearmon zoo object with the same
column names as MYZOO above, but decades of data. MARGIN would be rows and
the STATS to sweep out would be the values in MYZOO.


test3 <- matrix(seq(1,720, by=1), ncol=10)
p <- zoo(test3[,1], freq=12)
longzoo <- merge(p,test3[,2:10])
colnames(longzoo) <- row.names(test)

What we want to do is to sweep out MYZOO from longzoo. I could just repeat
the data in MYZOO 6 times and then subtract MYZOO from longzoo, but that's
a potential memory buster in this situation.



[R] Sweeping a zoo series

2010-08-11 Thread steven mosher
Given a long zoo matrix, the goal is to sweep out a statistic from the
entire length of the
sequences.

 
longzoomatrix <- zoo(matrix(rnorm(720),ncol=6),as.yearmon(outer(1900,seq(0,length=120)/12,"+")))
cnames <- c(12345,23456,34567,45678,56789,67890)
colnames(longzoomatrix) <- cnames
longzoomatrix[1:24,]
               12345       23456       34567       45678       56789        67890
Jan 1900 -0.17123165  1.02087086  0.79514870 -0.54519494 -0.13025459 -0.009980402
Feb 1900  1.21729926 -0.74541038 -0.08138406 -2.01180775  0.19256998  0.551965871
Mar 1900  1.13222481 -1.25315703  0.01013473  0.08366155 -0.84246010 -1.405959298
Apr 1900 -0.02352559 -1.25001473 -1.53570550 -0.17945324  0.33368133  2.045125104
May 1900  2.08204920  1.28091067 -0.80888146  0.31796730  0.83248551  1.439049603
Jun 1900  0.62209570 -0.66189249 -0.57923119 -0.04346112 -2.71353384 -0.346826902
Jul 1900 -1.39758918 -0.54525469 -0.05230070 -0.36725079  1.28281798  1.391174712
Aug 1900  0.12594069  0.09303970  0.69916411 -1.01902352 -0.82720898 -0.208113626
Sep 1900 -0.34310543  0.41718435  0.79455765  1.13234707  0.14652667 -0.551426097
Oct 1900  1.70634123 -1.20073104 -1.08771551 -0.01715296  0.24931996 -0.753481196
Nov 1900  0.15224070 -0.05108370 -0.97410069  0.51130170  0.13880814 -2.160811186
Dec 1900  0.34726817  0.61830719  0.84429979 -0.26253635  0.95243068 -0.533562966
Jan 1901  0.28647563 -0.40650198 -1.19640622  0.70267162  0.18867804  0.098855045
Feb 1901  1.27269836  0.31797472 -1.13038040  1.33654480  0.08885501 -0.134690872
Mar 1901 -1.36934330 -0.17244539  0.81705554 -0.09113888  0.90241413  0.473939164
Apr 1901 -0.89768498  0.82497595  0.15684387  2.25294476 -1.72886103 -0.104769411
May 1901 -0.27898445 -1.24348285  1.36203180  0.02422083 -1.33745980  1.098856752
Jun 1901 -0.67968801  0.42082064  0.47056133 -0.12981223  0.19445803 -0.284638114
Jul 1901  0.03791761 -0.22118130  1.96044737 -1.18280989  0.90075205  0.055720535
Aug 1901  1.12904079  0.57177055  0.64300572 -0.16284983  0.07951656 -0.159396821
Sep 1901 -1.43513934  0.03036697  1.09039400  0.99201776  0.98744827 -0.057234838
Oct 1901  0.73828382  0.53967835  2.16608282 -0.82929778     -1.9987  0.352778450
Nov 1901  0.06561583 -1.20126258  0.67427027  0.15493106  0.08867697  1.223073528
Dec 1901 -1.23347027 -1.09699304  0.59398031 -0.22269292 -0.21569543  1.389667825

The statistic to be swept out is itself a zoo series with matching column
names.
There are twelve values for each column, representing a monthly average for
that series.

The average is to be subtracted.

sweepzoo <- zoo(matrix(rnorm(72),ncol=6), frequency=12)
colnames(sweepzoo) <- cnames
sweepzoo
   12345  23456  34567  45678   56789  67890
1(1)  -2.5569706 -0.4375741 -0.1803866 -0.6303760 -0.08995198  2.7293244
2(1)   1.4154202  0.2559212  0.2104513  0.7439446  0.84897905 -0.4144865
3(1)  -1.3709275  1.0472759  1.5975148  0.3190503  1.10430959 -1.8285194
4(1)  -1.1436430  2.2071763 -0.2637954 -0.4915366 -0.03925020  1.3311624
5(1)  -0.8003656  1.6421541 -1.4603128  0.4493069  0.28194066 -0.4728086
6(1)   0.9236015  0.3780122 -1.3848196  0.4263684  0.99584590 -1.4536475
7(1)   0.8810281  0.0381152  0.3810457 -0.6884233 -0.11018089  0.4221188
8(1)   0.3819421 -0.8431364  1.9876901  0.7072257  0.45524929  2.7013515
9(1)  -1.1247988  1.3083178 -0.3438442  0.3300832  0.67013503  1.2912443
10(1) -0.3643043  1.0756782 -1.2026318  0.4477054  0.54486700 -0.3369889
11(1)  0.8294049  1.8170357  0.5691249  1.9213791 -0.29295754 -0.2617228
12(1) -1.0085265 -0.7556545 -1.4033321 -0.4646647 -0.14984913 -0.4848657

A brute force way to do this is to repeat the 12 values for each column so
that the number of rows in sweepzoo is equal to the number of rows in the
long zoo object, and then just subtract them: longzoomatrix - sweepzoo

As a function, sweep() won't work because it expects a vector whose
dimensions match the dimension of the MARGIN.

Is there an elegant way to do this short of creating a sweep zoo that
matches
the row dimension of longzoo?  ( would be a nice addition to sweep)
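
One possible way to let sweep() do the recycling is sketched below in base R on plain matrices rather than zoo objects. It is an illustration, not an answer given in this thread, and it assumes the long series starts in January and covers whole years. The long matrix is viewed as a month x year x series array, and the 12 x 6 statistic is swept out over the month and series margins. All object names are made up for the example.

```r
set.seed(1)
n_years  <- 10
n_series <- 6
long    <- matrix(rnorm(12 * n_years * n_series), ncol = n_series)  # 120 months x 6 series
monthly <- matrix(rnorm(12 * n_series), nrow = 12)                  # 12 monthly values x 6 series

# View the long matrix as month x year x series, then sweep the 12 x 6
# statistic out over the month (1) and series (3) margins.
arr  <- array(long, dim = c(12, n_years, n_series))
anom <- array(sweep(arr, c(1, 3), monthly), dim = dim(long))

# Same answer as the brute-force approach of repeating the 12 rows by hand.
brute <- long - monthly[rep(1:12, n_years), ]
all.equal(anom, brute)
```

Note that sweep() expands STATS to the full array shape internally, so this mainly saves writing the repetition by hand rather than saving memory.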



Re: [R] Sweeping a zoo series

2010-08-11 Thread steven mosher
The colMeans comes closest.

For a single series, assume you have 100 years of monthly data.

The mean you want to scale by is the mean for a restricted period in the
center of the series, say 1950-1960.

For this period you have the average Jan (1950-1960), average Feb, etc.

Your final series would be

jan 1900 - average jan(1950-60)
feb 1900 - average feb(1950-60)

jan 2000 - average jan(1950-60)

Which gives you a scaling that is not relative to the mean of the whole, but
relative to a base period which is selectable.

BTW switching to zoo has greatly simplified the code.
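
The base-period scaling described above can be sketched for a single series in base R as follows. This uses a plain ts object with random toy data rather than zoo, and the names base, clim and anom are illustrative only.

```r
set.seed(1)
x <- ts(rnorm(1200), start = c(1900, 1), frequency = 12)  # 100 years of toy monthly data

# Monthly means over a selectable base period (here 1950-1960 inclusive).
base <- window(x, start = c(1950, 1), end = c(1960, 12))
clim <- as.numeric(tapply(as.vector(base), as.vector(cycle(base)), mean))  # Jan..Dec

# Subtract the matching month's base-period mean from every observation.
anom <- x - clim[as.vector(cycle(x))]
```

By construction, the monthly means of anom over the base period are zero (up to floating point), while values outside the base period are expressed relative to it.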


[R] Creating list from a long vector

2010-08-14 Thread steven mosher
Stupid question, but it's been a long night.

If I have a long vector how can I turn it into a list of the same length?

x <- rep(seq(1,100,by=1),each=10)



Re: [R] Creating list from a long vector

2010-08-14 Thread steven mosher
Thx, I see my problem. more sleep required

On Sat, Aug 14, 2010 at 9:25 AM, Romain Francois romain.franc...@dbmail.com
 wrote:


 On 14/08/10 18:22, steven mosher wrote:


 Stupid question, but it's been a long night.

 If I have a long vector how can I turn it into a list of the same length?

 x <- rep(seq(1,100,by=1),each=10)


 Perhaps as.list ?
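
 A quick sketch of what as.list() does with the vector from the question, next to split(), which may be closer to what is wanted if the goal is one list element per distinct value. The variable names are illustrative.

```r
x <- rep(seq(1, 100, by = 1), each = 10)

per_element <- as.list(x)   # one list element per vector element
per_value   <- split(x, x)  # one list element per distinct value

length(per_element)
length(per_value)
```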

 --
 Romain Francois
 Professional R Enthusiast
 +33(0) 6 28 91 30 30
 http://romainfrancois.blog.free.fr
 |- http://bit.ly/bzoWrs : Rcpp svn revision 2000
 |- http://bit.ly/b8VNE2 : Rcpp at LondonR, oct 5th
 `- http://bit.ly/aAyra4 : highlight 0.2-2






[R] Trouble loading saved Rdata

2010-08-14 Thread steven mosher
In the particular application I have, I save test.Rdata to a subdirectory:
dir <- Example
dir.create(dir)
test <- data.frame(a=c(1,2,3),b=c(3,4,5)

full <- file.path(dir,test.Rdata,fsep=.Platform$file.sep)
save(test,file=full)
load(full)
returns NULL

It works fine when the object is saved to the working directory, but fails
when saved to a subdirectory.
The Rdata file is there. Bytes are in it. But loading it doesn't work.



Re: [R] Trouble loading saved Rdata

2010-08-15 Thread steven mosher
The typos were just transcription errors. I'll report out the session info.

On Sat, Aug 14, 2010 at 5:35 PM, Joshua Wiley jwiley.ps...@gmail.comwrote:

 That worked for me once I properly quoted test.RData on

  sessionInfo()
 R version 2.11.1 (2010-05-31)
 x86_64-pc-mingw32

 locale:
 [1] LC_COLLATE=English_United States.1252
 [2] LC_CTYPE=English_United States.1252
 [3] LC_MONETARY=English_United States.1252
 [4] LC_NUMERIC=C
 [5] LC_TIME=English_United States.1252

 If correcting the quoting does not help you, perhaps you can report
 the results of sessionInfo()

 Cheers,

 Josh




 --
 Joshua Wiley
 Ph.D. Student, Health Psychology
 University of California, Los Angeles
 http://www.joshuawiley.com/




Re: [R] Trouble loading saved Rdata

2010-08-15 Thread steven mosher
Did you exit R and then return?

fname <- "test.Rdata"
full <- file.path("Example",fname,fsep=.Platform$file.sep)
full
[1] "Example/test.Rdata"
load(full)
test
NULL
 sessionInfo()
R version 2.11.1 (2010-05-31)
x86_64-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.11.1



Re: [R] Trouble loading saved Rdata

2010-08-15 Thread steven mosher
I think it came down to my actual program having a function that saved the
objects it was passed with a .RData extension as opposed to .Rdata

Rechecking the whole thing.

On Sun, Aug 15, 2010 at 11:05 AM, Joshua Wiley jwiley.ps...@gmail.comwrote:

 Steven,

 I have exited my R session and restarted and I can load the file
 without issue.  I have also tried loading the saved data on some older
 versions of R (2.10.1 and 2.11.0) and Windows (XP).  Have you tried
 recreating the test object, ensuring that it is not NULL itself,
 resaving it, and then see if loading it works better?

 Josh


 --
 Joshua Wiley
 Ph.D. Student, Health Psychology
 University of California, Los Angeles
 http://www.joshuawiley.com/




Re: [R] Trouble loading saved Rdata

2010-08-15 Thread steven mosher
During my session I write several .Rdata objects to a variety of
subdirectories, so replicating the exact problem wasn't very easy.

In the actual program all the files get written, and all the files have
sizes that fit the amount of data in them.

It looks like the problem was naming the files .RData as opposed to .Rdata.
Since there was one function that named all the files before saving, it
kind of messed up the program and my ability to replicate the problem.

Seems to be working now.

Thanks
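For anyone hitting the same thing, here is a minimal sketch of saving to and loading from a subdirectory. The directory under tempdir() and the variable names are illustrative assumptions, not the code from the actual program. The point is only that the path is built once and reused, so the extension (.Rdata vs .RData) cannot differ between save() and load().

```r
# Illustrative output directory; the real program used its own folders.
out_dir <- file.path(tempdir(), "Example")
dir.create(out_dir, showWarnings = FALSE)

# Build the path once and reuse it for both save() and load().
full <- file.path(out_dir, "test.Rdata")

test <- data.frame(a = c(1, 2, 3), b = c(3, 4, 5))
save(test, file = full)
rm(test)

# load() restores the objects and returns their names as a character vector.
restored <- load(full)
```

Note that the return value of load() is the vector of object names, not the data itself; the data frame reappears under its original name, test.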



On Sun, Aug 15, 2010 at 6:44 AM, David Winsemius dwinsem...@comcast.net wrote:


 On Aug 15, 2010, at 3:06 AM, steven mosher wrote:

  Did you exit R and then return?

 > fname <- "test.Rdata"
 > full <- file.path("Example", fname, fsep=.Platform$file.sep)
 > full
 [1] "Example/test.Rdata"
 > load(full)
 > test
 NULL


 I am unable to reproduce the problem (after correcting two different
 syntactic errors in the initial posting that should have thrown errors and
 prevented the creation of both test and full). I didn't exit my session
 and return, but I did remove the test object after saving it. My guess is
 that the test object was not correctly formed at the time it was saved.

 > test <- data.frame(a=c(1,2,3),b=c(3,4,5))
 > save(test,file=full)
 > test
   a b
 1 1 3
 2 2 4
 3 3 5

 > full
 [1] "Example/test.Rdata"
 > rm(test)
 > load(file=full)
 > test
   a b
 1 1 3
 2 2 4
 3 3 5

 (I pretty much have the same setup that is indicated below running on MacOS
 10.5.8.)

 --
 David.






 David Winsemius, MD
 West Hartford, CT





[R] differecing a zoo series

2010-08-20 Thread steven mosher
A quick question

> x <- as.yearmon(2000 + seq(0, 23)/12)
> x
 [1] "Jan 2000" "Feb 2000" "Mar 2000" "Apr 2000" "May 2000" "Jun 2000" "Jul 2000" "Aug 2000" "Sep 2000" "Oct 2000" "Nov 2000" "Dec 2000" "Jan 2001"
[14] "Feb 2001" "Mar 2001" "Apr 2001" "May 2001" "Jun 2001" "Jul 2001" "Aug 2001" "Sep 2001" "Oct 2001" "Nov 2001" "Dec 2001"
> data <- seq(1,24,by=1)
> testzoo <- zoo(data,order.by=x)

The operation I want to perform on the zoo series is this. I will illustrate
with a small example and formula:

The coredata of the zoo series is (1,2,3,4,5,6,7,8).
I want to calculate Result <- zoo[x] - zoo[x-1], i.e. (NA,1,1,1,1,1...NA).
The first element of course is undefined (NA). Is there any method to do this
elegantly? Padding NAs at the start works, but it's ugly. If I get a simple
function I can apply it to a matrix of zoo series.



[R] merging two maxtrices

2010-09-05 Thread steven mosher
> j <- matrix(nrow=10,ncol=10)
> k <- matrix(seq(1:50), ncol=10)
> row.names(k) <- seq(2,10,by=2)
> j
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 [2,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 [3,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 [4,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 [5,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 [6,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 [7,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 [8,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 [9,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
[10,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
> k
   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
2     1    6   11   16   21   26   31   36   41    46
4     2    7   12   17   22   27   32   37   42    47
6     3    8   13   18   23   28   33   38   43    48
8     4    9   14   19   24   29   34   39   44    49
10    5   10   15   20   25   30   35   40   45    50

Is there a simple way to merge j and k by the row.names in k, so that the
row named '2' is placed in the 2nd row of j, and so forth through
4, 6, 8, 10?

The actual example has a sparse k, not evenly spaced, so this should also
be mergeable:

> row.names(k) <- c(1,2,5,6,9)
> k
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
1    1    6   11   16   21   26   31   36   41    46
2    2    7   12   17   22   27   32   37   42    47
5    3    8   13   18   23   28   33   38   43    48
6    4    9   14   19   24   29   34   39   44    49
9    5   10   15   20   25   30   35   40   45    50



Re: [R] merging two maxtrices

2010-09-05 Thread steven mosher
Weird, I tried that but it didn't appear to work. Hmm. Thanks, I'll try it again.

On Sun, Sep 5, 2010 at 12:21 AM, bill.venab...@csiro.au wrote:

 Is this all you want?

 > j <- matrix(nrow=10,ncol=10)
 > k <- matrix(seq(1:50), ncol=10)
 > row.names(k) <- seq(2,10,by=2)

 > row.names(j) <- 1:10
 > j[row.names(k), ] <- k

 > j
    [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 1    NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 2     1    6   11   16   21   26   31   36   41    46
 3    NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 4     2    7   12   17   22   27   32   37   42    47
 5    NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 6     3    8   13   18   23   28   33   38   43    48
 7    NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 8     4    9   14   19   24   29   34   39   44    49
 9    NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 10    5   10   15   20   25   30   35   40   45    50
 





Re: [R] merging two maxtrices

2010-09-05 Thread steven mosher
ya, unfortunately  k was actually a dataframe and not a matrix when it was
returned
from the function, which explains why I got unexpected results
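A small sketch of one way to handle that case, assuming k comes back as a data frame whose row names are the target row numbers in j. The names k_mat and target_rows are illustrative only. Converting k with as.matrix() first keeps the assignment a plain matrix-into-matrix operation.

```r
j <- matrix(NA_real_, nrow = 10, ncol = 10)

# k arrives as a data frame with sparse, unevenly spaced row names.
k <- as.data.frame(matrix(1:50, ncol = 10))
row.names(k) <- c(1, 2, 5, 6, 9)

# Convert to a plain matrix and use the row names as integer row positions.
k_mat <- as.matrix(k)
target_rows <- as.integer(row.names(k))
j[target_rows, ] <- k_mat
```

Rows 3, 4, 7, 8 and 10 of j stay NA; the others receive the rows of k in order.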



Re: [R] If then else with command for

2010-09-14 Thread steven mosher
not sure how you wanted sampling of x


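R <- 142
color <- rep(0, 142)

for (i in 1:R) {
  # sample into y so that x itself is not overwritten on each pass
  y <- sample(x, 142, replace = FALSE)

  # one bit per value of interest: 1 if present in the sample, else 0
  onebit  <- as.numeric(3471 %in% y)
  twobit  <- as.numeric(6720 %in% y) * 2
  fourbit <- as.numeric(6263 %in% y) * 4

  # combined code runs from 1 (none present) to 8 (all three present)
  colorbit <- onebit + twobit + fourbit + 1
  color[i] <- colorbit
}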
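One caveat: the bit sum above gives eight distinct codes, but it numbers some combinations differently from the if/else chain in the original post (for example, 6720 alone gives 3 here and 4 there). If the original numbering matters, a lookup vector can translate between the two. This is only a sketch; the names lookup and color_code are made up for illustration.

```r
# Codes from the original if/else chain, indexed by the bit sum + 1.
# Bits: 1 = 3471 present, 2 = 6720 present, 4 = 6263 present.
#   bit sum:  0  1  2  3  4  5  6  7
lookup <- c(1, 2, 4, 6, 3, 5, 7, 8)

color_code <- function(y) {
  bits <- (3471 %in% y) * 1 + (6720 %in% y) * 2 + (6263 %in% y) * 4
  lookup[bits + 1]
}

# Example: a sample containing only 3471 and 6263.
color_code(c(10, 3471, 6263))
```

Inside the loop, color[i] <- color_code(y) would then replace the three bit lines and the sum.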

On Tue, Sep 14, 2010 at 10:29 AM, Mestat mes...@pop.com.br wrote:


 Hey listers,
 I am trying to do something simple... Check the program below...
 I would like to create a variable named COLOR according to the conditions
 that I established... But the problem is that it seems that my variable
 COLOR is checking just one sample, maybe the last in the loop... Certainly,
 I am missing something...
 Thanks in advance,
 Marcio


 x <- c(288,139,196,159,536,134,623,517,96,467,277,155,386,241,422,6263,612,532,250,412,339,55,290,249,164,97,74,144,1277,240,163,63,488,111,128,230,720,179,37,24,65,37,89,187,60,939,1008,81,310,58,169,38,68,190,78,807,220,226,69,179129,119,73,59,92,127,104,75,505,183,49,41,76,113,90,79,408,140,200,284,103,58,654,118,431,192,233,102,97,56,69,73,86,53,105,81,77,472,129,194,299,81,122,113,186,91,145,133,114,78,78,72,70,3471,641,275,815,149,185,172,240,67,526,122,229,298,317,179,233,66,129,87,82,63,65,72,6720,381,240,118,396,66,35,43,166,216,53,82,90,62,77,207,68,52,277,396,220,751,146,95,37,35,39,46,59,44,105,87,66,62,175,252,128,330,57,83,208,74,63,109,37,105,38,82,76,63,86,603,209,100,121,191,130,63,128,90,79,50,1025,121,87,309,75,189,36,82,84,60,132,46,965,155,132,219,112,53,90,66,100,77,52,60,100,153,418,392,76,130,197,262,49,105,87,70,147,720,342,233,203,249,92,134,231,782,184,182,432,49,63,94,124,69,53,91,451,53,21,42,50,40,32,58,26,28,61,60,35,764,105,592,55,28,46,34,123,41,54,207,64,562,295,226,63,233)
 R <- 142
 color <- rep(0,142)
 for(i in 1:R){
 x <- sample(x,142,replace=FALSE)
 if (!3471 %in% x && !6263 %in% x && !6720 %in% x){color[i] <- 1} else
 if (3471 %in% x && !6263 %in% x && !6720 %in% x){color[i] <- 2} else
 if (!3471 %in% x && 6263 %in% x && !6720 %in% x){color[i] <- 3} else
 if (!3471 %in% x && !6263 %in% x && 6720 %in% x){color[i] <- 4} else
 if (3471 %in% x && 6263 %in% x && !6720 %in% x){color[i] <- 5} else
 if (3471 %in% x && !6263 %in% x && 6720 %in% x){color[i] <- 6} else
 if (!3471 %in% x && 6263 %in% x && 6720 %in% x){color[i] <- 7} else
 if (3471 %in% x && 6263 %in% x && 6720 %in% x){color[i] <- 8} else{color[i] <- 0}
 }



Re: [R] How to uncompress a gz file in R

2010-09-15 Thread steven mosher
 Wonsang,

 Just to be clear, R.utils is different from utils.

 As Henrik notes, gunzip has been in R.utils (see http://cran.r-project.org/)
for some time. It works
like a champ. R.utils is a great package.
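For completeness, decompression can also be done with base R alone through a gzfile() connection, as Uwe's pointer to ?gzfile suggests. The following is a sketch, not what R.utils does internally; the function name gunzip_base and the temporary files are assumptions made up for the example.

```r
# Create a small gzip-compressed file to work with (illustrative data).
gz_path  <- tempfile(fileext = ".gz")
out_path <- tempfile(fileext = ".txt")
con <- gzfile(gz_path, open = "wt")
writeLines(c("first line", "second line"), con)
close(con)

# Decompress: read raw bytes through a gzfile() connection in chunks
# and write them unchanged to an ordinary file.
gunzip_base <- function(src, dest, chunk_size = 1e6) {
  input  <- gzfile(src, open = "rb")
  output <- file(dest, open = "wb")
  on.exit({ close(input); close(output) })
  repeat {
    chunk <- readBin(input, what = "raw", n = chunk_size)
    if (length(chunk) == 0) break
    writeBin(chunk, output)
  }
  invisible(dest)
}

gunzip_base(gz_path, out_path)
decompressed <- readLines(out_path)
```

Unlike the system("gunzip ...") route, this does not depend on a gunzip binary being on the path, and it leaves the compressed file in place.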

On Wed, Sep 15, 2010 at 9:30 AM, Wonsang You y...@ifn-magdeburg.de wrote:

 Dear Henrik,

 Thank you so much for your kind help. Unfortunately, I could not find out
 any function such as 'gunzip' in R.utils package. Instead, I could be
 successful by using the following command.

 system("gunzip filename")

 On the other hand, the function 'gzfile' supports the compression as gz
 format, but I still do not know how to decompress gz file by using the
 function 'gzfile'.

 Best Regards,
 Wonsang


 On 14 September 2010 15:23, Henrik Bengtsson h...@stat.berkeley.edu wrote:

  To uncompress an *.gz file into another file on disk, see also ?gunzip
  in the R.utils package.
 
  /Henrik
 
  2010/9/14 Uwe Ligges lig...@statistik.tu-dortmund.de:
   See ?gzfile
  
   Uwe Ligges
  
  
   On 14.09.2010 11:02, Wonsang You wrote:
  
   Dear Fellows,
  
   I would like to know how to uncompress a gz file at the R console. I
  could
   not find out any help from the R-help archive.
   Thanks for your great help.
  
   Best Regards,
   Wonsang You
  
  
   -
   --
   Wonsang You
   Special Lab Non-Invasive Brain Imaging
   Leibniz Institute for Neurobiology
   http://www.ifn-magdeburg.de
  
  
 



Re: [R] Get File Names in Folder, Read Files, Update, and Write

2010-09-15 Thread steven mosher
You are welcome.

Steve

On Wed, Sep 15, 2010 at 6:11 PM, Suphajak Ngamlak 
supha...@phatrasecurities.com wrote:

  Thank you so much. It works well



 Best Regards,

 Suphajak Ngamlak
 Equity and Derivatives Trading
 Phatra Securities Public Company Limited
 Tel: +662-305-9179
 Email: supha...@phatrasecurities.com

 *From:* steven mosher [mailto:mosherste...@gmail.com]
 *Sent:* Thursday, September 16, 2010 2:29 AM
 *To:* Suphajak Ngamlak
 *Subject:* Re: [R] Get File Names in Folder, Read Files, Update, and Write




 Import <- "C:/A0810.RSK"

 Table <- read.table(file = Import, sep = ",", header = TRUE, na.strings = "NA")

 Table$VALUE <- 0

 Export <- "C:/A_XVal0810.RSK"

 write.table(Table, file = Export, sep = ",", col.names = TRUE)


 As Uwe suggests, use list.files(), or you can use dir().

 A robust way to do this would use the R.utils package (see CRAN),
 but this is only necessary if you want to move the folder and still
 have the code work. It makes the code work regardless of your working
 directory, assuming you know the name of the folder and it's unique.

 library(R.utils)  # provides getAbsolutePath()

 insertString <- "_XVal"
 targetFolder <- "yourfoldername"

 folderPath   <- getAbsolutePath(targetFolder)  # see R.utils
 outputFolder <- folderPath  # you could create a different output folder name

 # only grab the .RSK files using a regular expression
 fullFilenames <- list.files(path = folderPath, full.names = TRUE,
                             pattern = "\\.RSK$")

 # get only the file names for modification
 filenames <- list.files(path = folderPath, full.names = FALSE,
                         pattern = "\\.RSK$")

 for (filenumber in seq_along(fullFilenames)) {
   Table <- read.table(file = fullFilenames[filenumber], sep = ",",
                       header = TRUE, na.strings = "NA")
   Table$VALUE <- 0
   # insert "_XVal" after the first character: A0801.RSK -> A_XVal0801.RSK
   outfileName <- paste(substr(filenames[filenumber], 1, 1),
                        insertString,
                        substr(filenames[filenumber], 2, nchar(filenames[filenumber])),
                        sep = "")
   outFilePath <- file.path(outputFolder, outfileName, fsep = .Platform$file.sep)
   write.table(Table, file = outFilePath, sep = ",", col.names = TRUE)
 }
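The file name step in the loop can also be written without substr(). A sketch using sub() with a capture group, which is vectorized over all the names at once; the example file names follow the pattern in the question and the variable names are illustrative.

```r
filenames <- c("A0801.RSK", "A0802.RSK", "A0810.RSK")
insertString <- "_XVal"

# "^(.)" captures the first character; "\\1" puts it back before the insert.
outfileNames <- sub("^(.)", paste0("\\1", insertString), filenames)
```

This gives one output name per input name, so the loop only needs to index outfileNames[filenumber].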



  On Wed, Sep 15, 2010 at 1:55 AM, Suphajak Ngamlak 
 supha...@phatrasecurities.com wrote:

 Dear All,



 Could you please recommend how I can do this?



 I have several text files in one folder. Let's name them A0801.RSK,
 A0802.RSK, 

 I would like R to

 1)  Know all file names in this folder

 2)  Update value in one column of these files

 3)  Write results in another text file with _xval in the file names



 Below is R code for read, update, and write one file



 Import <- "C:/A0810.RSK"

 Table <- read.table(file = Import, sep = ",", header = TRUE, na.strings = "NA")

 Table$VALUE <- 0

 Export <- "C:/A_XVal0810.RSK"

 write.table(Table, file = Export, sep = ",", col.names = TRUE)



 Thank you

 Best Regards,

 Suphajak Ngamlak
 Equity and Derivatives Trading
 Phatra Securities Public Company Limited
 Tel: +662-305-9179
 Email: supha...@phatrasecurities.com




Re: [R] How to uncompress a gz file in R

2010-09-16 Thread steven mosher
You are welcome. Henrik's package is a great piece of work. It is worth
the time to read through the whole thing and see how you can improve your
programs by using other features as well.

On Thu, Sep 16, 2010 at 2:16 AM, Wonsang You y...@ifn-magdeburg.de wrote:

 Dear Henrik and Steven,

 Thank you for your kind help and guidance even though it is a basic
 question. I mistakenly thought that gunzip was part of utils rather than
 R.utils. I found the function in R.utils, and was then able to
 decompress a gz file as follows.

 library(R.utils)
 gunzip("foo.gz")

 Best Regards,
 Wonsang





Re: [R] Substitute NAs by zero

2010-09-20 Thread steven mosher
> v <- c(1,2,3,4,5,6,7,8,97,6,5,4,NA,NA)

> b <- zoo(v)
> b
 1  2  3  4  5  6  7  8  9 10 11 12 13 14
 1  2  3  4  5  6  7  8 97  6  5  4 NA NA
> b[is.na(b)] <- 0
> b
 1  2  3  4  5  6  7  8  9 10 11 12 13 14
 1  2  3  4  5  6  7  8 97  6  5  4  0  0
> is.zoo(b)
[1] TRUE


On Mon, Sep 20, 2010 at 2:37 AM, skan juanp...@gmail.com wrote:


 Hello

 How can I substitute all NA values by zero in a R zoo series?
 I've been reading about na.locf and na.omit  but I think none of them do
 what I need.

 thanks.


[R] diagnosing download.file() problems

2010-09-21 Thread steven mosher
I'm accessing around 95 tar files on an FTP server, ranging in size between
10 and 40 MB apiece.

While I can certainly click on them and download them outside of R, I'd like
to have my script do it.

Retrieving the FTP directory with RCurl works fine (about 90% of the time),
but downloading the files by looping through all of them is a random
process.

I may get 1-8 files downloaded, and then it throws an error:

cannot open URL 

Sometimes I can only get 1 file before this error. With tryCatch() I've been
able to do some cleanup after the crash, but automating this whole download
process has turned into a bit of a circus.

The parameters (url, destfile, mode) are all correct in the download.file
call, as the second attempt at a URL will often succeed.

Is there any way to get a deeper look at the cause of the problem? I've tried
closing all connections in between each download. Any pointers would be
welcome.



Re: [R] diagnosing download.file() problems

2010-09-21 Thread steven mosher
That is what I feared. I know other people on slow connections have done
this without issue (at least they didn't report an issue). I had a similar
issue with geonames.org, who at least published their terms of service
(requests per second or sumptin like that, so I could program around it).
I'll hunt around on their FTP and then write the admin a note. I really
don't want to brute force the matter.

It's government data made available for the public, so I expect the admin
will be helpful.

Thanks for confirming what I suspected; for a minute [ed.: for two days] I
thought I had taken crazy pills.

I did note, however, some odd behavior with tryCatch, where statements
after the finally={} were executed. Not sure if that deserves a bug report.
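If the server really is rate limiting, one polite workaround is to retry with a growing pause instead of hammering it. Below is a rough sketch of that idea; with_retry, flaky and state are hypothetical names, and a stand-in function replaces the real download so the example runs without a network.

```r
# Call `action()` up to `max_tries` times, pausing between failed attempts.
# Returns the first successful result, or stops with the last error message.
with_retry <- function(action, max_tries = 5, wait_seconds = 0) {
  last_error <- NULL
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(
      list(ok = TRUE, value = action()),
      error = function(e) list(ok = FALSE, value = e)
    )
    if (result$ok) return(result$value)
    last_error <- result$value
    Sys.sleep(wait_seconds * attempt)  # wait a little longer each time
  }
  stop("all ", max_tries, " attempts failed: ", conditionMessage(last_error))
}

# Stand-in for a flaky download: fails twice, then succeeds.
state <- new.env()
state$calls <- 0
flaky <- function() {
  state$calls <- state$calls + 1
  if (state$calls < 3) stop("cannot open URL")
  "downloaded"
}
outcome <- with_retry(flaky)

# For real use the action would wrap the actual call, for example:
# with_retry(function() download.file(url, destfile, mode = "wb"),
#            wait_seconds = 10)
```

Because the error handler returns a value, execution continues normally after tryCatch(); only uncaught errors stop the script.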

On Tue, Sep 21, 2010 at 2:33 AM, Barry Rowlingson 
b.rowling...@lancaster.ac.uk wrote:


 Sounds to me like the FTP server is operating some kind of rate
 limiting. Do you have access to the server log files, or the server
 administrator, or perhaps the server's terms and conditions, to see if
 it's so? :)

 Barry




Re: [R] get absolute file path

2010-09-26 Thread steven mosher
The R.utils package has a function, getAbsolutePath(), to get the absolute path.
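Base R also offers normalizePath() for this. A minimal sketch, assuming the file actually exists relative to the current working directory; the temporary directory layout below is invented purely so the example can run anywhere.

```r
# Set up a throwaway layout: <root>/data/2010-08.csv and <root>/scripts/
root <- tempfile("abs_path_demo")
dir.create(file.path(root, "data"), recursive = TRUE)
dir.create(file.path(root, "scripts"))
file.create(file.path(root, "data", "2010-08.csv"))

# Work from the scripts directory so the relative name makes sense.
old_wd <- setwd(file.path(root, "scripts"))
fileName <- "../data/2010-08.csv"

# normalizePath() resolves the relative name against the working directory.
absName <- normalizePath(fileName, winslash = "/", mustWork = TRUE)

setwd(old_wd)
```

With mustWork = TRUE the call fails loudly when the file is missing, which is usually what you want for input files.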

On Sun, Sep 26, 2010 at 1:00 AM, Sebastian Gibb li...@sebastiangibb.de wrote:

 Hello,

 I get a value which stores a relative file name. (I get it from another
 function, which I don't want to change.)
 e.g.
  fileName <- "../data/2010-08.csv";

 Is it possible to get the absolute file path out of this value?
 (e.g. /home/sebastian/documents/data/2010-08.csv)

 Kind regards,

 Sebastian



Re: [R] Script auto-detecting its own path

2010-10-04 Thread steven mosher
 In the package R.utils:

   getAbsolutePath()

 Or you can do a list.files(..., full.names=TRUE,
 recursive=TRUE, pattern="\\.R$").

 The rest will require grep and pulling the file name and directory path
 apart.

 If it's not evident, just ask and I'll write something for you.

 Basically you want a call that returns the full path of an R script?





On Mon, Oct 4, 2010 at 12:13 PM, Hadley Wickham had...@rice.edu wrote:

  I'm not sure this will solve the issue because if I move the script, I
 would
  still have to go into the script and edit the /path/to/my/script.r, or
 do
  I misunderstand your workaround?
  I'm looking for something like:
  file.path.is.here("myscript.r")
  and which would return something like:
  [1] "c:/user/Desktop/"
  so that regardless of where the script is, as long as the accompanying
  scripts are in the same directory, they can be easily sourced with
  something like:
  dirX <- file.path.is.here("MasterScript.r")
  source(paste(dirX, "AuxillaryFile.r", sep=""))

 If you use relative paths like so:

 # master.r
 source("AuxillaryFile.r")

 Then source("path/to/master.r", chdir = T) will work.  Mastering
 working directories is a much better idea than coming up with your own
 workarounds.
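A small runnable illustration of the chdir approach, using a throwaway directory. The script contents and the variable aux_value are invented for the example; only the file names come from the thread.

```r
# Build a throwaway script directory (file names follow the thread).
script_dir <- tempfile("scripts")
dir.create(script_dir)
writeLines("aux_value <- 42", file.path(script_dir, "AuxillaryFile.r"))
writeLines('source("AuxillaryFile.r")', file.path(script_dir, "master.r"))

# chdir = TRUE temporarily switches the working directory to the
# directory containing master.r, so its relative source() call resolves.
wd_before <- getwd()
source(file.path(script_dir, "master.r"), chdir = TRUE)

# The working directory is restored once source() returns.
wd_after <- getwd()
```

The master script never needs to know its own location; only the caller supplies a path.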

 Hadley

 --
 Assistant Professor / Dobelman Family Junior Chair
 Department of Statistics / Rice University
 http://had.co.nz/



Re: [R] RE : R getting slower until it breaks...

2010-10-06 Thread steven mosher
I know it's no consolation, but I have a similar issue with R on a Mac, also
plotting out large numbers of raster layers. Sometimes the problem lingers
even after I clear the workspace, do gc(), etc. Almost as if R won't ask for
processor resources.

weird.

On Wed, Oct 6, 2010 at 12:06 PM, Bastien Ferland-Raymond 
bastien.ferland-raymon...@ulaval.ca wrote:

 Thanks a lot for your quick answer.  Here is my answer to your questions:

 Have you looked to see how fast your memory might be growing?
 BFR- Yes I did, it's not too bad: it starts around 60 000 KB and rises to
 120 000 KB at the most, so not too scary.

 Are you leaving around any large objects that should be removed?
 BFR- I was careful to make sure the function doesn't create anything that
 would be visible with objects().  Could it be creating other types of
 (hidden) objects?  Maybe, but I'm not very familiar with that stuff.

 Have you looked to see if you are paging?
 BFR- I just read the wiki about paging; I didn't know that term before.  If I
 look at perfmon, it looks like it's keeping steady at 6000 pages/s with rare
 peaks as high as 900 000. Does that sound normal?  How can it affect R?

 Is it your CPU time that is increasing, or your wall clock time?
 BFR- If I go to the task manager - performance, R is initially using
 around 40% of the processor (so around 80% of 1 core), but as (real) time
 passes it gets lower and lower, down to as low as 6% (12% of one core).  I
 was surprised to see that, as usually my simulations in R use one whole core.

 It sounds like there might be some memory leak that might be causing your
 process size to grow and possibly causing paging.  You will need to gather
 some of the performance data that perfmon can provide and look at the memory
 usage, CPU time and I/O rates over time to see if there are any changes.
 BFR- The term "memory leak" feels right for my problem.  Are there ways I
 can control/detect/prevent this kind of problem in R?  Also, how can I check
 the I/O? I never looked at that before.

 Thanks again

 Bastien
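
 One way to gather the kind of data asked for above from inside R is to
 log elapsed time and R-level memory per batch. This is only a sketch:
 log_batch() and make_tiles() are hypothetical names, and the workload is
 a stand-in, not the raster code from this thread.

```r
# Run one batch of work, then record how long it took and how much
# memory R itself reports as in use afterwards.
log_batch <- function(batch_id, work) {
  elapsed <- system.time(work())[["elapsed"]]
  mem <- gc()                      # rows: Ncells, Vcells; column 2: Mb in use
  data.frame(batch = batch_id,
             elapsed_s = elapsed,
             used_mb = sum(mem[, 2]))
}

# Placeholder workload (assumption); replace with the real per-batch job.
make_tiles <- function() invisible(sum(rnorm(1e5)))

usage <- do.call(rbind, lapply(1:3, function(i) log_batch(i, make_tiles)))
print(usage)
```

 If elapsed_s climbs while used_mb stays flat, the growth is probably
 outside R's own heap (for example in a compiled library), which gc()
 cannot see.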



 On Wed, Oct 6, 2010 at 2:11 PM, Bastien Ferland-Raymond
 bastien.ferland-raymon...@ulaval.ca wrote:
  Hello R-users,
 
  I'm currently facing a pretty hard problem which I'm hoping you'll be
 able to help me with.  I'm using R to create images.  That alone is not the
 problem; the problem is that I'm using R to create 168 000 images...  My
 code (which is given below) uses different packages (raster and rgdal) to
 import an image (size 20 GB) and divide it into 168 000 pictures that are 100
 pixels x 100 pixels.  The code works fine for making the images, but if I ask
 it to run all 168 000, it always breaks around 15 000.
 
  It starts with the code being able to make around 2 pictures per second,
 but then it slows down, and after around 2000 pictures it's only 1 picture
 per second.  Later on it gets closer to 1 picture every 3 seconds, etc.,
 until it crashes.  I get no error message, only Windows telling me that R
 encountered a problem and must close...  Initially I thought it was a
 Windows problem: that I couldn't put too many files into a folder and that
 was slowing it down.  So I divided my batch process into smaller folders
 (5000 files each), but it didn't help; it still breaks at 15 000.  I also
 tried calling gc() after every 5000 pictures to free memory, but that didn't
 help either.  I removed every loop from the code because I thought they were
 the problem, but it just crashed faster...  After the crash, I need to
 restart the computer if I want to get back to the initial speed.
 
  I'm pretty much running out of options.  Is there a limit in R on
 the number of files it can create in one session?  Is it a Windows problem?
  Is there a better way to clear the memory than gc()?  Any thoughts on that?
 
  I'm using R 2.11.1 on Windows XP; my hard drive is NTFS; computer: Intel
 Core 2 Duo E6750, 32 bit, with 2 GB of RAM.
 
  Here is my code, but I doubt it will help much with my problem:
 
  
  # It is made of 4 functions (sorry, the comments were in French):
 
 ##
  ###  Set of functions to create the red and green NDVI images
  ###
  ##  Bastien Ferland-Raymond, 5 Oct 2010
 ##
 
  ## Simply run the whole script
  
  ### Required libraries:
  library(raster)
  library(rgdal)
  library(shapefiles)
 
 ## Function 1  -  NDVI from pixel coordinates and width
  calculate_NDVI <- function(Type, object, VALUE) {
    redorgreen <- ifelse(Type=="red",2,3)
    list1 <- unstack(object)
    rast1 <- list1[[1]]
    rast2 <- list1[[redorgreen]]
    NAvalue(rast1) <-

Re: [R] RE : R getting slower until it breaks...

2010-10-06 Thread steven mosher
Thanks,

 haven't used valgrind in years, this should be fun.

Steve

On Wed, Oct 6, 2010 at 1:55 PM, Ben Bolker bbol...@gmail.com wrote:

 steven mosher moshersteven at gmail.com writes:

 
  I know it's no consolation, but I have a similar issue with R on a Mac,
  also plotting
  out large numbers of raster layers.  Sometimes the problem lingers even
  after I clear the workspace, do gc(), etc.  It's almost as if R won't ask for
  processor resources.
 
 

   If it is a memory leak, it might be worth reading up on the use of
 valgrind (section 4.3.2 in the 'R extensions' manual).  The information
 provided by valgrind might not be immediately interpretable, but it could
 help others track down a problem ...



[R] Loss of precision in read.csv.

2010-10-09 Thread steven mosher
Given a csv file from this location

Airports <- "http://www.ourairports.com/data/airports.csv"

download.file(Airports,basename(Airports))

airports <- read.csv("airports.csv",encoding="UTF-8")

> airports[1,]
    id ident     type              name latitude_deg longitude_deg
1 6523   00A heliport Total Rf Heliport     *40.0708      -74.9336*
  elevation_ft continent iso_country iso_region municipality scheduled_service
1           11        NA          US      US-PA     Bensalem                no
  gps_code iata_code local_code home_link wikipedia_link keywords
1      00A                  00A

And the precision is lost, which we can show by using readLines:

fred <- readLines("airports.csv")

> fred[2]
[1] "6523,\"00A\",\"heliport\",\"Total Rf Heliport\",*40.07080078125,-74.9336013793945*,11,\"NA\",\"US\",\"US-PA\",\"Bensalem\",\"no\",\"00A\",,\"00A\",,,"

I tried various approaches: using colClasses, switching to read.table, and
specifying dec=".".

I tested read.csv and it does preserve precision on my test case, but not on
this data.

Ideas?



Re: [R] Loss of precision in read.csv.

2010-10-09 Thread steven mosher
Ha Thanks,

  That was it.

On Sat, Oct 9, 2010 at 2:38 PM, Joshua Wiley jwiley.ps...@gmail.com wrote:

 Hi Steven,

 As near as I can tell, no precision is lost.  R is just being
 courteous and not excessively filling our consoles.  Try:

 print(airports[1, "latitude_deg"], digits = 22)

 which is the most digits R will print (although internally it can
 store more I believe).

 Alternately, you can convert it to character class:

 as.character(airports[1, ])

 So in short, this is just a cosmetic feature of presenting the data,
 not its actual storage.

 Cheers,

 Josh
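
 The display-versus-storage point can be checked without downloading
 anything. The sketch below uses a two-line stand-in for airports.csv
 (the object name airports_demo is made up); the coordinate values are the
 ones quoted in the question.

```r
# A tiny stand-in for the real file, holding the same coordinates.
csv_lines <- c("id,latitude_deg,longitude_deg",
               "6523,40.07080078125,-74.9336013793945")
airports_demo <- read.csv(text = csv_lines)

# Default printing rounds the display to 7 significant digits ...
print(airports_demo)

# ... but asking for more digits shows the stored values are intact.
lat_full <- format(airports_demo$latitude_deg, digits = 15)
lon_full <- format(airports_demo$longitude_deg, digits = 15)
print(c(lat_full, lon_full))
```

 A double carries roughly 15 to 17 significant decimal digits, so
 requesting 22 digits mostly displays binary representation noise rather
 than extra information.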

 



 --
 Joshua Wiley
 Ph.D. Student, Health Psychology
 University of California, Los Angeles
 http://www.joshuawiley.com/




Re: [R] Read from a website

2010-10-12 Thread steven mosher
Hmm,

RCurl might have something on this.

Otherwise you can figure out their scheme and just construct the URL
from scratch.

When you finish filling in the form, look at the URL they construct. Do it a
few times and you can just emulate that. I've done that in the past without
problems, but it depends on the site.
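
A rough sketch of the "construct the URL yourself" idea follows. The base
address and the parameter names (s, from, to) are invented for
illustration; a real site will use whatever names show up in the URL its
form produces.

```r
# Assemble a query URL from a base address and form-style parameters.
build_csv_url <- function(base_url, symbol, from, to) {
  query <- c(s    = symbol,
             from = format(as.Date(from), "%Y-%m-%d"),
             to   = format(as.Date(to), "%Y-%m-%d"))
  # Percent-encode each value so odd characters survive the trip.
  encoded <- vapply(query, utils::URLencode, character(1), reserved = TRUE)
  paste0(base_url, "?",
         paste(names(query), encoded, sep = "=", collapse = "&"))
}

csv_url <- build_csv_url("http://example.com/table.csv",   # hypothetical site
                         "IBM", "2010-01-01", "2010-10-12")
print(csv_url)

# Against a real endpoint the result could then be read directly:
# quotes <- read.csv(csv_url)
```

This only works when the site accepts the request as a plain GET URL; forms
that POST their data or require a login session need a tool such as RCurl.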


On Tue, Oct 12, 2010 at 2:32 AM, Santosh Srinivas 
santosh.srini...@gmail.com wrote:

 Something similar to this was discussed recently, but I'm unable to find
 the thread.

 I want to read from a site where I need to enter the date into a form
 before I am presented with the CSV link, e.g. like reading ticker data from
 Yahoo (but assuming you HAVE to enter the dates and click on request).

 How do I simulate this from R?

 Thanks for the help.






[R] Calculating a Maximum for a row or column with NA's

2010-04-17 Thread steven mosher
Is there a simple way to calculate the maximum for a row or column of a
matrix when there are NAs present?

# given a matrix that has any number of NA per row
> m <- matrix(c(seq(1,9)),nrow=3)
> m
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> m[3,1]=NA
> m[1,]=NA
> m
     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]    2    5    8
[3,]   NA    6    9

# applying max to rows doesn't work as max returns
# NA if any of the elements is NA.
> row_max <- apply(m,1,max)
> row_max
[1] NA  8 NA

# my desired result given m would be:
#  NA, 8, 9



Re: [R] Calculating a Maximum for a row or column with NA's

2010-04-17 Thread steven mosher
Yes, I got that result, but fixing it was a mystery, especially since I will
eventually want to subtract the row min from the row max (or calculate the
range).
If the matrix is:

     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]    2    5    8
[3,]   NA    6    9

then apply(m,1,max,na.rm=TRUE)

yields

[1] -Inf    8    9

and the row min yields

[1] Inf   2   6

I need to see what happens if I subtract these two vectors.


     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]    2    5    8
[3,]   NA    6    9
> rmax <- apply(m,1,max,na.rm=TRUE)
Warning message:
In FUN(newX[, i], ...) : no non-missing arguments to max; returning -Inf
> rmax
[1] -Inf    8    9
> rmin <- apply(m,1,min,na.rm=TRUE)
Warning message:
In FUN(newX[, i], ...) : no non-missing arguments to min; returning Inf
> rmin
[1] Inf   2   6
> rmax-rmin
[1] -Inf    6    3
> rrange <- rmax-rmin
> rrange
[1] -Inf    6    3


The final matrix may have a large number of these -Inf values.

I was looking at the matrixStats package, but it is still beta.

On Sat, Apr 17, 2010 at 10:01 PM, David Winsemius dwinsem...@comcast.netwrote:



 Not exactly your desired result, but surely you could fix that:

 > row_max <- apply(m,1,max, na.rm=TRUE)
 Warning message:
 In FUN(newX[, i], ...) : no non-missing arguments to max; returning -Inf
 > row_max
 [1] -Inf    8    9






 David Winsemius, MD
 West Hartford, CT





Re: [R] Calculating a Maximum for a row or column with NA's

2010-04-17 Thread steven mosher
Thanks Jorge. I was playing around with ifelse to solve the problem, but was
unaware of all(is.na(x)). RTFM, I guess.

On Sat, Apr 17, 2010 at 10:31 PM, Jorge Ivan Velez jorgeivanve...@gmail.com
 wrote:

 Hi Steven,

 Try this:

 R> apply(m,1, function(x) ifelse(all(is.na(x)), NA, max(x, na.rm = TRUE)))
 [1] NA  8  9

 See ?ifelse, ?all and ?max for more information.

 HTH,
 Jorge
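
 The same guard extends to the row range asked about earlier in the
 thread. A sketch, with safe_stat() as a made-up helper name:

```r
m <- matrix(1:9, nrow = 3)
m[3, 1] <- NA
m[1, ] <- NA

# Return NA for an all-NA row; otherwise apply the statistic without NAs.
safe_stat <- function(x, stat_fun) {
  if (all(is.na(x))) NA_real_ else stat_fun(x, na.rm = TRUE)
}

row_max   <- apply(m, 1, safe_stat, stat_fun = max)
row_min   <- apply(m, 1, safe_stat, stat_fun = min)
row_range <- row_max - row_min

print(rbind(row_max, row_min, row_range))
```

 Because all-NA rows never reach max() or min(), there are no -Inf/Inf
 values to clean up afterwards and no warnings are raised.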




Re: [R] Calculating a Maximum for a row or column with NA's

2010-04-18 Thread steven mosher
Henrik,

  Thanks! I was just recommending the package to another fellow who is
learning R as I am. I was going crazy. Jorge gave me a solution that works,
however the data set I'm working with is huge so I'm hoping that switching
to your package will give both readability and performance improvements.

On Sun, Apr 18, 2010 at 2:47 AM, Henrik Bengtsson h...@stat.berkeley.eduwrote:


 The matrixStats package is labelled "beta" because the author of it
 is *extremely* picky when it comes to bumping code up to be labelled
 "release"; he often requires a code base to be stable for years before
 removing the label "beta".  I would give matrixStats' rowMaxs() a try.

 /Henrik
 (author of matrixStats)

 
 


[R] Noobie question on aggregate tapply and by

2010-04-25 Thread steven mosher
I have a 43MB data frame (5 variables) and I'm trying to summarize subsets
of the data.
I've RTFM (not very clear) and looked at a variety of samples, but can't seem
to figure out how to make these functions work.

A sample of what I want to do would be this:

ids <- seq(1,50)
years <- c(rep(5,10),rep(6,10),rep(7,10),rep(8,20))
data <- c(rep(23.2,7),rep(14.2,17),rep(29.2,6),rep(13.4,10),rep(16.3,5), NA,
rep(40,4))
data2 <- c(rep(22.2,5),rep(13.2,8),NA, rep(29.8,16),rep(12.4,10),rep(16.3,5),
rep(38,5))
DF <- data.frame(ids,years,data,data2)

That will give you a data frame that is a good analog of what I have. I
would like to calculate means
(with NA removed via na.rm) for each level of years.

  data  data2
5 xx.   yy.
6 xx    yz
7 ...   ,,,
8 ..    ...

And then things like this:

5-7 :   xx   yy
8   :   xy   zz



Re: [R] Noobie question on aggregate tapply and by

2010-04-25 Thread steven mosher
Thanks I'll try that, still need to understand how the other functions
work.. just to satisfy myself..thanks again

On Sun, Apr 25, 2010 at 12:06 AM, Tal Galili tal.gal...@gmail.com wrote:

 Here is one solution for your question:

 mean.data <- with(DF, tapply(data, years, mean, na.rm = TRUE))
 mean.data2 <- with(DF, tapply(data2, years, mean, na.rm = TRUE))
 cbind(mean.data , mean.data2)


 Another one would be for you to read about the package plyr (which is
 better for this job, actually)

 And regarding the years being recoded, look at either:
 ?cut
 or
 ?recode (from the car package)

 Best,
 Tal
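
 For comparison, here is a sketch that gets both requested tables in one
 call each with aggregate() and cut(). The sample data are the ones from
 the question; the names by_year, by_period and the period column are
 assumptions added for illustration.

```r
ids   <- seq(1, 50)
years <- c(rep(5, 10), rep(6, 10), rep(7, 10), rep(8, 20))
data  <- c(rep(23.2, 7), rep(14.2, 17), rep(29.2, 6), rep(13.4, 10),
           rep(16.3, 5), NA, rep(40, 4))
data2 <- c(rep(22.2, 5), rep(13.2, 8), NA, rep(29.8, 16), rep(12.4, 10),
           rep(16.3, 5), rep(38, 5))
DF <- data.frame(ids, years, data, data2)

# Means of both columns for each level of years, ignoring NAs.
by_year <- aggregate(DF[, c("data", "data2")],
                     by = list(years = DF$years),
                     FUN = mean, na.rm = TRUE)
print(by_year)

# Collapse years into the two groups "5-7" and "8", then summarise again.
DF$period <- cut(DF$years, breaks = c(4, 7, 8), labels = c("5-7", "8"))
by_period <- aggregate(DF[, c("data", "data2")],
                       by = list(period = DF$period),
                       FUN = mean, na.rm = TRUE)
print(by_period)
```

 tapply() handles one column at a time, which is why the answer above
 calls it twice; aggregate() accepts several columns and returns a data
 frame directly.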




 Contact details:
 Contact me: tal.gal...@gmail.com |  972-52-7275845
 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
 www.r-statistics.com (English)






Re: [R] Noobie question on aggregate tapply and by

2010-04-25 Thread steven mosher
 thx I was struggling with the DF[,3:4] part of it

On Sun, Apr 25, 2010 at 10:47 AM, John Kane jrkrid...@yahoo.ca wrote:

 Here's one way with aggregate()

 library(car)  # You probably will need to install it.

 aggregate(DF[,3:4], by=list(years), mean,na.rm=TRUE)

 recode(x, "c(1,2)='A'; else='B'")

 DF$years <- recode(DF$years, "c(5,6,7)= '5-7'")

 DF

 You may also want to have a look at the reshape and plyr packages.



[R] Tapply.

2010-04-25 Thread steven mosher
Having some difficulties with understanding how tapply works and getting
the return values I expect.

Data: a data frame DF with columns DF$Id, DF$D, DF$Year, ...

 Id          D Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
 11264402000 1 1980  NA  NA  NA  NA  NA 212 203 209 228 237  NA  NA
 11264402000 0 1981  NA  NA 243 244  NA  NA  NA  NA 225  NA 231  NA
 11264402000 1 1981  NA 251  NA 248 241  NA  NA  NA 235  NA  NA 245
 11264402000 0 1982 236 237 242 240 242 205 199  NA  NA  NA  NA  NA
 11264402000 1 1982 236  NA  NA 240 242  NA  NA  NA  NA  NA  NA  NA
 11264402000 0 1983  NA 247  NA  NA  NA  NA  NA 205  NA  NA  NA  NA
 11264402000 1 1983  NA 247  NA  NA  NA  NA  NA  NA  NA 225  NA  NA
 11264402000 0 1986  NA  NA  NA 240  NA  NA  NA 213  NA  NA  NA  NA
 11264402000 0 1987 241  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
 11264402000 1 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
 11264402000 3 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
 11264402000 0 1988 238 246 249  NA 244 213 212 224 232 238 232 230
 11264402000 1 1988 238 246 249 246 244 213 212 224 232  NA  NA 230
 11264402000 3 1988 238 246 249 246 244 213 212 224 232  NA  NA 230
 11264402000 0 1989 232 233 238 239 231  NA 215  NA  NA  NA  NA 238
 11264402000 1 1989 232 233 238 239 231  NA  NA  NA  NA  NA  NA 238
 11264402000 3 1989 232 233 238 239 231  NA  NA  NA  NA  NA  NA 238

and the result should be a data frame of column means by year, with the
variable D dropped (or kept, it doesn't matter):

 Id          D    Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
 11264402000 1    1980  NA  NA  NA  NA  NA 212 203 209 228 237  NA  NA
 11264402000 .5   1981  NA  NA 243 244  NA  NA  NA  NA 225  NA 231  NA
 11264402000 .5   1982 236 237 242 240 242 205 199  NA  NA  NA  NA  NA
 11264402000 .5   1983  NA 247  NA  NA  NA  NA  NA 205  NA 225  NA  NA
 11264402000 1    1986  NA  NA  NA 240  NA  NA  NA 213  NA  NA  NA  NA
 11264402000 2    1987 241  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
 11264402000 1.33 1988 238 246 249 246 244 213 212 224 232 238 232 230
 11264402000 1.33 1989 232 233 238 239 231  NA 215  NA  NA  NA  NA 238

It would seem that tapply should work:
> result <- tapply( DF[,1:15], DF$Year, colMeans,na.rm=T)

but I get errors about the length of arguments, which



Re: [R] Tapply.

2010-04-26 Thread steven mosher
I've tried both mean and colMeans.

I did succeed with one attempt using mean; however, if I only have 1 year and
it's an NA, then I get NaN (which I can replace). I'll keep trying.



On Mon, Apr 26, 2010 at 12:26 AM, Petr PIKAL petr.pi...@precheza.cz wrote:

 Hi

 r-help-boun...@r-project.org wrote on 26.04.2010 06:52:55:

   It would seem that tapply should work
   result <- tapply( DF[,1:15], DF$Year, colMeans,na.rm=T)

 Why colMeans?  It is a function used instead of apply(..., ..., mean).

 Maybe you want

 result <- tapply( DF[,1:15], DF$Year, mean,na.rm=T)

 Regards
 Petr

 


Re: [R] Tapply.

2010-04-26 Thread steven mosher
That fails.

The manual says:

tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)

Arguments
X      an atomic object, typically a vector.
INDEX  list of factors, each of same length as X. The elements are
       coerced to factors by as.factor.

My error says:

Error in tapply(DF[, 1:15], DF$Year, mean, na.rm = T) :   arguments must
have same length

The issue I have is that I don't understand what the requirements for the
list of factors are. In my example DF$Year is a sequence of years
(1979, 1980, 1982, 1983, 1987, ...) with missing years. So when the manual
says "list of factors, each of same length as X", what does that mean? I
could have a DF with 20 rows and only two different years, or 20 rows and
20 different years.

Suppose:

> a <- c(1,2,3,4)
> b <- c(2,3,4,5)
> df = data.frame(a,b)
> length(df)

The length of df is 2.
Does that mean the "list of factors, each of same length as X" would have
to be of length 2? That doesn't seem to make sense.
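
A small sketch of what the length requirement seems to amount to: X has to
be a single vector with one element per row, so a data frame (whose length
is its number of columns) does not qualify. The three-column data frame
below is a cut-down stand-in for the real one, and using aggregate() for
the multi-column case is a substitute technique, not something proposed in
the thread.

```r
DF <- data.frame(Year = c(1981, 1981, 1982, 1982, 1983),
                 Jan  = c(NA, NA, 236, 236, NA),
                 Feb  = c(NA, 251, 237, NA, 247))

length(DF)        # number of columns
length(DF$Year)   # number of rows, which is what INDEX must match

# One column at a time: X and INDEX now have the same length.
jan_means <- tapply(DF$Jan, DF$Year, mean, na.rm = TRUE)
print(jan_means)  # years with no data come back as NaN

# Several columns at once: aggregate() splits every column by Year.
col_means <- aggregate(DF[, c("Jan", "Feb")],
                       by = list(Year = DF$Year),
                       FUN = mean, na.rm = TRUE)
col_means[is.na(col_means)] <- NA   # is.na() is TRUE for NaN, so NaN becomes NA
print(col_means)
```

mean(x, na.rm = TRUE) on a group with no non-missing values returns NaN,
which explains the NaN results mentioned earlier in the thread.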





On Mon, Apr 26, 2010 at 12:26 AM, Petr PIKAL petr.pi...@precheza.cz wrote:

 Hi

 r-help-boun...@r-project.org napsal dne 26.04.2010 06:52:55:

  Having some difficulties with understanding how tapply works and getting
  return values I expect
 
  Data: dataframe. DF  DF$Id $D $Year...
 
           Id D Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
  11264402000 1 1980  NA  NA  NA  NA  NA 212 203 209 228 237  NA  NA
  11264402000 0 1981  NA  NA 243 244  NA  NA  NA  NA 225  NA 231  NA
  11264402000 1 1981  NA 251  NA 248 241  NA  NA  NA 235  NA  NA 245
  11264402000 0 1982 236 237 242 240 242 205 199  NA  NA  NA  NA  NA
  11264402000 1 1982 236  NA  NA 240 242  NA  NA  NA  NA  NA  NA  NA
  11264402000 0 1983  NA 247  NA  NA  NA  NA  NA 205  NA  NA  NA  NA
  11264402000 1 1983  NA 247  NA  NA  NA  NA  NA  NA  NA 225  NA  NA
  11264402000 0 1986  NA  NA  NA 240  NA  NA  NA 213  NA  NA  NA  NA
  11264402000 0 1987 241  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
  11264402000 1 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
  11264402000 3 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
  11264402000 0 1988 238 246 249  NA 244 213 212 224 232 238 232 230
  11264402000 1 1988 238 246 249 246 244 213 212 224 232  NA  NA 230
  11264402000 3 1988 238 246 249 246 244 213 212 224 232  NA  NA 230
  11264402000 0 1989 232 233 238 239 231  NA 215  NA  NA  NA  NA 238
  11264402000 1 1989 232 233 238 239 231  NA  NA  NA  NA  NA  NA 238
  11264402000 3 1989 232 233 238 239 231  NA  NA  NA  NA  NA  NA 238
 
  and the result should be a dataframe of column means by year  with the
  variable D dropped (or kept doesnt matter)
 
  11264402000    1 1980  NA  NA  NA  NA  NA 212 203 209 228 237  NA  NA
  11264402000   .5 1981  NA  NA 243 244  NA  NA  NA  NA 225  NA 231  NA
  11264402000   .5 1982 236 237 242 240 242 205 199  NA  NA  NA  NA  NA
  11264402000   .5 1983  NA 247  NA  NA  NA  NA  NA 205  NA 225  NA  NA
  11264402000    1 1986  NA  NA  NA 240  NA  NA  NA 213  NA  NA  NA  NA
  11264402000    2 1987 241  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
  11264402000 1.33 1988 238 246 249 246 244 213 212 224 232 238 232 230
  11264402000 1.33 1989 232 233 238 239 231  NA 215  NA  NA  NA  NA 238
 
   It would seem that tapply should work
   result <- tapply( DF[,1:15], DF$Year, colMeans,na.rm=T)

 Why colMeans? It is a function used in place of apply(..., ..., mean).

 Maybe you want

 result <- tapply( DF[,1:15], DF$Year, mean,na.rm=T)

 Regards
 Petr

 
   but i get errors about the length of arguments, which
 


Re: [R] Tapply.

2010-04-26 Thread steven mosher
Thanks,

  I was trying to stick with the base package and figure out how the base
routines worked. I looked at plyr and it was very appealing. I guess I'll
give in and use it.

On Mon, Apr 26, 2010 at 2:33 AM, Dennis Murphy djmu...@gmail.com wrote:

 Hi:

 Use of ddply() in the plyr package appears to work.

 library(plyr)
 ddply(df[, -1], .(Year), colwise(mean), na.rm = TRUE)

  D Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
 1 1.00 1980 NaN NaN NaN NaN NaN 212 203 209 228 237 NaN NaN
 2 0.50 1981 NaN 251 243 246 241 NaN NaN NaN 230 NaN 231 245
 3 0.50 1982 236 237 242 240 242 205 199 NaN NaN NaN NaN NaN
 4 0.50 1983 NaN 247 NaN NaN NaN NaN NaN 205 NaN 225 NaN NaN
 5 0.00 1986 NaN NaN NaN 240 NaN NaN NaN 213 NaN NaN NaN NaN
 6 1.33 1987 241 NaN NaN NaN NaN 218 NaN NaN 235 243 240 NaN
 7 1.33 1988 238 246 249 246 244 213 212 224 232 238 232 230
 8 1.33 1989 232 233 238 239 231 NaN 215 NaN NaN NaN NaN 238

 Replace the NaNs with NAs and that should do it
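
One way the NaN-to-NA replacement could be done in base R is sketched below. The small data frame res is a hypothetical stand-in for the ddply() result, not the actual output above:

```r
# Hypothetical stand-in for the ddply() result: NaN where a group had no data
res <- data.frame(Year = c(1980, 1981),
                  Jan  = c(NaN, 236),
                  Feb  = c(251, NaN))

# is.nan() works on vectors, so apply it column by column;
# assigning into res[] keeps the data frame structure
res[] <- lapply(res, function(x) replace(x, is.nan(x), NA))
res
```

The means come out as NaN because mean() with na.rm = TRUE is applied to groups in which every value is missing.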

 HTH,
 Dennis



Re: [R] Tapply.

2010-04-26 Thread steven mosher
I guess my problem was seeing a bunch of examples where they pulled a
variable from a dataframe:

  tapply(df$data, index=list(..

and I assumed that df$data generalized to a collection of vectors,
a vector of vectors being a vector.

Thanks.

On Mon, Apr 26, 2010 at 2:43 AM, Petr PIKAL petr.pi...@precheza.cz wrote:

 Hi


 steven mosher mosherste...@gmail.com wrote on 26.04.2010 10:21:37:

  That fails:
 
  The manual says:
 
  tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)

  Arguments
 
  X
 
  an atomic object, typically a vector.
 
  INDEX
 
  list of factors, each of same length as X. The elements are coerced to
 factors by
  as.factor.
 
  my error says:

 
  Error in tapply(DF[, 1:15], DF$Year, mean, na.rm = T) :
 
arguments must have same length
 
  The issue that I have is I dont understand what the requirements for the
 list of factors
  are. In my example DF$Years is  a sequence of
 years..1979,1980,1982,1983, 1987..
  like that with missing years: so when the manual say: list of factors
 each the same
  length as X? what does that mean? I could have a DF with 20 rows and
 only two
  different years. or 20 rows and 20 different years.
 
  Suppose:
 
  a <- c(1,2,3,4)
  b <- c(2,3,4,5)
  df=data.frame(a,b)
  length(df)

 A data frame is neither a vector nor atomic; it is a list, hence length(df)
 gives you the number of columns. It is similar to the length of a list:

  > lll <- list(a=1, b=2, c=3)
  > length(lll)
 [1] 3
 

 Since the first argument of tapply has to be a vector, you cannot put a
 data frame there.

 The second argument has to be a list of factors, so you can put several
 factors there, each of the same length as the first argument (a vector).

 If you want to perform an aggregating operation on a whole data frame you
 should consider

 ?by or ?aggregate

 Other options are the plyr or doBy packages.

 The syntax for aggregate is quite similar to tapply, only its first
 argument can be a data frame.
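
A minimal sketch of the aggregate() call described here, assuming a cut-down, hypothetical version of the data with only Year, Jan and Feb:

```r
# Hypothetical miniature of the thread's data: two years, duplicate records
DF <- data.frame(Year = c(1981, 1981, 1982, 1982),
                 Jan  = c(NA, NA, 236, 236),
                 Feb  = c(NA, 251, 237, NA))

# The first argument may be a data frame; 'by' must be a list of
# grouping vectors, each with one entry per row
res <- aggregate(DF[, c("Jan", "Feb")], by = list(Year = DF$Year),
                 FUN = mean, na.rm = TRUE)
res
```

Extra arguments such as na.rm are passed on to FUN, just as with tapply. A group with no data at all (Jan in 1981 here) comes back as NaN.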

 Regards
 Petr


 

Re: [R] Tapply.

2010-04-27 Thread steven mosher
Thanks, Dennis.

Is there a book on R you could recommend?



On Mon, Apr 26, 2010 at 7:12 PM, Dennis Murphy djmu...@gmail.com wrote:

 Hi:


  On Mon, Apr 26, 2010 at 8:01 AM, steven mosher mosherste...@gmail.com wrote:
  Thanks,

   I was trying to stick with the base package and figure out how the base
 routines worked.

 If you want to use base functions, then here's a solution with aggregate
 (the Id column was removed first):

  with(DF, aggregate(DF[, -2], list(Year = Year), FUN = mean, na.rm = TRUE))
   Year    D Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
 1 1980 1.00 NaN NaN NaN NaN NaN 212 203 209 228 237 NaN NaN
 2 1981 0.50 NaN 251 243 246 241 NaN NaN NaN 230 NaN 231 245
 3 1982 0.50 236 237 242 240 242 205 199 NaN NaN NaN NaN NaN
 4 1983 0.50 NaN 247 NaN NaN NaN NaN NaN 205 NaN 225 NaN NaN
 5 1986 0.00 NaN NaN NaN 240 NaN NaN NaN 213 NaN NaN NaN NaN
 6 1987 1.33 241 NaN NaN NaN NaN 218 NaN NaN 235 243 240 NaN
 7 1988 1.33 238 246 249 246 244 213 212 224 232 238 232 230
 8 1989 1.33 232 233 238 239 231 NaN 215 NaN NaN NaN NaN 238

 The problem with tapply() is that the function has to be called separately
 on each column you want to summarize. You could do it in a loop:
  res <- matrix(NA, 8, 14)
  res[, 1] <- unique(DF$Year)
  res[, 2] <- with(DF, tapply(D, Year, mean, na.rm = TRUE))
  for(j in 3:14) res[, j] <- tapply(DF[, j], DF$Year, mean, na.rm = TRUE)
  res
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
 [1,] 1980 1.00  NaN  NaN  NaN  NaN  NaN  212  203   209   228   237   NaN
 [2,] 1981 0.50  NaN  251  243  246  241  NaN  NaN   NaN   230   NaN   231
 [3,] 1982 0.50  236  237  242  240  242  205  199   NaN   NaN   NaN   NaN
 [4,] 1983 0.50  NaN  247  NaN  NaN  NaN  NaN  NaN   205   NaN   225   NaN
 [5,] 1986 0.00  NaN  NaN  NaN  240  NaN  NaN  NaN   213   NaN   NaN   NaN
 [6,] 1987 1.33  241  NaN  NaN  NaN  NaN  218  NaN   NaN   235   243   240
 [7,] 1988 1.33  238  246  249  246  244  213  212   224   232   238   232
 [8,] 1989 1.33  232  233  238  239  231  NaN  215   NaN   NaN   NaN   NaN
  [,14]
 [1,]   NaN
 [2,]   245
 [3,]   NaN
 [4,]   NaN
 [5,]   NaN
 [6,]   NaN
 [7,]   230
 [8,]   238

 but it's not the most efficient way to do things.

 Essentially, this approach conforms to the 'split-apply-combine' strategy,
 which is more efficiently implemented in functions like aggregate() or in
 packages such as doBy, plyr, reshape and data.table, some of which were
 mentioned earlier by Petr Pikal.
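
The three steps of split-apply-combine can be spelled out in base R roughly as follows. This is only a sketch on a hypothetical cut-down data frame, not the way aggregate() or plyr implement it internally:

```r
# Hypothetical miniature of the data
DF <- data.frame(Year = c(1981, 1981, 1982, 1982),
                 Jan  = c(NA, 240, 236, 236),
                 Feb  = c(NA, 251, 237, NA))

# split: one data frame per year
pieces <- split(DF[, c("Jan", "Feb")], DF$Year)

# apply: column means within each piece
means <- lapply(pieces, colMeans, na.rm = TRUE)

# combine: stack the per-year vectors into one matrix, rows named by year
res <- do.call(rbind, means)
res
```

Note that colMeans is appropriate here because each piece is a whole data frame, which is the situation the original tapply(DF[,1:15], ...) call was aiming for.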

 HTH,
 Dennis



Re: [R] Tapply.

2010-04-27 Thread steven mosher
Thanks,

 I had been wondering what drop did. That makes it more clear.

While I have code that loops and does the problem correctly, I wanted to
do things the R way and be fast and terse. hehe.

So:
Id          D Year Jan ...
11264402000 1 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
11264402000 3 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA

In words: for each Id, for each Year, return
 the max of Jan, Feb, ... over D
 the min of Jan, Feb, ... over D
 the mean of Jan, Feb, ... over D
 the (max+min)/2 of Jan, Feb, ... over D
 the count of D for Jan, Feb, ...
 the results of a function called with all elements of this Id
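
A rough sketch of how the first five items of that wish list might be computed with aggregate() for a single month. The data frame is a hypothetical miniature of the real one, and month_summary is an illustrative helper name, not an existing function:

```r
# Hypothetical miniature of the data: one station, duplicate records per year
DF <- data.frame(Id   = rep(11264402000, 4),
                 D    = c(0, 1, 0, 1),
                 Year = c(1987, 1987, 1988, 1988),
                 Jan  = c(241, NA, 238, 240))

# Summary of one month across the duplicate records; returns NA when
# every value is missing so max()/min() never see an empty vector
month_summary <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0)
    return(c(max = NA, min = NA, mean = NA, mid = NA, n = 0))
  c(max = max(x), min = min(x), mean = mean(x),
    mid = (max(x) + min(x)) / 2, n = length(x))
}

# Group by Id and Year; the result holds one matrix column of summaries
res <- aggregate(DF["Jan"], by = list(Id = DF$Id, Year = DF$Year),
                 FUN = month_summary)
res
```

Passing several month columns instead of DF["Jan"] should give one such matrix column per month.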

Anyway, your kind attention has been greatly appreciated.






On Tue, Apr 27, 2010 at 2:40 AM, Petr PIKAL petr.pi...@precheza.cz wrote:

 Hi
 r-help-boun...@r-project.org wrote on 26.04.2010 17:05:54:

  I guess my problem was seeing a bunch of examples where they pulled a
  variable from a dataframe..
 
tapply(df$data, index=list(..

 df$data results in a vector, as does e.g. df[,5], unless you use the
 drop=FALSE option.

 
  and I
  assumed that the df$data was just generalizable to a collection of
 vectors
  a vector of vector being a vector

 df[,1:15] is not a vector of vectors. R can sometimes give you a nasty
 surprise with object types and modes, but changing the type of an object
 merely by selecting some part of it would be quite problematic.

 see

 str(df$data)
 str(df[, 1])
 str(df[,1, drop=FALSE])
 str(df[,1:15])
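
The same distinctions can be checked directly on a small made-up data frame (the two-column df below is an assumption for illustration, not the poster's data):

```r
df <- data.frame(a = c(1, 2, 3, 4), b = c(2, 3, 4, 5))

is.vector(df$a)                       # TRUE: a single column is a vector
is.vector(df[, 1])                    # TRUE: one column, dimension dropped
is.data.frame(df[, 1, drop = FALSE])  # TRUE: still a one-column data frame
is.data.frame(df[, 1:2])              # TRUE: several columns stay a data frame
```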

 Regards
 Petr



 

Re: [R] closest match in R to c-like struct?

2010-05-01 Thread steven mosher
I was talking with another guy on the list about this very topic.

A simple example would help:

first a sample C struct, and then how one would do the equivalent in R.

In the end I suppose one wants to make an 'array' of these structs, or a
list of the structs.

On Sat, May 1, 2010 at 8:04 AM, Ted Harding ted.hard...@manchester.ac.uk wrote:

 On 01-May-10 14:46:28, Giovanni Azua wrote:
  Hello,
  What would be in R the closest match to a c-struct? e.g. data.frame
  requires all elements to be of the same length ... or is there a way to
  circumvent this?
 
  TIA,
  Best regards,
  Giovanni

 Well, 'list' must be pretty close! The main difference would be
 that in C the structure type would be declared first, and then
 applied to create an object with that structure, whereas R
 lists are created straight off. If you want to set up a generic
 list type for a certain purpose, you would wrap its definition
 in a function.

 Another difference is that R lacks the pointer type, so that
 R's mylist$component is the equivalent of C's mylist.component;
 I don't think you can do the equivalent in R of C's mylist->component
 (though I'm likely to be wrong about that, and to be promptly corrected)!
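
A small sketch of the "wrap its definition in a function" idea, with the C struct it mimics shown as a comment. The names make_point, x, y and label are invented for illustration:

```r
# C version, for comparison (shown as a comment):
#   struct point { double x; double y; char label[16]; };

# Constructor: the function body plays the role of the struct declaration
make_point <- function(x, y, label) {
  list(x = x, y = y, label = label)
}

p <- make_point(1.5, 2.0, "first")
p$x            # field access, like p.x in C
p$x <- 10      # fields can be reassigned

# An 'array of structs' is simply a list of such lists
pts <- list(make_point(0, 0, "a"), make_point(1, 1, "b"))
pts[[2]]$label
```

Unlike a C struct, nothing stops a caller from adding or removing components later; the constructor only fixes the layout at creation time.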

 Hoping this helps,
 Ted.

 
 E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
 Fax-to-email: +44 (0)870 094 0861
 Date: 01-May-10   Time: 16:04:06
 -- XFMail --



Re: [R] closest match in R to c-like struct?

2010-05-01 Thread steven mosher
Ya, that's a common one. Also writing a struct to file and reading a struct
from file.
Mostly in R, if I have multiple returns, I'm just talking two or three values,
so I return a results vector. But that's ugly and prone to very bad things
down the road.



On Sat, May 1, 2010 at 9:58 AM, Giovanni Azua brave...@gmail.com wrote:


 On May 1, 2010, at 6:48 PM, steven mosher wrote:
  I was talking with another guy on the list about this very topic.
 
  A simple example would help.
 
  first a sample C struct, and then how one would do the equivalent in R.
 
  In the end i suppose one want to do a an 'array' of these structs, or
 list
  of the structs.

 Or like in my use-case ... I needed a c-like struct to define the type for
 aggregating the data to return from a function.

 Best regards,
 Giovanni



Re: [R] closest match in R to c-like struct?

2010-05-01 Thread steven mosher
Maybe I can illustrate the problem by showing how a C programmer might think
about the problem and the kinds of mistakes 'we' (I) make when trying to do
this in R.

> cstruct <- function(int, bool){
+   myint <- int*2
+   mybool <- !bool
+   myvec <- rep(mybool,10)
+   mymat <- matrix(myint*10,nrow=3,ncol=3)
+   myframe <- data.frame(rep(myint,5),rep(bool,5))
+   returnlist <- list(myint,mybool,myvec,mymat,myframe)
+   return(returnlist)
+ }

# so I have a function that returns a list of heterogeneous variables:
# an int, a bool, a vector of bools, a matrix of ints, a dataframe of ints
# and bools

> test <- cstruct(3,T)
> test
[[1]]
[1] 6

[[2]]
[1] FALSE

[[3]]
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

[[4]]
     [,1] [,2] [,3]
[1,]   60   60   60
[2,]   60   60   60
[3,]   60   60   60

[[5]]
  rep.myint..5. rep.bool..5.
1             6         TRUE
2             6         TRUE
3             6         TRUE
4             6         TRUE
5             6         TRUE

# Now I want to access the first element of my list, which is an int.
# The first mistake I always make is to revert to thinking in the
# 'dot' structure of a C struct.

> test.myint
Error: object 'test.myint' not found

# Then I think it's stored like a var in a dataframe, accessed by the $
> test$myint
NULL

# then I try to access the first element of the list
> test[1]
[[1]]
[1] 6

# That works.. but the [[1]] confuses me: when I eval test[1] I want 6 back,
# again thinking in C.
# So I try the third element

> test[3]
[[1]]
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

# ok, I get my vector of bools back. Now I want the first element
# of that thing.
# Well, test[3] is that thing.. and I want element 1 of test[3]

> test[3][1]
[[1]]
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

# hmm, that's not what I expect. I wanted F back.
# Frustrated, I try this, which I know is wrong

> test[3,1]
Error in test[3, 1] : incorrect number of dimensions

# crap.. maybe the $ is supposed to be used
> test$V3
NULL

# arrg.. how about 'dot'
> test.myvec
Error: object 'test.myvec' not found


Anyways, that's the kind of frustration. I have a list whose fourth element
is a matrix; how do I reference row 2, column 2 of the matrix in my list,
for example? And so forth..
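
For comparison, a short sketch of the three accessors on a list of the same general shape. The list here is built by hand with named components, which is an assumption; the list returned by cstruct() above has no names:

```r
# Same kind of list as in the transcript, but with named components
test <- list(myint = 6,
             myvec = rep(FALSE, 10),
             mymat = matrix(60, nrow = 3, ncol = 3))

class(test[3])      # "list": single brackets return a sub-list
test[[2]][1]        # FALSE: double brackets return the component itself
test$mymat[2, 2]    # 60: row 2, column 2 of the matrix component
test[[3]][2, 2]     # 60: the same element, selected by position
```

The $ form only works when the components carry names; [[ ]] works by position either way.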

On Sat, May 1, 2010 at 10:56 AM, Ted Harding ted.hard...@manchester.ac.uk wrote:

 On 01-May-10 16:58:49, Giovanni Azua wrote:
 
  On May 1, 2010, at 6:48 PM, steven mosher wrote:
  I was talking with another guy on the list about this very topic.
 
  A simple example would help.
 
  first a sample C struct, and then how one would do the equivalent in
  R.
 
  In the end i suppose one want to do a an 'array' of these structs, or
  list
  of the structs.
 
  Or like in my use-case ... I needed a c-like struct to define the type
  for aggregating the data to return from a function.
 
  Best regards,
  Giovanni

 Assuming that I understand what you want, this is straightforward
 and can be found throughout the many functions available in R.
 The general form is:

  myfunction <- function(...){
    # code to compute objects A1, A2, ... , An
    list(valA1=A1, valA2=A2, ... , valAn=An)
  }

 and then a call like

  myresults <- myfunction(...)

 will create a list myresults with components valA1, ... ,valAn
 which you can access as desired on the lines of

  myresults$valA5

 As a simple example, the following is a function which explores
 by simulation the power of the Fisher Exact Test for comparing
 two proportions in a 2x2 table:

  power.fisher.test <- function(p1,p2,n1,n2,alpha=0.05,nsim=100){
    y1 <- rbinom(nsim,size=n1,prob=p1)
    y2 <- rbinom(nsim,size=n2,prob=p2)
    y <- cbind(y1,n1-y1,y2,n2-y2)
    p.value <- rep(0,nsim)
    for (i in 1:nsim)
      p.value[i] <- fisher.test(matrix(y[i,],2,2))$p.value
    list(Pwr=mean(p.value < alpha),SE.Pwr=sd(p.value < alpha)/sqrt(nsim))
  }

 So, given two binomials B(n1,p1) and B(n2,p2), what would be the
 power of the Fisher test to detect that p1 was different from p2,
 at given significance level alpha? This is investigated by repeating,
 nsim times:
   sample from Bin(n1,p1), sample from Bin(n2,p2)
   do a Fisher test and get its P-value; store it
     in a vector p.value of length nsim
 and then finally:
   estimate the power as the proportion Pwr of the nsim cases
     in which the P-value was less than alpha
   get the SE of this estimate
   return these two values as components Pwr and SE.Pwr of a list
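
A usage sketch for this function, restated here with its assignment and comparison operators written out so the block runs on its own. The argument values and the seed are arbitrary choices for illustration:

```r
# The function from the message, restated so this block is self-contained
power.fisher.test <- function(p1, p2, n1, n2, alpha = 0.05, nsim = 100) {
  y1 <- rbinom(nsim, size = n1, prob = p1)
  y2 <- rbinom(nsim, size = n2, prob = p2)
  y <- cbind(y1, n1 - y1, y2, n2 - y2)
  p.value <- rep(0, nsim)
  for (i in 1:nsim)
    p.value[i] <- fisher.test(matrix(y[i, ], 2, 2))$p.value
  list(Pwr = mean(p.value < alpha),
       SE.Pwr = sd(p.value < alpha) / sqrt(nsim))
}

set.seed(1)  # arbitrary seed, only to make the run repeatable
res <- power.fisher.test(p1 = 0.2, p2 = 0.6, n1 = 30, n2 = 30)
names(res)   # the two components of the returned list
res$Pwr      # estimated power, a proportion between 0 and 1
res$SE.Pwr   # its simulation standard error
```

The caller receives both values in one object and picks them out by name, which is the struct-like behaviour being discussed.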

 As it happens, here each component of the resulting list is of
 the same type (a single number); but in a different computation
 each component (and of course there could be more than two)
 could be anything -- even another list. So you can have lists
 of lists ... !

 Thus, instead of the simple returned list above:

  list(Pwr=mean(p.value < alpha),
       SE.Pwr=sd(p.value < alpha)/sqrt(nsim))

 you could have

  list(Binoms=list(Bin1=list(size=n1,prob=p1),
                   Bin2=list(size=n2,prob=p2)),
       Pwr=mean(p.value < alpha

Re: [R] closest match in R to c-like struct?

2010-05-01 Thread steven mosher
Perfect. I had tried a variant assigning names to the vars, but that didn't
work. Now it makes sense why it didn't. I had tried

 myint <- int
 names(myint) <- "myint"

and then returnlist <- list(myint, ...)

and of course

test[1] got me myint, 6

Thanks

On Sat, May 1, 2010 at 12:42 PM, David Winsemius dwinsem...@comcast.net wrote:


 On May 1, 2010, at 3:14 PM, steven mosher wrote:


 # Now I want to access the first element of my list, which is an int.
 # The first mistake I always make is to revert to thinking in the
 # 'dot' structure of a C struct.

 > test.myint
 Error: object 'test.myint' not found


 There is no dot "." accessor function. If the first element were named
 (which it is not) then you could have used test$myint.

 If you wanted to access the elements of that list by name, you need to
 assign the names at the time it is created, e.g.:

 returnlist <- list(myint=myint, mybool=mybool, myvec=myvec, mymat=mymat,
                    myframe=myframe)

 As it is, you need to do this to get what you later indicate you want, an
 atomic object:

 test[[1]]

 Double brackets yield the thing itself, whereas single brackets yield a
 sub-list.

 > test[4]
 [[1]]
      [,1] [,2] [,3]
 [1,]   60   60   60
 [2,]   60   60   60
 [3,]   60   60   60

 > test[[4]]
      [,1] [,2] [,3]
 [1,]   60   60   60
 [2,]   60   60   60
 [3,]   60   60   60
 > class(test[4])
 [1] "list"
 > class(test[[4]])
 [1] "matrix"




 Anyways, that's the kind of frustration. I have a list whose fourth element
 is a matrix; how do I reference row 2, column 2 of the matrix in my list,
 for example?


 > str(test)
 List of 5
  $ : num 6
  $ : logi FALSE
  $ : logi [1:10] FALSE FALSE FALSE FALSE FALSE FALSE ...
  $ : num [1:3, 1:3] 60 60 60 60 60 60 60 60 60
  $ :'data.frame':   5 obs. of  2 variables:
   ..$ rep.myint..5.: num [1:5] 6 6 6 6 6
   ..$ rep.bool..5. : logi [1:5] TRUE TRUE TRUE TRUE TRUE

 > test[[4]][2,2]
 [1] 60



  and so forth..

 On Sat, May 1, 2010 at 10:56 AM, Ted Harding
 ted.hard...@manchester.ac.ukwrote:

  On 01-May-10 16:58:49, Giovanni Azua wrote:


 On May 1, 2010, at 6:48 PM, steven mosher wrote:

 I was talking with another guy on the list about this very topic.

 A simple example would help.

 First a sample C struct, and then how one would do the equivalent in
 R.

 In the end I suppose one wants to build an 'array' of these structs, or a
 list of the structs.


 Or like in my use-case ... I needed a c-like struct to define the type
 for aggregating the data to return from a function.

 Best regards,
 Giovanni


 Assuming that I understand what you want, this is straightforward
 and can be found throughout the many functions available in R.
 The general form is:

 myfunction <- function(...){
   # code to compute objects A1, A2, ... , An
   list(valA1=A1, valA2=A2, ... , valAn=An)
 }

 and then a call like

 myresults <- myfunction(...)

 will create a list myresults with components valA1, ... ,valAn
 which you can access as desired along the lines of

 myresults$valA5
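
A minimal runnable sketch of this pattern follows. The function name summarise_sample and its component names are made up for illustration and do not come from the thread:

```r
# Hypothetical example: return several results of different types
# from one function by bundling them in a named list.
summarise_sample <- function(x) {
  list(n     = length(x),      # a single number
       rng   = range(x),       # a numeric vector of length 2
       above = x > mean(x))    # a logical vector
}

res <- summarise_sample(c(2, 4, 9))
res$n        # component accessed by name
res$rng[2]   # second element of the 'rng' component
res$above    # the whole logical vector
```

Each component keeps its own type, which is roughly what a C struct with heterogeneous members would give you.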

 As a simple example, the following is a function which explores

Re: [R] closest match in R to c-like struct?

2010-05-01 Thread steven mosher
 cstruct<-function(int, bool){
+
+ myint<- int*2;
+
+ mybool<-!bool;
+ myvec<-rep(mybool,10)
+
+ mymat<-matrix(myint*10,nrow=3,ncol=3)
+ myframe<-data.frame(int=rep(myint,5),bool=rep(bool,5))
+ returnlist<-list(myint=myint,mybool=mybool,myvec=myvec,mymat=mymat,
+ myframe=myframe)
+ return(returnlist)
+
+
+
+ }

 test<-cstruct(3,T)
 test
$myint
[1] 6

$mybool
[1] FALSE

$myvec
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

$mymat
 [,1] [,2] [,3]
[1,]   60   60   60
[2,]   60   60   60
[3,]   60   60   60

$myframe
  int bool
1   6 TRUE
2   6 TRUE
3   6 TRUE
4   6 TRUE
5   6 TRUE

 test$myint
[1] 6
 test$mybool
[1] FALSE
 test$myvec
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 test$myvec[2]
[1] FALSE
 test$mymat
 [,1] [,2] [,3]
[1,]   60   60   60
[2,]   60   60   60
[3,]   60   60   60
 test$mymat[2,2]
[1] 60
 test$mymat[,2]
[1] 60 60 60
 test$myframe
  int bool
1   6 TRUE
2   6 TRUE
3   6 TRUE
4   6 TRUE
5   6 TRUE
 test$myframe$int
[1] 6 6 6 6 6
 test$myframe$bool
[1] TRUE TRUE TRUE TRUE TRUE
 test$myframe$int[2]
[1] 6
 test$myframe$bool[3]
[1] TRUE

 listoftest<-list(cstruct(3,T),cstruct(4,F),cstruct(5,T))
 listoftest[1]
[[1]]
[[1]]$myint
[1] 6

[[1]]$mybool
[1] FALSE

[[1]]$myvec
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

[[1]]$mymat
 [,1] [,2] [,3]
[1,]   60   60   60
[2,]   60   60   60
[3,]   60   60   60

[[1]]$myframe
  int bool
1   6 TRUE
2   6 TRUE
3   6 TRUE
4   6 TRUE
5   6 TRUE


 listoftest[1]$myframe$int[3]
NULL
 listoftest[1]$myframe$int
NULL
 listoftest[1]$myframe
NULL
 listoftest[[1]]$myframe
  int bool
1   6 TRUE
2   6 TRUE
3   6 TRUE
4   6 TRUE
5   6 TRUE
 listoftest[[1]]$myframe$int[1]
[1] 6
 listoftest2<-list(flist=cstruct(65,T),slist=cstruct(12,F))
 listoftest2$flist$myframe$int[3]
[1] 130


TADA!
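
The NULL results above are what single-bracket indexing gives on a list of lists: it returns a one-element wrapper list, which has no component named myframe. A small sketch of the difference, using made-up names (structs, id, flags) rather than the cstruct() output:

```r
# A list of two "structs", each itself a named list.
structs <- list(list(id = 1, flags = c(TRUE, FALSE)),
                list(id = 2, flags = c(FALSE, FALSE)))

wrapped <- structs[1]     # single bracket: a list of length 1 wrapping the struct
elem    <- structs[[1]]   # double bracket: the struct itself

is.null(wrapped$id)       # the wrapper has no component named 'id'
elem$id                   # the struct does
structs[[2]]$flags[1]     # chain [[ ]], $ and [ ] to reach a single element
```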

On Sat, May 1, 2010 at 12:42 PM, David Winsemius <dwinsem...@comcast.net> wrote:


 On May 1, 2010, at 3:14 PM, steven mosher wrote:

  maybe I can illustrate the problem by showing how a c programmer might
 think
 about the problem and the kinds of mistakes 'we' ( I) make when trying to
 do
 this in R

 cstruct<-function(int, bool){
 +
 + myint<- int*2;
 +
 + mybool<-!bool;
 + myvec<-rep(mybool,10)
 + mymat<-matrix(myint*10,nrow=3,ncol=3)
 + myframe<-data.frame(rep(myint,5),rep(bool,5))
 + returnlist<-list(myint,mybool,myvec,mymat,myframe)
 + return(returnlist)
 +
 +
 +
 + }

 # so I have a function that returns a list of heterogeneous variables.
 # an int, a bool, a vector of bools, a matrix of ints, a dataframe of ints
 # and bools

  test<-cstruct(3,T)



  test

 [[1]]
 [1] 6

 [[2]]
 [1] FALSE

 [[3]]
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

 [[4]]
[,1] [,2] [,3]
 [1,]   60   60   60
 [2,]   60   60   60
 [3,]   60   60   60

 [[5]]
   rep.myint..5. rep.bool..5.
 1             6         TRUE
 2             6         TRUE
 3             6         TRUE
 4             6         TRUE
 5             6         TRUE

 # Now I want to access the first element of my list, which is an int
 # first mistake I always make is I just revert to thinking in the
 # 'dot' structure of a c struct.

  test.myint

 Error: object 'test.myint' not found


 There is no dot . accessor function. If the first element were named
 (which it is not) then you could have used test$myint.

 If you wanted to access the elements of that list by name, you need to
 assign the names at the time it is created, e.g.:

 returnlist<-list(myint=myint, mybool=mybool, myvec=myvec, mymat=mymat,
 myframe=myframe)

 As it is you need to do this to get what you later indicate you want, an
 atomic object:

 test[[1]]

 Double-brackets yield the thing itself, whereas single brackets yield a
 sub-list.

  test[4]
 [[1]]

 [,1] [,2] [,3]
 [1,]   60   60   60
 [2,]   60   60   60
 [3,]   60   60   60

  test[[4]]

 [,1] [,2] [,3]
 [1,]   60   60   60
 [2,]   60   60   60
 [3,]   60   60   60
  class(test[4])
 [1] "list"
  class(test[[4]])
 [1] "matrix"



 # Then I think its stored like a var in a dataframe, accessed by the $

 test$myint

 NULL

 # then I try to access the first element of the list

 test[1]

 [[1]]
 [1] 6

 # That works.. but the [[1]] confuses me when I eval test[1] I want 6 back
 # again thinking in C.
 # so I try the third element

  test[3]

 [[1]]
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

 # ok I get my vect of bools back. Now I want the first element
 # of that thing
 # well test[3] is that thing.. and I want element 1 of test[3]

  test[3][1]

 [[1]]
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

 #hmm thats not what I expect. I wanted F back.
 # frustrated I try this which i know is wrong

  test[3,1]

 Error in test[3, 1] : incorrect number of dimensions

 # crap.. maybe the $ is supposed to be used

 test$V3

 NULL

 # arrg.. how about 'dot

 test.myvec

 Error: object 'test.myvec' not found


 Anyways, That's the kind of frustration. I have a list, third element is a
 matrix
 how

Re: [R] closest match in R to c-like struct?

2010-05-01 Thread steven mosher
thanks ted..

Being new to R, this has been a huge help, especially on the
Myint=myint thing... I assumed the name was just implicit.

On Sat, May 1, 2010 at 1:19 PM, Ted Harding <ted.hard...@manchester.ac.uk> wrote:

 See below.

 On 01-May-10 19:14:08, steven mosher wrote:
  maybe I can illustrate the problem by showing how a c programmer
  might think about the problem and the kinds of mistakes 'we' (I)
  make when trying to do this in R
 
   cstruct<-function(int, bool){
  +
  + myint<- int*2;
  +
  + mybool<-!bool;
  + myvec<-rep(mybool,10)
  + mymat<-matrix(myint*10,nrow=3,ncol=3)
  + myframe<-data.frame(rep(myint,5),rep(bool,5))
  + returnlist<-list(myint,mybool,myvec,mymat,myframe)
  + return(returnlist)
  +
  +
  +
  + }
 
 # so I have a function that returns a list of heterogeneous variables.
 # an int, a bool, a vector of bools, a matrix of ints, a dataframe of
 # ints and bools
 
  test<-cstruct(3,T)
 
 
  test
  [[1]]
  [1] 6
 
  [[2]]
  [1] FALSE
 
  [[3]]
   [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 
  [[4]]
   [,1] [,2] [,3]
  [1,]   60   60   60
  [2,]   60   60   60
  [3,]   60   60   60
 
  [[5]]
    rep.myint..5. rep.bool..5.
  1             6         TRUE
  2             6         TRUE
  3             6         TRUE
  4             6         TRUE
  5             6         TRUE
 
 # Now I want to access the first element of my list which is an
 # an int
 # first mistake I always make is I just revert to thinking in the
 # 'dot' structure of a c struct.
 
  test.myint
  Error: object 'test.myint' not found
 
 # Then I think its stored like a var in a dataframe, accessed by
 # the $
  test$myint
  NULL
 
 # then I try to access the first element of the list
  test[1]
  [[1]]
  [1] 6
 
 # That works.. but the [[1]] confuses me when I eval test[1] I want 6
 # back
 # again thinking in C.
 # so I try the third element
 
  test[3]
  [[1]]
   [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 
 # ok I get my vect of bools back. Now I want the first element
 # of that thing
 # well test[3] is that thing.. and I want element 1 of test[3]
 
  test[3][1]
  [[1]]
   [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 
 #hmm thats not what I expect. I wanted F back.
 # frustrated I try this which i know is wrong
 
  test[3,1]
  Error in test[3, 1] : incorrect number of dimensions
 
 # crap.. maybe the $ is supposed to be used
  test$V3
  NULL
 
 # arrg.. how about 'dot
  test.myvec
  Error: object 'test.myvec' not found
 
 
  Anyway, that's the kind of frustration. I have a list whose fourth
  element is a matrix: how do I reference row 2, column 2 of the matrix
  in my list, for example?
  and so forth..

 When you constructed your return-list, you simply entered the list
 components using the R-names of the objects, as used in the code:

  returnlist<-list(myint,mybool,myvec,mymat,myframe)

 To use the $ extractor, you need to give them external names,
 so you could modify the above to:

  returnlist<-list(Myint=myint,Mybool=mybool,
   Myvec=myvec,Mymat=mymat,Myframe=myframe)

 Then, after

  test<-cstruct(3,T)

 you can access test$Myint, test$Mybool, etc.; and, in particular,
 test$Mymat will be the matrix mymat you put in there, so you
 can extract elements of this using

  test$Mymat[2,2]

 for the element in row 2, column 2, and so on. Without making
 the return-list a named list, its components have no names,
 so then test$mymat (as you did) will not work because there
 is no component with name mymat (there is no component with
 any name). The name mymat was used by R to identify the
 object whose contents were to be placed in the list; that
 internal object-name does not get placed in the list.

 Note: In my modification above I used Myint=myint etc. instead
 of myint=myint to highlight the distinction between the
 component-name and the object-name. But you can just as well
 use exactly the same name for component-name as for object-name:
 R will recognise them as distinct and do the right thing.
 So you could just as well do:

  returnlist<-list(myint=myint,mybool=mybool,
   myvec=myvec,mymat=mymat,myframe=myframe)

 and then, after

  test<-cstruct(3,T)

 do

  test$mymat[2,2]

 You can also use positional references if the list components have
 no names. Since your mymat is in position 4,

  test[[4]]

 would return the whole matrix. Then

  test[[4]][2,2]

 would return the element in row 2, column 2.
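
Both access styles can be tried side by side. The sketch below builds a small unnamed list by hand (an assumption standing in for the output of the first cstruct() version) and also shows that names can be attached after the list exists:

```r
# An unnamed list shaped like the result of the first version of cstruct().
test <- list(6, FALSE, rep(FALSE, 10), matrix(60, nrow = 3, ncol = 3))

test[[4]][2, 2]     # positional access: row 2, column 2 of the matrix

# Names can also be attached after the list has been created.
names(test) <- c("myint", "mybool", "myvec", "mymat")
test$mymat[2, 2]    # the same element, now reached by name
test$myint
```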

 As a standard example, try for instance

  X  <- 0.1*((-10):10)
  Y  <- 0.5*X + 0.2*rnorm(length(X))
  LM <- lm(Y ~ X)

  summary(LM)
  # Call:
  # lm(formula = Y ~ X)
  # Residuals:
  #       Min        1Q    Median        3Q       Max
  # -0.373283 -0.083458  0.009206  0.139763  0.278242
  # Coefficients:
  #             Estimate Std. Error t value Pr(>|t|)
  # (Intercept) -0.06018    0.04448  -1.353    0.192
  # X            0.46270    0.07345   6.299 4.78e-06 ***
  # ---
  # Signif. codes:  0 '***' 0.001 '**' 0.01

[R] Adding a header after the file is written

2010-05-03 Thread steven mosher
The situation arises where I open a file and write a data.frame to it with
write.table.

Multiple lines are written to the file, and the file is kept in append=TRUE
mode.

If one sets col.names to the names of the variables being written, you
get output that looks like this:

name1 name2  name3.

 x  x x
 x  x x
 x  x x
name1  name2  name 3
 x  x x
 x  x x
 x  x x

And so forth: each time write.table is called, the col.names are written again.

Setting col.names=NULL obviously removes them.

I thought a simple solution would be to check for the file's existence first
and, on the first write, include the col.names with append=T.
On subsequent writes, col.names would be set to NULL.
That didn't work and threw warnings.

Is there any way to do this? Basically, open a file for writing with
append=TRUE and only write the col.names once,
at the first write. Or am I stuck and forced to write the whole file without
the col.names and then read it back in and rewrite it
with col.names set to the column names I want?

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Adding a header after the file is written

2010-05-03 Thread steven mosher
Ok I will try that. I think I set it to False when I tried it the first
time, maybe that was my mistake

On Mon, May 3, 2010 at 3:28 PM, Ista Zahn istaz...@gmail.com wrote:

 Hi Steve,
 I think you just need to set col.names = FALSE (instead of col.names
 =NULL) on subsequent writes.

 -Ista




 --
 Ista Zahn
 Graduate student
 University of Rochester
 Department of Clinical and Social Psychology
 http://yourpsyche.org




Re: [R] Adding a header after the file is written

2010-05-04 Thread steven mosher
thanks.. worked

On Mon, May 3, 2010 at 3:24 PM, Ted Harding <ted.hard...@manchester.ac.uk> wrote:

 The following (which uses a tiny dataframe I had lying around after
 responding to an earlier query) looks like what you want to do
 (provided you first test existence of the file before switching
 to the second form of write.table()):

  foo
  # $Bar1
  # [1] 1
  # $Bar2
  # [1] 2
  # $Bar3
  # [1] 3
  # $Bar4
  # [1] 4

 write.table(foo,file="foo.txt",row.names=FALSE,
             col.names=c("Bar.1","Bar.2","Bar.3","Bar.4"),
             append=FALSE)
 write.table(foo,file="foo.txt",row.names=FALSE,
             col.names=FALSE,append=TRUE)
 write.table(foo,file="foo.txt",row.names=FALSE,
             col.names=FALSE,append=TRUE)
 write.table(foo,file="foo.txt",row.names=FALSE,
             col.names=FALSE,append=TRUE)
 write.table(foo,file="foo.txt",row.names=FALSE,
             col.names=FALSE,append=TRUE)


 Contents of foo.txt after the above:

  Bar.1 Bar.2 Bar.3 Bar.4
  1 2 3 4
  1 2 3 4
  1 2 3 4
  1 2 3 4
  1 2 3 4
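
For an automated loop, the existence test and the two forms of write.table() can be folded into one helper. This is only a sketch: the name append_table is hypothetical, and it assumes every chunk has the same columns.

```r
# Hypothetical helper: append a data frame to a file, writing the column
# names only when the file does not exist yet.
append_table <- function(df, path) {
  first_write <- !file.exists(path)
  write.table(df, file = path, row.names = FALSE,
              col.names = first_write,   # TRUE on the first write, FALSE afterwards
              append = !first_write)
}

out   <- tempfile(fileext = ".txt")
chunk <- data.frame(Bar.1 = 1, Bar.2 = 2)
append_table(chunk, out)   # writes the header plus one row
append_table(chunk, out)   # appends one row, no header
lines_written <- readLines(out)
lines_written
```

Because col.names is a logical here (never NULL), the later calls append rows without repeating the header.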

 Ted.

 
 E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
 Fax-to-email: +44 (0)870 094 0861
 Date: 03-May-10   Time: 23:24:55
 -- XFMail --




[R] error in La.svd Lapack routine 'dgesdd'

2010-05-04 Thread steven mosher
Error in La.svd(x, nu, nv) : error code 1 from Lapack routine ‘dgesdd’

What resources are there for tracking down errors like this?



[R] extracting a matched string using regexpr

2010-05-05 Thread steven mosher
Given a text like the HTML fragment below, I want to be able to extract a
matched regular expression from it.

This apparently works, but is pretty ugly:
# some html
test<-"</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
# a pattern to extract 5 digits
 pattern<-"[0-9]{5}"
# regexpr returns a start point [1] and an attribute "match.length":
# attr(,"match.length")
# get the substring from the start point to the stop point, where
# stop = start + length - 1

answer<-substr(test,regexpr(pattern,test)[1],regexpr(pattern,test)[1]+attr(regexpr(pattern,test),"match.length")-1)
 answer
[1] "88958"

I tried using sub(pattern, replacement, x) with a regexp that captured the
group. I'd found an example of this in the mail archives,
but it didn't seem to work.
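
As a side note, current versions of base R provide regmatches(), which was added after this thread and does the start/length arithmetic itself. A short sketch on a made-up HTML fragment (not the original data):

```r
# A made-up HTML fragment standing in for the scraped text.
html <- "<tr><th>88958</th><th>Abcdsef</th><th>67.8S</th></tr>"

m <- regexpr("[0-9]{5}", html)   # position and length of the first match
hit <- regmatches(html, m)       # the matched substring itself
hit
```

If the pattern does not match, regmatches() returns a zero-length character vector, so checking length(hit) is a simple way to detect a miss.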



Re: [R] extracting a matched string using regexpr

2010-05-05 Thread steven mosher
Thanks I was looking at that package and reading your mails in the archive.
I think my tiny mind got twisted in the regexp..

On Wed, May 5, 2010 at 2:35 PM, Gabor Grothendieck
<ggrothendi...@gmail.com> wrote:

 Here are two ways to extract 5 digits.

 In the first one \\1 refers to the portion matched between the
 parentheses in the regular expression.

 In the second one strapply is like apply where the object to be worked
 on is the first argument (array for apply, string for strapply) the
 second modifies it (which dimension for apply, regular expression for
 strapply) and the last is a function which acts on each value
 (typically each row or column for apply and each match for strapply).
 In this case we use c as our function to just return all the results.
 They are returned in a list with one component per string but here
 test is just a single string so we get a list one long and we ask for
 the contents of the first component using [[1]].

 # 1 - sub
 # 1 - sub
 sub(".*(\\d{5}).*", "\\1", test)

 # 2 - strapply - see http://gsubfn.googlecode.com
 library(gsubfn)
 strapply(test, "\\d{5}", c)[[1]]
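
The first approach can be tried in isolation with base R only. The sketch below uses a made-up input without newlines and the [0-9] class instead of \\d; the variable names are illustrative:

```r
x <- "<th>88958</th><th>67.8S</th>"   # made-up input without newlines

# Everything before and after the five digits is matched and discarded;
# "\\1" puts back only what the parentheses captured.
digits <- sub(".*([0-9]{5}).*", "\\1", x)
digits

# If there is no match, sub() returns the input unchanged, so check first.
y <- "<th>no station id</th>"
safe <- if (grepl("[0-9]{5}", y)) sub(".*([0-9]{5}).*", "\\1", y) else NA_character_
safe
```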







Re: [R] extracting a matched string using regexpr

2010-05-05 Thread steven mosher
 test
[1] "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
 sub(".*(\\d{5}).*", "\\1", test)
[1] "</th>"
 sub(".*([0-9]{5}).*","\\1",test)
[1] "88958"



I think the / in the source throws something off, as the group capture
appears not to be working with \\d, whereas the bracket version
did work.






Re: [R] extracting a matched string using regexpr

2010-05-05 Thread steven mosher
Hmm.

I have R 2.11, freshly downloaded.

I'll start a new session and report back. I will note that I've had trouble
with \\d,
which is why I was using [0-9].

I'm on a Mac here.

On Wed, May 5, 2010 at 3:00 PM, Gabor Grothendieck
<ggrothendi...@gmail.com> wrote:

 That's not what I get:

 
  test<-"</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
  sub(".*(\\d{5}).*", "\\1", test)
 [1] "88958"
  R.version.string
 [1] "R version 2.10.1 (2009-12-14)"

 I also got the above in R 2.11.0 patched as well.


 
 




Re: [R] extracting a matched string using regexpr

2010-05-05 Thread steven mosher
Thanks,

perhaps we should report it.

On Wed, May 5, 2010 at 4:52 PM, Gabor Grothendieck
<ggrothendi...@gmail.com> wrote:

 I am using Vista.  Another thing to try is strapply using the tcl
 engine (assuming you do have tcltk capabilities) and the R engine.  On
 Vista R 2.11.0 patched I get the same result:

  capabilities()[["tcltk"]]
 [1] TRUE
  strapply(test, "\\d{5}", c, engine = "tcl")[[1]]
 [1] "88958"
  strapply(test, "\\d{5}", c, engine = "R")[[1]]
 [1] "88958"

 On Vista with R 2.9.2 I do get bad results:

 
  test<-"</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
  sub(".*(\\d{5}).*", "\\1", test)
 [1] "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
  sub(".*(\\d{5}).*", "\\1", test, extended = TRUE)
 [1] "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
  R.version.string
 [1] "R version 2.9.2 Patched (2009-09-08 r49647)"
  win.version()
 [1] "Windows Vista (build 6002) Service Pack 2"


  
  
 
 




[R] quick question on getting a listing of files on ftp site

2010-05-22 Thread steven mosher
Given a valid ftp address, is there a package that will allow me to get a
listing of the files/directory structure
on that site? RCurl looks to have this ability; are there others?
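
A hedged sketch of how a listing might be fetched with RCurl: it assumes the RCurl package is installed, the helper names (parse_listing, list_ftp_dir) are invented here, and the URL in the commented-out call is a placeholder rather than a real server.

```r
# Split a raw directory listing (one name per line) into a character vector.
parse_listing <- function(txt) {
  entries <- strsplit(txt, "\r?\n")[[1]]
  entries[nzchar(entries)]
}

# Fetch the names in an FTP directory via RCurl's getURL(); dirlistonly = TRUE
# asks curl for names only, and ftp.use.epsv = FALSE disables extended passive mode.
list_ftp_dir <- function(url) {
  raw <- RCurl::getURL(url, ftp.use.epsv = FALSE, dirlistonly = TRUE)
  parse_listing(raw)
}

# files <- list_ftp_dir("ftp://ftp.example.org/pub/data/")   # placeholder URL

# The parsing step can be checked offline on a literal listing:
parsed <- parse_listing("a.txt\r\nb.csv\r\n")
parsed
```

The URL should end in a slash so that the server treats it as a directory.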



[R] quick question on ftp access

2010-05-22 Thread steven mosher
I'm looking for a function or package that will allow me to get a list of
the files at an ftp site.
RCurl looks promising. Are there other packages that have similar
functionality?



[R] Matrix to Vector

2010-06-05 Thread steven mosher
Given an m*n matrix, I want to reorder it as a vector in row-major
order.

so:

 m<-matrix(seq(1,48),nrow=6,byrow=T)
 m
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    1    2    3    4    5    6    7    8
[2,]    9   10   11   12   13   14   15   16
[3,]   17   18   19   20   21   22   23   24
[4,]   25   26   27   28   29   30   31   32
[5,]   33   34   35   36   37   38   39   40
[6,]   41   42   43   44   45   46   47   48

I want to reorder this as a vector, copying by row, so that the final vector
has its elements ordered thus: row 1, columns 1:n (m[1,1:n]) maps to
elements 1:n, m[2,1:n] maps to elements (n+1):(2n), and so on.

This obviously is not a solution, as the inherent column-major storage
of a matrix
defeats the approach:
 dim(m)-c(48,1)
 m
  [,1]
 [1,]1
 [2,]9
 [3,]   17
 [4,]   25
 [5,]   33
 [6,]   41
 [7,]2
 [8,]   10
 [9,]   18
[10,]   26
[11,]   34
[12,]   42
[13,]3
[14,]   11
[15,]   19
[16,]   27
[17,]   35
[18,]   43
[19,]4
[20,]   12
[21,]   20
[22,]   28
[23,]   36
[24,]   44
[25,]5
[26,]   13
[27,]   21
[28,]   29
[29,]   37
[30,]   45
[31,]6
[32,]   14
[33,]   22
[34,]   30
[35,]   38
[36,]   46
[37,]7
[38,]   15
[39,]   23
[40,]   31
[41,]   39
[42,]   47
[43,]8
[44,]   16
[45,]   24
[46,]   32
[47,]   40
[48,]   48


I already have a version that loops through the data ( this is actually a
portion of a data frame ) to reorder
this into a vector, but I was hoping there was an elegant way



Re: [R] Matrix to Vector

2010-06-05 Thread steven mosher
> as.vector(t(m))
 [1]  1  9 17 25 33 41  2 10 18 26 34 42  3 11 19 27 35 43  4 12 20 28 36 44
[25]  5 13 21 29 37 45  6 14 22 30 38 46  7 15 23 31 39 47  8 16 24 32 40 48

The result I want is this:

 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
[25] 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48



On Sat, Jun 5, 2010 at 11:17 AM, Henrique Dallazuanna www...@gmail.com wrote:

 Try this:

 as.vector(t(m))




 --
 Henrique Dallazuanna
 Curitiba-Paraná-Brasil
 25° 25' 40 S 49° 16' 22 O




Re: [R] Matrix to Vector

2010-06-05 Thread steven mosher
 I bet that is what I did.
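
A minimal sketch of what probably happened, assuming m had already been reshaped by the earlier dim(m) <- c(48,1) call before as.vector(t(m)) was tried. The names fresh, reshaped, row_major and still_col_major are introduced here for illustration only.

```r
# Fresh 6 x 8 matrix filled by row, as in the original post.
fresh <- matrix(seq(1, 48), nrow = 6, byrow = TRUE)

# Transposing first and then flattening reads the values out row by row.
row_major <- as.vector(t(fresh))

# The same matrix after dim<-: the data stay in column-major order.
reshaped <- fresh
dim(reshaped) <- c(48, 1)

# t() of a 48 x 1 matrix is just 1 x 48, so flattening it changes nothing.
still_col_major <- as.vector(t(reshaped))

all(row_major == 1:48)       # TRUE
head(still_col_major, 7)     # 1 9 17 25 33 41 2
```

So as.vector(t(m)) does give the row-major vector, provided m is still the original 6 x 8 matrix when it is called.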

On Sat, Jun 5, 2010 at 11:54 AM, John Kane jrkrid...@yahoo.ca wrote:

 m <- matrix(seq(1,48), nrow=6, byrow=TRUE)
 as.vector(t(m))

 gives me the correct result.

 Any chance you may have already transformed m ?



[R] counting Na/not NA by groups by column

2010-06-09 Thread steven mosher
# create a matrix with some NAs in it
# (rep() has no 'by' argument; 'times' gives the same values explicitly)
> m <- matrix(NA, nrow=15, ncol=14)
> m[,3:14] <- 52
> m[13,9] <- NA
> m[4:7,8] <- NA
> m[1:2,5] <- NA
> m[,2] <- rep(1800:1804, times=3)
> y <- order(m[,2])
> m <- m[y,]
> m[,1] <- rep(1:3, times=5)
> m
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
 [1,]    1 1800   52   52   NA   52   52   52   52    52    52    52    52    52
 [2,]    2 1800   52   52   52   52   52   NA   52    52    52    52    52    52
 [3,]    3 1800   52   52   52   52   52   52   52    52    52    52    52    52
 [4,]    1 1801   52   52   NA   52   52   52   52    52    52    52    52    52
 [5,]    2 1801   52   52   52   52   52   NA   52    52    52    52    52    52
 [6,]    3 1801   52   52   52   52   52   52   52    52    52    52    52    52
 [7,]    1 1802   52   52   52   52   52   52   52    52    52    52    52    52
 [8,]    2 1802   52   52   52   52   52   52   52    52    52    52    52    52
 [9,]    3 1802   52   52   52   52   52   52   NA    52    52    52    52    52
[10,]    1 1803   52   52   52   52   52   NA   52    52    52    52    52    52
[11,]    2 1803   52   52   52   52   52   52   52    52    52    52    52    52
[12,]    3 1803   52   52   52   52   52   52   52    52    52    52    52    52
[13,]    1 1804   52   52   52   52   52   NA   52    52    52    52    52    52
[14,]    2 1804   52   52   52   52   52   52   52    52    52    52    52    52
[15,]    3 1804   52   52   52   52   52   52   52    52    52    52    52    52

# the goal is to count all non-NA values, grouped by the value in column 2
# we can get the count over all rows easily.
> col.sum <- apply(!is.na(m[,3:14]), 2, sum)
> col.sum
 [1] 15 15 13 15 15 11 14 15 15 15 15 15

# what we want is a result that looks like this
   1800  3   3   2  3  3   2   3   3   3   3   3   3
   1801  3   3   2  3  3   2   3   3   3   3   3   3
   1802  3   3   3  3  3   3   2   3   3   3   3   3
   1803  3   3   3  3  3   2   3   3   3   3   3   3
   1804  3   3   3  3  3   2   3   3   3   3   3   3

I've toyed a bit with by():

> mask <- !is.na(m[,3:14])
> test <- cbind(m[,1:2], mask)
> test
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
 [1,]    1 1800    1    1    0    1    1    1    1     1     1     1     1     1
 [2,]    2 1800    1    1    1    1    1    0    1     1     1     1     1     1
 [3,]    3 1800    1    1    1    1    1    1    1     1     1     1     1     1
 [4,]    1 1801    1    1    0    1    1    1    1     1     1     1     1     1
 [5,]    2 1801    1    1    1    1    1    0    1     1     1     1     1     1
 [6,]    3 1801    1    1    1    1    1    1    1     1     1     1     1     1
 [7,]    1 1802    1    1    1    1    1    1    1     1     1     1     1     1
 [8,]    2 1802    1    1    1    1    1    1    1     1     1     1     1     1
 [9,]    3 1802    1    1    1    1    1    1    0     1     1     1     1     1
[10,]    1 1803    1    1    1    1    1    0    1     1     1     1     1     1
[11,]    2 1803    1    1    1    1    1    1    1     1     1     1     1     1
[12,]    3 1803    1    1    1    1    1    1    1     1     1     1     1     1
[13,]    1 1804    1    1    1    1    1    0    1     1     1     1     1     1
[14,]    2 1804    1    1    1    1    1    1    1     1     1     1     1     1
[15,]    3 1804    1    1    1    1    1    1    1     1     1     1     1     1

> result <- by(test[,3:14], test[,2], sum)
> result
INDICES: 1800
[1] 34
------------------------------------------------------------
INDICES: 1801
[1] 34
------------------------------------------------------------
INDICES: 1802
[1] 35
------------------------------------------------------------
INDICES: 1803
[1] 35
------------------------------------------------------------
INDICES: 1804
[1] 35

This sums all the values in each group rather than summing column by
column, so it's wrong.
Is there an elegant way to get the number of non-NA values by column,
grouped by the values of another variable?



Re: [R] counting Na/not NA by groups by column

2010-06-09 Thread steven mosher
That's beautiful:

> apply(m[, 3:14], 2,
+       function(x) tapply(x, m[,2], function(x) sum(!is.na(x))))
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
1800    3    3    2    3    3    2    3    3    3     3     3     3
1801    3    3    2    3    3    2    3    3    3     3     3     3
1802    3    3    3    3    3    3    2    3    3     3     3     3
1803    3    3    3    3    3    2    3    3    3     3     3     3
1804    3    3    3    3    3    2    3    3    3     3     3     3

I was thinking of doing 'by' inside an apply, but this is perfect. Thanks.
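
For comparison, a sketch of the same grouped count done two ways: the tapply-inside-apply approach from the reply, and base R's rowsum(), which was not proposed in the thread and is shown here only as one possible alternative. The variable names counts_tapply, counts_rowsum and present are assumptions made for this example.

```r
# Rebuild the example matrix from the original post.
m <- matrix(NA, nrow = 15, ncol = 14)
m[, 3:14] <- 52
m[13, 9] <- NA
m[4:7, 8] <- NA
m[1:2, 5] <- NA
m[, 2] <- rep(1800:1804, times = 3)
m <- m[order(m[, 2]), ]
m[, 1] <- rep(1:3, times = 5)

# Approach from the reply: for each column, count non-NA values per group.
counts_tapply <- apply(m[, 3:14], 2,
                       function(x) tapply(x, m[, 2], function(v) sum(!is.na(v))))

# Alternative: rowsum() adds up the rows of a numeric matrix within groups.
# It needs numeric input, so the logical mask is stored as integers first.
present <- !is.na(m[, 3:14])
storage.mode(present) <- "integer"
counts_rowsum <- rowsum(present, group = m[, 2])

all(counts_tapply == counts_rowsum)   # TRUE
```

Both results are 5 x 12 matrices with one row per value of column 2.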

On Wed, Jun 9, 2010 at 6:16 PM, Erik Iverson er...@ccbr.umn.edu wrote:

 Hello,


 steven mosher wrote:

 # create a matrix with some random NAs in it

 m <- matrix(NA, nrow=15, ncol=14)
 m[,3:14] <- 52
 m[13,9] <- NA
 m[4:7,8] <- NA
 m[1:2,5] <- NA
 m[,2] <- rep(1800:1804, times=3)
 y <- order(m[,2])
 m <- m[y,]
 m[,1] <- rep(1:3, times=5)



 # what we want is a result that looks like this
   1800  3   3   2  3  3   2   3   3   3   3   3   3
   1801  3   3   2  3  3   2   3   3   3   3   3   3
   1802  3   3   3  3  3   3   2   3   3   3   3   3
   1803  3   3   3  3  3   2   3   3   3   3   3   3
   1804  3   3   3  3  3   2   3   3   3   3   3   3


 This should work:

 apply(m[, 3:14], 2,
       function(x) tapply(x, m[,2], function(x) sum(!is.na(x))))

 It uses tapply inside of apply to break up the groups by m[, 2].




Re: [R] working with zoo time index ??

2010-06-15 Thread steven mosher
Hi Gabor,

 Not sure where to report this, but:

Mac 10.5.8
R: 11.1

When you open the zoo vignette and hit the back button, you get a hang.
I haven't tested with other vignettes, and I can't imagine that this is
specific to yours.
FWIW.

Did I mention that zoo is great? Thanks for your work on it.

On Tue, Jun 15, 2010 at 6:30 AM, Gabor Grothendieck ggrothendi...@gmail.com wrote:

 On Tue, Jun 15, 2010 at 8:27 AM, skan juanp...@gmail.com wrote:
 
  Hello
 
  Where could I find examples on how to work with the time index in a
  timeseries  or zoo series?
 
  Let say I've got this series
 
  DATA
  1990-01-01 10:00:00   0.900
  1990-01-01 10:01:00   0.910
  1990-01-01 10:03:00   0.905
  1990-01-01 10:04:00   0.905
  1990-01-01 10:05:00   0.890
 
  ...
 
  2000-12-31 20:00:00   0.992
 
 
  How do I make simple calculations such as ... ?
  Calculate the mean of the first data every day. (mapply, for loop, tapply
 ?)
  Transform data to a table,  with dates in one axis and  times in the
 other.
 

 There are three vignettes that come with zoo.  vignette() lists their
 names and vignette("zoo") displays the one called "zoo" (similarly for
 the other two).  Also see the help files: ?zoo, ?read.zoo,
 ?aggregate.zoo
 and note the examples at the bottom of the help files.
 Also library(help = zoo) lists the help files available.

 Lines <- "1990-01-01 10:00:00   0.900
 1990-01-01 10:01:00   0.910
 1990-01-01 10:03:00   0.905
 1990-01-01 10:04:00   0.905
 1990-01-01 10:05:00   0.890
 1990-01-02 10:00:00   0.940
 1990-01-02 10:01:00   0.990"
 library(zoo)
 library(chron)
 z <- read.zoo(textConnection(Lines), index = 1:2, FUN = function(x)
 as.chron(paste(x[,1], x[,2])))

 # take first data value for each day and then take their mean
 mean(aggregate(z, as.Date, head, 1))

 # create data frame from z made up of dates, times and value
 # dates and times are chron package functions.
 # (If you use a different date and time class then it would be different.)
 data.frame(dates = dates(time(z)), times = times(time(z)), value =
 coredata(z))
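
 The same two calculations can also be sketched in base R alone, without zoo or chron. This is an illustration rather than the approach recommended above; the names obs, first_per_day, mean_first and wide are made up for the example, and it assumes every line has exactly three whitespace-separated fields.

```r
# Sample data from the message above, one observation per line.
lines <- c(
  "1990-01-01 10:00:00 0.900",
  "1990-01-01 10:01:00 0.910",
  "1990-01-01 10:03:00 0.905",
  "1990-01-01 10:04:00 0.905",
  "1990-01-01 10:05:00 0.890",
  "1990-01-02 10:00:00 0.940",
  "1990-01-02 10:01:00 0.990"
)

# Split each line into date, time and value.
parts <- do.call(rbind, strsplit(lines, " +"))
obs <- data.frame(day = parts[, 1], clock = parts[, 2],
                  value = as.numeric(parts[, 3]),
                  stringsAsFactors = FALSE)

# ISO-formatted dates and times sort correctly as plain text.
obs <- obs[order(obs$day, obs$clock), ]

# First value of each day, then the mean of those first values.
first_per_day <- tapply(obs$value, obs$day, function(v) v[1])
mean_first <- mean(first_per_day)

# Table with dates on one axis and times on the other (NA where missing).
wide <- tapply(obs$value, list(obs$day, obs$clock), function(v) v[1])
```

 Here mean_first averages 0.900 and 0.940, and wide has one row per date and one column per distinct time of day.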



[R] applying ifelse to dataframe

2010-06-22 Thread steven mosher
The following data frame will illustrate the problem:

> DF <- data.frame(name=rep(1:5,each=2), x1=rep("A",10), x2=seq(10,19,by=1), x3=rep(NA,10), x4=seq(20,29,by=1))
> DF$x3[5] <- 50

# we have a data frame. we are interested in the columns x2, x3, x4, which
# contain sparse values and many NAs.
> DF
   name x1 x2 x3 x4
1     1  A 10 NA 20
2     1  A 11 NA 21
3     2  A 12 NA 22
4     2  A 13 NA 23
5     3  A 14 50 24
6     3  A 15 NA 25
7     4  A 16 NA 26
8     4  A 17 NA 27
9     5  A 18 NA 28
10    5  A 19 NA 29

# we have a list of target values that we want to search for in the data
# frame. if a value in the data frame is in the target list we want to keep
# it there; otherwise, replace it with NA

> targets <- c(11,12,13,16,19,50,27,24,22,26)
# so we apply a test by column to the last 3 columns using the %in% test
# this gives us a mask of whether the data frame 'contains' elements in the
# target list

> mask <- apply(DF[,3:5], 2, "%in%", targets)
> mask

         x2    x3    x4
 [1,] FALSE FALSE FALSE
 [2,]  TRUE FALSE FALSE
 [3,]  TRUE FALSE  TRUE
 [4,]  TRUE FALSE FALSE
 [5,] FALSE  TRUE  TRUE
 [6,] FALSE FALSE FALSE
 [7,]  TRUE FALSE  TRUE
 [8,] FALSE FALSE  TRUE
 [9,] FALSE FALSE FALSE
[10,]  TRUE FALSE FALSE

# and so DF[2,3] is equal to 11 and 11 is in the target list, so the mask is
# TRUE
# now something like DF <- ifelse(mask==T, DF, NA) is CONCEPTUALLY what I
# want to do
In the end I'd like a result that looks like this:

   name x1 x2 x3 x4
1     1  A NA NA NA
2     1  A 11 NA NA
3     2  A 12 NA 22
4     2  A 13 NA NA
5     3  A NA 50 24
6     3  A NA NA NA
7     4  A 16 NA 26
8     4  A NA NA 27
9     5  A NA NA NA
10    5  A 19 NA NA

I've tried forcing the DF and the mask into vectors so that ifelse() would
work, and have tried apply using ifelse, without much luck. Any thoughts?



Re: [R] applying ifelse to dataframe

2010-06-22 Thread steven mosher
Thanks, the dataframe, is indeed clever at preserving its dimensions.

I'll try your solution with the real data

On Tue, Jun 22, 2010 at 12:23 AM, Petr PIKAL petr.pi...@precheza.cz wrote:

 Hi

 r-help-boun...@r-project.org wrote on 22.06.2010 08:28:04:


 Data frames are quite clever in preserving their dimensions. I would do

 mask=data.frame(a=TRUE, b=TRUE, !mask)

 to add column 1 and 2

 and

 DF[mask] <- NA

 Regards
 Petr




Re: [R] applying ifelse to dataframe

2010-06-22 Thread steven mosher
Hmm


> DF <- data.frame(name=rep(1:5,each=2), x1=rep("A",10), x2=seq(10,19,by=1), x3=rep(NA,10), x4=seq(20,29,by=1))
> DF$x3[5] <- 50
> mask <- apply(sample, 2, "%in%", target)
> DF
   name x1 x2 x3 x4
1     1  A 10 NA 20
2     1  A 11 NA 21
3     2  A 12 NA 22
4     2  A 13 NA 23
5     3  A 14 50 24
6     3  A 15 NA 25
7     4  A 16 NA 26
8     4  A 17 NA 27
9     5  A 18 NA 28
10    5  A 19 NA 29
> mask
      [,1]  [,2]  [,3]  [,4]  [,5]
[1,] FALSE FALSE FALSE FALSE FALSE
[2,] FALSE FALSE FALSE FALSE FALSE
[3,]  TRUE  TRUE FALSE  TRUE FALSE
[4,] FALSE FALSE FALSE FALSE FALSE
[5,]  TRUE FALSE FALSE FALSE FALSE
> mask <- data.frame(a=TRUE, b=TRUE, !mask)
> DF[mask] <- NA
Error in FUN(X[[1L]], ...) :
  only defined on a data frame with all numeric variables
> DF2 <- data.frame(DF[,3:5])
> mask <- apply(sample, 2, "%in%", target)
> mask <- data.frame(!mask)
> DF2[mask] <- NA
Error in FUN(X[[1L]], ...) :
  only defined on a data frame with all numeric variables
> DF2
   x2 x3 x4
1  10 NA 20
2  11 NA 21
3  12 NA 22
4  13 NA 23
5  14 50 24
6  15 NA 25
7  16 NA 26
8  17 NA 27
9  18 NA 28
10 19 NA 29
> mask <- apply(DF2, 2, "%in%", target)
> mask <- data.frame(!mask)
> DF2[mask] <- NA
Error in FUN(X[[1L]], ...) :
  only defined on a data frame with all numeric variables



Re: [R] applying ifelse to dataframe

2010-06-22 Thread steven mosher
Thanks for the solution
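
For the record, a self-contained sketch of the approach Peter Ehlers suggests in the quoted reply, applied to the example data from the original post. The data and the three key lines come from the thread; nothing beyond base R is assumed.

```r
# Example data from the original post.
DF <- data.frame(name = rep(1:5, each = 2), x1 = rep("A", 10),
                 x2 = seq(10, 19, by = 1), x3 = rep(NA, 10),
                 x4 = seq(20, 29, by = 1))
DF$x3[5] <- 50
targets <- c(11, 12, 13, 16, 19, 50, 27, 24, 22, 26)

# TRUE where a value in columns x2..x4 is one of the targets.
mask <- apply(DF[, 3:5], 2, "%in%", targets)

# Set every non-matching cell in x2..x4 to NA; name and x1 are untouched.
is.na(DF[3:5]) <- !mask

DF
```

Because the mask is a logical matrix with the same shape as DF[3:5], the assignment works cell by cell and the character column x1 never enters the operation.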

On Tue, Jun 22, 2010 at 1:02 AM, Peter Ehlers ehl...@ucalgary.ca wrote:

 On 2010-06-22 1:45, steven mosher wrote:

 Hmm


 DF <- data.frame(name=rep(1:5,each=2), x1=rep("A",10), x2=seq(10,19,by=1), x3=rep(NA,10), x4=seq(20,29,by=1))
 DF$x3[5] <- 50
 mask <- apply(sample, 2, "%in%", target)


 This is getting confusing. What's 'sample'?
 What's 'target'? Probably what you originally called 'targets'.


 DF
    name x1 x2 x3 x4
 1     1  A 10 NA 20
 2     1  A 11 NA 21
 3     2  A 12 NA 22
 4     2  A 13 NA 23
 5     3  A 14 50 24
 6     3  A 15 NA 25
 7     4  A 16 NA 26
 8     4  A 17 NA 27
 9     5  A 18 NA 28
 10    5  A 19 NA 29
 mask
       [,1]  [,2]  [,3]  [,4]  [,5]
 [1,] FALSE FALSE FALSE FALSE FALSE
 [2,] FALSE FALSE FALSE FALSE FALSE
 [3,]  TRUE  TRUE FALSE  TRUE FALSE
 [4,] FALSE FALSE FALSE FALSE FALSE
 [5,]  TRUE FALSE FALSE FALSE FALSE



 This suggests that 'sample' may be a matrix, not
 a dataframe.

 Anyway, try this on your original problem:


  targets <- c(11,12,13,16,19,50,27,24,22,26)
  mask <- apply(DF[,3:5], 2, "%in%", targets)
  is.na(DF[3:5]) <- !mask

  -Peter Ehlers




Re: [R] Why software fails in scientific research

2010-07-01 Thread steven mosher
  Thomas,

 How popular is R inside of NOAA?




On Thu, Jul 1, 2010 at 11:25 AM, Thomas Adams thomas.ad...@noaa.gov wrote:

 OK…

 My Grandfather, who was a farmer, was outstanding in his field…

 Cheers…


 Murray M Cooper, PhD wrote:

 For what it's worth!

 A good friend who also happens to be an ecologist
 told me, "An ecologist is a statistician who likes to be
 outside."

 Murray M Cooper, PhD
 Richland Statistics

 ----- Original Message -----
 From: Gavin Simpson gavin.simp...@ucl.ac.uk
 To: Bert Gunter gunter.ber...@gene.com
 Cc: r-help@r-project.org
 Sent: Thursday, July 01, 2010 11:57 AM
 Subject: Re: [R] Why software fails in scientific research


  On Wed, 2010-06-30 at 11:17 -0700, Bert Gunter wrote:

 Just one small additional note below ...

 Bert Gunter
 Genentech Nonclinical Biostatistics


 But a lot of academics are not going to waste their time documenting
 code properly, so others can reap the benefits of it. They would rather
 get on with the next project, to get the next paper.


 -- Indeed. My personal experience over 3 decades in industrial (private)
 research is that data analysis is viewed as relatively
 unimportant/straightforward/pedestrian and is left to technicians (or
 postdocs) -- often with what is done being largely dictated by the
 conventions of a particular journal or discipline. The lab heads and
 research directors are responsible for the grand research strategies,
 managing resources, etc. and don't want to waste much time on something
 that routine. So worrying about reproducibility of data analysis code (if
 there is any, given the use of GUI software like Excel) falls beneath
 their radar.

 Clearly there are disciplines (e.g. ecology?) where this may NOT be the
 case.

 If ecology is anything to go by (and I am an ecologist, sort of, just
 about), there is a large body of the community doing things because i)
 that is how they've always been done, or ii) because that's what
 reviewers/editors expect etc. with a much smaller group of researchers
 pushing at the boundaries (of their field) to use techniques
 statisticians and the like have been using for a very long time.

 Reproducible research is still very much in the (very, very) small
 minority of the work I come across reviewing papers etc. But I am
 encouraged by the number of people I know who are starting to use tools
 like R to conduct their research.

  -- Bert


 G

 --
 %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography, [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
 %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%




 --
 Thomas E Adams
 National Weather Service
 Ohio River Forecast Center
 1901 South State Route 134
 Wilmington, OH 45177

 EMAIL:  thomas.ad...@noaa.gov

 VOICE:  937-383-0528
 FAX:937-383-0033



