[R] Overlaying two png?
I have a program that creates a PNG file using RgoogleMaps with an extent (lonmin, lonmax, latmin, latmax). I also have a contour plot of the same location as a PNG file with the same extent and the same size (height/width). I'm looking for a way to make the contour semi-transparent and overlay it on the Google map (hybrid map). Since I have 7000 of these to do, an automated process is desired (grin). Any pointers in the right direction?

[[alternative HTML version deleted]]
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
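One possible direction, sketched here rather than taken from the thread: if both images are read into numeric arrays of the same size, a semi-transparent overlay is a weighted average of the two. The function name blend_overlay, the alpha value, and the toy arrays are assumptions for illustration. The commented loop assumes the png package (readPNG/writePNG) and RGB or RGBA files; the vectors map_files, contour_files and out_files are hypothetical.

```r
# Blend an overlay image onto a base image.
# Both are numeric arrays of dim (height, width, 3) with values in [0, 1],
# which is the layout png::readPNG() returns for an RGB file.
blend_overlay <- function(base, overlay, alpha = 0.5) {
  stopifnot(identical(dim(base), dim(overlay)), alpha >= 0, alpha <= 1)
  (1 - alpha) * base + alpha * overlay
}

# Toy stand-ins for the two images (hypothetical data, 4 x 4 pixels).
map_img     <- array(0.2, dim = c(4, 4, 3))
contour_img <- array(1.0, dim = c(4, 4, 3))
blended <- blend_overlay(map_img, contour_img, alpha = 0.4)

# With the png package installed, a batch run over real files might look like:
# for (i in seq_along(map_files)) {
#   base    <- png::readPNG(map_files[i])[, , 1:3]
#   overlay <- png::readPNG(contour_files[i])[, , 1:3]
#   png::writePNG(blend_overlay(base, overlay, 0.4), out_files[i])
# }
```

A single alpha for the whole image also fades the map where the contour plot is blank; a per-pixel alpha mask would avoid that, at the cost of a little more bookkeeping.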
[R] Beginner question on bar plot
I've read a number of examples on doing a multiple bar plot, but can't seem to grasp how they work or how to get my data into the proper form. I have two variables holding the same factor. The variables were created using a cut command. The following simulates that:

A <- 1:100
B <- 1:100
A[30:60] <- 43
Acut <- cut(A, breaks=c(0,10,45,120), labels=c("low","med","high"))
Bcut <- cut(B, breaks=c(0,10,45,120), labels=c("low","med","high"))

What I want to do is create a barplot with 3 groups of side-by-side bars:

group 1 = low, and the two bars would be the count for Acut and the count for Bcut
group 2 = med, and the two bars again would be the counts for this factor level in Acut and Bcut
group 3 = high, like the above two.
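A minimal sketch of one way to get the grouped bars described above, using only base graphics. It tabulates each factor, stacks the two tables into a matrix, and lets barplot() draw the rows side by side. The object names lev and counts are made up for the example.

```r
A <- 1:100
B <- 1:100
A[30:60] <- 43
lev  <- c("low", "med", "high")
Acut <- cut(A, breaks = c(0, 10, 45, 120), labels = lev)
Bcut <- cut(B, breaks = c(0, 10, 45, 120), labels = lev)

# One row per variable, one column per factor level.
counts <- rbind(Acut = table(Acut), Bcut = table(Bcut))

# beside = TRUE draws the rows side by side within each level group.
barplot(counts, beside = TRUE, legend.text = TRUE, ylab = "count")
```

The key point is the shape of the input: barplot() treats each column of the matrix as a group and each row as a bar within the group.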
Re: [R] Reading in a tab delimitated file
If your data for the rest of the file looks like this, then read.fwf will work, depending on which vars you want to pull.

widths = c(18,32,41)
E-CBIL-28-raw-cel-1435145228.cel1
would pull 3 vars: E-CBIL-28-raw-cel-; 1435145228.cel; 1

widths <- c(32,41)
E-CBIL-28-raw-cel-1435145228.cel; 1

You can set it differently, and assign colnames and column classes as well. But the fields must be fixed width.

On Tue, Oct 26, 2010 at 5:35 AM, amindlessbrain jillianrowe91...@gmail.com wrote:

Hi all, I have a total newbie question, but I could really use some help. I need to read in this file:

SampleIDDisease
E-CBIL-28-raw-cel-1435145228.cel1
E-CBIL-28-raw-cel-1435145451.cel2
E-CBIL-28-raw-cel-1435145479.cel2
E-CBIL-28-raw-cel-1435145132.cel3
E-CBIL-28-raw-cel-1435145417.cel3
E-CBIL-28-raw-cel-1435145301.cel2
E-CBIL-28-raw-cel-1435145558.cel1
E-CBIL-28-raw-cel-1435145073.cel3
E-CBIL-28-raw-cel-1435145196.cel2
E-CBIL-28-raw-cel-1435145511.cel1
E-CBIL-28-raw-cel-1435145336.cel3
E-CBIL-28-raw-cel-1435145260.cel2
E-CBIL-28-raw-cel-1435145167.cel2
E-CBIL-28-raw-cel-1435145387.cel3
E-CBIL-28-raw-cel-1435145099.cel3

(I'm not sure why the disease column isn't showing up as a tab here, but it is separated by \t in my file.) I've tried several variations on these:

pd <- read.AnnotatedDataFrame("new_treat.txt", header = TRUE, sep = "\t", row.names = "SampleID", colClasses = c(Disease = "character"))

And I keep on getting this error:

Error in read.table(filename, sep = sep, header = header, quote = quote, :
  more columns than column names

Any help would be very very very appreciated! Thanks!

--
View this message in context: http://r.789695.n4.nabble.com/Reading-in-a-tab-delimitated-file-tp3013620p3013620.html
Sent from the R help mailing list archive at Nabble.com.
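For comparison, a small sketch of reading such a file as tab-delimited with base R, assuming the file really has one tab between the two columns. The text is built inline so the example runs by itself; pd, txt and n_fields are names chosen for the example. count.fields() is included because a line with more fields than the header is the usual cause of "more columns than column names".

```r
# A few rows in the layout described in the question, with real tabs
# between the two columns (sample values copied from the post).
txt <- paste(
  "SampleID\tDisease",
  "E-CBIL-28-raw-cel-1435145228.cel\t1",
  "E-CBIL-28-raw-cel-1435145451.cel\t2",
  "E-CBIL-28-raw-cel-1435145132.cel\t3",
  sep = "\n"
)

# read.delim() defaults to sep = "\t" and header = TRUE.
pd <- read.delim(text = txt, colClasses = "character", row.names = "SampleID")

# count.fields() reports how many fields each line has; the header and
# every data line should agree.
con <- textConnection(txt)
n_fields <- count.fields(con, sep = "\t")
close(con)
```

If n_fields is not constant for the real file, the lines where it differs (trailing tabs, for instance) are the first place to look.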
Re: [R] Best IDE for R
Thanks for the pointer. After looking at the many folders of R code I have, I decided it was time to start working in an IDE and also to get my stuff under version control (for my own sanity). I'll have a look at Geany. For version control, not so sure.

On Wed, Oct 27, 2010 at 12:14 PM, Liviu Andronic landronim...@gmail.com wrote:

On Wed, Oct 27, 2010 at 9:05 PM, Jonathan P Daily jda...@usgs.gov wrote:

I can second using Geany as an IDE.

Great, finally a soulmate! :) More seriously, I think Geany is under-appreciated and virtually unknown in the R community.

Another large plus for it is that it is cross platform (I work in both Windows and Linux), cross environment (I also code in Python/Sage), very customizable, and even has a version on PortableApps for Windows so you can take a customized version around on a USB stick with ease.

One important feature that the Windows version of Geany lacks is the integrated virtual terminal emulator. This is mainly because the VTE port to Windows was never finalised (although the patch is well in their bugtracker). One possibility is to use Geany in a VMware virtual Linux machine on Windows.

Regards
Liviu

--
Jonathan P. Daily
Technician - USGS Leetown Science Center
11649 Leetown Road
Kearneysville WV, 25430
(304) 724-4480
Is the room still a room when it's empty? Does the room, the thing itself have purpose? Or do we, what's the word... imbue it.
- Jubal Early, Firefly

From: Liviu Andronic landronim...@gmail.com
To: Lee Hachadoorian lee.hachadooria...@gmail.com
Cc: r-h...@stat.math.ethz.ch
Date: 10/27/2010 02:45 PM
Subject: Re: [R] Best IDE for R
Sent by: r-help-boun...@r-project.org

On Wed, Oct 27, 2010 at 4:05 PM, Lee Hachadoorian lee.hachadooria...@gmail.com wrote:

For an R-enabled text editor, I would suggest Tinn-R for Windows or RGedit (a gedit plugin) for Linux/Gnome-desktop. Since both are just text editors, they will work with whatever version of R you have installed (criteria 1). RGedit is pretty spare: basically just console integration and keyboard shortcuts to send code (current line, selection, defined blocks) to the console.

Criteria
1 Y
2 basic
3 N
4 N

For Linux and Mac, I usually suggest Geany [1] as an alternative to Gedit. Geany is an intuitive IDE that can send commands to rterm in the integrated virtual terminal emulator. It provides various features for project management, source highlighting, code folding, etc.

Regards
Liviu

[1] http://www.r-bloggers.com/integrating-r-with-geany/

--
Do you know how to read?
http://www.alienetworks.com/srtest.cfm
http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
Do you know how to write?
http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail
Re: [R] doubt in climate variability analysis in R! - code
OK, I downloaded it and showed you how to get your data out: how to read it into a raster brick, how to plot the data, and how to get the mean rainfall of every day. There is lots more you can do. There is a bad bit of data in the last time step. Check my blog.

In the future, what you should do is write code to emulate your problem. For example, in your problem you had created a ncdf file with a 3D matrix of 65,69,2192. You should just do a subset of that, and show the code to create a ncdf with random numbers in it. Creating working code that emulates your problem is key if you want help. Off list for the rest.

On Sun, Oct 31, 2010 at 10:21 AM, govin...@msu.edu wrote:

I am sorry, I think the link was broken! Here is the correct one:
http://www.4shared.com/file/4zV0g3JR/RF_80-85.html
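As an illustration of the advice above, here is a sketch of a small stand-in for the rainfall data. It uses a plain R array rather than a ncdf file, so nothing needs to be installed or downloaded. The 65 x 69 grid comes from the thread; the 10 time steps, the value range, and the names rain and daily_mean are assumptions for the example.

```r
set.seed(1)  # make the random stand-in data reproducible

# Keep the 65 x 69 grid but only 10 time steps instead of 2192.
rain <- array(runif(65 * 69 * 10, min = 0, max = 50), dim = c(65, 69, 10))

# Mean over the grid for each time step (third dimension).
daily_mean <- apply(rain, 3, mean)
```

A question posted with a few lines like these lets a reader reproduce the data structure immediately, which is the point being made in the reply.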
[R] Strings from different locale
I'm doing some test processing of a CSV file that appears to use a different locale from my machine. I get the following warning:

input string 1 is invalid in this locale

My locale is US. Is this simply a matter of changing my locale to 'all' locales? I don't know what locale the string is in. Is there a way to detect this or translate
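A hedged sketch of one common approach: rather than changing the locale, convert the strings to UTF-8 with iconv(). The example builds a Latin-1 string by hand; Latin-1 is only a guess at the source encoding, since base R has no reliable way to detect it. When reading the file directly, the fileEncoding argument of read.csv() serves the same purpose.

```r
# "cafe" with an e-acute, encoded as Latin-1: the last byte (0xE9) is not
# valid UTF-8 on its own.
x <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))

validUTF8(x)   # FALSE: these bytes are not valid UTF-8

# Assumption: the source encoding is Latin-1. If it is something else,
# the conversion may still run but produce the wrong characters.
y <- iconv(x, from = "latin1", to = "UTF-8")
validUTF8(y)
```

Trying a few candidate encodings from iconvlist() and inspecting the result by eye is usually how the right one is found.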
Re: [R] spliting first 10 words in a string
Thanks, David.

Matevz, maybe I can help explain by doing a very simple, brute-force approach as opposed to the way David did it. But you should learn his methods. I will just do a subset of your problem, and if you understand how it works you should be able to get something done and then make it more elegant.

First, I simplify the problem by separating out the sentence column. You can do this with your data frame by simply doing this:

MySentence <- data.frame(sentence=yourbigDF$Opis, stringsAsFactors=FALSE)

So I take your original data.frame (yourbigDF) and just create a copy of that one column, $Opis. Later we can merge the two back together after I add 10 columns for the words.

Let's make some dummy data with just 10 rows:

sentence <- "this is a sentence with ten words or maybe more than ten words"
sentV <- rep(sentence, 10)
# now I just made 10 rows of the same sentence
# NEXT, because I am going to create 10 new columns of 10 rows, I create
# 10 vectors; each is named and each has 10 elements for the rows.
# They have NO DATA in them
first=second=third=fourth=fifth=sixth=seventh=eighth=ninth=tenth <- vector(length=10)
# Next I create a dataframe with Sentence in the first column and 10 blank columns.
# NOTE I use stringsAsFactors=FALSE
DF <- data.frame(Sentence=sentence,first,second,third,fourth,fifth,sixth,seventh,eighth,ninth,tenth,stringsAsFactors=FALSE)
# This is what it would look like (the first row)
DF[1,]
  Sentence first second third fourth fifth sixth seventh eighth ninth tenth
1 this is a sentence with ten words or maybe more than ten words FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

Next, I will show you how to assign the first ten words to the 10 blank columns:

DF[1,2:11] <- strsplit(DF[1,1], " ")[[1]][1:10]
# DF[1,2:11] selects columns 2-11 of the first row
# strsplit returns the first 10 words [1:10] and places them in columns 2-11

If you want to do this the slow way you can just loop through your dataframe row by row, or you can probably use apply.
Make more sense?

DF[1,2:11] <- strsplit(DF[1,1], " ")[[1]][1:10]
DF[1,]
  Sentence first second third fourth fifth sixth seventh eighth ninth tenth
1 this is a sentence with ten words or maybe more than ten words this is a sentence with ten words or maybe more
DF[1,"first"]
[1] "this"

On Tue, Nov 2, 2010 at 12:22 PM, David Winsemius dwinsem...@comcast.net wrote:

On Nov 2, 2010, at 3:01 PM, Matevž Pavlič wrote:

Hi all, Thanks for all the help. I managed to do it with what Gaj suggested (Excel :(). The last solution from David is also great, I just don't understand why R put the words in 14 columns and three rows?

Because the maximum number of words was 14 and the fill argument was TRUE. There were three rows because there were three items in the supplied character vector.

I would like it to put just the first 10 words in the source field into 10 different destination fields, but in the same row. And so on... is that possible?

I don't know what a destination field might be. Those are not R data types. This would trim the extra columns (in this example set to those greater than 8) by adding a lot of "NULL"s to the end of a colClasses specification, at the expense of a warning message which can be ignored:

read.table(textConnection(words), fill=T, colClasses = c(rep("character", 8), rep("NULL", 30)), stringsAsFactors=FALSE)
   V1    V2    V3      V4    V5    V6    V7      V8
1   I  have     a columnn  with  text  that     has
2   I would  like      to split these words      in
3 but  just first     ten words    in   the string.
Warning message:
In read.table(textConnection(words), fill = T, colClasses = c(rep("character", :
  cols = 14 != length(data) = 38

If you want to assign the first column to a variable then just:

first8 <- read.table(textConnection(words), fill=T, colClasses = c(rep("character", 8), rep("NULL", 30)), stringsAsFactors=FALSE)
var1 <- first8[[1]]
var1
[1] "I"   "I"   "but"

--
David.
Thank you, m

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David Winsemius
Sent: Tuesday, November 02, 2010 3:47 PM
To: Gaj Vidmar
Cc: r-h...@stat.math.ethz.ch
Subject: Re: [R] spliting first 10 words in a string

On Nov 2, 2010, at 6:24 AM, Gaj Vidmar wrote:

Though forbidden in this list, in Excel it's just (literally!) five clicks away! (with the column in question selected) Data -> Text to Columns -> Delimited -> tick Space -> Finish. Pa je! (~Voila in Slovenian) (then import back to R, keeping only the first 10 columns if so desired)

You could do the same thing without needing to leave R. Just read.table(textConnection(..), header=FALSE, fill=TRUE):

read.table(textConnection(words), fill=T)
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14
1 I have a columnn with text that
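The row-by-row strsplit() idea discussed in this thread can also be written without an explicit loop. The following is only a sketch under assumptions: first_words is a made-up helper name, words are taken to be separated by whitespace, and the column names word1 to word10 are invented for the example.

```r
# Return the first n whitespace-separated words of each string as a
# character matrix with one row per string; short strings are padded with NA.
first_words <- function(x, n = 10) {
  parts <- strsplit(trimws(x), "[[:space:]]+")
  t(vapply(parts, function(w) w[seq_len(n)], character(n)))
}

sentences <- c(
  "this is a sentence with ten words or maybe more than ten words",
  "a short one"
)
words <- first_words(sentences, n = 10)
colnames(words) <- paste0("word", 1:10)   # hypothetical column names

# Bind the word columns back onto the original text in one step.
DF <- data.frame(Sentence = sentences, words, stringsAsFactors = FALSE)
```

Because the result has one row per input string, it can be bound straight onto the original data frame, which keeps the words in the same row as their source text.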
Re: [R] splitting First 10 words in a string
That's easy; you are confusing the dummy code I sent. Do this:

lit <- read.csv("litologija.csv", sep=";", dec=".")
sent <- data.frame(sentence=lit$Opis, stringsAsFactors=FALSE)
first=second=third=fourth=fifth=sixth=seventh=eighth=ninth=tenth <- vector(length=nrow( sent)

I put the length of the vector to 10 just to do a dummy problem. Then do this:

for(j in 1:nrow(sent)) {
  sent[j,2:11] <- strsplit(sent[j,1], " ")[[1]][1:10]
}

That will get you a result the crude, brute-force way. Try that. Then you can learn the sapply way. But first you need to learn R data structures.

On Tue, Nov 2, 2010 at 1:47 PM, Matevž Pavlič matevz.pav...@gi-zrmk.si wrote:

Hi Steven, Thank you for the help. I get an error though when I do this:

lit <- read.csv("litologija.csv", sep=";", dec=".")
sent <- data.frame(sentence=lit$Opis, stringsAsFactors=FALSE)
str(sent)
sentV <- rep(sent,10)
str(sentV)
first=second=third=fourth=fifth=sixth=seventh=eighth=ninth=tenth <- vector(length=10)
DF <- data.frame(Sentence=sent,first,second,third,fourth,fifth,sixth,seventh,eighth,ninth,tenth,stringsAsFactors=FALSE)

»Error in data.frame(Sentence = sent, first, second, third, fourth, fifth, : arguments imply differing number of rows: 22928, 10«

What am I doing wrong?

Thanks, m
Re: [R] splitting First 10 words in a string
Line should be:

first=second=third=fourth=fifth=sixth=seventh=eighth=ninth=tenth <- vector(length=nrow(sent))

Sorry, cut and paste error.
Re: [R] spliting first 10 words in a string
Just merge the data.frames back together. Use merge() or cbind(); cbind will be easier.

DF1 <- data.frame(x,y,z)
DF2 <- data.frame(DF1$x) # copy a column

Then you added columns to DF2. Just put them back together:

DF3 <- cbind(DF2,DF1$y,DF1$z)

If you spend more time with R you will be able to do things like this elegantly, but for now this way will work and you will learn a bit about R.

As for counting instances of a string, I might suggest looking at the table command:

k <- c("all", "but", "all")
table(k)
k
all but
  2   1

So you can do a table for each column in your dataframe.

On Tue, Nov 2, 2010 at 12:53 PM, Matevž Pavlič matevz.pav...@gi-zrmk.si wrote:

Hi, OK, I got this now. At least I think so. I got a data.frame with 15 fields; all other words have been truncated. Which is what I want. But I have that in a separate data.frame from the one it was in before (it would be nice if it were in the same one...)

'data.frame': 22801 obs. of 15 variables:
$ V1 : chr HUMUS SLABO MALO SLABO ...
$ V2 : chr IN GRANULIRAN PREPEREL VEZAN ...
$ V3 : chr HUMUSNA PEŠČEN MELJAST ,KONGLOMERAT, ...
$ V4 : chr GLINA PROD PROD P0ROZEN, ...
$ V5 : chr Z DO DO S ...
$ V6 : chr MALO r r PLASTMI ...
$ V7 : chr PODA, = = GFs, ...
$ V8 : chr LAHKO 8Q 60mm, SIVORJAV ...
$ V9 : chr GNETNA, mm, S ...
$ V10: chr RJAVA S PRODNIKI, ...
$ V11: chr PRODNIKI MALO ...
$ V12: chr DO PEŠČEN ...
$ V13: chr R S ...
$ V14: chr = TANKIMI ...

Now, I have another problem. Is it possible to count which word occurs most often in each field (V1, V2, V3, ...), which one is second, and so on? Ideally, to create a table for each field (V1, V2, V3, ...) with the word and the number of occurrences in that field (column). I suppose it could be done in SQL, but since I saw what R can do I guess this can be done here too?
Thank you, m

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David Winsemius
Sent: Tuesday, November 02, 2010 3:47 PM
To: Gaj Vidmar
Cc: r-h...@stat.math.ethz.ch
Subject: Re: [R] spliting first 10 words in a string

On Nov 2, 2010, at 6:24 AM, Gaj Vidmar wrote:

Though forbidden in this list, in Excel it's just (literally!) five clicks away! (with the column in question selected) Data -> Text to Columns -> Delimited -> tick Space -> Finish. Pa je! (~Voila in Slovenian) (then import back to R, keeping only the first 10 columns if so desired)

You could do the same thing without needing to leave R. Just read.table(textConnection(..), header=FALSE, fill=TRUE):

read.table(textConnection(words), fill=T)
   V1    V2    V3      V4    V5    V6    V7      V8       V9     V10      V11   V12 V13 V14
1   I  have     a columnn  with  text  that     has    quite       a      few words  in it.
2   I would  like      to split these words      in separate columns
3 but  just first     ten words    in   the string.       Is    that possible    in  R?

Regards, Assist. Prof. Gaj Vidmar, PhD University Rehabilitation Institute, Republic
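The suggestion in this message, a table for each column, can be sketched with lapply() and sort(). The data frame below is a toy stand-in with made-up rows, and word_counts is a name chosen for the example.

```r
# Toy stand-in for the word columns (hypothetical values).
words <- data.frame(
  V1 = c("HUMUS", "SLABO", "MALO", "SLABO"),
  V2 = c("IN", "GRANULIRAN", "IN", "IN"),
  stringsAsFactors = FALSE
)

# One frequency table per column, most common word first.
word_counts <- lapply(words, function(col) sort(table(col), decreasing = TRUE))

word_counts$V1            # counts for the first column
names(word_counts$V2)[1]  # most frequent word in V2
```

Note that table() also counts empty strings, so blank cells in the shorter rows would show up as their own entry unless they are converted to NA first.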
[R] Reverting to previous version
R 2.12 is not functioning for me on the Mac. What is the most painless way of reverting?
Re: [R] how to know if a file exists on a remote server?
I would use RCurl. If you have, for example, the URL of an FTP site, you can merely do a getURL() and the contents will be returned. That call will return data that can be coerced into a data.frame that will look like a directory structure listing the file names. If you need code just ask, but the RCurl docs are pretty good.

On Tue, Nov 30, 2010 at 8:10 AM, Baoqiang Cao bqcaom...@gmail.com wrote:

Hi, I'd like to download some data files from a remote server. The problem is that some of the files don't actually exist, which I can't know before I try. Just wondering if a function in R could tell me if a file exists on a remote server? I searched this mailing list and, after reading several mails, am still clueless. Any help will be highly appreciated.

B.C.
Re: [R] how to know if a file exists on a remote server?
Using RCurl:

library(RCurl)

getFtpList <- function(ftp){
  # The structure returned is dependent on the ftp site, as there are
  # various formats for directory listings dependent upon the server
  # and the OS. You will need to play with this.
  # Have a look at the ftp with your browser first and adjust accordingly.
  # Some formats only return 4 columns.
  # column 1 = literal string (first position means file)
  # column 2 = number 1
  # column 3 = owner
  # column 4 = group
  # column 5 = file size
  # column 6 = Month
  # column 7 = Day
  # column 8 = Time (year)
  # column 9 = FileName
  #
  txt <- getURL(ftp)
  dir <- read.table(textConnection(txt), as.is=TRUE)
  out <- data.frame(Dir=ftp, Filename=dir[, ncol(dir)], Size=dir[, 5],
                    Month=dir[, 6], Day=dir[, 7], Time=dir[, 8], stringsAsFactors=FALSE)
  closeAllConnections()
  return(out)
}
Re: [R] how to know if a file exists on a remote server?
No problem. You can also get the directory with a curl option of dirlistonly; see the example code in the package. This will depend on the version of libcurl that you have. If you have an older version, my code will get you the directory.

From the RCurl examples (the files within a directory):

url = 'ftp://ftp.wcc.nrcs.usda.gov/data/snow/snow_course/table/history/idaho/'
filenames = getURL(url, ftp.use.epsv = FALSE, dirlistonly = TRUE)

# Deal with newlines as \n or \r\n. (BDR)
# Or alternatively, instruct libcurl to change \n's to \r\n's for us with crlf = TRUE
# filenames = getURL(url, ftp.use.epsv = FALSE, ftplistonly = TRUE, crlf = TRUE)
filenames = paste(url, strsplit(filenames, "\r*\n")[[1]], sep = "")
con = getCurlHandle(ftp.use.epsv = FALSE)
contents = sapply(filenames[1:5], getURL, curl = con)
names(contents) = filenames[1:length(contents)]

On Tue, Nov 30, 2010 at 9:56 AM, Baoqiang Cao bqcaom...@gmail.com wrote:

Thanks Steven! It is excellent code indeed!
Re: [R] how to know if a file exists on a remote server?
here:

getFtpList <- function(ftp){
  # column 1 = literal string; first position means file
  # column 2 = number 1
  # column 3 = owner
  # column 4 = group
  # column 5 = file size
  # column 6 = Month
  # column 7 = Day
  # column 8 = Time (year)
  # column 9 = FileName
  txt <- getURL(ftp)
  dir <- read.table(textConnection(txt), as.is=TRUE)
  # 9-column (Unix-style) listing
  if(ncol(dir)==9) out <- data.frame(Dir=ftp, Filename=dir[, ncol(dir)], Size=dir[, 5],
                                     Month=dir[, 6], Day=dir[, 7], Time=dir[, 8],
                                     stringsAsFactors=FALSE)
  # 4-column listing
  if(ncol(dir)==4) out <- data.frame(Dir=ftp, Filename=dir[, ncol(dir)], Size=dir[, 3],
                                     Month=dir[, 1], Time=dir[, 2],
                                     stringsAsFactors=FALSE)
  closeAllConnections()
  return(out)
}
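The existence check that started this thread can be sketched on top of a parsed listing. The following is a minimal sketch, not the poster's code: it assumes a 9-column Unix-style listing with no spaces in file names, and the names parse_ftp_listing, remote_file_exists, and the sample listing text are hypothetical. It works on listing text that has already been fetched (for example by getURL), so it runs without a network connection.

```r
# Parse the text of a Unix-style FTP directory listing (assumed: 9
# whitespace-separated columns, file name last, no spaces in names).
parse_ftp_listing <- function(txt) {
  dir <- read.table(text = txt, as.is = TRUE)
  stopifnot(ncol(dir) == 9)
  data.frame(Filename = dir[[9]], Size = dir[[5]], stringsAsFactors = FALSE)
}

# TRUE if the file name appears in the parsed listing.
remote_file_exists <- function(listing, filename) {
  filename %in% listing$Filename
}

# Hypothetical listing text standing in for the result of getURL(ftp).
sample_listing <- paste(
  "-rw-r--r-- 1 ftp ftp 1024 Nov 30 08:10 a.txt",
  "-rw-r--r-- 1 ftp ftp 2048 Nov 29 2009 b.csv",
  sep = "\n")

listing <- parse_ftp_listing(sample_listing)
has_a <- remote_file_exists(listing, "a.txt")
has_c <- remote_file_exists(listing, "c.txt")
```

Listing formats differ between servers, so the column positions would need adjusting for anything other than this assumed layout.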
Re: [R] how to know if a file exists on a remote server?
Study tryCatch(). Also, be aware that even with RCurl you may find the file there and then fail or lose the connection. Worse still, you may get a corrupt file on download. So there is a lot of checking to do to make bulletproof code that downloads files.

On Tue, Nov 30, 2010 at 3:16 PM, Baoqiang Cao bqcaom...@gmail.com wrote:

Hi Georg,

Your code does work, I mean, it doesn't give me any error message, which is critical for me because I need to use it in a loop and I don't know how to catch error messages. Before your message, I was using download.file, but the loop stopped because of the error when a file doesn't exist. So I guess the option method="wget" made the difference.

To summarize (in case it is useful to others), there are (at least) two ways to download files:

1) Georg Ruß:

v = download.file(url, destf, method="wget")
if(v != 0) {
  # download.file failed
}
# no error message though

2) Henrique Dallazuanna and Steven Mosher both suggested using RCurl. Here is example code from Henrique for checking if a file exists on a server:

library(RCurl)
h = basicHeaderGatherer()
Lines <- getURI("http://www.pdb.org/pdb/files/2J0S.1001", headerfunction = h$update)
h$value()[['status']]

If the status is 404, the file was not found. If it exists, the status should be 200.

What a productive day!

BC

On Tue, Nov 30, 2010 at 1:34 PM, Georg Ruß resea...@georgruss.de wrote:

On 30/11/10 10:10:07, Baoqiang Cao wrote:

I'd like to download some data files from a remote server, the problem here is that some of the files actually don't exist, which I don't know before try. Just wondering if a function in R could tell me if a file exists on a remote server?

Hi Baoqiang,

Try downloading the file with R's download.file() function. Then you should examine the returned value. Citing a part of ?download.file below:

Value: An (invisible) integer code, 0 for success and non-zero for failure. For the "wget" and "lynx" methods this is the status code returned by the external program. The "internal" method can return 1, but will in most cases throw an error.

So if you call your download via

v <- download.file(url, destfile, method="wget")

and v is not equal to zero, then the file is likely to be non-existent (at least the download failed). Note: the method "internal" doesn't really change the value of v, I just tried that. With "wget" it returns 0 for success and 2048 (or some other value) for non-success.

Regards,
Georg.
--
Research Assistant
Otto-von-Guericke-Universität Magdeburg
resea...@georgruss.de
http://research.georgruss.de
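The tryCatch() advice and the return-code check can be combined so that a missing file yields FALSE instead of stopping a loop. This is a minimal sketch under stated assumptions, not code from the thread: safe_download is a hypothetical helper, and the demonstration uses file:// URLs built from temporary paths on a Unix-like system so that it runs without a network.

```r
# Wrap download.file() so that an error, a warning, a non-zero status,
# or an empty result file all count as failure (FALSE).
safe_download <- function(url, destfile, ...) {
  ok <- tryCatch({
    status <- download.file(url, destfile, quiet = TRUE, ...)
    status == 0
  },
  warning = function(w) FALSE,
  error = function(e) FALSE)
  ok && file.exists(destfile) && file.info(destfile)$size > 0
}

# Offline demonstration: one source file that exists, one that does not.
src <- tempfile(fileext = ".txt")
writeLines("hello", src)
dest_ok <- tempfile()
dest_missing <- tempfile()
got_ok <- safe_download(paste0("file://", src), dest_ok)
got_missing <- safe_download(paste0("file://", src, ".does-not-exist"), dest_missing)
```

In a loop over many URLs, the logical result could be stored per file and the failures retried later. A size check alone does not detect a corrupt download, so a checksum comparison would still be needed where the server publishes one.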
[R] Summing over intervals
Given an M x N matrix, I want to take the means of rows in the following fashion:

m <- matrix(seq(1,80), ncol=20, nrow=4)
result <- matrix(NA, nrow=4, ncol=20/5)
result[,1] <- apply(m[,1:5], 1, mean)
result[,2] <- apply(m[,6:10], 1, mean)
result[,3] <- apply(m[,11:15], 1, mean)
result[,4] <- apply(m[,16:20], 1, mean)
result

     [,1] [,2] [,3] [,4]
[1,]    9   29   49   69
[2,]   10   30   50   70
[3,]   11   31   51   71
[4,]   12   32   52   72

So I want the mean of every successive 5 values in a row. Because the real matrix has many columns, I can't write it out with multiple apply calls as above.
Re: [R] Summing over intervals
Eik and Patrick, thanks. I will give those a try.

On Thu, Jul 15, 2010 at 8:15 AM, Patrick J Rogers pjrog...@ucsd.edu wrote:

Hi Steven,

You can just cut the matrix up into a 5-column matrix and use apply as normal:

m2 <- matrix(as.vector(t(m)), ncol=5, byrow=TRUE)
result <- matrix(apply(m2, 1, mean), ncol=ncol(m)/ncol(m2), byrow=TRUE)
result

--
Patrick Rogers
Dept. of Political Science
University of California, San Diego
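Another way to express the same block-wise row means is to view the matrix as a three-dimensional array. This is a sketch of an alternative, not a method proposed in the thread; block_means is a hypothetical name, and it assumes the number of columns is an exact multiple of the block width k.

```r
# Mean of every successive k columns, row by row.
# Reshape the M x N matrix into an M x k x (N/k) array: slice b holds
# columns (b-1)*k + 1 to b*k, so averaging over the second dimension
# gives one block mean per row and block.
block_means <- function(m, k) {
  stopifnot(ncol(m) %% k == 0)
  a <- array(m, dim = c(nrow(m), k, ncol(m) / k))
  apply(a, c(1, 3), mean)
}

m <- matrix(seq(1, 80), ncol = 20, nrow = 4)
result <- block_means(m, 5)
```

For this m the first row of result holds the means of m[1, 1:5], m[1, 6:10], and so on, the same values as the four apply calls in the question.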
[R] Summing by index
# build a sample data frame illustrating the problem
ids <- c(rep(1234,5), rep(5436,3), rep(7864,4))
years <- c(seq(1990,1994,by=1), seq(1991,1993,by=1), seq(1990,1993,by=1))
data <- seq(14,25,by=1)
data[6] <- NA
DF <- data.frame(Id=ids, Year=years, Data=data)
DF

     Id Year Data
1  1234 1990   14
2  1234 1991   15
3  1234 1992   16
4  1234 1993   17
5  1234 1994   18
6  5436 1991   NA
7  5436 1992   20
8  5436 1993   21
9  7864 1990   22
10 7864 1991   23
11 7864 1992   24
12 7864 1993   25

# The result wanted is a sum of DF$Data by DF$Id: collect the sum of $Data for each $Id.
# The result would take the form
# Id, sum for each Id

# Try using by()
result <- by(DF$Data, INDICES=Data$Id, FUN=sum, na.rm=T)
Error in names(IND) <- deparse(substitute(INDICES))[1L] :
  'names' attribute [1] must be the same length as the vector [0]

idx <- as.list(Data$Id)
idx2 <- list(1234,1234,1234,1234,1234,5436,5436,5436,7864,7864,7864,7864)
result <- by(DF$Data, INDICES=idx, FUN=sum, na.rm=T)
result
[1] 215
result <- by(DF$Data, INDICES=idx2, FUN=sum, na.rm=T)
Error in tapply(1L:12L, list(1234, 1234, 1234, 1234, 1234, 5436, 5436, :
  arguments must have same length
idx
list()
idx[1]
[[1]]
NULL
idx2
[[1]]
[1] 1234
[[2]]
[1] 1234
[[3]]
[1] 1234
[[4]]
[1] 1234
[[5]]
[1] 1234
[[6]]
[1] 5436
[[7]]
[1] 5436
[[8]]
[1] 5436
[[9]]
[1] 7864
[[10]]
[1] 7864
[[11]]
[1] 7864
[[12]]
[1] 7864

aggregate(DF$Data, by=idx2, sum, na.rm=T)
Error in aggregate.data.frame(as.data.frame(x), ...) :
  arguments must have same length

The instruction that the INDICES must have the same length is confusing me. The number of distinct indices will always be less than the number of rows because the indices are repeated; we want to sum over the multiple instances of each index to collect the sum by index. I'm confused.
Re: [R] Summing by index
Ha, that was a stupid mistake. Thanks.

On Fri, Jul 30, 2010 at 11:46 AM, David Winsemius dwinsem...@comcast.net wrote:

On Jul 30, 2010, at 2:41 PM, steven mosher wrote:

# Try using BY
result <- by(DF$Data, INDICES=Data$Id, FUN=sum, na.rm=T)

Try instead:

result <- by(DF$Data, INDICES=DF$Id, FUN=sum, na.rm=T)

--
David.

David Winsemius, MD
West Hartford, CT
Re: [R] Summing by index
Thanks again, David. To finish out the example:

DF

     Id Year Data
1  1234 1990   14
2  1234 1991   15
3  1234 1992   16
4  1234 1993   17
5  1234 1994   18
6  5436 1991   NA
7  5436 1992   20
8  5436 1993   21
9  7864 1990   22
10 7864 1991   23
11 7864 1992   24
12 7864 1993   25

result <- by(DF$Data, INDICES=DF$Id, FUN=sum, na.rm=T)
id <- as.numeric(unlist(names(result)))
sums <- unlist(result[])
DF2 <- data.frame(Id=id, Sums=sums)
DF2

    Id Sums
1 1234   80
2 5436   41
3 7864   94

Thanks again.
Re: [R] Summing by index
Very slick. Thanks.

On Fri, Jul 30, 2010 at 12:44 PM, Wu Gong w...@mtmail.mtsu.edu wrote:

Hi,

R has a built-in function, ?rowsum:

rowsum(DF$Data, DF$Id, na.rm=T)

-
A R learner.

--
View this message in context: http://r.789695.n4.nabble.com/Summing-by-index-tp2308332p2308411.html
Sent from the R help mailing list archive at Nabble.com.
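For completeness, the same sum by Id can be sketched with tapply() and with the formula interface of aggregate(). The point of confusion in the question is that the grouping vector must have one entry per row of the data (12 here), not one entry per group; R derives the groups from the repeated values. The variable name vals is used here instead of data purely for illustration.

```r
ids   <- c(rep(1234, 5), rep(5436, 3), rep(7864, 4))
years <- c(1990:1994, 1991:1993, 1990:1993)
vals  <- c(14:18, NA, 20:25)
DF <- data.frame(Id = ids, Year = years, Data = vals)

# tapply: the grouping vector DF$Id has the same length as DF$Data.
sums <- tapply(DF$Data, DF$Id, sum, na.rm = TRUE)
DF2 <- data.frame(Id = as.numeric(names(sums)), Sums = as.vector(sums))

# aggregate with a formula; rows with NA in Data are dropped by default.
agg <- aggregate(Data ~ Id, data = DF, FUN = sum)
```

Both DF2 and agg come back as ordinary data frames with one row per Id, which avoids unpacking the by object by hand.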
[R] Data frame reordering to time series
Given a data frame (or it could be a matrix if I choose). The data consists of an ID, a year, and data for all 12 months. Both missing values and missing years are an issue.

Id <- c(rep(67543,4), rep(12345,3), rep(89765,5))
Years <- c(seq(1989,1992,by=1), 1991, 1993, 1994, seq(1991,1995,by=1))
Values2 <- c(12,NA,34,21,NA,65,23,NA,13,NA,13,14)
Values <- c(12,14,34,21,54,65,23,12,13,13,13,14)
Data <- data.frame(Index=Id, Year=Years, Jan=Values, Feb=Values/2, Mar=Values2, Apr=Values2, Jun=Values, July=Values/3, Aug=Values2, Sep=Values,
                   Oct=Values, Nov=Values, Dec=Values2)
Data

   Index Year Jan  Feb Mar Apr Jun  July Aug Sep Oct Nov Dec
1  67543 1989  12  6.0  12  12  12  4.00  12  12  12  12  12
2  67543 1990  14  7.0  NA  NA  14  4.67  NA  14  14  14  NA
3  67543 1991  34 17.0  34  34  34 11.33  34  34  34  34  34
4  67543 1992  21 10.5  21  21  21  7.00  21  21  21  21  21
5  12345 1991  54 27.0  NA  NA  54 18.00  NA  54  54  54  NA
6  12345 1993  65 32.5  65  65  65 21.67  65  65  65  65  65
7  12345 1994  23 11.5  23  23  23  7.67  23  23  23  23  23
8  89765 1991  12  6.0  NA  NA  12  4.00  NA  12  12  12  NA
9  89765 1992  13  6.5  13  13  13  4.33  13  13  13  13  13
10 89765 1993  13  6.5  NA  NA  13  4.33  NA  13  13  13  NA
11 89765 1994  13  6.5  13  13  13  4.33  13  13  13  13  13
12 89765 1995  14  7.0  14  14  14  4.67  14  14  14  14  14

The goal is to return a time series object for each ID. Alternatively one could return a matrix that I can turn into a time series. The final structure would be something like this (done in matrix form for illustration):

        1989.0  1989.083 ...  1991 ...  1992 ...  1993 ...  1994 ...  1995
67543   12 6.0 12 12 12 4.00 12 12 12 12 12 ... 34 ... 21 ... NA ... NA NA
12345   NA, NA, NA, ... 54 27

Basically the time series will have patches at the front, middle, and end where you may have years of NA. They must be column-ordered by time and aligned so that averages for all series can be computed per month.

Now, I have looping code to do this, where I loop through all the IDs and map each row of data into the correct column, and create column names based on the data and row names based on the ID, but it's painfully slow.

Any wizardry would help.
Re: [R] Data frame reordering to time series
Thanks Gabor, I probably should have done an example with fewer columns. I will rework the example and post it so the next person who has this issue can have a clear example with a solution.

On Sat, Aug 7, 2010 at 5:04 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:

Your email came out a bit garbled so it's not clear what you want to get out, but this code will produce a multivariate ts series, i.e. an mts series, with one column for each series:

f <- function(x) ts(c(t(x[-(1:2)])), freq = 12, start = x$Year[1])
do.call(cbind, by(Data, Data$Index, f))
Re: [R] Data frame reordering to time series
Very slick. Gabor, this is a huge speed-up for me. Thanks. Ha, now I want to rewrite a bunch of working code.

Id <- c(rep(67543,4), rep(12345,3), rep(89765,5))
Years <- c(seq(1989,1992,by=1), 1991, 1993, 1994, seq(1991,1995,by=1))
Values2 <- c(12,NA,34,21,NA,65,23,NA,13,NA,13,14)
Values <- c(12,14,34,21,54,65,23,12,13,13,13,14)
Data <- data.frame(Index=Id, Year=Years, Jan=Values, Feb=Values/2, Mar=Values2, Apr=Values2, Jun=Values)
Data

   Index Year Jan  Feb Mar Apr Jun
1  67543 1989  12  6.0  12  12  12
2  67543 1990  14  7.0  NA  NA  14
3  67543 1991  34 17.0  34  34  34
4  67543 1992  21 10.5  21  21  21
5  12345 1991  54 27.0  NA  NA  54
6  12345 1993  65 32.5  65  65  65
7  12345 1994  23 11.5  23  23  23
8  89765 1991  12  6.0  NA  NA  12
9  89765 1992  13  6.5  13  13  13
10 89765 1993  13  6.5  NA  NA  13
11 89765 1994  13  6.5  13  13  13
12 89765 1995  14  7.0  14  14  14

# Gabor's solution
f <- function(x) ts(c(t(x[-(1:2)])), freq = 12, start = x$Year[1])
do.call(cbind, by(Data, Data$Index, f))

         12345 67543 89765
Jan 1989    NA  12.0    NA
Feb 1989    NA   6.0    NA
Mar 1989    NA  12.0    NA
Apr 1989    NA  12.0    NA
May 1989    NA  12.0    NA
Jun 1989    NA  14.0    NA
Jul 1989    NA   7.0    NA
Aug 1989    NA    NA    NA
Sep 1989    NA    NA    NA
Oct 1989    NA  14.0    NA
Nov 1989    NA  34.0    NA
Dec 1989    NA  17.0    NA
Jan 1990    NA  34.0    NA
Feb 1990    NA  34.0    NA
Mar 1990    NA  34.0    NA
Apr 1990    NA  21.0    NA
May 1990    NA  10.5    NA
Jun 1990    NA  21.0    NA
Jul 1990    NA  21.0    NA
Aug 1990    NA  21.0    NA
Sep 1990    NA    NA    NA
Oct 1990    NA    NA    NA
Nov 1990    NA    NA    NA
Dec 1990    NA    NA    NA
Jan 1991  54.0    NA  12.0
Feb 1991  27.0    NA   6.0
...
Re: [R] Data frame reordering to time series
In the real data the months are all complete, but the years can be missing. So years can be missing up front, in the middle, or at the end, but if a year is present then every month has a value or NA. To create a regular R ts I had to plow through the data frame, collect a year, and calculate an index to put it into the final time series. I had tried zoo and it handled the irregularly spaced data, but a large data structure of zoo objects had stumped me, especially since I need to do matching and selecting of the zoo objects. In the real data, there are about 7000 time series of 1500 months, and those 7000 get averaged and combined in different ways.

On Sat, Aug 7, 2010 at 8:45 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:

The original data had consecutive months in each series (actually there was a missing 1992 in one case but I assumed that was an inadvertent omission and the actual data was complete); however, here we have missing 6 month chunks in addition. That makes the series non-consecutive, so to solve that we could either apply this to the data (after putting the missing 1992 year back in):

Data <- cbind(Data, NA, NA, NA, NA, NA, NA)

or we could use a time series class that can handle irregularly spaced data:

library(zoo)
f <- function(x) {
  dat <- x[-(1:2)]
  tim <- as.yearmon(outer(x$Year, seq(0, length = ncol(dat))/12, "+"))
  zoo(c(as.matrix(dat)), tim)
}
do.call(cbind, by(Data, Data$Index, f))

The last line is unchanged from before. This code will also handle the original situation correctly even if the missing 1992 is truly missing.
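The case described for the real data (complete months, but whole years missing) can also be handled in base R by placing each present year into a full grid of years before building the ts. This is a minimal sketch under stated assumptions, not the solution given in the thread: to_monthly_ts and the toy data are hypothetical, and it assumes exactly 12 month columns follow the ID and year columns.

```r
# Build a regular monthly multivariate ts with one column per ID.
# Years absent for an ID stay NA in that ID's column.
to_monthly_ts <- function(df, id_col = "Index", year_col = "Year") {
  month_cols <- setdiff(names(df), c(id_col, year_col))
  stopifnot(length(month_cols) == 12)
  years <- seq(min(df[[year_col]]), max(df[[year_col]]))
  series <- lapply(split(df, df[[id_col]]), function(d) {
    full <- matrix(NA_real_, nrow = length(years), ncol = 12)
    # Put each present year into its row of the year-by-month grid.
    full[match(d[[year_col]], years), ] <- as.matrix(d[month_cols])
    c(t(full))  # unroll row by row: Jan..Dec of year 1, then year 2, ...
  })
  ts(do.call(cbind, series), start = c(min(years), 1), frequency = 12)
}

# Toy data: ID 1 has 1990 and 1992 (1991 missing), ID 2 has only 1991.
toy <- data.frame(Index = c(1, 1, 2), Year = c(1990, 1992, 1991))
toy <- cbind(toy, matrix(rep(c(10, 30, 20), 12), nrow = 3,
                         dimnames = list(NULL, month.abb)))
res <- to_monthly_ts(toy)
```

Because every ID shares the same time grid, per-month averages across series could then be taken with rowMeans(res, na.rm = TRUE).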
Re: [R] Data frame reordering to time series
OK, I'm a bit confused by what you mean by regularly spaced. After I do the do.call I do get a data structure with all the times present, and every time has an NA or a data value.

Steve

On Sun, Aug 8, 2010 at 2:46 AM, Gabor Grothendieck ggrothendi...@gmail.com wrote:

If there are missing years and you want to get a regularly spaced series out, then use the zoo version of f (rather than the ts version of f), and if this is the last statement (same as before but assigning it to the variable z):

z <- do.call(cbind, by(Data, Data$Index, f))

then to get a regularly spaced ts object just do this:

as.ts(z)

or

as.zooreg(as.ts(z))

to create a regularly spaced zooreg object.
Re: [R] Data frame reordering to time series
Thanks again. They worked for me as well. I did a simpler example with fewer years just to show that it worked (shortened here for display):

    f <- function(x) {
        dat <- x[-(1:2)]
        tim <- as.yearmon(outer(x$Year, seq(0, length = ncol(dat))/12, "+"))
        zoo(c(as.matrix(dat)), tim)
    }
    g <- do.call(cbind, by(Data, Data$Index, f))
    g
             X12345 X34567 X56789
    Jan 1989     NA      3      6
    Feb 1989     NA      3      6
    Mar 1989     NA      3      6
    Apr 1989     NA      3      6
    May 1989     NA      3      6
    Jun 1989     NA      3      6
    Jul 1989     NA      3      6
    Aug 1989     NA      3      6
    Sep 1989     NA      3      6
    Oct 1989     NA      3      6
    Nov 1989     NA      3      6
    Dec 1989     NA      3      6
    Jan 1990      2      4      6
    Feb 1990      2      4      6
    Mar 1990      2      4      6
    Apr 1990      2      4      6
    May 1990      2      4      6
    Jun 1990      2      4      6
    Jul 1990      2      4      6
    Aug 1990      2      4      6
    Sep 1990      2      4      6
    Oct 1990      2      4      6
    Nov 1990      2      4      6
    Dec 1990      2      4      6
    Jan 1991     NA      5     NA
    .

    z <- as.zooreg(as.ts(g))
    z
             X12345 X34567 X56789
    1989(1)      NA      3      6
    1989(2)      NA      3      6
    1989(3)      NA      3      6
    1989(4)      NA      3      6
    1989(5)      NA      3      6
    1989(6)      NA      3      6
    1989(7)      NA      3      6
    1989(8)      NA      3      6
    1989(9)      NA      3      6
    1989(10)     NA      3      6
    1989(11)     NA      3      6
    1989(12)     NA      3      6
    1990(1)       2      4      6
    1990(2)       2      4      6
    1990(3)       2      4      6
    1990(4)       2      4      6
    1990(5)       2      4      6
    1990(6)       2      4      6
    1990(7)       2      4      6
    1990(8)       2      4      6
    1990(9)       2      4      6
    1990(10)      2      4      6
    1990(11)      2      4      6
    1990(12)      2      4      6
    1991(1)      NA      5     NA
    1991(2)      NA      5     NA
    1991(3)      NA      5     NA
    1991(4)      NA      5     NA
    1991(5)      NA      5     NA
    1991(6)      NA      5     NA
    1991(7)      NA      5     NA
    1991(8)      NA      5     NA
    1991(9)      NA      5     NA
    1991(10)     NA      5     NA
    1991(11)     NA      5     NA
    1991(12)     NA      5     NA
    1992(1)       2     NA     NA
    1992(2)       2     NA     NA
    ***

The interesting thing is the change from months to the (1)...

On Sun, Aug 8, 2010 at 8:55 AM, Gabor Grothendieck ggrothendi...@gmail.com wrote:
> On Sun, Aug 8, 2010 at 11:21 AM, steven mosher mosherste...@gmail.com wrote:
>> Ok, I'm a bit confused by what you mean by regularly spaced.
>> After I do the do.call I do get a data structure with all the times
>> present, and every time has an NA or a data value.
>> Steve
>
> Regularly spaced means that every observation is one month later than
> the prior. If there are missing 6 month chunks or missing entire years
> then the observations are not regularly spaced, since there are some
> months not present.
>
> It works for me:
>
>     Id <- c(rep(67543,4), rep(12345,3), rep(89765,5))
>     Years <- c(seq(1989,1992,by=1), 1991, 1993, 1994, seq(1991,1995,by=1))
>     Values2 <- c(12,NA,34,21,NA,65,23,NA,13,NA,13,14)
>     Values <- c(12,14,34,21,54,65,23,12,13,13,13,14)
>     Data <- data.frame(Index=Id, Year=Years, Jan=Values, Feb=Values/2,
>                        Mar=Values2, Apr=Values2, Jun=Values, July=Values/3,
>                        Aug=Values2, Sep=Values, Oct=Values, Nov=Values,
>                        Dec=Values2)
>     library(zoo)
>     f <- function(x) {
>         dat <- x[-(1:2)]
>         tim <- as.yearmon(outer(x$Year, seq(0, length = ncol(dat))/12, "+"))
>         zoo(c(as.matrix(dat)), tim)
>     }
>     do.call(cbind, by(Data, Data$Index, f))
>              X12345 X67543 X89765
>     Jan 1989     NA  12.00     NA
>     Feb 1989     NA   6.00     NA
>     Mar 1989     NA  12.00     NA
>     Apr 1989     NA  12.00     NA
>     May 1989     NA  12.00     NA
>     Jun 1989     NA   4.00     NA
>     Jul 1989     NA  12.00     NA
>     Aug 1989     NA  12.00     NA
>     Sep 1989     NA  12.00     NA
>     Oct 1989     NA  12.00     NA
>     Nov 1989     NA  12.00     NA
>     Jan 1990     NA  14.00     NA
>     Feb 1990     NA   7.00     NA
>     Mar 1990     NA     NA     NA
>     Apr 1990     NA     NA     NA
>     May 1990     NA  14.00     NA
>     Jun 1990     NA   4.67     NA
>     Jul 1990     NA     NA     NA
>     Aug 1990     NA  14.00     NA
>     Sep 1990     NA  14.00     NA
>     Oct 1990     NA  14.00     NA
>     Nov 1990     NA     NA     NA
>     Jan 1991  54.00  34.00  12.00
>     Feb 1991  27.00  17.00   6.00
>     Mar 1991     NA  34.00     NA
>     Apr 1991     NA  34.00     NA
>     May 1991  54.00  34.00  12.00
>     Jun 1991  18.00  11.33   4.00
>     Jul 1991     NA  34.00     NA
>     Aug 1991  54.00  34.00  12.00
>     Sep 1991  54.00  34.00  12.00
>     Oct 1991  54.00  34.00  12.00
>     Nov 1991
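As a rough illustration of the "regularly spaced" point, here is a base R sketch (not code from the thread) that encodes each year and month as a running month count and checks that consecutive observations are exactly one month apart. The helper name is_regular_monthly is hypothetical.

```r
# Hypothetical helper: TRUE when every observation is exactly one month
# after the previous one.
is_regular_monthly <- function(year, month) {
  idx <- year * 12 + (month - 1)   # running month count
  all(diff(sort(idx)) == 1)
}

# Two full years with no gaps: regularly spaced
full <- expand.grid(month = 1:12, year = 1989:1990)
is_regular_monthly(full$year, full$month)

# 1990 missing entirely: a 12 month hole, so not regularly spaced
gappy <- expand.grid(month = 1:12, year = c(1989, 1991))
is_regular_monthly(gappy$year, gappy$month)
```

A series that fails this check would need the missing months filled with NA before it can be treated as regular.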
[R] nested 'by'
Assume a data frame or matrix with two columns representing the variables that you want to aggregate over. You want to calculate column means, by year, for each id.

    example <- data.frame(id=c(rep(12345,5),rep(54321,6),rep(45678,7)),
                          Year=rep(seq(1900,1902,by=1),6),
                          x=seq(1,18,by=1), y=seq(18,1,by=-1))
    example
          id Year  x  y
    1  12345 1900  1 18
    2  12345 1901  2 17
    3  12345 1902  3 16
    4  12345 1900  4 15
    5  12345 1901  5 14
    6  54321 1902  6 13
    7  54321 1900  7 12
    8  54321 1901  8 11
    9  54321 1902  9 10
    10 54321 1900 10  9
    11 54321 1901 11  8
    12 45678 1902 12  7
    13 45678 1900 13  6
    14 45678 1901 14  5
    15 45678 1902 15  4
    16 45678 1900 16  3
    17 45678 1901 17  2
    18 45678 1902 18  1

    result <- by(example[,3:4], example$id, by(example[,3:4], example$Year, colMeans, na.rm=T))
    Error in FUN(X[[1L]], ...) : could not find function "FUN"

The desired result should look like:

         id Year meanx meany
    1 12345 1900   ...   ...
    2 12345 1901   ...
    3 12345 1902   ...
    4 54321 1900
    5 54321 1901
    6 54321 1902
    7 45678 1900
    8 45678 1901
    9 45678 1902
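For what it's worth, the error most likely comes from the third argument: by() expects a function there, but the inner by() call is evaluated first and its result is passed instead. A minimal sketch of a nested version that does run, wrapping the inner call in an anonymous function (the name nested is just for illustration):

```r
example <- data.frame(
  id   = c(rep(12345, 5), rep(54321, 6), rep(45678, 7)),
  Year = rep(seq(1900, 1902, by = 1), 6),
  x    = seq(1, 18, by = 1),
  y    = seq(18, 1, by = -1)
)

# The anonymous function defers the inner by() until the rows for one id
# have been selected and passed in as d.
nested <- by(example, example$id, function(d) {
  by(d[, c("x", "y")], d$Year, colMeans, na.rm = TRUE)
})

# Column means of x and y for id 12345 in 1900
nested[["12345"]][["1900"]]
```

The result is a nested list-like object rather than the flat table asked for, so it would still need reshaping.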
Re: [R] nested 'by'
That works. Thanks.

On Mon, Aug 9, 2010 at 7:55 AM, Henrique Dallazuanna www...@gmail.com wrote:
> Try this:
>
>     aggregate(example[c('x', 'y')], example[c('id', 'Year')], 'mean')
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
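A sketch of the same aggregate() idea using the formula interface, sorted and renamed to match the layout asked for in the question. The column names meanx and meany are taken from the desired output, the rest is illustrative.

```r
example <- data.frame(
  id   = c(rep(12345, 5), rep(54321, 6), rep(45678, 7)),
  Year = rep(seq(1900, 1902, by = 1), 6),
  x    = seq(1, 18, by = 1),
  y    = seq(18, 1, by = -1)
)

# One row per (id, Year) combination, with the mean of x and of y
res <- aggregate(cbind(x, y) ~ id + Year, data = example, FUN = mean)

# Order by id then Year and rename the value columns
res <- res[order(res$id, res$Year), ]
names(res)[3:4] <- c("meanx", "meany")
rownames(res) <- NULL
res
```

Note that the formula interface drops rows with NA in x or y by default, which may or may not be what is wanted with real data.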
[R] sweep and zoo objects
    rc <- list(c(123,321,234,543,654,768,986,987,246,284),
               c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"))
    # the matrix has rownames that are used as identifiers and columns
    # of time. 1 year's worth of data. That's the native format
    test <- matrix(seq(1,120, by=1), nrow=10, dimnames=rc)
    test
        Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
    123   1  11  21  31  41  51  61  71  81  91 101 111
    321   2  12  22  32  42  52  62  72  82  92 102 112
    234   3  13  23  33  43  53  63  73  83  93 103 113
    543   4  14  24  34  44  54  64  74  84  94 104 114
    654   5  15  25  35  45  55  65  75  85  95 105 115
    768   6  16  26  36  46  56  66  76  86  96 106 116
    986   7  17  27  37  47  57  67  77  87  97 107 117
    987   8  18  28  38  48  58  68  78  88  98 108 118
    246   9  19  29  39  49  59  69  79  89  99 109 119
    284  10  20  30  40  50  60  70  80  90 100 110 120

    # The desired result would be a merged zoo object with the row names
    # used as the colnames of the multiple zoo series
    test2 <- matrix(test, nrow=12, byrow=F)
    g <- zoo(test2[,1], frequency=12)
    MYZOO <- merge(g, test2[,2:10])
    # the result MYZOO is a zoo object, but we've lost the row names in
    # the transformation of the matrix
    # So
    colnames(MYZOO) <- row.names(test)
    # fixes that problem. Is there a more elegant way to do this???

Now this zoo object needs to be swept out of a much longer zoo object with the same column names. sweep() normally works by sweeping out a vector from an array (by column or by row):

    sweep(x, MARGIN, STATS, FUN="-", check.margin=TRUE, ...)

So in my example x would be a long yearmon zoo object with the same column names as MYZOO above, but decades of data. MARGIN would be rows, and the STATS to sweep out would be the values in MYZOO.

    test3 <- matrix(seq(1,720, by=1), ncol=10)
    p <- zoo(test3[,1], freq=12)
    longzoo <- merge(p, test3[,2:10])
    colnames(longzoo) <- row.names(test)

What we want to do is sweep out MYZOO from longzoo. I could just repeat the data in MYZOO 6 times and then subtract MYZOO from longzoo, but that's a potential memory buster in this situation.
[R] Sweeping a zoo series
Given a long zoo matrix, the goal is to sweep out a statistic from the entire length of the sequences.

    longzoomatrix <- zoo(matrix(rnorm(720),ncol=6), as.yearmon(outer(1900,seq(0,length=120)/12,"+")))
    cnames <- c(12345,23456,34567,45678,56789,67890)
    colnames(longzoomatrix) <- cnames
    longzoomatrix[1:24,]
                   12345       23456       34567       45678       56789        67890
    Jan 1900 -0.17123165  1.02087086  0.79514870 -0.54519494 -0.13025459 -0.009980402
    Feb 1900  1.21729926 -0.74541038 -0.08138406 -2.01180775  0.19256998  0.551965871
    Mar 1900  1.13222481 -1.25315703  0.01013473  0.08366155 -0.84246010 -1.405959298
    Apr 1900 -0.02352559 -1.25001473 -1.53570550 -0.17945324  0.33368133  2.045125104
    May 1900  2.08204920  1.28091067 -0.80888146  0.31796730  0.83248551  1.439049603
    Jun 1900  0.62209570 -0.66189249 -0.57923119 -0.04346112 -2.71353384 -0.346826902
    Jul 1900 -1.39758918 -0.54525469 -0.05230070 -0.36725079  1.28281798  1.391174712
    Aug 1900  0.12594069  0.09303970  0.69916411 -1.01902352 -0.82720898 -0.208113626
    Sep 1900 -0.34310543  0.41718435  0.79455765  1.13234707  0.14652667 -0.551426097
    Oct 1900  1.70634123 -1.20073104 -1.08771551 -0.01715296  0.24931996 -0.753481196
    Nov 1900  0.15224070 -0.05108370 -0.97410069  0.51130170  0.13880814 -2.160811186
    Dec 1900  0.34726817  0.61830719  0.84429979 -0.26253635  0.95243068 -0.533562966
    Jan 1901  0.28647563 -0.40650198 -1.19640622  0.70267162  0.18867804  0.098855045
    Feb 1901  1.27269836  0.31797472 -1.13038040  1.33654480  0.08885501 -0.134690872
    Mar 1901 -1.36934330 -0.17244539  0.81705554 -0.09113888  0.90241413  0.473939164
    Apr 1901 -0.89768498  0.82497595  0.15684387  2.25294476 -1.72886103 -0.104769411
    May 1901 -0.27898445 -1.24348285  1.36203180  0.02422083 -1.33745980  1.098856752
    Jun 1901 -0.67968801  0.42082064  0.47056133 -0.12981223  0.19445803 -0.284638114
    Jul 1901  0.03791761 -0.22118130  1.96044737 -1.18280989  0.90075205  0.055720535
    Aug 1901  1.12904079  0.57177055  0.64300572 -0.16284983  0.07951656 -0.159396821
    Sep 1901 -1.43513934  0.03036697  1.09039400  0.99201776  0.98744827 -0.057234838
    Oct 1901  0.73828382  0.53967835  2.16608282 -0.82929778     -1.9987  0.352778450
    Nov 1901  0.06561583 -1.20126258  0.67427027  0.15493106  0.08867697  1.223073528
    Dec 1901 -1.23347027 -1.09699304  0.59398031 -0.22269292 -0.21569543  1.389667825

The statistic to be swept out is itself a zoo series with matching column names. There are twelve values for each column, representing a monthly average for that series. The average is to be subtracted.

    sweepzoo <- zoo(matrix(rnorm(72),ncol=6), frequency=12)
    colnames(sweepzoo) <- cnames
    sweepzoo
               12345      23456      34567      45678       56789      67890
    1(1)  -2.5569706 -0.4375741 -0.1803866 -0.6303760 -0.08995198  2.7293244
    2(1)   1.4154202  0.2559212  0.2104513  0.7439446  0.84897905 -0.4144865
    3(1)  -1.3709275  1.0472759  1.5975148  0.3190503  1.10430959 -1.8285194
    4(1)  -1.1436430  2.2071763 -0.2637954 -0.4915366 -0.03925020  1.3311624
    5(1)  -0.8003656  1.6421541 -1.4603128  0.4493069  0.28194066 -0.4728086
    6(1)   0.9236015  0.3780122 -1.3848196  0.4263684  0.99584590 -1.4536475
    7(1)   0.8810281  0.0381152  0.3810457 -0.6884233 -0.11018089  0.4221188
    8(1)   0.3819421 -0.8431364  1.9876901  0.7072257  0.45524929  2.7013515
    9(1)  -1.1247988  1.3083178 -0.3438442  0.3300832  0.67013503  1.2912443
    10(1) -0.3643043  1.0756782 -1.2026318  0.4477054  0.54486700 -0.3369889
    11(1)  0.8294049  1.8170357  0.5691249  1.9213791 -0.29295754 -0.2617228
    12(1) -1.0085265 -0.7556545 -1.4033321 -0.4646647 -0.14984913 -0.4848657

A brute force way to do this is to repeat the 12 values for each column so that the number of rows in sweepzoo is equal to the number of rows in the long zoo object, and then just subtract them:

    longzoomatrix - sweepzoo

As a function, sweep() won't work because it expects a vector whose dimensions match the dimension of the MARGIN. Is there an elegant way to do this, short of creating a sweep zoo that matches the row dimension of longzoo? (It would be a nice addition to sweep.)
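One way to avoid building the full repeated matrix is to loop over the twelve calendar months and call sweep() on just that month's rows. The sketch below uses plain matrices rather than zoo objects and assumes that row 1 is a January and that no months are missing; with a zoo object the month would have to be derived from the index instead.

```r
set.seed(1)
n_years <- 10
cnames  <- c("12345", "23456", "34567", "45678", "56789", "67890")
long <- matrix(rnorm(12 * n_years * 6), ncol = 6, dimnames = list(NULL, cnames))
clim <- matrix(rnorm(12 * 6), ncol = 6, dimnames = list(NULL, cnames))

# Assumption: row i falls in calendar month ((i - 1) %% 12) + 1
month <- (seq_len(nrow(long)) - 1L) %% 12L + 1L

anom <- long
for (m in 1:12) {
  rows <- which(month == m)
  # Subtract clim[m, j] from column j, for this month's rows only
  anom[rows, ] <- sweep(long[rows, , drop = FALSE], 2, clim[m, ], FUN = "-")
}
```

Each pass touches one twelfth of the rows, so the extra memory is small compared with expanding the 12 row statistic to the full length of the series.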
Re: [R] Sweeping a zoo series
The colMeans comes closest. For a single series, assume you have 100 years of monthly data. The mean you want to scale by is the mean for a restricted period in the center of the series, say 1950-1960. For this period you have the average Jan (1950-1960), average Feb, etc. Your final series would be

    Jan 1900 - average Jan (1950-60)
    Feb 1990 - average Feb
    Jan 2000 - average Jan (1950-60)

which gives you a scaling that is not relative to the mean of the whole, but relative to a base period which is selectable.

BTW, switching to zoo has greatly simplified the code.
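A minimal base R sketch of the base period idea for a single series, with made-up data: compute the twelve monthly means over 1950-1960 only, then subtract the matching mean from every observation. Variable names are illustrative.

```r
set.seed(42)
years  <- 1900:1999
year   <- rep(years, each = 12)
month  <- rep(1:12, times = length(years))
series <- rnorm(length(year))            # stand-in for 100 years of monthly data

in_base <- year >= 1950 & year <= 1960   # selectable base period, inclusive

# Twelve values: mean of all Januaries in the base period, all Februaries, ...
base_mean <- as.vector(tapply(series[in_base], month[in_base], mean))

# Each observation minus the base-period mean of its own calendar month
anomaly <- series - base_mean[month]
```

Changing the two years in in_base is all it takes to select a different base period.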
[R] Creating list from a long vector
Stupid question, but it's been a long night. If I have a long vector, how can I turn it into a list of the same length?

    x <- rep(seq(1,100,by=1), each=10)
Re: [R] Creating list from a long vector
Thx, I see my problem. More sleep required.

On Sat, Aug 14, 2010 at 9:25 AM, Romain Francois romain.franc...@dbmail.com wrote:
> On 14/08/10 18:22, steven mosher wrote:
>> [original question]
>
> Perhaps as.list ?
>
> --
> Romain Francois
> Professional R Enthusiast
> +33(0) 6 28 91 30 30
> http://romainfrancois.blog.free.fr
> |- http://bit.ly/bzoWrs : Rcpp svn revision 2000
> |- http://bit.ly/b8VNE2 : Rcpp at LondonR, oct 5th
> `- http://bit.ly/aAyra4 : highlight 0.2-2
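For the record, a short sketch of what as.list() does with the vector from the question. The split() call is an extra, in case the aim was one list element per distinct value rather than per vector element.

```r
x <- rep(seq(1, 100, by = 1), each = 10)   # 1000 values

# One list element per vector element, so the list has the same length as x
lst <- as.list(x)
length(lst) == length(x)

# Alternative: group the vector by its own values, one element per value
groups <- split(x, x)
length(groups)
```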
[R] Trouble loading saved Rdata
In the particular application I have, I save test.Rdata to a subdirectory:

    dir <- "Example"
    dir.create(dir)
    test <- data.frame(a=c(1,2,3),b=c(3,4,5)
    full <- file.path(dir,test.Rdata,fsep=.Platform$file.sep)
    save(test,file=full)
    load(full)

returns NULL.

It works fine when the object is saved to the working directory, but fails when saved to a subdirectory. The Rdata is there. Bytes are in it. But loading it doesn't work.
Re: [R] Trouble loading saved Rdata
The typos were just transcription errors. I'll report out the session info.

On Sat, Aug 14, 2010 at 5:35 PM, Joshua Wiley jwiley.ps...@gmail.com wrote:
> That worked for me once I properly quoted "test.RData" on
>
>     sessionInfo()
>     R version 2.11.1 (2010-05-31)
>     x86_64-pc-mingw32
>
>     locale:
>     [1] LC_COLLATE=English_United States.1252
>     [2] LC_CTYPE=English_United States.1252
>     [3] LC_MONETARY=English_United States.1252
>     [4] LC_NUMERIC=C
>     [5] LC_TIME=English_United States.1252
>
> If correcting the quoting does not help you, perhaps you can report the
> results of sessionInfo()
>
> Cheers,
> Josh
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> http://www.joshuawiley.com/
Re: [R] Trouble loading saved Rdata
Did you exit R and then return?

    fname <- "test.Rdata"
    full <- file.path("Example",fname,fsep=.Platform$file.sep)
    full
    [1] "Example/test.Rdata"
    load(full)
    test
    NULL
    sessionInfo()
    R version 2.11.1 (2010-05-31)
    x86_64-apple-darwin9.8.0

    locale:
    [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    loaded via a namespace (and not attached):
    [1] tools_2.11.1
Re: [R] Trouble loading saved Rdata
I think it came down to my actual program having a function that saved the objects it was passed with a .RData extension as opposed to .Rdata. Rechecking the whole thing.

On Sun, Aug 15, 2010 at 11:05 AM, Joshua Wiley jwiley.ps...@gmail.com wrote:
> Steven,
>
> I have exited my R session and restarted, and I can load the file
> without issue. I have also tried loading the saved data on some older
> versions of R (2.10.1 and 2.11.0) and Windows (XP).
>
> Have you tried recreating the test object, ensuring that it is not NULL
> itself, resaving it, and then seeing if loading it works better?
>
> Josh
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> http://www.joshuawiley.com/
Re: [R] Trouble loading saved Rdata
During my session I write several .Rdata objects to a variety of subdirectories, so replicating the exact problem wasn't very easy. In the actual program all the files get written, and all the files have sizes that fit the amount of data in them. It looks like the problem was naming the files .RData as opposed to .Rdata. Since there was one function that named all the files before saving, it kind of messed up the program and my ability to replicate the problem. Seems to be working now. Thanks.

On Sun, Aug 15, 2010 at 6:44 AM, David Winsemius dwinsem...@comcast.net wrote:
> On Aug 15, 2010, at 3:06 AM, steven mosher wrote:
>> Did you exit R and then return?
>
> I am unable to reproduce the problem (after correcting two different
> syntactic errors in the initial posting that should have thrown errors
> and prevented the creation of both "test" and "full"). I didn't exit my
> session and return, but I did remove the test object after saving it.
> My guess is that the test object was not correctly formed at the time
> it was saved.
>
>     test <- data.frame(a=c(1,2,3),b=c(3,4,5))
>     save(test,file=full)
>     test
>       a b
>     1 1 3
>     2 2 4
>     3 3 5
>     full
>     [1] "Example/test.Rdata"
>     rm(test)
>     load(file=full)
>     test
>       a b
>     1 1 3
>     2 2 4
>     3 3 5
>
> (I pretty much have the same setup as in your sessionInfo() output,
> running on MacOS 10.5.8.)
>
> --
> David Winsemius, MD
> West Hartford, CT
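To summarise the thread in runnable form, here is a small sketch of saving to a subdirectory and loading back. It writes under tempdir() rather than the working directory (an assumption made so the sketch cleans up after itself) and checks what load() reports.

```r
out_dir <- file.path(tempdir(), "Example")
dir.create(out_dir, showWarnings = FALSE)

test <- data.frame(a = c(1, 2, 3), b = c(3, 4, 5))
full <- file.path(out_dir, "test.Rdata")
save(test, file = full)

rm(test)
loaded <- load(full)   # load() returns the names of the objects it restored
loaded
test

# The extension is part of the file name: list.files() shows exactly what
# was written, which helps spot a .RData versus .Rdata mismatch.
list.files(out_dir)
```

If a helper function builds the file names, checking its output against list.files() is a quick way to see whether the name used for loading matches the name used for saving.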
[R] differencing a zoo series
A quick question.

    x <- as.yearmon(2000 + seq(0, 23)/12)
    x
     [1] "Jan 2000" "Feb 2000" "Mar 2000" "Apr 2000" "May 2000" "Jun 2000" "Jul 2000" "Aug 2000" "Sep 2000" "Oct 2000" "Nov 2000" "Dec 2000" "Jan 2001"
    [14] "Feb 2001" "Mar 2001" "Apr 2001" "May 2001" "Jun 2001" "Jul 2001" "Aug 2001" "Sep 2001" "Oct 2001" "Nov 2001" "Dec 2001"
    data <- seq(1,24,by=1)
    testzoo <- zoo(data, order.by=x)

The operation I want to perform on the zoo series is this. I will illustrate with a small example and formula. The coredata of the zoo series is (1,2,3,4,5,6,7,8). I want to calculate

    Result <- zoo[x] - zoo[x-1]
    (NA,1,1,1,1,1...NA)

The first element of course is undefined (NA). Is there any method to do this elegantly? Padding NAs at the start works, but it's ugly. If I get a simple function I can apply it to a matrix of zoo series.
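A possible sketch: in base R, diff() drops the first element, so the NA has to be added by hand. With zoo installed, diff() on a zoo object takes na.pad = TRUE, which keeps the original index and puts NA in the first position. The zoo part is guarded so the block still runs where zoo is not available.

```r
x <- seq(1, 24, by = 1)

# Base R: pad the front with NA to keep the original length
d <- c(NA, diff(x))

# zoo: na.pad = TRUE does the padding and keeps the yearmon index
if (requireNamespace("zoo", quietly = TRUE)) {
  tt <- zoo::as.yearmon(2000 + seq(0, 23) / 12)
  z  <- zoo::zoo(x, order.by = tt)
  dz <- diff(z, na.pad = TRUE)
}
```

For a zoo object with several columns, the same diff() call should difference each column separately.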
[R] merging two matrices
    j <- matrix(nrow=10,ncol=10)
    k <- matrix(seq(1:50), ncol=10)
    row.names(k) <- seq(2,10,by=2)
    j
          [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
     [1,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
     [2,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
     [3,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
     [4,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
     [5,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
     [6,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
     [7,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
     [8,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
     [9,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
    [10,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
    k
       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    2     1    6   11   16   21   26   31   36   41    46
    4     2    7   12   17   22   27   32   37   42    47
    6     3    8   13   18   23   28   33   38   43    48
    8     4    9   14   19   24   29   34   39   44    49
    10    5   10   15   20   25   30   35   40   45    50

Is there a simple way to merge j and k by the row.names in k, so that the row named '2' is placed in the 2nd row of j, and so forth through 4, 6, 8, 10?

The actual example has a sparse k, not evenly spaced, so this should also be mergeable:

    row.names(k) <- c(1,2,5,6,9)
    k
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    1    1    6   11   16   21   26   31   36   41    46
    2    2    7   12   17   22   27   32   37   42    47
    5    3    8   13   18   23   28   33   38   43    48
    6    4    9   14   19   24   29   34   39   44    49
    9    5   10   15   20   25   30   35   40   45    50
Re: [R] merging two matrices
Weird, I tried that but it didn't appear to work. Hmm. Thanks, I'll try it again.

On Sun, Sep 5, 2010 at 12:21 AM, bill.venab...@csiro.au wrote:
> Is this all you want?
>
>     j <- matrix(nrow=10,ncol=10)
>     k <- matrix(seq(1:50), ncol=10)
>     row.names(k) <- seq(2,10,by=2)
>     row.names(j) <- 1:10
>     j[row.names(k), ] <- k
>     j
>        [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
>     1    NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
>     2     1    6   11   16   21   26   31   36   41    46
>     3    NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
>     4     2    7   12   17   22   27   32   37   42    47
>     5    NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
>     6     3    8   13   18   23   28   33   38   43    48
>     7    NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
>     8     4    9   14   19   24   29   34   39   44    49
>     9    NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
>     10    5   10   15   20   25   30   35   40   45    50
>
> -----Original Message-----
> From: r-help-boun...@r-project.org On Behalf Of steven mosher
> Sent: Sunday, 5 September 2010 5:10 PM
> To: r-help
> Subject: [R] merging two maxtrices
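The same name-based assignment also covers the unevenly spaced case from the question, since character indexing matches on row names rather than position. A minimal sketch:

```r
j <- matrix(NA_integer_, nrow = 10, ncol = 10)
k <- matrix(1:50, ncol = 10)
rownames(j) <- 1:10
rownames(k) <- c(1, 2, 5, 6, 9)   # sparse, unevenly spaced target rows

# Rows of k land in the rows of j that carry the same name
j[rownames(k), ] <- k
j
```

Rows 3, 4, 7, 8 and 10 of j stay NA because k has no rows with those names.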
Re: [R] merging two maxtrices
Ya, unfortunately k was actually a data frame and not a matrix when it was returned from the function, which explains why I got unexpected results.
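For anyone finding this thread later, here is a minimal sketch of the row-name assignment Bill suggested, using the sparse row names from the original question. The as.matrix() step for the data frame case is my own assumption about a reasonable fix (it presumes every column of k is numeric); the names k_df and j2 are made up for the illustration.

```r
j <- matrix(NA_real_, nrow = 10, ncol = 10)
rownames(j) <- as.character(1:10)

k <- matrix(1:50, ncol = 10)
rownames(k) <- c(1, 2, 5, 6, 9)

# Character indexing matches on row names, so the spacing does not matter
j[rownames(k), ] <- k

# If k arrives as a data frame (as it did here), convert it first
k_df <- as.data.frame(k)
j2 <- matrix(NA_real_, nrow = 10, ncol = 10,
             dimnames = list(as.character(1:10), NULL))
j2[rownames(k_df), ] <- as.matrix(k_df)
```

Rows of j without a matching name in k simply stay NA.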
Re: [R] If then else with command for
Not sure how you wanted the sampling of x done:

    R <- 142
    color <- rep(0, 142)
    for (i in 1:R) {
      y <- sample(x, 142, replace=FALSE)
      onebit <- as.numeric(3471 %in% y)
      twobit <- as.numeric(6720 %in% y) * 2
      fourbit <- as.numeric(6263 %in% y) * 4
      colorbit <- onebit + twobit + fourbit + 1
      color[i] <- colorbit
    }

On Tue, Sep 14, 2010 at 10:29 AM, Mestat mes...@pop.com.br wrote:

Hey listers,

I am trying to do something simple. Check the program below. I would like to create a variable named COLOR according to the conditions that I established. But the problem is that my variable COLOR seems to be checking just one sample, maybe the last in the loop. Certainly, I am missing something.

Thanks in advance,
Marcio

    x <- c(288,139,196,159,536,134,623,517,96,467,277,155,386,241,422,6263,612,532,250,412,
           339,55,290,249,164,97,74,144,1277,240,163,63,488,111,128,230,720,179,37,24,
           65,37,89,187,60,939,1008,81,310,58,169,38,68,190,78,807,220,226,69,179129,
           119,73,59,92,127,104,75,505,183,49,41,76,113,90,79,408,140,200,284,103,
           58,654,118,431,192,233,102,97,56,69,73,86,53,105,81,77,472,129,194,299,
           81,122,113,186,91,145,133,114,78,78,72,70,3471,641,275,815,149,185,172,240,
           67,526,122,229,298,317,179,233,66,129,87,82,63,65,72,6720,381,240,118,396,
           66,35,43,166,216,53,82,90,62,77,207,68,52,277,396,220,751,146,95,37,
           35,39,46,59,44,105,87,66,62,175,252,128,330,57,83,208,74,63,109,37,
           105,38,82,76,63,86,603,209,100,121,191,130,63,128,90,79,50,1025,121,87,
           309,75,189,36,82,84,60,132,46,965,155,132,219,112,53,90,66,100,77,52,
           60,100,153,418,392,76,130,197,262,49,105,87,70,147,720,342,233,203,249,92,
           134,231,782,184,182,432,49,63,94,124,69,53,91,451,53,21,42,50,40,32,
           58,26,28,61,60,35,764,105,592,55,28,46,34,123,41,54,207,64,562,295,
           226,63,233)
    R <- 142
    color <- rep(0, 142)
    for (i in 1:R) {
      x <- sample(x, 142, replace=FALSE)
      if (!3471 %in% x & !6263 %in% x & !6720 %in% x) {color[i] <- 1}
      else if (3471 %in% x & !6263 %in% x & !6720 %in% x) {color[i] <- 2}
      else if (!3471 %in% x & 6263 %in% x & !6720 %in% x) {color[i] <- 3}
      else if (!3471 %in% x & !6263 %in% x & 6720 %in% x) {color[i] <- 4}
      else if (3471 %in% x & 6263 %in% x & !6720 %in% x) {color[i] <- 5}
      else if (3471 %in% x & !6263 %in% x & 6720 %in% x) {color[i] <- 6}
      else if (!3471 %in% x & 6263 %in% x & 6720 %in% x) {color[i] <- 7}
      else if (3471 %in% x & 6263 %in% x & 6720 %in% x) {color[i] <- 8}
      else {color[i] <- 0}
    }

--
View this message in context: http://r.789695.n4.nabble.com/If-then-else-with-command-for-tp2539341p2539341.html
Sent from the R help mailing list archive at Nabble.com.
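A small self-contained sketch of the bit-flag idea from the reply above, in case it helps. It draws each sample into a separate variable y so that x is not overwritten between passes (the original loop reassigns x every time, which may be part of why later iterations looked wrong). The short x below is made-up stand-in data, and color_code() is a hypothetical helper name. Note that this numbering (1 to 8) does not follow the same order as the if/else chain in the question.

```r
# Encode presence of the three values as bits, then shift to the range 1..8
color_code <- function(y) {
  1L + (3471 %in% y) * 1L + (6720 %in% y) * 2L + (6263 %in% y) * 4L
}

set.seed(1)  # arbitrary seed, only so the run is repeatable
# Made-up stand-in for the poster's data; it contains the three values of interest
x <- c(3471, 6263, 6720, 1:300)

R <- 142
color <- integer(R)
for (i in seq_len(R)) {
  y <- sample(x, 142, replace = FALSE)  # x itself stays intact for the next pass
  color[i] <- color_code(y)
}
table(color)
```

A lookup vector indexed by the code would map these values back onto the poster's own 1 to 8 ordering if that ordering matters.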
Re: [R] How to uncompress a gz file in R
Wonsang,

Just to be clear, R.utils is different from utils. As Henrik notes, gunzip has been in R.utils (see http://cran.r-project.org/) for some time. It works like a champ. R.utils is a great package.

On Wed, Sep 15, 2010 at 9:30 AM, Wonsang You y...@ifn-magdeburg.de wrote:

Dear Henrik,

Thank you so much for your kind help. Unfortunately, I could not find any function such as 'gunzip' in the R.utils package. Instead, I was successful using the following command:

    system("gunzip filename")

On the other hand, the function 'gzfile' supports compression in gz format, but I still do not know how to decompress a gz file using 'gzfile'.

Best Regards,
Wonsang

On 14 September 2010 15:23, Henrik Bengtsson h...@stat.berkeley.edu wrote:

To uncompress a *.gz file into another file on disk, see also ?gunzip in the R.utils package.

/Henrik

2010/9/14 Uwe Ligges lig...@statistik.tu-dortmund.de:

See ?gzfile

Uwe Ligges

On 14.09.2010 11:02, Wonsang You wrote:

Dear Fellows,

I would like to know how to uncompress a gz file at the R console. I could not find any help in the R-help archive. Thanks for your great help.

Best Regards,
Wonsang You

--
Wonsang You
Special Lab Non-Invasive Brain Imaging
Leibniz Institute for Neurobiology
http://www.ifn-magdeburg.de
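On the open question of decompressing with gzfile(): a minimal base-R sketch, assuming the compressed file holds plain text. Reading through a gzfile() connection decompresses on the fly, and writing the lines back out gives an uncompressed copy. The file names under tempdir() are made up for the example; for binary content something like readBin()/writeBin() would be needed instead, or simply gunzip() from R.utils as suggested above.

```r
# Create a small gzip file to work with (hypothetical file names in tempdir())
gz_path  <- file.path(tempdir(), "example.txt.gz")
out_path <- file.path(tempdir(), "example.txt")

con <- gzfile(gz_path, open = "wt")
writeLines(c("first line", "second line"), con)
close(con)

# Reading through gzfile() decompresses on the fly
con <- gzfile(gz_path, open = "rt")
txt <- readLines(con)
close(con)

# Writing the lines back out gives an uncompressed copy on disk
writeLines(txt, out_path)
```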
Re: [R] Get File Names in Folder, Read Files, Update, and Write
You are welcome.

Steve

On Wed, Sep 15, 2010 at 6:11 PM, Suphajak Ngamlak supha...@phatrasecurities.com wrote:

Thank you so much. It works well.

Best Regards,
Suphajak Ngamlak
Equity and Derivatives Trading
Phatra Securities Public Company Limited
Tel: +662-305-9179
Email: supha...@phatrasecurities.com

From: steven mosher [mailto:mosherste...@gmail.com]
Sent: Thursday, September 16, 2010 2:29 AM
To: Suphajak Ngamlak
Subject: Re: [R] Get File Names in Folder, Read Files, Update, and Write

    Import <- "C:/A0810.RSK"
    Table <- read.table(file=Import, sep=",", header=TRUE, na.strings="NA")
    Table$VALUE <- 0
    Export <- "C:/A_XVal0810.RSK"
    write.table(Table, file=Export, sep=",", col.names=TRUE)

As Uwe suggests, use list.files(), or you can use dir().

A robust way to do this would use the R.utils package (see CRAN), but this is only necessary if you want to move the folder and still have the code work. It makes the code work regardless of your working directory, assuming you know the name of the folder and it is unique.

    library(R.utils)

    insertString <- "_XVal"
    targetFolder <- "yourfoldername"
    folderPath <- getAbsolutePath(targetFolder)  # see R.utils
    outputFolder <- folderPath  # you could create a different output folder name

    # only grab the .RSK files using a regular expression
    fullFilenames <- list.files(path=folderPath, full.names=TRUE, pattern="\\.RSK$")
    # get only the file names for modification
    filenames <- list.files(path=folderPath, full.names=FALSE, pattern="\\.RSK$")

    for (filenumber in seq_along(fullFilenames)) {
      Table <- read.table(file=fullFilenames[filenumber], sep=",", header=TRUE, na.strings="NA")
      Table$VALUE <- 0
      # insert the tag after the first character of the file name
      outfileName <- paste(substr(filenames[filenumber], 1, 1), insertString,
                           substr(filenames[filenumber], 2, nchar(filenames[filenumber])), sep="")
      outFilePath <- file.path(outputFolder, outfileName, fsep=.Platform$file.sep)
      write.table(Table, file=outFilePath, sep=",", col.names=TRUE)
    }

On Wed, Sep 15, 2010 at 1:55 AM, Suphajak Ngamlak supha...@phatrasecurities.com wrote:

Dear All,

Could you please recommend how I can do this? I have several text files in one folder. Let's name them A0801.RSK, A0802.RSK, and so on. I would like R to

1) Know all file names in this folder
2) Update the value in one column of these files
3) Write the results to another text file with _xval in the file name

Below is the R code to read, update, and write one file:

    Import <- "C:/A0810.RSK"
    Table <- read.table(file=Import, sep=",", header=TRUE, na.strings="NA")
    Table$VALUE <- 0
    Export <- "C:/A_XVal0810.RSK"
    write.table(Table, file=Export, sep=",", col.names=TRUE)

Thank you

Best Regards,
Suphajak Ngamlak
Equity and Derivatives Trading
Phatra Securities Public Company Limited
Tel: +662-305-9179
Email: supha...@phatrasecurities.com
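For reference, a base-R sketch of the whole read/update/write loop that can be run as is. Everything about the setup is an assumption for illustration: the folders live under tempdir(), the two input files and their columns are made up, and I write with row.names = FALSE, which differs from the code in the thread.

```r
# Hypothetical folders under tempdir(), standing in for the real ones
in_dir  <- file.path(tempdir(), "rsk_in")
out_dir <- file.path(tempdir(), "rsk_out")
dir.create(in_dir,  showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)

# Two made-up input files with a VALUE column
for (nm in c("A0801.RSK", "A0802.RSK")) {
  write.table(data.frame(ID = 1:3, VALUE = c(5, 6, 7)),
              file = file.path(in_dir, nm), sep = ",",
              row.names = FALSE, col.names = TRUE)
}

files <- list.files(in_dir, pattern = "\\.RSK$")
for (f in files) {
  tab <- read.table(file.path(in_dir, f), sep = ",", header = TRUE,
                    na.strings = "NA")
  tab$VALUE <- 0
  # A0801.RSK -> A_XVal0801.RSK: insert the tag after the first character
  out_name <- sub("^(.)", "\\1_XVal", f)
  write.table(tab, file = file.path(out_dir, out_name), sep = ",",
              row.names = FALSE, col.names = TRUE)
}
```

Using sub() for the renaming is just a shorter alternative to the substr()/paste() combination above; both insert the tag after the first character.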
Re: [R] How to uncompress a gz file in R
You are welcome. Henrik's package is a great piece of work. It is worth the time to read through the whole thing and see how you can improve your programs by using its other features as well.

On Thu, Sep 16, 2010 at 2:16 AM, Wonsang You y...@ifn-magdeburg.de wrote:

Dear Henrik and Steven,

Thank you for your kind help and guidance even though it is a basic question. I had misunderstood and thought that gunzip was part of utils rather than R.utils. I found the function in R.utils, and was then able to decompress a gz file as follows:

    library(R.utils)
    gunzip("foo.gz")

Best Regards,
Wonsang
Re: [R] Substitute NAs by zero
    v <- c(1,2,3,4,5,6,7,8,97,6,5,4,NA,NA)
    b <- zoo(v)
    b
     1  2  3  4  5  6  7  8  9 10 11 12 13 14
     1  2  3  4  5  6  7  8 97  6  5  4 NA NA
    b[is.na(b)] <- 0
    b
     1  2  3  4  5  6  7  8  9 10 11 12 13 14
     1  2  3  4  5  6  7  8 97  6  5  4  0  0
    is.zoo(b)
    [1] TRUE

On Mon, Sep 20, 2010 at 2:37 AM, skan juanp...@gmail.com wrote:

Hello,

How can I substitute all NA values by zero in an R zoo series? I've been reading about na.locf and na.omit, but I think neither of them does what I need.

Thanks.

--
View this message in context: http://r.789695.n4.nabble.com/Substitute-NAs-by-zero-tp2546715p2546715.html
Sent from the R help mailing list archive at Nabble.com.
[R] diagnosing download.file() problems
I'm accessing around 95 tar files on an FTP server, ranging in size between 10 and 40 MB apiece. While I certainly can click on them and download them outside of R, I'd like to have my script do it. Retrieving the FTP directory with RCurl works fine (about 90% of the time), but downloading the files by looping through all of them is a random process. I may get 1-8 files downloaded and then it throws a "cannot open URL" error; sometimes I can only get 1 file before this error. With tryCatch() I've been able to do some cleanup after the crash, but automating this whole download process has turned into a bit of a circus.

The parameters (url, destfile, mode) are all correct in the download.file() call, as the second attempt at a URL will often succeed.

Is there any way to get a deeper look at the cause of the problem? I've tried closing all connections in between each download. Any pointers would be welcome.
Re: [R] diagnosing download.file() problems
That is what I feared. I know other people on slow connections have done this without issue (at least they didn't report an issue). I had a similar issue with geonames.org, who at least published their terms of service (requests per second or something like that, so I could program around it). I'll hunt around on their FTP site and then write the admin a note. I really don't want to brute-force the matter; it's government data made available to the public, so I expect the admin will be helpful.

Thanks for confirming what I suspected. For a minute [ed.: for two days] I thought I had taken crazy pills.

I did note, however, some odd behavior with tryCatch, where statements after the finally={} were executed. Not sure if that deserves a bug report.

On Tue, Sep 21, 2010 at 2:33 AM, Barry Rowlingson b.rowling...@lancaster.ac.uk wrote:

Sounds to me like the FTP server is operating some kind of rate limiting. Do you have access to the server log files, or the server administrator, or perhaps the server's terms and conditions to see if it's so? :)

Barry
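If rate limiting is indeed the cause, one way to be gentle with the server is to retry each file a few times with a growing pause. The sketch below is only an illustration under that assumption: fetch_with_retry() and flaky() are made-up names, the URL is a placeholder, and flaky() stands in for download.file() so the example runs without a network. In real use the fetch argument would be something like function(u, d) download.file(u, d, mode = "wb").

```r
# Retry a download-like function a few times, pausing between attempts.
# `fetch` is any function(url, destfile) that signals an error on failure.
fetch_with_retry <- function(url, destfile, fetch, tries = 5, wait = 2) {
  for (attempt in seq_len(tries)) {
    ok <- tryCatch({
      fetch(url, destfile)
      TRUE
    }, error = function(e) {
      message("attempt ", attempt, " failed: ", conditionMessage(e))
      FALSE
    })
    if (ok) return(invisible(attempt))
    Sys.sleep(wait * attempt)  # back off a little more each time
  }
  stop("giving up on ", url, " after ", tries, " attempts")
}

# Stand-in for a flaky server: fails twice, then succeeds
calls <- 0
flaky <- function(url, destfile) {
  calls <<- calls + 1
  if (calls < 3) stop("cannot open URL")
  writeLines("payload", destfile)
}

dest <- file.path(tempdir(), "file1.tar")
n <- fetch_with_retry("ftp://example.invalid/file1.tar", dest, flaky, wait = 0)
```

On the tryCatch() observation: as far as I understand it, once an error handler has dealt with the condition, execution carries on with whatever follows the tryCatch() call, so statements after it running is expected rather than a bug; finally= only guarantees that its own expression is evaluated.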
Re: [R] get absolute file path
The R.utils package has a function to get the absolute path.

On Sun, Sep 26, 2010 at 1:00 AM, Sebastian Gibb li...@sebastiangibb.de wrote:

Hello,

I get a value which stores a relative file name. (I get it from another function, which I don't want to change.) E.g.

    fileName <- "../data/2010-08.csv";

Is it possible to get the absolute file path out of this value? (e.g. /home/sebastian/documents/data/2010-08.csv)

Kind regards,
Sebastian
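Besides the R.utils route, base R's normalizePath() resolves a relative name against the current working directory. A minimal sketch follows; the data and scripts folders under tempdir() are made up so that the example is runnable anywhere, and the file name is the one from the question.

```r
# Build a small hypothetical layout: <tmp>/data/2010-08.csv and <tmp>/scripts
old <- setwd(tempdir())
dir.create("data", showWarnings = FALSE)
dir.create("scripts", showWarnings = FALSE)
file.create(file.path("data", "2010-08.csv"))

# Pretend the script runs from <tmp>/scripts and receives a relative name
setwd("scripts")
fileName <- "../data/2010-08.csv"
absName <- normalizePath(fileName, winslash = "/", mustWork = TRUE)

setwd(old)  # restore the original working directory
absName
```

With mustWork = TRUE the call fails loudly if the file does not exist, which is usually what one wants before handing the path to other code.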
Re: [R] Script auto-detecting its own path
In the R.utils package there is getAbsolutePath(), or you can do a

    list.files(..., full.names=TRUE, recursive=TRUE, pattern="\\.R$")

The rest will require grep and pulling the file name and directory path apart. If it's not evident, just ask and I'll write something for you. Basically you want a call that returns the full path of an R script?

On Mon, Oct 4, 2010 at 12:13 PM, Hadley Wickham had...@rice.edu wrote:

I'm not sure this will solve the issue, because if I move the script, I would still have to go into the script and edit the "/path/to/my/script.r", or do I misunderstand your workaround? I'm looking for something like:

    file.path.is.here("myscript.r")

which would return something like:

    [1] "c:/user/Desktop/"

so that regardless of where the script is, as long as the accompanying scripts are in the same directory, they can be easily sourced with something like:

    dirX <- file.path.is.here("MasterScript.r")
    source(paste(dirX, "AuxillaryFile.r", sep=""))

If you use relative paths like so:

    # master.r
    source("AuxillaryFile.r")

Then

    source("path/to/master.r", chdir = T)

will work. Mastering working directories is a much better idea than coming up with your own workarounds.

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
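To make Hadley's chdir suggestion concrete, here is a small sketch that can be run anywhere. The proj folder under tempdir() and the aux_value variable are made up for the illustration; the two script names are the ones used in the thread.

```r
# Hypothetical project folder with a master script and a helper next to it
proj <- file.path(tempdir(), "proj")
dir.create(proj, showWarnings = FALSE)
writeLines("aux_value <- 42", file.path(proj, "AuxillaryFile.r"))
writeLines('source("AuxillaryFile.r")', file.path(proj, "master.r"))

# chdir = TRUE switches to the script's folder while it runs, so the
# relative source() inside master.r resolves; the old directory is
# restored afterwards
before <- getwd()
source(file.path(proj, "master.r"), chdir = TRUE)
```

The folder can be moved anywhere and only the path in the outer source() call changes; nothing inside master.r needs editing.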
Re: [R] RE : R getting slower until it breaks...
I know it's no consolation, but I have a similar issue with R on a Mac, also plotting out large numbers of raster layers. Sometimes the problem lingers even after I clear the workspace, do gc(), etc. Almost as if R won't ask for processor resources. Weird.

On Wed, Oct 6, 2010 at 12:06 PM, Bastien Ferland-Raymond bastien.ferland-raymon...@ulaval.ca wrote:

Thanks a lot for your quick answer. Here are my answers to your questions:

Have you looked to see how fast your memory might be growing?

BFR- Yes I did, it's not too bad. It starts around 60 000 KB and rises up to 120 000 at the most, so not too scary.

Are you leaving around any large objects that should be removed?

BFR- I was careful to make sure the function doesn't create anything that would be visible with objects(). Could it be creating other types of (hidden) objects? Maybe, but I'm not very familiar with that stuff.

Have you looked to see if you are paging?

BFR- I just read the wiki about paging; I didn't know that term before. If I look at perfmon, it looks like it is keeping steady at 6000 pages/s with rare peaks as high as 900 000. Does that sound normal? How can it affect R?

Is it your CPU time that is increasing, or your wall clock time?

BFR- If I go to the task manager - performance, R is initially using around 40% of the processor (so around 80% of 1 core), but as (real) time passes it gets lower and lower, down to as low as 6% (12% of one core). I was surprised to see that, as usually my simulations in R use one whole core.

It sounds like there might be some memory leak that might be causing your process size to grow and possibly causing paging. You will need to gather some of the performance data that perfmon can provide and look at the memory usage, CPU time and I/O rates over time to see if there are any changes.

BFR- The term "memory leak" feels right for my problem. Are there ways I can control/detect/prevent this kind of problem in R? Also, how can I check the I/O? I never looked at that before.
Thanks again,
Bastien

On Wed, Oct 6, 2010 at 2:11 PM, Bastien Ferland-Raymond bastien.ferland-raymon...@ulaval.ca wrote:

Hello R-users,

I'm currently facing a pretty hard problem which I'm hoping you'll be able to help me with. I'm using R to create images. That alone is not the problem; the problem is that I'm using R to create 168 000 images.

My code (which is given below) uses different packages (raster and rgdal) to import an image (size 20 gig) and divide it into 168 000 pictures that are 100 pixels x 100 pixels. The code works fine for making the images, but if I ask it to run all 168 000, it always breaks around 15 000. It starts with the code being able to make around 2 pictures per second, but then it slows down, and after around 2000 pictures it's only 1 picture per second. Later on it's getting closer to 1 picture every 3 seconds, etc., until it crashes. I get no error message, only Windows telling me that R encountered a problem and must be closed.

Initially I thought it was a Windows problem, that I couldn't put too many files into a folder and it was slowing it down. Then I divided my batch process into smaller (5000 files) folders, but it didn't help; it still breaks at 15 000. I also tried to do gc() after each 5000 pictures to save memory, but it didn't help either. I removed every loop from the code because I thought that was the problem, but it just got faster at crashing. After the crash, I need to restart the computer if I want to go back to the initial speed.

I'm pretty much running out of options. Is there a limitation in R on the number of files it can create in one session? Is it a Windows problem? Is there a better way to clear the memory than gc()? Any thoughts on that?

I'm using R 2.11.1, Win XP, my hard drive is NTFS, computer: Intel Core 2 Duo E6750, 32 bit, with 2 gig of RAM.
Here is my code, but I doubt it would help much with my problem:

    # It is made of 4 functions (sorry, it's French):
    ##
    ### Set of functions to make the red and green NDVI images ###
    ##
    ## Bastien Ferland-Raymond, 5 Oct 2010
    ##
    ## Simply run the whole script

    ### Required libraries:
    library(raster)
    library(rgdal)
    library(shapefiles)

    ## Function 1 - NDVI from pixel coordinates and width
    calculate_NDVI <- function(Type, object, VALUE) {
      redorgreen <- ifelse(Type=="red", 2, 3)
      list1 <- unstack(object)
      rast1 <- list1[[1]]
      rast2 <- list1[[redorgreen]]
      NAvalue(rast1) <-
Re: [R] RE : R getting slower until it breaks...
Thanks, I haven't used valgrind in years; this should be fun.

Steve

On Wed, Oct 6, 2010 at 1:55 PM, Ben Bolker bbol...@gmail.com wrote:

steven mosher moshersteven at gmail.com writes:

I know its no consolation, but I have a similar issue with R on a MAC, also ploting out large numbers of raster layers. sometimes the problem lingers even after I clear the workspace, do gc() etc. Almost as if R wont ask for processor resources.

If it is a memory leak, it might be worth reading up on the use of valgrind (section 4.3.2 in the 'R extensions' manual). The information provided by valgrind might not be immediately interpretable, but it could help others track down a problem ...
[R] Loss of precision in read.csv.
Given a csv file from this location:

    Airports <- "http://www.ourairports.com/data/airports.csv"
    download.file(Airports, basename(Airports))
    airports <- read.csv("airports.csv", encoding="UTF-8")
    airports[1,]
        id ident     type              name latitude_deg longitude_deg elevation_ft continent iso_country iso_region municipality scheduled_service
    1 6523   00A heliport Total Rf Heliport      40.0708      -74.9336           11        NA          US      US-PA     Bensalem                no
      gps_code iata_code local_code home_link wikipedia_link keywords
    1      00A                  00A

And the precision is lost, which we can show by using readLines:

    fred <- readLines("airports.csv")
    fred[2]
    [1] "6523,\"00A\",\"heliport\",\"Total Rf Heliport\",40.07080078125,-74.9336013793945,11,\"NA\",\"US\",\"US-PA\",\"Bensalem\",\"no\",\"00A\",,\"00A\",,,"

I tried various approaches: using colClasses, switching to read.table, specifying dec=".". I tested read.csv and it does preserve precision on my test case, but not on this data.

Ideas?
Re: [R] Loss of precision in read.csv.
Ha! Thanks, that was it.

On Sat, Oct 9, 2010 at 2:38 PM, Joshua Wiley jwiley.ps...@gmail.com wrote:

Hi Steven,

As near as I can tell, no precision is lost. R is just being courteous and not excessively filling our consoles. Try:

    print(airports[1, "latitude_deg"], digits = 22)

which is the most digits R will print (although internally it can store more, I believe). Alternately, you can convert it to character class:

    as.character(airports[1, ])

So in short, this is just a cosmetic feature of presenting the data, not its actual storage.

Cheers,
Josh

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
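A tiny self-contained illustration of Josh's point, using the latitude value from the first row of the file so that no download is needed. The variable names are made up; the digits argument is what controls how much of the stored value is displayed.

```r
x <- 40.07080078125  # latitude of the first airport, as it appears in the csv

shown <- format(x, digits = 7)   # the default console precision
full  <- format(x, digits = 15)  # ask for more digits: nothing was lost
c(shown = shown, full = full)
```

Setting options(digits = 15) would change the default for the whole session, if wider printing is wanted everywhere.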
Re: [R] Read from a website
Hmm, RCurl might have something on this. Otherwise you can figure out their scheme and just construct the URL from scratch. When you finish filling in the form, look at the URL they construct. Do it a few times and you can just emulate that. I've done that in the past without problems. It depends on the site.

On Tue, Oct 12, 2010 at 2:32 AM, Santosh Srinivas santosh.srini...@gmail.com wrote:

Something similar to this was discussed recently, but I'm unable to find the thread. I want to read from a site where I need to enter the date into a form before I am presented with the CSV link. E.g. like reading ticker data from Yahoo (but assuming you HAVE to enter the dates and click on request). How do I simulate this from R?

Thanks for the help.
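As a rough sketch of the "construct the URL yourself" approach: the endpoint and the parameter names below are entirely hypothetical and would have to be copied from the address bar after submitting the real form by hand. This only applies when the form sends its fields in the URL (a GET request); a form that posts its fields would need something like RCurl's postForm() instead.

```r
# Hypothetical endpoint and parameter names, for illustration only
base_url <- "http://example.com/history.csv"
params <- c(symbol = "ABC", from = "2010-01-01", to = "2010-10-12")

# Encode each value, then join as name=value pairs separated by "&"
query <- paste(names(params),
               vapply(params, URLencode, character(1), reserved = TRUE),
               sep = "=", collapse = "&")
csv_url <- paste0(base_url, "?", query)
csv_url
# dat <- read.csv(csv_url)  # would fetch the data from a real site
```

Looping over a vector of dates then just means rebuilding params for each request.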
[R] Calculating a Maximum for a row or column with NA's
Is there a simple way to calculate the maximum for a row or column of a matrix when there are NAs present?

    # given a matrix that has any number of NA per row
    m <- matrix(c(seq(1,9)), nrow=3)
    m
         [,1] [,2] [,3]
    [1,]    1    4    7
    [2,]    2    5    8
    [3,]    3    6    9
    m[3,1] = NA
    m[1,] = NA
    m
         [,1] [,2] [,3]
    [1,]   NA   NA   NA
    [2,]    2    5    8
    [3,]   NA    6    9
    # applying max to rows doesn't work, as max returns
    # NA if any of the elements is NA.
    row_max <- apply(m,1,max)
    row_max
    [1] NA  8 NA
    # my desired result given m would be:
    # NA, 8, 9
Re: [R] Calculating a Maximum for a row or column with NA's
Ya, I got that result, but fixing it was a mystery, especially since I will eventually want to subtract the row min from the row max (or calculate the range).

If the matrix is:

         [,1] [,2] [,3]
    [1,]   NA   NA   NA
    [2,]    2    5    8
    [3,]   NA    6    9

and apply(m,1,max,na.rm=TRUE) yields

    [1] -Inf    8    9

then the row min yields

    [1] Inf   2   6

I need to see what happens if I subtract these two vectors.

         [,1] [,2] [,3]
    [1,]   NA   NA   NA
    [2,]    2    5    8
    [3,]   NA    6    9
    rmax <- apply(m,1,max,na.rm=TRUE)
    Warning message:
    In FUN(newX[, i], ...) : no non-missing arguments to max; returning -Inf
    rmax
    [1] -Inf    8    9
    rmin <- apply(m,1,min,na.rm=TRUE)
    Warning message:
    In FUN(newX[, i], ...) : no non-missing arguments to min; returning Inf
    rmin
    [1] Inf   2   6
    rmax-rmin
    [1] -Inf    6    3
    rrange <- rmax-rmin
    rrange
    [1] -Inf    6    3

The final matrix may have a large number of these -Inf. I was looking at the matrixStats package, but it is still beta.

On Sat, Apr 17, 2010 at 10:01 PM, David Winsemius dwinsem...@comcast.net wrote:

Not exactly your desired result, but surely you could fix that:

    row_max <- apply(m,1,max, na.rm=TRUE)
    Warning message:
    In FUN(newX[, i], ...) : no non-missing arguments to max; returning -Inf
    row_max
    [1] -Inf    8    9

David Winsemius, MD
West Hartford, CT
Re: [R] Calculating a Maximum for a row or column with NA's
Thanks, Jorge. I was playing around with ifelse to solve the problem, but was unaware of all(is.na(x)). RTFM, I guess.

On Sat, Apr 17, 2010 at 10:31 PM, Jorge Ivan Velez jorgeivanve...@gmail.com wrote:

Hi Steven,

Try this:

R> apply(m, 1, function(x) ifelse(all(is.na(x)), NA, max(x, na.rm = TRUE)))
[1] NA  8  9

See ?ifelse, ?all and ?max for more information.

HTH,
Jorge
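The all(is.na(x)) test can be carried over to the row minimum and the row range asked about earlier in the thread. The following is only a sketch under that assumption: the helper name safe_stat is hypothetical (it does not appear in the thread), and plain if/else is used in place of ifelse because the test is a single value.

```r
# Example matrix from the thread: row 1 is all NA, row 3 has one NA.
m <- matrix(1:9, nrow = 3)
m[3, 1] <- NA
m[1, ] <- NA

# Hypothetical helper (not from the thread): apply f with na.rm = TRUE,
# but return NA when the row has no non-missing values at all.
safe_stat <- function(x, f) {
  if (all(is.na(x))) NA_real_ else f(x, na.rm = TRUE)
}

rmax   <- apply(m, 1, function(x) safe_stat(x, max))
rmin   <- apply(m, 1, function(x) safe_stat(x, min))
rrange <- rmax - rmin   # NA for the all-NA row, no -Inf and no warnings
```

Because the all-NA rows never reach max() or min(), the "no non-missing arguments" warnings should not be triggered.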
Re: [R] Calculating a Maximum for a row or column with NA's
Henrik,

Thanks! I was just recommending the package to another fellow who is learning R, as I am. I was going crazy. Jorge gave me a solution that works; however, the data set I'm working with is huge, so I'm hoping that switching to your package will give both readability and performance improvements.

On Sun, Apr 18, 2010 at 2:47 AM, Henrik Bengtsson h...@stat.berkeley.edu wrote:

The matrixStats package is labelled beta because its author is *extremely* picky when it comes to bumping code up to be labelled a release; he often requires a code base to be stable for years before removing the beta label. I would give matrixStats' rowMaxs() a try.

/Henrik (author of matrixStats)
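As a rough illustration of the rowMaxs() suggestion, the sketch below uses matrixStats when it is installed and otherwise falls back to the base apply() call from earlier in the thread. The fallback branch and the final is.finite() clean-up step are assumptions added here, not part of Henrik's reply; the clean-up is there so that an all-NA row ends up as NA whatever the underlying call returns for it.

```r
# Numeric version of the example matrix from the thread.
m <- matrix(as.numeric(1:9), nrow = 3)
m[3, 1] <- NA
m[1, ] <- NA

row_max <- if (requireNamespace("matrixStats", quietly = TRUE)) {
  # rowMaxs() is the matrixStats function named in the thread.
  matrixStats::rowMaxs(m, na.rm = TRUE)
} else {
  # Assumed fallback: base R, which warns and returns -Inf for all-NA rows.
  suppressWarnings(apply(m, 1, max, na.rm = TRUE))
}

# Turn any non-finite result (for example -Inf) into NA.
row_max[!is.finite(row_max)] <- NA
```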
[R] Noobie question on aggregate tapply and by
I have a 43MB data frame (5 variables) and I'm trying to summarize subsets of the data. I've RTFM (not very clear) and looked at a variety of samples, but can't seem to figure out how to make these functions work. A sample of what I want to do would be this:

ids <- seq(1,50)
years <- c(rep(5,10),rep(6,10),rep(7,10),rep(8,20))
data <- c(rep(23.2,7),rep(14.2,17),rep(29.2,6),rep(13.4,10),rep(16.3,5), NA, rep(40,4))
data2 <- c(rep(22.2,5),rep(13.2,8),NA, rep(29.8,16),rep(12.4,10),rep(16.3,5), rep(38,5))
DF <- data.frame(ids,years,data,data2)

That will give you a data frame that is a good analog of what I have. I would like to calculate means (with NAs removed, na.rm) for each level of years:

     data  data2
5    xx.   yy.
6    xx    yz
7    ...   ,,,
8    ..    ...

And then things like this:

5-7 : xx yy
8   : xy zz
Re: [R] Noobie question on aggregate tapply and by
Thanks, I'll try that. I still need to understand how the other functions work, just to satisfy myself. Thanks again.

On Sun, Apr 25, 2010 at 12:06 AM, Tal Galili tal.gal...@gmail.com wrote:

Here is one solution for your question:

mean.data <- with(DF, tapply(data, years, mean, na.rm = T))
mean.data2 <- with(DF, tapply(data2, years, mean, na.rm = T))
cbind(mean.data, mean.data2)

Another option would be to read about the plyr package (which is better for this job, actually).

And regarding the years being recoded, look at either ?cut or ?recode (from the car package).

Best,
Tal
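Putting the two hints together (tapply for the per-year means, cut for the grouped years), a minimal base R sketch on the sample data from the question might look like the following. The column name period and the break points c(4, 7, 8) are assumptions chosen for this illustration.

```r
# Sample data from the original question.
ids   <- 1:50
years <- c(rep(5, 10), rep(6, 10), rep(7, 10), rep(8, 20))
data  <- c(rep(23.2, 7), rep(14.2, 17), rep(29.2, 6), rep(13.4, 10),
           rep(16.3, 5), NA, rep(40, 4))
data2 <- c(rep(22.2, 5), rep(13.2, 8), NA, rep(29.8, 16), rep(12.4, 10),
           rep(16.3, 5), rep(38, 5))
DF <- data.frame(ids, years, data, data2)

# Means per level of years, one tapply() call per column.
by_year <- with(DF, cbind(
  mean.data  = tapply(data,  years, mean, na.rm = TRUE),
  mean.data2 = tapply(data2, years, mean, na.rm = TRUE)
))

# Group years 5-7 together and keep 8 on its own.
# cut() builds the intervals (4,7] and (7,8]; the labels are assumptions.
DF$period <- cut(DF$years, breaks = c(4, 7, 8), labels = c("5-7", "8"))

by_period <- with(DF, cbind(
  mean.data  = tapply(data,  period, mean, na.rm = TRUE),
  mean.data2 = tapply(data2, period, mean, na.rm = TRUE)
))
```

Each tapply() call takes one vector and one grouping factor of the same length, which is why the two data columns are summarized separately and then bound together.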
Re: [R] Noobie question on aggregate tapply and by
Thanks, I was struggling with the DF[,3:4] part of it.

On Sun, Apr 25, 2010 at 10:47 AM, John Kane jrkrid...@yahoo.ca wrote:

Here's one way with aggregate():

library(car) # You will probably need to install it.
aggregate(DF[,3:4], by=list(years), mean, na.rm=TRUE)

recode(x, "c(1,2)='A'; else='B'")
DF$years <- recode(DF$years, "c(5,6,7)='5-7'")
DF

You may also want to have a look at the reshape and plyr packages.
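For readers who want to stay in base R, the same idea can be sketched without the car package. This is only an illustration: the grouping into "5-7" is done here with ifelse() and %in% rather than recode(), and the list names years and period are assumptions.

```r
# Sample data from the original question.
ids   <- 1:50
years <- c(rep(5, 10), rep(6, 10), rep(7, 10), rep(8, 20))
data  <- c(rep(23.2, 7), rep(14.2, 17), rep(29.2, 6), rep(13.4, 10),
           rep(16.3, 5), NA, rep(40, 4))
data2 <- c(rep(22.2, 5), rep(13.2, 8), NA, rep(29.8, 16), rep(12.4, 10),
           rep(16.3, 5), rep(38, 5))
DF <- data.frame(ids, years, data, data2)

# aggregate() accepts a data frame, so both columns are summarized at once.
# DF[, 3:4] selects the data and data2 columns.
agg_year <- aggregate(DF[, 3:4], by = list(years = DF$years),
                      FUN = mean, na.rm = TRUE)

# Base R stand-in for recode(): collapse years 5, 6 and 7 into one label.
period <- ifelse(DF$years %in% 5:7, "5-7", as.character(DF$years))
agg_period <- aggregate(DF[, 3:4], by = list(period = period),
                        FUN = mean, na.rm = TRUE)
```

Naming the elements of the by list gives the grouping column a readable name in the result instead of Group.1.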
[R] Tapply.
I'm having some difficulty understanding how tapply works and getting the return values I expect.

Data: a data frame DF with columns DF$Id, DF$D, DF$Year, ...

Id          D Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
11264402000 1 1980  NA  NA  NA  NA  NA 212 203 209 228 237  NA  NA
11264402000 0 1981  NA  NA 243 244  NA  NA  NA  NA 225  NA 231  NA
11264402000 1 1981  NA 251  NA 248 241  NA  NA  NA 235  NA  NA 245
11264402000 0 1982 236 237 242 240 242 205 199  NA  NA  NA  NA  NA
11264402000 1 1982 236  NA  NA 240 242  NA  NA  NA  NA  NA  NA  NA
11264402000 0 1983  NA 247  NA  NA  NA  NA  NA 205  NA  NA  NA  NA
11264402000 1 1983  NA 247  NA  NA  NA  NA  NA  NA  NA 225  NA  NA
11264402000 0 1986  NA  NA  NA 240  NA  NA  NA 213  NA  NA  NA  NA
11264402000 0 1987 241  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
11264402000 1 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
11264402000 3 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
11264402000 0 1988 238 246 249  NA 244 213 212 224 232 238 232 230
11264402000 1 1988 238 246 249 246 244 213 212 224 232  NA  NA 230
11264402000 3 1988 238 246 249 246 244 213 212 224 232  NA  NA 230
11264402000 0 1989 232 233 238 239 231  NA 215  NA  NA  NA  NA 238
11264402000 1 1989 232 233 238 239 231  NA  NA  NA  NA  NA  NA 238
11264402000 3 1989 232 233 238 239 231  NA  NA  NA  NA  NA  NA 238

The result should be a data frame of column means by year, with the variable D dropped (or kept, it doesn't matter):

11264402000 1    1980  NA  NA  NA  NA  NA 212 203 209 228 237  NA  NA
11264402000 .5   1981  NA  NA 243 244  NA  NA  NA  NA 225  NA 231  NA
11264402000 .5   1982 236 237 242 240 242 205 199  NA  NA  NA  NA  NA
11264402000 .5   1983  NA 247  NA  NA  NA  NA  NA 205  NA 225  NA  NA
11264402000 1    1986  NA  NA  NA 240  NA  NA  NA 213  NA  NA  NA  NA
11264402000 2    1987 241  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
11264402000 1.33 1988 238 246 249 246 244 213 212 224 232 238 232 230
11264402000 1.33 1989 232 233 238 239 231  NA 215  NA  NA  NA  NA 238

It would seem that tapply should work:

result <- tapply(DF[,1:15], DF$Year, colMeans, na.rm=T)

but I get errors about the length of arguments.
Re: [R] Tapply.
I've tried both mean and colMeans. I did succeed with one attempt using mean; however, if a year has only one value and it is NA, then I get NaN (which I can replace). I'll keep trying.

On Mon, Apr 26, 2010 at 12:26 AM, Petr PIKAL petr.pi...@precheza.cz wrote:

Hi

Why colMeans? It is a function used instead of apply(..., ..., mean). Maybe you want

result <- tapply(DF[,1:15], DF$Year, mean, na.rm=T)

Regards
Petr
Re: [R] Tapply.
That fails. The manual says:

tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)

Arguments
X       an atomic object, typically a vector.
INDEX   list of factors, each of same length as X. The elements are coerced to factors by as.factor.

My error says:

Error in tapply(DF[, 1:15], DF$Year, mean, na.rm = T) :
  arguments must have same length

The issue I have is that I don't understand what the requirements for the list of factors are. In my example DF$Year is a sequence of years (1979, 1980, 1982, 1983, 1987, ...) with some years missing. So when the manual says "list of factors, each of same length as X", what does that mean? I could have a DF with 20 rows and only two different years, or 20 rows and 20 different years. Suppose:

a <- c(1,2,3,4)
b <- c(2,3,4,5)
df = data.frame(a,b)
length(df)

The length of df is 2. Does that mean the "list of factors, each of same length as X" would have to be of length 2? That doesn't seem to make sense.

On Mon, Apr 26, 2010 at 12:26 AM, Petr PIKAL petr.pi...@precheza.cz wrote:

Maybe you want

result <- tapply(DF[,1:15], DF$Year, mean, na.rm=T)
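The length mismatch behind "arguments must have same length" can be reproduced on the small a/b example. The sketch below is an illustration only; the grouping vector g is hypothetical and stands in for DF$Year.

```r
df <- data.frame(a = c(1, 2, 3, 4), b = c(2, 3, 4, 5))
g  <- c("x", "x", "y", "y")   # hypothetical grouping vector, one entry per row

length(df)     # a data frame is a list of columns, so this counts columns
nrow(df)       # number of rows, which is what the grouping vector must match
length(df$a)   # a single column is a vector with one entry per row

# tapply() works on one vector at a time ...
one_col <- tapply(df$a, g, mean)

# ... so to cover several columns, call it once per column.
all_cols <- sapply(df, function(col) tapply(col, g, mean))
```

Here length(df) is 2 while g has 4 entries, so passing df itself as the first argument to tapply() fails the same-length check; passing df$a does not.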
Re: [R] Tapply.
Thanks, I was trying to stick with the base package and figure out how the base routines worked. I looked at plyr and it was very appealing. I guess I'll give in and use it.

On Mon, Apr 26, 2010 at 2:33 AM, Dennis Murphy djmu...@gmail.com wrote:

Hi:

Use of ddply() in the plyr package appears to work.

library(plyr)
ddply(df[, -1], .(Year), colwise(mean), na.rm = TRUE)

     D Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1 1.00 1980 NaN NaN NaN NaN NaN 212 203 209 228 237 NaN NaN
2 0.50 1981 NaN 251 243 246 241 NaN NaN NaN 230 NaN 231 245
3 0.50 1982 236 237 242 240 242 205 199 NaN NaN NaN NaN NaN
4 0.50 1983 NaN 247 NaN NaN NaN NaN NaN 205 NaN 225 NaN NaN
5 0.00 1986 NaN NaN NaN 240 NaN NaN NaN 213 NaN NaN NaN NaN
6 1.33 1987 241 NaN NaN NaN NaN 218 NaN NaN 235 243 240 NaN
7 1.33 1988 238 246 249 246 244 213 212 224 232 238 232 230
8 1.33 1989 232 233 238 239 231 NaN 215 NaN NaN NaN NaN 238

Replace the NaNs with NAs and that should do it.

HTH,
Dennis
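The NaN values appear because mean() with na.rm = TRUE is applied to groups in which every value is NA. One way to do the suggested replacement in base R is sketched below; the small data frame res is made up for the illustration and is not the thread's actual result.

```r
# mean() of nothing but NAs is NaN once the NAs are removed.
all_na_mean <- mean(c(NA_real_, NA_real_), na.rm = TRUE)

# Made-up summary table with NaN entries, standing in for the ddply() result.
res <- data.frame(
  Year = c(1980, 1982),
  Jan  = c(NaN, 236),
  Feb  = c(NaN, NaN)
)

# Replace NaN by NA column by column; res[] keeps the data frame structure.
res[] <- lapply(res, function(col) {
  col[is.nan(col)] <- NA
  col
})
```

Working column by column is used here because is.nan() is defined for vectors; this assumes all columns are numeric, as they are in the thread's table.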
Re: [R] Tapply.
I guess my problem was seeing a bunch of examples where they pulled a variable from a data frame, tapply(df$data, index=list(..., and I assumed that df$data was just generalizable to a collection of vectors, a vector of vectors being a vector. Thanks.

On Mon, Apr 26, 2010 at 2:43 AM, Petr PIKAL petr.pi...@precheza.cz wrote:

Hi

A data frame is neither a vector nor atomic but a list, hence length(df) gives you the number of columns. It is similar to the length of a list:

lll <- list(a=1, b=2, c=3)
length(lll)
[1] 3

If you accept that the first argument of tapply has to be a vector, you cannot put a data frame there. The second argument has to be a list of factors, so you can put several factors there, each of the same length as the first argument (a vector).

If you want to perform an aggregating operation on a whole data frame, you should consider ?by or ?aggregate. Other options are the plyr or doBy packages. The syntax for aggregate is quite similar to tapply, only the first argument can be a data frame.

Regards
Petr
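Since by() accepts a data frame as its first argument, the colMeans call that failed with tapply can be tried there instead. The sketch below uses a cut-down, made-up data frame with only two month columns; it is an illustration of the idea, not the thread's data.

```r
# Small stand-in for the thread's data: two years, two month columns.
DF <- data.frame(
  Year = c(1981, 1981, 1982, 1982),
  Jan  = c(NA, NA, 236, 236),
  Feb  = c(NA, 251, 237, NA)
)

# by() splits the data frame by Year and hands each piece to colMeans().
res <- by(DF[, c("Jan", "Feb")], DF$Year, colMeans, na.rm = TRUE)

# Stack the per-year vectors into one matrix, one row per year.
means <- do.call(rbind, res)
```

A year in which a month has no observations at all still comes out as NaN here, as in the other replies.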
Re: [R] Tapply.
Thanks, Dennis. Is there a book on R you could recommend?

On Mon, Apr 26, 2010 at 7:12 PM, Dennis Murphy djmu...@gmail.com wrote:

Hi:

If you want to use base functions, then here's a solution with aggregate (the Id column was removed first):

with(DF, aggregate(DF[, -2], list(Year = Year), FUN = mean, na.rm = TRUE))

  Year    D Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1 1980 1.00 NaN NaN NaN NaN NaN 212 203 209 228 237 NaN NaN
2 1981 0.50 NaN 251 243 246 241 NaN NaN NaN 230 NaN 231 245
3 1982 0.50 236 237 242 240 242 205 199 NaN NaN NaN NaN NaN
4 1983 0.50 NaN 247 NaN NaN NaN NaN NaN 205 NaN 225 NaN NaN
5 1986 0.00 NaN NaN NaN 240 NaN NaN NaN 213 NaN NaN NaN NaN
6 1987 1.33 241 NaN NaN NaN NaN 218 NaN NaN 235 243 240 NaN
7 1988 1.33 238 246 249 246 244 213 212 224 232 238 232 230
8 1989 1.33 232 233 238 239 231 NaN 215 NaN NaN NaN NaN 238

The problem with tapply() is that the function has to be called separately on each column you want to summarize. You could do it in a loop:

res <- matrix(NA, 8, 14)
res[, 1] <- unique(DF$Year)
res[, 2] <- with(DF, tapply(D, Year, mean, na.rm = TRUE))
for(j in 3:14) res[, j] <- tapply(DF[, j], DF$Year, mean, na.rm = TRUE)
res

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
[1,] 1980 1.00  NaN  NaN  NaN  NaN  NaN  212  203   209   228   237   NaN   NaN
[2,] 1981 0.50  NaN  251  243  246  241  NaN  NaN   NaN   230   NaN   231   245
[3,] 1982 0.50  236  237  242  240  242  205  199   NaN   NaN   NaN   NaN   NaN
[4,] 1983 0.50  NaN  247  NaN  NaN  NaN  NaN  NaN   205   NaN   225   NaN   NaN
[5,] 1986 0.00  NaN  NaN  NaN  240  NaN  NaN  NaN   213   NaN   NaN   NaN   NaN
[6,] 1987 1.33  241  NaN  NaN  NaN  NaN  218  NaN   NaN   235   243   240   NaN
[7,] 1988 1.33  238  246  249  246  244  213  212   224   232   238   232   230
[8,] 1989 1.33  232  233  238  239  231  NaN  215   NaN   NaN   NaN   NaN   238

but it's not the most efficient way to do things. Essentially, this approach conforms to the 'split-apply-combine' strategy, which is more efficiently implemented in functions like aggregate() or in packages such as doBy, plyr, reshape and data.table, some of which were mentioned earlier by Petr Pikal.

HTH,
Dennis
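The 'split-apply-combine' strategy mentioned above can be written out step by step in base R. The sketch below uses a made-up four-row data frame rather than the thread's data, and is meant only to show the three stages, not to be the most efficient form.

```r
# Small stand-in for the thread's data (Id column already removed).
DF <- data.frame(
  Year = c(1980, 1981, 1981, 1982),
  D    = c(1, 0, 1, 0),
  Jan  = c(NA, NA, NA, 236),
  Feb  = c(NA, NA, 251, 237)
)

# Split: one small data frame per year.
pieces <- split(DF[, c("D", "Jan", "Feb")], DF$Year)

# Apply: column means within each piece, ignoring NAs.
means <- lapply(pieces, colMeans, na.rm = TRUE)

# Combine: stack the per-year vectors into one matrix, rows named by year.
res <- do.call(rbind, means)
```

aggregate(), by() and the plyr functions wrap these same three stages in a single call.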
Re: [R] Tapply.
Thanks, I had been wondering what Drop did. That makes it more clear. While I have code that loops and does the problem correctly, I wanted to do things the R way and be fast and terse. hehe. So: ID dy jan ... 11264402000 1 1987 NA NA NA NA NA 218 NA NA 235 243 240 NA 11264402000 3 1987 NA NA NA NA NA 218 NA NA 235 243 240 NA in words : for each id, for each year return the max of jan,feb,.over d the min of jan, feb over d the mean of jan,feb.. over d the (max+min)/2 of jan, feb...over d the count of d for jan.feb.. the results of a function called with all elements of this id Anyway, your kind attention has been greatly appreciated. On Tue, Apr 27, 2010 at 2:40 AM, Petr PIKAL petr.pi...@precheza.cz wrote: Hi r-help-boun...@r-project.org napsal dne 26.04.2010 17:05:54: I guess my problem was seeing a bunch of examples where they pulled a variable from a dataframe.. tapply(df$data, index=list(.. df$data results in vector so as eg. df[,5] unless you use drop=FALSE option and I assumed that the df$data was just generalizable to a collection of vectors a vector of vector being a vector df[,1:15] is not a vector of vectors. R sometimes can give you nasty surprise with object types and modes but changing a type of object merely by selecting some part of it wold be quite problematic. see str(df$data) str(df[, 1]) str(df[,1, drop=FALSE]) str(df[,1:15]) Regards Petr Thanks. On Mon, Apr 26, 2010 at 2:43 AM, Petr PIKAL petr.pi...@precheza.cz wrote: Hi steven mosher mosherste...@gmail.com napsal dne 26.04.2010 10:21:37: That fails: The manual says: tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE) Arguments X an atomic object, typically a vector. INDEX list of factors, each of same length as X. The elements are coerced to factors by as.factor. my error says: Error in tapply(DF[, 1:15], DF$Year, mean, na.rm = T) : arguments must have same length The issue that I have is I dont understand what the requirements for the list of factors are. 
In my example DF$Years is a sequence of years..1979,1980,1982,1983, 1987.. like that with missing years: so when the manual say: list of factors each the same length as X? what does that mean? I could have a DF with 20 rows and only two different years. or 20 rows and 20 different years. Suppose: a- c(1,2,3,4) b-c(2,3,4,5) df=data.frame(a,b) length(df) data frame is not vector nor atomic but list hence length(df) gives you number of columns. It is similar to length of a list lll-list(a=1, b=2, c=3) length(lll) [1] 3 If you accept that the first argument of tapply has to be vector you can not put data frame there. Next second argument has to be list of factors so you can put there several factors, each of the same length as first argument (a vector). If you want to perform aggregating operation on whole data frame you shall consider ?by or ?aggregate Other options are plyr or doBy packages. Syntax for aggregate is quite similar to tapply, only first argument can be data frame. Regards Petr The length of DF is 2. Does that mean the list of factors, each of same length as X. would have to be 2? that doesnt seem to make sense. On Mon, Apr 26, 2010 at 12:26 AM, Petr PIKAL petr.pi...@precheza.cz wrote: Hi r-help-boun...@r-project.org napsal dne 26.04.2010 06:52:55: Having some difficulties with understanding how tapply works and getting return values I expect Data: dataframe. DF DF$Id $D $Year... 
Id D Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 11264402000 1 1980 NA NA NA NA NA 212 203 209 228 237 NA NA 11264402000 0 1981 NA NA 243 244 NA NA NA NA 225 NA 231 NA 11264402000 1 1981 NA 251 NA 248 241 NA NA NA 235 NA NA 245 11264402000 0 1982 236 237 242 240 242 205 199 NA NA NA NA NA 11264402000 1 1982 236 NA NA 240 242 NA NA NA NA NA NA NA 11264402000 0 1983 NA 247 NA NA NA NA NA 205 NA NA NA NA 11264402000 1 1983 NA 247 NA NA NA NA NA NA NA 225 NA NA 11264402000 0 1986 NA NA NA 240 NA NA NA 213 NA NA NA NA 11264402000 0 1987 241 NA NA NA NA 218 NA NA 235 243 240 NA 11264402000 1 1987 NA NA NA NA NA 218 NA NA 235 243 240 NA 11264402000 3 1987 NA NA NA NA NA 218 NA NA 235 243 240 NA 11264402000
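A small sketch of Petr's point, on a made-up data frame (the values are invented for illustration and are not the poster's data). It assumes the goal is a per-year mean of several month columns: tapply takes one vector at a time, while aggregate accepts the whole set of columns.

```r
# Toy data frame (hypothetical values, only the column names echo the thread).
DF <- data.frame(
  Year = c(1980, 1981, 1981, 1982),
  Jan  = c(NA, 236, 240, NA),
  Jun  = c(212, NA, 205, 199)
)

# tapply wants ONE vector plus a factor (or list of factors) of the same length.
jun_by_year <- tapply(DF$Jun, DF$Year, mean, na.rm = TRUE)

# A data frame is a list of columns, so length() counts columns, not rows.
n_cols <- length(DF)

# aggregate takes a data frame as its first argument and handles every column.
month_means <- aggregate(DF[, c("Jan", "Jun")],
                         by  = list(Year = DF$Year),
                         FUN = mean, na.rm = TRUE)
month_means
```

Groups whose values are all NA come back as NaN with na.rm = TRUE, which matches the NaN entries in the ddply output quoted earlier in the thread.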
Re: [R] closest match in R to c-like struct?
I was talking with another guy on the list about this very topic. A simple example would help: first a sample C struct, and then how one would do the equivalent in R. In the end I suppose one wants an 'array' of these structs, or a list of the structs.

On Sat, May 1, 2010 at 8:04 AM, Ted Harding ted.hard...@manchester.ac.uk wrote:

On 01-May-10 14:46:28, Giovanni Azua wrote:

Hello, What would be in R the closest match to a c-struct? e.g. data.frame requires all elements to be of the same length ... or is there a way to circumvent this? TIA, Best regards, Giovanni

Well, 'list' must be pretty close! The main difference would be that in C the structure type would be declared first, and then applied to create an object with that structure, whereas R lists are created straight off. If you want to set up a generic list type for a certain purpose, you would wrap its definition in a function. Another difference is that R lacks the pointer type, so that R's mylist$component is the equivalent of C's mylist.component; I don't think you can do the equivalent in R of C's mylist->component (though I'm likely to be wrong about that, and to be promptly corrected)!

Hoping this helps, Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 01-May-10 Time: 16:04:06 -- XFMail --
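A minimal sketch of what Ted describes, wrapping the list definition in a function so it plays the role of a struct declaration. The names make_station, id, name and temps are hypothetical and chosen only for illustration.

```r
# Hypothetical "constructor": stands in for a C struct declaration.
make_station <- function(id, name, temps) {
  list(id = id, name = name, temps = temps)
}

# Components may have different lengths, unlike the columns of a data.frame.
s1 <- make_station(1L, "alpha", c(21.5, 19.0))
s2 <- make_station(2L, "beta",  c(18.2, 17.1, 16.4))

s1$name                      # roughly mystruct.name in C

# An "array of structs" is simply a list of such lists.
stations <- list(s1, s2)
stations[[2]]$temps[3]       # third temperature of the second station
```

Nothing enforces the field names here; the function is only a convention that keeps every such list built the same way.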
Re: [R] closest match in R to c-like struct?
Ya, that's a common one. Also writing a struct to file and reading a struct from file. Mostly in R, if I have multiple returns, I'm just talking two or three values, so I return a results vector. But that's ugly and prone to very bad things down the road.

On Sat, May 1, 2010 at 9:58 AM, Giovanni Azua brave...@gmail.com wrote:

On May 1, 2010, at 6:48 PM, steven mosher wrote:

I was talking with another guy on the list about this very topic. A simple example would help: first a sample C struct, and then how one would do the equivalent in R. In the end I suppose one wants an 'array' of these structs, or a list of the structs.

Or like in my use-case ... I needed a c-like struct to define the type for aggregating the data to return from a function. Best regards, Giovanni
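On the "write a struct to file and read it back" point, one possible sketch, assuming a current version of R where saveRDS and readRDS are available in base. The list contents and the temporary file are made up for illustration.

```r
# A named list standing in for the "struct" returned from a function.
result <- list(myint = 6, mybool = FALSE, mymat = matrix(60, nrow = 3, ncol = 3))

# Serialize the whole object to a file and read it back.
path <- tempfile(fileext = ".rds")
saveRDS(result, path)
restored <- readRDS(path)

# The round trip preserves names, types and dimensions.
identical(result, restored)
```

Returning a named list like this, rather than an unnamed results vector, also avoids having to remember which position holds which value.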
Re: [R] closest match in R to c-like struct?
maybe I can illustrate the problem by showing how a c programmer might think about the problem and the kinds of mistakes 'we' ( I) make when trying to do this in R cstruct-function(int, bool){ + + myint- int*2; + + mybool-!bool; + myvec-rep(mybool,10) + mymat-matrix(myint*10,nrow=3,ncol=3) + myframe-data.frame(rep(myint,5),rep(bool,5)) + returnlist-list(myint,mybool,myvec,mymat,myframe) + return(returnlist) + + + + } # so I have a function that returns a list of hetergenous variables. # an int, a bool, a vector of bools, a matrix of ints, a dataframe of ints and bools test-cstruct(3,T) test [[1]] [1] 6 [[2]] [1] FALSE [[3]] [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE [[4]] [,1] [,2] [,3] [1,] 60 60 60 [2,] 60 60 60 [3,] 60 60 60 [[5]] rep.myint..5. rep.bool..5. 1 6 TRUE 2 6 TRUE 3 6 TRUE 4 6 TRUE 5 6 TRUE # Now I want to access the first element of my list which is an an int # first mistake I always make is I just revert to thinking in the # 'dot' structure of a c struct. test.myint Error: object 'test.myint' not found # Then I think its stored like a var in a dataframe, accessed by the $ test$myint NULL # then I try to access the first element of the list test[1] [[1]] [1] 6 # That works.. but the [[1]] confuses me when I eval test[1] I want 6 back # again thinking in C. # so I try the third element test[3] [[1]] [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE # ok I get my vect of bools back. Now I want the first element # of that thing # well test[3] is that thing.. and I want element 1 of test[3] test[3][1] [[1]] [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE #hmm thats not what I expect. I wanted F back. # frustrated I try this which i know is wrong test[3,1] Error in test[3, 1] : incorrect number of dimensions # crap.. maybe the $ is supposed to be used test$V3 NULL # arrg.. how about 'dot test.myvec Error: object 'test.myvec' not found Anyways, That's the kind of frustration. 
I have a list, third element is a matrix how do I referernce the 2 row 2 colum of the matrix in my list.. for example. and so forth.. On Sat, May 1, 2010 at 10:56 AM, Ted Harding ted.hard...@manchester.ac.ukwrote: On 01-May-10 16:58:49, Giovanni Azua wrote: On May 1, 2010, at 6:48 PM, steven mosher wrote: I was talking with another guy on the list about this very topic. A simple example would help. first a sample C struct, and then how one would do the equivalent in R. In the end i suppose one want to do a an 'array' of these structs, or list of the structs. Or like in my use-case ... I needed a c-like struct to define the type for aggregating the data to return from a function. Best regards, Giovanni Assuming that I understand what you want, this is straightforward and can be found throughout the many functions available in R. The general form is: myfunction - function(...){ code to compute objects A1, A2, ... , An list(valA1=A1, valA2=A2, ... , valAn=An) } and then a call like myresults - myfunction(...) will create a list myresults with compnents valA1, ... ,valAn which you can access as desired on the lines of myresults$valA5 As a simple example, the following is a function which explores by simulation the power of the Fisher Exact Test for comparing two proportions in a 2x2 table: power.fisher.test - function(p1,p2,n1,n2,alpha=0.05,nsim=100){ y1 - rbinom(nsim,size=n1,prob=p1) y2 - rbinom(nsim,size=n2,prob=p2) y - cbind(y1,n1-y1,y2,n2-y2) p.value - rep(0,nsim) for (i in 1:nsim) p.value[i] - fisher.test(matrix(y[i,],2,2))$p.value list(Pwr=mean(p.value alpha),SE.Pwr=sd(p.value alpha)/sqrt(nsim)) } So, given two binomials B(n1,p1) and B(n2,p2), what would be the power of the Fisher test to detect that p1 was different from p2, at given significance level alpha? 
This is investigated by repeating, nsim times: sample from Bin(n1,p1), sample from Bin(n2.p2) do a Fisher test and get its P-value; store it in a vector p.value of length nsim and then finally: estimate the power as the proportion Pwr of the nsim cases in which the P-value was less than alpha get the SE of this estimate return these two values as components Pwr and SE.Pwr of a list As it happens, here each component of the resulting list is of the same type (a single number); but in a different computation each component (and of course there could be more than two) could be anything -- even another list. So you can have lists of lists ... ! Thus, instead of the simple returned list above: list(Pwr=mean(p.value alpha), SE.Pwr=sd(p.value alpha)/sqrt(nsim)) you could have list(Binoms=list(Bin1=list(size=n1,prob=p1), Bin2=list(size=n2,prob=p2)) Pwr=mean(p.value alpha
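Ted's function lost its assignment arrows and comparison operators in this archive. Below is a sketch of how it presumably read, assuming the stripped operator in "p.value alpha" was "<" (his description says "less than alpha"). The call at the bottom uses made-up parameter values.

```r
# Simulated power of Fisher's exact test for comparing two proportions.
power.fisher.test <- function(p1, p2, n1, n2, alpha = 0.05, nsim = 100) {
  y1 <- rbinom(nsim, size = n1, prob = p1)   # successes in group 1
  y2 <- rbinom(nsim, size = n2, prob = p2)   # successes in group 2
  y  <- cbind(y1, n1 - y1, y2, n2 - y2)      # one 2x2 table per row
  p.value <- rep(0, nsim)
  for (i in 1:nsim) {
    p.value[i] <- fisher.test(matrix(y[i, ], 2, 2))$p.value
  }
  # Return a named list: the "struct" holding both results.
  list(Pwr    = mean(p.value < alpha),
       SE.Pwr = sd(p.value < alpha) / sqrt(nsim))
}

set.seed(1)                                  # arbitrary seed, for repeatability
res <- power.fisher.test(0.2, 0.6, n1 = 30, n2 = 30)
res$Pwr      # estimated power
res$SE.Pwr   # its standard error
```

The caller picks out each result by name with $, which is the point of the example.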
Re: [R] closest match in R to c-like struct?
perfect. I had tried a variant assigning names,to the vars, but that didnt work. now it makes sense why that didnt. I had tried myint-int names(myint)-myint and then returnlist-list(myint, .) and of course test[1] got me myint, 6 Thanks On Sat, May 1, 2010 at 12:42 PM, David Winsemius dwinsem...@comcast.netwrote: On May 1, 2010, at 3:14 PM, steven mosher wrote: maybe I can illustrate the problem by showing how a c programmer might think about the problem and the kinds of mistakes 'we' ( I) make when trying to do this in R cstruct-function(int, bool){ + + myint- int*2; + + mybool-!bool; + myvec-rep(mybool,10) + mymat-matrix(myint*10,nrow=3,ncol=3) + myframe-data.frame(rep(myint,5),rep(bool,5)) + returnlist-list(myint,mybool,myvec,mymat,myframe) + return(returnlist) + + + + } # so I have a function that returns a list of hetergenous variables. # an int, a bool, a vector of bools, a matrix of ints, a dataframe of ints and bools test-cstruct(3,T) test [[1]] [1] 6 [[2]] [1] FALSE [[3]] [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE [[4]] [,1] [,2] [,3] [1,] 60 60 60 [2,] 60 60 60 [3,] 60 60 60 [[5]] rep.myint..5. rep.bool..5. 1 6 TRUE 2 6 TRUE 3 6 TRUE 4 6 TRUE 5 6 TRUE # Now I want to access the first element of my list which is an an int # first mistake I always make is I just revert to thinking in the # 'dot' structure of a c struct. test.myint Error: object 'test.myint' not found There is no dot . accessor function. If the first element were named (which ist is not) then you could have used test$myint. If you wnated to access the elements of htat list with names you need to assing to names at the time it is created, eg.: returnlist-list(myint=myint, mybool=mybool, myvec-myvec, mymat=mymat, myframe=myframe) As it is you need to do this to get what you later indicate you want, an atomic object: test[[1]] Double-brackets yield the thing itself, whereas single brackets yield a sub-list. 
test[4] [[1]] [,1] [,2] [,3] [1,] 60 60 60 [2,] 60 60 60 [3,] 60 60 60 test[[4]] [,1] [,2] [,3] [1,] 60 60 60 [2,] 60 60 60 [3,] 60 60 60 class(test[4]) [1] list class(test[[4]]) [1] matrix # Then I think its stored like a var in a dataframe, accessed by the $ test$myint NULL # then I try to access the first element of the list test[1] [[1]] [1] 6 # That works.. but the [[1]] confuses me when I eval test[1] I want 6 back # again thinking in C. # so I try the third element test[3] [[1]] [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE # ok I get my vect of bools back. Now I want the first element # of that thing # well test[3] is that thing.. and I want element 1 of test[3] test[3][1] [[1]] [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE #hmm thats not what I expect. I wanted F back. # frustrated I try this which i know is wrong test[3,1] Error in test[3, 1] : incorrect number of dimensions # crap.. maybe the $ is supposed to be used test$V3 NULL # arrg.. how about 'dot test.myvec Error: object 'test.myvec' not found Anyways, That's the kind of frustration. I have a list, third element is a matrix how do I referernce the 2 row 2 colum of the matrix in my list.. for example. str(test) List of 5 $ : num 6 $ : logi FALSE $ : logi [1:10] FALSE FALSE FALSE FALSE FALSE FALSE ... $ : num [1:3, 1:3] 60 60 60 60 60 60 60 60 60 $ :'data.frame': 5 obs. of 2 variables: ..$ rep.myint..5.: num [1:5] 6 6 6 6 6 ..$ rep.bool..5. : logi [1:5] TRUE TRUE TRUE TRUE TRUE test[[4]][2,2] [1] 60 and so forth.. On Sat, May 1, 2010 at 10:56 AM, Ted Harding ted.hard...@manchester.ac.ukwrote: On 01-May-10 16:58:49, Giovanni Azua wrote: On May 1, 2010, at 6:48 PM, steven mosher wrote: I was talking with another guy on the list about this very topic. A simple example would help. first a sample C struct, and then how one would do the equivalent in R. In the end i suppose one want to do a an 'array' of these structs, or list of the structs. Or like in my use-case ... 
I needed a c-like struct to define the type for aggregating the data to return from a function. Best regards, Giovanni Assuming that I understand what you want, this is straightforward and can be found throughout the many functions available in R. The general form is: myfunction - function(...){ code to compute objects A1, A2, ... , An list(valA1=A1, valA2=A2, ... , valAn=An) } and then a call like myresults - myfunction(...) will create a list myresults with compnents valA1, ... ,valAn which you can access as desired on the lines of myresults$valA5 As a simple example, the following is a function which explores
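David's distinction between single and double brackets, as a short sketch on a cut-down version of the list from the thread (only three components, chosen for illustration):

```r
test <- list(myint = 6,
             myvec = rep(FALSE, 10),
             mymat = matrix(60, nrow = 3, ncol = 3))

sub_list <- test[3]      # single bracket: still a list, of length 1
the_mat  <- test[[3]]    # double bracket: the matrix itself

class(sub_list)          # "list"
is.matrix(the_mat)       # TRUE

test[[3]][2, 2]          # row 2, column 2 of the matrix, by position
test$mymat[2, 2]         # the same element, by name
test[[2]][1]             # first element of the logical vector
```

This is also why test[3][1] in the transcript returns the whole vector again: indexing a one-element list with [1] gives back that same one-element list.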
Re: [R] closest match in R to c-like struct?
> cstruct <- function(int, bool){
+ myint <- int*2;
+ mybool <- !bool;
+ myvec <- rep(mybool,10)
+ mymat <- matrix(myint*10,nrow=3,ncol=3)
+ myframe <- data.frame(int=rep(myint,5),bool=rep(bool,5))
+ returnlist <- list(myint=myint,mybool=mybool,myvec=myvec,mymat=mymat,myframe=myframe)
+ return(returnlist)
+ }
> test <- cstruct(3,T)
> test
$myint
[1] 6

$mybool
[1] FALSE

$myvec
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

$mymat
     [,1] [,2] [,3]
[1,]   60   60   60
[2,]   60   60   60
[3,]   60   60   60

$myframe
  int bool
1   6 TRUE
2   6 TRUE
3   6 TRUE
4   6 TRUE
5   6 TRUE

> test$myint
[1] 6
> test$mybool
[1] FALSE
> test$myvec
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> test$myvec[2]
[1] FALSE
> test$mymat
     [,1] [,2] [,3]
[1,]   60   60   60
[2,]   60   60   60
[3,]   60   60   60
> test$mymat[2,2]
[1] 60
> test$mymat[,2]
[1] 60 60 60
> test$myframe
  int bool
1   6 TRUE
2   6 TRUE
3   6 TRUE
4   6 TRUE
5   6 TRUE
> test$myframe$int
[1] 6 6 6 6 6
> test$myframe$bool
[1] TRUE TRUE TRUE TRUE TRUE
> test$myframe$int[2]
[1] 6
> test$myframe$bool[3]
[1] TRUE
> listoftest <- list(cstruct(3,T),cstruct(4,F),cstruct(5,T))
> listoftest[1]
[[1]]
[[1]]$myint
[1] 6

[[1]]$mybool
[1] FALSE

[[1]]$myvec
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

[[1]]$mymat
     [,1] [,2] [,3]
[1,]   60   60   60
[2,]   60   60   60
[3,]   60   60   60

[[1]]$myframe
  int bool
1   6 TRUE
2   6 TRUE
3   6 TRUE
4   6 TRUE
5   6 TRUE

> listoftest[1]$myframe$int[3]
NULL
> listoftest[1]$myframe$int
NULL
> listoftest[1]$myframe
NULL
> listoftest[[1]]$myframe
  int bool
1   6 TRUE
2   6 TRUE
3   6 TRUE
4   6 TRUE
5   6 TRUE
> listoftest[[1]]$myframe$int[1]
[1] 6
> listoftest2 <- list(flist=cstruct(65,T),slist=cstruct(12,F))
> listoftest2$flist$myframe$int[3]
[1] 130

TADA!
On Sat, May 1, 2010 at 12:42 PM, David Winsemius dwinsem...@comcast.netwrote: On May 1, 2010, at 3:14 PM, steven mosher wrote: maybe I can illustrate the problem by showing how a c programmer might think about the problem and the kinds of mistakes 'we' ( I) make when trying to do this in R cstruct-function(int, bool){ + + myint- int*2; + + mybool-!bool; + myvec-rep(mybool,10) + mymat-matrix(myint*10,nrow=3,ncol=3) + myframe-data.frame(rep(myint,5),rep(bool,5)) + returnlist-list(myint,mybool,myvec,mymat,myframe) + return(returnlist) + + + + } # so I have a function that returns a list of hetergenous variables. # an int, a bool, a vector of bools, a matrix of ints, a dataframe of ints and bools test-cstruct(3,T) test [[1]] [1] 6 [[2]] [1] FALSE [[3]] [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE [[4]] [,1] [,2] [,3] [1,] 60 60 60 [2,] 60 60 60 [3,] 60 60 60 [[5]] rep.myint..5. rep.bool..5. 1 6 TRUE 2 6 TRUE 3 6 TRUE 4 6 TRUE 5 6 TRUE # Now I want to access the first element of my list which is an an int # first mistake I always make is I just revert to thinking in the # 'dot' structure of a c struct. test.myint Error: object 'test.myint' not found There is no dot . accessor function. If the first element were named (which ist is not) then you could have used test$myint. If you wnated to access the elements of htat list with names you need to assing to names at the time it is created, eg.: returnlist-list(myint=myint, mybool=mybool, myvec-myvec, mymat=mymat, myframe=myframe) As it is you need to do this to get what you later indicate you want, an atomic object: test[[1]] Double-brackets yield the thing itself, whereas single brackets yield a sub-list. 
test[4] [[1]] [,1] [,2] [,3] [1,] 60 60 60 [2,] 60 60 60 [3,] 60 60 60 test[[4]] [,1] [,2] [,3] [1,] 60 60 60 [2,] 60 60 60 [3,] 60 60 60 class(test[4]) [1] list class(test[[4]]) [1] matrix # Then I think its stored like a var in a dataframe, accessed by the $ test$myint NULL # then I try to access the first element of the list test[1] [[1]] [1] 6 # That works.. but the [[1]] confuses me when I eval test[1] I want 6 back # again thinking in C. # so I try the third element test[3] [[1]] [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE # ok I get my vect of bools back. Now I want the first element # of that thing # well test[3] is that thing.. and I want element 1 of test[3] test[3][1] [[1]] [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE #hmm thats not what I expect. I wanted F back. # frustrated I try this which i know is wrong test[3,1] Error in test[3, 1] : incorrect number of dimensions # crap.. maybe the $ is supposed to be used test$V3 NULL # arrg.. how about 'dot test.myvec Error: object 'test.myvec' not found Anyways, That's the kind of frustration. I have a list, third element is a matrix how
Re: [R] closest match in R to c-like struct?
thanks ted.. being new the R thsi has been a huge help, espececially on the Myint=myint thing... I assummed the name was just implicit. On Sat, May 1, 2010 at 1:19 PM, Ted Harding ted.hard...@manchester.ac.ukwrote: See below. On 01-May-10 19:14:08, steven mosher wrote: maybe I can illustrate the problem by showing how a c programmer might think about the problem and the kinds of mistakes 'we' (I) make when trying to do this in R cstruct-function(int, bool){ + + myint- int*2; + + mybool-!bool; + myvec-rep(mybool,10) + mymat-matrix(myint*10,nrow=3,ncol=3) + myframe-data.frame(rep(myint,5),rep(bool,5)) + returnlist-list(myint,mybool,myvec,mymat,myframe) + return(returnlist) + + + + } # so I have a function that returns a list of hetergenous variables. # an int, a bool, a vector of bools, a matrix of ints, a dataframe of # ints and bools test-cstruct(3,T) test [[1]] [1] 6 [[2]] [1] FALSE [[3]] [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE [[4]] [,1] [,2] [,3] [1,] 60 60 60 [2,] 60 60 60 [3,] 60 60 60 [[5]] rep.myint..5. rep.bool..5. 1 6 TRUE 2 6 TRUE 3 6 TRUE 4 6 TRUE 5 6 TRUE # Now I want to access the first element of my list which is an # an int # first mistake I always make is I just revert to thinking in the # 'dot' structure of a c struct. test.myint Error: object 'test.myint' not found # Then I think its stored like a var in a dataframe, accessed by # the $ test$myint NULL # then I try to access the first element of the list test[1] [[1]] [1] 6 # That works.. but the [[1]] confuses me when I eval test[1] I want 6 # back # again thinking in C. # so I try the third element test[3] [[1]] [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE # ok I get my vect of bools back. Now I want the first element # of that thing # well test[3] is that thing.. and I want element 1 of test[3] test[3][1] [[1]] [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE #hmm thats not what I expect. I wanted F back. 
# frustrated I try this which i know is wrong
> test[3,1]
Error in test[3, 1] : incorrect number of dimensions
# crap.. maybe the $ is supposed to be used
> test$V3
NULL
# arrg.. how about 'dot'
> test.myvec
Error: object 'test.myvec' not found

Anyways, that's the kind of frustration. I have a list, third element is a matrix; how do I reference the 2nd row, 2nd column of the matrix in my list, for example? And so forth..

When you constructed your return-list, you simply entered the list components using the R-names of the objects, as used in the code:

returnlist <- list(myint,mybool,myvec,mymat,myframe)

To use the $ extractor, you need to give them external names, so you could modify the above to:

returnlist <- list(Myint=myint,Mybool=mybool, Myvec=myvec,Mymat=mymat,Myframe=myframe)

Then, after

test <- cstruct(3,T)

you can access test$Myint, test$Mybool, etc.; and, in particular, test$Mymat will be the matrix mymat you put in there, so you can extract elements of this using test$Mymat[2,2] for the element in row 2, column 2, and so on.

Without making the return-list a named list, its components have no names, so then test$mymat (as you did) will not work because there is no component with name mymat (there is no component with any name). The name mymat was used by R to identify the object whose contents were to be placed in the list; that internal object-name does not get placed in the list.

Note: In my modification above I used Myint=myint etc. instead of myint=myint to highlight the distinction between the component-name and the object-name. But you can just as well use exactly the same name for component-name as for object-name: R will recognise them as distinct and do the right thing. So you could just as well do:

returnlist <- list(myint=myint,mybool=mybool, myvec=myvec,mymat=mymat,myframe=myframe)

and then, after

test <- cstruct(3,T)

do

test$mymat[2,2]

You can also use positional references if the list components have no names.
Since your mymat is in position 4, test[[4]] would return the whole matrix. Then test[[4]][2,2] would return the element in row 2, column 2. As a standard example, try for instance X - 0.1*((-10):10) Y - 0.5*X + 0.2*rnorm(length(X)) LM - lm(Y ~ X) summary(LM) # Call: # lm(formula = Y ~ X) # Residuals: # Min1QMedian3Q Max # -0.373283 -0.083458 0.009206 0.139763 0.278242 # Coefficients: # Estimate Std. Error t value Pr(|t|) # (Intercept) -0.060180.04448 -1.3530.192 # X0.462700.07345 6.299 4.78e-06 *** # --- # Signif. codes: 0 ?***? 0.001 ?**? 0.01
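The quoted message is cut off here in the archive. As a sketch of where the lm example appears to be heading (an assumption on my part), the objects returned by lm and summary are themselves named lists, so their components can be pulled out with $ in the same way. The seed is arbitrary and only there for repeatability.

```r
set.seed(42)                       # arbitrary seed, not from the original post
X  <- 0.1 * ((-10):10)
Y  <- 0.5 * X + 0.2 * rnorm(length(X))
LM <- lm(Y ~ X)
S  <- summary(LM)

names(S)                           # component names of the returned list
S$coefficients                     # a matrix: one row per term
S$coefficients["X", "Estimate"]    # a single element of that matrix
S$r.squared                        # a single number
LM$coefficients[["X"]]             # the slope again, taken from the fit itself
```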
[R] Adding a header after the file is written
The situation arises where I open a file to write a data.frame to it with write.table. Multiple lines are written to the file, and the file is kept in append=TRUE mode. If one sets col.names to the names of the variables being written, the output looks like this:

name1 name2 name3
x x x
x x x
x x x
name1 name2 name3
x x x
x x x
x x x

and so forth, because the col.names are written each time write.table is called. Setting col.names=NULL obviously removes them. I thought a simple solution would be to check for the file's existence first and, on the first write, include the col.names with append=T; on subsequent writes, col.names would be set to NULL. That didn't work and threw warnings.

Is there any way to do this? Basically: open a file for writing with append=TRUE and only write the col.names once, at the first write. Or am I stuck and forced to write the whole file without the col.names and then read it back in and rewrite it with col.names set to the column names I want?
Re: [R] Adding a header after the file is written
OK, I will try that. I think I set it to False when I tried it the first time; maybe that was my mistake.

On Mon, May 3, 2010 at 3:28 PM, Ista Zahn istaz...@gmail.com wrote:

Hi Steve, I think you just need to set col.names = FALSE (instead of col.names = NULL) on subsequent writes. -Ista

On Mon, May 3, 2010 at 5:19 PM, steven mosher mosherste...@gmail.com wrote:

The situation arises where I open a file to write a data.frame to it with write.table. Multiple lines are written to the file, and the file is kept in append=TRUE mode. If one sets col.names to the names of the variables being written, the output looks like this:

name1 name2 name3
x x x
x x x
x x x
name1 name2 name3
x x x
x x x
x x x

and so forth, because the col.names are written each time write.table is called. Setting col.names=NULL obviously removes them. I thought a simple solution would be to check for the file's existence first and, on the first write, include the col.names with append=T; on subsequent writes, col.names would be set to NULL. That didn't work and threw warnings.

Is there any way to do this? Basically: open a file for writing with append=TRUE and only write the col.names once, at the first write. Or am I stuck and forced to write the whole file without the col.names and then read it back in and rewrite it with col.names set to the column names I want?

-- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org
Re: [R] Adding a header after the file is written
Thanks.. that worked.

On Mon, May 3, 2010 at 3:24 PM, Ted Harding ted.hard...@manchester.ac.uk wrote:

On 03-May-10 21:19:34, steven mosher wrote:

The situation arises where I open a file to write a data.frame to it with write.table. Multiple lines are written to the file, and the file is kept in append=TRUE mode. If one sets col.names to the names of the variables being written, the output looks like this:

name1 name2 name3
x x x
x x x
x x x
name1 name2 name3
x x x
x x x
x x x

and so forth, because the col.names are written each time write.table is called. Setting col.names=NULL obviously removes them. I thought a simple solution would be to check for the file's existence first and, on the first write, include the col.names with append=T; on subsequent writes, col.names would be set to NULL. That didn't work and threw warnings.

Is there any way to do this? Basically: open a file for writing with append=TRUE and only write the col.names once, at the first write. Or am I stuck and forced to write the whole file without the col.names and then read it back in and rewrite it with col.names set to the column names I want?

The following (which uses a tiny dataframe I had lying around after responding to an earlier query) looks like what you want to do (provided you first test for the existence of the file before switching to the second form of write.table()):

foo
# $Bar1
# [1] 1
# $Bar2
# [1] 2
# $Bar3
# [1] 3
# $Bar4
# [1] 4

write.table(foo, file="foo.txt", row.names=FALSE, col.names=c("Bar.1","Bar.2","Bar.3","Bar.4"), append=FALSE)
write.table(foo, file="foo.txt", row.names=FALSE, col.names=FALSE, append=TRUE)
write.table(foo, file="foo.txt", row.names=FALSE, col.names=FALSE, append=TRUE)
write.table(foo, file="foo.txt", row.names=FALSE, col.names=FALSE, append=TRUE)
write.table(foo, file="foo.txt", row.names=FALSE, col.names=FALSE, append=TRUE)

Contents of foo.txt after the above:

"Bar.1" "Bar.2" "Bar.3" "Bar.4"
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4

Ted.
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 03-May-10 Time: 23:24:55 -- XFMail --
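Putting Ista's and Ted's advice together, one way to do the existence test might look like the sketch below. The helper name append_rows, the temporary file and the one-row data frame are all made up for illustration.

```r
# Hypothetical helper: write the header only when the file does not exist yet.
append_rows <- function(df, path) {
  first_write <- !file.exists(path)
  write.table(df, file = path, row.names = FALSE,
              col.names = first_write,   # TRUE on the first write, FALSE after (never NULL)
              append    = !first_write)  # create the file once, then append
}

out   <- tempfile(fileext = ".txt")
chunk <- data.frame(Bar1 = 1, Bar2 = 2, Bar3 = 3, Bar4 = 4)
for (i in 1:3) append_rows(chunk, out)

readLines(out)                     # one header line followed by three data lines
read.table(out, header = TRUE)     # reads back as a 3-row data frame
```

Because col.names is a logical here, the same call works for the first and every later write, and the header is never appended to an existing file.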
[R] error in La.svd Lapack routine 'dgesdd'
Error in La.svd(x, nu, nv) : error code 1 from Lapack routine 'dgesdd'

What resources are there to track down errors like this?
[R] extracting a matched string using regexpr
Given a text like the HTML below, I want to be able to extract a matched regular expression from a piece of text. This apparently works, but is pretty ugly:

# some html
test <- "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
# a pattern to extract 5 digits
pattern <- "[0-9]{5}"
# regexpr returns a start point [1] and an attribute "match.length": attr(, "match.length")
# get the substring from the start point to the stop point, where stop = start + length - 1
answer <- substr(test, regexpr(pattern,test)[1], regexpr(pattern,test)[1] + attr(regexpr(pattern,test), "match.length") - 1)
answer
[1] "88958"

I tried using sub(pattern, replacement, x) with a regexp that captured the group. I'd found an example of this in the mails, but it didn't seem to work.
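A tidier version of the same start-plus-length idea, as a sketch. It assumes a current version of R, where base provides regmatches, and it uses a reconstruction of the HTML fragment (the tags were stripped in this archive, so the exact string is an assumption).

```r
# Assumed reconstruction of the HTML fragment from the post.
test <- "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
pattern <- "[0-9]{5}"

# regexpr gives the start position and carries "match.length" as an attribute.
m <- regexpr(pattern, test)

# regmatches does the substr arithmetic for us.
answer <- regmatches(test, m)

# The same thing by hand, calling regexpr only once.
start   <- m[1]
by_hand <- substr(test, start, start + attr(m, "match.length") - 1)

answer    # "88958"
by_hand   # "88958"
```

If there is no match, regexpr returns -1 and regmatches returns a zero-length character vector, so it is worth checking length(answer) before using the result.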
Re: [R] extracting a matched string using regexpr
Thanks. I was looking at that package and reading your mails in the archive. I think my tiny mind got twisted in the regexp..

On Wed, May 5, 2010 at 2:35 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:

Here are two ways to extract 5 digits. In the first one, \\1 refers to the portion matched between the parentheses in the regular expression. In the second one, strapply is like apply where the object to be worked on is the first argument (array for apply, string for strapply), the second modifies it (which dimension for apply, regular expression for strapply), and the last is a function which acts on each value (typically each row or column for apply and each match for strapply). In this case we use c as our function to just return all the results. They are returned in a list with one component per string, but here test is just a single string, so we get a list one long and we ask for the contents of the first component using [[1]].

# 1 - sub
sub(".*(\\d{5}).*", "\\1", test)

# 2 - strapply - see http://gsubfn.googlecode.com
library(gsubfn)
strapply(test, "\\d{5}", c)[[1]]

On Wed, May 5, 2010 at 5:13 PM, steven mosher mosherste...@gmail.com wrote:

Given a text like the HTML below, I want to be able to extract a matched regular expression from a piece of text. This apparently works, but is pretty ugly:

# some html
test <- "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
# a pattern to extract 5 digits
pattern <- "[0-9]{5}"
# regexpr returns a start point [1] and an attribute "match.length": attr(, "match.length")
# get the substring from the start point to the stop point, where stop = start + length - 1
answer <- substr(test, regexpr(pattern,test)[1], regexpr(pattern,test)[1] + attr(regexpr(pattern,test), "match.length") - 1)
answer
[1] "88958"

I tried using sub(pattern, replacement, x) with a regexp that captured the group. I'd found an example of this in the mails, but it didn't seem to work.
[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
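One detail of the sub() approach is worth spelling out: sub() returns its input unchanged when the pattern does not match, so a failed extraction can look like a result. The sketch below wraps the capture-group idiom in a hypothetical helper, extract_five() (a name invented here, not part of any package), and uses [0-9] rather than \\d, following the workaround mentioned in the thread.

```r
# Hypothetical helper, for illustration only: pull a run of five digits out of
# each string, returning NA when there is none.
extract_five <- function(x) {
  pattern <- ".*([0-9]{5}).*"
  # sub() leaves non-matching strings untouched, so check for a match first.
  ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
}

result <- extract_five(c("<th>88958</th>", "no digits here"))
```

Because the leading .* is greedy, a string holding several five-digit runs should yield the last one rather than the first; regexpr() is the safer choice when the first match is wanted.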
Re: [R] extracting a matched string using regexpr
    test
    [1] "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
    sub(".*(\\d{5}).*", "\\1", test)
    [1] "</th>"
    sub(".*([0-9]{5}).*", "\\1", test)
    [1] "88958"

I think the / in the source throws something off, as the group capture appears not to be working with \\d, although it did work with the bracket version.
Re: [R] extracting a matched string using regexpr
Hmm. I have R11 just downloaded fresh. I'll reload a new session and revert. I will note that I've had trouble with \\d, which is why I was using [0-9]. I'm on a Mac here.

On Wed, May 5, 2010 at 3:00 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:
> That's not what I get:
>
>     test <- "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
>     sub(".*(\\d{5}).*", "\\1", test)
>     [1] "88958"
>     R.version.string
>     [1] "R version 2.10.1 (2009-12-14)"
>
> I also got the above in R 2.11.0 patched as well.
Re: [R] extracting a matched string using regexpr
Thanks, perhaps we should report it.

On Wed, May 5, 2010 at 4:52 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:
> I am using Vista. Another thing to try is strapply using the tcl engine (assuming you do have tcltk capabilities) and the R engine. On Vista with R 2.11.0 patched I get the same result:
>
>     capabilities()[["tcltk"]]
>     [1] TRUE
>     strapply(test, "\\d{5}", c, engine = "tcl")[[1]]
>     [1] "88958"
>     strapply(test, "\\d{5}", c, engine = "R")[[1]]
>     [1] "88958"
>
> On Vista with R 2.9.2 I do get bad results:
>
>     test <- "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
>     sub(".*(\\d{5}).*", "\\1", test)
>     [1] "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
>     sub(".*(\\d{5}).*", "\\1", test, extended = TRUE)
>     [1] "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
>     R.version.string
>     [1] "R version 2.9.2 Patched (2009-09-08 r49647)"
>     win.version()
>     [1] "Windows Vista (build 6002) Service Pack 2"
[R] quick question on getting a listing of files on ftp site
Given a valid ftp address, is there a package that will allow me to get a listing of the files and directory structure on that site? RCurl looks to have this ability; are there others?
[R] quick question on ftp access
I'm looking for a function or package that will allow me to get a list of the files at an ftp site. RCurl looks promising. Are there other packages that have similar functionality?
[R] Matrix to Vector
Given an m*n matrix, I want to reorder it as a vector in row-major order. So:

    m <- matrix(seq(1, 48), nrow = 6, byrow = TRUE)
    m
         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
    [1,]    1    2    3    4    5    6    7    8
    [2,]    9   10   11   12   13   14   15   16
    [3,]   17   18   19   20   21   22   23   24
    [4,]   25   26   27   28   29   30   31   32
    [5,]   33   34   35   36   37   38   39   40
    [6,]   41   42   43   44   45   46   47   48

I want to reorder this as a vector, copying by row, so that the final vector has its elements ordered thus: m[1, 1:n] maps to elements 1 to n, m[2, 1:n] maps to elements n+1 to 2n, and so on.

This obviously is not a solution, as the inherent column-major storage of a matrix defeats the approach:

    dim(m) <- c(48, 1)
    m
          [,1]
     [1,]    1
     [2,]    9
     [3,]   17
     [4,]   25
     [5,]   33
     [6,]   41
     [7,]    2
     [8,]   10
     [9,]   18
    [10,]   26
    [11,]   34
    [12,]   42
    [13,]    3
    [14,]   11
    [15,]   19
    [16,]   27
    [17,]   35
    [18,]   43
    [19,]    4
    [20,]   12
    [21,]   20
    [22,]   28
    [23,]   36
    [24,]   44
    [25,]    5
    [26,]   13
    [27,]   21
    [28,]   29
    [29,]   37
    [30,]   45
    [31,]    6
    [32,]   14
    [33,]   22
    [34,]   30
    [35,]   38
    [36,]   46
    [37,]    7
    [38,]   15
    [39,]   23
    [40,]   31
    [41,]   39
    [42,]   47
    [43,]    8
    [44,]   16
    [45,]   24
    [46,]   32
    [47,]   40
    [48,]   48

I already have a version that loops through the data (this is actually a portion of a data frame) to reorder it into a vector, but I was hoping there was an elegant way.
Re: [R] Matrix to Vector
    as.vector(t(m))
     [1]  1  9 17 25 33 41  2 10 18 26 34 42  3 11 19 27 35 43  4 12 20 28 36 44  5 13 21 29 37 45  6 14 22 30 38 46  7 15 23 31 39 47  8 16 24
    [46] 32 40 48

The result I want is this:

     [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
    [46] 46 47 48

On Sat, Jun 5, 2010 at 11:17 AM, Henrique Dallazuanna www...@gmail.com wrote:
> Try this:
>
>     as.vector(t(m))
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
Re: [R] Matrix to Vector
I bet that is what I did.

On Sat, Jun 5, 2010 at 11:54 AM, John Kane jrkrid...@yahoo.ca wrote:
>     m <- matrix(seq(1, 48), nrow = 6, byrow = TRUE)
>     as.vector(t(m))
>
> gives me the correct result. Any chance you may have already transformed m?
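The resolution of this thread can be checked in a few lines. The sketch below (variable names are illustrative) shows that as.vector(t(m)) reads the matrix row by row, and that the unwanted output appears once m has already been reshaped with dim(m) <- c(48, 1), which is presumably what happened in the session above.

```r
m <- matrix(1:48, nrow = 6, byrow = TRUE)

by_row <- as.vector(t(m))   # transpose first, then read column-major: 1, 2, 3, ...
by_col <- as.vector(m)      # plain column-major read: 1, 9, 17, ...

# Reproduce the slip: reshape m to 48 x 1 first, then apply the same call.
m2 <- m
dim(m2) <- c(48, 1)
slipped <- as.vector(t(m2)) # t() of a one-column matrix cannot restore row order
```

Recreating m from scratch before calling as.vector(t(m)) avoids the problem.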
[R] counting Na/not NA by groups by column
    # create a matrix with some NAs in it
    m <- matrix(NA, nrow = 15, ncol = 14)
    m[, 3:14] <- 52
    m[13, 9] <- NA
    m[4:7, 8] <- NA
    m[1:2, 5] <- NA
    m[, 2] <- rep(1800:1804, times = 3)
    y <- order(m[, 2])
    m <- m[y, ]
    m[, 1] <- rep(1:3, times = 5)
    m
          [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
     [1,]    1 1800   52   52   NA   52   52   52   52    52    52    52    52    52
     [2,]    2 1800   52   52   52   52   52   NA   52    52    52    52    52    52
     [3,]    3 1800   52   52   52   52   52   52   52    52    52    52    52    52
     [4,]    1 1801   52   52   NA   52   52   52   52    52    52    52    52    52
     [5,]    2 1801   52   52   52   52   52   NA   52    52    52    52    52    52
     [6,]    3 1801   52   52   52   52   52   52   52    52    52    52    52    52
     [7,]    1 1802   52   52   52   52   52   52   52    52    52    52    52    52
     [8,]    2 1802   52   52   52   52   52   52   52    52    52    52    52    52
     [9,]    3 1802   52   52   52   52   52   52   NA    52    52    52    52    52
    [10,]    1 1803   52   52   52   52   52   NA   52    52    52    52    52    52
    [11,]    2 1803   52   52   52   52   52   52   52    52    52    52    52    52
    [12,]    3 1803   52   52   52   52   52   52   52    52    52    52    52    52
    [13,]    1 1804   52   52   52   52   52   NA   52    52    52    52    52    52
    [14,]    2 1804   52   52   52   52   52   52   52    52    52    52    52    52
    [15,]    3 1804   52   52   52   52   52   52   52    52    52    52    52    52

    # the goal is to count all non-NA values, grouped by the value in column 2
    # we can get the count over all rows easily
    col.sum <- apply(!is.na(m[, 3:14]), 2, sum)
    col.sum
     [1] 15 15 13 15 15 11 14 15 15 15 15 15

    # what we want is a result that looks like this
    1800 3 3 2 3 3 2 3 3 3 3 3 3
    1801 3 3 2 3 3 2 3 3 3 3 3 3
    1802 3 3 3 3 3 3 2 3 3 3 3 3
    1803 3 3 3 3 3 2 3 3 3 3 3 3
    1804 3 3 3 3 3 2 3 3 3 3 3 3

I've toyed a bit with by():

    mask <- !is.na(m[, 3:14])
    test <- cbind(m[, 1:2], mask)
    test
          [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
     [1,]    1 1800    1    1    0    1    1    1    1     1     1     1     1     1
     [2,]    2 1800    1    1    1    1    1    0    1     1     1     1     1     1
     [3,]    3 1800    1    1    1    1    1    1    1     1     1     1     1     1
     [4,]    1 1801    1    1    0    1    1    1    1     1     1     1     1     1
     [5,]    2 1801    1    1    1    1    1    0    1     1     1     1     1     1
     [6,]    3 1801    1    1    1    1    1    1    1     1     1     1     1     1
     [7,]    1 1802    1    1    1    1    1    1    1     1     1     1     1     1
     [8,]    2 1802    1    1    1    1    1    1    1     1     1     1     1     1
     [9,]    3 1802    1    1    1    1    1    1    0     1     1     1     1     1
    [10,]    1 1803    1    1    1    1    1    0    1     1     1     1     1     1
    [11,]    2 1803    1    1    1    1    1    1    1     1     1     1     1     1
    [12,]    3 1803    1    1    1    1    1    1    1     1     1     1     1     1
    [13,]    1 1804    1    1    1    1    1    0    1     1     1     1     1     1
    [14,]    2 1804    1    1    1    1    1    1    1     1     1     1     1     1
    [15,]    3 1804    1    1    1    1    1    1    1     1     1     1     1     1

    result <- by(test[, 3:14], test[, 2], sum)
    result
    INDICES: 1800
    [1] 34
    ------------------------------------------------------------
    INDICES: 1801
    [1] 34
    ------------------------------------------------------------
    INDICES: 1802
    [1] 35
    ------------------------------------------------------------
    INDICES: 1803
    [1] 35
    ------------------------------------------------------------
    INDICES: 1804
    [1] 35

As this sums all the values in each group rather than summing by column, it's wrong. So is there an elegant way to get the number of non-NA values by column, grouped by the values of another variable?
Re: [R] counting Na/not NA by groups by column
That's beautiful:

    apply(m[, 3:14], 2,
          function(x) tapply(x, m[, 2], function(x) sum(!is.na(x))))
         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
    1800    3    3    2    3    3    2    3    3    3     3     3     3
    1801    3    3    2    3    3    2    3    3    3     3     3     3
    1802    3    3    3    3    3    3    2    3    3     3     3     3
    1803    3    3    3    3    3    2    3    3    3     3     3     3
    1804    3    3    3    3    3    2    3    3    3     3     3     3

I was thinking of doing 'by' inside an apply, but this is perfect. Thanks.

On Wed, Jun 9, 2010 at 6:16 PM, Erik Iverson er...@ccbr.umn.edu wrote:
> Hello,
>
> This should work:
>
>     apply(m[, 3:14], 2, function(x) tapply(x, m[, 2], function(x) sum(!is.na(x))))
>
> It uses tapply inside of apply to break up the groups by m[, 2].
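For comparison, the same grouped count can be sketched with base R's rowsum(), which adds up the rows of a numeric matrix within each group. This is an alternative to the tapply-inside-apply answer, not something proposed in the thread; the matrix below is rebuilt directly in its final, sorted form.

```r
# Rebuild the example matrix in its final (sorted) layout.
m <- matrix(52, nrow = 15, ncol = 14)
m[, 1] <- rep(1:3, times = 5)
m[, 2] <- rep(1800:1804, each = 3)
m[c(1, 4), 5] <- NA
m[c(2, 5, 10, 13), 8] <- NA
m[9, 9] <- NA

# Answer from the thread: tapply() within each column.
nested <- apply(m[, 3:14], 2,
                function(x) tapply(x, m[, 2], function(v) sum(!is.na(v))))

# Alternative sketch: turn the logical mask into 0/1 and sum rows per group.
summed <- rowsum(+(!is.na(m[, 3:14])), group = m[, 2])
```

Both give a 5 x 12 matrix with one row per value of column 2; rowsum() avoids the nested anonymous functions.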
Re: [R] working with zoo time index ??
Hi Gabor,

Not sure where to report this, but on Mac 10.5.8 with R: 11.1, when you examine the zoo vignette and hit the back button, you get a hang. I haven't tested with other vignettes and can't imagine that it is specific to yours, FWIW.

Did I mention that zoo is great? Thanks for your work on it.

On Tue, Jun 15, 2010 at 6:30 AM, Gabor Grothendieck ggrothendi...@gmail.com wrote:
> On Tue, Jun 15, 2010 at 8:27 AM, skan juanp...@gmail.com wrote:
> > Hello
> >
> > Where could I find examples on how to work with the time index in a timeseries or zoo series?
> >
> > Let's say I've got this series:
> >
> >     DATA
> >     1990-01-01 10:00:00 0.900
> >     1990-01-01 10:01:00 0.910
> >     1990-01-01 10:03:00 0.905
> >     1990-01-01 10:04:00 0.905
> >     1990-01-01 10:05:00 0.890
> >     ...
> >     2000-12-31 20:00:00 0.992
> >
> > How do I make simple calculations such as these?
> > Calculate the mean of the first data value of every day (mapply, for loop, tapply?).
> > Transform the data to a table, with dates on one axis and times on the other.
>
> There are three vignettes that come with zoo. vignette() lists their names and vignette("zoo") displays the one called zoo (similarly for the other two). Also see the help files ?zoo, ?read.zoo and ?aggregate.zoo, and note the examples at the bottom of the help files. Also, library(help = zoo) lists the help files available.
>
>     Lines <- "1990-01-01 10:00:00 0.900
>     1990-01-01 10:01:00 0.910
>     1990-01-01 10:03:00 0.905
>     1990-01-01 10:04:00 0.905
>     1990-01-01 10:05:00 0.890
>     1990-01-02 10:00:00 0.940
>     1990-01-02 10:01:00 0.990"
>
>     library(zoo)
>     library(chron)
>     z <- read.zoo(textConnection(Lines), index = 1:2,
>                   FUN = function(x) as.chron(paste(x[,1], x[,2])))
>
>     # take first data value for each day and then take their mean
>     mean(aggregate(z, as.Date, head, 1))
>
>     # create data frame from z made up of dates, times and value
>     # dates and times are chron package functions.
>     # (If you use a different date and time class then it would be different.)
>     data.frame(dates = dates(time(z)), times = times(time(z)), value = coredata(z))
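For readers without zoo and chron installed, the two calculations asked about can be approximated in base R. This is a deliberately different technique from the zoo answer above: it treats the timestamps as plain strings, assumes the rows are already in time order, and uses variable names invented for the sketch.

```r
# The seven sample rows from the reply, as plain character and numeric vectors.
stamps <- c("1990-01-01 10:00:00", "1990-01-01 10:01:00", "1990-01-01 10:03:00",
            "1990-01-01 10:04:00", "1990-01-01 10:05:00",
            "1990-01-02 10:00:00", "1990-01-02 10:01:00")
value <- c(0.900, 0.910, 0.905, 0.905, 0.890, 0.940, 0.990)

day  <- substr(stamps, 1, 10)    # "YYYY-MM-DD"
time <- substr(stamps, 12, 19)   # "HH:MM:SS"

# First value of each day (relies on the rows being in time order), then the mean.
first_per_day <- tapply(value, day, function(v) v[1])
mean_first <- mean(first_per_day)

# Dates on one axis, times on the other; cells without an observation are NA.
wide <- tapply(value, list(day, time), mean)
```

The zoo route is preferable once real date-time arithmetic is needed; the string version only illustrates the grouping logic.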
[R] applying ifelse to dataframe
The following data frame will illustrate the problem:

    DF <- data.frame(name = rep(1:5, each = 2), x1 = rep("A", 10), x2 = seq(10, 19, by = 1), x3 = rep(NA, 10), x4 = seq(20, 29, by = 1))
    DF$x3[5] <- 50
    # we have a data frame. we are interested in the columns x2, x3, x4, which contain sparse
    # values and many NA.
    DF
       name x1 x2 x3 x4
    1     1  A 10 NA 20
    2     1  A 11 NA 21
    3     2  A 12 NA 22
    4     2  A 13 NA 23
    5     3  A 14 50 24
    6     3  A 15 NA 25
    7     4  A 16 NA 26
    8     4  A 17 NA 27
    9     5  A 18 NA 28
    10    5  A 19 NA 29

    # we have a list of target values that we want to search for in the data frame
    # if the value is in the data frame we want to keep it there, otherwise replace it with NA
    targets <- c(11, 12, 13, 16, 19, 50, 27, 24, 22, 26)
    # so we apply a test by column to the last 3 columns using the %in% test
    # this gives us a mask of whether the data frame 'contains' elements in the
    # target list
    mask <- apply(DF[, 3:5], 2, "%in%", targets)
    mask
             x2    x3    x4
     [1,] FALSE FALSE FALSE
     [2,]  TRUE FALSE FALSE
     [3,]  TRUE FALSE  TRUE
     [4,]  TRUE FALSE FALSE
     [5,] FALSE  TRUE  TRUE
     [6,] FALSE FALSE FALSE
     [7,]  TRUE FALSE  TRUE
     [8,] FALSE FALSE  TRUE
     [9,] FALSE FALSE FALSE
    [10,]  TRUE FALSE FALSE
    # and so DF[2,3] is equal to 11 and 11 is in the target list, so the mask is TRUE
    # now something like DF <- ifelse(mask == T, DF, NA) is CONCEPTUALLY what I want to do

In the end I'd like a result that looks like this:

       name x1 x2 x3 x4
    1     1  A NA NA NA
    2     1  A 11 NA NA
    3     2  A 12 NA 22
    4     2  A 13 NA NA
    5     3  A NA 50 24
    6     3  A NA NA NA
    7     4  A 16 NA 26
    8     4  A NA NA 27
    9     5  A NA NA NA
    10    5  A 19 NA NA

I've tried forcing the DF and the mask into vectors so that ifelse() would work, and have tried apply using ifelse, without much luck. Any thoughts?
Re: [R] applying ifelse to dataframe
Thanks. The data frame is indeed clever at preserving its dimensions. I'll try your solution with the real data.

On Tue, Jun 22, 2010 at 12:23 AM, Petr PIKAL petr.pi...@precheza.cz wrote:
> Hi
>
> r-help-boun...@r-project.org wrote on 22.06.2010 08:28:04:
> > # now something like DF <- ifelse(mask == T, DF, NA) is CONCEPTUALLY what I want
>
> Data frames are quite clever in preserving their dimensions. I would do
>
>     mask = data.frame(a = TRUE, b = TRUE, !mask)
>
> to add columns 1 and 2, and
>
>     DF[mask] <- NA
>
> Regards
> Petr
Re: [R] applying ifelse to dataframe
Hmm.

    DF <- data.frame(name = rep(1:5, each = 2), x1 = rep("A", 10), x2 = seq(10, 19, by = 1), x3 = rep(NA, 10), x4 = seq(20, 29, by = 1))
    DF$x3[5] <- 50
    mask <- apply(sample, 2, "%in%", target)
    DF
       name x1 x2 x3 x4
    1     1  A 10 NA 20
    2     1  A 11 NA 21
    3     2  A 12 NA 22
    4     2  A 13 NA 23
    5     3  A 14 50 24
    6     3  A 15 NA 25
    7     4  A 16 NA 26
    8     4  A 17 NA 27
    9     5  A 18 NA 28
    10    5  A 19 NA 29
    mask
          [,1]  [,2]  [,3]  [,4]  [,5]
    [1,] FALSE FALSE FALSE FALSE FALSE
    [2,] FALSE FALSE FALSE FALSE FALSE
    [3,]  TRUE  TRUE FALSE  TRUE FALSE
    [4,] FALSE FALSE FALSE FALSE FALSE
    [5,]  TRUE FALSE FALSE FALSE FALSE
    mask <- data.frame(a = TRUE, b = TRUE, !mask)
    DF[mask] <- NA
    Error in FUN(X[[1L]], ...) :
      only defined on a data frame with all numeric variables

    DF2 <- data.frame(DF[, 3:5])
    mask <- apply(sample, 2, "%in%", target)
    mask <- data.frame(!mask)
    DF2[mask] <- NA
    Error in FUN(X[[1L]], ...) :
      only defined on a data frame with all numeric variables
    DF2
       x2 x3 x4
    1  10 NA 20
    2  11 NA 21
    3  12 NA 22
    4  13 NA 23
    5  14 50 24
    6  15 NA 25
    7  16 NA 26
    8  17 NA 27
    9  18 NA 28
    10 19 NA 29
    mask <- apply(DF2, 2, "%in%", target)
    mask <- data.frame(!mask)
    DF2[mask] <- NA
    Error in FUN(X[[1L]], ...) :
      only defined on a data frame with all numeric variables
Re: [R] applying ifelse to dataframe
Thanks for the solution

On Tue, Jun 22, 2010 at 1:02 AM, Peter Ehlers ehl...@ucalgary.ca wrote:

On 2010-06-22 1:45, steven mosher wrote:

Hmm

DF <- data.frame(name=rep(1:5,each=2),x1=rep("A",10),x2=seq(10,19,by=1),x3=rep(NA,10),x4=seq(20,29,by=1))
DF$x3[5] <- 50
mask <- apply(sample,2,"%in%", target)

This is getting confusing. What's 'sample'? What's 'target'? Probably what you originally called 'targets'.

DF
   name x1 x2 x3 x4
1     1  A 10 NA 20
2     1  A 11 NA 21
3     2  A 12 NA 22
4     2  A 13 NA 23
5     3  A 14 50 24
6     3  A 15 NA 25
7     4  A 16 NA 26
8     4  A 17 NA 27
9     5  A 18 NA 28
10    5  A 19 NA 29

mask
      [,1]  [,2]  [,3]  [,4]  [,5]
[1,] FALSE FALSE FALSE FALSE FALSE
[2,] FALSE FALSE FALSE FALSE FALSE
[3,]  TRUE  TRUE FALSE  TRUE FALSE
[4,] FALSE FALSE FALSE FALSE FALSE
[5,]  TRUE FALSE FALSE FALSE FALSE

This suggests that 'sample' may be a matrix, not a dataframe. Anyway, try this on your original problem:

targets <- c(11,12,13,16,19,50,27,24,22,26)
mask <- apply(DF[,3:5],2, "%in%" ,targets)
is.na(DF[3:5]) <- !mask

-Peter Ehlers

mask <- data.frame(a=TRUE,b=TRUE,!mask)
DF[mask] <- NA
Error in FUN(X[[1L]], ...) : only defined on a data frame with all numeric variables

DF2 <- data.frame(DF[,3:5])
mask <- apply(sample,2,"%in%", target)
mask <- data.frame(!mask)
DF2[mask] <- NA
Error in FUN(X[[1L]], ...) : only defined on a data frame with all numeric variables

DF2
   x2 x3 x4
1  10 NA 20
2  11 NA 21
3  12 NA 22
4  13 NA 23
5  14 50 24
6  15 NA 25
7  16 NA 26
8  17 NA 27
9  18 NA 28
10 19 NA 29

mask <- apply(DF2,2,"%in%", target)
mask <- data.frame(!mask)
DF2[mask] <- NA
Error in FUN(X[[1L]], ...) : only defined on a data frame with all numeric variables

On Tue, Jun 22, 2010 at 12:23 AM, Petr PIKAL petr.pi...@precheza.cz wrote:

Hi

r-help-boun...@r-project.org wrote on 22.06.2010 08:28:04:

The following dataframe will illustrate the problem

DF <- data.frame(name=rep(1:5,each=2),x1=rep("A",10),x2=seq(10,19,by=1),x3=rep(NA,10),x4=seq(20,29,by=1))
DF$x3[5] <- 50

# we have a data frame. we are interested in the columns x2,x3,x4 which contain sparse
# values and many NA.

DF
   name x1 x2 x3 x4
1     1  A 10 NA 20
2     1  A 11 NA 21
3     2  A 12 NA 22
4     2  A 13 NA 23
5     3  A 14 50 24
6     3  A 15 NA 25
7     4  A 16 NA 26
8     4  A 17 NA 27
9     5  A 18 NA 28
10    5  A 19 NA 29

# we have a list of target values that we want to search for in the data frame
# if the value is in the data frame we want to keep it there, otherwise, replace it with NA

targets <- c(11,12,13,16,19,50,27,24,22,26)

# so we apply a test by column to the last 3 columns using the in test
# this gives us a mask of whether the data frame 'contains' elements in the
# target list

mask <- apply(DF[,3:5],2, "%in%" ,targets)
mask
         x2    x3    x4
 [1,] FALSE FALSE FALSE
 [2,]  TRUE FALSE FALSE
 [3,]  TRUE FALSE  TRUE
 [4,]  TRUE FALSE FALSE
 [5,] FALSE  TRUE  TRUE
 [6,] FALSE FALSE FALSE
 [7,]  TRUE FALSE  TRUE
 [8,] FALSE FALSE  TRUE
 [9,] FALSE FALSE FALSE
[10,]  TRUE FALSE FALSE

# and so DF[2,3] is equal to 11 and 11 is in the target list, so the mask is True
# now something like DF <- ifelse(mask==T,DF,NA) is CONCEPTUALLY what I want

Data frames are quite clever in preserving their dimensions. I would do

mask = data.frame(a=TRUE, b=TRUE, !mask)

to add column 1 and 2 and

DF[mask] <- NA

Regards
Petr

to do. In the end I'd like a result that looks like

   name x1 x2 x3 x4
1     1  A NA NA NA
2     1  A 11 NA NA
3     2  A 12 NA 22
4     2  A 13 NA NA
5     3  A NA 50 24
6     3  A NA NA NA
7     4  A 16 NA 26
8     4  A NA NA 27
9     5  A NA NA NA
10    5  A 19 NA NA

I've tried forcing the DF and the mask into vectors so that ifelse() would work and have tried apply using ifelse, without much luck. Any thoughts?

[[alternative HTML version deleted]]
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
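For readers skimming the thread, here is a minimal, self-contained sketch of the approach Peter Ehlers suggested above, run against the example data from the original post. The names `cols`, `result` and `result2` are introduced here purely for illustration and do not appear in the thread; `sapply()` is used in place of `apply()` only as a convenience, and the `ifelse()` variant is an assumed alternative closer to what the subject line asks for, not something proposed by any poster.

```r
# Example data from the original post
DF <- data.frame(name = rep(1:5, each = 2),
                 x1   = rep("A", 10),
                 x2   = seq(10, 19, by = 1),
                 x3   = rep(NA, 10),
                 x4   = seq(20, 29, by = 1))
DF$x3[5] <- 50
targets <- c(11, 12, 13, 16, 19, 50, 27, 24, 22, 26)

# Columns to be filtered (hypothetical helper name)
cols <- c("x2", "x3", "x4")

# Logical matrix: TRUE where a value is one of the targets
mask <- sapply(DF[cols], function(col) col %in% targets)

# Peter Ehlers' suggestion: set every non-target cell to NA,
# keeping the mask as a logical matrix rather than a data frame
result <- DF
is.na(result[cols]) <- !mask

# Assumed alternative: column-wise ifelse(), which keeps a value
# when it is a target and returns NA otherwise
result2 <- DF
result2[cols] <- lapply(DF[cols], function(col) ifelse(col %in% targets, col, NA))

print(result)
```

The printed `result` should match the table requested in the original post; the `name` and `x1` columns are never touched because only `cols` is indexed.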
Re: [R] Why software fails in scientific research
Thomas, How popular is R inside of NOAA?

On Thu, Jul 1, 2010 at 11:25 AM, Thomas Adams thomas.ad...@noaa.gov wrote:

OK

My Grandfather, who was a farmer, was outstanding in his field

Cheers

Murray M Cooper, PhD wrote:

For what its worth! A good friend who also happens to be an ecologist told me An ecologist is a statistician who likes to be outside.

Murray M Cooper, Phd
Richland Statistics

- Original Message -
From: Gavin Simpson gavin.simp...@ucl.ac.uk
To: Bert Gunter gunter.ber...@gene.com
Cc: r-help@r-project.org
Sent: Thursday, July 01, 2010 11:57 AM
Subject: Re: [R] Why software fails in scientific research

On Wed, 2010-06-30 at 11:17 -0700, Bert Gunter wrote:

Just one small additional note below ...

Bert Gunter
Genentech Nonclinical Biostatistics

But a lot of academics are not going to waste their time documenting code properly, so others can reap the benefits of it. They would rather get on with the next project, to get the next paper.

--

Indeed. My personal experience over 3 decades in industrial (private) research is that data analysis is viewed as relatively unimportant/straightforward/pedestrian and is left to technicians (or postdocs) -- often with what is done being largely dictated by the conventions of a particular journal or discipline. The lab heads and research directors are responsible for the grand research strategies, managing resources, etc. and don't want to waste much time on something that routine. So worrying about reproducibility of data analysis code (if there is any, given the use of GUI software like Excel) falls beneath their radar.

Clearly there are disciplines (e.g. ecology?) where this may NOT be the case.

If ecology is anything to go by (and I am an ecologist, sort of, just about), there is a large body of the community doing things because i) that is how they've always been done, or ii) because that's what reviewers/editors expect etc. with a much smaller group of researchers pushing at the boundaries (of their field) to use techniques statisticians and the like have been using for a very long time.

Reproducible research is still very much in the (very, very) small minority of the work I come across reviewing papers etc. But I am encouraged by the number of people I know who are starting to use tools like R to conduct their research.

-- Bert G

--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

--
Thomas E Adams
National Weather Service
Ohio River Forecast Center
1901 South State Route 134
Wilmington, OH 45177
EMAIL: thomas.ad...@noaa.gov
VOICE: 937-383-0528
FAX: 937-383-0033

[[alternative HTML version deleted]]
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.