Re: [R] What made us so popular Nov 16-20?
Duncan asks:
  Did we get mentioned somewhere (e.g. Slashdot), or was someone just experimenting with some automated downloading?

R was mentioned in last week's (I think) O'Reilly newsletter, which included a link to a short article showing how easy it is to get R to graph things like stock price histories. That's O'Reilly the publisher, not the talking head.

For what it's worth, the article isn't worth chasing down. It left a beginner like me disappointed that R's capabilities weren't better shown, and that the author relied on Perl to do the data manipulation.

cur
--
Curt Seeliger, Data Ranger
CSC, EPA/WED contractor
541/754-4638
[EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Assign references
Folks,

I've run into trouble while writing functions that I hope will create and modify a dataframe or two. To that end I've written a toy function that simply sets a couple of variables (well, tries but fails). Searching the archives, I found that Thomas Lumley recently explained the <<- operator, showing that it was necessary for x and y to exist prior to the function call, but I haven't the faintest idea why this isn't working:

  myFunk <- function(a, b, foo, bar) {foo <<- a+b; bar <<- a*b;}

  > x <- 0; y <- 0; myFunk(4, 5, x, y)
  > x
  [1] 0
  > y
  [1] 0

What (no doubt simple) reason is there for x and y not changing?

Thank you,
cur
--
Curt Seeliger, Data Ranger
CSC, EPA/WED contractor
541/754-4638
[EMAIL PROTECTED]

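One likely reading of the behaviour above, offered as a sketch rather than a definitive answer: R passes arguments by value, so foo and bar inside the function are local copies of x and y. The <<- operator assigns to a variable carrying the name on its left (foo, bar) in an enclosing environment, skipping the local copy; it never writes back to the caller's x and y. A common idiom is to return the values and assign them at the call site. The names myFunk2, res, x0 and y0 are invented for this illustration.

```r
# Sketch: return both results in a list instead of modifying the arguments.
myFunk <- function(a, b) {
  list(sum = a + b, product = a * b)
}

res <- myFunk(4, 5)
x <- res$sum       # 9
y <- res$product   # 20

# For comparison, the superassignment version from the post.
# foo and bar are created (or overwritten) as globals; x0 and y0 stay put.
myFunk2 <- function(a, b, foo, bar) { foo <<- a + b; bar <<- a * b }
x0 <- 0; y0 <- 0
myFunk2(4, 5, x0, y0)
x0                 # still 0
foo                # 9, assigned under the name foo, not x0
```

Returning a list keeps the function free of side effects, which also makes it easier to reuse when the real targets are data frames.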
Re: [R] Calculation of group summaries
Several people suggested specific functions (by, tapply, sapply and others); thanks for not blowing off a simple question regarding how to do the following SQL in R:

  select year, site_id, visit_no,
         mean(undercut)  AS meanUndercut,
         count(undercut) AS nUndercut,
         std(undercut)   AS stdUndercut
  from channelMorphology
  group by year, site_id, visit_no
  ;

I'd spent quite a bit of time with the suggested functions earlier but had no luck, as I'd misread the docs and passed the entire dataframe where only the columns to be processed are wanted. Sometimes it's the simplest of things.

This has led to another puzzle: sd() acts differently than mean(), at least with R 1.9.0. The means come back as NA, with a warning message for each group:

  argument is not numeric or logical: returning NA in: mean.default(data[x, ], ...)

Of course, the argument is numeric, or there'd be no sd value. Or, more likely, I'm still missing something really basic. If I wrap the value in as.numeric(), things work fine. Why should I have to do this for mean and median, but not sd? The code below should reproduce this error:

  # Fake data for demo:
  nsites <- 6
  yearList <- 1999:2001
  fakesub <- as.data.frame(cbind(
       year     =rep(yearList,nsites/length(yearList),each=11)
      ,site_id  =rep(c('site1','site2'),each=11*nsites)
      ,visit_no =rep(1,11*2*nsites)
      ,transect =rep(LETTERS[1:11],nsites,each=2)
      ,transdir =rep(c('LF','RT'),11*nsites)
      ,undercut =abs(rnorm(11*2*nsites,10))
      ,angle    =runif(11*2*nsites,0,180)
      ))

  # Create group summaries:
  sdmets <- by(fakesub$undercut
              ,list(fakesub$year,fakesub$site_id,fakesub$visit_no)
              ,sd
              )
  nmets <- by(fakesub$undercut
             ,list(fakesub$year,fakesub$site_id,fakesub$visit_no)
             ,length
             )
  # This call returns NA for every group, with the warning above:
  xmets <- by(fakesub$undercut
             ,list(fakesub$year,fakesub$site_id,fakesub$visit_no)
             ,mean
             )
  # This call, with as.numeric() around the column, runs without the warning:
  xmets <- by(as.numeric(fakesub$undercut)
             ,list(fakesub$year,fakesub$site_id,fakesub$visit_no)
             ,mean
             )

  # Put site id values (year, site_id and visit_no) into results:
  # List unique id combinations as a list of lists. Then
  # reorganize that into 3 vectors for final results.
  # Certainly, there MUST be a better way...
  foo <- strsplit(unique(paste(fakesub$year
                              ,fakesub$site_id
                              ,fakesub$visit_no
                              ,sep='#'))
                 ,split='#'
                 )
  year <- list()
  for(i in 1:length(foo)) {year <- rbind(year,foo[[i]][1])}
  site_id <- list()
  for(i in 1:length(foo)) {site_id <- rbind(site_id,foo[[i]][2])}
  visit_no <- list()
  for(i in 1:length(foo)) {visit_no <- rbind(visit_no,foo[[i]][3])}

  # Final result, more or less
  data.frame(cbind(a=year,b=site_id,c=visit_no,sdmets,nmets,xmets))

cur
--
Curt Seeliger, Data Ranger
CSC, EPA/WED contractor
541/754-4638
[EMAIL PROTECTED]

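A possible explanation for the mean() warning, hedged because it has not been checked against R 1.9.0 itself: cbind() builds a matrix, a matrix holds a single type, and mixing strings with numbers makes every column character. as.data.frame() then yields factor or character columns (depending on the R version), so undercut is no longer numeric by the time mean() sees it. Why sd() tolerated this in that release is not something this sketch tries to establish. The small vectors and the names m, bad, f, good and ids below are made up for the illustration.

```r
# cbind() on mixed types gives one character matrix.
m <- cbind(year     = c(1999, 1999, 2000),
           site_id  = c("site1", "site1", "site2"),
           undercut = c(9.5, 10.2, 11.1))
is.character(m)                 # TRUE

bad <- as.data.frame(m)         # factor or character columns, by R version
is.numeric(bad$undercut)        # FALSE

# Pitfall: as.numeric() on a factor returns level codes, not the values.
f <- factor(c("9.5", "10.2", "11.1"))
as.numeric(f)                   # 3 1 2
as.numeric(as.character(f))     # 9.5 10.2 11.1

# data.frame() keeps each column's own type, so mean() works directly.
good <- data.frame(year     = c(1999, 1999, 2000),
                   site_id  = c("site1", "site1", "site2"),
                   undercut = c(9.5, 10.2, 11.1))
is.numeric(good$undercut)       # TRUE
avg <- mean(good$undercut)

# Unique id combinations without the paste()/strsplit() round trip.
ids <- unique(good[c("year", "site_id")])
```

If the column really was a factor, the as.numeric() workaround would have averaged level codes rather than measurements, so building the frame with data.frame() looks like the safer route.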
[R] Calculation of group summaries
I know R has a steep learning curve, but from where I stand the slope looks like a sheer cliff. I'm pawing through the available docs and have come across examples which come close to what I want but are proving difficult for me to modify for my use. Calculating simple group means is fairly straightforward:

  data(PlantGrowth)
  attach(PlantGrowth)
  stack(mean(unstack(PlantGrowth)))

I'd like to do something slightly more complex, using a data frame and groups identified by unique combinations of three id variables. There may be thousands of such combinations in the data. This is easy in SQL:

  select year, site_id, visit_no,
         mean(undercut)  AS meanUndercut,
         count(undercut) AS nUndercut,
         std(undercut)   AS stdUndercut
  from channelMorphology
  group by year, site_id, visit_no
  ;

Reading a CSV written by SAS and selecting only the records expected to have values is also straightforward in R, but getting those summary values for each site visit is currently beyond me:

  sub <- read.csv('c:/data/channelMorphology.csv'
                 ,header=TRUE
                 ,na.strings='.'
                 ,sep=','
                 ,strip.white=TRUE
                 )
  undercut <- subset(sub
                    ,TRANSDIR %in% c('LF','RT')
                    ,select=c('YEAR','SITE_ID','VISIT_NO','TRANSECT','TRANSDIR'
                             ,'UNDERCUT'
                             )
                    ,drop=TRUE
                    )

Thanks all for your help.

cur
--
Curt Seeliger, Data Ranger
CSC, EPA/WED contractor
541/754-4638
[EMAIL PROTECTED]
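One way the SQL above might be expressed in R, as a minimal sketch: aggregate() with a formula groups by the three id variables, and a summary function returning a named vector supplies the mean, count and standard deviation in one pass. The data frame chan is an invented stand-in for channelMorphology.csv (its values are made up), and the column names follow the upper-case names in the subset() call.

```r
# Hypothetical stand-in for the CSV; values are invented for illustration.
chan <- data.frame(
  YEAR     = rep(c(2000, 2001), each = 6),
  SITE_ID  = rep(c("s1", "s2"), times = 6),
  VISIT_NO = 1,
  TRANSDIR = rep(c("LF", "RT", "CT"), times = 4),
  UNDERCUT = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2)
)

# Keep only the left and right bank records, as in the post.
undercut <- subset(chan, TRANSDIR %in% c("LF", "RT"))

# Equivalent of: select ... group by year, site_id, visit_no
grp <- aggregate(
  UNDERCUT ~ YEAR + SITE_ID + VISIT_NO,
  data = undercut,
  FUN  = function(v) c(meanUndercut = mean(v),
                       nUndercut    = length(v),
                       stdUndercut  = sd(v))
)

# The three statistics arrive as one matrix column; flatten them into
# ordinary columns named UNDERCUT.meanUndercut, UNDERCUT.nUndercut, ...
grp <- do.call(data.frame, grp)
grp
```

The formula interface drops rows with a missing UNDERCUT by default (na.action = na.omit), which roughly matches how SQL aggregates skip NULLs. For a single grouping variable, tapply(PlantGrowth$weight, PlantGrowth$group, mean) does the same job as the PlantGrowth example.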