[R] aggregate factor
Hi,

I am using aggregate() to compute means for later plotting. Two factors are involved, and the problem is that the values of the second factor (Age) in the means are not in the right order, because 10 comes between 1 and 2. What I really want is the numeric value of Age, but as.numeric() and as.integer() return the level codes instead. Is there an easy way to get the numeric value? I am using R 2.5.1 on Windows.

Thanks,

str(fishdata)
'data.frame': 372 obs. of 6 variables:
 $ Lake: Factor w/ 3 levels "EVANS","JOLLIET",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ Age : int 1 1 1 1 1 1 1 1 1 1 ...
 $ TL  : int 132 120 125 115 130 120 115 110 117 116 ...
 $ W   : int 10 10 10 10 10 10 10 10 10 20 ...
 $ Sex : Factor w/ 3 levels "F","I","M": 1 1 2 2 2 1 1 1 2 2 ...
 $ WT  : num 0.23 0.23 0.23 0.23 0.23 0.23 0.23 0.23 0.23 0.23 ...

fishdatameans=aggregate(fishdata$TL, list(Lake = fishdata$Lake, Age=fishdata$Age), mean)

# Now Age is a Factor but 10 is in the wrong position.
fishdatameans$Age
 [1] 0 1 1 1 2 2 2 3 3 3 4 4 4 5 5 6 6 6 7 8 9 10
Levels: 0 1 10 2 3 4 5 6 7 8 9

as.numeric(fishdatameans$Age)
 [1] 1 2 2 2 4 4 4 5 5 5 6 6 6 7 7 8 8 8 9 10 11 3

# What I want is
0 1 1 1 2 2 2 3 3 3 4 4 4 5 5 6 6 6 7 8 9 10

Bill

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] aggregate factor
Try this (on the aggregated data, where Age has become a factor):

as.numeric(levels(fishdatameans$Age))[fishdatameans$Age]

HTH,

Thierry

ir. Thierry Onkelinx
Research Institute for Nature and Forest (Instituut voor natuur- en bosonderzoek)
Section biometrics, methodology and quality assurance
Gaverstraat 4, 9500 Geraardsbergen, Belgium
www.inbo.be
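The factor-to-number conversion discussed in this thread can be sketched on a toy factor. This is only an illustration: the vector `age` below is made up and stands in for the Age column that aggregate() returned as a factor.

```r
# Hypothetical toy data: ages stored as a factor, with levels sorted as text
age <- factor(c("0", "1", "2", "10"))

levels(age)                # "0" "1" "10" "2"  (text order, so "10" precedes "2")
codes <- as.numeric(age)   # 1 2 4 3           (internal level codes, not the ages)

# Two common ways to recover the numeric values behind the labels
via_levels    <- as.numeric(levels(age))[age]
via_character <- as.numeric(as.character(age))
```

Both conversions give 0 1 2 10. The first converts each level only once and then indexes by the codes, which tends to matter only for long vectors.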
[R] Aggregate daily data into weekly sums
Dear List,

I have a two-variable data frame as follows (the time period of the actual data set is 10 years):

   Date      Amount
1  6/1/2007  1
2  6/1/2007  1
3  6/4/2007  2
4  6/5/2007  2
5  6/11/2007 3
6  6/12/2007 3
7  6/12/2007 3
8  6/13/2007 3
9  6/13/2007 3
10 6/18/2007 4
11 6/18/2007 4
12 6/25/2007 5
13 6/28/2007 5

Basically, I would like to collapse the daily data into weekly sums such that the result looks like the following:

  Date         Amount
1 2007/6/Week1 2
2 2007/6/Week2 4
3 2007/6/Week3 15
4 2007/6/Week4 8
5 2007/6/Week5 10

Is there already a function that aggregates data at a user-defined time frequency? Any pointers would be greatly appreciated.

Jacques

version
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          2
minor          5.0
year           2007
month          04
day            23
svn rev        41293
language       R
version.string R version 2.5.0 (2007-04-23)
Re: [R] Aggregate daily data into weekly sums
Hi,

Perhaps you can try:

df
   Date       Amount
1  2007-06-01 1
2  2007-06-01 1
3  2007-06-04 2
4  2007-06-05 2
5  2007-06-11 3
6  2007-06-12 3
7  2007-06-12 3
8  2007-06-13 3
9  2007-06-13 3
10 2007-06-18 4
11 2007-06-18 4
12 2007-06-25 5
13 2007-06-28 5

df_ok <- aggregate(df$Amount, by=list(df$Amount), FUN=sum)
levels(df_ok$Group.1) <- paste("2007/06/Week", 1:5, sep="")

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
Re: [R] Aggregate daily data into weekly sums
Or,

z <- mydata  # zoo object
new.time <- as.Date(7 * floor(as.numeric(time(z))/7) + 7)
z2 <- aggregate(z, new.time, sum)  # sum, since weekly totals are wanted
Re: [R] Aggregate daily data into weekly sums
Try this. I have changed the output format to yyyy/mm/Weekw so it's ordered.

Lines <- "Date Amount
6/1/2007 1
6/1/2007 1
6/4/2007 2
6/5/2007 2
6/11/2007 3
6/12/2007 3
6/12/2007 3
6/13/2007 3
6/13/2007 3
6/18/2007 4
6/18/2007 4
6/25/2007 5
6/28/2007 5"

# replace next line with
# DF <- read.table("myfile.dat", header = TRUE)
DF <- read.table(textConnection(Lines), header = TRUE)
DF$Date <- as.Date(DF$Date, "%m/%d/%Y")

# weeks since first Sunday after Epoch
# assumes week starts on Sunday. Change 3 to 4 for Monday.
fmt <- function(x) {
  weeks <- function(x) as.numeric(x + 3) %/% 7 + 1
  sprintf("%s%05d", format(x, "%Y/%m/Week"), weeks(x) - weeks(x[1]) + 1)
}
aggregate(DF$Amount, list(Date = fmt(DF$Date)), sum)

# alternative to above using zoo. DF and fmt are from above.
# Returns a zoo object.
library(zoo)
aggregate(zoo(DF$Amount), fmt(DF$Date), sum)
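As a complement to the replies above, here is a minimal base-R sketch of weekly sums. It swaps in a different technique from the ones posted: cut() on a Date vector with breaks = "week", which labels each date by the Monday that starts its week. The variable names and the shortened data are made up for illustration.

```r
# Hypothetical subset of the poster's data
dates  <- as.Date(c("6/1/2007", "6/1/2007", "6/4/2007", "6/5/2007", "6/11/2007"),
                  format = "%m/%d/%Y")
amount <- c(1, 1, 2, 2, 3)

# Each date is mapped to the Monday starting its week (weeks start on Monday by default)
week <- cut(dates, breaks = "week")

# Sum the amounts within each week
weekly <- aggregate(amount, list(Week = week), sum)
weekly
```

With this input the three weeks start on 2007-05-28, 2007-06-04 and 2007-06-11, with sums 2, 4 and 3. Unlike the "WeekN within a month" labels requested in the question, these labels are calendar weeks, so a week that straddles two months is not split.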
[R] aggregate by two columns, sum not working while mean is
Dear Fellow Rers,

I have a table that looks like this:

ca, la, 12
ca, sd, 22
ca, la, 33
nm, al, 9
ma, lx, 18
ma, bs, 90
ma, lx, 22

I want to sum the 3rd column grouped by the first and the second column, so that the result looks like this table:

ca, la, 45
ca, sd, 22
nm, al, 9
ma, lx, 40
ma, bs, 90

Two of the rows (ca, la and ma, lx) are sums. I tried aggregate(table,list(table$V1,table$V2),FUN) with FUN set to sum and to mean; sum did not work while mean did. Can anybody give a hint?

Thanks.

Guanrao
Re: [R] aggregate by two columns, sum not working while mean is
This seems to work fine:

> x <- "ca, la, 12
+ ca, sd, 22
+ ca, la, 33
+ nm, al, 9
+ ma, lx, 18
+ ma, bs, 90
+ ma, lx, 22
+ "
> table <- read.csv(textConnection(x), header=FALSE)
> aggregate(table$V3,list(table$V1,table$V2),mean)
  Group.1 Group.2    x
1      nm      al  9.0
2      ma      bs 90.0
3      ca      la 22.5
4      ma      lx 20.0
5      ca      sd 22.0
> aggregate(table$V3,list(table$V1,table$V2),sum)
  Group.1 Group.2  x
1      nm      al  9
2      ma      bs 90
3      ca      la 45
4      ma      lx 40
5      ca      sd 22

--
Jim Holtman
Cincinnati, OH

What is the problem you are trying to solve?
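A likely explanation for the difference, offered as an assumption since the thread does not confirm it: the original call passed the whole data frame, so the function was also applied to the two text columns. sum() stops with an error on non-numeric input, whereas mean() only warns and returns NA. The sketch below uses a made-up data frame `tab` to show both the failing call and the working one.

```r
# Hypothetical reconstruction of the poster's table
tab <- data.frame(V1 = c("ca", "ca", "ca", "nm", "ma", "ma", "ma"),
                  V2 = c("la", "sd", "la", "al", "lx", "bs", "lx"),
                  V3 = c(12, 22, 33, 9, 18, 90, 22))

# Aggregating every column fails for sum, because V1 and V2 are not numeric
attempt <- try(aggregate(tab, list(tab$V1, tab$V2), sum), silent = TRUE)
failed  <- inherits(attempt, "try-error")

# Aggregating only the numeric column works
sums <- aggregate(tab$V3, list(V1 = tab$V1, V2 = tab$V2), sum)
sums
```

Restricting the first argument to the numeric column, as in the reply above, sidesteps the problem for both sum and mean.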
Re: [R] aggregate in zoo
Can't tell what your Z really looks like; try posting dput(Z) or explain how to create Z from scratch. At any rate, your code has two problems:

1. the result is not a zoo object (that may or may not be a problem)
2. you are combining the two columns together and then taking the mean of that

Try copying and pasting this into your session:

Lines <- "date dayofmonth open
2000-02-01 01 1636.10
2000-03-01 01 1596.75
2000-04-03 03 1737.70
2000-05-01 01 1695.65
2000-06-01 01 1651.90
2000-07-03 03 1669.20
2000-08-01 01 1628.35
2000-09-01 01 1717.35
2000-10-02 02 1614.55
2000-11-01 01 1587.10
2000-12-01 01 1475.60
2001-01-02 02 1450.65
2001-02-01 01 1503.60
2001-03-01 01 1351.95
2001-04-02 02 1268.10
2001-05-01 01 1369.20
2001-06-01 01 1362.75
2001-07-02 02 1331.55
2001-08-01 01 1309.70
2001-09-04 04 1235.55
2001-10-01 01 1109.20
2001-11-01 01 1155.55
2001-12-03 03 1207.30"

library(zoo)
z <- read.zoo(textConnection(Lines), header = TRUE)
year <- function(x) as.numeric(format(x, "%Y"))
sapply(split(z[,2], year(index(z))), mean)

# last line could be replaced with just this
aggregate(z[,2], year, mean)
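The warning quoted in the question ("argument is not numeric or logical") is what mean() reports when it receives text rather than numbers. Whether that was the cause here is an assumption, since the structure of Z was never posted. A minimal base-R sketch of the symptom and of yearly means once the values are numeric, using four made-up rows taken from the posted data:

```r
# Hypothetical illustration: prices that were read in as text
dates    <- as.Date(c("2000-02-01", "2000-03-01", "2001-01-02", "2001-02-01"))
open_txt <- c("1636.10", "1596.75", "1450.65", "1503.60")

# mean() on text warns and returns NA, as in the question
bad <- suppressWarnings(mean(open_txt))

# Convert to numeric, then average within each year
open_num <- as.numeric(open_txt)
yearly   <- tapply(open_num, format(dates, "%Y"), mean)
yearly
```

Checking str() of the object before aggregating is usually the quickest way to spot this kind of type problem.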
[R] Aggregate to find majority level of a factor
I want to use the aggregate function to summarize data by a factor (my field plots), but I want the summary to be the majority level of another factor. For example, given the dataframe:

Plot1 big
Plot1 big
Plot1 small
Plot2 big
Plot2 small
Plot2 small
Plot3 small
Plot3 small
Plot3 small

My desired result would be:

Plot1 big
Plot2 small
Plot3 small

I can't seem to find a scalar function that will give me the majority level.

Thanks in advance,

Jonathan Thompson
Re: [R] Aggregate to find majority level of a factor
How about tapply?

plot <- gl(2,3); plot
type <- letters[c(1,2,2,1,1,1)]; type
tapply(type, list(plot), function(x) {
  tabl <- table(x)
  names(tabl[tabl==max(tabl)])
})

Hank

Dr. Hank Stevens, Assistant Professor
Botany Department, Miami University, Oxford, OH 45056
http://www.cas.muohio.edu/~stevenmh/
Re: [R] Aggregate to find majority level of a factor
Jonathan,

Try this:

DF
     V1    V2
1 Plot1   big
2 Plot1   big
3 Plot1 small
4 Plot2   big
5 Plot2 small
6 Plot2 small
7 Plot3 small
8 Plot3 small
9 Plot3 small

with(DF, aggregate(V2, list(V1), function(x) names(which.max(table(x)))))
  Group.1     x
1   Plot1   big
2   Plot2 small
3   Plot3 small

See ?which.max, ?names and ?table.

HTH,

Marc Schwartz
Re: [R] Aggregate to find majority level of a factor
Jon

One way, assuming your data.frame is 'jon':

aggregate(jon[,2], list(jon[,1]), function(x) levels(x)[which.max(table(x))])
  Group.1     x
1   Plot1   big
2   Plot2 small
3   Plot3 small

HTH

Peter Alspach
Re: [R] Aggregate to find majority level of a factor
This should do the trick. Also labels ties with NA.

a=as.data.frame(cbind(c(1,1,1,2,2,2,3,3,3,4,4),c('big','big','small','big','small','small','small','small','small','big','small')))
a$V2=factor(a$V2)
maj=function(x){
  y=table(x)
  z=which.max(y)
  if(sum(y==max(y))==1){
    return(names(y)[z])
  }else{
    return(NA)
  }
}
aggregate(a$V2,list(a$V1),maj)

--
Mike Lawrence
Graduate Student, Department of Psychology, Dalhousie University
Website: http://myweb.dal.ca/mc973993
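The approaches in this thread share one idea: tabulate the levels within each group and keep the most frequent one. A compact sketch under stated assumptions follows; the data frame `plots` and the helper `majority` are hypothetical names, and returning NA on ties follows the last reply rather than any built-in behaviour.

```r
# Hypothetical reconstruction of the example data
plots <- data.frame(plot = rep(c("Plot1", "Plot2", "Plot3"), each = 3),
                    size = c("big", "big", "small",
                             "big", "small", "small",
                             "small", "small", "small"))

# Most frequent value of x, or NA when two or more values tie for first place
majority <- function(x) {
  counts <- table(x)
  top <- names(counts)[counts == max(counts)]
  if (length(top) == 1) top else NA_character_
}

result <- aggregate(plots$size, list(Plot = plots$plot), majority)
result
```

Note that which.max(), used in two of the replies, silently returns the first of several tied levels, so an explicit tie check like the one above is worth having when ties are meaningful.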
[R] aggregate in zoo
Hi R-experts,

Thanks very much to Jim Holtman and Gabor for their help with my previous question. I am having another problem with data manipulation in zoo. The following is data (Z) for the first business day of every month in zoo format. I am trying to get the mean of open for each year. I subset with

Z <- Z[,2]

and then run

sapply(split(Z, format(index(Z), "%Y")), mean)

I get this error message:

2000 2001 2002 2003 2004 2005 2006 2007
  NA   NA   NA   NA   NA   NA   NA   NA
Warning messages:
1: argument is not numeric or logical: returning NA in: mean.default(X[[1]], ...)
2: argument is not numeric or logical: returning NA in: mean.default(X[[2]], ...)
etc...

Any help on what I'm missing would be appreciated. I am particularly confused by the fact that the command works fine on the original data file (i.e. before subsetting by first day of month). Sorry if I have overlooked something very simple.

Z
           dayofmonth open
2000-02-01 01 1636.10
2000-03-01 01 1596.75
2000-04-03 03 1737.70
2000-05-01 01 1695.65
2000-06-01 01 1651.90
2000-07-03 03 1669.20
2000-08-01 01 1628.35
2000-09-01 01 1717.35
2000-10-02 02 1614.55
2000-11-01 01 1587.10
2000-12-01 01 1475.60
2001-01-02 02 1450.65
2001-02-01 01 1503.60
2001-03-01 01 1351.95
2001-04-02 02 1268.10
2001-05-01 01 1369.20
2001-06-01 01 1362.75
2001-07-02 02 1331.55
2001-08-01 01 1309.70
2001-09-04 04 1235.55
2001-10-01 01 1109.20
2001-11-01 01 1155.55
2001-12-03 03 1207.30

Thank you,

Alfonso Sammassimo
Melbourne, Australia
[R] aggregate similar to SPSS
Hi,

Does anyone know whether R can take a set of numbers and aggregate them the way SPSS can? For example, could you calculate the percentage of people who smoke based on a dataset like the following?

smoke = 1
non-smoke = 2

variable
1 1 1 2 2 1 1 1 2 2 2 2 2 2

When aggregated, SPSS can tell you what percentage of persons are smokers based on the frequency of 1's and 2's. Can R do a similar thing?

Thanks,

Nat
Re: [R] aggregate similar to SPSS
?table

--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
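To spell out the pointer to table(): counting the codes and converting the counts to percentages takes two calls. A minimal sketch using the values from the question; the variable names are made up.

```r
# Values from the question: 1 = smoker, 2 = non-smoker
smoke <- c(1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2)

counts  <- table(smoke)               # 6 smokers, 8 non-smokers
percent <- 100 * prop.table(counts)   # share of each code, in percent

round(percent, 1)                     # 42.9 for code 1, 57.1 for code 2
```

Labelling the codes first, for example with factor(smoke, levels = 1:2, labels = c("smoker", "non-smoker")), makes the resulting table easier to read.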
Re: [R] aggregate similar to SPSS
Hi Nat,

can I suggest, without offending, that you purchase and read Peter Dalgaard's "Introductory Statistics with R", or Michael Crawley's "Statistics: An Introduction using R", or Venables and Ripley's "Modern Applied Statistics with S", or Maindonald and Braun's "Data Analysis and Graphics Using R: An Example-based Approach", or download and read "An Introduction to R" (http://cran.r-project.org/doc/manuals/R-intro.pdf) or one of the numerous contributed documents at http://cran.r-project.org/other-docs.html?

I hope that this helps,

Andrew.

--
Andrew Robinson
Department of Mathematics and Statistics
University of Melbourne, VIC 3010 Australia
http://www.ms.unimelb.edu.au/~andrewpr
Re: [R] aggregate similar to SPSS
For Natalie, who is an SPSS user, may I strongly recommend "R for SAS and SPSS Users" by Bob Muenchen at http://oit.utk.edu/scc/RforSASSPSSusers.pdf

This is a really, really excellent document which has proven to be an invaluable resource in introducing my SAS- and SPSS-using colleagues to the delights of R. And it is free (as in available at no cost).

Tim C
[R] aggregate function
Hello,

is there a way to use the aggregate function to calculate monthly means when one column of the data frame holds the date as yyyy-mm-dd? I know that it works for daily means. I would also like to do it for monthly and yearly means. Maybe there is something like aggregate(x, list(Date[%m]), mean)?

The data frame looks like:

Date       Time  z
2006-01-01 21:00 6,2
2006-01-01 22:00 5,7
2006-01-01 23:00 3,2
2006-01-02 00:00 7,8
2006-01-02 01:00 6,8
2006-01-02 02:00 5,6
. . .
2007-03-30 22:00 5,2
2007-03-30 23:00 8,3
2007-03-31 00:00 6,4
2007-03-31 01:00 7,4

Thanks for help!

--
Michél Schnitz
Re: [R] aggregate function
Try this. The first group of lines recreates your data frame, DF, and the last line is the aggregate:

Input <- "Date Time z
2006-01-01 21:00 6,2
2006-01-01 22:00 5,7
2006-01-01 23:00 3,2
2006-01-02 00:00 7,8
2006-01-02 01:00 6,8
2006-01-02 02:00 5,6
2007-03-30 22:00 5,2
2007-03-30 23:00 8,3
2007-03-31 00:00 6,4
2007-03-31 01:00 7,4"

DF <- read.table(textConnection(Input), header = TRUE, as.is = TRUE)
DF$z <- as.numeric(sub(",", ".", DF$z))
DF$Date <- as.Date(DF$Date)
aggregate(DF["z"], list(yearmon = format(DF$Date, "%Y-%m")), mean)
Re: [R] aggregate function
it works. thanks a lot.

Gabor Grothendieck wrote:
> try this. The first group of lines recreates your data frame, DF, and the last line is the aggregate: [...]

--
Michél Schnitz [EMAIL PROTECTED]
Re: [R] aggregate function
If monthly should aggregate per yyyy-mm combination, you could try something like

aggregate(x$z, list(cut(as.Date(x$Date), "m")), mean)

for monthly aggregation and

aggregate(x$z, list(cut(as.Date(x$Date), "y")), mean)

for yearly means. If monthly aggregation should aggregate over different years (and produce only 12 numbers), maybe

aggregate(x$z, list(format(as.Date(x$Date), "%m")), mean)

works (everything untested).

Be sure to use R 2.4.1 patched or 2.5.0, since there was a bug in cut.Date which prevents the yearly aggregation from working properly before R 2.4.1 patched!

Regards,
Martin

Michel Schnitz wrote:
> Hello, is there a way to use the aggregate function to calculate monthly mean [...]
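Since the three suggestions above are marked untested, here is a minimal sketch that runs them on the sample rows from the question. It assumes a data frame named x with columns Date and z, that the decimal commas have already been converted to points, and it spells the cut() breaks out as "month" and "year" rather than abbreviating them. It was not checked against R 2.4.x.

```r
# Sample rows copied from the question, decimal commas already converted.
x <- data.frame(
  Date = c("2006-01-01", "2006-01-01", "2006-01-01",
           "2006-01-02", "2006-01-02", "2006-01-02",
           "2007-03-30", "2007-03-30", "2007-03-31", "2007-03-31"),
  z = c(6.2, 5.7, 3.2, 7.8, 6.8, 5.6, 5.2, 8.3, 6.4, 7.4)
)
x$Date <- as.Date(x$Date)

# One mean per calendar month (year and month together).
monthly <- aggregate(x["z"], list(month = cut(x$Date, "month")), mean)

# One mean per calendar year.
yearly <- aggregate(x["z"], list(year = cut(x$Date, "year")), mean)

# One mean per month of the year, pooled over all years (at most 12 rows).
by_month <- aggregate(x["z"], list(month = format(x$Date, "%m")), mean)

print(monthly)
print(yearly)
print(by_month)
```

cut() labels each group with the first day of its month or year, while format() gives plain strings such as "01"; which is more convenient depends on how the result is plotted afterwards.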
[R] Aggregate with numerous factors
Dear list members,

I am facing some problems using the aggregate() function. I want to calculate a sum and a mean of one variable over the combination of 12 factors with aggregate(), to avoid loops, but it doesn't work (or the job takes far too long: it exceeds 2 hours). It works with fewer factors, so I constructed a single factor from the level combinations of 7 of the factors (I need the other ones on their own). That left me with 6 factors, but it still doesn't work.

Could someone tell me how to fix the problem, or suggest another function I could use?

Thank you very much,
Joachim Claudet
EPHE - CNRS FRE 2935, Perpignan
Re: [R] Aggregate with numerous factors
Joachim Claudet wrote:
> I want to calculate a sum and a mean of one variable over the combination of 12 factors with the aggregate() function to avoid loops but it doesn't work (or the job is far too long, it exceeds 2 hours). [...]

aggregate() is (currently) a wrapper for tapply(), so it generates a table which is indexed by the cartesian product of all the factors. If many cells are empty, you can reduce the work by calculating the interaction factor up front and removing levels that are not present in the data. This is pretty much the idea you already had, unless you forgot the bit about removing unused levels. You could potentially extend the idea to all 12 factors, and then extract the ones you want on their own from the result.

Alternatively, rewrite aggregate() and send us a patch ;-) It is not necessarily all that hard. Here's a rough idea

IX <- as.data.frame(by)
OO <- do.call(order, IX)
Y <- x[OO,]
g <- cumsum(!duplicated(IX))
FF <- unique(IX)
cbind(FF, sapply(split(x,g),FUN))

(completely untested, of course, and if it works, it works only for a single-column x; otherwise, you need a loop over the columns somehow.)

--
Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen
Re: [R] Aggregate with numerous factors
Peter Dalgaard wrote:
> Alternatively, rewrite aggregate() and send us a patch ;-) It is not necessarily all that hard. Here's a rough idea
>
> IX <- as.data.frame(by)
> OO <- do.call(order, IX)
> Y <- x[OO,]
> g <- cumsum(!duplicated(IX))
> FF <- unique(IX)
> cbind(FF, sapply(split(x,g),FUN))
>
> (completely untested, of course, and if it works, it works only for a single-column x; otherwise, you need a loop over the columns somehow.)

I see two glaring blunders already... You need IX[OO,] in two places, and split(Y, g) not x.

--
Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen
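Folding the two corrections into the rough idea gives something like the sketch below. The function name aggregate_present is hypothetical, and the sketch assumes x is a single vector, by is a named list of grouping vectors, and FUN returns one value per group. It only visits combinations that occur in the data, which is the point of the suggestion; it is not the code that aggregate() itself uses.

```r
# Hypothetical helper: group-wise FUN over only the combinations present in the data.
aggregate_present <- function(x, by, FUN, ...) {
  IX <- as.data.frame(by)
  OO <- do.call(order, unname(as.list(IX)))  # row order sorted by all grouping columns
  IX <- IX[OO, , drop = FALSE]               # first correction: reorder the index too
  Y  <- x[OO]
  g  <- cumsum(!duplicated(IX))              # run id, increases at each new combination
  FF <- unique(IX)                           # one row per combination, same order as g
  FF$x <- unname(sapply(split(Y, g), FUN, ...))  # second correction: split Y, not x
  rownames(FF) <- NULL
  FF
}

vals <- c(1, 2, 3, 4, 5, 6)
grp  <- list(a = c("u", "v", "u", "v", "u", "w"),
             b = c(1, 1, 1, 2, 1, 1))
res  <- aggregate_present(vals, grp, sum)
print(res)
```

Because the rows are sorted first, equal combinations sit next to each other, so a running count of "first occurrences" is enough to label the groups without building the full cartesian table.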
[R] Aggregate?
Hi All,

I think I'm failing to understand how aggregate() is supposed to work. Example:

test1 <- sample(c(0,1),100,replace=T)
test2 <- sample(letters,100,replace=T)
aggregate(test1,list(test2),sum)

Error in data.frame(w, lapply(y, unlist, use.names = FALSE)) :
  arguments imply differing number of rows: 26, 0

I thought this would give me a list containing the number of ones that belong to each letter. What am I doing wrong?

Thanks in advance,
Gustaf
Re: [R] Aggregate?
It does that for me without errors ... (R 2.3.1 on Mac OSX 10.4.8)

Best,
Ingmar

From: Gustaf Rydevik [EMAIL PROTECTED]
Date: Fri, 8 Dec 2006 12:58:01 +0300
> I think i'm failing to undersatnd how aggregate() is supposed to work. [...]
Re: [R] Aggregate?
Hi,

look at your workspace with ls(). I bet there is some mismatch in variables, as your example works for me without any error. You probably redefined the sum function.

test1 <- sample(c(0,1),100,replace=T)
test2 <- sample(letters,100,replace=T)
aggregate(test1,list(test2),sum)
  Group.1 x
1       b 1
2       c 3
3       d 1
4       e 4

sum <- 5
aggregate(test1,list(test2),sum)
Error in FUN(X[[1]], ...) : argument INDEX is missing, with no default

HTH
Petr

On 8 Dec 2006 at 12:58, Gustaf Rydevik wrote:
> I think i'm failing to undersatnd how aggregate() is supposed to work. [...]

Petr Pikal [EMAIL PROTECTED]
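The masking diagnosis above can be checked directly. This is a minimal sketch, assuming a stray variable called sum is the culprit (the thread does not confirm that this caused the original error, and the error text differs between R versions). The seed is arbitrary and only makes the sample repeatable.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable
test1 <- sample(c(0, 1), 100, replace = TRUE)
test2 <- sample(letters, 100, replace = TRUE)

sum <- 5                      # a stray variable that masks the name "sum"
masked <- !is.function(sum)   # TRUE: passed as an argument, sum is now the number 5

# Naming the namespace explicitly sidesteps whatever is in the workspace.
counts <- aggregate(test1, list(letter = test2), base::sum)

rm(sum)                       # remove the stray variable
restored <- is.function(sum)  # TRUE: the name resolves to the base function again

print(counts)
```

Note that calling sum(test1) keeps working even while the variable exists, because R skips non-functions when looking up a name in call position; the trouble only appears when the bare name is passed as an argument such as FUN.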
Re: [R] Aggregate with multiple statistics?
Try the summary function, which pretty much does exactly that.

-Alex

On 20 Oct 2006, at 23:44, Jonathan Greenberg wrote:
> Is there a way to calculate, say, the mean, min and max using aggregate using one line of code? [...]
[R] Aggregate with multiple statistics?
Is there a way to calculate, say, the mean, min and max using aggregate using one line of code? Or do I need to call them separately (e.g. aggregate(...,mean); aggregate(...,min)) and then merge the data back together?

--j

--
Jonathan A. Greenberg, PhD
NRC Research Associate, NASA Ames Research Center
Re: [R] Aggregate with multiple statistics?
Try summaryBy in package doBy, e.g. using the built-in dataset CO2:

summaryBy(uptake ~ Plant, CO2, FUN = c(mean, min, max))

On 10/20/06, Jonathan Greenberg [EMAIL PROTECTED] wrote:
> Is there a way to calculate, say, the mean, min and max using aggregate using one line of code? [...]
Re: [R] Aggregate with multiple statistics?
> Try summaryBy in package doBy. e.g. using the built in dataset CO2:
> summaryBy(uptake ~ Plant, CO2, FUN = c(mean, min, max))

Or with reshape with a little more work:

cm <- melt(CO2, id=1:4)
cast(cm, Type ~ Treatment, c(min,mean,max))

but you get some extra flexibility:

cast(cm, result_variable + Type ~ Treatment, c(min,mean,max))
cast(cm, Type ~ Treatment ~ result_variable, c(min,mean,max))
cast(cm, Type + Treatment ~ result_variable, c(min,mean,max))

Regards,
Hadley
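For comparison, the same three statistics can be had from aggregate() alone by letting FUN return a named vector. This is a minimal sketch that relies on the formula interface of aggregate(), which was added to R after this thread, so it assumes a current version of R; it is not what any of the posters proposed.

```r
# FUN returns three named values per group; aggregate() stores them as one matrix column.
stats <- aggregate(uptake ~ Plant, data = CO2,
                   FUN = function(v) c(mean = mean(v), min = min(v), max = max(v)))

# Flatten the matrix column into ordinary columns: uptake.mean, uptake.min, uptake.max.
flat <- do.call(data.frame, stats)
print(flat)
```

The flattening step is needed only if ordinary columns are wanted; without it, the statistics are reached as stats$uptake[, "mean"] and so on.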
[R] Aggregate Values for All Levels of a Factor
Hello,

I'm a novice user trying to figure out how to retain NA aggregate values. For example, given a data frame with data for 3 of the 4 possible factor colors (orange is omitted from the data frame), I want to calculate the average height by color, but I'd like to retain the knowledge that orange is a possible factor level; it's just missing. Here is the example code:

data <- data.frame(color = factor(c("blue","red","red","green","blue"), levels = c("blue","red","green","orange")), height = c(2,8,4,4,5))
aggregate(data$height, list(color = data$color), mean)
  color   x
1  blue 3.5
2   red 6.0
3 green 4.0

Instead I would like to get

   color   x
1   blue 3.5
2    red 6.0
3  green 4.0
4 orange  NA

Is this possible? I've read as much documentation as I can find, but am unable to find the solution. It seems like something people would need to do, so I would assume it must be built in somewhere. Or do I need to write my own version of aggregate?

Thanks in advance,
Kaom
Re: [R] Aggregate Values for All Levels of a Factor
On Thu, 2006-10-05 at 15:44 -0700, Kaom Te wrote:
> I'm a novice user trying to figure out how to retain NA aggregate values. [...]

If you review the Details section of ?aggregate, you will note:

  Empty subsets are removed, ...

Thus, one approach is:

tmp <- tapply(data$height, data$color, mean, na.rm = TRUE)
tmp
  blue    red  green orange
   3.5    6.0    4.0     NA

DF <- data.frame(color = names(tmp), mean.height = tmp, row.names = seq(along = tmp))
DF
   color mean.height
1   blue         3.5
2    red         6.0
3  green         4.0
4 orange          NA

HTH,
Marc Schwartz
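Another way to get the same table, if you prefer to stay with aggregate(), is to merge its result back onto the full set of levels. This is a minimal sketch, not part of the reply above; the names d, means, all_levels and res are made up for the illustration.

```r
d <- data.frame(
  color  = factor(c("blue", "red", "red", "green", "blue"),
                  levels = c("blue", "red", "green", "orange")),
  height = c(2, 8, 4, 4, 5)
)

# aggregate() drops the empty level, so this has three rows.
means <- aggregate(d["height"], list(color = d$color), mean)

# A one-column table of every declared level, present in the data or not.
all_levels <- data.frame(color = levels(d$color))

# all.x = TRUE keeps every level and fills the missing mean with NA.
res <- merge(all_levels, means, by = "color", all.x = TRUE)
print(res)
```

merge() sorts the rows by the key, so the colors come back in alphabetical order rather than in the order of the factor levels.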
[R] aggregate function with 'NA'
Dear r-help reader,

I have some problems with the aggregate function. My data frame looks like

frame
  Day Time V1 V2
1   M    0  3 NA
2   M    0  4 NA
3   M    0  5  2
4   M    1 NA  4
5   M    1 10  6
6   T    0  4 45
7   T    1  4  3
8   T    1  3  2
9   T    1  6  1

I used the aggregate function to obtain the mean of V1 and V2 over the grouping variables Time and Day:

aggregate(frame[,c(-1)],list(frame$Day,frame$Time),mean)
  Group.1 Group.2 Time   V1 V2
1       M       0    0 4.00 NA
2       T       0    0 4.00 45
3       M       1    1   NA  5
4       T       1    1 4.33  2

My problem is now that I do not obtain a 'mean' for Day=M/Time=0 and Day=M/Time=1, because aggregate ignores all values for a grouping variable if NA occurs. I'm now hoping for some help so that the mean is still calculated in this group. My table should look like:

aggregate(frame[,c(-1)],list(frame$Day,frame$Time),mean)
  Group.1 Group.2 Time   V1 V2
1       M       0    0 4.00  2
2       T       0    0 4.00 45
3       M       1    1   10  5
4       T       1    1 4.33  2

I hope my description makes sense and appreciate any help.

Yours,
Frank
Re: [R] aggregate function with 'NA'
Frank [EMAIL PROTECTED] writes:
> aggregate(frame[,c(-1)],list(frame$Day,frame$Time),mean)
> My problem is now that I do not obtain a 'mean' for Day=M/Time=0 and Day=M/Time=1, because aggregate ignores all values for a grouping variable if NA occurs.

No. But mean() will give an NA result if any values are NA.

> I'm now hoping for some help so that the mean is still calculated in this group.

add na.rm=TRUE

--
Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen
Re: [R] aggregate function with 'NA'
aggregate(frame[,c(-1)],list(frame$Day,frame$Time),mean, na.rm=T)

2006/10/1, Frank [EMAIL PROTECTED]:
> I have some problems with the aggregate function. [...]

--
Johan Sandblom
Re: [R] aggregate function with 'NA'
See ?mean and note the na.rm= argument:

aggregate(frame[-1], frame[1:2], mean, na.rm = TRUE)

On 10/1/06, Frank [EMAIL PROTECTED] wrote:
> I have some problems with the aggregate function. [...]
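To see the effect of na.rm on the posted data, here is a minimal sketch that rebuilds the data frame from the question (values copied from the post) and passes na.rm = TRUE through aggregate() to mean(). Selecting the columns by name instead of by position is a choice made for this sketch only.

```r
# Data copied from the question.
frame <- data.frame(
  Day  = c("M", "M", "M", "M", "M", "T", "T", "T", "T"),
  Time = c(0, 0, 0, 1, 1, 0, 1, 1, 1),
  V1   = c(3, 4, 5, NA, 10, 4, 4, 3, 6),
  V2   = c(NA, NA, 2, 4, 6, 45, 3, 2, 1)
)

# Extra arguments after FUN are handed on to FUN, so mean() receives na.rm = TRUE.
res <- aggregate(frame[c("V1", "V2")], frame[c("Day", "Time")], mean, na.rm = TRUE)
print(res)
```

One caveat: a group in which every value is NA would come back as NaN rather than NA, because mean() of an empty set is NaN. That does not happen with this data.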
Re: [R] aggregate example : where is the state.region variable?
--- MARK LEEDS [EMAIL PROTECTED] wrote:
> these people/experts provide all these packages and documentation as a FAVOR and for the fact that they enjoy spreading knowledge/statistical computing abilities etc. It's not their job so I think criticism of the docs and the fact that they use a variable from another place is kind of harsh.
> Mark

I am very appreciative of the time, expertise and great helpfulness that I have seen in the R community. If there is no criticism of R then how do we find out about problems that may exist?
Re: [R] aggregate example : where is the state.region variable?
Gabor Grothendieck [EMAIL PROTECTED] wrote on Mon, 21 Aug 2006 21:03:49 -0400:
> It is worthwhile to note that what is being illustrated here is aggregating a numeric matrix by a factor using the aggregate.default method and, of course, a factor can't be part of a numeric matrix.
> Of course, that is not to say that the examples could not be improved in terms of clarity, simplicity and comprehensiveness (there is no example of aggregate.data.frame).

Yes, thank you, Gabor. And we (the R developers) have accepted and incorporated quite a few constructive proposals for improvement.

Just offending the original authors ("bloody ..") without adding any constructive proposal for improvement doesn't really help. You can always get the money back you paid for R. You can also decide to leave this mailing list and get the money back you paid for that service. Unfortunately, we can't get the time and energy back we've lost when dealing with such postings...

Martin Maechler, ETH Zurich
Re: [R] aggregate example : where is the state.region variable?
> there is no factor in the dataset but why there is not one and why a call to another dataset is totally opaque.

The reason is purely historical. The state dataset is about 10 years older than the data.frame concept. At the time the state.* variables were constructed it was not possible to put numeric data and factor data into the same rectangular structure.
Re: [R] aggregate example : where is the state.region variable?
--- Richard M. Heiberger [EMAIL PROTECTED] wrote:
> The reason is purely historical. The state dataset is about 10 years older than the data.frame concept. At the time the state.* variables were constructed it was not possible to put numeric data and factor data into the same rectangular structure.

I see. So originally the example would have been more obvious. Thanks
[R] aggregate example : where is the state.region variable?
I was looking at ?aggregate and ran the first example

aggregate(state.x77, list(Region = state.region), mean)

The variables in state.x77 appear to be:

state.x77
Population Income Illiteracy Life Exp Murder HS Grad Frost Area

Where is the state.region variable coming from?
Re: [R] aggregate example : where is the state.region variable?
It's not part of state.x77. It's a completely separate variable. Try

ls("package:datasets")

and notice it's in the list, or try ?state.region and note that it's a variable in datasets.

On 8/21/06, John Kane [EMAIL PROTECTED] wrote:
> Where is the state.region variable coming from?
Re: [R] aggregate example : where is the state.region variable?
On Mon, 21 Aug 2006, John Kane wrote:

Where is the state.region variable coming from?

    find("state.region")
    [1] "package:datasets"

Try ?state.region for more info.

--
Brian D. Ripley, Professor of Applied Statistics, University of Oxford
Re: [R] aggregate example : where is the state.region variable?
--- Gabor Grothendieck wrote:

It's not part of state.x77. It's a completely separate variable.

Thanks. I was wondering if it was something like that.

However, it is a bloody stupid example, at least to a newbie. A call to another dataset in what is supposed to be a simple example is very confusing.

When someone is apparently illustrating a function with a simple one-line command, I don't expect them to call another data set, apparently create a new variable (Region), and use that new variable as the grouping variable without a word of explanation of what the example is doing.

If I sound a bit annoyed it is because I am. It might be nice to have an example illustrate the function, not do a couple of other undocumented things as well.
Re: [R] aggregate example : where is the state.region variable?
These people/experts provide all these packages and documentation as a FAVOR and because they enjoy spreading knowledge, statistical computing abilities, etc. It's not their job, so I think criticism of the docs and of the fact that they use a variable from another place is kind of harsh.

Mark
Re: [R] aggregate example : where is the state.region variable?
It is worthwhile to note that what is being illustrated here is aggregating a numeric matrix by a factor using the aggregate.default method and, of course, a factor can't be part of a numeric matrix.

Of course, that is not to say that the examples could not be improved in terms of clarity, simplicity and comprehensiveness (there is no example of aggregate.data.frame).
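To make the point about the missing aggregate.data.frame example concrete, here is a minimal sketch, not taken from the R documentation, that first binds the separate state.region factor onto the state.x77 matrix and then aggregates the resulting data frame. The names `states` and `by_region` are made up for illustration.

```r
# state.x77 is a numeric matrix and state.region is a separate factor,
# both shipped in the datasets package.
find("state.region")

# Put the matrix columns and the factor into one data frame.
states <- data.frame(state.x77, Region = state.region)

# Mean of two numeric columns within each region.
by_region <- aggregate(states[c("Population", "Income")],
                       by = list(Region = states$Region),
                       FUN = mean)
print(by_region)
```

Once the factor lives in the same data frame as the numeric columns, the grouping variable no longer appears to come from nowhere.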
[R] aggregate data.frame by one column
Hi, everyone,

I have a data.frame named eva like this:

    IND PARTNO VC1 EO1 EO2 EO3 EO4 EO5
    114 114001   2   5   4   4   5   4
    114 114001   2   4   4   4   4   4
    114 114001   2   4  NA  NA  NA  NA
    112 112002   2   3   3   6   2   6
    112 112002   2   1   1   3   4   4
    112 112003   2   6   6   6   5   6
    112 112003   2   5   7   6   6   6
    112 112003   2   6   6   6   4   5
    114 114004   2   2   3   3   2   4
    114 114004   2   5   3   4   4   2
    114 114004   2  NA  NA  NA  NA  NA
    113 113005   2   5   5   6   6   5
    113 113005   2   7   7   4   7   6
    111 111006   2   5   7   7   7   7
    112 112007   2   7   7   7   2   2
    112 112007   2   6   6   6   1   2
    112 112007   2   7   6   6   2   2
    111 111008   2   4   1   3   1   4
    111 111008   2   3   1   5   3   2

This is only a small part of the whole data. PARTNO is a numeric variable and I want to use it as a grouping variable to aggregate the other variables. What I want to get looks like this:

    IND PARTNO NUM VC1 EO1 EO2 EO3 EO4 EO5
    114 114001   3   2 4.3 4   4   4.5 4
    112 112002   2   2 2   2   4.5 3   5
    112 112003   3   2 5.7 6.3 6   5   5.7
    114 114004   3   2 3.5 3   3.5 3   3
    113 113005   2   2 6   6   5   6.5 5.5
    111 111006   1   2 5   7   7   7   7
    112 112007   3   2 6.7 6.3 6.3 1.7 2
    111 111008   2   2 3.5 1   4   2   3

NUM is a newly added variable which gives the number of cases in each group defined by PARTNO.

I have two questions on this manipulation. The first is how to get the newly added variable NUM. I have no idea how to do this. The second is how to average the other variables by group. If there are NAs, I want the average to be computed over the remaining cases. For example, the variable EO1 has values of 2, 5, and NA on case 114004.

What I have done is

    aggregate(eva[,-2], by=eva[,-2], mean)

But it seems that aggregate cannot process the data because of the NAs. Because the NA values are not a small part of the data, I cannot use imputation methods.

I'm not sure whether my operation is right. Does anyone have any suggestion on the two problems?

Thanks in advance!
Re: [R] aggregate data.frame by one column
Hi Wei-Wei,

try this:

    eva.agg <- aggregate(x = list(VC1=eva$VC1, EO1=eva$EO1, EO2=eva$EO2,
                                  EO3=eva$EO3, EO4=eva$EO4, EO5=eva$EO5),
                         by = list(PARTNO=eva$PARTNO),
                         FUN = mean, na.rm = TRUE)
    # aggregate() returns a data frame; its x column holds the group counts
    eva.agg$NUM <- aggregate(eva$PARTNO, list(eva$PARTNO), length)$x

Cheers

Andrew

--
Andrew Robinson
Department of Mathematics and Statistics, University of Melbourne, VIC 3010 Australia
http://www.ms.unimelb.edu.au
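For readers who want to try the group means and group counts on something small, here is a minimal sketch under stated assumptions: the five-row `eva` below is a cut-down stand-in for the poster's data, and merging a separate counts table is just one way to attach NUM, not necessarily what was used in the thread.

```r
# Cut-down stand-in for the poster's data (two groups, one NA).
eva <- data.frame(
  PARTNO = c(114001, 114001, 114001, 112002, 112002),
  EO1    = c(5, 4, 4, 3, 1),
  EO2    = c(4, 4, NA, 3, 1)
)

# Group means; na.rm = TRUE is passed through to mean(), so NAs are skipped.
means <- aggregate(eva[c("EO1", "EO2")],
                   by = list(PARTNO = eva$PARTNO),
                   FUN = mean, na.rm = TRUE)

# Number of rows per group, as a separate two-column data frame.
counts <- aggregate(eva$PARTNO, by = list(PARTNO = eva$PARTNO), FUN = length)
names(counts)[2] <- "NUM"

# Join counts and means on the grouping key.
eva.agg <- merge(counts, means, by = "PARTNO")
print(eva.agg)
```

Note that NUM counts all rows in the group, including rows where some measurements are NA, which matches the NUM column in the desired output above.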
Re: [R] aggregate data.frame by one column
Hi Andrew,

Thank you very much! It works even better than I expected.

All the best,
Wei-Wei
[R] Aggregate?
Hello, I have a data set with a grouping variable (TRIPID) and several other variables. TRIPID is repeated in some areas and I would like to use a function like aggregate to sum the variable UNITS according to TRIPID. However I would also like to retain the other variables as they are in the data set with the new summed TRIPID. So what I have is something like this: YEARMONTH DAY CONTINUESPL AREACOUNTY DEPTH DEPUNIT GEARGEAR2 TRAPS SOAKTIMEUNITS FACTOR DISPOSIT NUMSETS TRIPST TRIPID 19921 26 1 SP0073928 8 25 4 NA 100 NA NA NA 161 1 NA NA NA 02163399054 19921 26 1 SP0073928 8 25 4 NA 100 NA NA NA 8 1 NA NA NA 02163399054 19921 26 2 SP0004228 8 25 4 NA 100 NA NA NA 161 1 NA NA NA 02163399054 19921 26 2 SP0004228 8 25 4 NA 100 NA NA NA 8 1 NA NA NA 02163399054 19921 25 NA SP0052652 8 25 4 NA 100 NA NA NA 85 1 NA NA NA 02163399057 19921 26 NA SP0037940 8 25 4 NA 100 NA NA NA 70 1 NA NA NA 02163399058 19921 27 NA SP0072357 8 25 4 NA 100 NA NA NA 15 1 NA NA NA 02163399059 19921 27 NA SP0072357 8 25 4 NA 100 NA NA NA 20 1 NA NA NA 02163399059 19921 27 NA SP0026324 8 25 4 NA 100 NA NA NA 8 1 NA NA NA 02163399060 19921 28 1 SP0072357 8 25 4 NA 100 NA NA NA 2001 NA NA NA 02163399062 And what I want is this: YEARMONTH DAY CONTINUESPL AREACOUNTY DEPTH DEPUNIT GEARGEAR2 TRAPS SOAKTIMEUNITS FACTOR DISPOSIT NUMSETS TRIPST TRIPID 19921 26 1 SP0073928 8 25 4 NA 100 NA NA NA 3381 NA NA NA 02163399054 19921 25 NA SP0052652 8 25 4 NA 100 NA NA NA 85 1 NA NA NA 02163399057 19921 26 NA SP0037940 8 25 4 NA 100 NA NA NA 70 1 NA NA NA 02163399058 19921 27 NA SP0072357 8 25 4 NA 100 NA NA NA 35 1 NA NA NA 02163399059 19921 27 NA SP0026324 8 25 4 NA 100 NA NA NA 8 1 NA NA NA 02163399060 19921 28 1 SP0072357 8 25 4 NA 100 NA NA NA 2001 NA NA NA 02163399062 Does anyone know how to do this. Data file is attached. Thanks in advance Cameron Guenther, Ph.D. Associate Research Scientist FWC/FWRI, Marine Fisheries Research 100 8th Avenue S.E. St. Petersburg, FL 33701 (727)896-8626 Ext. 
4305
Re: [R] Aggregate?
Suppose we want to sum C over levels of A and that B is constant within levels of A. Then:

    DF <- data.frame(A = gl(2,2), B = gl(2,2), C = 1:4)  # test data
    do.call("rbind", by(DF, DF$A, function(x) replace(x[1,], "C", sum(x$C))))

On 5/3/06, Guenther, Cameron wrote:

I would like to use a function like aggregate to sum the variable UNITS according to TRIPID. However I would also like to retain the other variables as they are in the data set with the new summed TRIPID.
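As an alternative sketch of the same idea (keep the first row of each group, but replace one column with the group total), the following uses tapply() and duplicated() instead of by(). The three-row `DF` and its column values are invented for illustration; only the column names TRIPID and UNITS come from the question.

```r
# Invented toy data using two column names from the question.
DF <- data.frame(TRIPID = c("t1", "t1", "t2"),
                 DAY    = c(26, 26, 25),
                 UNITS  = c(161, 8, 85))

# Total UNITS per trip, named by TRIPID.
totals <- tapply(DF$UNITS, DF$TRIPID, sum)

# Keep only the first row seen for each TRIPID...
first_rows <- DF[!duplicated(DF$TRIPID), ]

# ...and overwrite UNITS with the matching group total.
first_rows$UNITS <- as.vector(totals[as.character(first_rows$TRIPID)])
print(first_rows)
```

This assumes, as in the reply above, that the retained columns are constant within a trip; where they are not, the first row's values win.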
Re: [R] aggregate function....
Nice trick, thx...

Stéphane.

--
Stephane CRUVEILLER, Ph.D.
Genoscope - Centre National de Sequencage, Atelier de Genomique Comparative
[R] aggregate function....
Dear R users,

I have some trouble with the aggregate function. Here are my data:

    daf
    S_id AF_Class count... R_gc_percent S_length
    5 8264497 1 30 0.4835678
    6 8264497 3 7 0.4835678
    8 8264554 1 31 0.5138894
    9 8264554 2 11 0.5138894
    10 8264554 3 1 0.5138894

For a given S_id, I would like to select the line corresponding to the max count. To do this, I used:

    aggregate(daf,list(daf$S_id),max)
    Group.1 S_id AF_Class count... R_gc_percent S_length
    1 8264497 8264497 3 30 0.4835678
    2 8264554 8264554 3 31 0.5138894

which is OK for the count. But I realized that the max function is also applied to AF_Class (it should be 1 and 1 instead of 3 and 3), so it seems that aggregate is not the appropriate function for what I want to do. Is there any other function I could use instead?

Best wishes,

Stéphane.

--
Stephane CRUVEILLER, Ph.D.
Genoscope - Centre National de Sequencage, Atelier de Genomique Comparative
Re: [R] aggregate function....
try 'by':

    x
    S_id AF_Class count... R_gc_percent S_length
    5 8264497 1 30 0.4835678
    6 8264497 3 7 0.4835678
    8 8264554 1 31 0.5138894
    9 8264554 2 11 0.5138894
    10 8264554 3 1 0.5138894

    do.call('rbind', by(x, x$S_id, function(y) y[which.max(y$AF_Class),]))
    S_id AF_Class count... R_gc_percent S_length
    8264497 8264497 3 7 0.4835678
    8264554 8264554 3 1 0.5138894

--
Jim Holtman
Cincinnati, OH

What the problem you are trying to solve?
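The posted call selects on AF_Class, while the question asked for the row with the largest count within each S_id. A minimal sketch of that variant follows, using only the three columns that can be read unambiguously from the post (the elided columns are left out, and the column name `count` is assumed from the truncated header).

```r
# The three columns recoverable from the post.
x <- data.frame(S_id     = c(8264497, 8264497, 8264554, 8264554, 8264554),
                AF_Class = c(1, 3, 1, 2, 3),
                count    = c(30, 7, 31, 11, 1))

# For each S_id, keep the whole row where count is largest.
best <- do.call("rbind",
                by(x, x$S_id, function(y) y[which.max(y$count), ]))
print(best)
```

Because by() hands each group to the function as a complete data frame, the other columns stay attached to the selected row, which is exactly what aggregate() with max cannot do.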
[R] aggregate data.frame using column-specific functions
Dear Colleagues,

does anybody know how to aggregate a data.frame using different functions for different columns?

Sincerely

Markus Preisetanz
Consultant, Client Vela GmbH
Re: [R] aggregate data.frame using column-specific functions
you can use mapply()...

    z <- as.data.frame(matrix(1:3, 3, 3, TRUE))
    mapply(function(x, y) x(y), c(sum, prod, sum), z)

Markus Preisetanz wrote:

does anybody know how to aggregate a data.frame using different functions for different columns?
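The mapply() call applies one function per column over the whole data frame. A hedged sketch of how the same idea might be extended to grouped data is below; the data frame `df`, the grouping column `g`, and the `funs` list are all hypothetical names introduced for this illustration.

```r
# Hypothetical data: one grouping column and two value columns.
df <- data.frame(g = c("a", "a", "b", "b"),
                 x = 1:4,
                 y = c(2, 4, 6, 10))

# One summary function per value column, keyed by column name.
funs <- list(x = sum, y = mean)

# Apply each column's own function within each group.
cols <- lapply(names(funs), function(col) {
  as.vector(tapply(df[[col]], df$g, funs[[col]]))
})
names(cols) <- names(funs)

# tapply() orders groups by factor level, so label rows the same way.
res <- data.frame(g = levels(factor(df$g)), cols)
print(res)
```

Here x is summed and y is averaged within each level of g; adding another column only requires another entry in `funs`.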
Re: [R] aggregate data.frame using column-specific functions
Hi

have you tried ?aggregate

e.g.

    df1 <- aggregate(mydata, list(mean1=x1, mean2=x2), mean)
Re: [R] aggregate vs tapply; is there a middle ground?
Thanks Peter! I had a feeling that there must be a simpler, better, more elegant solution.

/Hans

--
Hans Gardfjell
Ecology and Environmental Science, Umeå University, 90187 Umeå, Sweden
[R] aggregate vs tapply; is there a middle ground?
Dear all,

I want to do a series of comparisons among 4 categorical variables:

    a <- aggregate(y, list(var1, var2, var3, var4), sum)

This gets me a very nice 2-dimensional data frame with one column per variable, BUT, as the help for aggregate says, empty subsets are removed. I don't see in help(aggregate) how I can change this.

In contrast,

    a <- tapply(y, list(var1, var2, var3, var4), sum)

gives me results for everything including empty subsets, but in an awkward 4-dimensional array that takes me another 10 lines of inefficient code to turn into a 2D data.frame.

Is there a way to do this calculation directly, INCLUDING results for empty subsets, and still obtain a 2D array, matrix, or data.frame? OR alternatively, is there a simple way to mush the 4D result from tapply into a 2D matrix/data.frame?

thanks very much in advance for any help!

-jlb

--
Joseph P. LeBouton
Forest Ecology PhD Candidate, Department of Forestry, Michigan State University
[R] aggregate vs tapply; is there a middle ground?
I faced a similar problem. Here's what I did:

    tmp <- data.frame(A=sample(LETTERS[1:5],10,replace=TRUE),
                      B=sample(letters[1:5],10,replace=TRUE),
                      C=rnorm(10))
    tmp1 <- with(tmp, aggregate(C, list(A=A, B=B), sum))
    tmp2 <- expand.grid(A=sort(unique(tmp$A)), B=sort(unique(tmp$B)))
    merge(tmp2, tmp1, all.x=TRUE)

At least fewer than 10 extra lines of code. Anyone with a simpler solution?

Cheers,
Hans

--
Hans Gardfjell
Ecology and Environmental Science, Umeå University, 90187 Umeå, Sweden
Re: [R] aggregate vs tapply; is there a middle ground?
Hans Gardfjell wrote:

At least fewer than 10 extra lines of code. Anyone with a simpler solution?

Well, you can almost do this with the reshape package:

    tmp <- data.frame(A=sample(LETTERS[1:5],10,replace=TRUE),
                      B=sample(letters[1:5],10,replace=TRUE),
                      C=rnorm(10))
    a <- recast(tmp, A + B ~ ., sum)
    # see also recast(tmp, A ~ B, sum)
    add.all.combinations(a, row="A", cols = "B")

Where add.all.combinations basically does what you outlined above -- it would be easy enough to generalise to multiple dimensions.

Hadley
Re: [R] aggregate vs tapply; is there a middle ground?
Thanks, Phil! I've literally spent two hours on my own trying to find something that does exactly that. Thanks for another pair of functions added to my (slowly!) growing R vocabulary.

-jlb

Phil Spector wrote:

Joseph - I'm sure there are clearer and more efficient ways to do it, but here's something that seems to do what you want:

    z = tapply(y,list(var1,var2,var3,var4),sum)
    data.frame(do.call('expand.grid',dimnames(z)),y=do.call('rbind',as.list(z)))

- Phil Spector
Statistical Computing Facility, Department of Statistics, UC Berkeley

--
Joseph P. LeBouton
Forest Ecology PhD Candidate, Department of Forestry, Michigan State University
Re: [R] aggregate vs tapply; is there a middle ground?
hadley wickham [EMAIL PROTECTED] writes:

>> I faced a similar problem. Here's what I did:
>>
>>   tmp <- data.frame(A=sample(LETTERS[1:5],10,replace=T),B=sample(letters[1:5],10,replace=T),C=rnorm(10))
>>   tmp1 <- with(tmp,aggregate(C,list(A=A,B=B),sum))
>>   tmp2 <- expand.grid(A=sort(unique(tmp$A)),B=sort(unique(tmp$B)))
>>   merge(tmp2,tmp1,all.x=T)
>>
>> At least fewer than 10 extra lines of code. Anyone with a simpler solution?
>
> Well, you can almost do this with the reshape package:
>
>   tmp <- data.frame(A=sample(LETTERS[1:5],10,replace=T),B=sample(letters[1:5],10,replace=T),C=rnorm(10))
>   a <- recast(tmp, A + B ~ ., sum)
>   # see also recast(tmp, A ~ B, sum)
>   add.all.combinations(a, row=A, cols = B)
>
> where add.all.combinations basically does what you outlined above. It would be easy enough to generalise to multiple dimensions.

Anything wrong with

  as.data.frame(with(tmp,as.table(tapply(C,list(A=A,B=B),sum))))

     A B       Freq
  1  A a         NA
  2  B a -0.2524320
  3  C a  3.8539264
  4  D a         NA
  5  A c  0.7227294
  6  B c -0.2694669
  7  C c  0.4760957
  8  D c         NA
  9  A e         NA
  10 B e  0.1800500
  11 C e         NA
  12 D e -1.0350928

(except the silly colname; responseName="sum" should fix that).

--
Peter Dalgaard
Dept. of Biostatistics, University of Copenhagen
Øster Farimagsgade 5, Entr.B, PO Box 2099, 1014 Cph. K, Denmark
Ph: (+45) 35327918  FAX: (+45) 35327907
([EMAIL PROTECTED])
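The tapply/as.table route above can be sketched on a small fixed data set, so the result is reproducible. The data values here are made up for illustration; only the function calls come from the thread. Passing responseName to as.data.frame() replaces the default "Freq" column name.

```r
# Toy data with one missing A/B combination per level pair (values are assumptions)
tmp <- data.frame(A = c("A", "B", "A", "C"),
                  B = c("a", "a", "c", "c"),
                  C = c(1, 2, 3, 4))

# tapply() keeps empty cells as NA; as.table() + as.data.frame() flattens
# the 2-D array into one row per A/B combination
res <- as.data.frame(with(tmp, as.table(tapply(C, list(A = A, B = B), sum))),
                     responseName = "sum")
print(res)
```

Unlike aggregate(), this keeps a row for every combination of levels, with NA where no observations exist.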
[R] aggregate and ordered factors, feature?
Hi,

aggregate() does not preserve the order of levels for ordered factors, e.g.,

  levs <- c("Low", "Med", "Hi")
  d <- data.frame(x = 1:30, fac = ordered(rep(levs, 10), levels = levs))
  out <- aggregate(d[,"x"], by = list(fac=d$fac), FUN = mean)
  cat("Original ordered levels:", levels(d$fac), "\n")
  cat("Levels in aggregated output:", levels(out$fac), "\n")

Perhaps this is unintended? If intended, a note in its documentation could be helpful to alert users.

  version
           _
  platform i686-pc-linux-gnu
  arch     i686
  os       linux-gnu
  system   i686, linux-gnu
  status   beta
  major    2
  minor    2.1
  year     2005
  month    12
  day      18
  svn rev  36792
  language R

--
David
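A minimal sketch, not from the thread, of one way to make the level order explicit after aggregating. It does not depend on what a given version of aggregate() does with ordered factors, because the levels are re-imposed from the original vector levs.

```r
levs <- c("Low", "Med", "Hi")
d <- data.frame(x = 1:30, fac = ordered(rep(levs, 10), levels = levs))

out <- aggregate(d[, "x"], by = list(fac = d$fac), FUN = mean)

# Re-impose the intended order on the grouping column, then sort the rows by it
out$fac <- factor(as.character(out$fac), levels = levs, ordered = TRUE)
out <- out[order(out$fac), ]
print(out)
```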
Re: [R] aggregate slow with many rows - alternative?
Hi,

Yesterday I analysed data with 16 rows and 10 columns. Aggregation would be impossible with a data frame format, but when converting it to a matrix with *numeric* entries (check that the variables are of class numeric!) the computation needs only 7 seconds on a Pentium III. I'm sad to say that this is also slow in comparison with proc summary in SAS (less than one second), but the code is much more elegant in R!

Best,
Matthias

> Hi, I use the code below to aggregate / cnt my test data. It works fine, but the problem is with my real data (33'000 rows) where the function is really slow (nothing happened in half an hour). Does anybody know of other functions that I could use?
> [snip]
Re: [R] aggregate slow with many rows - alternative?
Here is the way that I would do it. Using 'lapply' to process the list and create a matrix takes less than 1 second:

  dat <- data.frame(D=sample(32000:33000, 33000, T),
      Fid=sample(1:10,33000,T), A=sample(1:5,33000,T))
  system.time({
      result <- lapply(split(seq(nrow(dat)), dat$D), function(.d){  # split by first level
          lapply(split(.d, dat$Fid[.d]), function(.f){  # now by the second
              # create the sum and count
              c(D=dat$D[.f[1]], Fid=dat$Fid[.f[1]], sum=sum(dat$A[.f]), cnt=length(.f))
          })
      })
      mat <- do.call('rbind',lapply(result, function(x) do.call('rbind',x)))
  })
  [1] 0.66 0.00 0.73   NA   NA

  mat[1:20,]
         D Fid sum cnt
  1  32000   1   8   3
  2  32000   2  11   4
  3  32000   3  11   3
  4  32000   4   2   1
  5  32000   5   8   2
  6  32000   6   4   2
  7  32000   7  21   6
  8  32000   8  13   3
  9  32000   9  12   4
  10 32000  10  10   3
  1  32001   1  12   4
  2  32001   2   2   1
  3  32001   3  10   4
  4  32001   4  12   3
  5  32001   5  10   3
  6  32001   6   8   2
  7  32001   7  22   7
  8  32001   8   3   2
  9  32001   9   7   3
  10 32001  10   3   2

On 10/14/05, TEMPL Matthias [EMAIL PROTECTED] wrote:
> Hi, Yesterday, I have analysed data with 16 rows and 10 columns. Aggregation would be impossible with a data frame format, but when converting it to a matrix with *numeric* entries [snip]

--
Jim Holtman
Cincinnati, OH
+1 513 247 0281

What the problem you are trying to solve?
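For comparison, the same sum-and-count table can be sketched with two aggregate() calls and a merge(), using the small test data from the original question. This is an alternative formulation, not the approach posted in the thread, and no claim is made here about its speed on 33'000 rows.

```r
dat <- data.frame(
  Datum     = c(32586, 32587, 32587, 32625, 32656, 32656, 32656, 32672, 32672, 32699),
  FischerID = c(58395, 58395, 58395, 88434, 89953, 89953, 89953, 64395, 62896, 62870),
  Anzahl    = c(2, 2, 1, 1, 2, 1, 7, 1, 1, 2)
)

keys <- list(Datum = dat$Datum, FischerID = dat$FischerID)

# One pass for the sums, one for the counts; both keyed by Datum and FischerID
sums <- aggregate(list(Anzahl = dat$Anzahl), by = keys, FUN = sum)
cnts <- aggregate(list(Cnt = dat$Anzahl), by = keys, FUN = length)

# merge() joins on the shared key columns
res <- merge(sums, cnts)
res <- res[order(res$Datum, res$FischerID), ]
print(res)
```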
Re: [R] aggregate slow with many rows - alternative?
Many thanks for all your answers. Converting to a matrix didn't help. I tried with Hmisc but didn't get anywhere (different summary functions, multiple levels).

2005/10/14, jim holtman [EMAIL PROTECTED]:
> Here is the way that I would do it. Using 'lapply' to process the list and create a matrix [snip]

Wow! That's a wonderful suggestion. Your code works just fine with my data (takes 11 seconds). Thanks a lot, I couldn't have written such code (reading some help entries now...).

Hans-Peter
[R] aggregate slow with many rows - alternative?
Hi,

I use the code below to aggregate / cnt my test data. It works fine, but the problem is with my real data (33'000 rows) where the function is really slow (nothing happened in half an hour).

Does anybody know of other functions that I could use?

Thanks,
Hans-Peter

--
  dat <- data.frame( Datum = c( 32586, 32587, 32587, 32625, 32656, 32656, 32656, 32672, 32672, 32699 ),
                     FischerID = c( 58395, 58395, 58395, 88434, 89953, 89953, 89953, 64395, 62896, 62870 ),
                     Anzahl = c( 2, 2, 1, 1, 2, 1, 7, 1, 1, 2 ) )
  f <- function(x) data.frame( Datum = x[1,1], FischerID = x[1,2], Anzahl = sum( x[,3] ), Cnt = dim( x )[1] )
  t.a <- do.call(rbind, by(dat, dat[,1:2], f))   # slow for 33'000 rows
  t.a <- t.a[order( t.a[,1], t.a[,2] ),]

  # show data
  dat
  t.a
Re: [R] aggregate slow with many rows - alternative?
Convert dat to a matrix and see if working with the matrix instead of a data frame speeds things up enough.

On 10/13/05, Hans-Peter [EMAIL PROTECTED] wrote:
> Hi, I use the code below to aggregate / cnt my test data. It works fine, but the problem is with my real data (33'000 rows) where the function is really slow (nothing happened in half an hour). Does anybody know of other functions that I could use?
> [snip]
Re: [R] aggregate slow with many rows - alternative?
Gabor Grothendieck wrote:
> Convert dat to a matrix and see if working with the matrix instead of a data frame speeds things up enough.

In the Hmisc package the asNumericMatrix and matrix2dataFrame functions facilitate this. Also look at the summarize and mApply functions in Hmisc, which can be quite fast.

Frank Harrell
[R] aggregate
How can I aggregate this data.frame to list the min and max date for each unique id?

From this:

  r = data.frame(id=rep(seq(1:3), 3), date= as.Date(c(rep("2005-08-25",3), rep("2005-08-26",3), rep("2005-08-29", 3)), "%Y-%m-%d"))
  r
  id date
  1  2005-08-25
  2  2005-08-25
  3  2005-08-25
  1  2005-08-26
  2  2005-08-26
  3  2005-08-26
  1  2005-08-29
  2  2005-08-29
  3  2005-08-29

I want to get to this:

  id start      end
  1  2005-08-25 2005-08-29
  2  2005-08-25 2005-08-29
  3  2005-08-25 2005-08-29

I tried aggregate and aggregate.data.frame but the date column keeps getting converted into a number.
Re: [R] aggregate
Maybe you could use something like this:

  dat <- data.frame(id = rep(1:3, 3), date = as.Date(rep(c("2005-08-25", "2005-08-26", "2005-08-29"), each = 3)))
  do.call("rbind", lapply(split(dat, dat$id), function(x) data.frame(id = x$id[1], start = min(x$date), end = max(x$date))))

I hope it helps.

Best,
Dimitris

Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven
Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/336899
Fax: +32/16/337015
Web: http://www.med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm

- Original Message -
From: Omar Lakkis [EMAIL PROTECTED]
To: r-help@stat.math.ethz.ch
Sent: Tuesday, August 30, 2005 4:36 PM
Subject: [R] aggregate

> How can I aggregate this data.frame to list the min and max date for each unique id?
> [snip]
> I tried aggregate and aggregate.data.frame but the date column keeps getting converted into a number.
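Another way to sidestep the "date becomes a number" problem is to do the conversion deliberately: aggregate the underlying day counts and turn the result back into Date at the end. The following is a sketch under that assumption, using tapply() rather than the split/lapply code above; the origin "1970-01-01" is the epoch R uses for the Date class.

```r
dat <- data.frame(id = rep(1:3, 3),
                  date = as.Date(rep(c("2005-08-25", "2005-08-26", "2005-08-29"),
                                     each = 3)))

# Work on the numeric day counts, grouped by id
num   <- as.numeric(dat$date)
first <- tapply(num, dat$id, min)
last  <- tapply(num, dat$id, max)

# Convert the per-id results back to Date
res <- data.frame(id    = as.integer(names(first)),
                  start = as.Date(as.vector(first), origin = "1970-01-01"),
                  end   = as.Date(as.vector(last),  origin = "1970-01-01"))
print(res)
```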
[R] aggregate?
Dear all:

Here is my problem. Example data:

  dat<-data.frame(x=rep(c("a","b","c","d"),2),y=c(10:17))

If I wanted to aggregate each level of column dat$x I could use:

  aggregate(dat$y,list(x=dat$x),sum)

But I just want to aggregate two levels ("c" and "d") to obtain a new level "e". I am expecting something like:

    x  y
  1 a 10
  2 b 11
  3 e 25
  4 a 14
  5 b 15
  6 e 33

How can I do this?

Thanks in advance and best for all

A. Diaz
Re: [R] aggregate?
On 6/17/05, alex diaz [EMAIL PROTECTED] wrote:
> Dear all:
>
> Here is my problem. Example data:
>
>   dat<-data.frame(x=rep(c("a","b","c","d"),2),y=c(10:17))
>
> If I wanted to aggregate each level of column dat$x I could use:
>
>   aggregate(dat$y,list(x=dat$x),sum)
>
> But I just want to aggregate two levels (c and d) to obtain a new level e. I am expecting something like:
>
>     x  y
>   1 a 10
>   2 b 11
>   3 e 25
>   4 a 14
>   5 b 15
>   6 e 33

In the example
- dat$y[3:4] are summed and
- dat$y[7:8] are summed
so we assume that what is being requested is that d is to be replaced by c and runs of any level are to be summed. To do that:

- create xx such that a, b, c and d in dat$x are replaced with 1, 2, 3 and 3 in xx.
- in the second statement calculate a running sum, except that if the last observation was the same as the current observation then the Last Observation is Carried Forward (locf), so that all entries in a run have the same number. E.g. in this case locf is c(1, 2, 3, 3, 4, 5, 6, 6).
- Finally the 'by' collapses dat using locf and rbinds the resulting rows together to create a data frame.

  xx <- ifelse(dat$x == "d", 3, dat$x)
  locf <- cumsum(c(TRUE, xx[-1] != xx[-length(xx)]))
  f <- function(x) data.frame(x=x[1,1], y=sum(x[,2]))
  dat2 <- do.call(rbind, by(dat, locf, f))
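The run-collapsing steps above can be sketched in a form that also produces the requested "e" label. Note one assumption worth stating: the ifelse() line in the reply relies on dat$x being a factor (so that "c" has integer code 3), which was the data.frame default at the time but is not in current R. The variant below works on character labels instead, so it does not depend on that; the names grp and run are introduced here for illustration.

```r
dat <- data.frame(x = rep(c("a", "b", "c", "d"), 2), y = 10:17,
                  stringsAsFactors = FALSE)

# Relabel both "c" and "d" as the new level "e"
grp <- dat$x
grp[grp %in% c("c", "d")] <- "e"

# Give every run of identical consecutive labels its own id: 1 2 3 3 4 5 6 6
run <- cumsum(c(TRUE, grp[-1] != grp[-length(grp)]))

# Collapse each run to its label and the sum of y
res <- data.frame(x = as.vector(tapply(grp, run, function(g) g[1])),
                  y = as.vector(tapply(dat$y, run, sum)))
print(res)
```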
[R] aggregate and stack
Dear All,

I have tried to calculate tree mean growth but I think the structure I used below (growthresumo) is not the most elegant, even though it worked. The only problem I had in this first part was that I cannot use 'summary', just 'mean' (sorry but 'R' is pretty new for me).

  growthresumo <- aggregate(growth[,c(16,19,23,27,31,35,39,43,47,52,56,60,64,68,72,76,81,85,89,93,97,101,105,109,113,117,121,125,129,133,137,
    141,145,149,153,157,161,165,169,173,177,181,185,189,194,197,201,205,209,213,217,221,225,229,233,237,241)],
    by=(growth[,c(3,8)]),MEAN,na.rm=TRUE)

  # after growth is calculated, I want to stack the results in just one column.
  growthvertical <- c(growthresumo[,3],...,growthresumo[,50])   # this is very time consuming though
  Parcel <- c(C9,S8...C9,S8)   # 50 items
  date c(DATE1DATE50)
  growthpermonth <- data.frame(Parcel, Date, growthvertical)

Thank you very much!

Paulo

Paulo Brando
Inst. de Pesquisa Ambiental da Amazônia (IPAM)
Rua Rui Barbosa, 136. 68.005.080 Santarém, PA, Brasil.
Fone/Fax ++ 55 93 522 5538
www.ipam.org.br
Re: [R] aggregate and stack
Dear Paulo,

On May 25, 2005, at 8:01 PM, Paulo Brando wrote:
> Dear All,
>
> I have tried to calculate tree mean growth but I think the structure I used below (growthresumo) is not the most elegant, even though it worked. The only problem I had in this first part was that I cannot use 'summary', just 'mean' (sorry but 'R' is pretty new for me).

In case you didn't notice, help(aggregate) indicates that 'FUN' should be a scalar function, so summary won't work for that reason.

>   growthresumo <- aggregate(growth[,c(16,19,23,27,31,35,39,43,47,52,56,60,64,68,72,76,81,85,89,93,97,101,105,109,113,117,121,125,129,133,137,
>     141,145,149,153,157,161,165,169,173,177,181,185,189,194,197,201,205,209,213,217,221,225,229,233,237,241)],
>     by=(growth[,c(3,8)]),MEAN,na.rm=TRUE)

It's hard to know where 'growth' came from. Is it your own data.frame, or from a package? It's better to provide a reproducible or toy example (as you'll often read here).

>   # after growth is calculated, I want to stack the results in just one column.
>   growthvertical <- c(growthresumo[,3],...,growthresumo[,50])   # this is very time consuming though

This comes to my mind:

  as.vector(as.matrix(growthresumo[,3:50]))

but look up the help on stack() because it's a very powerful tool that is aptly named (and might do everything you want).

>   Parcel <- c(C9,S8...C9,S8)   # 50 items

rep() could help with the above.

>   date c(DATE1DATE50)

paste() will help with this.

>   growthpermonth <- data.frame(Parcel, Date, growthvertical)
>
> Thank you very much!

Good luck with R!

Stephen
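To illustrate the stack(), rep() and paste() hints, here is a small sketch on a made-up wide table. The data frame wide, its two date columns and the growth values are hypothetical stand-ins for growthresumo, which is not available in the thread.

```r
# Hypothetical wide table: one row per parcel, one column per measurement date
wide <- data.frame(Parcel = c("C9", "S8"),
                   DATE1  = c(1.0, 2.0),
                   DATE2  = c(3.0, 4.0))

# paste() builds the column names "DATE1", "DATE2", ...
date_cols <- paste("DATE", 1:2, sep = "")

# stack() puts all values in one column ('values') and records the
# source column name in 'ind'
long <- stack(wide[, date_cols])

# rep() repeats the parcel labels once per stacked column
long$Parcel <- rep(wide$Parcel, times = length(date_cols))
print(long)
```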
[R] aggregate
I have a data frame of daily open, high, low and settle prices. How can I aggregate this data weekly? The data frame has five columns, the first is the date column and the rest are the prices.
RE: [R] aggregate
Assuming dfr[day,o,h,l,c] and day like 2004-12-28:

  dt <- strptime(as.character(dfr$day),format="%Y-%m-%d") + 0
  wk <- format(dt,"%Yw%U")
  aggr <- aggregate(list(dfr$o,dfr$h,dfr$l,dfr$c),list(wk),mean)
  colnames(aggr) <- etc

-Original Message-
From: Omar Lakkis [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 11, 2005 3:45 PM
To: r-help@stat.math.ethz.ch
Subject: [R] aggregate

> I have a data frame of daily open, high, low and settle prices. How can I aggregate this data weekly? The data frame has five columns, the first is the date column and the rest are the prices.
Re: [R] aggregate
In fact, since you have dates and not datetimes, use as.Date() instead of strptime().

On 5/11/05, bogdan romocea wrote:
> Assuming dfr[day,o,h,l,c] and day like 2004-12-28:
>
>   dt <- strptime(as.character(dfr$day),format="%Y-%m-%d") + 0
>   wk <- format(dt,"%Yw%U")
>   aggr <- aggregate(list(dfr$o,dfr$h,dfr$l,dfr$c),list(wk),mean)
>   colnames(aggr) <- etc
>
> [snip]
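Putting the two replies together, a weekly aggregation with as.Date() might look like the sketch below. The data frame dfr and its prices are invented for illustration, and mean is used only because the reply above uses it; for real open/high/low/settle data one would probably want first, max, min and last instead.

```r
# Hypothetical daily prices; column names follow the reply above
dfr <- data.frame(day = c("2004-12-27", "2004-12-28", "2005-01-03", "2005-01-04"),
                  o = c(10, 12, 20, 22),
                  h = c(11, 13, 21, 23),
                  l = c( 9, 11, 19, 21),
                  c = c(10, 12, 20, 22))

# Year plus week-of-year label (%U counts weeks starting on Sunday)
wk <- format(as.Date(dfr$day), "%Yw%U")

# Passing a data frame keeps the price column names in the result
aggr <- aggregate(dfr[, c("o", "h", "l", "c")], by = list(week = wk), FUN = mean)
print(aggr)
```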
[R] Aggregate lag
Hello,

Does anybody know how to aggregate a lag series? When I try to use aggregate I get the following message:

  try<-ts(1:100,start=c(1985,1),freq=12)
  aggregate(try,4,mean,na.rm=T)
       Qtr1 Qtr2 Qtr3 Qtr4
  1985    2    5    8   11
  1986   14   17   20   23
  1987   26   29   32   35
  1988   38   41   44   47
  1989   50   53   56   59
  1990   62   65   68   71
  1991   74   77   80   83
  1992   86   89   92   95
  1993   98

  aggregate(lag(try,-1),4,mean,na.rm=T)
  Error in rep.int("", start.pad) : invalid number of copies in rep()

Matthieu
Re: [R] Aggregate lag
On Tue, 10 May 2005 12:55:52 +0200 Matthieu Cornec wrote:
> hello, Does anybody know how to aggregate a lag series? when I try to use aggregate I get the following message
>
>   try<-ts(1:100,start=c(1985,1),freq=12)
>   aggregate(try,4,mean,na.rm=T)
>   [snip]
>   aggregate(lag(try,-1),4,mean,na.rm=T)
>   Error in rep.int("", start.pad) : invalid number of copies in rep()

The ts method seems to expect full blocks of observations. Note that the last observation (100 in April 1993) is also dropped from the aggregate call above. I'm not sure what the recommended way to circumvent this problem with ts is: probably, you have to do some padding with NAs yourself. Example:

  R> x <- ts(1:20,start=c(1990,1),freq=12)
  R> aggregate(window(x, start = c(1990, 1), end = c(1991, 9), extend = TRUE), 4, mean, na.rm = TRUE)
       Qtr1 Qtr2 Qtr3 Qtr4
  1990  2.0  5.0  8.0 11.0
  1991 14.0 17.0 19.5
  R> aggregate(window(lag(x, k = -1), start = c(1990, 1), end = c(1991, 9), extend = TRUE), 4, mean, na.rm = TRUE)
       Qtr1 Qtr2 Qtr3 Qtr4
  1990  1.5  4.0  7.0 10.0
  1991 13.0 16.0 19.0

In zoo this can be done a bit more easily:

  R> z <- zooreg(1:20, start = yearmon(1990), freq = 12)
  R> aggregate(z, as.yearqtr(time(z)), mean)
  1990 Q1 1990 Q2 1990 Q3 1990 Q4 1991 Q1 1991 Q2 1991 Q3
      2.0     5.0     8.0    11.0    14.0    17.0    19.5
  R> aggregate(lag(z, k = -1), as.yearqtr(time(lag(z, -1))), mean)
  1990 Q1 1990 Q2 1990 Q3 1990 Q4 1991 Q1 1991 Q2 1991 Q3
      1.5     4.0     7.0    10.0    13.0    16.0    19.0

hth,
Z
[R] aggregate slow with variables of type 'dates' - how to solve
Dear all,

I use aggregate with variables of type numeric and dates. For type numeric, functions such as sum() are very fast, but similarly simple functions such as min() are much slower for variables of type 'dates'. The difference gets bigger the larger the 'id' var is. See this sample code:

  dts <- dates(c("02/27/92", "02/27/92", "01/14/92", "02/28/92", "02/01/92"))
  ntimes <- 70
  dts <- data.frame(rep(c(1:40), ntimes/8),
                    chron(rep(dts, ntimes), format = c(dates = "m/d/y")),
                    rep(c(0.123, 0.245, 0.423, 0.634, 0.256), ntimes))
  names(dts) <- c("id", "date", "tbs")

  date()
  dat.1st <- aggregate(dts$date, list(id = dts$id), min)$x
  dat.1st <- chron(dat.1st, format = c(dates = "m/d/y"))
  dat.1st
  date()   # 82 seconds

  date()
  tbs.s <- aggregate(as.numeric(dts$tbs),list(id = dts$id), sum)
  tbs.s
  date()   # 17 seconds

---
Is it a problem of data type 'dates'? If yes, is there any way to solve this? For huge data sets this can be a problem. As I mentioned, if variable 'id' has e.g. just 5 levels, the two times are roughly the same, but with the 40 different ids we have this big difference.

thanks a lot
Christoph
Re: [R] aggregate slow with variables of type 'dates' - how to solve
On 4/15/05, Christoph Lehmann [EMAIL PROTECTED] wrote:
> Dear all I use aggregate with variables of type numeric and dates. For type numeric functions, such as sum() are very fast, but similar simple functions, such as min() are much slower for the variables of type 'dates'.
> [snip]
> as I mentioned, e.g. if we have for variable 'id' eg just 5 levels, the two times are roughly the same, but with the 40 different ids, we have this big difference

Just convert the dates to numeric first. You are converting them back anyway.

  system.time({
      dat.1st <- chron(aggregate(dts$date, list(id = dts$id), min)$x)
  }, TRUE)
  [1] 0.86 0.00 0.86   NA   NA

  system.time({
      dat.1st.2 <- chron(aggregate(as.numeric(dts$date), list(id = dts$id), min)$x)
  }, TRUE)
  [1] 0.12 0.00 0.12   NA   NA

  identical(dat.1st, dat.1st.2)
  [1] TRUE
[R] aggregate question...
R-folks,

Is there a function, like aggregate, that allows users to bin values? I've got to break down a data frame into classes of 5cm (or something like it), and I only know how to do it using code like:

  signif <- symnum( stems$dbh, corr = FALSE, na = FALSE, cutpoints = c(0,10,20,30,40,999), symbols = c(0,10,20,30,40) )
  rt <- data.frame( stems$expf, signif = ordered( signif, levels = c(0,10,20,30,40) ) )
  st <- aggregate( rt$stems.expf, by=list(signif), sum )

Is there a one line command to do this?

--
Jeff D. Hamann
Forest Informatics, Inc.
PO Box 1421
Corvallis, Oregon 97339-1421
phone 541-754-1428
fax 541-752-0288
[EMAIL PROTECTED]
http://www.forestinformatics.com
Re: [R] aggregate question...
On Thu, 2005-03-31 at 09:17 -0800, Jeff D. Hamann wrote:
> R-folks, Is there a function, like aggregate, that allows users to bin values? I've got to break down a data frame into classes of 5cm (or something like it) [snip]
> Is there a one line command to do this?

Jeff,

Sometimes the notion of a single line command is in the eye of the beholder, since things can easily become obfuscated. However, something like the following could work:

  stems <- data.frame(expf = 1:100, dbh = sample(1:500, 100, replace = TRUE))
  st <- aggregate(stems$expf, by=list(cut(stems$dbh, breaks = c(0, 10, 20, 30, 40, 999))), sum)
  st
     Group.1    x
  1   (0,10]   69
  2  (10,20]  172
  3  (20,30]  181
  4  (30,40]  131
  5 (40,999] 4497

Note that cut() has additional arguments that control whether the left and/or right endpoints are included in the respective intervals and what the labels should be. See ?cut for more information.

HTH,
Marc Schwartz
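For the 5 cm classes mentioned in the question, the breaks can be generated with seq() instead of being typed out. The following is a small sketch with invented stem data; right = FALSE makes each class closed on the left, so a diameter of exactly 5 falls in [5,10).

```r
# Hypothetical stems: expansion factor and diameter (cm)
stems <- data.frame(expf = c(1, 2, 3, 4, 5),
                    dbh  = c(2.5, 4.9, 7.0, 12.3, 14.0))

# 5 cm classes: [0,5), [5,10), [10,15)
breaks <- seq(0, 15, by = 5)
cls <- cut(stems$dbh, breaks = breaks, right = FALSE)

st <- aggregate(stems$expf, by = list(dbh_class = cls), FUN = sum)
print(st)
```

In real data the upper end of breaks would need to cover the largest diameter, otherwise cut() returns NA for those stems and aggregate() drops them.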
Re: [R] aggregate()
On 6 Jan 2005 at 16:55, Karla Meurk wrote:
> Hi, some time ago I asked R-help about aggregating data as a result I was able to put together some code which includes the line
>
>   rain.ag <- aggregate(newdata, list(hod6=cut(mindata,"6 hours")), mean, na.rm=T)
>
> I also want to aggregate daily, and 30 minutely etc. My question is why is it that I get answers with list(.. hours) but R cannot cope with list(..6 hours) or any other multiple. I have tried overcoming this using nfrequency= but to no avail

Hi Karla

  aggregate(rnorm(100), list(weeks5 = cut(as.Date("2001/1/1") + 70*runif(100), "5 weeks")),mean)
        weeks5         x
  1 2001-01-01 0.1272008
  2 2001-02-05 0.1808671

This works as expected, so you have some problem in your data. Without more information about what mindata is, or what sort of answer you got from the code mentioned above, nobody can help.

Cheers
Petr

> can someone help? Thanks Carla

Petr Pikal
[EMAIL PROTECTED]
[R] aggregate()
Hi,

Some time ago I asked R-help about aggregating data. As a result I was able to put together some code which includes the line

  rain.ag <- aggregate(newdata, list(hod6=cut(mindata,"6 hours")), mean, na.rm=T)

I also want to aggregate daily, and 30 minutely etc.

My question is why is it that I get answers with list(.. hours) but R cannot cope with list(..6 hours) or any other multiple. I have tried overcoming this using nfrequency= but to no avail.

Can someone help?

Thanks
Carla
Re: [R] aggregate()
Karla Meurk ksm32 at student.canterbury.ac.nz writes:

:
: Hi, some time ago I asked R-help about aggregating data as a result I
: was able to put together some code which includes the line
:
: rain.ag <- aggregate(newdata, list(hod6=cut(mindata,"6 hours")), mean,
: na.rm=T)
:
: I also want to aggregate daily, and 30 minutely etc.
:
: My question is why is it that I get answers with list(.. hours) but R
: cannot cope with list(..6 hours) or any other multiple. I have tried
: overcoming this using nfrequency= but to no avail
:
: can someone help?

You need to provide a short reproducible example to illustrate your problem, with an explanation of what you expect from the code. That means that someone can just copy the code from your posting, paste it into their session and see the exact same incorrect output or error that you got. If it's not short, you need to boil it down to something that is short before posting it.
[R] aggregate and median
I am trying to use the function aggregate with the median function, but I get the following error:

Error in FUN(X[[1]], ...) : Argument INDEX

When I replace median by mean, it works perfectly.

Can someone tell me where the problem comes from?

Thanks.

I am running R 2.0.0 on SunOS 5.9.

--
Philippe Hupé
UMR 144 - Service Bioinformatique
Institut Curie
Laboratoire de Transfert (4ème étage)
26 rue d'Ulm
75005 Paris - France
Email : [EMAIL PROTECTED]
Tél : +33 (0)1 44 32 42 75
Fax : +33 (0)1 42 34 65 28
Re: [R] aggregate and median
On 21 Dec 2004 at 12:46, Philippe Hupé wrote:

> I am trying to use the function aggregate with the median function, but I get the following error:
>
> Error in FUN(X[[1]], ...) : Argument INDEX
>
> When I replace median by mean, it works perfectly.

Hi Philippe,

I suppose that you have a typo in your aggregate construction, or that you have redefined median.

aggregate(x, list(rrr), mean)
   Group.1           x
1        1 -0.19455580
2        2 -0.06877719
3        3 -0.47657192
4        4 -0.41082682
5        5  1.27739323
6        6  1.15004620
7        7 -0.40064292
8        8 -0.02360514
9        9 -0.24954037
10      10  0.13480356
11      11  0.24179472

aggregate(x, list(rrr), median)
   Group.1           x
1        1 -0.19455580
2        2 -0.06877719
3        3 -0.47657192
4        4 -0.41082682
5        5  1.27739323
6        6  1.15004620
7        7 -0.40064292
8        8 -0.02360514
9        9 -0.24954037
10      10  0.13480356
11      11  0.24179472

This works for me as expected. Or are you doing something completely different?

Cheers
Petr

> Can someone tell me where the problem comes from?
>
> I am running R 2.0.0 on SunOS 5.9.

Petr Pikal
[EMAIL PROTECTED]
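A self-contained version of this check may be useful, since the thread does not show how x and rrr were built. The sketch below uses made-up data under those names and is only an illustration; it does not reproduce the reported error. Calling stats::median explicitly is one way to rule out a redefined median in the workspace, which is the second cause Petr suggests.

```r
# Hypothetical data: 11 groups with 3 values each (not the poster's data).
set.seed(1)
rrr <- rep(1:11, each = 3)
x   <- rnorm(length(rrr))

# The grouping argument has to be a list, even for a single grouping variable.
by.mean   <- aggregate(x, list(rrr), mean)
by.median <- aggregate(x, list(rrr), median)

# Naming the function with its namespace sidesteps any object called
# `median` that may have been defined in the workspace.
by.median2 <- aggregate(x, list(rrr), stats::median)

identical(by.median, by.median2)
```

If the two median results differ, or only the second call works, a look at what find("median") returns should show whether something is masking the function from stats.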
[R] aggregate function
Hi all,

I have the following frame (there are more columns than shown):

   1     2    3    4    5
Year Total  Tus  Whi Norw
1994  1.00 1830    0  355
1995  1.00    0    0    0
1995  1.00    0    0    0
1995  1.00 4910 4280  695
1997  1.00    0    0  110
1997  0.58    0    0    0
1997  1.00    0    0    0
1994  1.00    0    0    0
1997  1.00    0   40   70
1998  1.00    0    0 1252
1999  1.04    0   74    0
1999  1.00    0    0    0
1999  1.02    0    0    0
1999  1.00    0    0    0
1999  1.00    0    0  171
1999  1.00 1794    0  229
1999  1.00    0 3525    0
1997  1.00 1335 1185  147
1997  1.00 4925 1057 4801
1997  1.00    0 6275 1773

I am trying to get sum(Total) by Year for the rows in which Tus>0, sum(Total) by Year for the rows in which Whi>0, and so on.

I have done something like this:

a <- as.list(numeric(3))
for (i in 3:5) {
  a[[i]] <- aggregate(frame[,"Total"], list(Year=frame$Year, Tus=frame$i>0), sum)
}

But I get

Error in FUN(X[[as.integer(1)]], ...) : arguments must have same length

Doing it one column at a time,

aggregate(frame[,"Total"], list(Year=frame$Year, Tus=frame$Tus>0), sum)

gives a result like this:

Year   Tus     x
1994 FALSE 49.69
1995 FALSE 49.35
1996 FALSE 56.95
1997 FALSE 57.00
1998 FALSE 57.00
1999 FALSE 58.09
2000 FALSE 56.97
2001 FALSE 57.95
2002 FALSE 57.10
2003 FALSE 56.16
2000  TRUE  1.00
2002  TRUE  1.00
2003  TRUE  2.01

Help

Thank you

Luis Ridao Cruz
Fiskirannsóknarstovan
Nóatún 1
P.O. Box 3051
FR-110 Tórshavn
Faroe Islands
Phone: +298 353900
Phone (direct): +298 353912
Mobile: +298 580800
Fax: +298 353901
E-mail: [EMAIL PROTECTED]
Web: www.frs.fo
Re: [R] aggregate function
Hi,

# x ... your frame
attach(x)
sum(Total[Year == 1997 & Tus > 0])

I hope this helps.

Best,
Matthias Templ

-----Original Message-----
From: Luis Rideau Cruz [mailto:[EMAIL PROTECTED]]
Sent: Monday, 26 July 2004 14:52
To: [EMAIL PROTECTED]
Subject: [R] aggregate function

> I try to get sum(Total) by Year in which Tus>0, sum(Total) by Year in which Whi>0, and so on.
> [...]
RE: [R] aggregate function
I would try something like:

lapply(frame[3:5], function(i) tapply(frame$Total[i > 0], frame$Year[i > 0], sum))
$Tus
1994 1995 1997 1999
   1    1    2    1

$Whi
1995 1997 1999
1.00 4.00 2.04

$Norw
1994 1995 1997 1998 1999
   1    1    5    1    2

HTH,
Andy

> From: Luis Rideau Cruz
>
> I try to get sum(Total) by Year in which Tus>0, sum(Total) by Year in which Whi>0, and so on.
> [...]
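As a side note on the error in the original loop: frame$i looks for a column literally named "i", finds none, and returns NULL, so frame$i > 0 has length zero and aggregate() complains that the arguments differ in length. The sketch below is one possible repair, not the poster's or Andy's code. It re-types the sample rows from the post and indexes the columns by position with [[ ]]; the helper names cols, k and keep are made up for illustration.

```r
# The 20 sample rows from the original post.
frame <- data.frame(
  Year  = c(1994, 1995, 1995, 1995, 1997, 1997, 1997, 1994, 1997, 1998,
            1999, 1999, 1999, 1999, 1999, 1999, 1999, 1997, 1997, 1997),
  Total = c(1, 1, 1, 1, 1, 0.58, 1, 1, 1, 1,
            1.04, 1, 1.02, 1, 1, 1, 1, 1, 1, 1),
  Tus   = c(1830, 0, 0, 4910, 0, 0, 0, 0, 0, 0,
            0, 0, 0, 0, 0, 1794, 0, 1335, 4925, 0),
  Whi   = c(0, 0, 0, 4280, 0, 0, 0, 0, 40, 0,
            74, 0, 0, 0, 0, 0, 3525, 1185, 1057, 6275),
  Norw  = c(355, 0, 0, 695, 110, 0, 0, 0, 70, 1252,
            0, 0, 0, 0, 171, 229, 0, 147, 4801, 1773)
)

cols <- 3:5                           # positions of Tus, Whi, Norw
a <- vector("list", length(cols))
names(a) <- names(frame)[cols]
for (k in seq_along(cols)) {
  keep <- frame[[cols[k]]] > 0        # [[ ]] evaluates the index; $i does not
  a[[k]] <- aggregate(frame$Total[keep], list(Year = frame$Year[keep]), sum)
}

a$Whi   # sums for 1995, 1997, 1999: 1.00, 4.00, 2.04
```

Each element of a is a small data frame with columns Year and x, so the three results can be inspected separately or combined afterwards.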
RE: [R] aggregate function
[Sorry if this gets posted twice. I have been having some problems with gmane posting.]

We can use rowsum like this:

rowsum(frame$Total * (frame[,3:5] > 0), frame$Year)
     Tus  Whi Norw
1994   1 0.00    1
1995   1 1.00    1
1997   2 4.00    5
1998   0 0.00    1
1999   1 2.04    2

Note that only years that are actually present will be in the resulting matrix. 1996 is not in the sample data in your post, so there is no row for 1996. If that's not a problem, or if your real data covers all the years anyway, we are done.

If missing years are a problem, then merge in some zero rows with the years first. The first two lines below do this, and the third line is the same as the line above:

frame <- merge(frame, 1994:1999, by = 1, all = TRUE)
frame[is.na(frame)] <- 0
rowsum(frame$Total * (frame[,3:5] > 0), frame$Year)
     Tus  Whi Norw
1994   1 0.00    1
1995   1 1.00    1
1996   0 0.00    0   <-- now we have a row for 1996
1997   2 4.00    5
1998   0 0.00    1
1999   1 2.04    2

Luis Rideau Cruz <[EMAIL PROTECTED]>:

> I have the following frame (there are more columns than shown):
>
>    1     2    3    4    5
> Year Total  Tus  Whi Norw
> 1994  1.00 1830    0  355
> 1995  1.00    0    0    0
> 1995  1.00    0    0    0
> 1995  1.00 4910 4280  695
> 1997  1.00    0    0  110
> 1997  0.58    0    0    0
> 1997  1.00    0    0    0
> 1994  1.00    0    0    0
> 1997  1.00    0   40   70
> 1998  1.00    0    0 1252
> 1999  1.04    0   74    0
> 1999  1.00    0    0    0
> 1999  1.02    0    0    0
> 1999  1.00    0    0    0
> 1999  1.00    0    0  171
> 1999  1.00 1794    0  229
> 1999  1.00    0 3525    0
> 1997  1.00 1335 1185  147
> 1997  1.00 4925 1057 4801
> 1997  1.00    0 6275 1773
>
> I try to get sum(Total) by Year in which Tus>0, sum(Total) by Year in which Whi>0, and so on.
>
> I have done something like this:
>
> a <- as.list(numeric(3))
> for (i in 3:5) {
>   a[[i]] <- aggregate(frame[,"Total"], list(Year=frame$Year, Tus=frame$i>0), sum)
> }
>
> But I get
>
> Error in FUN(X[[as.integer(1)]], ...) : arguments must have same length
>
> Also by doing one by one
>
> aggregate(frame[,"Total"], list(Year=frame$Year, Tus=frame$Tus>0), sum)
>
> the result is something like:
>
> Year   Tus     x
> 1994 FALSE 49.69
> 1995 FALSE 49.35
> 1996 FALSE 56.95
> 1997 FALSE 57.00
> 1998 FALSE 57.00
> 1999 FALSE 58.09
> 2000 FALSE 56.97
> 2001 FALSE 57.95
> 2002 FALSE 57.10
> 2003 FALSE 56.16
> 2000  TRUE  1.00
> 2002  TRUE  1.00
> 2003  TRUE  2.01
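The merge step is the least obvious part of this reply, so here is a small sketch of what it does on a toy data frame. The frame d, its values and the names d2 and res are made up for illustration; only the merge(), is.na() and rowsum() calls mirror the message above.

```r
# Toy data with a gap: there are no rows for 1996 (hypothetical values).
d <- data.frame(Year  = c(1994L, 1995L, 1997L),
                Total = c(1, 2, 3),
                Tus   = c(5, 0, 7))

# merge() coerces the bare vector to a one-column data frame; by = 1 joins on
# the first column of each side, and all = TRUE keeps years found on only one
# side, filling the other columns with NA.
d2 <- merge(d, 1994:1997, by = 1, all = TRUE)
d2[is.na(d2)] <- 0          # turn the padded rows into zero rows

# drop = FALSE keeps the single column as a data frame so rowsum() gets a matrix.
res <- rowsum(d2$Total * (d2[, "Tus", drop = FALSE] > 0), d2$Year)
res
```

Because the padded row has Total equal to 0, it adds nothing to any sum; it only guarantees that 1996 appears as a group in the result.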
[R] Aggregate rows to see the number of occurences
Hi,

I have a set of data like the following:

[,1] [,2] [1,] 102 [2,]70 [3,]10 [4,]10 [5,] 150 [6,] 174 [7,]40 [8,] 198 [9,] 102 [10,] 195

I'd like to aggregate it in order to obtain the frequency (the number of occurrences) of each pair of values (e.g. (10,2) appears twice, (7,0) appears once). Ideally this count would go in a third column.

I've been looking at aggregate(), but either I couldn't get the right parameters or this is not the right tool to use.

Thanks for any help!

--
Nicolas STRANSKY
Équipe Oncologie Moléculaire
Institut Curie - UMR 144 - CNRS
26, rue d'Ulm - 75248 Paris Cedex 5 - FRANCE
Tel : +33 1 42 34 63 40
Fax : +33 1 42 34 63 49