[R] How to average subgroups in a dataframe? (not sure how to apply aggregate(..))
Dear all,

Let's say I have the following data frame:

set.seed(1)
col1 <- c(rep('happy',9), rep('sad', 9))
col2 <- rep(c(rep('alpha', 3), rep('beta', 3), rep('gamma', 3)),2)
dates <- as.Date(rep(c('2009-10-13', '2009-10-14', '2009-10-15'),6))
score <- rnorm(18, 10, 3)
df1 <- data.frame(col1=col1, col2=col2, Date=dates, score=score)

    col1  col2       Date     score
1  happy alpha 2009-10-13  8.120639
2  happy alpha 2009-10-14 10.550930
3  happy alpha 2009-10-15  7.493114
4  happy  beta 2009-10-13 14.785842
5  happy  beta 2009-10-14 10.988523
6  happy  beta 2009-10-15  7.538595
7  happy gamma 2009-10-13 11.462287
8  happy gamma 2009-10-14 12.214974
9  happy gamma 2009-10-15 11.727344
10   sad alpha 2009-10-13  9.083835
11   sad alpha 2009-10-14 14.535344
12   sad alpha 2009-10-15 11.169530
13   sad  beta 2009-10-13  8.136278
14   sad  beta 2009-10-14  3.355900
15   sad  beta 2009-10-15 13.374793
16   sad gamma 2009-10-13  9.865199
17   sad gamma 2009-10-14  9.951429
18   sad gamma 2009-10-15 12.831509

Is it possible to get the following, whereby I am averaging the score values within each combination of col1 and col2?

    col1  col2       Date     score   Average
1  happy alpha 13/10/2009  8.120639  8.721561
2  happy alpha 14/10/2009 10.550930  8.721561
3  happy alpha 15/10/2009  7.493114  8.721561
4  happy  beta 13/10/2009 14.785842 11.104320
5  happy  beta 14/10/2009 10.988523 11.104320
6  happy  beta 15/10/2009  7.538595 11.104320
7  happy gamma 13/10/2009 11.462287 11.801535
8  happy gamma 14/10/2009 12.214974 11.801535
9  happy gamma 15/10/2009 11.727344 11.801535
10   sad alpha 13/10/2009  9.083835 11.596236
11   sad alpha 14/10/2009 14.535344 11.596236
12   sad alpha 15/10/2009 11.169530 11.596236
13   sad  beta 13/10/2009  8.136278  8.288990
14   sad  beta 14/10/2009  3.355900  8.288990
15   sad  beta 15/10/2009 13.374793  8.288990
16   sad gamma 13/10/2009  9.865199 10.882712
17   sad gamma 14/10/2009  9.951429 10.882712
18   sad gamma 15/10/2009 12.831509 10.882712

My feeling is that I should be using ?aggregate in some fashion, but I can't see how to do it. Or possibly there's another function I should be using?

Thanks in advance,
Tony

O/S: Windows Vista Ultimate

sessionInfo()
R version 2.9.2 (2009-08-24)
i386-pc-mingw32

locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to average subgroups in a dataframe? (not sure how to apply aggregate(..))
aves = aggregate(df1$score, by=list(col1=df1$col1, col2=df1$col2), mean)
results = merge(df1, aves)

b

On Oct 21, 2009, at 9:03 AM, Tony Breyal wrote:
> My feeling is that I should be using ?aggregate in some fashion, but I can't see how to do it.
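A self-contained sketch of the aggregate()/merge() approach above, rebuilding df1 from the original post. Two choices here are assumptions rather than part of the reply: passing df1["score"] (a one-column data frame) instead of df1$score keeps the summary column named score instead of x, so it can be renamed to Average; and because merge() sorts only on the key columns, the result is re-sorted by col1, col2 and Date to match the requested layout.

```r
# Rebuild the example data from the original post
set.seed(1)
df1 <- data.frame(
  col1  = c(rep("happy", 9), rep("sad", 9)),
  col2  = rep(c(rep("alpha", 3), rep("beta", 3), rep("gamma", 3)), 2),
  Date  = as.Date(rep(c("2009-10-13", "2009-10-14", "2009-10-15"), 6)),
  score = rnorm(18, 10, 3)
)

# One row per col1/col2 combination holding the group mean of score
aves <- aggregate(df1["score"], by = df1[c("col1", "col2")], FUN = mean)
names(aves)[names(aves) == "score"] <- "Average"

# Attach each group mean back to the original rows via the shared key columns
results <- merge(df1, aves, by = c("col1", "col2"))

# merge() sorts on the key columns only; include Date so rows line up with df1
results <- results[order(results$col1, results$col2, results$Date), ]
rownames(results) <- NULL
print(results)
```

The Average column should match the one in the desired output, repeated for each of the three rows in a group.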
Re: [R] How to average subgroups in a dataframe? (not sure how to apply aggregate(..))
On 10/21/2009 7:03 AM, Tony Breyal wrote:
> My feeling is that I should be using ?aggregate in some fashion, but I can't see how to do it.
> Or possibly there's another function I should be using?

?ave

For example, try something like this:

transform(df1, Average = ave(score, col1, col2))

-- 
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
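A minimal sketch of the ave() suggestion, again rebuilding df1 from the original post. ave() returns a vector as long as its input, with each element replaced by the mean of its col1/col2 group (mean is its default FUN), so the result lines up with the existing rows and no merge or re-sorting is needed. FUN = mean is written out here only to make that default visible.

```r
# Rebuild the example data from the original post
set.seed(1)
df1 <- data.frame(
  col1  = c(rep("happy", 9), rep("sad", 9)),
  col2  = rep(c(rep("alpha", 3), rep("beta", 3), rep("gamma", 3)), 2),
  Date  = as.Date(rep(c("2009-10-13", "2009-10-14", "2009-10-15"), 6)),
  score = rnorm(18, 10, 3)
)

# ave() groups by the interaction of col1 and col2 and returns one value per row,
# so the new column keeps the original row order of df1
res <- transform(df1, Average = ave(score, col1, col2, FUN = mean))
print(res)
```

Swapping in another function, such as FUN = length, gives a per-row group statistic in the same way.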
Re: [R] How to average subgroups in a dataframe? (not sure how to apply aggregate(..))
In article 800acfc0-2c3c-41f1-af18-3b52f7e43...@jhsph.edu, bcarv...@jhsph.edu says...
> aves = aggregate(df1$score, by=list(col1=df1$col1, col2=df1$col2), mean)
> results = merge(df1, aves)

Or, with the 'plyr' package, which has a very nice syntax:

library(plyr)
ddply(df1, .(col1, col2), transform, Average=mean(score))

It may be a bit slow for very large datasets, though. Here's an alternative, which will be as fast as the aggregate solution:

within(df1, { Average <- ave(score, col1, col2, FUN=mean) })

Which one you use is a matter of taste. And of course, the 'within' function is not the important part here; 'ave' is. For example, if you have attached your data frame, you just have to type

Average <- ave(score, col1, col2, FUN=mean)

-- 
Karl Ove Hufthammer
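A sketch comparing the two forms Karl suggests, assuming plyr may not be installed: the ddply() call only runs if requireNamespace("plyr") succeeds, and it passes the grouping columns as a character vector rather than .(col1, col2) so that library(plyr) is not needed. Because df1 is already sorted by col1 and col2, and ddply() returns its groups in sorted order, both routes should give the same Average column row by row.

```r
# Rebuild the example data from the original post
set.seed(1)
df1 <- data.frame(
  col1  = c(rep("happy", 9), rep("sad", 9)),
  col2  = rep(c(rep("alpha", 3), rep("beta", 3), rep("gamma", 3)), 2),
  Date  = as.Date(rep(c("2009-10-13", "2009-10-14", "2009-10-15"), 6)),
  score = rnorm(18, 10, 3)
)

# Base R: within() evaluates the block inside df1 and returns a modified copy
res_within <- within(df1, {
  Average <- ave(score, col1, col2, FUN = mean)
})

# plyr (optional): split by col1/col2, add the group mean to each piece, recombine
if (requireNamespace("plyr", quietly = TRUE)) {
  res_plyr <- plyr::ddply(df1, c("col1", "col2"), transform, Average = mean(score))
  # Rows align here only because df1 is already sorted by col1 and col2
  stopifnot(isTRUE(all.equal(res_plyr$Average, res_within$Average)))
}
print(res_within)
```

Note that within() does not modify df1 in place; assign the result (for example df1 <- within(df1, ...)) to keep the new column.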
Re: [R] How to average subgroups in a dataframe? (not sure how to apply aggregate(..))
Thank you all for your responses; I have now achieved the desired output for my own real data using your suggestions. I will also have to look into this 'plyr' package, as I have noticed that it gets mentioned a lot.

On 21 Oct, 13:33, Karl Ove Hufthammer k...@huftis.org wrote:
> Or, with the 'plyr' package, which has a very nice syntax: