[R] How to average subgroups in a dataframe? (not sure how to apply aggregate(..))

2009-10-21 Thread Tony Breyal
Dear all,

Let's say I have the following data frame:

set.seed(1)
col1 <- c(rep('happy', 9), rep('sad', 9))
col2 <- rep(c(rep('alpha', 3), rep('beta', 3), rep('gamma', 3)), 2)
dates <- as.Date(rep(c('2009-10-13', '2009-10-14', '2009-10-15'), 6))
score <- rnorm(18, 10, 3)
df1 <- data.frame(col1 = col1, col2 = col2, Date = dates, score = score)

    col1  col2       Date     score
1  happy alpha 2009-10-13  8.120639
2  happy alpha 2009-10-14 10.550930
3  happy alpha 2009-10-15  7.493114
4  happy  beta 2009-10-13 14.785842
5  happy  beta 2009-10-14 10.988523
6  happy  beta 2009-10-15  7.538595
7  happy gamma 2009-10-13 11.462287
8  happy gamma 2009-10-14 12.214974
9  happy gamma 2009-10-15 11.727344
10   sad alpha 2009-10-13  9.083835
11   sad alpha 2009-10-14 14.535344
12   sad alpha 2009-10-15 11.169530
13   sad  beta 2009-10-13  8.136278
14   sad  beta 2009-10-14  3.355900
15   sad  beta 2009-10-15 13.374793
16   sad gamma 2009-10-13  9.865199
17   sad gamma 2009-10-14  9.951429
18   sad gamma 2009-10-15 12.831509


Is it possible to get the following, whereby I average the score values
within each combination of col1 and col2:

    col1  col2       Date     score   Average
1  happy alpha 13/10/2009  8.120639  8.721561
2  happy alpha 14/10/2009 10.550930  8.721561
3  happy alpha 15/10/2009  7.493114  8.721561
4  happy  beta 13/10/2009 14.785842 11.104320
5  happy  beta 14/10/2009 10.988523 11.104320
6  happy  beta 15/10/2009  7.538595 11.104320
7  happy gamma 13/10/2009 11.462287 11.801535
8  happy gamma 14/10/2009 12.214974 11.801535
9  happy gamma 15/10/2009 11.727344 11.801535
10   sad alpha 13/10/2009  9.083835 11.596236
11   sad alpha 14/10/2009 14.535344 11.596236
12   sad alpha 15/10/2009 11.169530 11.596236
13   sad  beta 13/10/2009  8.136278  8.288990
14   sad  beta 14/10/2009  3.355900  8.288990
15   sad  beta 15/10/2009 13.374793  8.288990
16   sad gamma 13/10/2009  9.865199 10.882712
17   sad gamma 14/10/2009  9.951429 10.882712
18   sad gamma 15/10/2009 12.831509 10.882712


My feeling is that I should be using ?aggregate in some fashion,
but I can't see how to do it. Or is there another function I
should be using?

Thanks in advance,
Tony

O/S: Windows Vista Ultimate
> sessionInfo()
R version 2.9.2 (2009-08-24)
i386-pc-mingw32

locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to average subgroups in a dataframe? (not sure how to apply aggregate(..))

2009-10-21 Thread Benilton Carvalho

aves = aggregate(df1$score, by=list(col1=df1$col1, col2=df1$col2), mean)
results = merge(df1, aves)
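A slightly expanded sketch of the aggregate()/merge() approach above, offered as an illustration rather than a definitive recipe. It assumes you want the summary column named Average (when aggregate() is given a plain vector, it calls the value column x) and that you want the rows ordered by col1, col2 and Date, because merge() sorts its output by the key columns. The first lines simply rebuild df1 from the original post.

```r
# Rebuild df1 as in the original post (same values, same seed)
set.seed(1)
df1 <- data.frame(
  col1  = rep(c("happy", "sad"), each = 9),
  col2  = rep(rep(c("alpha", "beta", "gamma"), each = 3), 2),
  Date  = as.Date(rep(c("2009-10-13", "2009-10-14", "2009-10-15"), 6)),
  score = rnorm(18, 10, 3)
)

# One mean per col1/col2 combination; aggregate() names the value column "x"
aves <- aggregate(df1$score, by = list(col1 = df1$col1, col2 = df1$col2), FUN = mean)
names(aves)[names(aves) == "x"] <- "Average"

# Join the group means back onto every row, matching on col1 and col2
results <- merge(df1, aves, by = c("col1", "col2"))

# merge() sorts by the keys, so put the rows in an explicit order
results <- results[order(results$col1, results$col2, results$Date), ]
rownames(results) <- NULL
print(results)
```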

b

On Oct 21, 2009, at 9:03 AM, Tony Breyal wrote:


My feeling is that I should be using ?aggregate in some fashion,
but I can't see how to do it. Or is there another function I
should be using?



Re: [R] How to average subgroups in a dataframe? (not sure how to apply aggregate(..))

2009-10-21 Thread Chuck Cleland
On 10/21/2009 7:03 AM, Tony Breyal wrote:
 My feeling is that I should be using ?aggregate in some fashion,
 but I can't see how to do it. Or is there another function I
 should be using?

?ave

  For example, try something like this:

transform(df1, Average = ave(score, col1, col2))
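As a hedged illustration of why ave() suits this problem: it returns a vector the same length as score, with each element replaced by the mean of its col1/col2 group (FUN defaults to mean), so the result already lines up with the rows of df1 and no merge or re-sorting is needed. The tapply() call and the name grp_means are only there as an illustrative cross-check of the group means.

```r
# Rebuild df1 as in the original post (same values, same seed)
set.seed(1)
df1 <- data.frame(
  col1  = rep(c("happy", "sad"), each = 9),
  col2  = rep(rep(c("alpha", "beta", "gamma"), each = 3), 2),
  Date  = as.Date(rep(c("2009-10-13", "2009-10-14", "2009-10-15"), 6)),
  score = rnorm(18, 10, 3)
)

# ave() computes each group's mean and repeats it on every row of that group,
# keeping the original row order of df1
df2 <- transform(df1, Average = ave(score, col1, col2))

# Cross-check: a col1 x col2 table of the same group means
grp_means <- tapply(df1$score, list(df1$col1, df1$col2), mean)
print(round(grp_means, 6))
print(df2)
```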


-- 
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894



Re: [R] How to average subgroups in a dataframe? (not sure how to apply aggregate(..))

2009-10-21 Thread Karl Ove Hufthammer
In article 800acfc0-2c3c-41f1-af18-3b52f7e43...@jhsph.edu, 
bcarv...@jhsph.edu says...
 aves = aggregate(df1$score, by=list(col1=df1$col1, col2=df1$col2), mean)
 results = merge(df1, aves)

Or, with the 'plyr' package, which has a very nice syntax:

library(plyr)
ddply(df1, .(col1, col2), transform, Average=mean(score))

It may be a bit slow for very large datasets, though.

Here's an alternative, which will be as fast as the aggregate solution.

within(df1, { Average=ave(score, col1, col2, FUN=mean) } )

Which one you use is a matter of taste.

And of course, the 'within' function is not the important part here; 
'ave' is. For example, if you have attached your data frame, you just 
have to type

Average=ave(score, col1, col2, FUN=mean)
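One caveat worth spelling out, as a hedged note: within() returns a modified copy of df1, so its result has to be assigned, and with an attached data frame Average = ave(...) creates a separate vector in the workspace rather than a new column of df1. A minimal sketch of storing the group means as a column either way (df_within is just an illustrative name):

```r
# Rebuild df1 as in the original post (same values, same seed)
set.seed(1)
df1 <- data.frame(
  col1  = rep(c("happy", "sad"), each = 9),
  col2  = rep(rep(c("alpha", "beta", "gamma"), each = 3), 2),
  Date  = as.Date(rep(c("2009-10-13", "2009-10-14", "2009-10-15"), 6)),
  score = rnorm(18, 10, 3)
)

# within() works on a copy; assign the result to keep the new column
df_within <- within(df1, Average <- ave(score, col1, col2, FUN = mean))

# Without within() or attach(): evaluate ave() inside df1 and store a column
df1$Average <- with(df1, ave(score, col1, col2, FUN = mean))
print(df1)
```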

-- 
Karl Ove Hufthammer



Re: [R] How to average subgroups in a dataframe? (not sure how to apply aggregate(..))

2009-10-21 Thread Tony Breyal
Thank you all for your responses. I have now achieved the desired
output for my own real data using your suggestions. I will also have
to look into this 'plyr' package, as I have noticed that it gets
mentioned a lot.

