Re: [R] function in aggregate applied to specific columns only

2010-01-04 Thread Matthew Dowle
> That makes eight solutions. Any others?  :)

A ninth was detailed in two other threads last month. The first link
compares it to ave().
http://tolstoy.newcastle.edu.au/R/e8/help/09/12/9014.html
http://tolstoy.newcastle.edu.au/R/e8/help/09/12/8830.html


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] function in aggregate applied to specific columns only

2010-01-03 Thread david hilton shanabrook
I want to use aggregate with the mean function on specific columns

gender <- factor(c("m", "m", "f", "f", "m"))
student <- c(0001, 0002, 0003, 0003, 0001)
score <- c(50, 60, 70, 65, 60)
basicSub <- data.frame(student, gender, score)
basicSubMean <- aggregate(basicSub, by=list(basicSub$student), FUN=mean,
                          na.rm=TRUE)

This doesn't work: one cannot take the mean of a factor (gender).  Is there any
way of specifying which columns to use for the mean?  I want to aggregate by
student, obtaining mean scores, and assume any other factors (e.g. gender) are
unchanging within a given student.

Thanks

Re: [R] function in aggregate applied to specific columns only

2010-01-03 Thread David Winsemius


On Jan 3, 2010, at 10:46 PM, david hilton shanabrook wrote:


> I want to use aggregate with the mean function on specific columns
>
> gender <- factor(c("m", "m", "f", "f", "m"))
> student <- c(0001, 0002, 0003, 0003, 0001)
> score <- c(50, 60, 70, 65, 60)
> basicSub <- data.frame(student, gender, score)
> basicSubMean <- aggregate(basicSub, by=list(basicSub$student),
>                           FUN=mean, na.rm=TRUE)

> basicSubMean <- aggregate(basicSub$score, by=list(basicSub$student),
+                           FUN=mean, na.rm=TRUE)
> basicSubMean
  Group.1    x
1       1 55.0
2       2 60.0
3       3 67.5

> This doesn't work, one cannot take the mean of a factor (gender).
> Is there any way of specifying which columns to use for the mean?  I
> want to aggregate by student, obtaining mean scores, and assume any
> other factors are unchanging in a specific student, ie. gender.
>
> Thanks

--

David Winsemius, MD
Heritage Laboratories
West Hartford, CT


Re: [R] function in aggregate applied to specific columns only

2010-01-03 Thread Dennis Murphy
Hi:

Perhaps the plyr package would be useful. It contains the functions colwise(),
numcolwise() and catcolwise(), which apply the same operation to every column,
to the numeric columns only, or to the categorical columns only, respectively.
In this case, numcolwise() is appropriate:

> str(basicSub)
'data.frame':   5 obs. of  3 variables:
 $ student: num  1 2 3 3 1
 $ gender : Factor w/ 2 levels "f","m": 2 2 1 1 2
 $ score  : num  50 60 70 65 60
> basicSub$student <- factor(basicSub$student)  # convert student to factor
> library(plyr)
# First argument is the data frame, the next is the grouping variable, and
# the third is the function to apply.
> ddply(basicSub, .(student), numcolwise(mean))
  student score
1       1  55.0
2       2  60.0
3       3  67.5

HTH,
Dennis
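
For readers without plyr installed, the idea behind numcolwise() (apply the function only to numeric columns) can be approximated in base R. This is only a sketch, not how plyr implements it; the names value_cols and numeric_means are made up for illustration, and student is left numeric as in the original post.

```r
# Data from the original post (0001 etc. are plain numbers, i.e. 1, 2, 3)
basicSub <- data.frame(
  student = c(1, 2, 3, 3, 1),
  gender  = factor(c("m", "m", "f", "f", "m")),
  score   = c(50, 60, 70, 65, 60)
)

# Find the numeric columns, then drop the grouping column itself.
value_cols <- names(basicSub)[vapply(basicSub, is.numeric, logical(1))]
value_cols <- setdiff(value_cols, "student")

# Aggregate only those columns; the factor column (gender) is never
# passed to mean().
numeric_means <- aggregate(basicSub[value_cols],
                           by = list(student = basicSub$student),
                           FUN = mean)
print(numeric_means)
```

With more numeric columns in the data frame, the same call would average all of them per student without listing them by name.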


Re: [R] function in aggregate applied to specific columns only

2010-01-03 Thread milton ruser
You want this?

> basicSubMean <- aggregate(basicSub[c("score")], by=list(basicSub$student),
+                           FUN=mean, na.rm=TRUE)
> basicSubMean
  Group.1 score
1       1  55.0
2       2  60.0
3       3  67.5

bests
milton
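
Two small variations on the aggregate() call above may be worth a sketch. Naming the element of the by list gives the grouping column a real name instead of Group.1, and adding gender to the grouping carries it into the result, which matches the original poster's assumption that gender is constant within a student. The object names by_student and by_student_gender are illustrative only.

```r
# Data from the original post (0001 etc. are plain numbers, i.e. 1, 2, 3)
gender   <- factor(c("m", "m", "f", "f", "m"))
student  <- c(1, 2, 3, 3, 1)
score    <- c(50, 60, 70, 65, 60)
basicSub <- data.frame(student, gender, score)

# A named list element names the grouping column in the output.
by_student <- aggregate(basicSub["score"],
                        by = list(student = basicSub$student),
                        FUN = mean, na.rm = TRUE)

# Grouping by student and gender keeps gender in the result. This only
# gives one row per student if gender really is constant within student.
by_student_gender <- aggregate(basicSub["score"],
                               by = basicSub[c("student", "gender")],
                               FUN = mean, na.rm = TRUE)

print(by_student)
print(by_student_gender)
```

If a student ever appeared with two different gender values, the second call would return two rows for that student, which is a quick way to spot a violated assumption.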


Re: [R] function in aggregate applied to specific columns only

2010-01-03 Thread Gabor Grothendieck
Here are 6 ways:

1. aggregate

> aggregate(basicSub["score"], basicSub["student"], mean)
  student score
1       1  55.0
2       2  60.0
3       3  67.5

2. tapply

> with(basicSub, tapply(score, student, mean))
   1    2    3
55.0 60.0 67.5

3. summaryBy in doBy package

> library(doBy)
> summaryBy(. ~ student, basicSub)
  student score.mean
1       1       55.0
2       2       60.0
3       3       67.5

4. sqldf in sqldf package.  Uses SQL:

> library(sqldf)
> sqldf("select student, avg(score) from basicSub group by student")
  student avg(score)
1       1       55.0
2       2       60.0
3       3       67.5

5. summary.formula in Hmisc

> library(Hmisc)
> summary(score ~ student, basicSub)
score    N=5

+-------+-+-+-----+
|       | |N|score|
+-------+-+-+-----+
|student|1|2|55.0 |
|       |2|1|60.0 |
|       |3|2|67.5 |
+-------+-+-+-----+
|Overall| |5|61.0 |
+-------+-+-+-----+

6. plyr (see Dennis Murphy's solution in this thread)
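
The two base-R entries in the list above (aggregate and tapply) return differently shaped objects: a data frame and a named array. As a rough sketch, the tapply() result can be reshaped into the same layout as the aggregate() result so the two can be compared. The names agg, tap and tap_df are illustrative, and student is assumed to be numeric as in the original post.

```r
# Data from the original post (0001 etc. are plain numbers, i.e. 1, 2, 3)
basicSub <- data.frame(
  student = c(1, 2, 3, 3, 1),
  gender  = factor(c("m", "m", "f", "f", "m")),
  score   = c(50, 60, 70, 65, 60)
)

# 1. aggregate: returns a data frame with one row per student
agg <- aggregate(basicSub["score"], basicSub["student"], mean)

# 2. tapply: returns a one-dimensional array named by student
tap <- with(basicSub, tapply(score, student, mean))

# Rebuild a data frame from the tapply result: the names hold the
# group labels (as character), the values hold the means.
tap_df <- data.frame(student = as.numeric(names(tap)),
                     score   = as.vector(tap))

# Both approaches should give the same per-student means.
print(all.equal(agg$score, tap_df$score))
```

Which form is more convenient depends on what follows: the data frame merges easily with other tables, while the named array is handy for lookups such as tap["3"].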



Re: [R] function in aggregate applied to specific columns only

2010-01-03 Thread Dennis Murphy
Just for the fun of it, here are two more: by and ave.


> with(basicSub, by(score, student, mean))
student: 1
[1] 55

student: 2
[1] 60

student: 3
[1] 67.5

Not my favorite print method; to return a vector, do instead
> as.vector(with(basicSub, by(score, student, mean)))
[1] 55.0 60.0 67.5
You can cbind the unique student IDs to get a matrix result.

ave() is used to map the average (or comparable summary) to each
observation. By itself, it returns a vector of the same length as the
number of observations:
> with(basicSub, ave(score, student))
[1] 55.0 60.0 67.5 67.5 55.0

It's more useful if you want to add the means to the data frame:
> transform(basicSub, avg = ave(score, student))
  student gender score  avg
1       1      m    50 55.0
2       2      m    60 60.0
3       3      f    70 67.5
4       3      f    65 67.5
5       1      m    60 55.0

That makes eight solutions. Any others?  :)

Dennis
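
Two remarks in the message above are stated without code: binding the student IDs to the by() result, and using ave() with a summary other than the mean. A minimal sketch of both follows. The names group_means, result and group_max are illustrative, the IDs are sorted on the assumption that by() reports groups in sorted order of the grouping values, and max is just one example of a "comparable summary".

```r
# Data from the original post (0001 etc. are plain numbers, i.e. 1, 2, 3)
basicSub <- data.frame(
  student = c(1, 2, 3, 3, 1),
  gender  = factor(c("m", "m", "f", "f", "m")),
  score   = c(50, 60, 70, 65, 60)
)

# by() result flattened to a plain vector of group means
group_means <- as.vector(with(basicSub, by(score, student, mean)))

# Bind the sorted unique student IDs to the means to get a matrix.
result <- cbind(student = sort(unique(basicSub$student)),
                mean    = group_means)
print(result)

# ave() accepts any summary function through the named FUN argument;
# here each observation gets its student's highest score.
group_max <- with(basicSub, ave(score, student, FUN = max))
print(group_max)
```

Note that FUN must be given by name in ave(), because unnamed arguments after the first are treated as grouping variables.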

