Re: [R] function in aggregate applied to specific columns only
Dennis Murphy djmu...@gmail.com wrote in message news:9a8a6c631001032057qc5cd68j9ec3882043dec...@mail.gmail.com...
> That makes eight solutions. Any others? :)

A ninth was detailed in two other threads last month. The first link compares it to ave().

http://tolstoy.newcastle.edu.au/R/e8/help/09/12/9014.html
http://tolstoy.newcastle.edu.au/R/e8/help/09/12/8830.html

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] function in aggregate applied to specific columns only
I want to use aggregate with the mean function on specific columns:

    gender <- factor(c("m", "m", "f", "f", "m"))
    student <- c(0001, 0002, 0003, 0003, 0001)
    score <- c(50, 60, 70, 65, 60)
    basicSub <- data.frame(student, gender, score)
    basicSubMean <- aggregate(basicSub, by=list(basicSub$student), FUN=mean, na.rm=TRUE)

This doesn't work; one cannot take the mean of a factor (gender). Is there any way of specifying which columns to use for the mean? I want to aggregate by student, obtaining mean scores, and assume that any other factors, e.g. gender, are unchanging within a specific student.

Thanks
Re: [R] function in aggregate applied to specific columns only
On Jan 3, 2010, at 10:46 PM, david hilton shanabrook wrote:

> I want to use aggregate with the mean function on specific columns
>
> basicSubMean <- aggregate(basicSub, by=list(basicSub$student), FUN=mean, na.rm=TRUE)

    basicSubMean <- aggregate(basicSub$score, by=list(basicSub$student), FUN=mean, na.rm=TRUE)
    basicSubMean
      Group.1    x
    1       1 55.0
    2       2 60.0
    3       3 67.5

> This doesn't work, one cannot take the mean of a factor (gender). Is there any way of specifying which columns to use for the mean?

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
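A hedged aside on the reply above: passing a bare vector to aggregate() yields the generic column names Group.1 and x. The sketch below shows two ways to get meaningful names instead. The formula method of aggregate() is not used anywhere in this thread, so it is an assumption that the installed version of R provides it; the object names by_student and by_student2 are made up for illustration.

```r
# Data from the original post
basicSub <- data.frame(
  student = c(1, 2, 3, 3, 1),
  gender  = factor(c("m", "m", "f", "f", "m")),
  score   = c(50, 60, 70, 65, 60)
)

# Formula interface: mean of score within each student.
# Columns not named in the formula (here gender) are left out.
by_student <- aggregate(score ~ student, data = basicSub, FUN = mean)
print(by_student)

# Non-formula call: a one-column data frame keeps the name "score",
# and a named list for 'by' replaces "Group.1" with "student".
by_student2 <- aggregate(basicSub["score"],
                         by = list(student = basicSub$student),
                         FUN = mean)
print(by_student2)
```

Both calls should return one row per student with the columns student and score.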
Re: [R] function in aggregate applied to specific columns only
Hi:

Perhaps the plyr package would be useful. It contains the functions colwise(), numcolwise() and catcolwise(), which apply the same operation to all columns, to the numeric columns only, or to the categorical columns only, respectively. In this case, numcolwise() is appropriate:

    str(basicSub)
    'data.frame':   5 obs. of  3 variables:
     $ student: num  1 2 3 3 1
     $ gender : Factor w/ 2 levels "f","m": 2 2 1 1 2
     $ score  : num  50 60 70 65 60

    basicSub$student <- factor(basicSub$student)   # convert student to factor

    library(plyr)
    # First argument is the data frame, the next is the grouping variable,
    # the third is the function to apply.
    ddply(basicSub, .(student), numcolwise(mean))
      student score
    1       1  55.0
    2       2  60.0
    3       3  67.5

HTH,
Dennis
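For readers without plyr installed, the same "numeric columns only" idea can be sketched in base R. This is not what numcolwise() does internally; it is only a rough analogue under the assumption that the grouping column is numeric and therefore has to be excluded by hand. The names is_num and num_means are hypothetical.

```r
# Data from the original post
basicSub <- data.frame(
  student = c(1, 2, 3, 3, 1),
  gender  = factor(c("m", "m", "f", "f", "m")),
  score   = c(50, 60, 70, 65, 60)
)

# Flag the numeric columns, one TRUE/FALSE per column
is_num <- vapply(basicSub, is.numeric, logical(1))

# The grouping column is numeric too, so drop it from the columns to average
is_num["student"] <- FALSE

# Average every remaining numeric column within each student
num_means <- aggregate(basicSub[is_num], by = basicSub["student"], FUN = mean)
print(num_means)
```

The factor column gender never reaches mean(), which is what caused the failure in the original call.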
Re: [R] function in aggregate applied to specific columns only
You want this?

    basicSubMean <- aggregate(basicSub[c("score")], by=list(basicSub$student), FUN=mean, na.rm=TRUE)
    basicSubMean
      Group.1 score
    1       1  55.0
    2       2  60.0
    3       3  67.5

bests
milton
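The original question also asked to carry along factors that do not change within a student, such as gender. None of the replies covers that part, so the following is only a sketch of one possible approach in base R: aggregate the score, collect the per-student constants with unique(), and merge the two. The names means, constants and result are made up for illustration.

```r
# Data from the original post
basicSub <- data.frame(
  student = c(1, 2, 3, 3, 1),
  gender  = factor(c("m", "m", "f", "f", "m")),
  score   = c(50, 60, 70, 65, 60)
)

# Mean score per student, restricted to the score column
means <- aggregate(basicSub["score"], by = basicSub["student"], FUN = mean)

# One row per student/gender combination
constants <- unique(basicSub[c("student", "gender")])

# Check the assumption that gender really is constant within a student
stopifnot(!anyDuplicated(constants$student))

# Attach the constant columns to the per-student means
result <- merge(constants, means, by = "student")
print(result)
```

If a student ever appeared with two different genders, the stopifnot() check would fail rather than silently duplicating rows in the merge.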
Re: [R] function in aggregate applied to specific columns only
Here are 6 ways:

1. aggregate

    aggregate(basicSub["score"], basicSub["student"], mean)
      student score
    1       1  55.0
    2       2  60.0
    3       3  67.5

2. tapply

    with(basicSub, tapply(score, student, mean))
       1    2    3
    55.0 60.0 67.5

3. summaryBy in doBy package

    library(doBy)
    summaryBy(. ~ student, basicSub)
      student score.mean
    1       1       55.0
    2       2       60.0
    3       3       67.5

4. sqldf in sqldf package. Uses SQL:

    library(sqldf)
    sqldf("select student, avg(score) from basicSub group by student")
      student avg(score)
    1       1       55.0
    2       2       60.0
    3       3       67.5

5. summary.formula in Hmisc

    library(Hmisc)
    summary(score ~ student, basicSub)
    score    N=5

    +-------+-+-+-----+
    |       | |N|score|
    +-------+-+-+-----+
    |student|1|2|55.0 |
    |       |2|1|60.0 |
    |       |3|2|67.5 |
    +-------+-+-+-----+
    |Overall| |5|61.0 |
    +-------+-+-+-----+

6. plyr (see Dennis Murphy's solution in this thread)
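Unlike aggregate(), the tapply() call in item 2 returns a named vector rather than a data frame. In case a data frame is wanted, here is a small sketch of the conversion; the names m and means_df are hypothetical, and converting the names back with as.numeric() assumes the student IDs are numeric, as they are in the original post.

```r
# Data from the original post
basicSub <- data.frame(
  student = c(1, 2, 3, 3, 1),
  gender  = factor(c("m", "m", "f", "f", "m")),
  score   = c(50, 60, 70, 65, 60)
)

# Group means as a vector whose names are the student IDs
m <- with(basicSub, tapply(score, student, mean))

# Rebuild a data frame: the names carry the groups, the values the means
means_df <- data.frame(student = as.numeric(names(m)),
                       score   = as.vector(m))
print(means_df)
```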
Re: [R] function in aggregate applied to specific columns only
Just for the fun of it, here are two more: by and ave.

    with(basicSub, by(score, student, mean))
    student: 1
    [1] 55
    student: 2
    [1] 60
    student: 3
    [1] 67.5

Not my favorite print method; to return a vector, do instead

    as.vector(with(basicSub, by(score, student, mean)))
    [1] 55.0 60.0 67.5

You can cbind the unique student IDs to get a matrix result.

ave() is used to map the average (or comparable summary) to each observation. By itself, it returns a vector of the same length as the number of observations:

    with(basicSub, ave(score, student))
    [1] 55.0 60.0 67.5 67.5 55.0

It's more useful if you want to add the means to the data frame:

    transform(basicSub, avg = ave(score, student))
      student gender score  avg
    1       1      m    50 55.0
    2       2      m    60 60.0
    3       3      f    70 67.5
    4       3      f    65 67.5
    5       1      m    60 55.0

That makes eight solutions. Any others? :)

Dennis
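Two remarks in the message above can be made concrete with a short sketch: binding the unique student IDs to the by() result, and using ave() with a summary other than the mean. The names grp, means_mat and group_max are made up for illustration. The IDs are sorted on the assumption that by() reports groups in sorted order of the grouping values, which holds for these numeric IDs.

```r
# Data from the original post
basicSub <- data.frame(
  student = c(1, 2, 3, 3, 1),
  gender  = factor(c("m", "m", "f", "f", "m")),
  score   = c(50, 60, 70, 65, 60)
)

# Group means via by(), then bind the sorted unique IDs alongside them
grp <- with(basicSub, by(score, student, mean))
means_mat <- cbind(student = sort(unique(basicSub$student)),
                   mean    = as.vector(grp))
print(means_mat)

# ave() with another summary: FUN has to be passed by name,
# because unnamed arguments after the first are treated as grouping variables
group_max <- with(basicSub, ave(score, student, FUN = max))
print(group_max)
```

The first result is a 3 x 2 numeric matrix; the second is a vector with one value per observation, just like the ave() output shown in the message.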