Re: [R] Getting the groupmean for each person
On Mon, 10 May 2004, Christophe Pallier wrote:

> The use of tapply(x,f,mean)[match(f,unique(f))] assumes a particular
> order in the result of tapply, no? It seems a bit dangerous to me.

My original code for the group means problem used rowsum(,reorder=FALSE) rather than tapply(), and we do know that this produces the same order as unique().

-thomas

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
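A small check of the ordering claim above, on made-up data (the vectors x and f are illustrative and not from the thread). It is only a sketch: it compares the row order produced by rowsum(reorder = FALSE) with unique(), and contrasts that with the level order used by tapply().

```r
# Toy data (hypothetical): groups first appear in the order "b", "a", "c"
x <- c(1, 2, 3, 4, 5, 6)
f <- c("b", "a", "b", "c", "a", "c")

# Group sums, one row per group, in order of first appearance
sums_first_seen <- rowsum(x, f, reorder = FALSE)

# The row names agree with unique(f)
identical(rownames(sums_first_seen), unique(f))

# tapply() instead follows the sorted factor levels
names(tapply(x, f, sum))
```

Because the two orders differ whenever the data are not already sorted by group, match(f, unique(f)) pairs safely with rowsum(reorder = FALSE) but not with tapply().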
Re: [R] Getting the groupmean for each person
Liaw, Andy wrote:

> Suppose I define the function:
>
>   fun <- function(x, f) {
>       m <- tapply(x, f, mean)
>       ans <- x - m[match(f, unique(f))]
>       names(ans) <- names(x)
>       ans
>   }

May I ask what is the purpose of match(f,unique(f))?

To remove the group means, I have been using

  x - tapply(x,f,mean)[f]

for a while (and I am now changing to

  x - tapply(x,f,mean)[as.character(f)]

because of the peculiarities of indexing named vectors with factors).

The use of tapply(x,f,mean)[match(f,unique(f))] assumes a particular order in the result of tapply, no? It seems a bit dangerous to me.

Christophe Pallier
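To make the concern concrete, here is a minimal sketch on made-up data (x, f and lookup are illustrative names). It shows the three indexing styles discussed above, and the "peculiarity" that a factor used as an index is taken as its integer codes rather than its labels.

```r
x <- c(10, 20, 30, 40)
f <- factor(c("b", "a", "b", "a"))   # levels are "a", "b"; first seen is "b"

m <- tapply(x, f, mean)              # a = 30, b = 20, stored in level order

wrong   <- m[match(f, unique(f))]    # positions in order of first appearance
by_name <- m[as.character(f)]        # looks each label up by name
by_code <- m[f]                      # factor index is used via its integer codes

as.vector(wrong)     # 30 20 30 20: the two group means are swapped
as.vector(by_name)   # 20 30 20 30
as.vector(by_code)   # 20 30 20 30

# Codes only agree with names when the named vector is in level order:
lookup <- c(b = "B", a = "A")
unname(lookup[f])                 # "A" "B" "A" "B": picked by position
unname(lookup[as.character(f)])   # "B" "A" "B" "A": picked by label
```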
RE: [R] Getting the groupmean for each person
Both of you might have missed my question from Friday: For very long `x' (e.g., length=5), indexing by names can take a long time. See that thread for detail. (For small data, you can hardly tell the difference.)

Also, I'm trying to write the function in a way that one can pass in more than one grouping variable in a list, much like tapply. The version I showed is a simplified version to demonstrate the `problem' I had. I obviously missed the fact that tapply returns a 1D array...

Best,
Andy

> From: [EMAIL PROTECTED]
>
> On 10 May 2004 at 10:09, Christophe Pallier wrote:
>
> > Liaw, Andy wrote:
> >
> > > Suppose I define the function:
> > >
> > >   fun <- function(x, f) {
> > >       m <- tapply(x, f, mean)
> > >       ans <- x - m[match(f, unique(f))]
> > >       names(ans) <- names(x)
> > >       ans
> > >   }
> >
> > May I ask what is the purpose of match(f,unique(f))?
> >
> > To remove the group means, I have been using
> >   x - tapply(x,f,mean)[f]
> > for a while (and I am now changing to
> >   x - tapply(x,f,mean)[as.character(f)]
> > because of the peculiarities of indexing named vectors with factors).
>
> wouldn't
>   sweep(as.array(x), 1, tapply(x,f,mean)[as.character(f)], "-")
> be more natural?
>
> Kjetil Halvorsen
>
> > The use of tapply(x,f,mean)[match(f,unique(f))] assumes a particular
> > order in the result of tapply, no? It seems a bit dangerous to me.
> >
> > Christophe Pallier
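One possible way to handle more than one grouping variable, sketched on made-up data (x, g1 and g2 are illustrative; this is not Andy's function). It looks up each observation's cell mean by label in the table returned by tapply(), and cross-checks the result against ave() from base R.

```r
x  <- c(1, 2, 3, 4, 5, 6, 7, 8)
g1 <- c("a", "a", "b", "b", "a", "a", "b", "b")
g2 <- c("u", "v", "u", "v", "u", "v", "u", "v")

# 2 x 2 table of cell means, with the group labels as dimnames
cell_means <- tapply(x, list(g1, g2), mean)

# A two-column character matrix picks one (row label, column label) cell per observation
per_obs <- cell_means[cbind(g1, g2)]

centred <- x - per_obs

# ave() performs the same per-observation lookup in a single call
all.equal(per_obs, ave(x, g1, g2))
```

Indexing by label avoids any assumption about the order in which tapply() arranges the cells.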
RE: [R] Getting the groupmean for each person
On Mon, 10 May 2004, Liaw, Andy wrote:

> Both of you might have missed my question from Friday: For very long
> `x' (e.g., length=5), indexing by names can take a long time. See that
> thread for detail. (For small data, you can hardly tell the difference.)

That's solved in R-devel as of this morning. You need a million to see a significant time in indexing.

However, I think that in this case you should be indexing by the codes of a factor, as tapply is guaranteed to produce results in the order of the levels of f (after conversion to a factor). So the natural way to index by a factor is the default one. It may come as no surprise then that lda has code like

  group.means <- tapply(x, list(rep(g, p), col(x)), mean)
  X <- x - group.means[g, ]

where g is a factor.

> Also, I'm trying to write the function in a way that one can pass in
> more than one grouping variable in a list, much like tapply. The
> version I showed is a simplified version to demonstrate the `problem'
> I had. I obviously missed the fact that tapply returns a 1D array...
>
> Best,
> Andy

--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
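The lda-style fragment above can be tried on a small matrix. This is only a sketch with made-up values (x, g and p here are toy stand-ins for the objects in the quoted code); it relies on the fact that indexing rows by a factor uses its integer codes, which line up with the level order of the tapply() result.

```r
# Toy matrix (hypothetical): 4 observations (rows), 2 variables (columns)
x <- matrix(c(1, 3, 5, 7,
              10, 20, 30, 40), ncol = 2)
g <- factor(c("b", "a", "b", "a"))
p <- ncol(x)

# One mean per (group, column); rows come out in the order of levels(g)
group.means <- tapply(x, list(rep(g, p), col(x)), mean)

# g is used via its integer codes, so each row gets its own group's means
X <- x - group.means[g, ]

# After centring, every group has column means of zero
rowsum(X, g)
```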
Re: [R] Getting the groupmean for each person
On Sat, 8 May 2004, Gabor Grothendieck wrote:

> predict(lm(AV~as.factor(GROUP)))

If Felix actually has a huge data frame this will be slow. Instead try

  # rowsum() returns group sums, so divide by the group sizes
  groupmeans <- rowsum(AV, GROUP, reorder=FALSE) /
                rowsum(rep(1, length(AV)), GROUP, reorder=FALSE)
  individual.means <- groupmeans[match(GROUP, unique(GROUP))]

It uses hashing and takes roughly O(MGlogG) time for M measurements on G groups, whereas the lm solution takes O(MG^3) [and the space requirements are O(MG) and O(MG^2)].

Admittedly, with only 3000 observations either one will be fast enough.

-thomas

Thomas Lumley
Assoc. Professor, Biostatistics
[EMAIL PROTECTED]
University of Washington, Seattle
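A worked toy run of the rowsum() approach, with made-up AV and GROUP values (the names groupsums and groupsizes are illustrative). Note that rowsum() returns group sums, so this sketch divides by the group sizes to obtain means, then checks the result against ave().

```r
AV    <- c(1, 4, 3, 8, 5)
GROUP <- c("g2", "g1", "g2", "g1", "g2")

# Sums and counts per group, both in order of first appearance
groupsums  <- rowsum(AV, GROUP, reorder = FALSE)
groupsizes <- rowsum(rep(1, length(AV)), GROUP, reorder = FALSE)
groupmeans <- groupsums / groupsizes

# unique(GROUP) has the same order as the rows above, so match() is safe here
individual.means <- groupmeans[match(GROUP, unique(GROUP))]

individual.means                           # 3 6 3 6 3
all.equal(individual.means, ave(AV, GROUP))
```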
[R] Getting the groupmean for each person
Hello list!

I have a huge data.frame with several variables observed on about 3000 persons. For every person (row) there is a variable called GROUP which indicates the group the person belongs to. There is also another variable AV for each person. Now I want to create a new variable which holds the group mean of AV as a value for each person.

With tapply(AV,GROUP,mean) I get the means for each level of GROUP, but I cannot find out how to give every person the group mean as a value (every person should have the same value as every other person in the same group).

Does anybody have any ideas how to do that?

Yours sincerely
Felix Eschenburg
Re: [R] Getting the groupmean for each person
predict(lm(AV~as.factor(GROUP)))

Felix Eschenburg Atropin75 at t-online.de writes:

: Hello list!
:
: I have a huge data.frame with several variables observed on about 3000
: persons. For every person (row) there is a variable called GROUP which
: indicates the group the person belongs to. There is also another
: variable AV for each person. Now I want to create a new variable which
: holds the group mean of AV as a value for each person.
: With tapply(AV,GROUP,mean) I get the means for each level of GROUP, but
: I cannot find out how to give every person the group mean as a value
: (every person should have the same value as every other person in the
: same group).
:
: Does anybody have any ideas how to do that?
:
: Yours sincerely
: Felix Eschenburg
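The one-liner works because the fitted values of a one-way model are the group means. A minimal sketch on a made-up data frame (dat and the GROUPMEAN column are illustrative names), compared against ave() from base R as an alternative that avoids fitting a model:

```r
dat <- data.frame(
  AV    = c(1, 4, 3, 8, 5),
  GROUP = c(2, 1, 2, 1, 2)
)

# Fitted values from the one-way model: each person's group mean
fitted_means <- predict(lm(AV ~ as.factor(GROUP), data = dat))

# The same values computed directly, stored as a new column
dat$GROUPMEAN <- ave(dat$AV, dat$GROUP)

all.equal(unname(fitted_means), dat$GROUPMEAN)
```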