Re: [R] Getting the groupmean for each person

2004-05-10 Thread Thomas Lumley
On Mon, 10 May 2004, Christophe Pallier wrote:

 The use of tapply(x,f,mean)[match(f,unique(f))] assumes a particular
 order in the result of tapply, no? It seems a bit dangerous to me.


My original code for the group means problem used rowsum(,reorder=FALSE)
rather than tapply(), and we do know that this produces the same order as
unique().

-thomas
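
A small check of the ordering claim, as a sketch on made-up data (the vectors `x` and `f` are illustrative, not from the thread). It assumes only base R and shows that the rows of `rowsum(, reorder=FALSE)` follow `unique(f)`, so `match(f, unique(f))` picks each observation's own row:

```r
x <- c(1, 2, 3, 4, 5)
f <- c("b", "a", "b", "c", "a")          # first-appearance order: b, a, c

sums <- rowsum(x, f, reorder = FALSE)    # one-column matrix of group sums
# Rows come out in order of first appearance, i.e. the order of unique(f).
stopifnot(identical(rownames(sums), unique(f)))

# Because the rows follow unique(f), match() selects the right row per observation.
per_obs <- sums[match(f, unique(f)), 1]
stopifnot(isTRUE(all.equal(unname(per_obs), c(4, 7, 4, 4, 7))))
```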

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Getting the groupmean for each person

2004-05-10 Thread Christophe Pallier


Liaw, Andy wrote:

Suppose I
define the function:
fun <- function(x, f) {
   m <- tapply(x, f, mean)
   ans <- x - m[match(f, unique(f))]
   names(ans) <- names(x)
   ans
}
 

May I ask what is the purpose of match(f,unique(f)) ?

To remove the group means, I have been using:

x - tapply(x,f,mean)[f]

for a while, (and I am now changing to 
x - tapply(x,f,mean)[as.character(f)] because of the peculiarities of 
indexing named vectors with factors )

The use of tapply(x,f,mean)[match(f,unique(f))] assumes a particular 
order in the result of tapply, no? It seems a bit dangerous to me.

Christophe Pallier
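
The concern above can be illustrated with a short sketch on made-up data (`x`, `f`, `v` and `expected` are illustrative names, not from the thread). It compares indexing the `tapply()` result by the factor, by the level names, and by `match(f, unique(f))`, and then shows the "peculiarity" of indexing a named vector with a factor:

```r
x <- c(1, 2, 3, 4, 5)
f <- factor(c("b", "a", "b", "c", "a"))   # levels: a, b, c
m <- tapply(x, f, mean)                   # a = 3.5, b = 2, c = 4

expected <- c(2, 3.5, 2, 4, 3.5)          # each observation's own group mean

# A factor index is used as its integer codes, which line up with tapply().
stopifnot(isTRUE(all.equal(as.vector(m[f]), expected)))
# A character index looks up the level labels instead.
stopifnot(isTRUE(all.equal(as.vector(m[as.character(f)]), expected)))

# match(f, unique(f)) numbers groups by first appearance (b, a, c),
# so it pairs observations with the wrong entries of m.
wrong <- as.vector(m[match(f, unique(f))])
stopifnot(!isTRUE(all.equal(wrong, expected)))

# The peculiarity: with a named vector in another order, the factor index
# still goes by codes, not names.
v <- c(c = 4, b = 2, a = 3.5)
stopifnot(isTRUE(all.equal(as.vector(v[as.character(f)]), expected)))
stopifnot(!isTRUE(all.equal(as.vector(v[f]), expected)))
```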



RE: [R] Getting the groupmean for each person

2004-05-10 Thread Liaw, Andy
Both of you might have missed my question from Friday:  For very long `x'
(e.g., length=5), indexing by names can take a long time.  See that
thread for detail.  (For small data, you can hardly tell the difference.)

Also, I'm trying to write the function in a way that one can pass in more
than one grouping variable in a list, much like tapply.  The version I
showed is a simplified version to demonstrate the `problem' I had.  I
obviously missed the fact that tapply returns a 1D array...

Best,
Andy
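
One possible way to accept several grouping variables, as a sketch only: the function name `center_by` and the toy data are hypothetical, and collapsing the list with `interaction()` is an assumption about how this could be done, not the function discussed in the thread. Indexing by the integer codes of the combined factor avoids both name lookup and any reliance on first-appearance order:

```r
# Hypothetical helper: subtract the group mean, where groups are defined by
# every combination of the grouping variables in `groups`.
center_by <- function(x, groups) {
  g <- interaction(groups, drop = TRUE)    # one factor for all combinations
  m <- tapply(x, g, mean)                  # 1-D array, ordered by levels(g)
  ans <- x - as.vector(m[as.integer(g)])   # index by the factor's codes
  names(ans) <- names(x)
  ans
}

x <- c(1, 3, 10, 20, 5)
a <- c("p", "p", "q", "q", "p")
b <- c("s", "s", "s", "t", "t")
res <- center_by(x, list(a, b))
stopifnot(isTRUE(all.equal(res, c(-1, 1, 0, 0, 0))))
```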

 From: [EMAIL PROTECTED] 
 
 On 10 May 2004 at 10:09, Christophe Pallier wrote:
 
  To remove the group means, I have been using:
  
  x - tapply(x,f,mean)[f]
  
  for a while, (and I am now changing to 
  x - tapply(x,f,mean)[as.character(f)] because of the peculiarities of
  indexing named vectors with factors )
 
 wouldn't 
  sweep(as.array(x), 1, tapply(x,f,mean)[as.character(f)], "-")
 
 be more natural?
 
 Kjetil Halvorsen




RE: [R] Getting the groupmean for each person

2004-05-10 Thread Prof Brian Ripley
On Mon, 10 May 2004, Liaw, Andy wrote:

 Both of you might have missed my question from Friday:  For very long `x'
 (e.g., length=5), indexing by names can take a long time.  See that
 thread for detail.  (For small data, you can hardly tell the difference.)

That's solved in R-devel as of this morning.  You need a million to see a 
significant time in indexing.

However, I think that in this case you should be indexing by the codes of 
a factor, as tapply is guaranteed to produce results in the order of the 
levels of f (after conversion to a factor).  So the natural way to index 
by a factor is the default one.

It may come as no surprise then that lda has code like

group.means <- tapply(x, list(rep(g, p), col(x)), mean)
X <- x - group.means[g, ]

where g is a factor.
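
The two lines above can be tried on a toy matrix, as a sketch (the values of `x` and `g` are made up, and `p` is assumed here to be the number of columns of `x`):

```r
x <- matrix(c(1, 2, 3, 4,
              10, 20, 30, 40), ncol = 2)   # 4 observations, 2 variables
g <- factor(c("u", "v", "u", "v"))
p <- ncol(x)

# One row per level of g, one column per column of x.
group.means <- tapply(x, list(rep(g, p), col(x)), mean)
# Indexing rows by the factor uses its codes, giving one row per observation.
X <- x - group.means[g, ]

stopifnot(isTRUE(all.equal(unname(X[, 1]), c(-1, -1, 1, 1))))
stopifnot(isTRUE(all.equal(unname(X[, 2]), c(-10, -10, 10, 10))))
```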


-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] Getting the groupmean for each person

2004-05-09 Thread Thomas Lumley
On Sat, 8 May 2004, Gabor Grothendieck wrote:


 predict(lm(AV~as.factor(GROUP)))


If Felix actually has a huge data frame this will be slow. Instead
try

groupmeans <- rowsum(AV, GROUP, reorder=FALSE)
individual.means <- groupmeans[match(GROUP, unique(GROUP))]

It uses hashing and takes roughly O(MGlogG) time for M measurements on G
groups, whereas the lm solution takes O(MG^3) [and the space requirements
are O(MG) and O(MG^2)]

Admittedly, with only 3000 observations either one will be fast enough.

-thomas
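
A runnable sketch of this approach on made-up data. Note that `rowsum()` returns group sums, so this version divides by the group sizes to obtain means; that division step and the `sums`/`counts` names are additions for illustration, not part of the code as posted. The cross-check uses base R's `ave()`:

```r
AV    <- c(1, 2, 3, 4, 5)
GROUP <- c("b", "a", "b", "c", "a")

sums   <- rowsum(AV, GROUP, reorder = FALSE)                    # group sums
counts <- rowsum(rep(1, length(AV)), GROUP, reorder = FALSE)    # group sizes
groupmeans <- sums / counts

# Rows are in the order of unique(GROUP), so match() finds each person's row.
individual.means <- groupmeans[match(GROUP, unique(GROUP))]

stopifnot(isTRUE(all.equal(individual.means, c(2, 3.5, 2, 4, 3.5))))
stopifnot(isTRUE(all.equal(individual.means, ave(AV, GROUP))))
```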







Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle



[R] Getting the groupmean for each person

2004-05-08 Thread Felix Eschenburg
Hello list !

I have a huge data.frame with several variables observed on about 3000 
persons. For every person (row) there is a variable called GROUP which 
indicates the group the person belongs to. There is also another variable 
AV for each person. Now I want to create a new variable which holds the 
group mean of AV as a value for each person.
With tapply(AV,GROUP,mean) I get the means for each level of GROUP, but I 
cannot find out how to give every person the group mean as a value (every 
person should have the same value as every other person in the same group). 

Does anybody have any ideas how to do that?

Yours sincerely
Felix Eschenburg



Re: [R] Getting the groupmean for each person

2004-05-08 Thread Gabor Grothendieck

predict(lm(AV~as.factor(GROUP)))
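
Why this works: with a single factor as the only predictor, the fitted values of the linear model are the group means. A sketch on made-up data (the values of `AV` and `GROUP` are illustrative):

```r
AV    <- c(1, 2, 3, 4, 5)
GROUP <- c(2, 1, 2, 3, 1)

# Fitted values of a one-factor model are the per-group means of AV.
fitted.means <- predict(lm(AV ~ as.factor(GROUP)))

stopifnot(isTRUE(all.equal(unname(fitted.means), c(2, 3.5, 2, 4, 3.5))))
```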


