[R] Subsetting a data frame by a factor, using the level that occurs the most times

2005-01-20 Thread michael watson (IAH-C)
I think that title makes sense... I hope it does...

I have a data frame, one of the columns of which is a factor.  I want
the rows of data that correspond to the level in that factor which
occurs the most times.  

I can get a list by doing:

by(data,data$pattern,subset)

And go through each element of the list counting the rows, to find the
maximum

BUT I can't help thinking there's a more elegant way of doing this

The second part is figuring out the rows which have the maximum number
of consecutive patterns which are the same... Now that I would love some
help with... :-)

Thanks
Mick
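
For the second part of the question (the longest stretch of consecutive rows sharing the same pattern), one possible approach is base R's rle(). This is only a minimal sketch: the column name `pattern` comes from the question, while the toy values and the helper names `runs`, `first`, `last` and `longest_run` are made up for illustration.

```r
# Toy data; 'data' and 'pattern' follow the names used in the question,
# the values are invented for this example.
data <- data.frame(
  id      = 1:8,
  pattern = factor(c("a", "b", "b", "a", "a", "a", "b", "b"))
)

# rle() needs an atomic vector, so convert the factor to character first.
runs <- rle(as.character(data$pattern))

# Index of the longest run (the first one, if several runs tie).
i <- which.max(runs$lengths)

# Translate the run index back into row positions.
last  <- cumsum(runs$lengths)[i]
first <- last - runs$lengths[i] + 1

# Rows that make up the longest run of identical consecutive patterns.
longest_run <- data[first:last, ]
longest_run
```

If several runs share the maximum length, this keeps only the first of them; whether that is the right choice depends on the data.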

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Subsetting a data frame by a factor, using the level that occurs the most times

2005-01-20 Thread Liaw, Andy
 From: Douglas Bates
 
 michael watson (IAH-C) wrote:
  I think that title makes sense... I hope it does...
  
  I have a data frame, one of the columns of which is a factor.  I want
  the rows of data that correspond to the level in that factor which
  occurs the most times.  
 
 So first you want to determine the mode (in the sense of the most 
 frequently occurring value) of the factor.  One way to do this is
 
 names(which.max(table(fac)))
 
 Use this comparison for the subset as
 
 subset(data, pattern == names(which.max(table(pattern))))

Just be careful that if there are ties (i.e., more than one level having the
max) which.max() will randomly pick one of them.  That may or may not be
what's desired.  If that is a possibility, Mick will need to think what he
wants in such cases.

Andy
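
As a worked illustration of the subset() call quoted above, here is a minimal sketch on made-up data. Only `data`, `pattern` and the which.max(table(...)) idiom come from the thread; the values and the names `counts`, `top_level` and `most_common` are assumptions for the example.

```r
# Toy data frame with a factor column called 'pattern' (values invented).
data <- data.frame(
  value   = c(10, 20, 30, 40, 50, 60),
  pattern = factor(c("up", "down", "up", "flat", "up", "down"))
)

# Count how often each level occurs.
counts <- table(data$pattern)

# Name of the most frequent level.
top_level <- names(which.max(counts))

# Keep only the rows belonging to that level.
most_common <- subset(data, pattern == top_level)
most_common
```

Storing the level name in a variable first keeps the subset() call short and makes it easy to inspect which level was chosen.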

 


Re: [R] Subsetting a data frame by a factor, using the level that occurs the most times

2005-01-20 Thread Douglas Bates
Liaw, Andy wrote:
 Just be careful that if there are ties (i.e., more than one level having the
 max) which.max() will randomly pick one of them.  That may or may not be
 what's desired.  If that is a possibility, Mick will need to think what he
 wants in such cases.

According to the documentation it picks the first one.  Also, that's
what Martin Maechler told me and he wrote the code so I trust him on
that.  I figure that if you have to trust someone to be meticulous and
precise then a German-speaking Swiss is a good choice.
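
The tie behaviour is easy to check on a small factor. The following is a minimal sketch with invented values; `fac` is the name used earlier in the thread, while `counts`, `first_max` and `tied` are hypothetical names. It also shows one way to keep every tied level, in case the first one alone is not what is wanted.

```r
# Two levels ("a" and "b") tie for the highest count.
fac <- factor(c("b", "a", "b", "a", "c"))
counts <- table(fac)

# which.max() reports the first maximum, in level order.
first_max <- names(which.max(counts))

# All levels that reach the maximum count.
tied <- names(counts)[counts == max(counts)]

first_max
tied
```

With a data frame, `pattern %in% tied` could replace the `==` comparison inside subset() to keep the rows of all tied levels.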



RE: [R] Subsetting a data frame by a factor, using the level that occurs the most times

2005-01-20 Thread Liaw, Andy
 From: Douglas Bates
 
 Liaw, Andy wrote:
  Just be careful that if there are ties (i.e., more than one level having the
  max) which.max() will randomly pick one of them.  That may or may not be
  what's desired.  If that is a possibility, Mick will need to think what he
  wants in such cases.
 
 According to the documentation it picks the first one.  Also, that's
 what Martin Maechler told me and he wrote the code so I trust him on
 that.  I figure that if you have to trust someone to be meticulous and
 precise then a German-speaking Swiss is a good choice.

My apologies!  I got it mixed up with max.col(), which breaks ties at random by default.

Andy
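
For comparison, max.col() exposes its tie handling through the `ties.method` argument. A minimal sketch on a made-up matrix (the names `m`, `first_cols` and `last_cols` are only for illustration):

```r
# Each row contains a tie for its maximum value.
m <- matrix(c(1, 3, 3,
              2, 2, 1), nrow = 2, byrow = TRUE)

# Deterministic choices: first or last column holding the row maximum.
first_cols <- max.col(m, ties.method = "first")
last_cols  <- max.col(m, ties.method = "last")

# The default, ties.method = "random", picks among tied columns at random,
# so its result can differ from call to call.
random_cols <- max.col(m)

first_cols
last_cols
```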
