[R] dataframe indexing by number of cases per group

2011-11-24 Thread Johannes Radinger
Hello,

assume we have the following data frame:

group <- c(rep("A",5), rep("B",6), rep("C",4))
x <- c(runif(5,1,5), runif(6,1,10), runif(4,2,15))
df <- data.frame(group, x)

Now I want to select all cases (rows) for those groups
which have 5 or more cases (so I want to select
all cases of groups A and B).
How can I use indexing for such questions?

df[??]... I think it is probably quite easy but I really
don't know how to do that at the moment.

maybe someone can help me...

/johannes
--

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] dataframe indexing by number of cases per group

2011-11-24 Thread Dennis Murphy
A very similar question was asked a couple of days ago - see the
thread titled "Removing rows in dataframe w'o duplicated values" - in
particular, the responses by Dimitris Rizopoulos and David Winsemius.
The adaptation to this problem is

df[ave(as.numeric(df$group), as.numeric(df$group), FUN = length) > 4, ]
   group        x
1      A 3.903747
2      A 3.599547
3      A 2.449991
4      A 2.740639
5      A 4.268988
6      B 8.649600
7      B 5.493841
8      B 1.892154
9      B 6.781754
10     B 1.459250
11     B 6.749522
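
To make the one-liner easier to follow, here is a minimal sketch of what ave() contributes: one value per row, namely the size of that row's group. It assumes a recent R (4.0.0 or later), where data.frame() keeps strings as character by default, so as.numeric(df$group) would give NA; the sketch therefore passes seq_along(df$group) as the vector to be summarised. The name n_per_row and the set.seed() call are illustrative additions, not part of the thread.

```r
# Rebuild the example data; set.seed() is added only for reproducibility.
set.seed(1)
group <- c(rep("A", 5), rep("B", 6), rep("C", 4))
x <- c(runif(5, 1, 5), runif(6, 1, 10), runif(4, 2, 15))
df <- data.frame(group, x)

# ave() applies FUN within each group and returns one value per row:
# here, the number of cases in that row's group.
n_per_row <- ave(seq_along(df$group), df$group, FUN = length)
n_per_row
# 5 5 5 5 5 6 6 6 6 6 6 4 4 4 4

# Keep only the rows whose group has at least 5 cases (hypothetical name: big).
big <- df[n_per_row >= 5, ]
```

Because the counts are aligned with the rows of df, the comparison can be used directly as a logical row index.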

HTH,
Dennis


Re: [R] dataframe indexing by number of cases per group

2011-11-24 Thread Gabor Grothendieck
Here are three approaches:

subset(merge(df, xtabs(~ group, df)), Freq >= 5)

subset(transform(df, len = ave(x, group, FUN = length)), len >= 5)

library(sqldf)
sqldf('select a.*
from df a join (select "group", count(*) count from df group by "group")
using ("group")
where count >= 5')
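
For comparison, the same selection can probably also be written as plain indexing on a frequency table, using only base R. This is a sketch rather than one of the approaches posted in the thread; the names counts, keep and sel are hypothetical.

```r
# Example data as in the original question.
group <- c(rep("A", 5), rep("B", 6), rep("C", 4))
x <- c(runif(5, 1, 5), runif(6, 1, 10), runif(4, 2, 15))
df <- data.frame(group, x)

counts <- table(df$group)            # cases per group: A = 5, B = 6, C = 4
keep <- names(counts)[counts >= 5]   # labels of groups with at least 5 cases
sel <- df[df$group %in% keep, ]      # all rows belonging to those groups
```

Unlike merge(), this keeps the rows in their original order and adds no extra column.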

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] dataframe indexing by number of cases per group

2011-11-24 Thread Johannes Radinger
Hi,

thank you for your suggestions.
I think I'll stay with Dennis' approach
as this is a real indexing approach:

df[ave(as.numeric(df$group), as.numeric(df$group), FUN = length) > 4, ]

I'll try that out now

best regards
/Johannes
