[R] dataframe indexing by number of cases per group
Hello,

assume we have the following data frame:

    group <- c(rep("A", 5), rep("B", 6), rep("C", 4))
    x <- c(runif(5, 1, 5), runif(6, 1, 10), runif(4, 2, 15))
    df <- data.frame(group, x)

Now I want to select all cases (rows) for those groups which have 5 or more cases (so I want to select all cases of groups A and B). How can I use indexing for such questions? df[??]... I think it is probably quite easy, but I really don't know how to do it at the moment. Maybe someone can help me...

/johannes

--
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] dataframe indexing by number of cases per group
A very similar question was asked a couple of days ago; see the thread titled "Removing rows in dataframe w'o duplicated values", in particular the responses by Dimitris Rizopoulos and David Winsemius. The adaptation to this problem is

    df[ave(as.numeric(df$group), as.numeric(df$group), FUN = length) > 4, ]

       group        x
    1      A 3.903747
    2      A 3.599547
    3      A 2.449991
    4      A 2.740639
    5      A 4.268988
    6      B 8.649600
    7      B 5.493841
    8      B 1.892154
    9      B 6.781754
    10     B 1.459250
    11     B 6.749522

HTH,
Dennis

On Thu, Nov 24, 2011 at 4:02 AM, Johannes Radinger jradin...@gmx.at wrote:
> Now I want to select all cases (rows) for those groups which have 5 or more cases (so I want to select all cases of groups A and B). How can I use indexing for such questions?
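The ave() call above relies on df$group being a factor, which was the data.frame() default at the time; since R 4.0.0 character columns stay character by default, and as.numeric() on them yields NA. A minimal sketch of a variant that should not depend on the column type, assuming the example data from the question (the seed and the names n_per_row and res are illustrative choices, not part of the thread):

```r
# Example data from the question; the seed is an arbitrary choice
# made only so the sketch is reproducible.
set.seed(1)
group <- c(rep("A", 5), rep("B", 6), rep("C", 4))
x <- c(runif(5, 1, 5), runif(6, 1, 10), runif(4, 2, 15))
df <- data.frame(group, x)

# ave() returns, for every row, the result of FUN applied to that row's group.
# seq_along() only supplies a numeric vector of the right length, so the
# call works whether `group` is a factor or a character vector.
n_per_row <- ave(seq_along(df$group), df$group, FUN = length)

# Keep the rows whose group has at least 5 cases (groups A and B here).
res <- df[n_per_row >= 5, ]
res
```

Writing the condition as >= 5 rather than > 4 mirrors the wording of the question; for integer counts the two are equivalent.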
Re: [R] dataframe indexing by number of cases per group
On Thu, Nov 24, 2011 at 7:02 AM, Johannes Radinger jradin...@gmx.at wrote:
> Now I want to select all cases (rows) for those groups which have 5 or more cases (so I want to select all cases of groups A and B). How can I use indexing for such questions?

Here are three approaches:

    subset(merge(df, xtabs(~ group, df)), Freq >= 5)

    subset(transform(df, len = ave(x, group, FUN = length)), len >= 5)

    library(sqldf)
    sqldf('select a.* from df a
           join (select "group", count(*) count from df group by "group")
           using ("group") where count >= 5')

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
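For comparison with the approaches above, here is a rough sketch of a lookup built on table(), which counts the cases per group once and then indexes the rows with a logical vector. It is not one of the solutions posted in the thread; the seed and the names counts, big_groups, and res are assumptions made for illustration.

```r
# Example data from the question; the seed is an arbitrary choice
# made only so the sketch is reproducible.
set.seed(1)
group <- c(rep("A", 5), rep("B", 6), rep("C", 4))
x <- c(runif(5, 1, 5), runif(6, 1, 10), runif(4, 2, 15))
df <- data.frame(group, x)

# Number of cases per group, as a named one-dimensional table.
counts <- table(df$group)

# Names of the groups that have at least 5 cases.
big_groups <- names(counts)[counts >= 5]

# Logical row index: keep every row that belongs to one of those groups.
res <- df[df$group %in% big_groups, ]
res
```

Unlike the merge() approach, this keeps the original row order and adds no helper column, which may matter if the result feeds into further indexing.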
Re: [R] dataframe indexing by number of cases per group
Hi,

thank you for your suggestions. I think I'll stay with Dennis' approach, as this is a real indexing approach:

    df[ave(as.numeric(df$group), as.numeric(df$group), FUN = length) > 4, ]

I'll try that out now.

Best regards,
/Johannes

-------- Original Message --------
Date: Thu, 24 Nov 2011 09:12:57 -0500
From: Gabor Grothendieck ggrothendi...@gmail.com
To: Johannes Radinger jradin...@gmx.at
CC: r-help@r-project.org
Subject: Re: [R] dataframe indexing by number of cases per group