[R] Subsetting by number of observations in a factor

2007-08-09 Thread Ron Crump
Hi, I generally do my data preparation externally to R, so this is a bit unfamiliar to me, but a colleague has asked me how to do certain data manipulations within R. Anyway, basically I can get his large file into a dataframe. One of the columns is a management group code (mg). There may be

Re: [R] Subsetting by number of observations in a factor

2007-08-09 Thread jim holtman
Does this do what you want? It creates a new dataframe with those 'mg' that have at least a certain number of observations.

    set.seed(2)
    # create some test data
    x <- data.frame(mg=sample(LETTERS[1:4], 20, TRUE), data=1:20)
    # split the data into subsets based on 'mg'
    x.split <- split(x, x$mg)
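The excerpt above stops at the split step. As a rough sketch only, and not the poster's actual continuation, one way to finish a split-based filter in base R is to count the rows in each piece and bind the large enough pieces back together. The toy data, the threshold `min.n`, and the names `keep` and `x.big` are all hypothetical.

```r
# Deterministic toy data (hypothetical; not the data from the thread)
x <- data.frame(mg = rep(c("A", "B", "C"), times = c(6, 2, 4)),
                data = 1:12)

# Split into one data frame per management group
x.split <- split(x, x$mg)

# Keep only the groups with at least min.n rows (min.n is an assumed threshold)
min.n <- 4
keep <- sapply(x.split, nrow) >= min.n

# Recombine the retained groups into a single data frame
x.big <- do.call(rbind, x.split[keep])
rownames(x.big) <- NULL  # drop the "A.1", "A.2", ... names that rbind creates
```

Here group B has only two rows and is dropped, leaving the ten rows from A and C.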

Re: [R] Subsetting by number of observations in a factor

2007-08-09 Thread Ron Crump
Jim,

> Does this do what you want? It creates a new dataframe with those 'mg' that have at least a certain number of observation.

Looks good. I also have an alternative solution which appears to work, so I'll see which is quicker on the big data set in question. My solution: mgsize <-

Re: [R] Subsetting by number of observations in a factor

2007-08-09 Thread jim holtman
Here is an even faster way:

    # faster way
    x.mg.size <- table(x$mg)  # count occurrences
    x.mg.5 <- names(x.mg.size)[x.mg.size > 5]  # select greater than 5
    x.new1 <- subset(x, x$mg %in% x.mg.5)  # use in the subset
    x.new1
      mg data
    1  A    1
    4  A    4
    5  D    5
    6  D    6
    7  A    7
    8  D    8
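For comparison, the same filter can be written without the intermediate table by using base R's `ave()` to attach each row's group size directly. This is a different technique from the one posted in the thread, shown as a minimal sketch; the toy data, `min.n`, `grp.size`, and `x.new2` are hypothetical names chosen for illustration.

```r
# Deterministic toy data (hypothetical; not the data from the thread)
x <- data.frame(mg = rep(c("A", "B", "C"), times = c(6, 2, 7)),
                data = 1:15)

min.n <- 5  # assumed threshold: keep groups with more than min.n rows

# table() approach, following the pattern posted in the thread
x.mg.size <- table(x$mg)
x.mg.big <- names(x.mg.size)[x.mg.size > min.n]
x.new1 <- subset(x, mg %in% x.mg.big)

# ave() approach: grp.size[i] is the number of rows sharing row i's 'mg'
grp.size <- ave(seq_along(x$mg), x$mg, FUN = length)
x.new2 <- x[grp.size > min.n, ]
```

Both versions keep the original row names, so the two results can be compared directly with `identical(x.new1, x.new2)`. Which one is quicker on a large data set is something to measure, for example with `system.time()`, rather than assume.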