Hi,
I generally do my data preparation outside R, so
this is a bit unfamiliar to me, but a colleague has asked
me how to do certain data manipulations within R.
I can get his large file into a data frame.
One of the columns is a management group code (mg). There may be
Does this do what you want? It creates a new data frame with those
'mg' that have at least a certain number of observations.
set.seed(2)
# create some test data
x <- data.frame(mg = sample(LETTERS[1:4], 20, TRUE), data = 1:20)
# split the data into subsets based on 'mg'
x.split <- split(x, x$mg)
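The quoted code stops after the split. As a rough sketch of how a split-based approach might continue (the threshold `min.obs` and the names `keep` and `x.new` are assumptions for illustration, not taken from the original message), the subsets could be filtered by row count and stacked back together:

```r
set.seed(2)
# same test data as above
x <- data.frame(mg = sample(LETTERS[1:4], 20, TRUE), data = 1:20)
x.split <- split(x, x$mg)

min.obs <- 5  # hypothetical threshold; adjust as needed

# TRUE for each subset that has at least min.obs rows
keep <- sapply(x.split, nrow) >= min.obs

# stack the surviving subsets back into a single data frame
x.new <- do.call(rbind, x.split[keep])
```

Note that the result is ordered by 'mg' rather than by original row order, and that `do.call(rbind, ...)` returns NULL if no group meets the threshold.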
Jim,
Looks good. I also have an alternative solution which appears to work,
so I'll see which is quicker on the big data set in question.
My solution:
mgsize <-
Here is an even faster way:
# faster way
x.mg.size <- table(x$mg)                      # count occurrences of each 'mg'
x.mg.5 <- names(x.mg.size)[x.mg.size > 5]     # select groups with more than 5 rows
x.new1 <- subset(x, mg %in% x.mg.5)           # keep only rows in those groups
x.new1
  mg data
1  A    1
4  A    4
5  D    5
6  D    6
7  A    7
8  D    8
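For comparison, here is a sketch of another common base R idiom for the same filter. It is not from the thread: it uses `ave()` to attach each row's group size directly, and the names `n.per.row` and `x.new2` are made up for illustration.

```r
set.seed(2)
# same test data as above
x <- data.frame(mg = sample(LETTERS[1:4], 20, TRUE), data = 1:20)

# for every row, the number of rows that share its 'mg'
n.per.row <- ave(seq_len(nrow(x)), x$mg, FUN = length)

# keep rows whose group has more than 5 observations
x.new2 <- x[n.per.row > 5, ]
```

This should select the same rows as the table() approach above. Whether it is any quicker on a large data set would need checking, for example by wrapping each version in system.time().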