Mikkel Grum wrote:
> In a number of different situations I'm trying to
> remove factor levels that are represented by less than
> a certain number of rows, e.g. if I had the dataset aa
> below and wanted to remove the species that are
> represented in less than 2 rows:
>
> data(iris)
> aa <- iris[1:101,]
>
> In this case, since I can see that the species
> virginica only has one row, I can write:
>
> table(aa$Species)
>     setosa versicolor  virginica
>         50         50          1
>
> aa[aa$Species != "virginica", ]
>
> but:
>
> aa[aa$Species == names(table(aa$Species)> 2),]
>
> does not work.
>
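
If you take a look at the result of

    table(aa$Species) > 2

you'll see your first mistake: the comparison returns a logical vector that still carries the names of all three levels, so names() of it gives back every species, not just the ones that pass the test. Your second mistake is to use "==" to match against a set of names. "==" compares element by element (recycling the shorter vector); it does not test membership. What you want is "%in%" instead. I think you want the following:

    keep <- levels(aa$Species)[table(aa$Species) > 2]
    aa <- aa[aa$Species %in% keep, ]

Note that "> 2" keeps only species with more than two rows; use ">= 2" if a species with exactly two rows should be kept. It makes no difference for this data.

However, the level for "virginica" is still present in the Species variable. If you would like to drop it completely, then try

    aa$Species <- aa$Species[drop = TRUE]

HTH,

--sundar

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html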
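
Since the question mentions doing this in a number of different situations, the approach in the reply could be wrapped up roughly as follows. This is only a sketch: drop_rare_levels() and its min_rows argument are hypothetical names made up for illustration, not part of base R, and the helper assumes the column is a factor. It uses droplevels() (available in base R since 2.12.0) in place of the [drop = TRUE] idiom; both remove unused levels.

```r
# Hypothetical helper (not from base R or any package): keep only the rows
# whose factor level occurs at least `min_rows` times, then drop the levels
# that are no longer used. Assumes data[[column]] is a factor.
drop_rare_levels <- function(data, column, min_rows = 2) {
  counts <- table(data[[column]])
  # Select the level names whose count passes the threshold
  keep <- names(counts)[counts >= min_rows]
  # %in% tests membership; == would compare element by element
  out <- data[data[[column]] %in% keep, , drop = FALSE]
  out[[column]] <- droplevels(out[[column]])
  out
}

aa <- iris[1:101, ]

# Why the original attempt fails: the comparison keeps all three names,
# so names() of it does not filter anything.
all_names <- names(table(aa$Species) > 2)

bb <- drop_rare_levels(aa, "Species", min_rows = 2)
table(bb$Species)
```

With the iris subset from the question, bb should contain only the setosa and versicolor rows, and virginica should no longer appear among the levels of bb$Species.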
