Re: [R] drop rare factors

2012-01-19 Thread Sarah Goslee
On Thu, Jan 19, 2012 at 4:11 PM, William Dunlap wrote:
>> That's the only thing I see, *except* that df() and drop() are base functions,
>> so you shouldn't use those as variable names.
>
> I don't think that is much of a problem. The local
> versions will be used in the function.

Yes, but s
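The point about local versions can be checked with a small sketch. The function name `count_rows` is made up for illustration and does not appear in the thread. Inside the function the argument `df` shadows `stats::df`, while outside it `df` is still the base function:

```r
# Hypothetical helper, only for illustration: its argument is named `df`,
# which shadows stats::df (the F density) inside the function body.
count_rows <- function(df) {
  nrow(df)  # `df` here is the data frame passed in, not stats::df
}

n <- count_rows(data.frame(x = 1:3))

# Outside the function, `df` still refers to the base function.
still_a_function <- is.function(df)
```

So the code works either way; whether reusing such names hurts readability is a separate matter of style.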

Re: [R] drop rare factors

2012-01-19 Thread William Dunlap
-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Sarah Goslee
> Sent: Thursday, January 19, 2012 1:01 PM
> To: s...@gnu.org; Sarah Goslee; r-help@r-project.org
> Subject: Re: [R] drop rare factors
>
> Everywhere that you use
> df[column]
>

Re: [R] drop rare factors

2012-01-19 Thread Sarah Goslee
Everywhere that you use
  df[column]
should be
  df[[column]]

That's the only thing I see, *except* that df() and drop() are base functions, so you shouldn't use those as variable names.

>> Remind the list what you're trying to do. The list gets lots of traffic;
>> if you delete out all the context
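A minimal sketch of why the double brackets matter, reusing the example data from this thread. The names `one_col_df`, `the_factor`, and `in_keep` are made up for illustration. With a character column name, `[` returns a one-column data frame, while `[[` returns the factor itself, which is what row-wise tests such as `%in%` expect:

```r
mydata <- data.frame(
  MyFactor = factor(rep(LETTERS[1:4], times = c(1000, 2000, 30, 4))),
  something = runif(3034)
)
column <- "MyFactor"

one_col_df <- mydata[column]    # still a data frame, with one column
the_factor <- mydata[[column]]  # the factor stored in that column

# Counting levels and testing membership row by row uses the vector form.
tab <- table(mydata[[column]])
keep <- names(tab)[tab >= 0.01 * nrow(mydata)]  # levels covering at least 1% of rows
in_keep <- mydata[[column]] %in% keep           # one TRUE/FALSE per row
```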

Re: [R] drop rare factors

2012-01-19 Thread Sam Steingold
create data:

mydata <- data.frame(MyFactor = factor(rep(LETTERS[1:4], times=c(1000, 2000, 30, 4))), something = runif(3034))

define function:

drop.levels <- function (df, column, threshold) {
  size <- nrow(df)
  if (threshold < 1) threshold <- threshold * size
  tab <- table(df[column])
  keep

Re: [R] drop rare factors

2012-01-19 Thread Sarah Goslee
Hi Sam,

To be of any use whatsoever, we need a reproducible example. What's frame? What's column? What's threshold?

Remind the list what you're trying to do. The list gets lots of traffic; if you delete out all the context nobody will remember what you need.

Sarah

On Thu, Jan 19, 2012 at 2:44

Re: [R] drop rare factors

2012-01-19 Thread Sam Steingold
> * Sarah Goslee [2012-01-18 17:36:16 -0500]:
>
> Here's one way, worked out in lots of steps so you can see
> how each works:

thanks, it all makes perfect sense, and I wrote this function based on your instructions:

drop.levels <- function (frame, column, threshold) {
  size <- nrow(frame)
  if

Re: [R] drop rare factors

2012-01-18 Thread Sarah Goslee
Here's one way, worked out in lots of steps so you can see how each works:

> mydata <- data.frame(MyFactor = factor(rep(LETTERS[1:4], times=c(1000, 2000, 30, 4))), something = runif(3034))
> str(mydata)
'data.frame': 3034 obs. of 2 variables:
 $ MyFactor : Factor w/ 4 levels "A","B","C","D":

[R] drop rare factors

2012-01-18 Thread Sam Steingold
I have a data frame with some factor columns. I want to drop the rows with rare factor values (and remove the factor values from the factors). E.g., frame$MyFactor takes values A 1,000 times, B 2,000 times, C 30 times and D 4 times. I want to remove all rows which assume rare values (<1%), i.e., C
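The task described above could be sketched roughly as follows. This is an illustration using base R's `table()`, `prop.table()`, and `droplevels()`, not necessarily the approach given in the replies. The names `freq`, `keep`, and `trimmed` are made up, and the 1% cutoff is taken from the question:

```r
# Example data with the counts from the question: A 1000, B 2000, C 30, D 4.
frame <- data.frame(
  MyFactor = factor(rep(LETTERS[1:4], times = c(1000, 2000, 30, 4))),
  something = runif(3034)
)

# Share of rows taken by each level.
freq <- prop.table(table(frame$MyFactor))

# Levels that account for at least 1% of the rows.
keep <- names(freq)[freq >= 0.01]

# Keep only rows with a common level, then remove the unused levels.
trimmed <- droplevels(frame[frame$MyFactor %in% keep, , drop = FALSE])

table(trimmed$MyFactor)
```

Subsetting alone leaves the rare values in `levels(frame$MyFactor)` with zero counts; `droplevels()` is what removes them from the factor itself.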