On Thu, Jan 19, 2012 at 4:11 PM, William Dunlap wrote:
>> That's the only thing I see, *except* that df() and drop() are base
>> functions,
>> so you shouldn't use those as variable names.
>
> I don't think that is much of a problem. The local
> versions will be used in the function.
Yes, but s
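
A minimal sketch of the point about local names, assuming a hypothetical function `count_rows_and_density` that is not from the thread. When a name appears in call position, R looks for a function and skips non-function objects, so a local data frame called `df` does not stop `df()` (the F density from the stats package) from being found:

```r
# Hypothetical helper, for illustration only.
count_rows_and_density <- function(df) {
  # Here `df` is the local data frame argument.
  n <- nrow(df)
  # In call position R skips the local non-function `df`
  # and finds the F density function stats::df().
  density <- df(1, df1 = 2, df2 = 3)
  list(rows = n, density = density)
}

result <- count_rows_and_density(data.frame(x = 1:5))
result$rows
result$density
```

The code works, but the reused name is harder to read, and it can still cause trouble when the name is passed as a value rather than called, e.g. `sapply(x, df)` inside such a function.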
-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of Sarah Goslee
> Sent: Thursday, January 19, 2012 1:01 PM
> To: s...@gnu.org; Sarah Goslee; r-help@r-project.org
> Subject: Re: [R] drop rare factors
>
> Everywhere that you use
> df[column]
>
Everywhere that you use
df[column]
should be
df[[column]]
That's the only thing I see, *except* that df() and drop() are base functions,
so you shouldn't use those as variable names.
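
A small sketch of the difference between the two forms, using a made-up three-row data frame rather than the data from the thread. Single brackets return a one-column data frame, double brackets return the column itself (here a factor):

```r
# Toy data, for illustration only.
mydata <- data.frame(MyFactor = factor(c("A", "A", "B")),
                     something = c(0.1, 0.2, 0.3))
column <- "MyFactor"

single <- mydata[column]    # one-column data frame
double <- mydata[[column]]  # the factor stored in that column

class(single)   # "data.frame"
class(double)   # "factor"
levels(single)  # NULL, a data frame has no levels
levels(double)  # "A" "B"
```

Anything that expects a vector or factor (level lookups, `%in%` comparisons against level names, and so on) should therefore get the `[[column]]` form.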
>> Remind the list what you're trying to do. The list gets lots of traffic;
>> if you delete out all the context
create data:
mydata <- data.frame(MyFactor = factor(rep(LETTERS[1:4],
                                           times = c(1000, 2000, 30, 4))),
                     something = runif(3034))
define function:
drop.levels <- function (df, column, threshold) {
  size <- nrow(df)
  if (threshold < 1) threshold <- threshold * size
  tab <- table(df[column])
keep
Hi Sam,
To be of any use whatsoever, we need a reproducible example.
What's frame?
What's column?
What's threshold?
Remind the list what you're trying to do. The list gets lots of traffic;
if you delete out all the context nobody will remember what you need.
Sarah
On Thu, Jan 19, 2012 at 2:44
> * Sarah Goslee [2012-01-18 17:36:16 -0500]:
>
> Here's one way, worked out in lots of steps so you can see
> how each works:
thanks, it all makes perfect sense, and I wrote this function based on
your instructions:
drop.levels <- function (frame, column, threshold) {
  size <- nrow(frame)
  if
Here's one way, worked out in lots of steps so you can see
how each works:
> mydata <- data.frame(MyFactor = factor(rep(LETTERS[1:4], times=c(1000, 2000,
> 30, 4))), something = runif(3034))
> str(mydata)
'data.frame': 3034 obs. of 2 variables:
$ MyFactor : Factor w/ 4 levels "A","B","C","D":
I have a data frame with some factor columns.
I want to drop the rows with rare factor values
(and remove the factor values from the factors).
E.g., frame$MyFactor takes values
A 1,000 times,
B 2,000 times,
C 30 times and
D 4 times.
I want to remove all rows which assume rare values (<1%), i.e., C
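
One possible sketch of the task described above, not the function posted in the thread. The name `drop_rare_levels` and its argument names are assumptions for illustration. A threshold below 1 is treated as a fraction of the row count, levels whose count falls below the threshold are dropped along with their rows, and `droplevels()` removes the now-unused levels from the factor:

```r
# Hypothetical helper: drop rows whose factor level is rare,
# then remove the unused levels from the factor.
drop_rare_levels <- function(data, column, threshold) {
  values <- data[[column]]            # the factor itself, not a data frame
  if (threshold < 1) {
    threshold <- threshold * nrow(data)   # fraction -> absolute count
  }
  counts <- table(values)
  keep <- names(counts)[as.vector(counts) >= threshold]
  result <- data[values %in% keep, , drop = FALSE]
  result[[column]] <- droplevels(result[[column]])
  result
}

# Same example data as in the thread.
mydata <- data.frame(MyFactor = factor(rep(LETTERS[1:4],
                                           times = c(1000, 2000, 30, 4))),
                     something = runif(3034))

trimmed <- drop_rare_levels(mydata, "MyFactor", 0.01)
nrow(trimmed)             # 3000: the 30 C rows and 4 D rows are gone
levels(trimmed$MyFactor)  # "A" "B"
```

With 3034 rows, a 1% threshold is 30.34 observations, so both C (30) and D (4) fall below it.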