Ista Zahn wrote:
> Hi,
> On Fri, Jul 16, 2010 at 5:18 PM, CC <turtysm...@gmail.com> wrote:
>> I am sure this is a very basic question:
>>
>> I have 600,000 categorical variables in a data.frame - each of which is
>> classified as "0", "1", or "2"
>>
>> What I would like to do is collapse "1" and "2" and leave "0" by itself,
>> such that after re-categorizing "0" = "0"; "1" = "1" and "2" = "1" --- in
>> the end I only want "0" and "1" as categories for each of the variables.
> 
> Something like this should work
> 
> for (i in names(dat)) {
> dat[, i]  <- factor(dat[, i], levels = c("0", "1", "2"), labels =
> c("0", "1", "1"))
> }

Unfortunately, it won't:

> d <- 0:2
> factor(d, levels=c(0,1,1))
[1] 0    1    <NA>
Levels: 0 1 1
Warning message:
In `levels<-`(`*tmp*`, value = c("0", "1", "1")) :
  duplicated levels will not be allowed in factors anymore


This behaviour, I have been told, goes way back to a design choice in S
(that you can have repeated level names), kept for compatibility ever since.

It would make more sense if it behaved like

d <- factor(d); levels(d) <- c(0,1,1)

and maybe, some time in the future, it will. Meanwhile, the above is the
workaround.
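
To spell the workaround out, here is a minimal sketch on the same toy
vector. The name `f` and the final table() call are only illustrative
additions; assigning a levels vector with repeated entries should merge
the corresponding levels:

```r
d <- 0:2

# Build the factor first, with its natural levels "0", "1", "2"
f <- factor(d)

# Then reassign the levels; repeating "1" merges old levels "1" and "2"
levels(f) <- c("0", "1", "1")

print(f)
table(f)   # counts per merged category
```

The list form documented in ?levels, levels(f) <- list("0" = "0", "1" = c("1", "2")),
expresses the same mapping more explicitly.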

(BTW, if there are 600000 variables, you probably don't want to iterate
over their names, more likely "for(i in seq_along(dat))...")
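
A sketch of what that loop might look like, combined with the levels()
workaround above. The small data frame `dat` and its column names are
made up for illustration and stand in for the real data; the explicit
levels argument is an assumption added so that a column which happens
to lack one of the three categories still gets all three levels before
they are merged:

```r
# Hypothetical toy data standing in for the real 600,000-column data frame
dat <- data.frame(v1 = c("0", "1", "2"),
                  v2 = c("2", "2", "0"),
                  v3 = c("1", "0", "1"))

for (i in seq_along(dat)) {
  # Index columns by position; fix the level set so every column has 3 levels
  f <- factor(dat[[i]], levels = c("0", "1", "2"))
  # Merge "2" into "1"
  levels(f) <- c("0", "1", "1")
  dat[[i]] <- f
}

str(dat)
```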

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
