Re: [R] how to collapse categories or re-categorize variables?

2010-07-20 Thread CC
Thank you very much to all of you for the responses!

Phil, the following two examples worked well in re-categorizing. Is there
any way I can retain my original data.frame format? Once I re-categorize
the data, the format becomes numeric and the original column names are not
retained. I probably should have mentioned this earlier - I plan on using
the re-categorized data in coxph() with the Surv() function.

Here's how I am applying the re-categorization to my data:

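library(survival)
library(car)

# read the 0/1/2-coded data
gg <- read.table("k.csv", header = TRUE, sep = ",")
col <- ncol(gg)   # number of columns in gg

# car::recode(): map codes 1 and 2 to 1 (aa is overwritten on every pass)
for (i in 1:col) {
  aa <- recode(gg[, i], "c(1,2)='1'")
}

# base R: factor() with a duplicated label '1' (dd is overwritten on every pass)
for (i in 1:col) {
  dd <- factor(gg[, i], levels = 0:2, labels = c('0', '1', '1'))
}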
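A minimal sketch of one way to keep the data.frame (column names included): assign the recoded columns back into `gg[]` with lapply() instead of into a scratch variable such as aa or dd. The small gg below, its column names g1/g2 and the helper name collapse12 are hypothetical stand-ins for the data in k.csv. It relies on factor() merging duplicated labels= values into a single level, which current R versions do.

```r
# Hypothetical stand-in for the 0/1/2-coded data read from k.csv
gg <- data.frame(g1 = c(0, 1, 2, 0, 2),
                 g2 = c(2, 2, 1, 0, 0))

# Hypothetical helper: collapse codes 1 and 2 into level "1"; 0 stays "0"
collapse12 <- function(x) factor(x, levels = 0:2, labels = c("0", "1", "1"))

# `gg[] <-` replaces the column contents but keeps the data.frame
# class, the row names and the original column names
gg[] <- lapply(gg, collapse12)

str(gg)
```

If gg also holds the survival time and status columns, recode only the 0/1/2 columns, e.g. `gg[geno_cols] <- lapply(gg[geno_cols], collapse12)` with a (hypothetical) character vector geno_cols of their names, and then pass gg to coxph() through its data= argument.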


Thanks,
CC



On Sat, Jul 17, 2010 at 2:15 PM, Phil Spector spec...@stat.berkeley.edu wrote:

 Please look at Peter Dalgaard's response a little more
 carefully. There's a big difference between the levels=
 argument (which must be unique) and the labels= argument (which need
 not be). Here are two ways to do what you want:

  d = 0:2
 factor(d,levels=0:2,labels=c('0','1','1'))

 [1] 0 1 1

 library(car)
 recode(d, "c(1,2)='1'")

 [1] 0 1 1


- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu




-- 
Thanks,
CC


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to collapse categories or re-categorize variables?

2010-07-19 Thread Henric Winell

On 2010-07-17 23:03, Peter Dalgaard wrote:

Ista Zahn wrote:

Hi,
On Fri, Jul 16, 2010 at 5:18 PM, CC turtysm...@gmail.com wrote:

I am sure this is a very basic question:

I have 600,000 categorical variables in a data.frame - each of which is
classified as 0, 1, or 2

What I would like to do is collapse 1 and 2 and leave 0 by itself,
such that after re-categorizing 0 = 0; 1 = 1 and 2 = 1 --- in
the end I only want 0 and 1 as categories for each of the variables.

Something like this should work

for (i in names(dat)) {
  dat[, i] <- factor(dat[, i], levels = c(0, 1, 2), labels =
  c(0, 1, 1))
}


Unfortunately, it won't:


d <- 0:2
factor(d, levels=c(0,1,1))

[1] 0    1    NA
Levels: 0 1 1
Warning message:
In `levels<-`(`*tmp*`, value = c(0, 1, 1)) :
  duplicated levels will not be allowed in factors anymore


This effect, I have been told, goes way back to design choices in S
(that you can have repeated level names) plus compatibility ever since.

It would make more sense if it behaved like

d <- factor(d); levels(d) <- c(0,1,1)

and maybe, some time in the future, it will. Meanwhile, the above is the
workaround.

(BTW, if there are 60 variables, you probably don't want to iterate
over their names, more likely for(i in seq_along(dat))...)


You could also use 'lapply' with 'levels<-':

 ### Example data
 set.seed(1)
 d <- 0:2
 DF <- data.frame(X1 = factor(sample(d, size = 10, replace = TRUE)),
                  X2 = factor(sample(d, size = 10, replace = TRUE)))
 DF
   X1 X2
1   0  0
2   1  0
3   1  2
4   2  1
5   0  2
6   2  1
7   2  2
8   1  2
9   1  1
10  0  2

 ### Reorder levels and replace
 DF[] <- lapply(DF, function(x) `levels<-`(x, c(0, 1, 1)))
 DF
   X1 X2
1   0  0
2   1  0
3   1  1
4   1  1
5   0  1
6   1  1
7   1  1
8   1  1
9   1  1
10  0  1
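One caveat worth checking before applying `levels<-` this way to many columns: the replacement c(0, 1, 1) is matched to a column's existing levels by position, so it assumes every column actually contains all three codes 0, 1 and 2. A small sketch (the vector x is made up) of what happens to a column with no 0s, compared with factor() and explicit levels=:

```r
# A column that happens to contain no 0s: its levels are only "1" and "2"
x <- factor(c(1, 2, 2))

# Positional relabelling: old level "1" is renamed "0", old "2" becomes "1"
bad <- x
levels(bad) <- c(0, 1, 1)
as.character(bad)

# Explicit levels = 0:2 matches on the values themselves,
# so both 1 and 2 end up as "1"
good <- factor(x, levels = 0:2, labels = c("0", "1", "1"))
as.character(good)
```

Here bad turns the original 1 into "0", while good keeps it as "1"; with 600,000 columns, the explicit-levels version is the safer default.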


HTH,
Henric



Re: [R] how to collapse categories or re-categorize variables?

2010-07-17 Thread Peter Dalgaard
Ista Zahn wrote:
 Hi,
 On Fri, Jul 16, 2010 at 5:18 PM, CC turtysm...@gmail.com wrote:
 I am sure this is a very basic question:

 I have 600,000 categorical variables in a data.frame - each of which is
 classified as 0, 1, or 2

 What I would like to do is collapse 1 and 2 and leave 0 by itself,
 such that after re-categorizing 0 = 0; 1 = 1 and 2 = 1 --- in
 the end I only want 0 and 1 as categories for each of the variables.
 
 Something like this should work
 
 for (i in names(dat)) {
   dat[, i] <- factor(dat[, i], levels = c(0, 1, 2), labels =
   c(0, 1, 1))
 }

Unfortunately, it won't:

 d <- 0:2
 factor(d, levels=c(0,1,1))
[1] 0    1    NA
Levels: 0 1 1
Warning message:
In `levels<-`(`*tmp*`, value = c(0, 1, 1)) :
  duplicated levels will not be allowed in factors anymore


This effect, I have been told, goes way back to design choices in S
(that you can have repeated level names) plus compatibility ever since.

It would make more sense if it behaved like

d <- factor(d); levels(d) <- c(0,1,1)

and maybe, some time in the future, it will. Meanwhile, the above is the
workaround.
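For later readers: if I remember the R NEWS correctly, this did change in R 3.4.0, when factor() started accepting duplicated labels= values and mapping them to the same level. Assuming a current R, a quick check that the two routes now produce the same factor:

```r
d <- 0:2

# Route 1: duplicated labels in factor() (the levels themselves stay unique)
f1 <- factor(d, levels = 0:2, labels = c("0", "1", "1"))

# Route 2: the workaround via the `levels<-` replacement function
f2 <- factor(d)
levels(f2) <- c(0, 1, 1)

identical(f1, f2)
levels(f1)
```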

(BTW, if there are 60 variables, you probably don't want to iterate
over their names, more likely for(i in seq_along(dat))...)
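A minimal sketch of that loop, visiting columns by position with seq_along() and writing each result back with [[ ]] so that no new variables are created. The three-column dat is made up for illustration, and the duplicated labels= rely on current R merging them into a single level.

```r
# Hypothetical data: three 0/1/2-coded variables
dat <- data.frame(a = c(0, 1, 2), b = c(2, 0, 1), c = c(1, 1, 0))

# Overwrite each column in place, indexing by position rather than by name
for (i in seq_along(dat)) {
  dat[[i]] <- factor(dat[[i]], levels = 0:2, labels = c("0", "1", "1"))
}

sapply(dat, levels)
```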

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] how to collapse categories or re-categorize variables?

2010-07-17 Thread Phil Spector

Please look at Peter Dalgaard's response a little more
carefully. There's a big difference between the levels=
argument (which must be unique) and the labels= argument
(which need not be). Here are two ways to do what you want:


d = 0:2
factor(d,levels=0:2,labels=c('0','1','1'))

[1] 0 1 1

library(car)
recode(d, "c(1,2)='1'")

[1] 0 1 1
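To see the levels=/labels= distinction in a current R session: duplicated levels= are rejected (the 2010 warning has, as far as I know, since become an error), while duplicated labels= are merged into one level. A small sketch that uses tryCatch() so it runs to completion:

```r
d <- 0:2

# Duplicated levels=: capture the condition instead of stopping the script
res <- tryCatch(factor(d, levels = c(0, 1, 1)), error = function(e) e)
inherits(res, "error")

# Duplicated labels=: levels stay unique (0, 1, 2); labels map 1 and 2 to "1"
f <- factor(d, levels = 0:2, labels = c("0", "1", "1"))
levels(f)
```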


- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu


On Sat, 17 Jul 2010, Peter Dalgaard wrote:




Re: [R] how to collapse categories or re-categorize variables?

2010-07-17 Thread Ista Zahn
On Sat, Jul 17, 2010 at 9:03 PM, Peter Dalgaard pda...@gmail.com wrote:
 Ista Zahn wrote:
 Hi,
 On Fri, Jul 16, 2010 at 5:18 PM, CC turtysm...@gmail.com wrote:
 I am sure this is a very basic question:

 I have 600,000 categorical variables in a data.frame - each of which is
 classified as 0, 1, or 2

 What I would like to do is collapse 1 and 2 and leave 0 by itself,
 such that after re-categorizing 0 = 0; 1 = 1 and 2 = 1 --- in
 the end I only want 0 and 1 as categories for each of the variables.

 Something like this should work

 for (i in names(dat)) {
   dat[, i] <- factor(dat[, i], levels = c(0, 1, 2), labels =
   c(0, 1, 1))
 }

 Unfortunately, it won't:

 d <- 0:2
 factor(d, levels=c(0,1,1))
 [1] 0    1    NA
 Levels: 0 1 1
 Warning message:
 In `levels<-`(`*tmp*`, value = c(0, 1, 1)) :
  duplicated levels will not be allowed in factors anymore


I stand corrected. Thank you Peter.






-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org



Re: [R] how to collapse categories or re-categorize variables?

2010-07-16 Thread Wu Gong

Do you want to replace specific values of a data set?

df <- sample(c(0, 1, 2), 600, replace = TRUE)
table(df)
df[df == 2] <- 1
table(df)
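The same replacement idea scales to a whole data.frame at once, which also avoids creating new variables: comparing a data.frame with 2 gives a logical matrix, and assigning through it edits the matching cells in place. A small sketch with made-up columns v1 and v2; it assumes the columns are still numeric codes rather than factors.

```r
# Hypothetical numeric 0/1/2 data
dat <- data.frame(v1 = c(0, 1, 2, 2), v2 = c(2, 0, 1, 0))

# dat == 2 is a logical matrix; matrix-style indexing replaces matching cells
dat[dat == 2] <- 1

dat
```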

-
An R learner.
-- 



Re: [R] how to collapse categories or re-categorize variables?

2010-07-16 Thread Dennis Murphy
Hi:

See ?levels. Here's a toy example:

 x <- factor(sample(0:2, 10, replace = TRUE))
 x
 [1] 1 2 1 0 2 2 2 2 2 1
Levels: 0 1 2

 levels(x) <- c(0, 1, 1)  # Change level 2 to 1
 x
 [1] 1 1 1 0 1 1 1 1 1 1
Levels: 0 1

HTH,
Dennis


On Fri, Jul 16, 2010 at 10:18 AM, CC turtysm...@gmail.com wrote:

 I am sure this is a very basic question:

 I have 600,000 categorical variables in a data.frame - each of which is
 classified as 0, 1, or 2

 What I would like to do is collapse 1 and 2 and leave 0 by itself,
 such that after re-categorizing 0 = 0; 1 = 1 and 2 = 1 --- in
 the end I only want 0 and 1 as categories for each of the variables.

 Also, if possible I would rather not create 600,000 new variables, if I can
 replace the existing variables with the new values that would be great!

 What would be the best way to do this?

 Thank you!


 --
 Thanks,
 CC



Re: [R] how to collapse categories or re-categorize variables?

2010-07-16 Thread Ista Zahn
Hi,
On Fri, Jul 16, 2010 at 5:18 PM, CC turtysm...@gmail.com wrote:
 I am sure this is a very basic question:

 I have 600,000 categorical variables in a data.frame - each of which is
 classified as 0, 1, or 2

 What I would like to do is collapse 1 and 2 and leave 0 by itself,
 such that after re-categorizing 0 = 0; 1 = 1 and 2 = 1 --- in
 the end I only want 0 and 1 as categories for each of the variables.

Something like this should work

for (i in names(dat)) {
  dat[, i] <- factor(dat[, i], levels = c(0, 1, 2), labels =
  c(0, 1, 1))
}

-Ista





-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
