Re: [R] factor with numeric names
Dear all,

According to the post I was trying:

    factorA = c(2,2,3,3,4,4,3,4,2,2)
    levels(factor <- c("lv1","lv2","lv3") )

But this returns NULL and doesn't change factor names.

Actually, my factor is included in a data.frame, so I also tried:

    levels(df$factorA)[levels(df$factorA)==2] <- "lv1"

Also

    levels(df$factorA)[levels(df$factorA)==2] <- "lv1"
    levels(df$factorA)[levels(df$factorA)==3] <- "lv2"
    levels(df$factorA)[levels(df$factorA)==4] <- "lv3"

then I type table(df$factorA) and it doesn't work either.

Any help would be appreciated.

Thanks, u...@host.com

--
View this message in context: http://r.789695.n4.nabble.com/factor-with-numeric-names-tp882535p3404942.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] factor with numeric names
Dear all,

According to the post I was trying:

    factorA = c(2,2,3,3,4,4,3,4,2,2)
    levels(factorA <- c("lv1","lv2","lv3") )

But this returns NULL and doesn't change factor names.

Actually, my factor is included in a data.frame, so I also tried:

    levels(df$factorA)[levels(df$factorA)==2] <- "lv1"

Also

    levels(df$factorA)[levels(df$factorA)==2] <- "lv1"
    levels(df$factorA)[levels(df$factorA)==3] <- "lv2"
    levels(df$factorA)[levels(df$factorA)==4] <- "lv3"

then I type table(df$factorA) and it doesn't work either.

After attach(df), I tried also this

    for(i in 1:10) {
    if(factor(factorA)[i]==2) factorA[i]="lv2" else {;
    if(factor(factorA)[i]==3) factorA[i]="lv3" else {;
    factorA[i]= "lv4"

Any help would be appreciated.

Thanks, u...@host.com
Re: [R] factor with numeric names
On Mar 25, 2011, at 8:30 AM, agent dunham wrote:

> Dear all,
>
> According to the post I was trying:
>
>     factorA = c(2,2,3,3,4,4,3,4,2,2)
>     levels(factorA <- c("lv1","lv2","lv3") )

Well, this is wrong. Try:

    levels(factorA) <- c("lv1","lv2","lv3")
    factorA
     [1] 2 2 3 3 4 4 3 4 2 2
    attr(,"levels")
    [1] "lv1" "lv2" "lv3"

> But this returns NULL and doesn't change factor names.
>
> Actually, my factor is included in a data.frame, so I also tried:
>
>     levels(df$factorA)[levels(df$factorA)==2] <- "lv1"
>
> Also
>
>     levels(df$factorA)[levels(df$factorA)==2] <- "lv1"
>     levels(df$factorA)[levels(df$factorA)==3] <- "lv2"
>     levels(df$factorA)[levels(df$factorA)==4] <- "lv3"
>
> then I type table(df$factorA) and it doesn't work either.
>
> After attach(df), I tried also this
>
>     for(i in 1:10) {
>     if(factor(factorA)[i]==2) factorA[i]="lv2" else {;
>     if(factor(factorA)[i]==3) factorA[i]="lv3" else {;
>     factorA[i]= "lv4"

That looks painful. What book or resource are you using?

--
David Winsemius, MD
West Hartford, CT
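A note on the attempts quoted above. `levels(factorA <- c("lv1","lv2","lv3"))` first overwrites factorA with the character vector and then asks for the levels of a plain vector, which is why NULL comes back. Setting `levels()` on a numeric vector, as in the reply, only attaches an attribute; it does not turn the vector into a factor. The following is a minimal sketch, assuming the goal is to map the values 2, 3, 4 to lv1, lv2, lv3. The data frames `df` and `df2` are stand-ins built here for illustration, not the poster's actual data.

```r
factorA <- c(2, 2, 3, 3, 4, 4, 3, 4, 2, 2)

# Build the factor and name its levels in one step.
# 'levels' lists the values present in the data, 'labels' gives their new names.
f <- factor(factorA, levels = c(2, 3, 4), labels = c("lv1", "lv2", "lv3"))
table(f)

# Same idea for a column of a data frame (df is a hypothetical stand-in).
# The column must be a factor before levels() can be replaced.
df <- data.frame(factorA = factor(factorA))
levels(df$factorA) <- c("lv1", "lv2", "lv3")   # replaces "2", "3", "4" in order
table(df$factorA)

# Renaming a single level; levels are stored as character strings.
df2 <- data.frame(factorA = factor(factorA))
levels(df2$factorA)[levels(df2$factorA) == "2"] <- "lv1"
levels(df2$factorA)
```

Both routes avoid the element-by-element loop: the relabelling is done once on the levels rather than on each observation.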
Re: [R] factor with numeric names
Thank you so much both for the answer. I think I have a better handle on this now.

Yes, Loblolly$Seed is an ordered factor, but I didn't realize that the default contrast for an ordered factor is contr.poly. And then I was further confused because I didn't realize the coefficient names generated (not just the model) are different depending on whether there is an intercept term (even though they were both contr.poly).

    lm(formula = height ~ age + Seed, data = Loblolly)

    Call:
    lm(formula = height ~ age + Seed, data = Loblolly)

    Coefficients:
    (Intercept)          age       Seed.L       Seed.Q       Seed.C       Seed^4
       -1.31240      2.59052      4.86941      0.87307      0.37894     -0.46853
         Seed^5       Seed^6       Seed^7       Seed^8       Seed^9      Seed^10
        0.55237      0.39659     -0.06507      0.35074     -0.83442      0.42085
        Seed^11      Seed^12      Seed^13
        0.53906     -0.29803     -0.77254

    lm(formula = height ~ age + Seed - 1, data = Loblolly)

    Call:
    lm(formula = height ~ age + Seed - 1, data = Loblolly)

    Coefficients:
        age  Seed329  Seed327  Seed325  Seed307  Seed331  Seed311  Seed315  Seed321
     2.5905  -3.3635  -3.0701  -1.7535  -2.3485  -2.6568  -2.0235  -1.3168  -2.4651
    Seed319  Seed301  Seed323  Seed309  Seed303  Seed305
    -0.7951  -0.4301  -0.1235   0.1049   0.4299   1.4382

This should have been obvious to me...

(for the sake of completeness) I think factor() doesn't change the ordered-ness

    # as.factor(Loblolly$Seed) doesn't remove the ordered-ness
    str(Loblolly$Seed)
     Ord.factor w/ 14 levels "329"<"327"<"325"<..: 10 10 10 10 10 10 13 13 13 13 ...
    str(as.factor(Loblolly$Seed))
     Ord.factor w/ 14 levels "329"<"327"<"325"<..: 10 10 10 10 10 10 13 13 13 13 ...

    # this works though
    str(factor(Loblolly$Seed, ordered=F))
     Factor w/ 14 levels "329","327","325",..: 10 10 10 10 10 10 13 13 13 13 ...

Saiwing

On Mar 21, 2009, at 3:35 PM, John Fox wrote:

> Dear Saiwing Yeung,
>
> You appear to be using orthogonal-polynomial contrasts (generated by
> contr.poly) for Seed, which suggests that Seed is either an ordered factor
> or that you've assigned these contrasts to it. Because Seed has 14 levels,
> you end up fitting a degree-13 polynomial.
>
> If Seed is indeed an ordered factor and you want to use contr.treatment
> instead then you could, e.g., set
>
>     Loblolly$Seed <- as.factor(Loblolly$Seed)
>
> (If I'm right about Seed being an ordered factor, your solution worked
> because it changed Seed to a factor, not because it used non-numeric level
> names.)
>
> I hope this helps,
>  John
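The observation above can be checked directly. This is a small sketch, assuming R's default `contrasts` option (contr.treatment for unordered factors, contr.poly for ordered ones). The names `seed_unordered`, `lob` and `fit` are introduced here for illustration only.

```r
# Loblolly ships with R (package 'datasets'); Seed is an ordered factor.
is.ordered(Loblolly$Seed)

# The contrast functions R picks by default for unordered / ordered factors.
getOption("contrasts")

# as.factor() returns a factor unchanged, so the ordering survives;
# factor(..., ordered = FALSE) drops it.
is.ordered(as.factor(Loblolly$Seed))
seed_unordered <- factor(Loblolly$Seed, ordered = FALSE)
is.ordered(seed_unordered)

# Fit on a copy so the built-in data set is left untouched.
lob <- as.data.frame(Loblolly)
lob$Seed <- seed_unordered
fit <- lm(height ~ age + Seed, data = lob)
names(coef(fit))   # coefficient names now carry the level names
```

With the ordering removed, the first level becomes the reference category and each remaining coefficient is named after its own level.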
[R] factor with numeric names
Hi all,

I have a pretty basic question about categorical variables but I can't seem to be able to find an answer so I am hoping someone here can help. I found that if the factor names are all in numbers, fitting the model in lm would return labels that are not very recognizable.

    # Example: let's just assume that we want to fit this model
    fit <- lm(height ~ age + Seed, data=Loblolly)

    # See the category names are all mangled up here
    fit

    Call:
    lm(formula = height ~ age + Seed, data = Loblolly)

    Coefficients:
    (Intercept)          age       Seed.L       Seed.Q       Seed.C       Seed^4
       -1.31240      2.59052      4.86941      0.87307      0.37894     -0.46853
         Seed^5       Seed^6       Seed^7       Seed^8       Seed^9      Seed^10
        0.55237      0.39659     -0.06507      0.35074     -0.83442      0.42085
        Seed^11      Seed^12      Seed^13
        0.53906     -0.29803     -0.77254

One possible solution I found is to rename the categorical variables

    seed.str <- paste("S", Loblolly$Seed, sep="")
    seed.str <- factor(seed.str)
    fit <- lm(height ~ age + seed.str, data=Loblolly)
    fit

    Call:
    lm(formula = height ~ age + seed.str, data = Loblolly)

    Coefficients:
     (Intercept)           age  seed.strS303  seed.strS305  seed.strS307
         -0.4301        2.5905        0.8600        1.8683       -1.9183
    seed.strS309  seed.strS311  seed.strS315  seed.strS319  seed.strS321
          0.5350       -1.5933       -0.8867       -0.3650       -2.0350
    seed.strS323  seed.strS325  seed.strS327  seed.strS329  seed.strS331
          0.3067       -1.3233       -2.6400       -2.9333       -2.2267

Now it is actually possible to see which one is which, but it is kind of lame. Can someone point me to a more elegant solution?

Thank you so much.

Saiwing Yeung
Re: [R] factor with numeric names
Dear Saiwing Yeung,

You appear to be using orthogonal-polynomial contrasts (generated by contr.poly) for Seed, which suggests that Seed is either an ordered factor or that you've assigned these contrasts to it. Because Seed has 14 levels, you end up fitting a degree-13 polynomial.

If Seed is indeed an ordered factor and you want to use contr.treatment instead then you could, e.g., set

    Loblolly$Seed <- as.factor(Loblolly$Seed)

(If I'm right about Seed being an ordered factor, your solution worked because it changed Seed to a factor, not because it used non-numeric level names.)

I hope this helps,
 John

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Saiwing Yeung
> Sent: March-21-09 5:02 PM
> To: r-help@r-project.org
> Subject: [R] factor with numeric names
>
> Hi all,
>
> I have a pretty basic question about categorical variables but I can't
> seem to be able to find an answer so I am hoping someone here can help. I
> found that if the factor names are all in numbers, fitting the model in lm
> would return labels that are not very recognizable.
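To illustrate the point about contrasts, here is a short sketch showing where the `.L`, `.Q`, `.C` names come from and one way to ask for treatment contrasts in a single fit without modifying the data. It relies on the `contrasts` argument of `lm()`; the object names `fit_trt` and `fit_poly` are made up for this example.

```r
# Column names produced by each contrast function for a 4-level factor.
colnames(contr.poly(4))        # ".L" ".Q" ".C"
colnames(contr.treatment(4))   # "2" "3" "4"

# Request treatment contrasts for Seed in this fit only;
# Loblolly$Seed itself stays an ordered factor.
fit_trt <- lm(height ~ age + Seed, data = Loblolly,
              contrasts = list(Seed = "contr.treatment"))
names(coef(fit_trt))

# The two codings are reparameterisations of the same model,
# so the fitted values should agree.
fit_poly <- lm(height ~ age + Seed, data = Loblolly,
               contrasts = list(Seed = "contr.poly"))
all.equal(fitted(fit_trt), fitted(fit_poly))
```

Only the interpretation of the individual coefficients changes between the two fits: polynomial trend components in one case, differences from the reference level in the other.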
Re: [R] factor with numeric names
Hi Saiwing,

If all you are asking is how to rename a factor vector, the easiest way would be to use:

    levels(Loblolly$Seed) <- c(<a vector of level names you would like to use for the factor, separated by commas>)

If you are asking how to make your output look better, I am not sure I have an idea (except for using summary(fit) - but I guess that is not what you mean)

Best,
Tal

On Sat, Mar 21, 2009 at 11:02 PM, Saiwing Yeung <saiw...@berkeley.edu> wrote:

> Hi all,
>
> I have a pretty basic question about categorical variables but I can't
> seem to be able to find answer so I am hoping someone here can help. I
> found that if the factor names are all in numbers, fitting the model in lm
> would return labels that are not very recognizable.

--
Tal Galili
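The renaming suggested above does not need the level names typed out by hand. As a sketch, the existing levels can be reused with a prefix; `seed` is a local copy introduced here so that the built-in data set is not modified.

```r
seed <- Loblolly$Seed                        # ordered factor with numeric-looking levels
levels(seed) <- paste0("S", levels(seed))    # relabel in place; level order is kept
head(levels(seed), 3)
is.ordered(seed)                             # renaming alone does not change the ordering
```

Because the factor is still ordered after relabelling, `lm()` would keep using polynomial contrasts for it. Going through `paste()` and `factor()` as in the original post creates a new unordered factor with alphabetically sorted levels, which is what changed the coefficient names there.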