Re: [R] Strange variable names in factor regression
Comment in-line below

On 09/05/2024 13:09, Naresh Gurbuxani wrote:
> On converting character variables to ordered factors, regression result
> has strange names. Is it possible to obtain same variable names with
> and without intercept?
>
> Thanks,
> Naresh
>
> mydf <- data.frame(date = seq.Date(as.Date("2024-01-01"), as.Date("2024-03-31"), by = 1))
> mydf[, "wday"] <- weekdays(mydf$date, abbreviate = TRUE)
> mydf.work <- subset(mydf, !(wday %in% c("Sat", "Sun")))
> mydf.weekend <- subset(mydf, wday %in% c("Sat", "Sun"))
> mydf.work[, "volume"] <- round(rnorm(nrow(mydf.work), mean = 20, sd = 5))
> mydf.weekend[, "volume"] <- round(rnorm(nrow(mydf.weekend), mean = 10, sd = 5))
> mydf <- rbind(mydf.work, mydf.weekend)
>
> reg <- lm(volume ~ wday, data = mydf)
> ## Variable names as expected
> coef(reg)
> (Intercept)     wdayMon     wdaySat     wdaySun     wdayThu     wdayTue
>  21.3846154   1.3076923     -12.000 -12.9230769  -1.9230769  -0.6923077
>     wdayWed
>  -1.6153846
>
> reg <- lm(volume ~ wday - 1, data = mydf)
> # Variable names as expected
> coef(reg)
>   wdayFri   wdayMon   wdaySat   wdaySun   wdayThu   wdayTue   wdayWed
> 21.384615 22.692308  9.384615  8.461538 19.461538 20.692308 19.769231
>
> # Ordered factors for weekday sequence
> mydf$wday <- factor(mydf$wday, levels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"), ordered = TRUE)
>
> reg <- lm(volume ~ wday - 1, data = mydf)
> # Variable names as expected
> coef(reg)
>   wdayMon   wdayTue   wdayWed   wdayThu   wdayFri   wdaySat   wdaySun
> 22.692308 20.692308 19.769231 19.461538 21.384615  9.384615  8.461538
>
> reg <- lm(volume ~ wday, data = mydf)
> # Strange variable names
> coef(reg)
> (Intercept)      wday.L      wday.Q      wday.C      wday^4      wday^5
>   17.406593  -12.036715   -4.968654   -1.852819    3.291477    4.263642
>      wday^6
>    2.591317

Yes, that is what ordered is supposed to do: fit polynomial contrasts.
When you remove the intercept, the contrasts are no longer applied and
the factor gets one indicator column per level, so you get a different
parameterisation, in fact the same coefficients as you got when it was
not ordered.
Michael

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
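Michael's point about polynomial contrasts can be checked with a small sketch. The data frame d below is hypothetical (a balanced stand-in for mydf, not the poster's data), and the code assumes the default options("contrasts"). It shows where the .L, .Q, .C, ^4, ... suffixes come from, and that dropping the intercept changes only the coefficients, not the fitted values.

```r
# Hypothetical stand-in for the thread's data: 13 weeks, ordered weekday factor
set.seed(1)
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
d <- data.frame(wday = factor(rep(days, times = 13), levels = days, ordered = TRUE))
d$volume <- round(rnorm(nrow(d),
                        mean = ifelse(d$wday %in% c("Sat", "Sun"), 10, 20),
                        sd = 5))

# The default contrast for an ordered factor is contr.poly();
# its column names are the suffixes seen in the coefficient names
poly_names <- colnames(contr.poly(7))
print(poly_names)

fit_int   <- lm(volume ~ wday, data = d)      # polynomial contrasts
fit_noint <- lm(volume ~ wday - 1, data = d)  # one indicator column per level

print(names(coef(fit_int)))
print(names(coef(fit_noint)))

# Both models span the same column space, so the fitted values agree
same_fit <- all.equal(fitted(fit_int), fitted(fit_noint))
print(same_fit)

# Without an intercept each coefficient is simply the mean of its level
level_means <- as.vector(tapply(d$volume, d$wday, mean))
print(all.equal(unname(coef(fit_noint)), level_means))
```

The two models are reparameterisations of one another, which is why the coefficient names differ while predictions do not.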
Re: [R] Strange variable names in factor regression
On 09/05/2024 8:09 a.m., Naresh Gurbuxani wrote:
> On converting character variables to ordered factors, regression result
> has strange names. Is it possible to obtain same variable names with
> and without intercept?

You are getting polynomial contrasts with the ordered factor, because
you have the default setting for options("contrasts"), i.e.

          unordered           ordered
  "contr.treatment"      "contr.poly"

If you run

  options(contrasts = c("contr.treatment", "contr.treatment"))

you will get the same coefficient names in both cases.

By the way, the coefficients have different meanings, so it makes sense
they will have different names. It's perhaps a little bit more of a
problem that you *don't* get different variable names when an intercept
is included or not, because those coefficients also have different
meanings.

It may also be a little bit of a surprise that you go back to the
unordered-style coding (one indicator column per level, no polynomial
contrasts) when you leave out the intercept with the ordered factor, but
then it almost never makes sense to leave out the intercept in a
polynomial fit.
Duncan Murdoch
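As a sketch of the suggestion above, the following compares the session-wide options() change with a per-model override through the contrasts argument of lm(). The data frame d is hypothetical, and the per-model variant is an alternative not mentioned in the thread; it avoids changing the behaviour of every later model in the session.

```r
# Hypothetical data: ordered weekday factor, 13 weeks
set.seed(1)
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
d <- data.frame(wday = factor(rep(days, times = 13), levels = days, ordered = TRUE))
d$volume <- round(rnorm(nrow(d), mean = 20, sd = 5))

# Option 1 (as suggested in the reply): change the global default,
# fit, then restore the previous setting
old <- options(contrasts = c("contr.treatment", "contr.treatment"))
fit_global <- lm(volume ~ wday, data = d)
options(old)

# Option 2 (assumption: you prefer not to touch global options):
# request treatment contrasts for this one factor in this one model
fit_local <- lm(volume ~ wday, data = d,
                contrasts = list(wday = "contr.treatment"))

print(names(coef(fit_global)))
print(names(coef(fit_local)))
```

With treatment contrasts the first level (Mon here) becomes the baseline, so the names are wdayTue through wdaySun and each coefficient is a difference from Monday. The names therefore still cannot match the no-intercept fit exactly, which has one coefficient per level.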
[R] Strange variable names in factor regression
On converting character variables to ordered factors, regression result
has strange names. Is it possible to obtain same variable names with
and without intercept?

Thanks,
Naresh

mydf <- data.frame(date = seq.Date(as.Date("2024-01-01"), as.Date("2024-03-31"), by = 1))
mydf[, "wday"] <- weekdays(mydf$date, abbreviate = TRUE)
mydf.work <- subset(mydf, !(wday %in% c("Sat", "Sun")))
mydf.weekend <- subset(mydf, wday %in% c("Sat", "Sun"))
mydf.work[, "volume"] <- round(rnorm(nrow(mydf.work), mean = 20, sd = 5))
mydf.weekend[, "volume"] <- round(rnorm(nrow(mydf.weekend), mean = 10, sd = 5))
mydf <- rbind(mydf.work, mydf.weekend)

reg <- lm(volume ~ wday, data = mydf)
## Variable names as expected
coef(reg)
(Intercept)     wdayMon     wdaySat     wdaySun     wdayThu     wdayTue
 21.3846154   1.3076923     -12.000 -12.9230769  -1.9230769  -0.6923077
    wdayWed
 -1.6153846

reg <- lm(volume ~ wday - 1, data = mydf)
# Variable names as expected
coef(reg)
  wdayFri   wdayMon   wdaySat   wdaySun   wdayThu   wdayTue   wdayWed
21.384615 22.692308  9.384615  8.461538 19.461538 20.692308 19.769231

# Ordered factors for weekday sequence
mydf$wday <- factor(mydf$wday, levels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"), ordered = TRUE)

reg <- lm(volume ~ wday - 1, data = mydf)
# Variable names as expected
coef(reg)
  wdayMon   wdayTue   wdayWed   wdayThu   wdayFri   wdaySat   wdaySun
22.692308 20.692308 19.769231 19.461538 21.384615  9.384615  8.461538

reg <- lm(volume ~ wday, data = mydf)
# Strange variable names
coef(reg)
(Intercept)      wday.L      wday.Q      wday.C      wday^4      wday^5
  17.406593  -12.036715   -4.968654   -1.852819    3.291477    4.263642
     wday^6
   2.591317
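If the only aim of the "Ordered factors for weekday sequence" step is to get coefficients listed in calendar order, one possible approach (an assumption about the intent, not something stated in the thread) is to set the levels explicitly but leave the factor unordered. The sketch below uses a hypothetical data frame d and assumes the default options("contrasts").

```r
# Hypothetical data: weekday labels as plain character strings
set.seed(1)
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
d <- data.frame(wday = rep(days, times = 13), stringsAsFactors = FALSE)
d$volume <- round(rnorm(nrow(d), mean = 20, sd = 5))

# Explicit levels control the order; ordered = FALSE (the default)
# keeps treatment contrasts, so no .L/.Q/.C names appear
d$wday <- factor(d$wday, levels = days)

fit <- lm(volume ~ wday, data = d)
print(is.ordered(d$wday))
print(names(coef(fit)))
```

Here Mon is the baseline level absorbed into the intercept, and the remaining coefficients follow Tue through Sun.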