Re: [R] Strange variable names in factor regression

2024-05-09 Thread Michael Dewey

Comment in-line below

On 09/05/2024 13:09, Naresh Gurbuxani wrote:


On converting character variables to ordered factors, regression result
has strange names. Is it possible to obtain same variable names with
and without intercept?

Thanks,
Naresh

mydf <- data.frame(date = seq.Date(as.Date("2024-01-01"),
                                   as.Date("2024-03-31"), by = 1))
mydf[, "wday"] <- weekdays(mydf$date, abbreviate = TRUE)
mydf.work <- subset(mydf, !(wday %in% c("Sat", "Sun")))
mydf.weekend <- subset(mydf, wday %in% c("Sat", "Sun"))
mydf.work[, "volume"] <- round(rnorm(nrow(mydf.work), mean = 20, sd = 5))
mydf.weekend[, "volume"] <- round(rnorm(nrow(mydf.weekend), mean = 10,
                                        sd = 5))
mydf <- rbind(mydf.work, mydf.weekend)

reg <- lm(volume ~ wday, data = mydf)
## Variable names as expected
coef(reg)
(Intercept)     wdayMon     wdaySat     wdaySun     wdayThu     wdayTue
 21.3846154   1.3076923 -12.0000000 -12.9230769  -1.9230769  -0.6923077
    wdayWed
 -1.6153846

reg <- lm(volume ~ wday - 1, data = mydf)
# Variable names as expected
coef(reg)
  wdayFri   wdayMon   wdaySat   wdaySun   wdayThu   wdayTue   wdayWed
21.384615 22.692308  9.384615  8.461538 19.461538 20.692308 19.769231

# Ordered factors for weekday sequence
mydf$wday <- factor(mydf$wday, levels = c("Mon", "Tue", "Wed", "Thu",
                                          "Fri", "Sat", "Sun"),
                    ordered = TRUE)

reg <- lm(volume ~ wday - 1, data = mydf)
# Variable names as expected
coef(reg)
  wdayMon   wdayTue   wdayWed   wdayThu   wdayFri   wdaySat   wdaySun
22.692308 20.692308 19.769231 19.461538 21.384615  9.384615  8.461538

reg <- lm(volume ~ wday, data = mydf)
# Strange variable names
coef(reg)
(Intercept)      wday.L      wday.Q      wday.C      wday^4      wday^5
  17.406593  -12.036715   -4.968654   -1.852819    3.291477    4.263642
     wday^6
   2.591317



Yes, that is what an ordered factor is supposed to do: fit polynomial
contrasts. When you remove the intercept, the contrasts are no longer
applied, so you get a different parameterisation: one coefficient per
level, the same values as you got when the factor was not ordered.
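
A minimal sketch of where the .L, .Q, .C, ^4, ... suffixes come from,
using made-up data rather than the poster's (the data frame d, its
columns and the seed are assumptions for illustration only). It also
checks that the ordered and unordered fits give the same fitted values
when the intercept is kept; only the coefficients and their names differ.

```r
# Column names produced by polynomial contrasts for 7 levels
colnames(contr.poly(7))
# ".L" ".Q" ".C" "^4" "^5" "^6"

# Hypothetical toy data: 5 observations per weekday
lev <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
set.seed(1)
d <- data.frame(day = rep(lev, each = 5))
d$y <- rnorm(nrow(d), mean = 15, sd = 5)

# Same variable, once as a plain factor and once as an ordered factor
d$f_unord <- factor(d$day, levels = lev)
d$f_ord   <- factor(d$day, levels = lev, ordered = TRUE)

fit_unord <- lm(y ~ f_unord, data = d)  # treatment contrasts by default
fit_ord   <- lm(y ~ f_ord,   data = d)  # polynomial contrasts by default

# Different coefficient names ...
names(coef(fit_ord))
# ... but the same fitted values
same_fit <- isTRUE(all.equal(fitted(fit_unord), fitted(fit_ord)))
same_fit
```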


Michael



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Michael



Re: [R] Strange variable names in factor regression

2024-05-09 Thread Duncan Murdoch

On 09/05/2024 8:09 a.m., Naresh Gurbuxani wrote:


On converting character variables to ordered factors, regression result
has strange names. Is it possible to obtain same variable names with
and without intercept?


You are getting polynomial contrasts with the ordered factor, because 
you have the default setting for options("contrasts"), i.e.


        unordered           ordered
"contr.treatment"      "contr.poly"

If you run

  options(contrasts = c("contr.treatment", "contr.treatment"))

you will get the same coefficient names in both cases.
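
If changing a global option is undesirable, the contrasts can also be
set for a single model through the contrasts argument of lm(). The
sketch below uses made-up data (the data frame toy and the seed are
assumptions, not the poster's data) and compares the per-model setting
with the global one, restoring the previous options afterwards.

```r
# Hypothetical toy data with an ordered weekday factor
lev <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
set.seed(42)
toy <- data.frame(wday = factor(rep(lev, times = 4), levels = lev,
                                ordered = TRUE))
toy$volume <- round(rnorm(nrow(toy), mean = 15, sd = 5))

# Per-model: treatment contrasts for wday in this fit only
fit_local <- lm(volume ~ wday, data = toy,
                contrasts = list(wday = "contr.treatment"))
names(coef(fit_local))
# "(Intercept)" "wdayTue" "wdayWed" "wdayThu" "wdayFri" "wdaySat" "wdaySun"

# Global: change the option, fit, then restore the old setting
old <- options(contrasts = c("contr.treatment", "contr.treatment"))
fit_global <- lm(volume ~ wday, data = toy)
options(old)

# Both routes give the same coefficient names and values
same_coef <- isTRUE(all.equal(coef(fit_local), coef(fit_global)))
same_coef
```

Note that the baseline level is now Mon (the first level of the ordered
factor), not Fri as in the alphabetical character case.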

By the way, the coefficients have different meanings, so it makes sense 
they will have different names.  It's perhaps a little bit more of a 
problem that you *don't* get different variable names when an intercept 
is included or not, because those coefficients also have different meanings.


It may also be a little bit of a surprise that you go back to treatment 
contrasts when you leave out the intercept with the ordered factor, but 
then it almost never makes sense to leave out the intercept in a 
polynomial fit.
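
One way to see this is to look at the design matrix directly. In the
sketch below (the factor f is a made-up example with one observation per
level), the model without an intercept gets one indicator column per
level, which is why the coefficient names match the unordered case.

```r
# Hypothetical ordered factor, one observation per level
lev <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
f <- factor(lev, levels = lev, ordered = TRUE)

# With intercept: polynomial contrast columns
mm_int <- model.matrix(~ f)
colnames(mm_int)
# "(Intercept)" "f.L" "f.Q" "f.C" "f^4" "f^5" "f^6"

# Without intercept: one 0/1 indicator column per level
mm_noint <- model.matrix(~ f - 1)
colnames(mm_noint)
# "fMon" "fTue" "fWed" "fThu" "fFri" "fSat" "fSun"

# Every entry of the no-intercept matrix is 0 or 1
only_indicators <- all(mm_noint %in% c(0, 1))
only_indicators
```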


Duncan Murdoch






[R] Strange variable names in factor regression

2024-05-09 Thread Naresh Gurbuxani



On converting character variables to ordered factors, regression result
has strange names. Is it possible to obtain same variable names with
and without intercept?

Thanks,
Naresh

mydf <- data.frame(date = seq.Date(as.Date("2024-01-01"),
                                   as.Date("2024-03-31"), by = 1))
mydf[, "wday"] <- weekdays(mydf$date, abbreviate = TRUE)
mydf.work <- subset(mydf, !(wday %in% c("Sat", "Sun")))
mydf.weekend <- subset(mydf, wday %in% c("Sat", "Sun"))
mydf.work[, "volume"] <- round(rnorm(nrow(mydf.work), mean = 20, sd = 5))
mydf.weekend[, "volume"] <- round(rnorm(nrow(mydf.weekend), mean = 10,
                                        sd = 5))
mydf <- rbind(mydf.work, mydf.weekend)

reg <- lm(volume ~ wday, data = mydf)
## Variable names as expected
coef(reg)
(Intercept)     wdayMon     wdaySat     wdaySun     wdayThu     wdayTue
 21.3846154   1.3076923 -12.0000000 -12.9230769  -1.9230769  -0.6923077
    wdayWed
 -1.6153846

reg <- lm(volume ~ wday - 1, data = mydf)
# Variable names as expected
coef(reg)
  wdayFri   wdayMon   wdaySat   wdaySun   wdayThu   wdayTue   wdayWed
21.384615 22.692308  9.384615  8.461538 19.461538 20.692308 19.769231

# Ordered factors for weekday sequence
mydf$wday <- factor(mydf$wday, levels = c("Mon", "Tue", "Wed", "Thu",
                                          "Fri", "Sat", "Sun"),
                    ordered = TRUE)

reg <- lm(volume ~ wday - 1, data = mydf)
# Variable names as expected
coef(reg)
  wdayMon   wdayTue   wdayWed   wdayThu   wdayFri   wdaySat   wdaySun
22.692308 20.692308 19.769231 19.461538 21.384615  9.384615  8.461538

reg <- lm(volume ~ wday, data = mydf)
# Strange variable names
coef(reg)
(Intercept)      wday.L      wday.Q      wday.C      wday^4      wday^5
  17.406593  -12.036715   -4.968654   -1.852819    3.291477    4.263642
     wday^6
   2.591317
