I am trying to fit a linear model with lm() that has one continuous
dependent variable and 11 independent variables, all of them categorical.
Some of these have many categories (several dozen in some cases).
I am not interested in statistical inference to a larger population. The
objective of my model is to find a way to best predict my continuous
variable within the sample.
When I fit the model, many of the regression coefficients are, as one
would expect, not significant. Is there some way to automatically combine
levels of a categorical variable if the regression coefficients for the
individual levels are not significant?
My idea is to find some form of grouping of the different categories that
allows me to work with fewer levels while keeping or even improving the
quality of predictions.
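
To illustrate the kind of grouping described above, here is a minimal
sketch on simulated data. It is only one crude heuristic, not an
established procedure: every level whose treatment contrast against the
reference level has p >= alpha is merged into the reference level, and the
model is refitted. The names dat, y, f and collapse_ns() are hypothetical,
and the sketch assumes default treatment contrasts and a single factor.
Because the collapsed model is nested in the full one, its in-sample
residual sum of squares cannot be smaller, so any gain would have to show
up in a penalised criterion such as AIC or in held-out predictions.

```r
# Simulated example: one factor with 12 levels, 50 observations per level.
# Only levels g10, g11 and g12 truly differ from the others.
set.seed(1)
lev <- paste0("g", 1:12)
f   <- factor(rep(lev, each = 50), levels = lev)
mu  <- setNames(c(rep(0, 9), 2, 4, 6), lev)
dat <- data.frame(y = mu[as.character(f)] + rnorm(length(f)), f = f)

fit_full <- lm(y ~ f, data = dat)

# Hypothetical helper: merge every level of `var` whose coefficient is not
# significant at `alpha` into the reference (first) level.
collapse_ns <- function(fit, data, var, alpha = 0.05) {
  cf    <- coef(summary(fit))
  lv    <- levels(data[[var]])
  pvals <- cf[paste0(var, lv[-1]), "Pr(>|t|)"]
  merge_lv <- lv[-1][pvals >= alpha]
  new <- data[[var]]
  # Assigning an already existing level name merges the levels
  levels(new)[levels(new) %in% merge_lv] <- lv[1]
  data[[var]] <- new
  data
}

dat_small <- collapse_ns(fit_full, dat, "f")
fit_small <- lm(y ~ f, data = dat_small)

nlevels(dat$f)            # number of levels before collapsing
nlevels(dat_small$f)      # number of levels after collapsing
AIC(fit_full, fit_small)  # penalised comparison of the two fits
```

Note that the result depends on which level is the reference, since each
p-value only tests a level against that one category.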