This can be done by setting a contrast function or matrix on a variable. Look in e.g. chapter 6 of MASS (the only comprehensive tutorial on coding factors in R, it seems).
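As a rough illustration of setting a contrast matrix, here is a minimal sketch. The data, the level frequencies and the binomial family are all made up for the example. The construction (treatment coding with each column's observed frequency subtracted) is one possible centered coding, not necessarily the one described in MASS.

```r
set.seed(1)
# Hypothetical data: a 4-level factor and an unrelated binary response
RACE <- factor(sample(c("B", "W", "H", "OTHER"), 200, replace = TRUE,
                      prob = c(0.2, 0.4, 0.3, 0.1)))
y <- rbinom(200, 1, 0.4)

# Observed frequency of each level in the dataset
p <- prop.table(table(RACE))

# Start from 0/1 treatment coding and subtract each column's frequency,
# so that every column of the resulting model matrix has mean zero
C <- contr.treatment(levels(RACE))
C <- sweep(C, 2, as.vector(p[colnames(C)]))
contrasts(RACE) <- C

fit <- glm(y ~ RACE, family = binomial)
X <- model.matrix(fit)
colMeans(X[, -1])   # numerically zero for every RACE column
```

With this coding the fitted values are the same as under the default contrasts; only the parametrization changes, and the intercept becomes the average of the linear predictor over the dataset.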
On Tue, 12 Oct 2004, Peter Holck wrote:

> I'm uncertain if this is perhaps a stupid question:
>
> I want to create "centered" dummy variables to use in a call to glm(), and
> wondering if there's some slick method in R to do so. That is, rather than
> have a factor, which results in a glm() fit returning coefficients
> specifying either absence or presence of the factor, I'd like to fit a glm()
> without intercept such that the estimated coefficients (standard errors)
> represent the "average" value in my data set for that variable.

Is that really what you want? An `average' person having linear predictor 0, or more precisely, the linear predictor having average zero over the dataset? What family of glm is this?

> An example: a data set has Race specified with 4 levels. I can manually
> specify 4 dummy variables for a no-intercept model with each variable rather
> than having a value of zero or one, has a centered value based on its
> frequency of occurrence in the data set. Thus if 30% of the records in the
> data set have Race of Hispanic, I can define a variable HISP that has a
> value of either -.3 or .7, resulting in my coefficient estimate for HISP
> representing the effect of an "average" person in the database (and a
> corresponding valid standard error).

Nope. A person can only have one race, so the coefficient estimates can only represent jointly the effect of picking one of the possible races. I think what you are striving for is that the average of the term `race' be zero over the whole dataset. That's easy -- just compute the average and subtract it via an offset term. Once you have two or more factor predictors, aliasing will get in your way.
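One way to see a term whose average is zero over the dataset, without building any dummy variables by hand, is predict() with type = "terms". This is a sketch on simulated data (the covariate x, the effect sizes and the binomial family are assumptions for the example), and it is only one reading of the suggestion above, not necessarily the exact computation intended.

```r
set.seed(2)
n <- 300
# Hypothetical data: a 4-level factor, one numeric covariate, binary response
RACE <- factor(sample(c("B", "W", "H", "OTHER"), n, replace = TRUE))
x <- rnorm(n)
eff <- c(B = 0, H = 0.5, OTHER = -0.3, W = 0.2)   # made-up level effects
y <- rbinom(n, 1, plogis(-0.5 + 0.8 * x + eff[as.character(RACE)]))

fit <- glm(y ~ RACE + x, family = binomial)

# Per-observation contribution of each term on the link scale;
# each term is centered to have mean zero over the fitted data
tt <- predict(fit, type = "terms")
colMeans(tt)                    # numerically zero for RACE and for x

# The averages that were subtracted are collected in one constant
const <- attr(tt, "constant")

# The linear predictor is recovered as constant + sum of centered terms
eta <- const + rowSums(tt)
all.equal(unname(eta), unname(predict(fit, type = "link")))
```

The fit itself uses the default contrasts; the centering is applied afterwards, so it does not change the model, only how the race term is reported.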
> One way to create these "centered dummy variables" from the original factor
> is:
> "B"=scale(RACE=="B",scale=F),
> "W"=scale(RACE=="W",scale=F),
> "H"=scale(RACE=="H",scale=F),
> "OTHRACE"=scale(RACE=="OTHER",scale=F)
>
> However I wonder if there is some method in R to avoid having to manually
> define a large number of these dummy variables for a more complicated
> dataset.

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
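For reference, the centered dummy variables in the quoted code can be produced in one step with model.matrix() and scale(). The sketch below uses simulated data (the frequencies and the binomial family are assumptions). It also shows why four such columns in a no-intercept model cannot all be estimated: each row sums to zero, so the columns are linearly dependent.

```r
set.seed(3)
# Hypothetical data: a 4-level factor and an unrelated binary response
RACE <- factor(sample(c("B", "W", "H", "OTHER"), 200, replace = TRUE,
                      prob = c(0.2, 0.4, 0.3, 0.1)))
y <- rbinom(200, 1, 0.4)

# One 0/1 indicator column per level, then subtract each column's mean
D <- scale(model.matrix(~ RACE - 1), center = TRUE, scale = FALSE)
colnames(D) <- levels(RACE)

range(rowSums(D))   # numerically zero: the four columns are dependent
qr(D)$rank          # 3, not 4

# No-intercept fit on all four centered columns
fit <- glm(y ~ D - 1, family = binomial)
coef(fit)           # one coefficient is reported as NA (aliased)

# Without an intercept, the linear predictor averages zero over the data
mean(predict(fit, type = "link"))
```

Only three of the four coefficients are identified, so they describe the levels jointly rather than one "average" effect per level.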
