On Apr 10, 2009, at 4:28 PM, Kyle Gorman wrote:
i have three positively-correlated predictors that i'd like to
include in a model. any traditional measure suggests that to include
them as is would introduce a good deal of collinearity. really,
these are great candidates either for taking the sum of the three
or for PCA, but hypothetically, let's say i wanted to use a
residualization trick for this three-way interaction.
(they are all on a 15 point scale and I predict they will all have
similar positive betas)
X1 will remain as is.
r.X2 = residuals(lm(X2 ~ X1))
r.X3 = residuals(lm(X3 ~ X1 + r.X2))
then:
outcome ~ X1 + r.X2 + r.X3
this is the solution i vaguely recall seeing in a textbook somewhere
under the name "partialization"
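The scheme can be sketched numerically — Python/numpy here rather than R, with made-up data, and a small `residualize` helper standing in for R's `residuals(lm(...))`:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# hypothetical positively-correlated 15-point-scale predictors (illustrative only)
base = rng.integers(1, 16, n).astype(float)
X1 = base + rng.normal(0, 2, n)
X2 = base + rng.normal(0, 2, n)
X3 = base + rng.normal(0, 2, n)

def residualize(y, *predictors):
    """OLS residuals of y regressed on an intercept plus the given predictors."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

r_X2 = residualize(X2, X1)         # r.X2 = residuals(lm(X2 ~ X1))
r_X3 = residualize(X3, X1, r_X2)   # r.X3 = residuals(lm(X3 ~ X1 + r.X2))

# by construction each residualized predictor is orthogonal to everything it
# was regressed on, so X1, r.X2, r.X3 carry no collinearity into the model
print(abs(np.dot(r_X2, X1)) < 1e-6, abs(np.dot(r_X3, r_X2)) < 1e-6)  # True True
```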
Hi Kyle,
- is this kosher?
Yes, it's kosher, even during Passover :-) Just keep in mind what the
coefficients in your regression will mean. The coefficient assigned to
r.X3 is the effect of "that portion of the variability in X3 that
cannot be expressed as a linear combination of X1 and X2". Likewise
(more simply) for r.X2, which is the portion of X2 not expressible in
terms of X1.
- should the form of r.X3 be the naive residuals(lm(X3 ~ X1 + X2)?
It won't make a difference. r.X3 will be the same in either case
(modulo numerical error).
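This is easy to check numerically: {1, X1, r.X2} and {1, X1, X2} span the same space, so regressing X3 on either set leaves identical residuals. A Python/numpy sketch with hypothetical data (the `residualize` helper is a stand-in for R's `residuals(lm(...))`):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
base = rng.normal(8.0, 3.0, n)     # shared component driving the correlation
X1 = base + rng.normal(0, 2, n)
X2 = base + rng.normal(0, 2, n)
X3 = base + rng.normal(0, 2, n)

def residualize(y, *predictors):
    """OLS residuals of y regressed on an intercept plus the given predictors."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

r_X2 = residualize(X2, X1)
r_X3_a = residualize(X3, X1, r_X2)   # residuals(lm(X3 ~ X1 + r.X2))
r_X3_b = residualize(X3, X1, X2)     # residuals(lm(X3 ~ X1 + X2))

# identical up to numerical error, since the two designs span the same space
print(np.allclose(r_X3_a, r_X3_b))   # True
```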
- should the form of r.X2 be the less-naive residuals(lm(X2 ~ X1 +
X3))?
That would be bad. If you did this and then used your original formula
outcome ~ X1 + r.X2 + r.X3
your predictors would span a more restricted subspace than those in
outcome ~ X1 + X2 + X3
which you don't want. Imagine the extreme case where X2 == X3 always.
Then with your proposal, r.X2 and r.X3 would both always be 0, and all
the information in X2 and X3 would be thrown away.
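The degenerate case above can be reproduced numerically (Python/numpy sketch, hypothetical data; `residualize` stands in for R's `residuals(lm(...))`):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
X1 = rng.normal(8.0, 3.0, n)
X2 = X1 + rng.normal(0, 2, n)
X3 = X2.copy()                        # extreme case: X2 == X3 always

def residualize(y, *predictors):
    """OLS residuals of y regressed on an intercept plus the given predictors."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# the "less naive" scheme residualizes each of X2, X3 on the other:
r_X2 = residualize(X2, X1, X3)        # residuals(lm(X2 ~ X1 + X3))
r_X3 = residualize(X3, X1, X2)        # residuals(lm(X3 ~ X1 + X2))

# both collapse to (numerically) zero: each of X2, X3 is perfectly
# predicted by the other, so nothing of them survives into the model
print(np.allclose(r_X2, 0, atol=1e-6), np.allclose(r_X3, 0, atol=1e-6))
```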
Roger
--
Roger Levy Email: [email protected]
Assistant Professor Phone: 858-534-7219
Department of Linguistics Fax: 858-534-4789
UC San Diego Web: http://ling.ucsd.edu/~rlevy
_______________________________________________
R-lang mailing list
[email protected]
http://pidgin.ucsd.edu/mailman/listinfo/r-lang