On Apr 10, 2009, at 4:28 PM, Kyle Gorman wrote:

I have three positively correlated predictors that I'd like to include in a model. Any traditional measure suggests that including them as-is would introduce a good deal of collinearity. Really, these are great candidates for either taking the sum of the three or for PCA, but hypothetically, let's say I wanted to use a residualization trick for this three-way interaction.

(They are all on a 15-point scale, and I predict they will all have similar positive betas.)

X1 will remain as is.

r.X2 = residuals(lm(X2 ~ X1))
r.X3 = residuals(lm(X3 ~ X1 + r.X2))

then:

outcome ~ X1 + r.X2 + r.X3

This is the solution I vaguely recall seeing in a textbook somewhere under the name "partialization".
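A quick numerical sketch of the procedure (in Python/NumPy here as a stand-in for the R calls above, with simulated data; the `residualize` helper and all variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulate three positively correlated predictors on a rough 15-point scale.
base = rng.normal(size=n)
X1 = 8 + 2 * base + rng.normal(size=n)
X2 = 8 + 2 * base + rng.normal(size=n)
X3 = 8 + 2 * base + rng.normal(size=n)

def residualize(y, *predictors):
    # Residuals of y regressed on an intercept plus the given predictors,
    # i.e. the analogue of residuals(lm(y ~ p1 + p2 + ...)) in R.
    Xmat = np.column_stack([np.ones_like(y)] + list(predictors))
    beta, *_ = np.linalg.lstsq(Xmat, y, rcond=None)
    return y - Xmat @ beta

r_X2 = residualize(X2, X1)        # r.X2 = residuals(lm(X2 ~ X1))
r_X3 = residualize(X3, X1, r_X2)  # r.X3 = residuals(lm(X3 ~ X1 + r.X2))

# The new predictors are orthogonal to X1 and to each other by construction,
# so the collinearity among X1, r.X2, r.X3 is gone.
print(np.corrcoef(X1, r_X2)[0, 1])    # ~0
print(np.corrcoef(X1, r_X3)[0, 1])    # ~0
print(np.corrcoef(r_X2, r_X3)[0, 1])  # ~0
```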

Hi Kyle,

- is this kosher?

Yes, it's kosher, even during Passover :-) Just keep in mind what the outcome of your regression will be. The coefficient assigned to r.X3 reflects only "that portion of the variability in X3 that cannot be expressed as a linear combination of X1 and X2". Likewise (more simply) for r.X2.

- should the form of r.X3 be the naive residuals(lm(X3 ~ X1 + X2))?

It won't make a difference. r.X3 will be the same in either case (modulo numerical error).
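The reason is that the design matrices {1, X1, r.X2} and {1, X1, X2} span the same column space (r.X2 is just X2 minus a linear combination of 1 and X1), so the residuals of X3 on either set coincide. A quick numerical check (Python/NumPy sketch with simulated data; the `residualize` helper is mine):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
base = rng.normal(size=n)
X1 = 8 + 2 * base + rng.normal(size=n)
X2 = 8 + 2 * base + rng.normal(size=n)
X3 = 8 + 2 * base + rng.normal(size=n)

def residualize(y, *predictors):
    # Residuals of y regressed on an intercept plus the given predictors.
    Xmat = np.column_stack([np.ones_like(y)] + list(predictors))
    beta, *_ = np.linalg.lstsq(Xmat, y, rcond=None)
    return y - Xmat @ beta

r_X2 = residualize(X2, X1)
a = residualize(X3, X1, r_X2)  # residuals(lm(X3 ~ X1 + r.X2))
b = residualize(X3, X1, X2)    # residuals(lm(X3 ~ X1 + X2))

# The two residual vectors agree up to numerical error,
# because the two predictor sets span the same space.
print(np.max(np.abs(a - b)))  # ~0
```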

- should the form of r.X2 be the less-naive residuals(lm(X2 ~ X1 + X3))?

That would be bad.  If you did this and then used your original formula

  outcome ~ X1 + r.X2 + r.X3

you would be in a more restricted subspace than for

  outcome ~ X1 + X2 + X3

which you don't want. Imagine the extreme case where X2 == X3 always. Then with your proposal, r.X2 and r.X3 would always both be 0.
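That degenerate case is easy to verify numerically (Python/NumPy sketch with made-up data; the `residualize` helper is mine):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
X1 = rng.normal(size=n)
X2 = X1 + rng.normal(size=n)
X3 = X2.copy()  # extreme case: X2 == X3 always

def residualize(y, *predictors):
    # Residuals of y regressed on an intercept plus the given predictors.
    Xmat = np.column_stack([np.ones_like(y)] + list(predictors))
    beta, *_ = np.linalg.lstsq(Xmat, y, rcond=None)
    return y - Xmat @ beta

# Residualizing each of X2 and X3 against X1 *and* the other one:
r_X2 = residualize(X2, X1, X3)  # X3 predicts X2 perfectly, so residuals ~0
r_X3 = residualize(X3, X1, X2)  # and vice versa

# Both vanish, so all information in X2 and X3 beyond X1 is thrown away.
print(np.max(np.abs(r_X2)), np.max(np.abs(r_X3)))  # both ~0
```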

Roger


--

Roger Levy                      Email: [email protected]
Assistant Professor             Phone: 858-534-7219
Department of Linguistics       Fax:   858-534-4789
UC San Diego                    Web:   http://ling.ucsd.edu/~rlevy

_______________________________________________
R-lang mailing list
[email protected]
http://pidgin.ucsd.edu/mailman/listinfo/r-lang
