> From: Peter Flom
>
> If variables are collinear, then looking at interactions among them
> doesn't make much sense. High collinearity means that one variable is
> nearly a linear combination of others. IOW, that variable is not adding
> much information. So, if you look at the interaction, you are ALMOST
> looking at a quadratic (e.g., if the collinearity involves only 2
> variables, then one is very similar to the other, so X1*X2 is almost
> X1*X1). The output will be confusing, to say the least.
>
> Worse, when you include collinear variables, the resulting equation is
> highly sensitive to small (sometimes very small) changes in the data.
> Belsley gives an example where changes in the third decimal place result
> in totally different equations.
>
> For details see Belsley's book titled something like "collinearity and
> weak data in regression" (sorry, the book and my files are at the
> office, but this should let you find it).
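A rough illustration of the two points above, in R with simulated data. This is only a sketch: it does not use the baseball data, and the variable names (x1, x2, y), the sample size, and the noise levels are all made up. It checks how close X1*X2 is to X1*X1 when X2 is nearly a copy of X1, computes a VIF by hand, and refits after a small perturbation of x2 to see how the individual slopes move.

```r
# Illustration only: simulated data, not the baseball data from this thread.
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.01)      # x2 is almost a copy of x1
y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n)

# With x2 ~ x1, the interaction x1*x2 is nearly the quadratic x1^2.
cor_int_quad <- cor(x1 * x2, x1^2)

# VIF for x1 by hand: 1 / (1 - R^2) from regressing x1 on the other predictor.
vif_x1 <- 1 / (1 - summary(lm(x1 ~ x2))$r.squared)

# Refit after a tiny perturbation of x2 (around the third decimal place).
x2_pert <- x2 + rnorm(n, sd = 0.005)
fit_a <- lm(y ~ x1 + x2)
fit_b <- lm(y ~ x1 + x2_pert)

# Collect the two slopes from each fit side by side.
slopes <- rbind(original  = unname(coef(fit_a)[-1]),
                perturbed = unname(coef(fit_b)[-1]))
colnames(slopes) <- c("b_x1", "b_x2")

print(cor_int_quad)
print(vif_x1)
print(cbind(slopes, sum = rowSums(slopes)))
```

In a run like this the individual slopes can move a good deal between the two fits while their sum stays near 5, which is what one would expect when the data only pin down the combined effect of the two predictors.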

I guess you're referring to:

  "Conditioning Diagnostics: Collinearity and Weak Data in Regression"
  (Wiley, 1992, rather pricey...).

Hocking has a plot that shows the effect of collinearity in a paper from
the early '80s (the "picket fence"). The plot is used on the cover of his
latest linear model book, also published by Wiley, now in 2nd edition.

[An exercise for R newbies: Try reproducing that plot in R, probably using
the scatterplot3d package.]

Best,
Andy

> HTH
>
> Peter L. Flom, PhD
> Assistant Director, Statistics and Data Analysis Core
> Center for Drug Use and HIV Research
> National Development and Research Institutes
> 71 W. 23rd St
> www.peterflom.com
> New York, NY 10010
> (212) 845-4485 (voice)
> (917) 438-0894 (fax)
>
> >>> "Devshruti Pahuja" <[EMAIL PROTECTED]> 06/11/04 5:35 AM >>>
> Hi
>
> I have a set of data with both quantitative and categorical predictors.
> After scaling the response variable, I looked for multicollinearity (VIF
> values) among the predictors and removed the predictors that were hiding
> some of the other significant predictors. I'm curious to know whether the
> predictors (those that are not significant in a simple 'lm' fit) can
> still be involved in interactions. How do I take into account
> interactions of the predictors that I removed just on the basis of
> multicollinearity?
>
> I'd appreciate it if someone could throw some light on this matter and on
> how to use R to detect the interactions effectively.
>
> Thanks
>
> Regards
> Dev
>
> ------Final 'lm model'--------------------
>
> > > logmodelfull_minus_run_hr_walk_batting <- lm(log(salary) ~ hit+rbi + walk
> > + obp + strike.out+free.agent.eligible+free.agent.1991+arbitr.elgible.)
> > > summary(logmodelfull_minus_run_hr_walk_batting)
> >
> > Call:
> > lm(formula = log(salary) ~ hit + rbi + walk + obp + strike.out +
> >     free.agent.eligible + free.agent.1991 + arbitr.elgible.)
> >
> > Residuals:
> >      Min       1Q   Median       3Q      Max
> > -2.41786 -0.28911 -0.02814  0.31890  1.49007
> >
> > Coefficients:
> >                       Estimate Std. Error t value Pr(>|t|)
> > (Intercept)           5.340782   0.251218  21.260  < 2e-16 ***
> > hit                   0.004479   0.001158   3.867 0.000133 ***
> > rbi                   0.011102   0.002195   5.059 7.05e-07 ***
> > walk                  0.005421   0.002206   2.457 0.014533 *
> > obp                  -1.385584   0.824105  -1.681 0.093653 .
> > strike.out           -0.005399   0.001438  -3.755 0.000205 ***
> > free.agent.eligible1  1.611521   0.080657  19.980  < 2e-16 ***
> > free.agent.19911     -0.301243   0.103481  -2.911 0.003848 **
> > arbitr.elgible.1      1.293059   0.086696  14.915  < 2e-16 ***
> > ---
> > Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
> >
> > Residual standard error: 0.5351 on 328 degrees of freedom
> > Multiple R-Squared: 0.7981,     Adjusted R-squared: 0.7932
> > F-statistic: 162.1 on 8 and 328 DF,  p-value: < 2.2e-16
> >
> > --------------------------------------------------------------
> >
> > --------------with interactions-------------------------------
> >
> > > summary(baseball.lgmodel_with_interactions_ALL_arbid)
> >
> > Call:
> > lm(formula = log(salary) ~ hit + rbi + strike.out + free.agent.eligible +
> >     free.agent.1991 + arbitr.elgible. + hit * free.agent.1991 +
> >     hit * arbitr.elgible. + hit * rbi + rbi * free.agent.eligible +
> >     rbi * arbitr.elgible. + rbi * arbitr.1991 + hit * strike.out +
> >     strike.out * free.agent.eligible + strike.out * arbitr.elgible. +
> >     strike.out * run + strike.out * hr + hit * free.agent.eligible +
> >     free.agent.eligible * run + hit * free.agent.1991 + strike.out *
> >     free.agent.1991 + free.agent.1991 * batting + free.agent.1991 *
> >     obp + arbitr.elgible. * run + batting * double + obp * run +
> >     obp * hr + walk * stolen.base + hit * arbitr.1991 +
> >     free.agent.eligible * double + arbitr.elgible. * double +
> >     strike.out * triple + triple * batting + triple * walk +
> >     triple * walk + hit * hr + rbi * hr + free.agent.eligible * hr +
> >     free.agent.1991 * hr + arbitr.elgible. * hr + hr * arbitr.1991 +
> >     hit * walk + free.agent.eligible * walk + walk * rbi +
> >     rbi * stolen.base + strike.out * stolen.base + stolen.base * batting +
> >     stolen.base * walk + stolen.base * rbi + stolen.base * walk +
> >     arbitr.elgible. * error)
> >
> > Residuals:
> >      Min       1Q   Median       3Q      Max
> > -2.29352 -0.28287 -0.03748  0.29790  1.31590
> >
> > Coefficients:
> >                                   Estimate Std. Error t value Pr(>|t|)
> > (Intercept)                      5.217e+00  3.467e-01  15.048  < 2e-16 ***
> > hit                              6.927e-03  6.226e-03   1.112 0.266889
> > rbi                              1.908e-02  1.150e-02   1.658 0.098350 .
> > strike.out                      -5.692e-03  4.586e-03  -1.241 0.215517
> > free.agent.eligible1             1.287e+00  2.259e-01   5.699 3.05e-08 ***
> > free.agent.19911                 3.828e-01  6.575e-01   0.582 0.560914
> > arbitr.elgible.1                 1.038e+00  2.195e-01   4.726 3.63e-06 ***
> > arbitr.19911                    -1.024e+00  4.392e-01  -2.331 0.020443 *
> > run                              4.932e-02  2.905e-02   1.698 0.090682 .
> > hr                              -1.093e-01  7.208e-02  -1.516 0.130543
> > batting                         -1.814e-01  2.558e+00  -0.071 0.943522
> > obp                             -1.375e+00  2.253e+00  -0.610 0.542099
> > double                          -5.259e-02  4.489e-02  -1.172 0.242349
> > walk                             1.395e-02  9.757e-03   1.430 0.153808
> > stolen.base                     -1.685e-02  4.299e-02  -0.392 0.695372
> > triple                          -1.367e-01  1.600e-01  -0.854 0.393807
> > error                           -4.097e-03  6.879e-03  -0.595 0.552007
> > hit:free.agent.19911             8.248e-04  4.611e-03   0.179 0.858174
> > hit:arbitr.elgible.1             4.873e-03  6.448e-03   0.756 0.450395
> > hit:rbi                         -1.382e-04  7.709e-05  -1.792 0.074184 .
> > rbi:free.agent.eligible1         5.352e-03  9.555e-03   0.560 0.575855
> > rbi:arbitr.elgible.1            -3.384e-03  1.136e-02  -0.298 0.766072
> > rbi:arbitr.19911                 3.596e-02  2.179e-02   1.650 0.100046
> > hit:strike.out                   5.480e-06  5.446e-05   0.101 0.919917
> > strike.out:free.agent.eligible1 -2.570e-03  4.282e-03  -0.600 0.548890
> > strike.out:arbitr.elgible.1     -9.703e-04  5.234e-03  -0.185 0.853068
> > strike.out:run                   1.685e-04  1.246e-04   1.352 0.177345
> > strike.out:hr                   -3.088e-04  2.277e-04  -1.356 0.176229
> > hit:free.agent.eligible1        -1.359e-03  6.224e-03  -0.218 0.827363
> > free.agent.eligible1:run         1.248e-02  9.109e-03   1.370 0.171917
> > strike.out:free.agent.19911     -1.851e-02  5.974e-03  -3.099 0.002140 **
> > free.agent.19911:batting         7.076e-01  6.200e+00   0.114 0.909215
> > free.agent.19911:obp            -1.421e+00  3.952e+00  -0.360 0.719394
> > arbitr.elgible.1:run            -8.541e-03  8.773e-03  -0.974 0.331100
> > batting:double                   2.346e-01  1.609e-01   1.458 0.145884
> > run:obp                         -1.825e-01  7.492e-02  -2.436 0.015462 *
> > hr:obp                           3.687e-01  2.116e-01   1.742 0.082608 .
> > walk:stolen.base                -6.789e-05  1.557e-04  -0.436 0.663083
> > hit:arbitr.19911                -5.835e-03  7.084e-03  -0.824 0.410808
> > free.agent.eligible1:double     -1.151e-02  1.663e-02  -0.692 0.489334
> > arbitr.elgible.1:double          2.169e-03  1.938e-02   0.112 0.910985
> > strike.out:triple               -8.106e-04  6.023e-04  -1.346 0.179475
> > batting:triple                   5.179e-01  5.599e-01   0.925 0.355841
> > walk:triple                      8.755e-04  9.262e-04   0.945 0.345349
> > hit:hr                          -3.320e-04  2.626e-04  -1.264 0.207180
> > rbi:hr                           4.748e-04  3.015e-04   1.575 0.116414
> > free.agent.eligible1:hr          1.840e-02  2.313e-02   0.796 0.426972
> > free.agent.19911:hr              7.216e-02  1.889e-02   3.819 0.000165 ***
> > arbitr.elgible.1:hr              4.111e-02  2.803e-02   1.467 0.143564
> > arbitr.19911:hr                 -2.368e-02  4.647e-02  -0.510 0.610723
> > hit:walk                         3.173e-05  7.826e-05   0.405 0.685442
> > free.agent.eligible1:walk       -5.423e-03  4.984e-03  -1.088 0.277472
> > rbi:walk                        -7.569e-05  1.313e-04  -0.577 0.564598
> > rbi:stolen.base                  3.980e-05  1.605e-04   0.248 0.804409
> > strike.out:stolen.base          -2.611e-04  1.615e-04  -1.617 0.107004
> > batting:stolen.base              1.552e-01  1.434e-01   1.082 0.280020
> > arbitr.elgible.1:error           3.930e-03  1.390e-02   0.283 0.777495
> > ---
> > Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
> >
> > Residual standard error: 0.4925 on 280 degrees of freedom
> > Multiple R-Squared: 0.854,      Adjusted R-squared: 0.8248
> > F-statistic: 29.24 on 56 and 280 DF,  p-value: < 2.2e-16
>
> ______________________________________________
> [EMAIL PROTECTED] mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
