I don't have a copy of Belsley's 1991 book here, but I do have Belsley, Kuh, and Welsch, Regression Diagnostics (Wiley, 1980). If my memory is right, the approach is the same: Belsley's collinearity diagnostics are based on a singular-value decomposition of the scaled but uncentred model matrix. A straightforward, if inelegant, rendition is
belsley <- function(model){
    X <- model.matrix(model)
    # with center=FALSE, scale() divides each column by sqrt(sum(x^2)/(n - 1));
    # the further division by sqrt(n - 1) leaves every column with unit length
    X <- scale(X, center=FALSE)/sqrt(nrow(X) - 1)
    svd.X <- svd(X)
    result <- list(singular.values = svd.X$d,
                   condition.indices = max(svd.X$d)/svd.X$d)
    # phi[k, j] = v[k, j]^2/d[j]^2; var(b_k) is proportional to sum_j phi[k, j]
    phi <- sweep(svd.X$v^2, 2, svd.X$d^2, "/")
    # variance-decomposition proportions: rows = singular values,
    # columns = coefficients, each column summing to 1
    Pi <- t(sweep(phi, 1, rowSums(phi), "/"))
    colnames(Pi) <- names(coef(model))
    rownames(Pi) <- 1:nrow(Pi)
    result$pi <- Pi
    class(result) <- "belsley"
    result
}

print.belsley <- function(x, digits = 3, ...){
    cat("\nSingular values: ", x$singular.values)
    cat("\nCondition indices: ", x$condition.indices)
    cat("\n\nVariance-decomposition proportions\n")
    print(round(x$pi, digits))
    invisible(x)
}

This gives the singular values, condition indices, and variance-decomposition proportions. (I'm pretty sure that you can get the same thing more elegantly from the qr decomposition, but I don't know how off the top of my head -- someone else on the list doubtless can supply the details.)
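One possible version of the QR route, offered only as a sketch (the function name belsley_qr is made up, and this is not BKW's own algorithm). It rests on a standard fact: if X[, pivot] = QR with Q having orthonormal columns, then X and R[, order(pivot)] have the same singular values and the same right singular vectors, so the SVD can be taken of the small p x p triangular factor instead of the n x p model matrix. Columns are scaled to unit length with sweep(), which is equivalent to the scale() step in belsley().

```r
# Sketch: Belsley diagnostics via the QR decomposition (hypothetical name belsley_qr).
belsley_qr <- function(model) {
  X <- model.matrix(model)
  X <- sweep(X, 2, sqrt(colSums(X^2)), "/")         # unit-length columns
  qrX <- qr(X)                                       # X[, pivot] = Q R
  R <- qr.R(qrX)[, order(qrX$pivot), drop = FALSE]   # undo any column pivoting
  svd.R <- svd(R)                                    # same d and v as svd(X)
  phi <- sweep(svd.R$v^2, 2, svd.R$d^2, "/")
  Pi <- t(sweep(phi, 1, rowSums(phi), "/"))          # rows = singular values
  colnames(Pi) <- colnames(X)
  list(singular.values = svd.R$d,
       condition.indices = max(svd.R$d) / svd.R$d,
       pi = Pi)
}

# Usage on made-up data with a near-dependency (x3 is almost x1)
set.seed(123)
n  <- 20
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- x1 + 0.01 * rnorm(n)
y  <- rnorm(n)
mod <- lm(y ~ x1 + x2 + x3)
b <- belsley_qr(mod)
round(b$condition.indices, 2)
```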
For example, for the illustration on p. 161 of BKW,
> X
   V1  V2  V3     V4     V5
1 -74  80  18    -56   -112
2  14 -69  21     52    104
3  66 -72  -5    764   1528
4 -12  66 -30   4096   8192
5   3   8  -7 -13276 -26552
6   4 -12   4   8421  16842
> mod <- lm(y ~ X - 1) # nb., y was just randomly generated
> belsley(mod)

Singular values:  1.414214 1.361734 1.066707 0.08840437 3.614479e-17
Condition indices:  1 1.038538 1.325775 15.9971 3.912635e+16

Variance-decomposition proportions
    XV1   XV2   XV3 XV4 XV5
1 0.000 0.000 0.000   0   0
2 0.005 0.005 0.000   0   0
3 0.001 0.001 0.047   0   0
4 0.994 0.994 0.953   0   0
5 0.000 0.000 0.000   1   1

which is in good agreement with the values given in the text.
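Because these diagnostics depend only on the model matrix (y never enters), the illustration can be checked without the randomly generated response. The sketch below re-enters X from the listing above and shows where the two large condition indices come from. V5 is exactly 2 * V4, so one singular value is zero up to rounding error. V4 also happens to be orthogonal to V1, V2, and V3, so the cross-product matrix of the unit-length columns is block diagonal; its V4-V5 block is the 2 x 2 matrix with all entries 1, whose eigenvalues are 2 and 0. That accounts for the largest singular value, sqrt(2) = 1.414214, and for the 1s in the last row of the proportions, while the condition index of about 16 reflects a near-dependency among V1, V2, and V3 (row 4).

```r
# BKW illustration, re-entered from the listing above
X <- matrix(c(-74,  80,  18,    -56,   -112,
               14, -69,  21,     52,    104,
               66, -72,  -5,    764,   1528,
              -12,  66, -30,   4096,   8192,
                3,   8,  -7, -13276, -26552,
                4, -12,   4,   8421,  16842),
            nrow = 6, byrow = TRUE,
            dimnames = list(NULL, paste0("V", 1:5)))

all(X[, "V5"] == 2 * X[, "V4"])   # TRUE: an exact linear dependency
crossprod(X[, 1:3], X[, "V4"])    # all zero: V4 is orthogonal to V1, V2, V3

Xs <- sweep(X, 2, sqrt(colSums(X^2)), "/")   # unit-length columns, as in belsley()
d  <- svd(Xs)$d
d                # d[1] is sqrt(2); d[5] is zero up to rounding
max(d) / d       # condition indices
sum(d^2)         # 5 up to rounding: the trace of crossprod(Xs)
```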
Now some comments:
(1) I've never liked this approach for a model with a constant, where it makes more sense to me to centre the data. I realize that opinions differ here, but it seems to me that failing to centre the data conflates collinearity with numerical instability.
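For the centred analysis preferred in (1), a minimal sketch (the name belsley_centred and the simulated data are for illustration only): drop the constant, centre and scale the remaining columns so that their cross-product matrix is the correlation matrix, and take the SVD. The squared singular values are then the eigenvalues of the predictors' correlation matrix. It assumes the constant column is named "(Intercept)", which is what model.matrix() uses by default.

```r
# Sketch: condition indices from the centred and scaled predictors
belsley_centred <- function(model) {
  X <- model.matrix(model)
  X <- X[, colnames(X) != "(Intercept)", drop = FALSE]            # drop the constant
  Z <- scale(X, center = TRUE, scale = TRUE) / sqrt(nrow(X) - 1)  # crossprod(Z) = cor(X)
  svd.Z <- svd(Z)
  list(singular.values = svd.Z$d,
       condition.indices = max(svd.Z$d) / svd.Z$d)
}

# Made-up data: x2 is nearly a copy of x1
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
x3 <- rnorm(n)
y  <- rnorm(n)
mod <- lm(y ~ x1 + x2 + x3)
ci <- belsley_centred(mod)
ci
```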
(2) I also disagree with the comment that condition indices are easier to interpret than variance-inflation factors. In either case, since collinearity is a continuous phenomenon, cutoffs for large values are necessarily arbitrary.
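To put the two kinds of diagnostic side by side, here is a small sketch (the helper vif_basic and the data are made up): ordinary VIFs are the diagonal of the inverse of the predictors' correlation matrix, equal to 1/(1 - R_j^2) for each predictor, while the centred condition indices come from the eigenvalues of that same matrix. The car package's vif() is the packaged version of the former.

```r
# Hypothetical helper: ordinary VIFs for the non-constant columns of the model matrix
vif_basic <- function(model) {
  X <- model.matrix(model)
  X <- X[, colnames(X) != "(Intercept)", drop = FALSE]  # drop the constant
  diag(solve(cor(X)))                                   # VIF_j = 1/(1 - R_j^2)
}

# Made-up data: x2 is strongly, but not perfectly, related to x1
set.seed(2)
n  <- 40
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.5)
x3 <- rnorm(n)
y  <- rnorm(n)
mod <- lm(y ~ x1 + x2 + x3)

v <- vif_basic(mod)
v

# Centred condition indices from the same correlation matrix
e <- eigen(cor(cbind(x1, x2, x3)), symmetric = TRUE)$values
sqrt(max(e) / e)
```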
(3) If you're interested in figuring out which variables are involved in each collinear relationship, then (for centred and scaled data) you can equivalently (and to me, more intuitively) work with the principal-components analysis of the predictors.
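A sketch of the principal-components route in (3), on made-up data with one built-in near-dependency (x3 approximately x1 + x2). With centred and scaled predictors, the component standard deviations returned by prcomp() are the singular values of the standardized data divided by sqrt(n - 1), so their ratios reproduce the centred condition indices, and the loadings (rotation) of the smallest component show which variables take part in the near-dependency.

```r
# Made-up data with one near-dependency: x3 is approximately x1 + x2
set.seed(3)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2 + rnorm(n, sd = 0.05)
X  <- cbind(x1, x2, x3)

# PCA of the centred and scaled predictors (i.e., of the correlation matrix)
pc <- prcomp(X, center = TRUE, scale. = TRUE)

pc$sdev[1] / pc$sdev         # centred condition indices
round(pc$rotation[, 3], 2)   # loadings of the smallest component: which variables are involved
```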
(4) I have doubts about the whole enterprise. Collinearity is one source of imprecision -- others are small sample size, homogeneous predictors, and large error variance. Aren't the coefficient standard errors the bottom line? If these are sufficiently small, why worry?
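The point in (4) can be made precise with the familiar expression for the sampling variance of a least-squares slope in a model with a constant, in which each source of imprecision appears as a separate factor:

```latex
V(b_j) = \frac{\sigma^2}{(n-1)\, s_j^2} \cdot \frac{1}{1 - R_j^2}
```

Here sigma^2 is the error variance, n the sample size, s_j^2 the sample variance of x_j (small when the predictor is homogeneous), and R_j^2 the squared multiple correlation from regressing x_j on the other predictors; the last factor is VIF_j. Collinearity therefore affects precision only in combination with the other three factors.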
I hope that this helps.
John
At 05:35 PM 7/23/2003 +0200, Uwe Ligges wrote:
Peter Flom wrote:
Has anyone programmed condition indexes in R? I know that there is a function for variance inflation factors available in the car package; however, Belsley (1991) Conditioning Diagnostics (Wiley) notes that there are several weaknesses of VIFs, e.g.:
1) High VIFs are sufficient but not necessary conditions for collinearity
2) VIFs don't diagnose the number of collinearities
3) No one has determined how high a VIF has to be for the collinearity to be damaging.
He then develops and suggests using condition indexes instead, so I was wondering if anyone had programmed them.
Thanks
Peter
I think Juergen Gross has something like that in his new book
Gross, J. (2003): Linear Regression, Springer (in press - OK, not very helpful here).
You might want to contact him privately (in CC).
Uwe Ligges
-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: [EMAIL PROTECTED]
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox
______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
