Re: [R] Condition indexes and variance inflation factors

John Fox Wed, 23 Jul 2003 12:55:53 -0700

Dear Peter and Uwe,

I don't have a copy of Belsley's 1991 book here, but I do have Belsley, Kuh, and Welsch, Regression Diagnostics (Wiley, 1980). If my memory is right, the approach is the same: Belsley's collinearity diagnostics are based on a singular-value decomposition of the scaled but uncentred model matrix. A straightforward, if inelegant, rendition is

belsley <- function(model){
    X <- model.matrix(model)
    X <- scale(X, center=FALSE)/sqrt(nrow(X) - 1)
    svd.X <- svd(X)
    result <- list(singular.values = svd.X$d,
        condition.indices = max(svd.X$d)/svd.X$d)
    phi <- sweep(svd.X$v^2, 2, svd.X$d^2, "/")
    Pi <- t(sweep(phi, 1, rowSums(phi), "/"))
    colnames(Pi) <- names(coef(model))
    rownames(Pi) <- 1:nrow(Pi)
    result$pi <- Pi
    class(result) <- "belsley"
    result
    }

print.belsley <- function(x, digits = 3, ...){
    cat("\nSingular values: ", x$singular.values)
    cat("\nCondition indices: ", x$condition.indices)
    cat("\n\nVariance-decomposition proportions\n")
    print(round(x$pi, digits))
    invisible(x)
    }

This gives the singular values, condition indices, and variance-decomposition proportions. (I'm pretty sure that you can get the same thing more elegantly from the QR decomposition, but I don't know how off the top of my head -- someone else on the list doubtless can supply the details.)
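One possible QR route, offered as a sketch and not as the details the message asks for: because X = QR with Q having orthonormal columns, the small upper-triangular factor R has the same singular values as X. The simulated matrix and the names R, d.qr and cond.qr are illustrative assumptions. Note that this still calls svd(), only on the p x p factor, so it may not be the "more elegant" solution the message has in mind.

```r
# Sketch only: condition indices from the R factor of a QR decomposition.
set.seed(1)                                   # simulated data, not the BKW example
X <- cbind(1, matrix(rnorm(60), 20, 3))       # hypothetical model matrix with a constant
X <- scale(X, center = FALSE) / sqrt(nrow(X) - 1)   # unit-length columns, as in belsley()

R <- qr.R(qr(X))               # upper-triangular factor of X = QR
d.qr <- svd(R)$d               # singular values of R, which are those of X
cond.qr <- max(d.qr) / d.qr    # condition indices

all.equal(d.qr, svd(X)$d)      # compare with the direct SVD of X
```

If qr() reorders the columns of a rank-deficient matrix, the singular values are unaffected, but the rows of the right singular vectors would have to be put back in the original column order (see the pivot component of the qr object) before computing variance-decomposition proportions.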

For example, for the illustration on p. 161 of BKW,

> X
   V1  V2  V3     V4     V5
1 -74  80  18    -56   -112
2  14 -69  21     52    104
3  66 -72  -5    764   1528
4 -12  66 -30   4096   8192
5   3   8  -7 -13276 -26552
6   4 -12   4   8421  16842
> mod <- lm(y ~ X - 1)  # nb., y was just randomly generated
> belsley(mod)

Singular values:  1.414214 1.361734 1.066707 0.08840437 3.614479e-17
Condition indices:  1 1.038538 1.325775 15.9971 3.912635e+16

Variance-decomposition proportions
    XV1   XV2   XV3 XV4 XV5
1 0.000 0.000 0.000   0   0
2 0.005 0.005 0.000   0   0
3 0.001 0.001 0.047   0   0
4 0.994 0.994 0.953   0   0
5 0.000 0.000 0.000   1   1

which is in good agreement with the values given in the text.

Now some comments:

(1) I've never liked this approach for a model with a constant, where it makes more sense to me to centre the data. I realize that opinions differ here, but it seems to me that failing to centre the data conflates collinearity with numerical instability.
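A small simulated illustration of the point about centring, under assumed data: two independent predictors whose means are far from zero. The helper names unit and cond are hypothetical, and the scaling to unit-length columns follows belsley() above.

```r
# Sketch only: effect of centring on the largest condition index.
set.seed(2)
x1 <- rnorm(50, mean = 100, sd = 1)   # predictor with mean far from zero
x2 <- rnorm(50, mean = 100, sd = 1)   # generated independently of x1

# Scale columns to unit length without centring, as in belsley()
unit <- function(M) scale(M, center = FALSE) / sqrt(nrow(M) - 1)
# Largest condition index: ratio of largest to smallest singular value
cond <- function(M) { d <- svd(unit(M))$d; max(d) / min(d) }

X.raw <- cbind(1, x1, x2)                       # uncentred, constant included
X.cen <- scale(cbind(x1, x2), scale = FALSE)    # centred, constant dropped

cond(X.raw)   # large: both predictors are nearly proportional to the constant
cond(X.cen)   # close to 1: the predictors are nearly uncorrelated
```

In this set-up the large uncentred condition index reflects only the distance of the predictors from zero, not any relationship between x1 and x2.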

(2) I also disagree with the comment that condition indices are easier to interpret than variance-inflation factors. In either case, since collinearity is a continuous phenomenon, cutoffs for large values are necessarily arbitrary.

(3) If you're interested in figuring out which variables are involved in each collinear relationship, then (for centred and scaled data) you can equivalently (and to me, more intuitively) work with the principal-components analysis of the predictors.
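The equivalence in (3) can be checked numerically. The following is a sketch on simulated predictors (x1, x2 and x3 are assumptions, with x2 built to be nearly collinear with x1): for centred and scaled data, the singular values of the unit-length columns are the standard deviations of the principal components, and the right singular vectors are the component loadings up to sign.

```r
# Sketch only: SVD of centred, scaled predictors versus prcomp().
set.seed(4)
n <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)   # nearly collinear with x1
x3 <- rnorm(n)
X <- cbind(x1, x2, x3)

Z <- scale(X) / sqrt(n - 1)     # centred columns of unit length
sv <- svd(Z)
pc <- prcomp(X, scale. = TRUE)

all.equal(sv$d, pc$sdev)                         # singular values = component sdevs
all.equal(abs(sv$v), abs(unname(pc$rotation)))   # same vectors, up to sign

pc$rotation[, 3]   # loadings on the smallest component
```

The loadings on the component with the smallest standard deviation indicate which predictors take part in the near-collinear relationship.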

(4) I have doubts about the whole enterprise. Collinearity is one source of imprecision -- others are small sample size, homogeneous predictors, and large error variance. Aren't the coefficient standard errors the bottom line? If these are sufficiently small, why worry?
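The sources of imprecision listed in (4) appear as separate factors in the standard formula for the sampling variance of a slope in a model with a constant: error variance, divided by (n - 1) times the variance of the predictor, multiplied by the variance-inflation factor. The sketch below checks that identity on simulated data; the data and the names vif and se.parts are assumptions for illustration.

```r
# Sketch only: coefficient standard errors rebuilt from their components.
set.seed(5)
n <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)   # nearly collinear with x1
x3 <- rnorm(n)
y <- 1 + x1 + x2 + x3 + rnorm(n)
mod <- lm(y ~ x1 + x2 + x3)

X <- cbind(x1, x2, x3)
vif <- diag(solve(cor(X)))      # variance-inflation factors, 1/(1 - R_j^2)
s2 <- summary(mod)$sigma^2      # estimated error variance

# error variance / ((n - 1) * predictor variance) * VIF
se.parts <- sqrt(s2 / ((n - 1) * apply(X, 2, var)) * vif)

all.equal(se.parts, sqrt(diag(vcov(mod)))[-1])   # matches the lm() standard errors
```

A large VIF can therefore be offset by a large sample, widely spread predictors, or a small error variance, which is why the standard errors themselves summarize the precision actually achieved.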

I hope that this helps.

John

At 05:35 PM 7/23/2003 +0200, Uwe Ligges wrote:

> Peter Flom wrote:
>
>> Has anyone programmed condition indexes in R?
>> I know that there is a function for variance inflation factors
>> available in the car package; however, Belsley (1991) Conditioning
>> Diagnostics (Wiley) notes that there are several weaknesses of VIFs:
>> e.g. 1) High VIFs are sufficient but not necessary conditions for
>> collinearity  2) VIFs don't diagnose the number of collinearities and 3)
>> No one has determined how high a VIF has to be for the collinearity to
>> be damaging.
>> He then develops and suggests using condition indexes instead, so I was
>> wondering if anyone had programmed them.
>> Thanks
>> Peter
>
> I think Juergen Gross has something like that in his new book Gross, J. (2003): Linear Regression, Springer (in press - OK, not very helpful here).
>
> You might want to contact him privately (in CC).
>
> Uwe Ligges


-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: [EMAIL PROTECTED]
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox

______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
