Dear Agustin & the Listers,

Noncentred PCA is an old and establishes method. It is rarely used, but still (methinks) it is used more often than it should be used. There is nothing wrong in having noncentred PCA in R, and it is a real PCA. Details will follow.

On 08/03/2009, at 11:07 AM, Agustin Lobo wrote:

I do not understand, from a PCA point of view, the option center=F
of prcomp()

According to the help page, the calculation in prcomp() "is done by a singular value decomposition of the (centered and possibly scaled) data matrix, not by using eigen on the covariance matrix" (as it's done by princomp()) .
"This is generally the preferred method for numerical accuracy"

The question is that while
prcomp(X,center=T,scaling=F) is equivalent to princomp(X,scaling=F),
but prcomp(X,center=F) has no equivalent in princomp()

princomp() does not have explicit noncentred analysis because its algorithm is based on the eigen analysis of covariance matrix using funciton cov.wt(). While function cov.wt() knows argument 'center' (which is the US spelling of centre), princomp() does not pass that argument to cov.wt(). However, if you supply a noncentred 'covmat' instead of 'x' (see ?princomp), princomp() will give you noncentred PCA.

We can only guess why the authors of princomp() did not allow noncentred analysis. However, I know why the author of rda() function in vegan did not explicitly allow noncentred analysis: he thought that it is a Patently Bad Idea(TM). Further, he suggests in vegan function that if you think you need noncentred analysis, you are able to edit vegan::rda.default() to do so -- if you cannot edit the function, then you probably are wrong in assuming you need noncentred analysis. It may be that the princomp authors think in the same way (but they also allow noncentred analyis iif you supply a noncentred 'covmat').

One of the sins of my youth was to publish a paper using noncentred PCA. I'm penitent. (However, I commited bigger sins, too.)

Also, the rotation made with either the eigenvectors of prcomp(X,center=T,scaling=F) or the ones of princomp(X,scaling=F)
yields PCs with a minimum correlation, as expected
for a PCA. But the rotation made with the eigenvectors of prcomp(X,center=F) yields axes that are correlated.
Therefore, prcomp(X,center=F) is not really a PCA.

PCA axes are *orthogonal*, but not necessarily uncorrelated (people often mistake these as synonyms). Centred orthogonal axes also are uncorrelated, but noncentred orthogonal are (or may be) correlated.

Your example code may be simplified a bit: prcomp returns the rotated matrix so that you do not need the rotation %*% data multiplication after analysis: you get that in the analysis. Below I use multivariate normal random numbers (MASS::mvrnorm) to generate correlated observations:

library(MASS)
x<- mvrnorm(100, mu = c(1, 3), Sigma=matrix(c(1, 0.6, 0.6, 1), nrow=2))
pc <- prcomp(x, cent=F)

Here the scores are orthogonal:

crossprod(pc$x)

or the off-diagonal elements are numerically zero, but they are correlated:

cor(pc$x)

The only requirement we have is orthogonality, and uncorrelatedness is a collateral damage in centred analysis.

Cheers, Jari Oksanen


See the following example, in which the second column of
data matrix X is linearly correlated to the first column:

> X <- cbind(rnorm(100,100,50),rnorm(100,100,50))
> X[,2] <- X[,1]*1.5-50 +runif(100,-70,70)
> plot(X)
> cor(X[,1],X[,2])
[1] 0.903597

> eigvnocent <- prcomp(X,center=F,scaling=F)[[1]]
> eigvcent <- prcomp(X,center=T,scaling=F)[[1]]
> eigvecnocent <- prcomp(X,center=F,scaling=F)[[2]]
> eigveccent <- prcomp(X,center=T,scaling=F)[[2]]

> PCnocent <- X%*%eigvecnocent
> PCcent <- X%*%eigveccent
> par(mfrow=c(2,2))
> plot(X)
> plot(PCnocent)
> plot(PCcent)

> cor(X[,1],X[,2])
[1] 0.903597
> cor(PCcent[,1],PCcent[,2])
[1] -8.778818e-16
> cor(PCnocent[,1],PCnocent[,2])
[1] -0.6908334
>

Also the help page of prcomp() states:
"Details

The calculation is done by a singular value decomposition of the (centered and possibly scaled) data matrix..."

The parenthesis implies some ambiguity, but I do interpret the sentence as indicating that the calculation should always be done using a centered data matrix. Finally, all the examples in the help page use centering (or scaling, which implies centering)

Therefore, why the option center=F ?

There really is a small conflict between docs and function: the function allows noncentred analysis, and it also performs such an analysis if asked. This is documented for the argument "center", but the option is later ignored in the passage you cite. Probably because the authors of the function think that this offered option should not be used. You may submit a report of this internal conflict in prcomp() documents to R maintainers.

Cheers, Jari Oksanen

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to