Re: [R] prcomp(X,center=F) ??

Jari Oksanen Sun, 08 Mar 2009 08:06:48 -0700

Dear Agustin & the Listers,

Noncentred PCA is an old and establishes method. It is rarely used,but still (methinks) it is used more often than it should be used.There is nothing wrong in having noncentred PCA in R, and it is a realPCA. Details will follow.


On 08/03/2009, at 11:07 AM, Agustin Lobo wrote:

I do not understand, from a PCA point of view, the option center=F
of prcomp()
According to the help page, the calculation in prcomp() "is done bya singular value decomposition of the (centered and possibly scaled)data matrix, not by using eigen on the covariance matrix" (as it'sdone by princomp()) .
"This is generally the preferred method for numerical accuracy"

The question is that while
prcomp(X,center=T,scaling=F) is equivalent to princomp(X,scaling=F),
but prcomp(X,center=F) has no equivalent in princomp()

princomp() does not have explicit noncentred analysis because itsalgorithm is based on the eigen analysis of covariance matrix usingfunciton cov.wt(). While function cov.wt() knows argument'center' (which is the US spelling of centre), princomp() does notpass that argument to cov.wt(). However, if you supply a noncentred'covmat' instead of 'x' (see ?princomp), princomp() will give younoncentred PCA.

We can only guess why the authors of princomp() did not allownoncentred analysis. However, I know why the author of rda() functionin vegan did not explicitly allow noncentred analysis: he thought thatit is a Patently Bad Idea(TM). Further, he suggests in vegan functionthat if you think you need noncentred analysis, you are able to editvegan::rda.default() to do so -- if you cannot edit the function, thenyou probably are wrong in assuming you need noncentred analysis. Itmay be that the princomp authors think in the same way (but they alsoallow noncentred analyis iif you supply a noncentred 'covmat').

One of the sins of my youth was to publish a paper using noncentredPCA. I'm penitent. (However, I commited bigger sins, too.)

Also, the rotation made with either the eigenvectors ofprcomp(X,center=T,scaling=F) or the ones of princomp(X,scaling=F)
yields PCs with a minimum correlation, as expected
for a PCA. But the rotation made with the eigenvectors ofprcomp(X,center=F) yields axes that are correlated.
Therefore, prcomp(X,center=F) is not really a PCA.

PCA axes are *orthogonal*, but not necessarily uncorrelated (peopleoften mistake these as synonyms). Centred orthogonal axes also areuncorrelated, but noncentred orthogonal are (or may be) correlated.

Your example code may be simplified a bit: prcomp returns the rotatedmatrix so that you do not need the rotation %*% data multiplicationafter analysis: you get that in the analysis. Below I usemultivariate normal random numbers (MASS::mvrnorm) to generatecorrelated observations:


library(MASS)
x<- mvrnorm(100, mu = c(1, 3), Sigma=matrix(c(1, 0.6, 0.6, 1), nrow=2))
pc <- prcomp(x, cent=F)

Here the scores are orthogonal:

crossprod(pc$x)

or the off-diagonal elements are numerically zero, but they arecorrelated:


cor(pc$x)

The only requirement we have is orthogonality, and uncorrelatedness isa collateral damage in centred analysis.


Cheers, Jari Oksanen

See the following example, in which the second column of
data matrix X is linearly correlated to the first column:

> X <- cbind(rnorm(100,100,50),rnorm(100,100,50))
> X[,2] <- X[,1]*1.5-50 +runif(100,-70,70)
> plot(X)
> cor(X[,1],X[,2])
[1] 0.903597

> eigvnocent <- prcomp(X,center=F,scaling=F)[[1]]
> eigvcent <- prcomp(X,center=T,scaling=F)[[1]]
> eigvecnocent <- prcomp(X,center=F,scaling=F)[[2]]
> eigveccent <- prcomp(X,center=T,scaling=F)[[2]]

> PCnocent <- X%*%eigvecnocent
> PCcent <- X%*%eigveccent
> par(mfrow=c(2,2))
> plot(X)
> plot(PCnocent)
> plot(PCcent)

> cor(X[,1],X[,2])
[1] 0.903597
> cor(PCcent[,1],PCcent[,2])
[1] -8.778818e-16
> cor(PCnocent[,1],PCnocent[,2])
[1] -0.6908334
>

Also the help page of prcomp() states:
"Details
The calculation is done by a singular value decomposition of the(centered and possibly scaled) data matrix..."
The parenthesis implies some ambiguity, but I do interpret thesentence as indicating that the calculation should always be doneusing a centered data matrix.Finally, all the examples in the help page use centering (orscaling, which implies centering)
Therefore, why the option center=F ?

There really is a small conflict between docs and function: thefunction allows noncentred analysis, and it also performs such ananalysis if asked. This is documented for the argument "center", butthe option is later ignored in the passage you cite. Probably becausethe authors of the function think that this offered option should notbe used. You may submit a report of this internal conflict in prcomp()documents to R maintainers.


Cheers, Jari Oksanen

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] prcomp(X,center=F) ??

Reply via email to