On 10/16/2017 02:27 PM, Ismael Lemhadri wrote:
> @Andreas Muller:
> My references do not assume centering, e.g.
> http://ufldl.stanford.edu/wiki/index.php/PCA
> any reference?
It kinda does, but is not very clear about it:

"This data has already been pre-processed so that each of the features
x_1 and x_2 have about the same mean (zero) and variance."
Wikipedia is much clearer:

"Consider a data matrix
<https://en.wikipedia.org/wiki/Matrix_%28mathematics%29>, X, with
column-wise zero empirical mean
<https://en.wikipedia.org/wiki/Empirical_mean> (the sample mean of each
column has been shifted to zero), where each of the n rows represents a
different repetition of the experiment, and each of the p columns gives
a particular kind of feature (say, the results from a particular sensor)."
https://en.wikipedia.org/wiki/Principal_component_analysis#Details
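For concreteness, the column-wise centering Wikipedia describes is just subtracting each column's sample mean; a quick sketch with NumPy (the toy matrix is made up):

```python
import numpy as np

# Toy n-by-p data matrix: 5 repetitions of an experiment, 3 features.
X = np.array([[ 2.0, 0.0, 1.0],
              [ 4.0, 1.0, 3.0],
              [ 6.0, 1.0, 5.0],
              [ 8.0, 2.0, 7.0],
              [10.0, 2.0, 9.0]])

# Column-wise centering: shift the sample mean of each column to zero.
Xc = X - X.mean(axis=0)
print(Xc.mean(axis=0))  # each column mean is now (numerically) zero
```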
I'm a bit surprised to find that ESL says "The SVD of the centered
matrix X is another way of expressing the principal components of the
variables in X" -- so do they assume scaling? They don't really have a
great treatment of PCA, though.
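The ESL statement is easy to check numerically: the right singular vectors of the centered (but unscaled) matrix agree, up to sign, with the eigenvectors of the sample covariance matrix. A sketch on arbitrary random data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))

# Center only -- no scaling of the columns.
Xc = X - X.mean(axis=0)

# Principal directions via SVD of the centered matrix.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

# Principal directions via eigendecomposition of the sample covariance.
cov = Xc.T @ Xc / (len(X) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)
eigvecs = eigvecs[:, ::-1]  # reorder by decreasing eigenvalue

# The two sets of directions agree up to the sign of each vector.
agree = np.allclose(np.abs(Vt), np.abs(eigvecs.T))
print(agree)  # True
```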
Bishop <http://www.springer.com/us/book/9780387310732> and Murphy
<https://mitpress.mit.edu/books/machine-learning-0> are pretty clear
that they subtract the mean (or assume zero mean) but don't standardize.
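That also matches what scikit-learn's PCA does in practice: it subtracts the per-feature mean (exposed as `mean_`) but does not standardize the variances. A quick check on arbitrary data with very different feature scales:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Two features on wildly different scales.
X = rng.normal(size=(50, 2)) * np.array([1.0, 100.0])

pca = PCA(n_components=2).fit(X)

# PCA stores the per-feature mean it subtracted...
print(np.allclose(pca.mean_, X.mean(axis=0)))  # True

# ...and centering alone reproduces the fitted scores: no standardization.
scores = (X - pca.mean_) @ pca.components_.T
print(np.allclose(scores, pca.transform(X)))  # True
```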
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn