On 10/16/2017 02:27 PM, Ismael Lemhadri wrote:
@Andreas Muller:
My references do not assume centering, e.g. http://ufldl.stanford.edu/wiki/index.php/PCA
any reference?

It kinda does but is not very clear about it:

This data has already been pre-processed so that each of the features x_1 and x_2 have about the same mean (zero) and variance.



Wikipedia is much clearer:
Consider a data matrix <https://en.wikipedia.org/wiki/Matrix_%28mathematics%29>, *X*, with column-wise zero empirical mean <https://en.wikipedia.org/wiki/Empirical_mean> (the sample mean of each column has been shifted to zero), where each of the *n* rows represents a different repetition of the experiment, and each of the *p* columns gives a particular kind of feature (say, the results from a particular sensor).
https://en.wikipedia.org/wiki/Principal_component_analysis#Details

I'm a bit surprised to find that ESL says "The SVD of the centered matrix X is another way of expressing the principal components of the variables in X", so they assume scaling? They don't really have a great treatment of PCA, though.

Bishop <http://www.springer.com/us/book/9780387310732> and Murphy <https://mitpress.mit.edu/books/machine-learning-0> are pretty clear that they subtract the mean (or assume zero mean) but don't standardize.
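For what it's worth, the ESL statement (centered SVD = principal components) is easy to check numerically. Here is a minimal NumPy sketch, with made-up toy data, showing that the right singular vectors of the centered (but not standardized) matrix agree, up to sign, with the eigenvectors of the sample covariance matrix:

```python
import numpy as np

# Toy data: 5 samples, 2 correlated features (illustrative values only)
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])

# Center the columns (zero empirical mean), but do NOT standardize
Xc = X - X.mean(axis=0)

# Principal directions via SVD of the centered matrix
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Same directions via eigendecomposition of the sample covariance matrix
w, V = np.linalg.eigh(np.cov(Xc, rowvar=False))

# Leading singular vector vs. leading eigenvector: equal up to sign
v_svd = Vt[0]
v_eig = V[:, np.argmax(w)]
print(np.allclose(np.abs(v_svd), np.abs(v_eig)))
```

If you additionally divide each column by its standard deviation before the SVD, you get the (generally different) components of the correlation matrix instead, which is exactly the centering-vs-standardizing distinction being discussed here.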
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
