On 10/16/2017 02:27 PM, Ismael Lemhadri wrote:
> @Andreas Muller:
> My references do not assume centering, e.g.
> http://ufldl.stanford.edu/wiki/index.php/PCA
> any reference?
It kinda does, but is not very clear about it:

"This data has already been pre-processed so that each of the features
x_1 and x_2 have about the same mean (zero) and variance."
Wikipedia is much clearer:

"Consider a data matrix
<https://en.wikipedia.org/wiki/Matrix_%28mathematics%29>, X, with
column-wise zero empirical mean
<https://en.wikipedia.org/wiki/Empirical_mean> (the sample mean of each
column has been shifted to zero), where each of the n rows represents a
different repetition of the experiment, and each of the p columns gives
a particular kind of feature (say, the results from a particular sensor)."
https://en.wikipedia.org/wiki/Principal_component_analysis#Details
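For concreteness, the column-wise centering Wikipedia describes is just subtracting each column's sample mean; a quick sketch with NumPy (the toy matrix is made up):

```python
import numpy as np

# Toy n-by-p data matrix: 5 repetitions of an experiment, 3 features.
X = np.array([[ 2.0, 0.0, 1.0],
              [ 4.0, 1.0, 3.0],
              [ 6.0, 1.0, 5.0],
              [ 8.0, 2.0, 7.0],
              [10.0, 2.0, 9.0]])

# Column-wise centering: shift the sample mean of each column to zero.
Xc = X - X.mean(axis=0)
print(Xc.mean(axis=0))  # each column mean is now (numerically) zero
```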
I'm a bit surprised to find that ESL says "The SVD of the centered
matrix X is another way of expressing the principal components of the
variables in X" -- so do they assume scaling? They don't really have a
great treatment of PCA, though.
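The ESL statement is easy to check numerically: the right singular vectors of the centered (but unscaled) matrix agree, up to sign, with the eigenvectors of the sample covariance matrix. A sketch on arbitrary random data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))

# Center only -- no scaling of the columns.
Xc = X - X.mean(axis=0)

# Principal directions via SVD of the centered matrix.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

# Principal directions via eigendecomposition of the sample covariance.
cov = Xc.T @ Xc / (len(X) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)
eigvecs = eigvecs[:, ::-1]  # reorder by decreasing eigenvalue

# The two sets of directions agree up to the sign of each vector.
agree = np.allclose(np.abs(Vt), np.abs(eigvecs.T))
print(agree)  # True
```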
Bishop <http://www.springer.com/us/book/9780387310732> and Murphy
<https://mitpress.mit.edu/books/machine-learning-0> are pretty clear
that they subtract the mean (or assume zero mean) but don't standardize.
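That also matches what scikit-learn's PCA does in practice: it subtracts the per-feature mean (exposed as `mean_`) but does not standardize the variances. A quick check on arbitrary data with very different feature scales:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Two features on wildly different scales.
X = rng.normal(size=(50, 2)) * np.array([1.0, 100.0])

pca = PCA(n_components=2).fit(X)

# PCA stores the per-feature mean it subtracted...
print(np.allclose(pca.mean_, X.mean(axis=0)))  # True

# ...and centering alone reproduces the fitted scores: no standardization.
scores = (X - pca.mean_) @ pca.components_.T
print(np.allclose(scores, pca.transform(X)))  # True
```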
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn