Hi,

if you compute the principal components (i.e., via eigendecomposition) from the covariance matrix, it shouldn't matter whether the data is centered or not, since the mean is subtracted when the covariance matrix is computed:

CovMat = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x}) (x_i - \bar{x})^T

where \bar{x} is the vector of feature means. So, if you center the data prior to computing the covariance matrix, \bar{x} is simply 0.

Best,
Sebastian

> On Oct 16, 2017, at 2:27 PM, Ismael Lemhadri <lemha...@stanford.edu> wrote:
>
> @Andreas Muller:
> My references do not assume centering, e.g.
> http://ufldl.stanford.edu/wiki/index.php/PCA
> any reference?
>
> On Mon, Oct 16, 2017 at 10:20 AM, <scikit-learn-requ...@python.org> wrote:
> Send scikit-learn mailing list submissions to
>         scikit-learn@python.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://mail.python.org/mailman/listinfo/scikit-learn
> or, via email, send a message with subject or body 'help' to
>         scikit-learn-requ...@python.org
>
> You can reach the person managing the list at
>         scikit-learn-ow...@python.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of scikit-learn digest..."
>
> Today's Topics:
>
>    1. Re: unclear help file for sklearn.decomposition.pca
>       (Andreas Mueller)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 16 Oct 2017 13:19:57 -0400
> From: Andreas Mueller <t3k...@gmail.com>
> To: scikit-learn@python.org
> Subject: Re: [scikit-learn] unclear help file for
>         sklearn.decomposition.pca
> Message-ID: <04fc445c-d8f3-a3a9-4ab2-0535826a2...@gmail.com>
> Content-Type: text/plain; charset="utf-8"; Format="flowed"
>
> The definition of PCA has a centering step, but no scaling step.
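
A quick NumPy check of the covariance argument, as a rough sketch only: the random data, the offset vector, and the helper name `cov_mle` are made up for illustration and are not part of scikit-learn. It uses the 1/n normalisation from the formula in this message and compares the covariance of raw and pre-centered data, plus the uncentered second-moment matrix for contrast.

```python
import numpy as np

# Toy data with a clearly nonzero mean (values are arbitrary).
rng = np.random.RandomState(0)
X = rng.rand(50, 3) + np.array([10.0, -5.0, 2.0])
Xc = X - X.mean(axis=0)  # the same data, centered beforehand


def cov_mle(A):
    """Covariance with 1/n normalisation; the mean is subtracted inside."""
    D = A - A.mean(axis=0)
    return D.T @ D / A.shape[0]


C_raw = cov_mle(X)
C_centered = cov_mle(Xc)

# Centering beforehand changes nothing, because \bar{x} is then just 0.
same_cov = np.allclose(C_raw, C_centered)

# Cross-check against NumPy's own estimator (bias=True means 1/n).
matches_numpy = np.allclose(C_raw, np.cov(X, rowvar=False, bias=True))

# Skipping the mean subtraction altogether gives a different matrix,
# which is why an SVD of the raw, uncentered X does not reproduce PCA.
second_moment = X.T @ X / X.shape[0]
differs = not np.allclose(second_moment, C_raw)

print(same_cov, matches_numpy, differs)
```

So the centering only matters when you go through the SVD of the data matrix directly rather than through the covariance matrix.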
>
> On 10/16/2017 11:16 AM, Ismael Lemhadri wrote:
> > Dear Roman,
> > My concern is actually not about not mentioning the scaling but about
> > not mentioning the centering.
> > That is, the sklearn PCA removes the mean but it does not mention it
> > in the help file.
> > This was quite messy for me to debug as I expected it to either: 1/
> > center and scale simultaneously or 2/ not scale and not center either.
> > It would be beneficial to make the behavior explicit in the help file,
> > in my opinion.
> > Ismael
> >
> > On Mon, Oct 16, 2017 at 8:02 AM, <scikit-learn-requ...@python.org> wrote:
> >
> > Today's Topics:
> >
> >    1. unclear help file for sklearn.decomposition.pca (Ismael Lemhadri)
> >    2. Re: unclear help file for sklearn.decomposition.pca (Roman Yurchak)
> >    3. Question about LDA's coef_ attribute (Serafeim Loukas)
> >    4. Re: Question about LDA's coef_ attribute (Alexandre Gramfort)
> >    5. Re: Question about LDA's coef_ attribute (Serafeim Loukas)
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Sun, 15 Oct 2017 18:42:56 -0700
> > From: Ismael Lemhadri <lemha...@stanford.edu>
> > To: scikit-learn@python.org
> > Subject: [scikit-learn] unclear help file for
> >         sklearn.decomposition.pca
> > Message-ID: <CANpSPFTgv+Oz7f97dandmrBBayqf_o9w=18okhcfn0u5dnz...@mail.gmail.com>
> > Content-Type: text/plain; charset="utf-8"
> >
> > Dear all,
> > The help file for the PCA class is unclear about the preprocessing
> > performed to the data.
> > You can check on line 410 here:
> > https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/decomposition/pca.py#L410
> > that the matrix is centered but NOT scaled, before performing the
> > singular value decomposition.
> > However, the help files do not make any mention of it.
> > This is unclear for someone who, like me, just wanted to compare that the
> > PCA and np.linalg.svd give the same results. In academic settings, students
> > are often asked to compare different methods and to check that they yield
> > the same results.
> > I expect that many students have confronted this problem before...
> > Best,
> > Ismael Lemhadri
> >
> > ------------------------------
> >
> > Message: 2
> > Date: Mon, 16 Oct 2017 15:16:45 +0200
> > From: Roman Yurchak <rth.yurc...@gmail.com>
> > To: Scikit-learn mailing list <scikit-learn@python.org>
> > Subject: Re: [scikit-learn] unclear help file for
> >         sklearn.decomposition.pca
> > Message-ID: <b2abdcfd-4736-929e-6304-b93832932...@gmail.com>
> > Content-Type: text/plain; charset=utf-8; format=flowed
> >
> > Ismael,
> >
> > as far as I saw the sklearn.decomposition.PCA doesn't mention scaling at
> > all (except for the whiten parameter which is post-transformation scaling).
> >
> > So since it doesn't mention it, it makes sense that it doesn't do any
> > scaling of the input. Same as np.linalg.svd.
> >
> > You can verify that PCA and np.linalg.svd yield the same results, with
> >
> > ```
> >  >>> import numpy as np
> >  >>> from sklearn.decomposition import PCA
> >  >>> import numpy.linalg
> >  >>> X = np.random.RandomState(42).rand(10, 4)
> >  >>> n_components = 2
> >  >>> PCA(n_components, svd_solver='full').fit_transform(X)
> > ```
> >
> > and
> >
> > ```
> >  >>> U, s, V = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
> >  >>> (X - X.mean(axis=0)).dot(V[:n_components].T)
> > ```
> >
> > --
> > Roman
> >
> > On 16/10/17 03:42, Ismael Lemhadri wrote:
> > > Dear all,
> > > The help file for the PCA class is unclear about the preprocessing
> > > performed to the data.
> > > You can check on line 410 here:
> > > https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/decomposition/pca.py#L410
> > > that the matrix is centered but NOT scaled, before performing the
> > > singular value decomposition.
> > > However, the help files do not make any mention of it.
> > > This is unclear for someone who, like me, just wanted to compare that
> > > the PCA and np.linalg.svd give the same results.
> > > In academic settings,
> > > students are often asked to compare different methods and to check that
> > > they yield the same results. I expect that many students have confronted
> > > this problem before...
> > > Best,
> > > Ismael Lemhadri
> > >
> > > _______________________________________________
> > > scikit-learn mailing list
> > > scikit-learn@python.org
> > > https://mail.python.org/mailman/listinfo/scikit-learn
> >
> > ------------------------------
> >
> > Message: 3
> > Date: Mon, 16 Oct 2017 15:27:48 +0200
> > From: Serafeim Loukas <seral...@gmail.com>
> > To: scikit-learn@python.org
> > Subject: [scikit-learn] Question about LDA's coef_ attribute
> > Message-ID: <58c6d0da-9de5-4ef5-97c1-48159831f...@gmail.com>
> > Content-Type: text/plain; charset="us-ascii"
> >
> > Dear Scikit-learn community,
> >
> > Since the documentation of the LDA
> > (http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html)
> > is not so clear, I would like to ask if the lda.coef_ attribute
> > stores the eigenvectors from the SVD decomposition.
> >
> > Thank you in advance,
> > Serafeim
> >
> > ------------------------------
> >
> > Message: 4
> > Date: Mon, 16 Oct 2017 16:57:52 +0200
> > From: Alexandre Gramfort <alexandre.gramf...@inria.fr>
> > To: Scikit-learn mailing list <scikit-learn@python.org>
> > Subject: Re: [scikit-learn] Question about LDA's coef_ attribute
> > Message-ID: <cadeotzricoqhuhjmmw2z14cqffeqyndyoxn-ogkavtmq7v0...@mail.gmail.com>
> > Content-Type: text/plain; charset="UTF-8"
> >
> > no it stores the direction of the decision function to match the API of
> > linear models.
> >
> > HTH
> > Alex
> >
> > On Mon, Oct 16, 2017 at 3:27 PM, Serafeim Loukas <seral...@gmail.com> wrote:
> > > Dear Scikit-learn community,
> > >
> > > Since the documentation of the LDA
> > > (http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html)
> > > is not so clear, I would like to ask if the lda.coef_ attribute stores the
> > > eigenvectors from the SVD decomposition.
> > >
> > > Thank you in advance,
> > > Serafeim
> >
> > ------------------------------
> >
> > Message: 5
> > Date: Mon, 16 Oct 2017 17:02:46 +0200
> > From: Serafeim Loukas <seral...@gmail.com>
> > To: Scikit-learn mailing list <scikit-learn@python.org>
> > Subject: Re: [scikit-learn] Question about LDA's coef_ attribute
> > Message-ID: <413210d2-56ae-41a4-873f-d171bb365...@gmail.com>
> > Content-Type: text/plain; charset="us-ascii"
> >
> > Dear Alex,
> >
> > Thank you for the prompt response.
> >
> > Are the eigenvectors stored in some variable?
> > Does the lda.scalings_ attribute contain the eigenvectors?
> >
> > Best,
> > Serafeim
> >
> > > On 16 Oct 2017, at 16:57, Alexandre Gramfort <alexandre.gramf...@inria.fr> wrote:
> > >
> > > no it stores the direction of the decision function to match the API of
> > > linear models.
> > >
> > > HTH
> > > Alex
> > >
> > > On Mon, Oct 16, 2017 at 3:27 PM, Serafeim Loukas <seral...@gmail.com> wrote:
> > >> Dear Scikit-learn community,
> > >>
> > >> Since the documentation of the LDA
> > >> (http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html)
> > >> is not so clear, I would like to ask if the lda.coef_ attribute stores the
> > >> eigenvectors from the SVD decomposition.
> > >>
> > >> Thank you in advance,
> > >> Serafeim
> >
> > ------------------------------
> >
> > End of scikit-learn Digest, Vol 19, Issue 25
> > ********************************************
>
> ------------------------------
>
> End of scikit-learn Digest, Vol 19, Issue 28
> ********************************************
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn