Hi,
if you compute the principal components (i.e., via eigendecomposition) from the
covariance matrix, it shouldn't matter whether the data is centered or not,
since the covariance matrix is computed as
CovMat = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x}) (x_i - \bar{x})^T
where \bar{x} = vector of feature means.
So, if you center the data prior to computing the covariance matrix, \bar{x} is
simply 0, and you end up with the same covariance matrix either way.
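
As a quick sanity check, here is a minimal NumPy sketch of the point above. The
shifted random data and the variable names are made up for illustration;
np.cov with bias=True uses the 1/n normalization from the formula. It also
shows that the uncentered second-moment matrix X^T X / n is a different
matrix, which is presumably why an SVD of the raw data does not match PCA.

```python
import numpy as np

rng = np.random.RandomState(42)
# Hypothetical data, shifted so that the feature means are clearly nonzero.
X = rng.rand(10, 4) + 5.0
X_centered = X - X.mean(axis=0)

# np.cov subtracts the feature means internally
# (rowvar=False: columns are features, bias=True: divide by n).
cov_raw = np.cov(X, rowvar=False, bias=True)
cov_centered = np.cov(X_centered, rowvar=False, bias=True)
same_cov = np.allclose(cov_raw, cov_centered)

# Skipping the mean subtraction entirely gives the second-moment matrix,
# which differs from the covariance matrix when the means are nonzero.
second_moment = X.T @ X / X.shape[0]
differs = not np.allclose(second_moment, cov_raw)

print(same_cov, differs)
```
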
Best,
Sebastian
> On Oct 16, 2017, at 2:27 PM, Ismael Lemhadri <[email protected]> wrote:
>
> @Andreas Muller:
> My references do not assume centering, e.g.
> http://ufldl.stanford.edu/wiki/index.php/PCA
> Do you have a reference?
>
>
>
> On Mon, Oct 16, 2017 at 10:20 AM, <[email protected]> wrote:
> Send scikit-learn mailing list submissions to
> [email protected]
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://mail.python.org/mailman/listinfo/scikit-learn
> or, via email, send a message with subject or body 'help' to
> [email protected]
>
> You can reach the person managing the list at
> [email protected]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of scikit-learn digest..."
>
>
> Today's Topics:
>
> 1. Re: unclear help file for sklearn.decomposition.pca
> (Andreas Mueller)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 16 Oct 2017 13:19:57 -0400
> From: Andreas Mueller <[email protected]>
> To: [email protected]
> Subject: Re: [scikit-learn] unclear help file for
> sklearn.decomposition.pca
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset="utf-8"; Format="flowed"
>
> The definition of PCA has a centering step, but no scaling step.
>
> On 10/16/2017 11:16 AM, Ismael Lemhadri wrote:
> > Dear Roman,
> > My concern is actually not about not mentioning the scaling but about
> > not mentioning the centering.
> > That is, the sklearn PCA removes the mean but it does not mention it
> > in the help file.
> > This was quite messy for me to debug, as I expected it to either 1/
> > center and scale simultaneously or 2/ neither scale nor center.
> > In my opinion, it would be beneficial to make this behavior explicit
> > in the help file.
> > Ismael
> >
> > On Mon, Oct 16, 2017 at 8:02 AM, <[email protected]> wrote:
> >
> >
> > Today's Topics:
> >
> > 1. unclear help file for sklearn.decomposition.pca (Ismael
> > Lemhadri)
> > 2. Re: unclear help file for sklearn.decomposition.pca
> > (Roman Yurchak)
> > 3. Question about LDA's coef_ attribute (Serafeim Loukas)
> > 4. Re: Question about LDA's coef_ attribute (Alexandre Gramfort)
> > 5. Re: Question about LDA's coef_ attribute (Serafeim Loukas)
> >
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Sun, 15 Oct 2017 18:42:56 -0700
> > From: Ismael Lemhadri <[email protected]>
> > To: [email protected]
> > Subject: [scikit-learn] unclear help file for
> > sklearn.decomposition.pca
> > Message-ID:
> > <CANpSPFTgv+Oz7f97dandmrBBayqf_o9w=18okhcfn0u5dnz...@mail.gmail.com>
> > Content-Type: text/plain; charset="utf-8"
> >
> > Dear all,
> > The help file for the PCA class is unclear about the preprocessing
> > performed on the data.
> > You can check on line 410 here:
> > https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/decomposition/pca.py#L410
> > that the matrix is centered but NOT scaled before performing the
> > singular value decomposition.
> > However, the help files do not make any mention of it.
> > This is unclear for someone who, like me, just wanted to check that
> > PCA and np.linalg.svd give the same results. In academic settings,
> > students are often asked to compare different methods and to check that
> > they yield the same results. I expect that many students have run into
> > this problem before...
> > Best,
> > Ismael Lemhadri
> >
> > ------------------------------
> >
> > Message: 2
> > Date: Mon, 16 Oct 2017 15:16:45 +0200
> > From: Roman Yurchak <[email protected]>
> > To: Scikit-learn mailing list <[email protected]>
> > Subject: Re: [scikit-learn] unclear help file for
> > sklearn.decomposition.pca
> > Message-ID: <[email protected]>
> > Content-Type: text/plain; charset=utf-8; format=flowed
> >
> > Ismael,
> >
> > as far as I saw the sklearn.decomposition.PCA doesn't mention
> > scaling at
> > all (except for the whiten parameter which is post-transformation
> > scaling).
> >
> > So since it doesn't mention it, it makes sense that it doesn't do any
> > scaling of the input. Same as np.linalg.svd.
> >
> > You can verify that PCA and np.linalg.svd yield the same results, with
> >
> > ```
> > >>> import numpy as np
> > >>> from sklearn.decomposition import PCA
> > >>> import numpy.linalg
> > >>> X = np.random.RandomState(42).rand(10, 4)
> > >>> n_components = 2
> > >>> PCA(n_components, svd_solver='full').fit_transform(X)
> > ```
> >
> > and
> >
> > ```
> > >>> U, s, V = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
> > >>> (X - X.mean(axis=0)).dot(V[:n_components].T)
> > ```
> >
> > --
> > Roman
> >
> >
> >
> >
> > ------------------------------
> >
> > Message: 3
> > Date: Mon, 16 Oct 2017 15:27:48 +0200
> > From: Serafeim Loukas <[email protected]>
> > To: [email protected]
> > Subject: [scikit-learn] Question about LDA's coef_ attribute
> > Message-ID: <[email protected]>
> > Content-Type: text/plain; charset="us-ascii"
> >
> > Dear Scikit-learn community,
> >
> > Since the documentation of the LDA
> > (http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html)
> > is not so clear, I would like to ask if the lda.coef_ attribute
> > stores the eigenvectors from the SVD decomposition.
> >
> > Thank you in advance,
> > Serafeim
> >
> > ------------------------------
> >
> > Message: 4
> > Date: Mon, 16 Oct 2017 16:57:52 +0200
> > From: Alexandre Gramfort <[email protected]>
> > To: Scikit-learn mailing list <[email protected]>
> > Subject: Re: [scikit-learn] Question about LDA's coef_ attribute
> > Message-ID:
> > <cadeotzricoqhuhjmmw2z14cqffeqyndyoxn-ogkavtmq7v0...@mail.gmail.com>
> > Content-Type: text/plain; charset="UTF-8"
> >
> > No, it stores the direction of the decision function, to match the
> > API of linear models.
> >
> > HTH
> > Alex
> >
> >
> >
> > ------------------------------
> >
> > Message: 5
> > Date: Mon, 16 Oct 2017 17:02:46 +0200
> > From: Serafeim Loukas <[email protected]>
> > To: Scikit-learn mailing list <[email protected]>
> > Subject: Re: [scikit-learn] Question about LDA's coef_ attribute
> > Message-ID: <[email protected]>
> > Content-Type: text/plain; charset="us-ascii"
> >
> > Dear Alex,
> >
> > Thank you for the prompt response.
> >
> > Are the eigenvectors stored in some variable?
> > Does the lda.scalings_ attribute contain the eigenvectors?
> >
> > Best,
> > Serafeim
> >
> >
> > ------------------------------
> >
> > Subject: Digest Footer
> >
> > _______________________________________________
> > scikit-learn mailing list
> > [email protected]
> > https://mail.python.org/mailman/listinfo/scikit-learn
> >
> >
> > ------------------------------
> >
> > End of scikit-learn Digest, Vol 19, Issue 25
> > ********************************************
> >
> >
> >
> >
>
> ------------------------------
>
> End of scikit-learn Digest, Vol 19, Issue 28
> ********************************************
>
_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn