hi,

I would be +1 on adding an invert_whitening param to PCA that would
default to False in 0.15, switch to True in 0.16, and eventually
disappear in a later release.

Alex


On Mon, Jun 30, 2014 at 8:53 AM, Michael Eickenberg
<michael.eickenb...@gmail.com> wrote:
> Kyle is facing the same question in his incremental PCA PR:
> https://github.com/scikit-learn/scikit-learn/pull/3285
>
>
> On Monday, June 30, 2014, Michael Eickenberg <michael.eickenb...@gmail.com>
> wrote:
>>
>> Hi Sean,
>>
>> this has been mentioned in a pull request,
>> https://github.com/scikit-learn/scikit-learn/pull/3107, along with the
>> changes necessary to invert the whitening properly (see the files
>> changed).
>>
>> While we are at it, Alex Gramfort asked me to ask whether anybody sees
>> good reasons *not* to invert the whitening properly in future versions. Any
>> counter-arguments?
>>
>> Michael
>>
>>
>>
>> On Monday, June 30, 2014, Sean Violante <sean.viola...@gmail.com> wrote:
>>>
>>> Hi
>>>
>>> Why don't PCA and ProbabilisticPCA compute the inverse transform
>>> properly when whitening is enabled? AFAIK all that is required is to
>>> additionally multiply by (the square root of) the explained variance?
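>>>
>>> A minimal sketch of what such an inverse could look like, assuming a
>>> fitted PCA that exposes explained_variance_, components_ and mean_
>>> (the exact scaling convention may differ between versions):
>>>
>>> import numpy as np
>>>
>>> def inverse_transform_whitened(pca, X_whitened):
>>>     # undo the whitening by rescaling with sqrt(explained_variance_),
>>>     # then map back to the original feature space and re-add the mean
>>>     X_rescaled = X_whitened * np.sqrt(pca.explained_variance_)
>>>     return np.dot(X_rescaled, pca.components_) + pca.mean_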
>>>
>>> sean
>>>
>>>
>>> On Mon, Jun 30, 2014 at 5:28 AM,
>>> <scikit-learn-general-requ...@lists.sourceforge.net> wrote:
>>>>
>>>>
>>>> Today's Topics:
>>>>
>>>>    1. Difference between sklearn.feature_selection.chi2 and
>>>>       scipy.stats.chi2_contingency (Christian Jauvin)
>>>>    2. Re: Retrieve the coefficients of fitted polynomial using
>>>>       LASSO (Fernando Paolo)
>>>>    3. Retrieve the coefficients of fitted polynomial using LASSO
>>>>       (Fernando Paolo)
>>>>
>>>>
>>>> ----------------------------------------------------------------------
>>>>
>>>> Message: 1
>>>> Date: Sun, 29 Jun 2014 18:28:07 -0400
>>>> From: Christian Jauvin <cjau...@gmail.com>
>>>> Subject: [Scikit-learn-general] Difference between
>>>>         sklearn.feature_selection.chi2 and scipy.stats.chi2_contingency
>>>> To: "scikit-learn mailing list (sklearn)"
>>>>         <scikit-learn-general@lists.sourceforge.net>
>>>> Message-ID:
>>>>
>>>> <cajse4tqlrjvcxrof2qrqfezt6zae6b-eh_tutp8yrhqikmr...@mail.gmail.com>
>>>> Content-Type: text/plain; charset="utf-8"
>>>>
>>>> Hi,
>>>>
>>>> Suppose I wanted to test the independence of two boolean variables using
>>>> Chi-Square:
>>>>
>>>> >>> X = numpy.vstack(([[0,0]] * 18, [[0,1]] * 7, [[1,0]] * 42, [[1,1]] * 33))
>>>> >>> X.shape
>>>> (100, 2)
>>>>
>>>> I'd like to understand the difference between doing:
>>>>
>>>> >>> sklearn.feature_selection.chi2(X[:,[0]], X[:,1])
>>>> (array([ 0.5]), array([ 0.47950012]))
>>>>
>>>> and doing:
>>>>
>>>> >>> pandas.crosstab(X[:,0], X[:,1])
>>>> col_0   0   1
>>>> row_0
>>>> 0      18   7
>>>> 1      42  33
>>>> >>> scipy.stats.chi2_contingency(pandas.crosstab(X[:,0], X[:,1]), correction=False)
>>>> (2.0, 0.15729920705028505, 1, array([[ 15.,  10.],
>>>>        [ 45.,  30.]]))
>>>>
>>>> What explains the difference in terms of the Chi-Square value (0.5 vs 2)
>>>> and the P-value (0.48 vs 0.157)?
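>>>>
>>>> For what it's worth, the numbers are consistent with sklearn's chi2
>>>> summing the feature values per class (so the zeros, i.e. the "absence"
>>>> counts, never enter the statistic), while chi2_contingency tests the
>>>> full 2x2 presence/absence table. A minimal sketch reproducing sklearn's
>>>> result under that assumption:
>>>>
>>>> import numpy as np
>>>> from scipy.stats import chi2 as chi2_dist
>>>>
>>>> observed = np.array([42., 33.])         # sum of X[:,0] where X[:,1] is 0, 1
>>>> class_prob = np.array([0.60, 0.40])     # 60 and 40 samples per class
>>>> expected = observed.sum() * class_prob  # array([ 45.,  30.])
>>>> stat = ((observed - expected) ** 2 / expected).sum()  # 0.5
>>>> pval = chi2_dist.sf(stat, df=observed.size - 1)       # ~0.4795
>>>>
>>>> chi2_contingency additionally counts the 18/7 "absence" row, hence the
>>>> 2x2 expected table [[15, 10], [45, 30]] and the larger statistic of 2.0.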
>>>>
>>>> Thanks,
>>>>
>>>> Christian
>>>>
>>>> ------------------------------
>>>>
>>>> Message: 2
>>>> Date: Sun, 29 Jun 2014 18:52:37 -0700
>>>> From: Fernando Paolo <fpa...@ucsd.edu>
>>>> Subject: Re: [Scikit-learn-general] Retrieve the coefficients of
>>>>         fitted polynomial using LASSO
>>>> To: "scikit-learn-general@lists.sourceforge.net"
>>>>         <scikit-learn-general@lists.sourceforge.net>,
>>>> math...@mblondel.org
>>>> Message-ID:
>>>>
>>>> <CAPBk00E+nTDYKUm0T0bWQX=vvebg34tvysl1y8zltxf5mmw...@mail.gmail.com>
>>>> Content-Type: text/plain; charset="utf-8"
>>>>
>>>> Michael and Mathieu, thanks for your answers!
>>>>
>>>> Perhaps I should explain my problem better, so you may have a better
>>>> suggestion on how to approach it. I have several datasets of the form
>>>> y = f(x), and I need to fit a 'linear', 'quadratic' or 'cubic'
>>>> polynomial to these data. So I want to (i) *automatically* determine
>>>> the degree of the polynomial (constrained to n = 1, 2 or 3), (ii) fit
>>>> the respective polynomial of order n, and (iii) retrieve the
>>>> coefficients a_i of the fitted polynomial, such that
>>>>
>>>> p(x) = a_0 + a_1 * x + a_2 * x^2 + a_3 * x^3
>>>>
>>>> with x being the input data.
>>>>
>>>> Note: if a simpler model explains the data "reasonably well" (i.e. not
>>>> necessarily giving the best MSE), then it is always preferred. That is,
>>>> a line is preferred over a parabola, and so on. That's why I initially
>>>> thought of using LASSO. So, is it possible to retrieve the coefficients
>>>> a_i (above) from the LASSO model? If not, how can I achieve this using
>>>> the sklearn library?
>>>>
>>>> Of course, a "brute force" approach would be to first determine the
>>>> degree of the polynomial using LASSO, and then fit the respective
>>>> polynomial using least-squares.
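>>>>
>>>> A minimal sketch of that brute-force idea (illustrative only: it uses
>>>> the features x, x^2, x^3 and takes the highest power with a nonzero
>>>> lasso coefficient as the selected degree):
>>>>
>>>> import numpy as np
>>>> from sklearn.linear_model import LassoCV
>>>>
>>>> x = np.linspace(0, 1, 101)
>>>> y = np.sin(2 * np.pi * x) + np.random.normal(0, 0.1, 101)
>>>>
>>>> # design matrix with columns x, x^2, x^3 (no constant column)
>>>> X = np.vander(x, 4, increasing=True)[:, 1:]
>>>> lasso = LassoCV(cv=15).fit(X, y)
>>>>
>>>> # highest power whose coefficient survived the l1 penalty
>>>> n = np.nonzero(lasso.coef_)[0].max() + 1
>>>>
>>>> # refit an ordinary least-squares polynomial of that degree
>>>> b = np.polyfit(x, y, n)[::-1]  # a_0, a_1, ..., a_n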
>>>>
>>>> Thank you,
>>>> -fernando
>>>>
>>>>
>>>>
>>>>
>>>> On Sun, Jun 29, 2014 at 4:50 AM, Mathieu Blondel <math...@mblondel.org>
>>>> wrote:
>>>>
>>>> > Hi Fernando,
>>>> >
>>>> >
>>>> > On Sun, Jun 29, 2014 at 1:53 PM, Fernando Paolo <fpa...@ucsd.edu>
>>>> > wrote:
>>>> >
>>>> >> Hello,
>>>> >>
>>>> >> I must be missing something obvious because I can't find the "actual"
>>>> >> coefficients of the polynomial fitted using LassoCV. That is, for a
>>>> >> 3rd degree polynomial
>>>> >>
>>>> >> p = a0 + a1 * x + a2 * x^2 + a3 * x^3
>>>> >>
>>>> >> I want the a0, a1, a2 and a3 coefficients (like those returned by
>>>> >> numpy.polyfit()). Here is example code showing what I'm after:
>>>> >>
>>>> >> import numpy as np
>>>> >> import matplotlib.pyplot as plt
>>>> >> from pandas import *
>>>> >> from math import *
>>>> >> from patsy import dmatrix
>>>> >> from sklearn.linear_model import LassoCV
>>>> >>
>>>> >> sin_data = DataFrame({'x': np.linspace(0, 1, 101)})
>>>> >> sin_data['y'] = np.sin(2 * pi * sin_data['x']) + np.random.normal(0, 0.1, 101)
>>>> >> x = sin_data['x']
>>>> >> y = sin_data['y']
>>>> >> Xpoly = dmatrix('C(x, Poly)')
>>>> >>
>>>> >
>>>> > The development version of scikit-learn contains a transformer to do
>>>> > exactly this:
>>>> >
>>>> >
>>>> > http://scikit-learn.org/dev/modules/generated/sklearn.preprocessing.PolynomialFeatures.html
>>>> >
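>>>> > For instance, a sketch of using it in place of the patsy design
>>>> > matrix (illustrative; include_bias=False leaves the constant term to
>>>> > the estimator's intercept, and the columns are the raw powers x, x^2,
>>>> > x^3 rather than orthogonal polynomial contrasts):
>>>> >
>>>> > import numpy as np
>>>> > from sklearn.preprocessing import PolynomialFeatures
>>>> > from sklearn.linear_model import LassoCV
>>>> >
>>>> > x = np.linspace(0, 1, 101)
>>>> > y = np.sin(2 * np.pi * x) + np.random.normal(0, 0.1, 101)
>>>> >
>>>> > poly = PolynomialFeatures(degree=3, include_bias=False)
>>>> > Xpoly = poly.fit_transform(x[:, None])
>>>> > model = LassoCV(cv=15).fit(Xpoly, y)
>>>> >
>>>> > # a_0 is the intercept; a_1..a_3 are the (shrunken) lasso coefficients
>>>> > a = np.r_[model.intercept_, model.coef_]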
>>>> >
>>>> >> n = 3
>>>> >> lasso_model = LassoCV(cv=15, copy_X=True, normalize=True)
>>>> >> lasso_fit = lasso_model.fit(Xpoly[:,1:n+1], y)
>>>> >>
>>>> >
>>>> > In scikit-learn, "fit" always returns the model itself, so here
>>>> > "lasso_model" and "lasso_fit" refer to the same object.
>>>> >
>>>> >> lasso_predict = lasso_model.predict(Xpoly[:,1:n+1])
>>>> >>
>>>> >> a = np.r_[lasso_fit.intercept_, lasso_fit.coef_]
>>>> >>
>>>> >> b = np.polyfit(x, y, n)[::-1]
>>>> >>
>>>> >> p_lasso = a[0] + a[1] * x + a[2] * x**2 + a[3] * x**3
>>>> >> p_polyfit = b[0] + b[1] * x + b[2] * x**2 + b[3] * x**3
>>>> >>
>>>> >> print 'coef. lasso:', a
>>>> >> print 'coef. polyfit:', b
>>>> >>
>>>> >>
>>>> >> The returned coefficients 'a' and 'b' are completely different, and
>>>> >> while 'p_polyfit' is indeed the fitted polynomial of degree 3,
>>>> >> 'p_lasso' makes no sense (plot to see). Unless 'a' is something
>>>> >> else... If so, what actually are the coefficients returned by fit()?
>>>> >> And how can I get the coefficients that reconstruct the fitted
>>>> >> polynomial?
>>>> >>
>>>> >>
>>>> > Why are you expecting a and b to be the same? np.polyfit returns a
>>>> > least-squares fit so the model is different from a lasso.
>>>> > You should use LinearRegression or Ridge with light regularization
>>>> > instead.
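>>>> >
>>>> > For instance (illustrative, reusing the PolynomialFeatures setup
>>>> > sketched above; a very small alpha makes Ridge nearly identical to
>>>> > the ordinary least-squares fit from np.polyfit):
>>>> >
>>>> > from sklearn.linear_model import Ridge
>>>> >
>>>> > ridge = Ridge(alpha=1e-8).fit(Xpoly, y)
>>>> > a = np.r_[ridge.intercept_, ridge.coef_]  # ~ np.polyfit(x, y, 3)[::-1]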
>>>> >
>>>> > HTH,
>>>> > Mathieu
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>>
>>>>
>>>> --
>>>> Fernando Paolo
>>>> Institute of Geophysics & Planetary Physics
>>>> Scripps Institution of Oceanography
>>>> University of California, San Diego
>>>> 9500 Gilman Drive
>>>> La Jolla, CA 92093-0225
>>>>
>>>> ------------------------------
>>>>
>>>> Message: 3
>>>> Date: Sun, 29 Jun 2014 20:28:08 -0700
>>>> From: Fernando Paolo <fpa...@ucsd.edu>
>>>> Subject: [Scikit-learn-general] Retrieve the coefficients of fitted
>>>>         polynomial using LASSO
>>>> To: "scikit-learn-general@lists.sourceforge.net"
>>>>         <scikit-learn-general@lists.sourceforge.net>,   Mathieu Blondel
>>>>         <math...@mblondel.org>
>>>> Message-ID:
>>>>
>>>> <capbk00fdv1-naybza894ldbdmxvwkovhlqjygd5kkzbtrdk...@mail.gmail.com>
>>>> Content-Type: text/plain; charset="utf-8"
>>>>
>>>> Note 2: In summary, I want the coefficients a_i without having to
>>>> pre-define either the degree of the polynomial to fit (n) or the
>>>> amount of regularization to apply (alpha), always preferring the
>>>> simpler model (fewer coefficients).
>>>>
>>>> -fernando
>>>>
>>>>
>>>> --
>>>> Fernando Paolo
>>>> Institute of Geophysics & Planetary Physics
>>>> Scripps Institution of Oceanography
>>>> University of California, San Diego
>>>>
>>>> web: fspaolo.net
>>>>
>>>> ------------------------------
>>>>
>>>>
>>>> End of Scikit-learn-general Digest, Vol 53, Issue 51
>>>> ****************************************************
>>>
>>>
>
>
