Hi Michael, as you point out, I also think the most straightforward (and
good-enough) approach is to fit all the polynomials (n = 0, ..., 3) using
OLS and evaluate their predictive capability by cross-validation. I will
then compare this to the lasso approach.
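
Something along these lines is what I have in mind (a rough, untested
sketch; it assumes x and y are 1-d numpy arrays and uses the old
sklearn.cross_validation module, which later releases rename to
sklearn.model_selection):

import numpy as np
from sklearn.cross_validation import KFold  # sklearn.model_selection later
from sklearn.linear_model import LinearRegression

def cv_mse(x, y, degree, n_folds=5):
    # Monomial basis [1, x, ..., x**degree]; the constant column plays the
    # role of the intercept, hence fit_intercept=False below.
    X = x[:, np.newaxis] ** np.arange(degree + 1)
    errors = []
    for train, test in KFold(len(y), n_folds=n_folds, shuffle=True,
                             random_state=0):
        model = LinearRegression(fit_intercept=False).fit(X[train], y[train])
        errors.append(np.mean((model.predict(X[test]) - y[test]) ** 2))
    return np.mean(errors)

best_n = min(range(4), key=lambda n: cv_mse(x, y, n))  # lowest CV MSE
a = np.polyfit(x, y, best_n)[::-1]  # a_0, a_1, ..., a_best_n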

Thanks for your comments!
-fernando



On Sun, Jun 29, 2014 at 11:37 PM, Michael Eickenberg <michael.eickenb...@gmail.com> wrote:

> The model selection properties of the lasso are such that it will select
> whichever number of variables suits it best. If you like, you can run a
> lasso, take the maximum degree among the active (nonzero) coefficients,
> and call that the polynomial order of your problem.
>
> I maintain that the setup you presented works just fine: if you gave it
> x[:, np.newaxis] ** np.arange(4) (appropriately normalized, e.g. by
> factorial(np.arange(4))) instead of Xpoly, it would find the coefficients
> you are looking for, because you would be giving it the polynomial basis
> you are trying to use for prediction.
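>
> Concretely, something like this (a rough sketch; I drop the constant term
> since the estimator's intercept handles it, and factorial here comes from
> scipy.misc, later moved to scipy.special):
>
> import numpy as np
> from scipy.misc import factorial  # scipy.special.factorial later
> from sklearn.linear_model import LassoCV
>
> degrees = np.arange(1, 4)
> X = x[:, np.newaxis] ** degrees / factorial(degrees)  # normalized monomials
> lasso = LassoCV(cv=15).fit(X, y)
> active = degrees[lasso.coef_ != 0]
> order = active.max() if active.size else 0  # highest active degree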
>
> However, your model selection setting seems to be more of a "select a
> polynomial space of order less than or equal to k and estimate k too",
> which looks to me like a hierarchical group lasso problem if you want it to
> be principled.
>
> Since you are only going up to degree 4, I think the best solution is to
> try all polynomial orders from 0 to 4 with a plain vanilla OLS fit, and
> validate predictive power on held-out data. Then select the order that
> maximizes it.
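>
> For instance (a quick sketch using a single held-out split;
> train_test_split lives in sklearn.cross_validation here, later in
> sklearn.model_selection):
>
> import numpy as np
> from sklearn.cross_validation import train_test_split
>
> x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.3,
>                                           random_state=0)
> mse = [np.mean((np.polyval(np.polyfit(x_tr, y_tr, n), x_te) - y_te) ** 2)
>        for n in range(5)]
> best_n = int(np.argmin(mse))  # order 0..4 with the best held-out MSE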
>
> Hope that helps!
> Michael
>
>
> On Monday, June 30, 2014, Fernando Paolo <fpa...@ucsd.edu> wrote:
>
>> Michael and Mathieu, thanks for your answers!
>>
>> Perhaps I should explain my problem better, so you can suggest how best
>> to approach it. I have several datasets of the form y = f(x), and I need
>> to fit a 'linear', 'quadratic', or 'cubic' polynomial to these data. So I
>> want to (i) *automatically* determine the order of the problem
>> (constrained to n = 1, 2 or 3), (ii) fit the respective polynomial of
>> order n, and (iii) retrieve the coefficients a_i of the fitted
>> polynomial, such that
>>
>> p(x) = a_0 + a_1 * x + a_2 * x^2 + a_3 * x^3
>>
>> with x being the input data.
>>
>> Note: if a simpler model explains the data "reasonably well" (i.e. not
>> necessarily yielding the best MSE), then it is always preferred. That is, a
>> line is preferred over a parabola, and so on. That's why I initially
>> thought of using LASSO. So, is it possible to retrieve the coefficients a_i
>> (above) from the LASSO model? If not, how can I achieve this using the
>> sklearn library?
>>
>> Of course, a "brute force" approach would be to first determine the order
>> of the problem using LASSO, and then fit the respective polynomial using
>> least-squares.
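>>
>> Something like this is what I mean (sketch; it assumes the design matrix
>> columns are the plain monomials x, x**2, x**3, so that coef_[i]
>> multiplies x**(i+1), and that at least one coefficient is active):
>>
>> # step 1: order = highest degree with a nonzero lasso coefficient
>> n = int(np.nonzero(lasso_fit.coef_)[0].max()) + 1
>> # step 2: refit that order by ordinary least-squares
>> a = np.polyfit(x, y, n)[::-1]  # a_0, ..., a_n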
>>
>> Thank you,
>> -fernando
>>
>>
>>
>>
>> On Sun, Jun 29, 2014 at 4:50 AM, Mathieu Blondel <math...@mblondel.org> wrote:
>>
>>> Hi Fernando,
>>>
>>>
>>> On Sun, Jun 29, 2014 at 1:53 PM, Fernando Paolo <fpa...@ucsd.edu> wrote:
>>>
>>>> Hello,
>>>>
>>>> I must be missing something obvious because I can't find the "actual"
>>>> coefficients of the polynomial fitted using LassoCV. That is, for a 3rd
>>>> degree polynomial
>>>>
>>>> p = a0 + a1 * x + a2 * x^2 + a3 * x^3
>>>>
>>>> I want the a0, a1, a2 and a3 coefficients (as those returned by
>>>> numpy.polyfit()). Here is example code showing what I'm after:
>>>>
>>>> import numpy as np
>>>> import matplotlib.pyplot as plt
>>>> from pandas import DataFrame
>>>> from math import pi
>>>> from patsy import dmatrix
>>>> from sklearn.linear_model import LassoCV
>>>>
>>>> sin_data = DataFrame({'x' : np.linspace(0, 1, 101)})
>>>> sin_data['y'] = np.sin(2 * pi * sin_data['x']) + np.random.normal(0, 0.1, 101)
>>>> x = sin_data['x']
>>>> y = sin_data['y']
>>>> Xpoly = dmatrix('C(x, Poly)')
>>>>
>>>
>>> The development version of scikit-learn contains a transformer to do
>>> exactly this:
>>>
>>> http://scikit-learn.org/dev/modules/generated/sklearn.preprocessing.PolynomialFeatures.html
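>>>
>>> A quick sketch of its intended use (assuming the dev API):
>>>
>>> import numpy as np
>>> from sklearn.preprocessing import PolynomialFeatures
>>>
>>> poly = PolynomialFeatures(degree=3, include_bias=False)
>>> X = poly.fit_transform(np.asarray(x)[:, np.newaxis])
>>> # columns of X are x, x**2, x**3; the intercept is left to the estimator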
>>>
>>>
>>>> n = 3
>>>> lasso_model = LassoCV(cv=15, copy_X=True, normalize=True)
>>>> lasso_fit = lasso_model.fit(Xpoly[:,1:n+1], y)
>>>>
>>>
>>> In scikit-learn, "fit" always returns the model itself, so here
>>> "lasso_model" and "lasso_fit" refer to the same object.
>>>
>>>> lasso_predict = lasso_model.predict(Xpoly[:, 1:n+1])
>>>>
>>>> a = np.r_[lasso_fit.intercept_, lasso_fit.coef_]
>>>>
>>>> b = np.polyfit(x, y, n)[::-1]
>>>>
>>>> p_lasso = a[0] + a[1] * x + a[2] * x**2 + a[3] * x**3
>>>> p_polyfit = b[0] + b[1] * x + b[2] * x**2 + b[3] * x**3
>>>>
>>>> print 'coef. lasso:', a
>>>> print 'coef. polyfit:', b
>>>>
>>>>
>>>> The returned coefficients 'a' and 'b' are completely different, and
>>>> while 'p_polyfit' is indeed the fitted polynomial of degree 3, 'p_lasso'
>>>> makes no sense (plot to see). Unless 'a' is something else... If so, what
>>>> exactly are the coefficients returned by fit()? And how can I get the
>>>> coefficients that reconstruct the fitted polynomial?
>>>>
>>>>
>>> Why are you expecting a and b to be the same? np.polyfit returns a
>>> least-squares fit, so the model is different from a lasso. You should
>>> use LinearRegression, or Ridge with light regularization, instead.
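>>>
>>> For example (a sketch; the alpha is an arbitrary small value, and X
>>> should be the monomial basis [x, x**2, x**3] rather than the Poly-coded
>>> Xpoly):
>>>
>>> import numpy as np
>>> from sklearn.linear_model import Ridge
>>>
>>> ridge = Ridge(alpha=1e-8).fit(X, y)  # light regularization
>>> a = np.r_[ridge.intercept_, ridge.coef_]  # ~ np.polyfit(x, y, 3)[::-1]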
>>>
>>> HTH,
>>> Mathieu
>>>
>>>
>>


-- 
Fernando Paolo
Institute of Geophysics & Planetary Physics
Scripps Institution of Oceanography
University of California, San Diego
9500 Gilman Drive
La Jolla, CA 92093-0225