Michael and Mathieu, thanks for your answers!

Perhaps I should explain better my problem, so you may have a better
suggestion on how to approach it. I have several datasets of the form f =
y(x), and I need to fit to these data a 'linear', 'quadratic' or 'cubic'
polynomial. So I want to (i) *automatically* determine the rank of the
problem (constrained to n = 1, 2 or 3), (ii) fit the respective polynomial
of order n, and (iii) retrieve the coefficients a_i of the fitted
polynomial, such that

p(x) = a_0 + a_1 * x + a_2 * x^2 + a_3 * x^3

with x being the input data.

Note: If a simpler model explains the data "reasonably well" (i.e. not
necessarily presenting the best MSE), then is always preferred. That is, a
line is preferred over a parabola, and so on. That's why I initially
thought of using LASSO. So, is it possible to retrieve the coefficients a_i
(above) from the LASSO model? If not, how can I achieve this using the
sklearn library?

Of course a "brute force" approach would be to first determine the rank of
the problem using LASSO, and then fit the respective polynomial using
least-squares.

Thank you,
-fernando




On Sun, Jun 29, 2014 at 4:50 AM, Mathieu Blondel <math...@mblondel.org>
wrote:

> Hi Fernando,
>
>
> On Sun, Jun 29, 2014 at 1:53 PM, Fernando Paolo <fpa...@ucsd.edu> wrote:
>
>> Hello,
>>
>> I must be missing something obvious because I can't find the "actual"
>> coefficients of the polynomial fitted using LassoCV. That is, for a 3rd
>> degree polynomial
>>
>> p = a0 + a1 * x + a2 * x^2 + a3 * x^3
>>
>>  I want the a0, a1, a2 and a3 coefficients (as those returned by
>> numpy.polyfit()). Here is an example code of what I'm after
>>
>> import numpy as np
>> import matplotlib.pyplot as plt
>> from pandas import *
>> from math import *
>> from patsy import dmatrix
>> from sklearn.linear_model import LassoCV
>>
>> sin_data = DataFrame({'x' : np.linspace(0, 1, 101)})
>> sin_data['y'] = np.sin(2 * pi * sin_data['x']) + np.random.normal(0, 0.1,
>> 101)
>> x = sin_data['x']
>> y = sin_data['y']
>> Xpoly = dmatrix('C(x, Poly)')
>>
>
> The development version of scikit-learn contains a transformer to do
> exactly this:
>
> http://scikit-learn.org/dev/modules/generated/sklearn.preprocessing.PolynomialFeatures.html
>
>
>> n = 3
>> lasso_model = LassoCV(cv=15, copy_X=True, normalize=True)
>> lasso_fit = lasso_model.fit(Xpoly[:,1:n+1], y)
>>
>
> In  scikit-learn, "fit" always returns the model itself so here
> "lasso_model" and "lasso_fit" refer to the same thing.
>
> lasso_predict = lasso_model.predict(Xpoly[:,1:n+1])
>>
>> a = np.r_[lasso_fit.intercept_, lasso_fit.coef_]
>>
>> b = np.polyfit(x, y, n)[::-1]
>>
>> p_lasso = a[0] + a[1] * x + a[2] * x**2 + a[3] * x**3
>> p_polyfit = b[0] + b[1] * x + b[2] * x**2 + b[3] * x**3
>>
>> print 'coef. lasso:', a
>> print 'coef. polyfit:', b
>>
>>
>> The returned coefficients 'a' and 'b' are completely different, and while
>> 'p_polyfit' is indeed the fitted polynomial of degree 3, 'p_lasso' makes no
>> sense (plot to see). Unless 'b' is something else... If so, what actually
>> are the coefficients returned by fit()? And how can I get the coefficients
>> that reconstruct the fitted polynomial?
>>
>>
> Why are you expecting a and b to be the same? np.polyfit returns a
> least-squares fit so the model is different from a lasso.
> You should use LinearRegression or Ridge with light regularization instead.
>
> HTH,
> Mathieu
>
>
> ------------------------------------------------------------------------------
> Open source business process management suite built on Java and Eclipse
> Turn processes into business applications with Bonita BPM Community Edition
> Quickly connect people, data, and systems into organized workflows
> Winner of BOSSIE, CODIE, OW2 and Gartner awards
> http://p.sf.net/sfu/Bonitasoft
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>


-- 
Fernando Paolo
Institute of Geophysics & Planetary Physics
Scripps Institution of Oceanography
University of California, San Diego
9500 Gilman Drive
La Jolla, CA 92093-0225
------------------------------------------------------------------------------
Open source business process management suite built on Java and Eclipse
Turn processes into business applications with Bonita BPM Community Edition
Quickly connect people, data, and systems into organized workflows
Winner of BOSSIE, CODIE, OW2 and Gartner awards
http://p.sf.net/sfu/Bonitasoft
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

Reply via email to