Re: [Scikit-learn-general] Error when using an array for one feature linear regression

Luca Cerone Tue, 24 Sep 2013 05:31:36 -0700

Thanks Vlad, and Jacques,
I agree that it is explained in the documentation, and covering such
cases would probably be cumbersome.


I am a long-time MATLAB user, which is why I sometimes wonder whether
certain behaviours are intended or not.

Thanks for the time, and for the trick on reshaping the vector!
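
For the archives, here is a minimal sketch of the fix as I understand it,
assuming a reasonably recent NumPy and scikit-learn. The names X and z and
the fixed random seed are my own choices for illustration; only x is turned
into a column, and y stays one-dimensional, as Vlad pointed out.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same toy problem as in my original message, with a fixed seed
# (the seed is an illustrative choice so the run is repeatable).
rng = np.random.RandomState(0)
x = np.linspace(0, 1, 201)                  # shape (201,): one feature, 1-D
y = 0.5 + 2 * x + 0.2 * rng.randn(*x.shape)

# fit() expects X of shape [n_samples, n_features], so turn the
# 1-D vector into a single-column 2-D array.
X = x[:, np.newaxis]                        # shape (201, 1)

clf = LinearRegression()
clf.fit(X, y)                               # y can stay 1-D

# The same reshaping is needed for the points passed to predict().
z = np.linspace(0, 1, 5)
pred = clf.predict(z[:, np.newaxis])
```

x.reshape(-1, 1) should give the same column shape as x[:, np.newaxis], if
that reads more naturally to you.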


On 24 September 2013 10:26, Vlad Niculae <[email protected]> wrote:

> Just to add, I don't think you need to reshape y. And reshaping x can be
> more briefly stated as x[:, np.newaxis].
>
> In my opinion supporting such cases, while convenient for users, would
> lead to annoying branches and code that is harder to maintain and test. The
> important thing is being consistent.
>
> My 2c,
> Vlad
>
>
> On Tue, Sep 24, 2013 at 11:21 AM, Jaques Grobler 
> <[email protected]> wrote:
>
>> On a side note regarding my different traceback: I'm using the latest
>> development version.
>>
>>
>> 2013/9/24 Jaques Grobler <[email protected]>
>>
>>> Hi Luca,
>>>
>>> From the docs,
>>>
>>> fit(X, y, n_jobs=1)
>>> <http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression.fit>
>>>
>>> Fit linear model.
>>>
>>> Parameters:
>>>
>>> X : numpy array or sparse matrix of shape [n_samples, n_features]
>>>     Training data
>>>
>>>
>>> With your example code, my traceback is
>>>
>>> ------------------------------------------------------------
>>> Traceback (most recent call last):
>>>   File "<ipython console>", line 1, in <module>
>>>   File "/home/jaques/scikit-learn/sklearn/linear_model/base.py", line 363, in fit
>>>     X, y, self.fit_intercept, self.normalize, self.copy_X)
>>>   File "/home/jaques/scikit-learn/sklearn/linear_model/base.py", line 103, in center_data
>>>     X_std = np.ones(X.shape[1])
>>> IndexError: tuple index out of range
>>>
>>> That is why your reshaping fixes the problem: to match the API, your X
>>> must be of shape [n_samples, n_features].
>>>
>>> So I don't think an issue is necessary, as this is the expected behaviour,
>>> although a better error message about what the input should be could be
>>> useful.
>>>
>>> Thoughts, list?
>>>
>>> Hope this helps
>>> Kind Regards,
>>> Jaques
>>>
>>>
>>>
>>> 2013/9/24 Luca Cerone <[email protected]>
>>>
>>>> Dear all,
>>>>
>>>> I have noticed that Linear Regression fails when it is run
>>>> with a dataset and target that are plain one-dimensional arrays.
>>>>
>>>> You can replicate this as follows:
>>>>
>>>> from pylab import linspace, permutation, randn
>>>> from sklearn import linear_model
>>>>
>>>> >>>
>>>> clf = linear_model.LinearRegression()
>>>>
>>>> x = linspace(0,1,201)
>>>> noise = 0.2 * randn(*x.shape)
>>>> y = 0.5 + 2 * x + noise
>>>>
>>>> clf.fit(x,y)
>>>>
>>>> fails with the following message:
>>>>
>>>> TypeError                                 Traceback (most recent call
>>>> last)
>>>> <ipython-input-134-5c1831092d7a> in <module>()
>>>> ----> 1 clf.fit(x,y)
>>>>
>>>> /home/lcerone/CNOVE/local/lib/python2.7/site-packages/sklearn/linear_model/base.pyc
>>>> in fit(self, X, y, n_jobs)
>>>>     361
>>>>     362         X, y, X_mean, y_mean, X_std = self._center_data(
>>>> --> 363             X, y, self.fit_intercept, self.normalize,
>>>> self.copy_X)
>>>>     364
>>>>     365         if sp.issparse(X):
>>>>
>>>> /home/lcerone/CNOVE/local/lib/python2.7/site-packages/sklearn/linear_model/base.pyc
>>>> in center_data(X, y, fit_intercept, normalize, copy, sample_weight)
>>>>      98             if normalize:
>>>>      99                 X_std = np.sqrt(np.sum(X ** 2, axis=0))
>>>> --> 100                 X_std[X_std == 0] = 1
>>>>     101                 X /= X_std
>>>>     102             else:
>>>>
>>>> TypeError: 'numpy.float64' object does not support item assignment
>>>>
>>>> <<<
>>>>
>>>> This can, however, be solved by reshaping the arrays x and y (each has
>>>> only 1 dimension):
>>>>
>>>> xx = x.reshape((x.shape[0],-1))
>>>> yy = y.reshape((y.shape[0],-1))
>>>> clf.fit(xx,yy)
>>>>
>>>> correctly solves the regression problem.
>>>>
>>>> Similarly, if I now try to run a prediction using an array zz generated
>>>> with linspace, the task fails, but this is easily solved by reshaping the
>>>> array zz.
>>>>
>>>> I was wondering if this is the intended behaviour or if I should submit
>>>> an issue on github.
>>>>
>>>> Have a nice day,
>>>> Cheers,
>>>> Luca
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> October Webinars: Code for Performance
>>>> Free Intel webinars can help you accelerate application performance.
>>>> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the
>>>> most from
>>>> the latest Intel processors and coprocessors. See abstracts and
>>>> register >
>>>>
>>>> http://pubads.g.doubleclick.net/gampad/clk?id=60133471&iu=/4140/ostg.clktrk
>>>> _______________________________________________
>>>> Scikit-learn-general mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>
>>>>
>>>
>>
>>
>
>


-- 
*Luca Cerone*

Tel: +447585611951
Skype: luca.cerone

