2011/12/6 David Warde-Farley <[email protected]>:
> On Tue, Dec 06, 2011 at 09:04:22AM +0100, Alexandre Gramfort wrote:
>> > This actually gets at something I've been meaning to fiddle with and 
>> > report but haven't had time: I'm not sure I completely trust the 
>> > coordinate descent implementation in scikit-learn, because it seems to 
>> > give me bogus answers a lot (i.e., the optimality conditions necessary for 
>> > it to be an actual solution are not even approximately satisfied). Are you 
>> > guys using something weird for the termination condition?
>>
>> Can you give us a sample X and y that reproduces the problem?
>>
>> It should ultimately use the duality gap to stop the iterations, but
>> there might be a corner case …
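
For reference, the duality gap test is for the problem
0.5 * ||y - X w||^2 + alpha * ||w||_1 (with the already-rescaled alpha,
see below). A minimal sketch of the check, with names of my own (the
actual implementation is the Cython coordinate descent code):

import numpy as np

def lasso_duality_gap(X, y, w, alpha):
    # Primal objective: 0.5 * ||y - X w||^2 + alpha * ||w||_1
    R = y - np.dot(X, w)
    primal = 0.5 * np.dot(R, R) + alpha * np.abs(w).sum()
    # Rescale the residual so that the dual point nu is feasible,
    # i.e. ||X.T nu||_inf <= alpha.
    dual_norm = np.abs(np.dot(X.T, R)).max()
    nu = R if dual_norm <= alpha else (alpha / dual_norm) * R
    # Dual objective: 0.5 * ||y||^2 - 0.5 * ||y - nu||^2
    dual = 0.5 * np.dot(y, y) - 0.5 * np.dot(y - nu, y - nu)
    return primal - dual  # >= 0, and ~0 certifies that w is optimal

The solver stops once this gap drops below tol (rescaled by ||y||^2, if
I remember correctly), so a wrong alpha changes which problem is being
solved, not whether the solver converges on it.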
>
> In [33]: import numpy as np
>
> In [34]: rng = np.random.RandomState(0)
>
> In [35]: dictionary = rng.normal(size=(100, 500)) / 1000; dictionary /= np.sqrt((dictionary ** 2).sum(axis=0))
>
> In [36]: signal = rng.normal(size=100) / 1000
>
> In [37]: from sklearn.linear_model import Lasso
>
> In [38]: lasso = Lasso(alpha=0.0001, max_iter=1e6, fit_intercept=False, tol=1e-8)
>
> In [39]: lasso.fit(dictionary, signal)
> Out[39]:
> Lasso(alpha=0.0001, copy_X=True, fit_intercept=False, max_iter=1000000.0,
>   normalize=False, precompute='auto', tol=1e-08)
>
> In [40]: max(abs(lasso.coef_))
> Out[40]: 0.0
>
> In [41]: from pylearn2.optimization.feature_sign import feature_sign_search
>
> In [42]: coef = feature_sign_search(dictionary, signal, 0.0001)
>
> In [43]: max(abs(coef))
> Out[43]: 0.0027295761244725018
>
> And I'm pretty sure the latter result is the right one, since
>
> In [45]: def gradient(coefs):
>   ....:     gram = np.dot(dictionary.T, dictionary)
>   ....:     corr = np.dot(dictionary.T, signal)
>   ....:     return -2 * corr + 2 * np.dot(gram, coefs) + 0.0001 * np.sign(coefs)
>   ....:

Actually, the alpha you pass to scikit-learn is internally multiplied by
n_samples. I agree this is misleading and not documented in the docstring.
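
Concretely, IIRC the objective that Lasso minimizes is:

    (1 / (2 * n_samples)) * ||y - X w||^2_2 + alpha * ||w||_1

which is, up to a constant factor, 0.5 * ||y - X w||^2_2
+ n_samples * alpha * ||w||_1. The scaling is meant to keep a given
alpha comparable across datasets of different sizes, hence the division
by dictionary.shape[0] below: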

>>> lasso = Lasso(alpha=0.0001 / dictionary.shape[0], max_iter=1e6,
...               fit_intercept=False, tol=1e-8).fit(dictionary, signal)
>>> max(abs(lasso.coef_))
0.0027627270397484554
>>> max(abs(gradient(lasso.coef_)))
0.00019687294269977963
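
Also note that max(abs(gradient(coef))) only has to vanish on the
nonzero coefficients: where coef_j == 0, np.sign contributes nothing
and the optimality condition is the interval
|2 * x_j.T (y - X w)| <= lam, so a small non-zero value like the one
above is not by itself a sign of non-convergence. A sketch of the full
subgradient check for ||y - X w||^2 + lam * ||w||_1 (helper name is
mine):

import numpy as np

def lasso_kkt_violation(X, y, w, lam, zero_tol=1e-12):
    # Optimality conditions for min_w ||y - X w||^2 + lam * ||w||_1:
    #   w_j != 0:  2 * x_j.T (X w - y) + lam * sign(w_j) == 0
    #   w_j == 0:  |2 * x_j.T (X w - y)| <= lam
    g = 2 * np.dot(X.T, np.dot(X, w) - y)
    nonzero = np.abs(w) > zero_tol
    violation = np.where(nonzero,
                         np.abs(g + lam * np.sign(w)),
                         np.maximum(np.abs(g) - lam, 0.0))
    return violation.max()  # ~0 iff w satisfies the optimality conditions

For instance, lasso_kkt_violation(dictionary, signal, coef, 0.0001)
should come out close to zero for the feature sign solution above.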

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
