On Tue, Dec 6, 2011 at 11:46 PM, Alexandre Gramfort
<[email protected]> wrote:
> I do confirm that Lasso and LassoLars both minimize
>
> 1/(2n) * ||y - Xw||^2_2 + alpha * ||w||_1
>
> and that the n should not be present in the sparse coding context.
>
> It means:
>
> http://scikit-learn.org/stable/modules/linear_model.html#lasso
>
> is not correct. I don't know whether this also affects the SGD documentation.
> I would also vote for writing the cost function minimized in the Lasso
> (etc.) docstrings.
>
> regarding the shapes using sparse_encode I'll let Vlad comment.

At first sight I agree with Olivier re: the shapes. The alpha issue is
a semantic one, and we should simply multiply it back by the
appropriate dimension in order to expose a clear interface.
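
Something like the following is what I have in mind for exposing the
un-normalized penalty (just a sketch assuming the convention
1/2 ||y - Dw||^2_2 + alpha ||w||_1; the helper name is a placeholder, not
what the pull request will actually contain):

from sklearn.linear_model import Lasso

def lasso_sparse_code(dictionary, signal, alpha, **lasso_kwargs):
    # Lasso minimizes 1/(2n) ||y - Xw||^2_2 + alpha ||w||_1 with
    # n = X.shape[0], so dividing alpha by n recovers the sparse-coding
    # convention 1/2 ||y - Dw||^2_2 + alpha ||w||_1, independently of
    # the signal length.
    n_rows = dictionary.shape[0]
    lasso = Lasso(alpha=alpha / n_rows, fit_intercept=False, **lasso_kwargs)
    lasso.fit(dictionary, signal)
    return lasso.coef_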

How about I address these issues in the pull request I opened earlier today?
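
For the shapes, this is the sanity check I have in mind for cov (plain
NumPy; the (n_features, n_components) dictionary layout and the
(n_samples, n_features) signal layout are my assumptions about the
intended convention, not necessarily what the current code does):

import numpy as np

n_features, n_components, n_samples = 100, 500, 7
D = np.random.randn(n_features, n_components)  # dictionary (assumed layout)
Y = np.random.randn(n_samples, n_features)     # one signal per row (assumed)

# For a single signal y, the precomputed covariance is D^T y, a column of
# shape (n_components, 1); stacking one column per signal gives D^T Y^T.
cov = np.dot(D.T, Y.T)
print(cov.shape)  # (500, 7), i.e. (n_components, n_samples)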

I just remembered discussing with Alex that the alpha values for the
MiniBatch versions of the algorithms didn't correspond to those of the
batch versions. I now realize this might be the reason: the scaling for
a mini-batch was different from that for the full batch.
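
To make the suspicion concrete (purely illustrative, and only my guess at
the mechanism, not the actual dictionary-learning code paths): if the
solver's effective L1 penalty scales with the number of rows it is given,
the same user-facing alpha means different things for a full batch and a
mini-batch:

alpha = 0.1
n_full_batch = 10000
n_mini_batch = 3

# If the data term is divided by the number of rows, the penalty relative
# to the un-normalized data term is alpha * n_rows.
effective_full = alpha * n_full_batch  # 1000.0
effective_mini = alpha * n_mini_batch  # 0.3
print(effective_full, effective_mini)  # same alpha, very different trade-offs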

Vlad

> Alex
>
> On Tue, Dec 6, 2011 at 10:27 PM, David Warde-Farley
> <[email protected]> wrote:
>> On Tue, Dec 06, 2011 at 08:43:06PM +0100, Olivier Grisel wrote:
>>> 2011/12/6 David Warde-Farley <[email protected]>:
>>> > On Tue, Dec 06, 2011 at 09:04:22AM +0100, Alexandre Gramfort wrote:
>>> >> > This actually gets at something I've been meaning to fiddle with and 
>>> >> > report but haven't had time: I'm not sure I completely trust the 
>>> >> > coordinate descent implementation in scikit-learn, because it seems to 
>>> >> > give me bogus answers a lot (i.e., the optimality conditions necessary 
>>> >> > for it to be an actual solution are not even approximately satisfied). 
>>> >> > Are you guys using something weird for the termination condition?
>>> >>
>>> >> can you give us a sample X and y that shows the problem?
>>> >>
>>> >> it should ultimately use the duality gap to stop the iterations but
>>> >> there might be a corner case …
>>> >
>>> > In [34]: rng = np.random.RandomState(0)
>>> >
>>> > In [35]: dictionary = rng.normal(size=(100, 500)) / 1000; dictionary /= np.sqrt((dictionary ** 2).sum(axis=0))
>>> >
>>> > In [36]: signal = rng.normal(size=100) / 1000
>>> >
>>> > In [37]: from sklearn.linear_model import Lasso
>>> >
>>> > In [38]: lasso = Lasso(alpha=0.0001, max_iter=1e6, fit_intercept=False,
>>> > tol=1e-8)
>>> >
>>> > In [39]: lasso.fit(dictionary, signal)
>>> > Out[39]:
>>> > Lasso(alpha=0.0001, copy_X=True, fit_intercept=False, max_iter=1000000.0,
>>> >   normalize=False, precompute='auto', tol=1e-08)
>>> >
>>> > In [40]: max(abs(lasso.coef_))
>>> > Out[40]: 0.0
>>> >
>>> > In [41]: from pylearn2.optimization.feature_sign import feature_sign_search
>>> >
>>> > In [42]: coef = feature_sign_search(dictionary, signal, 0.0001)
>>> >
>>> > In [43]: max(abs(coef))
>>> > Out[43]: 0.0027295761244725018
>>> >
>>> > And I'm pretty sure the latter result is the right one, since
>>> >
>>> > In [45]: def gradient(coefs):
>>> >   ....:     gram = np.dot(dictionary.T, dictionary)
>>> >   ....:     corr = np.dot(dictionary.T, signal)
>>> >   ....:     return -2 * corr + 2 * np.dot(gram, coefs) + 0.0001 * np.sign(coefs)
>>> >   ....:
>>>
>>> Actually, alpha in scikit-learn is multiplied by n_samples. I agree
>>> this is misleading and not documented in the docstring.
>>>
>>> >>> lasso = Lasso(alpha=0.0001 / dictionary.shape[0], max_iter=1e6, 
>>> >>> fit_intercept=False, tol=1e-8).fit(dictionary, signal)
>>> >>> max(abs(lasso.coef_))
>>> 0.0027627270397484554
>>> >>> max(abs(gradient(lasso.coef_)))
>>> 0.00019687294269977963
>>
>> Seems like there's an added factor of 2 in there as well,
>> though this is a little more standard:
>>
>> In [94]: lasso = Lasso(alpha=0.0001 / (2 * dictionary.shape[0]),
>> max_iter=1e8, fit_intercept=False, tol=1e-8).fit(dictionary, signal)
>>
>> In [95]: coef = feature_sign_search(dictionary, signal, 0.0001)
>> In [96]: allclose(lasso.coef_, coef, atol=1e-7)
>> Out[96]: True
>>
>> I think you're right that the precise cost function definitely ought to be
>> documented in the front-facing classes rather than just the low-level Cython
>> routines.
>>
>> I also think the way Lasso/ElasticNet scale alpha may be very confusing in the
>> context of sparse coding, since there the n_samples factor corresponds not to
>> the number of training samples of a regression problem but to the number of
>> input dimensions.
>>
>> The docstring of sparse_encode is quite confusing in that X, the dictionary,
>> says "n_samples, n_components". The number of samples (in the context of
>> sparse coding) should have no influence over the shape of the dictionary;
>> this seems to have leaked over from the Lasso documentation.
>>
>> The shape and mathematical definition of cov don't make much sense to me
>> given this change, though (or to begin with, for that matter): in the case of
>> a single problem, the desired covariance is X^T y, with y a column vector,
>> yielding another column vector of shape (n_components, 1). So if you have
>> multiple examples you're precomputing for, the shape should end up being
>> (n_components, n_samples), and given the shape of Y that would be achieved by
>> X^T Y^T.
>>
>> David
