On Tue, Dec 06, 2011 at 08:43:06PM +0100, Olivier Grisel wrote:
> 2011/12/6 David Warde-Farley <[email protected]>:
> > On Tue, Dec 06, 2011 at 09:04:22AM +0100, Alexandre Gramfort wrote:
> >> > This actually gets at something I've been meaning to fiddle with and
> >> > report but haven't had time: I'm not sure I completely trust the
> >> > coordinate descent implementation in scikit-learn, because it seems to
> >> > give me bogus answers a lot (i.e., the optimality conditions necessary
> >> > for it to be an actual solution are not even approximately satisfied).
> >> > Are you guys using something weird for the termination condition?
> >>
> >> can you give us a sample X and y that shows the pb?
> >>
> >> it should ultimately use the duality gap to stop the iterations but
> >> there might be a corner case …
> >
> > In [34]: rng = np.random.RandomState(0)
> >
> > In [35]: dictionary = rng.normal(size=(100, 500)) / 1000; dictionary /= np.sqrt((dictionary ** 2).sum(axis=0))
> >
> > In [36]: signal = rng.normal(size=100) / 1000
> >
> > In [37]: from sklearn.linear_model import Lasso
> >
> > In [38]: lasso = Lasso(alpha=0.0001, max_iter=1e6, fit_intercept=False, tol=1e-8)
> >
> > In [39]: lasso.fit(dictionary, signal)
> > Out[39]:
> > Lasso(alpha=0.0001, copy_X=True, fit_intercept=False, max_iter=1000000.0,
> >       normalize=False, precompute='auto', tol=1e-08)
> >
> > In [40]: max(abs(lasso.coef_))
> > Out[40]: 0.0
> >
> > In [41]: from pylearn2.optimization.feature_sign import feature_sign_search
> >
> > In [42]: coef = feature_sign_search(dictionary, signal, 0.0001)
> >
> > In [43]: max(abs(coef))
> > Out[43]: 0.0027295761244725018
> >
> > And I'm pretty sure the latter result is the right one, since
> >
> > In [45]: def gradient(coefs):
> >    ....:     gram = np.dot(dictionary.T, dictionary)
> >    ....:     corr = np.dot(dictionary.T, signal)
> >    ....:     return - 2 * corr + 2 * np.dot(gram, coefs) + 0.0001 * np.sign(coefs)
> >    ....:
>
> Actually, alpha in scikit-learn is multiplied by n_samples. I agree
> this is misleading and not documented in the docstring.
>
> >>> lasso = Lasso(alpha=0.0001 / dictionary.shape[0], max_iter=1e6, fit_intercept=False, tol=1e-8).fit(dictionary, signal)
> >>> max(abs(lasso.coef_))
> 0.0027627270397484554
> >>> max(abs(gradient(lasso.coef_)))
> 0.00019687294269977963
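One note on the gradient check quoted above: it only tests the nonzero coordinates. At a coordinate where w_j = 0, the subgradient condition for min ||y - Xw||^2 + alpha * ||w||_1 is |-2 * (X^T (y - Xw))_j| <= alpha, so a complete optimality test looks something like this (a sketch; the helper name and tolerance are just illustrative, not anything in scikit-learn):

import numpy as np

def check_lasso_kkt(X, y, w, alpha, tol=1e-6):
    # Subgradient optimality check for min_w ||y - Xw||^2 + alpha * ||w||_1.
    grad = -2 * np.dot(X.T, y - np.dot(X, w))  # gradient of the smooth term
    nonzero = w != 0
    # Active coordinates: gradient plus alpha * sign(w_j) must vanish.
    active_ok = np.all(np.abs(grad[nonzero] + alpha * np.sign(w[nonzero])) <= tol)
    # Inactive coordinates: the correlation must stay inside [-alpha, alpha].
    inactive_ok = np.all(np.abs(grad[~nonzero]) <= alpha + tol)
    return active_ok and inactive_ok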
Seems like there's an added factor of 2 in there as well, though that one is a little more standard: the objective appears to be (1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1, so matching an unscaled penalty lambda requires alpha = lambda / (2 * n_samples):

In [94]: lasso = Lasso(alpha=0.0001 / (2 * dictionary.shape[0]), max_iter=1e8, fit_intercept=False, tol=1e-8).fit(dictionary, signal)

In [95]: coef = feature_sign_search(dictionary, signal, 0.0001)

In [96]: allclose(lasso.coef_, coef, atol=1e-7)
Out[96]: True

I think you're right that the precise cost function definitely ought to be documented in the front-facing classes rather than just the low-level Cython routines. I also think that scaling by n_samples the way Lasso/ElasticNet do may be very confusing in the context of sparse coding, since there the first dimension of the design matrix corresponds not to the number of training samples in a regression problem but to the number of input dimensions.

The docstring of sparse_encode is confusing on this point too: it gives the shape of X, the dictionary, as (n_samples, n_components). The number of samples (in the sparse coding sense) should have no influence on the shape of the dictionary; this seems to have leaked over from the Lasso documentation.

Given that convention, the shape and mathematical definition of cov don't make much sense to me either (or to begin with, for that matter): in the case of a single problem, the desired covariance is X^T y, with y a column vector, yielding another column vector of shape (n_components, 1). So if you have multiple examples you're precomputing for, the shape should end up being (n_components, n_samples), and given the documented shape of Y that would be achieved by X^T Y^T.

David
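P.S. To make the shape argument concrete, a quick check (D and Y are stand-in arrays, not the actual sparse_encode arguments):

import numpy as np

n_features, n_components, n_samples = 100, 500, 3
rng = np.random.RandomState(0)
D = rng.normal(size=(n_features, n_components))  # the dictionary
Y = rng.normal(size=(n_samples, n_features))     # one signal per row

cov = np.dot(D.T, Y.T)  # X^T Y^T in the notation above
print(cov.shape)        # (500, 3) == (n_components, n_samples)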
