On Tue, Dec 06, 2011 at 08:43:06PM +0100, Olivier Grisel wrote:
> 2011/12/6 David Warde-Farley <[email protected]>:
> > On Tue, Dec 06, 2011 at 09:04:22AM +0100, Alexandre Gramfort wrote:
> >> > This actually gets at something I've been meaning to fiddle with and
> >> > report but haven't had time: I'm not sure I completely trust the
> >> > coordinate descent implementation in scikit-learn, because it seems to
> >> > give me bogus answers a lot (i.e., the optimality conditions necessary
> >> > for it to be an actual solution are not even approximately satisfied).
> >> > Are you guys using something weird for the termination condition?
> >>
> >> can you give us a sample X and y that shows the pb?
> >>
> >> it should ultimately use the duality gap to stop the iterations but
> >> there might be a corner case …
> >
> > In [34]: rng = np.random.RandomState(0)
> >
> > In [35]: dictionary = rng.normal(size=(100, 500)) / 1000; dictionary /= np.sqrt((dictionary ** 2).sum(axis=0))
> >
> > In [36]: signal = rng.normal(size=100) / 1000
> >
> > In [37]: from sklearn.linear_model import Lasso
> >
> > In [38]: lasso = Lasso(alpha=0.0001, max_iter=1e6, fit_intercept=False, tol=1e-8)
> >
> > In [39]: lasso.fit(dictionary, signal)
> > Out[39]:
> > Lasso(alpha=0.0001, copy_X=True, fit_intercept=False, max_iter=1000000.0,
> >       normalize=False, precompute='auto', tol=1e-08)
> >
> > In [40]: max(abs(lasso.coef_))
> > Out[40]: 0.0
> >
> > In [41]: from pylearn2.optimization.feature_sign import feature_sign_search
> >
> > In [42]: coef = feature_sign_search(dictionary, signal, 0.0001)
> >
> > In [43]: max(abs(coef))
> > Out[43]: 0.0027295761244725018
> >
> > And I'm pretty sure the latter result is the right one, since
> >
> > In [45]: def gradient(coefs):
> >    ....:     gram = np.dot(dictionary.T, dictionary)
> >    ....:     corr = np.dot(dictionary.T, signal)
> >    ....:     return - 2 * corr + 2 * np.dot(gram, coefs) + 0.0001 * np.sign(coefs)
> >    ....:
>
> Actually, alpha in scikit-learn is multiplied by n_samples. I agree
> this is misleading and not documented in the docstring.
>
> >>> lasso = Lasso(alpha=0.0001 / dictionary.shape[0], max_iter=1e6, fit_intercept=False, tol=1e-8).fit(dictionary, signal)
> >>> max(abs(lasso.coef_))
> 0.0027627270397484554
> >>> max(abs(gradient(lasso.coef_)))
> 0.00019687294269977963
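One note on the gradient check quoted above: it only tests the nonzero coordinates. At a coordinate where w_j = 0, the subgradient condition for min ||y - Xw||^2 + alpha * ||w||_1 is |-2 * (X^T (y - Xw))_j| <= alpha, so a complete optimality test looks something like this (a sketch; the helper name and tolerance are just illustrative, not anything in scikit-learn):

import numpy as np

def check_lasso_kkt(X, y, w, alpha, tol=1e-6):
    # Subgradient optimality check for min_w ||y - Xw||^2 + alpha * ||w||_1.
    grad = -2 * np.dot(X.T, y - np.dot(X, w))  # gradient of the smooth term
    nonzero = w != 0
    # Active coordinates: gradient plus alpha * sign(w_j) must vanish.
    active_ok = np.all(np.abs(grad[nonzero] + alpha * np.sign(w[nonzero])) <= tol)
    # Inactive coordinates: the correlation must stay inside [-alpha, alpha].
    inactive_ok = np.all(np.abs(grad[~nonzero]) <= alpha + tol)
    return active_ok and inactive_ok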
Seems like there's an added factor of 2 in there as well, though that one is a little more standard: the objective appears to be (1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1, so matching an unscaled penalty lambda requires alpha = lambda / (2 * n_samples):

In [94]: lasso = Lasso(alpha=0.0001 / (2 * dictionary.shape[0]), max_iter=1e8, fit_intercept=False, tol=1e-8).fit(dictionary, signal)

In [95]: coef = feature_sign_search(dictionary, signal, 0.0001)

In [96]: allclose(lasso.coef_, coef, atol=1e-7)
Out[96]: True

I think you're right that the precise cost function definitely ought to be documented in the front-facing classes rather than just the low-level Cython routines. I also think that scaling by n_samples the way Lasso/ElasticNet do may be very confusing in the context of sparse coding, since there the first dimension of the design matrix corresponds not to the number of training samples in a regression problem but to the number of input dimensions.

The docstring of sparse_encode is confusing on this point too: it gives the shape of X, the dictionary, as (n_samples, n_components). The number of samples (in the sparse coding sense) should have no influence on the shape of the dictionary; this seems to have leaked over from the Lasso documentation.

Given that convention, the shape and mathematical definition of cov don't make much sense to me either (or to begin with, for that matter): in the case of a single problem, the desired covariance is X^T y, with y a column vector, yielding another column vector of shape (n_components, 1). So if you have multiple examples you're precomputing for, the shape should end up being (n_components, n_samples), and given the documented shape of Y that would be achieved by X^T Y^T.

David
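P.S. To make the shape argument concrete, a quick check (D and Y are stand-in arrays, not the actual sparse_encode arguments):

import numpy as np

n_features, n_components, n_samples = 100, 500, 3
rng = np.random.RandomState(0)
D = rng.normal(size=(n_features, n_components))  # the dictionary
Y = rng.normal(size=(n_samples, n_features))     # one signal per row

cov = np.dot(D.T, Y.T)  # X^T Y^T in the notation above
print(cov.shape)        # (500, 3) == (n_components, n_samples)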
