On Tue, Dec 6, 2011 at 11:46 PM, Alexandre Gramfort <[email protected]> wrote:
> I confirm that Lasso and LassoLars both minimize
>
>     1/(2n) * ||y - Xw||^2_2 + alpha * ||w||_1
>
> and that the n should not be present in the sparse coding context.
>
> It means that
>
> http://scikit-learn.org/stable/modules/linear_model.html#lasso
>
> is not correct. I don't know if this also affects the SGD docs.
> I would also vote for writing the cost function minimized by Lasso
> (etc.) in their docstrings.
>
> Regarding the shapes used by sparse_encode, I'll let Vlad comment.
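[To make the 1/(2n) convention quoted above concrete, here is a minimal
editorial sketch, not from the thread itself: multiplying that objective by
2n shows that Lasso(alpha=lam / (2 * n_samples)) minimizes the unscaled
objective ||y - Xw||^2_2 + lam * ||w||_1. The names X, y, lam, n are
illustrative; the data mirrors David's example further down.]

# Editorial sketch, assuming the 1/(2n) objective quoted above.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 500))
X /= np.sqrt((X ** 2).sum(axis=0))   # unit-norm columns, as in David's example
y = rng.normal(size=100) / 1000

lam = 0.0001                         # penalty in the unscaled convention
n = X.shape[0]
lasso = Lasso(alpha=lam / (2 * n), fit_intercept=False,
              max_iter=10 ** 6, tol=1e-8).fit(X, y)

# Subgradient stationarity check for the unscaled objective: on the
# active set, -2 * X^T (y - Xw) + lam * sign(w) should be close to 0.
w = lasso.coef_
grad = -2 * np.dot(X.T, y - np.dot(X, w)) + lam * np.sign(w)
print(np.abs(grad[w != 0]).max())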
At first sight I agree with Olivier re: the shapes. The alpha issue is a
semantics one, and we should simply multiply it back by the appropriate
dimension in order to expose a clear interface. How about I address these
issues in the pull request I opened earlier today?

I remember discussing with Alex that the alpha values for the MiniBatch
versions of the algorithms didn't correspond to the batch versions. I now
realize that this might be the reason: the scaling for a mini-batch was
different from that for the full batch.

Vlad

> Alex
>
> On Tue, Dec 6, 2011 at 10:27 PM, David Warde-Farley
> <[email protected]> wrote:
>> On Tue, Dec 06, 2011 at 08:43:06PM +0100, Olivier Grisel wrote:
>>> 2011/12/6 David Warde-Farley <[email protected]>:
>>> > On Tue, Dec 06, 2011 at 09:04:22AM +0100, Alexandre Gramfort wrote:
>>> >> > This actually gets at something I've been meaning to fiddle with and
>>> >> > report but haven't had time: I'm not sure I completely trust the
>>> >> > coordinate descent implementation in scikit-learn, because it seems to
>>> >> > give me bogus answers a lot (i.e., the optimality conditions necessary
>>> >> > for it to be an actual solution are not even approximately satisfied).
>>> >> > Are you guys using something weird for the termination condition?
>>> >>
>>> >> Can you give us a sample X and y that shows the problem?
>>> >>
>>> >> It should ultimately use the duality gap to stop the iterations, but
>>> >> there might be a corner case …
>>> >
>>> > In [34]: rng = np.random.RandomState(0)
>>> >
>>> > In [35]: dictionary = rng.normal(size=(100, 500)) / 1000; dictionary /= np.sqrt((dictionary ** 2).sum(axis=0))
>>> >
>>> > In [36]: signal = rng.normal(size=100) / 1000
>>> >
>>> > In [37]: from sklearn.linear_model import Lasso
>>> >
>>> > In [38]: lasso = Lasso(alpha=0.0001, max_iter=1e6, fit_intercept=False, tol=1e-8)
>>> >
>>> > In [39]: lasso.fit(dictionary, signal)
>>> > Out[39]:
>>> > Lasso(alpha=0.0001, copy_X=True, fit_intercept=False, max_iter=1000000.0,
>>> >       normalize=False, precompute='auto', tol=1e-08)
>>> >
>>> > In [40]: max(abs(lasso.coef_))
>>> > Out[40]: 0.0
>>> >
>>> > In [41]: from pylearn2.optimization.feature_sign import feature_sign_search
>>> >
>>> > In [42]: coef = feature_sign_search(dictionary, signal, 0.0001)
>>> >
>>> > In [43]: max(abs(coef))
>>> > Out[43]: 0.0027295761244725018
>>> >
>>> > And I'm pretty sure the latter result is the right one, since
>>> >
>>> > In [45]: def gradient(coefs):
>>> >    ....:     gram = np.dot(dictionary.T, dictionary)
>>> >    ....:     corr = np.dot(dictionary.T, signal)
>>> >    ....:     return -2 * corr + 2 * np.dot(gram, coefs) + 0.0001 * np.sign(coefs)
>>> >    ....:
>>>
>>> Actually, alpha in scikit-learn is multiplied by n_samples. I agree this is
>>> misleading and not documented in the docstring.
>>>
>>> >>> lasso = Lasso(alpha=0.0001 / dictionary.shape[0], max_iter=1e6, fit_intercept=False, tol=1e-8).fit(dictionary, signal)
>>> >>> max(abs(lasso.coef_))
>>> 0.0027627270397484554
>>> >>> max(abs(gradient(lasso.coef_)))
>>> 0.00019687294269977963
>>
>> Seems like there's an added factor of 2 in there as well, though this one is
>> a little more standard:
>>
>> In [94]: lasso = Lasso(alpha=0.0001 / (2 * dictionary.shape[0]), max_iter=1e8, fit_intercept=False, tol=1e-8).fit(dictionary, signal)
>>
>> In [95]: coef = feature_sign_search(dictionary, signal, 0.0001)
>>
>> In [96]: allclose(lasso.coef_, coef, atol=1e-7)
>> Out[96]: True
>>
>> I think you're right that the precise cost function definitely ought to be
>> documented in the front-facing classes rather than just in the low-level
>> Cython routines.
>>
>> I also think that scaling the way Lasso/ElasticNet does may be very
>> confusing in the context of sparse coding, since there the divisor
>> corresponds not to the number of training samples in a regression problem
>> but to the number of input dimensions.
>>
>> The docstring of sparse_encode is also confusing in that it gives the shape
>> of X, the dictionary, as "n_samples, n_components". The number of samples
>> (in the sparse coding context) should have no influence on the shape of the
>> dictionary; this seems to have leaked over from the Lasso documentation.
>>
>> The shape and mathematical definition of cov don't make much sense to me
>> given this change, though (or to begin with, for that matter): in the case
>> of a single problem, the desired covariance is X^T y, with y a column
>> vector, yielding another column vector of shape (n_components, 1). So if you
>> have multiple examples you're precomputing for, the shape should end up
>> being (n_components, n_samples), and given the shape of Y that would be
>> achieved by X^T Y^T.
>>
>> David
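[A small editorial sketch of the shapes David describes; the names dic, Y
and the sizes are illustrative and not sparse_encode's actual arguments.]

import numpy as np

rng = np.random.RandomState(0)
n_dims, n_components, n_samples = 100, 500, 20
dic = rng.normal(size=(n_dims, n_components))   # dictionary, one atom per column
Y = rng.normal(size=(n_samples, n_dims))        # one signal per row

# Single problem: dic^T y is a column of n_components correlations.
y = Y[0]
cov_single = np.dot(dic.T, y)
assert cov_single.shape == (n_components,)

# Batch: dic^T Y^T stacks those columns, giving (n_components, n_samples).
cov = np.dot(dic.T, Y.T)
assert cov.shape == (n_components, n_samples)
assert np.allclose(cov[:, 0], cov_single)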
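[As an aside on Alexandre's remark that the solver should stop on the
duality gap: below is a minimal editorial sketch of the textbook gap
computation for min_w 0.5 * ||y - Xw||^2_2 + lam * ||w||_1. This is not
scikit-learn's internal code, and lasso_duality_gap is a hypothetical
helper name. In this convention, the solution returned by Lasso(alpha=a)
corresponds to lam = n_samples * a.]

import numpy as np

def lasso_duality_gap(X, y, w, lam):
    # Duality gap for min_w 0.5 * ||y - Xw||^2 + lam * ||w||_1.
    r = y - X.dot(w)                          # residual
    # Rescale the residual so that theta = r / scale is dual-feasible,
    # i.e. ||X^T theta||_inf <= 1.
    scale = max(lam, np.abs(X.T.dot(r)).max())
    theta = r / scale
    primal = 0.5 * r.dot(r) + lam * np.abs(w).sum()
    dual = 0.5 * y.dot(y) - 0.5 * lam ** 2 * ((theta - y / lam) ** 2).sum()
    return primal - dual                      # >= 0, and 0 at the optimum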
