On Thu, Mar 29, 2012 at 10:24:29PM +0200, Immanuel wrote:
> 
> > +1 for starting with a first patch on the current CD implementation to
> > get familiar with the existing code base.
> Just want to let you know that I'm on it; I hope I can write the patch
> over the weekend.
> >
> > As for the content of the proposal itself, it would be good to include
> > extensive profiling sessions on realistic datasets (e.g. microarray
> > data) both on individual estimator runs and on regularization paths
> > with warm restarts.
> >
> > Also David experienced poor performance compared to other
> > implementations when using the CD models in a sparse coding setting. Would be
> You mean that the data matrix X has a lot of zero entries? There is a
> comment on this case in section 2.3 of
> www.stanford.edu/~hastie/Papers/glmnet.pdf .

No, sparse coding is a related problem in which a code vector x is
computed, for a fixed dictionary matrix D and target vector y, by solving

    argmin_x ||y - Dx||_2^2 + lambda * ||x||_1

The idea is to recover an approximation to y from a sparse linear combination
of the columns of D. It is essentially the same problem setting as the lasso:
x corresponds to the linear model coefficients (often called beta), the
dictionary D corresponds to the data matrix, and y corresponds to the target
vector.
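
For concreteness, here is a minimal sketch of how a single sparse coding
step maps onto a Lasso fit, assuming scikit-learn's Lasso estimator (note
that scikit-learn scales the squared loss term by 1/(2 * n_samples), so its
alpha parameter plays the role of lambda only up to that factor):

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.RandomState(0)
    n_features, n_atoms = 50, 100
    D = rng.randn(n_features, n_atoms)    # fixed dictionary, atoms as columns
    x_true = np.zeros(n_atoms)
    x_true[:5] = rng.randn(5)             # a sparse ground-truth code
    y = np.dot(D, x_true)                 # signal to encode

    # D plays the role of the data matrix and the code x the role of the
    # coefficients beta; no intercept since we model y directly.
    lasso = Lasso(alpha=0.1, fit_intercept=False, max_iter=10000)
    lasso.fit(D, y)
    x = lasso.coef_                       # recovered sparse code
    print(np.count_nonzero(x))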

scikit-learn's coordinate descent Lasso implementation is reused for this
purpose. The difference is that, in the process of learning the dictionary
matrix, you typically need to solve thousands of these Lasso problems per
iteration.
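
As a rough sketch of that batch encoding step (assuming the sparse_encode
helper in sklearn.decomposition; with algorithm='lasso_cd' each signal is
routed through the coordinate descent Lasso solver):

    import numpy as np
    from sklearn.decomposition import sparse_encode

    rng = np.random.RandomState(0)
    n_signals, n_features, n_atoms = 1000, 50, 100
    Y = rng.randn(n_signals, n_features)  # signals to encode, one per row
    D = rng.randn(n_atoms, n_features)    # dictionary, one atom per row

    # One CD Lasso problem is solved per row of Y; a dictionary learning
    # loop repeats this encoding step at every outer iteration.
    codes = sparse_encode(Y, D, algorithm='lasso_cd', alpha=0.1)
    print(codes.shape)                    # (n_signals, n_atoms)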

David
