On Mon, Dec 5, 2011 at 5:31 PM, Olivier Grisel <[email protected]> wrote:
> 2011/12/5 Andreas Mueller <[email protected]>:
>> On 12/05/2011 11:14 PM, Alexandre Gramfort wrote:
>>>> I do not understand. I have the dictionary already, so what is being
>>>> estimated?
>>> well I am not sure to follow now, but if you have the dictionary the
>>> only missing part is the coefs of the decomposition.
>>>
>>> X = dico x coefs
>> I think there is a little misunderstanding here.
>> As I understand Ian, he has estimated a dictionary on some dataset
>> and wants to use this dictionary to encode some "new" data.
>>
>> You do not need to estimate anything to get the "model".
>>
>> What you do want is to "transform" the new data so that it
>> is coded using the specified dictionary.
>
> Yes.
>
>> I think this is exactly what the sparse encoding method that Olivier
>> referenced is doing.
>
> Yes, and this method is just a wrapper for what Alexandre is explaining
> (about LARS and OMP). Both replies are consistent.
>
> Sparse coding is not a simple linear projection (as in the transform
> method of PCA, for instance), as there is a penalty on the non-zero
> loadings. Hence there is some "estimator fitting" that still occurs at
> "transform" time. Unless of course you use fixed arbitrary
> thresholding, which might be enough for some tasks (e.g. sparse feature
> extraction for image classification).
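
To make sure we mean the same thing by sparse coding at "transform" time: my understanding of the quoted point is that encoding a new example against the fixed dictionary is itself a small penalized fit, roughly like the sketch below (toy shapes and random data, just to illustrate the idea; the exact scaling of alpha in scikit-learn's Lasso may differ from the textbook objective):

import numpy as np
from sklearn.linear_model import Lasso

# Toy shapes and random data for illustration only (not my actual problem).
num_data_features, num_code_elements = 64, 100
D = np.random.randn(num_data_features, num_code_elements)  # fixed dictionary
x = np.random.randn(num_data_features)                     # one new example

# Encoding x against the fixed D means solving roughly
#   min_c ||x - D c||_2^2 + alpha * ||c||_1
# i.e. a small optimization, rather than a plain matrix multiply as in PCA.
code_for_x = Lasso(alpha=0.1, fit_intercept=False).fit(D, x).coef_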
I think I was mostly confused by the terminology: I don't consider the code to be part of a sparse coding model, nor to be something that is estimated (I am aware that sparse coding involves iterative optimization, but I don't consider the optimizer to be solving an estimation problem).

I also don't understand exactly which interface Alexandre is suggesting I use. With the sparse_encode interface, should I pass a dictionary of shape (num_data_features, num_code_elements) for X and a data matrix of shape (num_data_features, num_examples) for Y? I have tried doing that, but with alpha = 1. or alpha = 0.1 it returns a matrix of all zeros, and with alpha = .01 it returns a code containing NaNs.
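
For concreteness, this is roughly the call I am making (variable names, sizes, and random data below are only illustrative, and the import location and argument order reflect my reading of the docstring, so apologies if I have misread either):

import numpy as np
from sklearn.decomposition import sparse_encode

# Illustrative sizes only; my real dictionary and data are different.
num_data_features, num_code_elements, num_examples = 64, 100, 500
dictionary = np.random.randn(num_data_features, num_code_elements)  # passed as X
data = np.random.randn(num_data_features, num_examples)             # passed as Y

# alpha = 1. and alpha = 0.1 give an all-zero code matrix;
# alpha = .01 gives a code containing NaNs.
code = sparse_encode(dictionary, data, alpha=0.1)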
