On Mon, Dec 5, 2011 at 5:31 PM, Olivier Grisel <[email protected]> wrote:
> 2011/12/5 Andreas Mueller <[email protected]>:
>> On 12/05/2011 11:14 PM, Alexandre Gramfort wrote:
>>>> I do not understand. I have the dictionary already, so what is being 
>>>> estimated?
>>> well I am not sure I follow now, but if you have the dictionary the
>>> only missing part is the coefs of the decomposition.
>>>
>>> X = dico x coefs
>> I think there is a little misunderstanding here.
>> As I understand Ian, he has estimated a dictionary on some dataset
>> and wants to use this dictionary to encode some "new" data.
>>
>> You do not need to estimate anything to get the "model".
>>
>> What you do want is to "transform" the new data so that it
>> is coded using the specified dictionary.
>
> Yes.
>
>> I think this is exactly what the sparse encoding method that Olivier
>> referenced is doing.
>
> Yes and this method is just a wrapper for what Alexandre is explaining
> (about LARS and OMP). Both replies are consistent.
>
> Sparse coding is not simple linear projection (as in the transform
> method of PCA for instance) as there is a penalty on the non-zero
> loadings. Hence there is some "estimator fitting" that still occurs at
> the "transform" time. Unless of course you use fixed arbitrary
> thresholding, which might be enough for some tasks (e.g. sparse feature
> extraction for image classification).

I think I was mostly confused by the terminology: I don't consider the code
to be part of a sparse coding model, nor to be estimated (I am aware that
sparse coding involves iterative optimization but I don't consider the optimizer
to be solving an estimation problem).

I don't understand exactly what interface Alexandre is saying to use.

To use the sparse_encode interface, should I pass a dictionary of shape
(num_data_features, num_code_elements) for X and a data matrix of shape
(num_data_features, num_examples) for Y?
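
For concreteness, the shape convention in question can be checked on a toy
example. This is only a sketch in plain Python of the "X = dico x coefs"
relation quoted earlier in the thread; the matmul and shape helpers and all
of the numbers are made up for illustration, and the sketch says nothing
about which argument order sparse_encode actually expects.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def shape(m):
    """Return (rows, columns) of a list-of-rows matrix."""
    return (len(m), len(m[0]))


# Hypothetical toy sizes: 3 data features, 2 code elements, 4 examples.
dico = [[1.0, 0.0],
        [0.0, 1.0],
        [1.0, 1.0]]              # (num_data_features, num_code_elements)
coefs = [[2.0, 0.0, 1.0, 0.0],
         [0.0, 3.0, 1.0, 0.0]]   # (num_code_elements, num_examples)

# Each column of `data` is one example, built from the dictionary atoms.
data = matmul(dico, coefs)       # (num_data_features, num_examples)
```

Under this convention each column of the code matrix holds the coefficients
for the matching column of the data matrix.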

I have tried doing that, but for alpha = 1.0 or alpha = 0.1 it returns a
matrix of all zeros, and for alpha = 0.01 it returns a code with NaNs in it.
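
One thing that can produce an all-zero code is a penalty that is too large
for the scale of the data. The sketch below is not scikit-learn's
implementation. It is a plain-Python ISTA (iterative soft thresholding) loop
for a single example, assuming the objective
0.5 * ||x - dico . code||^2 + alpha * ||code||_1, whose scaling of alpha may
differ from what sparse_encode uses. The function names and toy numbers are
hypothetical. For this objective the minimizer is exactly zero whenever
alpha is at least the largest absolute correlation between an atom and x.
The sketch does not address the NaNs.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t; values within [-t, t] become exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def correlations(dico, x):
    """Return dico^T x: the inner product of each atom (column) with x."""
    n_features, n_atoms = len(dico), len(dico[0])
    return [sum(dico[i][j] * x[i] for i in range(n_features))
            for j in range(n_atoms)]


def sparse_code_ista(dico, x, alpha, n_iter=500):
    """Approximately minimize 0.5*||x - dico.code||^2 + alpha*||code||_1."""
    n_features, n_atoms = len(dico), len(dico[0])
    # The squared Frobenius norm bounds the Lipschitz constant of the
    # gradient from above, so its inverse is a safe step size.
    step = 1.0 / sum(v * v for row in dico for v in row)
    code = [0.0] * n_atoms
    for _ in range(n_iter):
        residual = [x[i] - sum(dico[i][j] * code[j] for j in range(n_atoms))
                    for i in range(n_features)]
        grad = correlations(dico, residual)
        code = [soft_threshold(code[j] + step * grad[j], step * alpha)
                for j in range(n_atoms)]
    return code


# Toy dictionary of shape (num_data_features, num_code_elements) = (3, 2).
dico = [[1.0, 0.0],
        [0.0, 1.0],
        [0.0, 0.0]]
x = [0.5, 0.05, 0.3]

# Largest absolute atom/data correlation: any alpha at or above this
# value gives an all-zero code for this objective.
alpha_max = max(abs(c) for c in correlations(dico, x))

code_large_alpha = sparse_code_ista(dico, x, alpha=1.0)  # alpha > alpha_max
code_small_alpha = sparse_code_ista(dico, x, alpha=0.1)  # alpha < alpha_max
```

Comparing alpha against alpha_max computed on the real data would show
whether the all-zero result is simply a matter of scale.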

>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
>
> ------------------------------------------------------------------------------
> All the data continuously generated in your IT infrastructure
> contains a definitive record of customers, application performance,
> security threats, fraudulent activity, and more. Splunk takes this
> data and makes sense of it. IT sense. And common sense.
> http://p.sf.net/sfu/splunk-novd2d
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
