That is very interesting, and sheds some more light on the problem! Looking
at the sparse encoder, I don't think n_nonzero_coefs can ever become zero -
it should be 'max(n_features / 10, 1)'. The 2 in the dictionary
(D.shape[0]) is n_atoms, which can't be greater than n_features (64)
according to the error message and the checks in both orthogonal_mp
functions. I think this second case should still work, since it should pass
the check in orthogonal_mp, but it breaks for some reason due to the use of
orthogonal_mp_gram (presumably used to make the OMP faster?).
The second example has n_features = 64, n_samples = 100, and n_atoms = 2,
which fits the criterion of n_nonzero_coefs less than n_features - and the
auto-chosen value should always fit this case, since 10% of the features is
always <= the total number of features (as long as n_features >= 0, I
suppose). This is why failing the check on the auto-chosen n_nonzero_coefs
is confusing me.
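To make the numbers concrete, here is a small sketch of the auto-chosen value, assuming the default really is max(n_features / 10, 1) as above:

```python
import numpy as np

# Shapes from the second (breaking) example.
n_samples, n_features, n_atoms = 100, 64, 2
X = np.random.randn(n_samples, n_features)
D = np.random.randn(n_atoms, n_features)

# Auto-chosen sparsity, assuming the default is max(n_features / 10, 1).
n_nonzero_coefs = max(int(n_features / 10), 1)
print(n_nonzero_coefs)                # 6
print(n_nonzero_coefs <= n_features)  # True
```

So the auto-chosen value of 6 is comfortably below n_features = 64 - but note it is above n_atoms = 2, which is where I think the trouble starts.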
orthogonal_mp has 'if tol is None and n_nonzero_coefs > X.shape[1]'
while
orthogonal_mp_gram has 'if tol is None and n_nonzero_coefs > len(Gram)'
Are these two functions meant to be equivalent? It seems like they are
checking different things, since X.shape[1] is n_features (64 here at the
input to sparse_encode), while Gram has shape (2, 2), which is (n_atoms,
n_atoms) since 'Gram = np.dot(dictionary, dictionary.T)'.
When I change D.shape[0], it ends up breaking the check in
orthogonal_mp_gram - but n_nonzero_coefs (auto-chosen as 10% of 64 = 6) is
still less than n_features in both cases, which hasn't changed. It looks
like the Gram check is really checking n_nonzero_coefs < n_atoms, which is
very different - though maybe the correct thing for that computation?
I don't even see a way to do the equivalent check in orthogonal_mp_gram -
it is only passed cov (n_atoms, n_samples) and Gram (n_atoms, n_atoms) to
do the computation, along with a few other settings. There doesn't seem to
be any n_features info available at all.
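For what it's worth, since OMP can never select more than n_atoms distinct atoms anyway, a hypothetical workaround on the caller's side would be to clip the auto-chosen value. This is just a sketch of a possible fix, not how sparse_encode currently behaves:

```python
n_features, n_atoms = 64, 2
auto = max(int(n_features / 10), 1)  # 6, the auto-chosen default

# Hypothetical clip: OMP selects at most n_atoms distinct atoms,
# so capping at n_atoms avoids tripping the len(Gram) check.
n_nonzero_coefs = min(auto, n_atoms)
print(n_nonzero_coefs)  # 2
```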
If they aren't meant to be equivalent, maybe the error message should be
different for orthogonal_mp_gram? If they *are* meant to be the same, then
I am not sure how that is currently the case, though there are tests in
test_omp.py that check equivalence and pass.
I was using the defaults for n_nonzero_coefs while trying to understand the
mexOMP referenced by a MATLAB K-SVD implementation seen here -
http://www.ux.uis.no/~karlsk/dle/#ssec32 . The 'tnz' parameter ('s' in the
link) uses a default of 5 in the MATLAB code? Which seems fairly arbitrary,
and not very different from the auto-chosen value of 6 in this particular
case. I hit this problem when I started testing on a different dataset - I
got the initial implementation working on images, and am now trying to use
it for signals.
Thanks for looking into this - hopefully it is just confusion on my part.
On Thu, May 15, 2014 at 1:55 AM, Gael Varoquaux <
gael.varoqu...@normalesup.org> wrote:
> On Thu, May 15, 2014 at 12:56:09AM -0500, Kyle Kastner wrote:
> > #Broken?
> > X = np.random.randn(100, 64)
> > D = np.random.randn(2, 64)
> > sparse_encode(X, D, algorithm='omp', n_nonzero_coefs=None)
>
> After a quick glance, could it be that with a dimension of 2,
> n_nonzero_coefs, which defaults to int(.1 * n_features) is zero? When I
> put it manually to 1, this works.
>
> This parameter is what I believe would control 'k' in a K-SVD. I am a bit
> surprised that you are leaving it to its default value.
>
> HTH,
>
> Gaƫl
>
>
> ------------------------------------------------------------------------------
> "Accelerate Dev Cycles with Automated Cross-Browser Testing - For FREE
> Instantly run your Selenium tests across 300+ browser/OS combos.
> Get unparalleled scalability from the best Selenium testing platform
> available
> Simple to use. Nothing to install. Get started now for free."
> http://p.sf.net/sfu/SauceLabs
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>