Hi Jacob,
Indeed, Isomap is a metric MDS, so the same hypothesis applies. A negative
eigenvalue should not happen, but one never knows.
Since the eigenvalue only acts as a scaling factor, it is not too strange
to use a negative one in the embedding construction.
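To see why, recall that each embedding coordinate is an eigenvector scaled
by the square root of its eigenvalue, so the eigenvalue's sign only enters
through the scale. A rough sketch of that step (not the actual
scikit-learn code):

import numpy as np

def mds_embed(K, out_dim):
    # K: symmetric (double-centered) kernel matrix, shape (n, n).
    # Each axis is an eigenvector scaled by sqrt(|eigenvalue|), so a
    # negative eigenvalue changes the scale, not the axis direction.
    eigvals, eigvecs = np.linalg.eigh(K)
    top = np.argsort(eigvals)[::-1][:out_dim]
    return eigvecs[:, top] * np.sqrt(np.abs(eigvals[top]))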
Cheers,
Matthieu
2011/11/7 Jacob VanderPlas <[email protected]>
> I dug around a bit, and found some info about kernel form in this document:
> http://people.kyb.tuebingen.mpg.de/lcayton/resexam.pdf
>
> MDS (on which Isomap is based) assumes a Euclidean distance matrix,
> which can be shown to always yield a positive semidefinite kernel. In
> the case of Isomap, the geodesic distance matrix is not Euclidean in
> general, so the derived kernel can have negative eigenvalues; the fix
> is to ignore any eigenvectors associated with them.
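> To make this concrete, one can double-center a squared distance matrix
> and inspect its spectrum. A small numerical check (a sketch, using
> B = -0.5 * J * D**2 * J with J the usual centering matrix):
>
> import numpy as np
>
> def centered_gram(D):
>     # Double centering: B = -0.5 * J (D ** 2) J, J = I - ones/n.
>     n = D.shape[0]
>     J = np.eye(n) - np.ones((n, n)) / n
>     return -0.5 * J.dot(D ** 2).dot(J)
>
> # For Euclidean distances the smallest eigenvalue is >= 0 up to
> # round-off; geodesic (graph) distances carry no such guarantee.
> X = np.random.rand(50, 5)
> D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
> print np.linalg.eigvalsh(centered_gram(D)).min()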
> I think, based on this, that KernelPCA is correct as written, except
> that the arpack method should use which='LA' rather than which='LM'
> (thus ignoring any negative eigenvalues). This would fix Alejandro's
> problem. I'll make the change in master.
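> Concretely, the difference between the two modes (a toy sketch calling
> scipy.sparse.linalg.eigsh directly):
>
> import numpy as np
> from scipy.sparse.linalg import eigsh
>
> # A symmetric matrix with a large-magnitude negative eigenvalue.
> A = np.diag([5., 3., 1., -4.])
>
> # 'LM' ranks by |eigenvalue|, so the -4 gets selected...
> print np.sort(eigsh(A, k=3, which='LM')[0])  # -> [-4.  3.  5.]
> # ...while 'LA' ranks algebraically and drops it.
> print np.sort(eigsh(A, k=3, which='LA')[0])  # -> [ 1.  3.  5.]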
> Thanks for the detail & example code in your question, Alejandro - it
> made it very easy to track down this bug.
> Jake
>
> Jacob VanderPlas wrote:
> > I looked closer: it turns out arpack is actually up to date.
> >
> > I think the bug is in the kernel pca code: eigsh should be called with
> > keyword which='LA' rather than which='LM'. The fit_transform routine
> > was finding three vectors, and then removing the one with a negative
> > eigenvalue.
> >
> > Before making this change, I want to understand what's going on. Does
> > anybody know if kernel PCA makes any assumptions about kernel form? I
> > know the kernel must be symmetric, but does the algorithm assume it's
> > positive (semi) definite?
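> > In other words, is the assumption something like the following check
> > (helper name just for illustration)?
> >
> > import numpy as np
> >
> > def is_valid_kernel(K, tol=1e-10):
> >     # Symmetric, with no eigenvalue below -tol (allowing round-off).
> >     return np.allclose(K, K.T) and np.linalg.eigvalsh(K).min() >= -tol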
> > Jake
> >
> > Jacob VanderPlas wrote:
> >> Alejandro,
> >> It looks like the problem can be traced back to the ARPACK
> >> eigensolver. If you run the code with eigen_solver='dense', it works
> >> as expected. Sometimes arpack does not converge to all the requested
> >> eigenvalues, and I guess there's no error reported when that happens.
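> >>
> >> Applied to your snippet, the workaround is just, for example:
> >>
> >> # Bypass ARPACK by requesting the dense (LAPACK) eigensolver.
> >> Y = manifold.Isomap(n_neighbors, out_dim,
> >>                     eigen_solver='dense').fit_transform(X)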
> >>
> >> I tried performing the eigenvalue decomposition using the scipy
> >> development version of arpack, and it gives 3 dimensions as
> >> expected. It may be that we can fix this by updating the arpack
> >> wrapper from scipy.
> >> Jake
> >>
> >> Alejandro Weinstein wrote:
> >>> Hi:
> >>>
> >>> I am observing an unexpected behavior of Isomap, related to the
> >>> dimensions of the transformed data. If I generate random data, say
> >>> 1000 points each with dimension 10, and fit a transform using as a
> >>> parameter out_dim=3, the fitted data has dimension (1000, 3), as
> >>> expected. However, when I repeat the same steps but this time using my
> >>> data set consisting of 427 points, each of dimension 400, the fitted
> >>> data has dimension (427, 2), i.e., the output dimension is 1 less than
> >>> out_dim. Using LLE with the same data set and parameters, the fitted
> >>> data has the expected dimension (427, 3).
> >>>
> >>> The following code illustrates the phenomenon:
> >>>
> >>> #############################################
> >>> import numpy as np
> >>> from sklearn import manifold
> >>>
> >>> n = 1000
> >>> m = 10
> >>> X = np.random.rand(n, m)
> >>> n_neighbors = 5
> >>> out_dim = 3
> >>>
> >>> # Random data: the output has the requested dimension.
> >>> Y = manifold.Isomap(n_neighbors, out_dim).fit_transform(X)
> >>> print 'Using random data and Isomap'
> >>> print 'X shape:%s, out_dim:%d, Y shape: %s' % (X.shape, out_dim, Y.shape)
> >>>
> >>> # My data set: Isomap returns one dimension less than requested.
> >>> X = np.load('X.npy')
> >>> Y = manifold.Isomap(n_neighbors, out_dim).fit_transform(X)
> >>> print
> >>> print 'Using the data X.npy and Isomap'
> >>> print 'X shape:%s, out_dim:%d, Y shape: %s' % (X.shape, out_dim, Y.shape)
> >>>
> >>> # Same data with LLE: the output dimension is as expected.
> >>> Y = manifold.LocallyLinearEmbedding(n_neighbors, out_dim).fit_transform(X)
> >>> print
> >>> print 'Using the data X.npy and LLE'
> >>> print 'X shape:%s, out_dim:%d, Y shape: %s' % (X.shape, out_dim, Y.shape)
> >>> ##################################################################
> >>>
> >>> And this is the output:
> >>>
> >>> Using random data and Isomap
> >>> X shape:(1000, 10), out_dim:3, Y shape: (1000, 3)
> >>>
> >>> Using the data X.npy and Isomap
> >>> X shape:(427, 400), out_dim:3, Y shape: (427, 2)
> >>>
> >>> Using the data X.npy and LLE
> >>> X shape:(427, 400), out_dim:3, Y shape: (427, 3)
> >>>
> >>> The code and the data set are available at
> >>> https://github.com/aweinstein/scrapcode
> >>>
> >>> In case it is relevant, the data set consists of documents represented
> >>> in the Latent Semantic Analysis space.
> >>>
> >>> Is this the expected behavior of Isomap, or is there something wrong?
> >>>
> >>> Alejandro.
> >>>
> >>>
>
>
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher