I dug around a bit, and found some info about kernel form in this document:
http://people.kyb.tuebingen.mpg.de/lcayton/resexam.pdf
MDS (on which Isomap is based) assumes a Euclidean distance matrix,
which can be shown to always yield a positive semidefinite kernel. In
the case of Isomap, the geodesic distance matrix is not Euclidean in
general, so the derived kernel can have negative eigenvalues; this can
be fixed by ignoring any eigenvectors associated with negative
eigenvalues.
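To make that concrete, here is a small sketch (not the scikit-learn
code, just the classical MDS double-centering on a toy dataset) showing
where the kernel comes from and why Euclidean distances keep it
positive semidefinite:

#############################################
import numpy as np

def center_distance_matrix(D):
    # Classical MDS: turn a (squared) distance matrix into a Gram
    # ("kernel") matrix via double centering, K = -1/2 * J * D^2 * J.
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * np.dot(np.dot(J, D ** 2), J)

# Euclidean distances between random points: the centered kernel is
# positive semidefinite (smallest eigenvalue ~ 0 up to round-off).
X = np.random.rand(30, 5)
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
K = center_distance_matrix(D)
print(np.linalg.eigvalsh(K).min())

# For Isomap the entries of D are graph shortest-path (geodesic)
# distances, which are not Euclidean in general, so K can pick up
# negative eigenvalues; the fix is to drop the associated eigenvectors.
#############################################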
I think, based on this, that KernelPCA is correct as written, except
that the arpack method should use which='LA' rather than which='LM'
(thus ignoring any negative eigenvalues). This would fix Alejandro's
problem. I'll make the change in master.
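Roughly, the difference between the two modes, on a toy matrix
(illustration only, not the actual KernelPCA code):

#############################################
import numpy as np
from scipy.sparse.linalg import eigsh

# Toy symmetric matrix standing in for an indefinite kernel.
K = np.diag([6., 5., 4., 3., 2., 1., 0.5, 0.1, 0., -10.])

# which='LM' returns the eigenvalues of largest *magnitude*, so the
# large negative eigenvalue (-10) is among them, and a wanted positive
# eigenvalue is lost once negative eigenvalues are discarded.
print(eigsh(K, k=3, which='LM')[0])

# which='LA' returns the *algebraically* largest eigenvalues (6, 5, 4),
# which is what we want when negative eigenvalues are to be ignored.
print(eigsh(K, k=3, which='LA')[0])
#############################################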
Thanks for the detail & example code in your question, Alejandro - it
made it very easy to track down this bug.
Jake
Jacob VanderPlas wrote:
> I looked closer: turns out arpack is actually up-to-date.
>
> I think the bug is in the kernel PCA code: eigsh should be called with
> keyword which='LA' rather than which='LM'. The fit_transform routine
> was finding three vectors, and then removing the one with a negative
> eigenvalue.
>
> Before making this change, I want to understand what's going on. Does
> anybody know if kernel PCA makes any assumptions about kernel form? I
> know the kernel must be symmetric, but does the algorithm assume it's
> positive (semi) definite?
> Jake
>
> Jacob VanderPlas wrote:
>> Alejandro,
>> It looks like the problem can be traced back to the ARPACK
>> eigensolver. If you run the code with eigen_solver='dense', it works
>> as expected. Sometimes arpack does not converge to all the requested
>> eigenvalues, and I guess there's no error reported when that happens.
>>
>> I tried performing the eigenvalue decomposition using the scipy
>> development version of arpack, and it gives 3 dimensions as
>> expected. It may be that we can fix this by updating the arpack
>> wrapper from scipy.
>> Jake
>>
>> Alejandro Weinstein wrote:
>>> Hi:
>>>
>>> I am observing an unexpected behavior of Isomap, related to the
>>> dimensions of the transformed data. If I generate random data, say
>>> 1000 points each with dimension 10, and fit a transform using as a
>>> parameter out_dim=3, the fitted data has dimension (1000, 3), as
>>> expected. However, when I repeat the same steps but this time using my
>>> data set consisting of 427 points, each of dimension 400, the fitted
>>> data has dimension (427, 2), i.e., the output dimension is 1 less than
>>> out_dim. Using LLE with the same data set and parameters, the fitted
>>> data has the expected dimension (427, 3).
>>>
>>> The following code illustrates the phenomenon:
>>>
>>> #############################################
>>> import numpy as np
>>> from sklearn import manifold
>>>
>>> n = 1000
>>> m = 10
>>> X = np.random.rand(n, m)
>>> n_neighbors = 5
>>> out_dim = 3
>>>
>>> Y = manifold.Isomap(n_neighbors, out_dim).fit_transform(X)
>>> print 'Using random data and Isomap'
>>> print 'X shape:%s, out_dim:%d, Y shape: %s' % (X.shape, out_dim, Y.shape)
>>>
>>> X = np.load('X.npy')
>>> Y = manifold.Isomap(n_neighbors, out_dim).fit_transform(X)
>>> print
>>> print 'Using the data X.npy and Isomap'
>>> print 'X shape:%s, out_dim:%d, Y shape: %s' % (X.shape, out_dim, Y.shape)
>>>
>>> Y = manifold.LocallyLinearEmbedding(n_neighbors, out_dim).fit_transform(X)
>>> print
>>> print 'Using the data X.npy and LLE'
>>> print 'X shape:%s, out_dim:%d, Y shape: %s' % (X.shape, out_dim, Y.shape)
>>> ##################################################################
>>>
>>> And this is the output:
>>>
>>> Using random data and Isomap
>>> X shape:(1000, 10), out_dim:3, Y shape: (1000, 3)
>>>
>>> Using the data X.npy and Isomap
>>> X shape:(427, 400), out_dim:3, Y shape: (427, 2)
>>>
>>> Using the data X.npy and LLE
>>> X shape:(427, 400), out_dim:3, Y shape: (427, 3)
>>>
>>> The code and the data set are available at
>>> https://github.com/aweinstein/scrapcode
>>>
>>> In case it is relevant, the data set consists of documents represented
>>> in the Latent Semantic Analysis space.
>>>
>>> Is this the expected behavior of Isomap, or is there something wrong?
>>>
>>> Alejandro.
>>>