I dug around a bit, and found some info about kernel form in this document:
http://people.kyb.tuebingen.mpg.de/lcayton/resexam.pdf

MDS (on which Isomap is based) assumes a Euclidean distance matrix, 
which can be shown to always yield a positive semidefinite kernel.  In 
the case of Isomap, the distance matrix is not Euclidean in general, and 
this can be fixed by ignoring any eigenvectors associated with negative 
eigenvalues.
I think, based on this, that KernelPCA is correct as written, except 
that the arpack method should use which='LA' rather than which='LM' 
(thus ignoring any negative eigenvalues).  This would fix Alejandro's 
problem.  I'll make the change in master.
Thanks for the detail & example code in your question Alejandro - it 
made it very easy to track down this bug.
   Jake

Jacob VanderPlas wrote:
> I looked closer: turns out arpack is actually up-to-date.
>
> I think the bug is in the kernel pca code: eigsh should be called with 
> keyword which='LA' rather than which='LM'.  The fit_transform routine 
> was finding three vectors, and then removing the one with a negative 
> eigenvalue.
>
> Before making this change, I want to understand what's going on.  Does 
> anybody know if kernel PCA makes any assumptions about kernel form?  I 
> know the kernel must be symmetric, but does the algorithm assume it's 
> positive (semi) definite?
>   Jake
>
> Jacob VanderPlas wrote:
>> Alejandro,
>> It looks like the problem can be traced back to the ARPACK 
>> eigensolver.  If you run the code with eigen_solver='dense', it works 
>> as expected.  Sometimes arpack does not converge to all the requested 
>> eigenvalues, and I guess there's no error reported when that happens.
>>
>> I tried performing the eigenvalue decomposition using the scipy 
>> development version of arpack, and it gives 3 dimensions as 
>> expected.  It may be that we can fix this by updating the arpack 
>> wrapper from scipy.
>>   Jake
>>
>> Alejandro Weinstein wrote:
>>> Hi:
>>>
>>> I am observing an unexpected behavior of Isomap, related to the
>>> dimensions of the transformed data. If I generate random data, say
>>> 1000 points each with dimension 10, and fit a transform using as a
>>> parameter out_dim=3, the fitted data has dimension (1000, 3), as
>>> expected. However, when I repeat the same steps but this time using my
>>> data set consisting of 427 points, each of dimension 400, the fitted
>>> data has dimension (427, 2), i.e., the output dimension is 1 less than
>>> out_dim. Using LLE with the same data set and parameters, the fitted
>>> data has the expected dimension (427, 3).
>>>
>>> The following code illustrate the phenomena:
>>>
>>> #############################################
>>> import numpy as np
>>> from sklearn import manifold
>>>
>>> n = 1000;
>>> m = 10;
>>> X = np.random.rand(n,m)
>>> n_neighbors = 5
>>> out_dim = 3
>>>
>>> Y = manifold.Isomap(n_neighbors, out_dim).fit_transform(X)
>>> print 'Using random data and Isomap'
>>> print 'X shape:%s, out_dim:%d, Y shape: %s' % (X.shape, out_dim, 
>>> Y.shape)
>>>
>>> X = np.load('X.npy')
>>> Y = manifold.Isomap(n_neighbors, out_dim).fit_transform(X)
>>> print
>>> print 'Using the data X.npy and Isomap'
>>> print 'X shape:%s, out_dim:%d, Y shape: %s' % (X.shape, out_dim, 
>>> Y.shape)
>>>
>>> Y = manifold.LocallyLinearEmbedding(n_neighbors, 
>>> out_dim).fit_transform(X)
>>> print
>>> print 'Using the data X.npy and LLE'
>>> print 'X shape:%s, out_dim:%d, Y shape: %s' % (X.shape, out_dim, 
>>> Y.shape)
>>> ##################################################################
>>>
>>> And this is the output:
>>>
>>> Using random data and Isomap
>>> X shape:(1000, 10), out_dim:3, Y shape: (1000, 3)
>>>
>>> Using the data X.npy and Isomap
>>> X shape:(427, 400), out_dim:3, Y shape: (427, 2)
>>>
>>> Using the data X.npy and LLE
>>> X shape:(427, 400), out_dim:3, Y shape: (427, 3)
>>>
>>> The code and the data set is available at
>>> https://github.com/aweinstein/scrapcode
>>>
>>> In case it is relevant, the data set consist of documents represented
>>> in the Latent Semantic Analysis space.
>>>
>>> Is this the expected behavior of Isomap, or is there something wrong?
>>>
>>> Alejandro.
>>>
>>> ------------------------------------------------------------------------------
>>>  
>>>
>>> RSA(R) Conference 2012
>>> Save $700 by Nov 18
>>> Register now
>>> http://p.sf.net/sfu/rsa-sfdev2dev1
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>   

------------------------------------------------------------------------------
RSA(R) Conference 2012
Save $700 by Nov 18
Register now
http://p.sf.net/sfu/rsa-sfdev2dev1
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

Reply via email to