I looked closer: turns out arpack is actually up-to-date.

I think the bug is in the kernel PCA code: eigsh should be called with 
the keyword argument which='LA' rather than which='LM'.  The 
fit_transform routine was finding three vectors and then removing the 
one with a negative eigenvalue.
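To see the difference concretely, here's a minimal sketch (the matrix 
is made up) of how which='LM' ranks eigenvalues by magnitude and so can 
return a large negative one, while which='LA' ranks them algebraically:

#############################################
import numpy as np
from scipy.sparse.linalg import eigsh

# Made-up indefinite symmetric matrix: eigenvalues 5, 4, 3, 2, 1, -6.
A = np.diag([5.0, 4.0, 3.0, 2.0, 1.0, -6.0])

# 'LM' selects by largest magnitude, so the -6 eigenvalue is included.
vals_lm, _ = eigsh(A, k=3, which='LM')
print(np.sort(vals_lm))   # [-6.  4.  5.]

# 'LA' selects the algebraically largest eigenvalues instead.
vals_la, _ = eigsh(A, k=3, which='LA')
print(np.sort(vals_la))   # [ 3.  4.  5.]
#############################################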

Before making this change, I want to understand what's going on.  Does 
anybody know if kernel PCA makes any assumptions about kernel form?  I 
know the kernel must be symmetric, but does the algorithm assume it's 
positive (semi)definite?
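For what it's worth, checking a given kernel matrix numerically is 
easy; here's a sketch (is_psd is just a helper I made up for this):

#############################################
import numpy as np

def is_psd(K, tol=1e-10):
    # A symmetric matrix is PSD iff its smallest eigenvalue is
    # nonnegative (up to floating-point roundoff).
    return np.linalg.eigvalsh(K).min() >= -tol

X = np.random.rand(50, 5)
K = np.dot(X, X.T)   # linear kernel matrix, PSD by construction
print(is_psd(K))                # True
print(is_psd(K - np.eye(50)))   # False: K is rank-deficient, so
                                # shifting down by 1 breaks PSD
#############################################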
   Jake

Jacob VanderPlas wrote:
> Alejandro,
> It looks like the problem can be traced back to the ARPACK 
> eigensolver.  If you run the code with eigen_solver='dense', it works 
> as expected.  Sometimes arpack does not converge to all the requested 
> eigenvalues, and I guess there's no error reported when that happens.
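>
> For reference, that call looks something like this (a sketch using 
> Isomap's eigen_solver keyword):
>
>     Y = manifold.Isomap(n_neighbors, out_dim,
>                         eigen_solver='dense').fit_transform(X)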
>
> I tried performing the eigenvalue decomposition using the scipy 
> development version of arpack, and it gives 3 dimensions as expected.  
> It may be that we can fix this by updating the arpack wrapper from scipy.
>   Jake
>
> Alejandro Weinstein wrote:
>> Hi:
>>
>> I am observing an unexpected behavior of Isomap, related to the
>> dimensions of the transformed data. If I generate random data, say
>> 1000 points each with dimension 10, and fit a transform using as a
>> parameter out_dim=3, the fitted data has dimension (1000, 3), as
>> expected. However, when I repeat the same steps but this time using my
>> data set consisting of 427 points, each of dimension 400, the fitted
>> data has dimension (427, 2), i.e., the output dimension is 1 less than
>> out_dim. Using LLE with the same data set and parameters, the fitted
>> data has the expected dimension (427, 3).
>>
>> The following code illustrates the phenomenon:
>>
>> #############################################
>> import numpy as np
>> from sklearn import manifold
>>
>> n = 1000
>> m = 10
>> X = np.random.rand(n, m)
>> n_neighbors = 5
>> out_dim = 3
>>
>> Y = manifold.Isomap(n_neighbors, out_dim).fit_transform(X)
>> print 'Using random data and Isomap'
>> print 'X shape:%s, out_dim:%d, Y shape: %s' % (X.shape, out_dim, Y.shape)
>>
>> X = np.load('X.npy')
>> Y = manifold.Isomap(n_neighbors, out_dim).fit_transform(X)
>> print
>> print 'Using the data X.npy and Isomap'
>> print 'X shape:%s, out_dim:%d, Y shape: %s' % (X.shape, out_dim, Y.shape)
>>
>> Y = manifold.LocallyLinearEmbedding(n_neighbors, out_dim).fit_transform(X)
>> print
>> print 'Using the data X.npy and LLE'
>> print 'X shape:%s, out_dim:%d, Y shape: %s' % (X.shape, out_dim, Y.shape)
>> ##################################################################
>>
>> And this is the output:
>>
>> Using random data and Isomap
>> X shape:(1000, 10), out_dim:3, Y shape: (1000, 3)
>>
>> Using the data X.npy and Isomap
>> X shape:(427, 400), out_dim:3, Y shape: (427, 2)
>>
>> Using the data X.npy and LLE
>> X shape:(427, 400), out_dim:3, Y shape: (427, 3)
>>
>> The code and the data set are available at
>> https://github.com/aweinstein/scrapcode
>>
>> In case it is relevant, the data set consists of documents represented
>> in the Latent Semantic Analysis space.
>>
>> Is this the expected behavior of Isomap, or is there something wrong?
>>
>> Alejandro.
>>
