Hi Ted,
Thanks for your reply.
I am clustering 10^6 objects (hence an affinity matrix of that size) and
expect 4,000-10,000 clusters. That is why I need that many eigenvectors.

Will SVD be faster in this case?

Aniruddha 



On Jul 19, 2012, at 7:20 PM, "Ted Dunning" <[email protected]> wrote:

> Folks have done SVD on very large matrices with Mahout, but not necessarily
> for spectral clustering.
> 
> Are you sure that you actually need 4000 vectors?  As sparse as your data
> is, I would expect that no more than a few hundred are anything but
> statistical noise.
> 
> On Thu, Jul 19, 2012 at 6:32 PM, Aniruddha Basak <[email protected]> wrote:
> 
>> Thanks Dmitriy for your reply.
>> The matrix I am working on has 10-20 non-zero entries per row, so it is
>> very sparse.
>> I am trying to do spectral clustering, which involves an eigendecomposition.
>> I am wondering whether anyone has tried spectral clustering in Mahout
>> with a very large affinity matrix as input.
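For reference, the pipeline being described (normalize the affinity matrix, take its top-k eigenvectors, then run k-means on the rows of the embedding) can be sketched with small, in-memory NumPy/SciPy code. This is illustrative only; the function names and the symmetric-normalization choice are assumptions, not Mahout's spectral clustering job:

```python
import numpy as np
from scipy.sparse import csr_matrix, diags
from scipy.sparse.linalg import eigsh

def spectral_embed(W, k):
    """Top-k eigenvectors of the normalized affinity D^{-1/2} W D^{-1/2}.
    Rows of the result are then fed to k-means; one eigenvector per cluster
    is the usual rule, which is why k clusters call for k eigenvectors."""
    d = np.asarray(W.sum(axis=1)).ravel()               # node degrees
    d_inv_sqrt = diags(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = d_inv_sqrt @ W @ d_inv_sqrt                     # stays sparse
    vals, vecs = eigsh(L, k=k, which='LA')              # largest eigenvalues
    return vecs

# Toy symmetric affinity: two disconnected 3-node cliques.
rows = [0, 1, 0, 2, 1, 2, 3, 4, 3, 5, 4, 5]
cols = [1, 0, 2, 0, 2, 1, 4, 3, 5, 3, 5, 4]
W = csr_matrix((np.ones(12), (rows, cols)), shape=(6, 6))
E = spectral_embed(W, k=2)  # rows within a clique get identical embeddings
```

With two disconnected cliques the top-2 eigenspace is spanned by the clique indicator vectors, so the embedding separates the two groups exactly; real affinity graphs only approximate this.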
>> 
>> Aniruddha
>> 
>> 
>> -----Original Message-----
>> From: Dmitriy Lyubimov [mailto:[email protected]]
>> Sent: Thursday, July 19, 2012 6:28 PM
>> To: [email protected]
>> Subject: Re: eigendecomposition of very large matrices
>> 
>> Very significant sparsity may be a problem, though, for -q >= 1.
>> Again, it depends on the hardware you have and the number of non-zero
>> elements in the input, but -q=1 is still the recommended setting here.
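For intuition, what -q controls can be sketched with the standard randomized SVD recipe (Halko, Martinsson, and Tropp), the algorithm family SSVD implements: each power iteration is an extra pass over the sparse input that sharpens the projection subspace. Plain NumPy/SciPy, illustrative only; none of these names are Mahout's API:

```python
import numpy as np
from scipy import sparse
from scipy.linalg import qr, svd

def rand_svd(A, k, q=1, p=10, seed=0):
    """Approximate top-k SVD of sparse A via random projection.
    q power iterations (the analogue of SSVD's -q) improve accuracy when
    singular values decay slowly; p is the oversampling margin."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[1], k + p))  # random test matrix
    Y = A @ Omega                                     # one pass over A
    for _ in range(q):                                # q extra passes: (A A^T)^q Y
        Y = A @ (A.T @ Y)
    Q, _ = qr(Y, mode='economic')                     # orthonormal range basis
    B = (A.T @ Q).T                                   # small (k+p) x n matrix
    Ub, s, Vt = svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

# Tiny usage example on a random sparse matrix.
A = sparse.random(200, 200, density=0.05, random_state=1, format='csr')
U, s, Vt = rand_svd(A, k=5, q=1)
```

The sketch makes the cost structure visible: each pass is O(nnz(A) * (k + p)) work on the sparse input, which is why both the non-zero count and q matter.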
>> 
>> 
>> On Thu, Jul 19, 2012 at 6:20 PM, Dmitriy Lyubimov <[email protected]>
>> wrote:
>>> you may try SSVD.
>>> https://cwiki.apache.org/confluence/display/MAHOUT/Stochastic+Singular+Value+Decomposition
>>> 
>>> But 4k eigenvectors (or, rather, singular vectors) is still a lot and
>>> may push the precision outside the error estimates; I don't think we
>>> have a precision study for that many. You also need quite a bit of
>>> memory to compute that (not to mention flops). More realistically, you
>>> could try 1k singular values. You may try more if you have access to
>>> more powerful hardware than we had in the studies, but distributed
>>> computation time will grow at about k^1.5, i.e. faster than linearly,
>>> even if you have enough nodes for the tasks.
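The k^1.5 growth cited above implies, for example, that going from k=1,000 to k=4,000 singular values multiplies distributed compute time by roughly 4^1.5 = 8, even with ample nodes. As quick arithmetic (the exponent is the rule of thumb from the message, not a benchmark):

```python
def time_ratio(k_new, k_old, exponent=1.5):
    """Relative compute time implied by t ~ k^exponent scaling."""
    return (k_new / k_old) ** exponent

ratio = time_ratio(4000, 1000)  # roughly 8x the compute for 4x the vectors
```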
>>> 
>>> -d
>>> 
>>> On Thu, Jul 19, 2012 at 6:12 PM, Aniruddha Basak <[email protected]> wrote:
>>>> Hi,
>>>> I am working on a clustering problem which involves determining the
>>>> largest "k" eigenvectors of a very large matrix. The matrices, I work
>>>> on, are typically of the order of 10^6 by 10^6.
>>>> Trying to do this with the Lanczos solver available in Mahout, I found
>>>> it very slow: it takes around 1.5 minutes to compute each eigenvector,
>>>> so getting 4000 eigenvectors would take about 100 hours, or 4 days!
>>>> 
>>>> So I am looking for something faster to solve the eigendecomposition
>>>> problem for a very large sparse matrix. Please suggest what I should use.
>>>> 
>>>> 
>>>> Thanks,
>>>> Aniruddha
>>>> 
>> 
