Hi,
I am trying to use SSVD instead of Lanczos, as a part of Spectral Kmeans. 
However, I could not find the relation between the eigenvectors and U, V 
matrices.
Can someone please tell me, how to retrieve the eigenvectors from SSVD 
decomposition ?

Thanks,
Aniruddha



-----Original Message-----
From: Dmitriy Lyubimov [mailto:[email protected]] 
Sent: Thursday, July 19, 2012 10:53 PM
To: [email protected]
Subject: RE: eigendecomposition of very large matrices

Pps if you do insist on having a lot of k then you'll benefit from smaller hdfs 
block size, not larger.
On Jul 19, 2012 10:50 PM, "Dmitriy Lyubimov" <[email protected]> wrote:

> Yeah I see OK. Both two experiments conducted with mahout ssvd I am 
> familiar with dealt with input size greater than yours element wise, 
> on a quite modest node count. So i don't think your input size will be 
> a problem. But the number of singular values will be.
>
> But I doubt any input will yield anything useful beyond k=200 but 
> statistical noise. Even if you have a good decay of the singular values.
> But I bet you don't need that many. You can fit significantly more 
> 'clusters' on a 'fairly small' dimensional space.
> On Jul 19, 2012 6:33 PM, "Aniruddha Basak" <[email protected]> wrote:
>
>> Thanks Dmitriy for your reply.
>> The matrix I am working on, has 10-20 non zero entries per row. So 
>> its very sparse.
>> I am trying to do spectral clustering which involves eigen-decomposition.
>> I am wondering whether anyone has tried to do spectral clustering 
>> using mahout for very large affinity matrix (input).
>>
>> Aniruddha
>>
>>
>> -----Original Message-----
>> From: Dmitriy Lyubimov [mailto:[email protected]]
>> Sent: Thursday, July 19, 2012 6:28 PM
>> To: [email protected]
>> Subject: Re: eigendecomposition of very large matrices
>>
>> very significant sparsity may be a problem though for -q >=1 parameters.
>> Again, depends on the hardware you have and the # of non-zero 
>> elements in the input. but -q=1 is still the most recommended setting here.
>>
>>
>> On Thu, Jul 19, 2012 at 6:20 PM, Dmitriy Lyubimov <[email protected]>
>> wrote:
>> > you may try SSVD.
>> > https://cwiki.apache.org/confluence/display/MAHOUT/Stochastic+Singu
>> > lar
>> > +Value+Decomposition
>> >
>> > but 4k eigenvectors (or, rather, singular values) is kind of still 
>> > a lot though and may push the precision out of the error estimates. 
>> > I don't we had precision study for that many. Also need quite a bit 
>> > of memory to compute that (not to mention flops). More 
>> > realistically you probably may try 1k singular values . You may try 
>> > more if you have access to more powerful hardware than we did in 
>> > the studies but distributed computation time will grow at about 
>> > k^1.5, i.e. faster than linear, even if you have enough nodes for the 
>> > tasks.
>> >
>> > -d
>> >
>> > On Thu, Jul 19, 2012 at 6:12 PM, Aniruddha Basak 
>> > <[email protected]>
>> wrote:
>> >> Hi,
>> >> I am working on a clustering problem which involves determining 
>> >> the largest "k" eigenvectors of a very large matrix. The matrices, 
>> >> I work on, are typically of the order of 10^6 by 10^6.
>> >> Trying to do this using the Lanczos solver available in Mahout, I 
>> >> found it is very slow and takes around 1.5 minutes to compute each
>> eigenvectors.
>> >> Hence to get 4000 eigenvectors, it takes 100 hours or 4 days !!
>> >>
>> >> So I am looking for something faster to solve the "Eigen decomposition"
>> >> problem for very large sparse matrix. Please suggest me what 
>> >> should I
>> use ?
>> >>
>> >>
>> >> Thanks,
>> >> Aniruddha
>> >>
>>
>

Reply via email to