PPS: if you do insist on a large k, then you'll benefit from a smaller
HDFS block size, not a larger one.
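(Rationale: the same input bytes in smaller blocks produce more input
splits and hence more map tasks for the SSVD passes. Below is a minimal
sketch of writing the SSVD input sequence file with a reduced block size;
the "dfs.block.size" property name, the path, and the sizes are
illustrative assumptions, so check them against your Hadoop version.)

import java.util.Random;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

public class WriteSsvdInput {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Smaller per-file block size => more input splits => more mappers.
    // "dfs.block.size" is the pre-2.x property name (newer: "dfs.blocksize").
    conf.setLong("dfs.block.size", 32L * 1024 * 1024); // 32 MB
    FileSystem fs = FileSystem.get(conf);
    Path out = new Path("/tmp/ssvd-input/matrix.seq"); // hypothetical path

    int n = 1000000; // column count of the affinity matrix
    Random rnd = new Random(42);
    SequenceFile.Writer w =
        SequenceFile.createWriter(fs, conf, out, IntWritable.class, VectorWritable.class);
    try {
      for (int i = 0; i < 1000; i++) { // toy run: only 1000 rows
        Vector row = new RandomAccessSparseVector(n);
        for (int nz = 0; nz < 15; nz++) { // ~10-20 non-zeros per row, as in this thread
          row.setQuick(rnd.nextInt(n), rnd.nextDouble());
        }
        w.append(new IntWritable(i), new VectorWritable(row));
      }
    } finally {
      w.close();
    }
  }
}
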
On Jul 19, 2012 10:50 PM, "Dmitriy Lyubimov" <[email protected]> wrote:

> Yeah, I see, OK. Both experiments with Mahout SSVD that I am familiar
> with dealt with inputs larger than yours element-wise, on a fairly
> modest node count. So I don't think your input size will be a problem.
> But the number of singular values will be.
>
> Also, I doubt any input will yield anything but statistical noise beyond
> k=200, even with a good decay of the singular values. And I bet you don't
> need that many: you can fit significantly more 'clusters' in a 'fairly
> small'-dimensional space.
> On Jul 19, 2012 6:33 PM, "Aniruddha Basak" <[email protected]> wrote:
>
>> Thanks, Dmitriy, for your reply.
>> The matrix I am working on has 10-20 non-zero entries per row, so it is
>> very sparse.
>> I am trying to do spectral clustering, which involves an
>> eigendecomposition. I am wondering whether anyone has tried spectral
>> clustering with Mahout on a very large affinity matrix (the input).
>>
>> Aniruddha
>>
>>
>> -----Original Message-----
>> From: Dmitriy Lyubimov [mailto:[email protected]]
>> Sent: Thursday, July 19, 2012 6:28 PM
>> To: [email protected]
>> Subject: Re: eigendecomposition of very large matrices
>>
>> Very significant sparsity may be a problem, though, for -q >= 1. Again,
>> it depends on the hardware you have and the number of non-zero elements
>> in the input, but -q=1 is still the recommended setting here.
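>>
>> (-q is the number of power iterations; each one costs roughly two more
>> distributed passes over the input, which is where the extra expense for
>> q >= 1 comes from. Here is a toy, in-core sketch of that structure using
>> the mahout-math classes from memory -- treat the API details as
>> approximate, and of course this is not the distributed implementation:)
>>
>> import java.util.Random;
>>
>> import org.apache.mahout.math.DenseMatrix;
>> import org.apache.mahout.math.Matrix;
>> import org.apache.mahout.math.QRDecomposition;
>> import org.apache.mahout.math.SingularValueDecomposition;
>>
>> public class PowerIterationSketch {
>>   public static void main(String[] args) {
>>     int m = 200, n = 150, kp = 20, q = 1; // projection width k+p, q iterations
>>     Random rnd = new Random(1);
>>     Matrix a = new DenseMatrix(m, n); // stand-in for the big sparse input
>>     for (int i = 0; i < m; i++)
>>       for (int j = 0; j < n; j++)
>>         a.set(i, j, rnd.nextGaussian());
>>     Matrix omega = new DenseMatrix(n, kp); // random Gaussian projection
>>     for (int i = 0; i < n; i++)
>>       for (int j = 0; j < kp; j++)
>>         omega.set(i, j, rnd.nextGaussian());
>>     // Y = A*Omega; Q = qr(Y).Q
>>     Matrix qm = new QRDecomposition(a.times(omega)).getQ();
>>     for (int iter = 0; iter < q; iter++) {
>>       // one power iteration: Y = A*(A'*Q), then re-orthogonalize
>>       qm = new QRDecomposition(a.times(a.transpose().times(qm))).getQ();
>>     }
>>     Matrix b = qm.transpose().times(a); // small (k+p) x n matrix
>>     // svd of B' (rows >= cols); its singular values approximate A's top ones
>>     SingularValueDecomposition svd = new SingularValueDecomposition(b.transpose());
>>     double[] sv = svd.getSingularValues();
>>     System.out.println("leading approximate singular value: " + sv[0]);
>>   }
>> }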
>>
>>
>> On Thu, Jul 19, 2012 at 6:20 PM, Dmitriy Lyubimov <[email protected]>
>> wrote:
>> > You may try SSVD:
>> > https://cwiki.apache.org/confluence/display/MAHOUT/Stochastic+Singular+Value+Decomposition
>> >
>> > But 4k eigenvectors (or, rather, singular values) is still kind of a
>> > lot and may push the precision outside the error estimates; I don't
>> > think we have had a precision study for that many. You would also need
>> > quite a bit of memory (not to mention flops) to compute that. More
>> > realistically, you might try 1k singular values. You could try more if
>> > you have access to more powerful hardware than we did in the studies,
>> > but the distributed computation time will grow at about k^1.5, i.e.
>> > faster than linearly, even if you have enough nodes for the tasks.
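>> >
>> > (To put a number on that estimate: going from k=1000 to k=4000 would
>> > multiply the distributed running time by roughly 4^1.5 = 8.)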
>> >
>> > -d
>> >
>> > On Thu, Jul 19, 2012 at 6:12 PM, Aniruddha Basak <[email protected]>
>> > wrote:
>> >> Hi,
>> >> I am working on a clustering problem that involves determining the
>> >> largest "k" eigenvectors of a very large matrix. The matrices I work
>> >> on are typically of the order of 10^6 by 10^6.
>> >> Trying to do this using the Lanczos solver available in Mahout, I
>> >> found it very slow: it takes around 1.5 minutes to compute each
>> >> eigenvector. Hence, getting 4000 eigenvectors takes 100 hours, or
>> >> about 4 days!
>> >>
>> >> So I am looking for something faster to solve the "eigendecomposition"
>> >> problem for a very large sparse matrix. Please suggest what I should
>> >> use.
>> >>
>> >>
>> >> Thanks,
>> >> Aniruddha
>> >>
>>
>
