Yeah, I see, OK. Both experiments I am familiar with that used Mahout SSVD
dealt with inputs larger than yours element-wise, on a quite modest node
count. So I don't think your input size will be a problem. But the number
of singular values will be.

That said, I doubt any input will yield anything useful beyond k=200 other
than statistical noise, even with a good decay of the singular values. But
I bet you don't need that many: you can fit significantly more 'clusters'
in a 'fairly small' dimensional space.
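As a back-of-envelope sanity check on the numbers in the thread below (the ~k^1.5 scaling is an estimate from this discussion, not a measured benchmark), here is a small Python sketch:

```python
# Rough cost estimates for large-k decompositions, using figures quoted
# in this thread (assumptions, not benchmarks).

def lanczos_hours(k, minutes_per_vector=1.5):
    """Total Lanczos time if each eigenvector takes a fixed time to compute."""
    return k * minutes_per_vector / 60.0

def ssvd_relative_cost(k, k_ref=1000):
    """Distributed SSVD runtime relative to a run at k_ref singular values,
    assuming the ~k^1.5 growth mentioned in the thread."""
    return (k / k_ref) ** 1.5

print(lanczos_hours(4000))       # 100 hours, matching the figure quoted below
print(ssvd_relative_cost(4000))  # going from 1k to 4k singular values costs ~8x
```

So even with enough nodes, asking for 4x the singular values costs roughly 8x the distributed runtime under that scaling assumption.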
On Jul 19, 2012 6:33 PM, "Aniruddha Basak" <[email protected]> wrote:

> Thanks Dmitriy for your reply.
> The matrix I am working on has 10-20 non-zero entries per row, so it's
> very sparse.
> I am trying to do spectral clustering, which involves eigen-decomposition.
> I am wondering whether anyone has tried to do spectral clustering using
> Mahout for a very large affinity matrix (input).
>
> Aniruddha
>
>
> -----Original Message-----
> From: Dmitriy Lyubimov [mailto:[email protected]]
> Sent: Thursday, July 19, 2012 6:28 PM
> To: [email protected]
> Subject: Re: eigendecomposition of very large matrices
>
> Very significant sparsity may be a problem, though, for -q >= 1 parameters.
> Again, it depends on the hardware you have and the # of non-zero elements in
> the input. But -q=1 is still the most recommended setting here.
>
>
> On Thu, Jul 19, 2012 at 6:20 PM, Dmitriy Lyubimov <[email protected]>
> wrote:
> > you may try SSVD.
> > https://cwiki.apache.org/confluence/display/MAHOUT/Stochastic+Singular
> > +Value+Decomposition
> >
> > but 4k eigenvectors (or, rather, singular values) is still kind of a
> > lot and may push the precision out of the error estimates. I don't
> > think we have a precision study for that many. You'd also need quite a
> > bit of memory to compute that (not to mention flops). More
> > realistically, you could try 1k singular values. You may try more if
> > you have access to more powerful hardware than we did in the studies,
> > but distributed computation time will grow at about k^1.5, i.e. faster
> > than linear, even if you have enough nodes for the tasks.
> >
> > -d
> >
> > On Thu, Jul 19, 2012 at 6:12 PM, Aniruddha Basak <[email protected]>
> wrote:
> >> Hi,
> >> I am working on a clustering problem which involves determining the
> >> largest "k" eigenvectors of a very large matrix. The matrices I work
> >> on are typically of the order of 10^6 by 10^6.
> >> Trying to do this using the Lanczos solver available in Mahout, I
> >> found it is very slow and takes around 1.5 minutes to compute each
> eigenvector.
> >> Hence to get 4000 eigenvectors, it takes 100 hours, or 4 days!
> >>
> >> So I am looking for something faster to solve the "eigendecomposition"
> >> problem for a very large sparse matrix. Please suggest what I should
> use.
> >>
> >>
> >> Thanks,
> >> Aniruddha
> >>
>
