Folks have done SVD on very large matrices with Mahout, but not necessarily
for spectral clustering.

Are you sure that you actually need 4000 vectors?  As sparse as your data
is, I would expect that no more than a few hundred are anything but
statistical noise.
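
To put rough numbers on how much the choice of k matters, here is a small sketch using only the figures quoted further down in this thread. It assumes Lanczos time stays linear in k at the reported 1.5 minutes per eigenvector, and that SSVD time grows as about k^1.5 on fixed hardware. The function names are made up for illustration and are not part of Mahout.

```python
# Back-of-the-envelope cost estimates from the figures quoted in this thread.
# Assumptions: linear-in-k Lanczos time, ~k^1.5 SSVD time on fixed hardware.
# The helper names below are hypothetical; they are not Mahout APIs.

LANCZOS_MINUTES_PER_EIGENVECTOR = 1.5  # figure reported in the original question


def lanczos_hours(k, minutes_per_vector=LANCZOS_MINUTES_PER_EIGENVECTOR):
    """Estimated Lanczos wall-clock hours, assuming time grows linearly in k."""
    return k * minutes_per_vector / 60.0


def ssvd_relative_time(k, k_ref=1000, exponent=1.5):
    """SSVD time at rank k relative to rank k_ref, assuming ~k^exponent growth."""
    return (k / k_ref) ** exponent


if __name__ == "__main__":
    for k in (300, 1000, 4000):
        print(f"k={k:5d}  Lanczos ~{lanczos_hours(k):6.1f} h  "
              f"SSVD ~{ssvd_relative_time(k):.2f}x the k=1000 run")
```

Under these assumptions, dropping from 4000 to 300 vectors cuts the Lanczos estimate from about 100 hours to about 7.5, and a rank-4000 SSVD run would take roughly 8 times as long as a rank-1000 one.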

On Thu, Jul 19, 2012 at 6:32 PM, Aniruddha Basak <[email protected]> wrote:

> Thanks Dmitriy for your reply.
> The matrix I am working on has 10-20 nonzero entries per row, so it is
> very sparse.
> I am trying to do spectral clustering, which involves eigendecomposition.
> I am wondering whether anyone has tried to do spectral clustering using
> Mahout for a very large affinity matrix (input).
>
> Aniruddha
>
>
> -----Original Message-----
> From: Dmitriy Lyubimov [mailto:[email protected]]
> Sent: Thursday, July 19, 2012 6:28 PM
> To: [email protected]
> Subject: Re: eigendecomposition of very large matrices
>
> Very significant sparsity may be a problem for -q >= 1, though.
> Again, it depends on the hardware you have and the number of nonzero
> elements in the input, but -q=1 is still the most recommended setting here.
>
>
> On Thu, Jul 19, 2012 at 6:20 PM, Dmitriy Lyubimov <[email protected]>
> wrote:
> > you may try SSVD.
> > https://cwiki.apache.org/confluence/display/MAHOUT/Stochastic+Singular+Value+Decomposition
> >
> > But 4k eigenvectors (or, rather, singular values) is still quite a
> > lot, and may push the precision outside the error estimates. I
> > don't think we had a precision study for that many. It also needs quite
> > a bit of memory to compute (not to mention flops). More realistically,
> > you could try 1k singular values. You may try more if you have
> > access to more powerful hardware than we did in the studies, but
> > distributed computation time will grow at about k^1.5, i.e. faster
> > than linearly, even if you have enough nodes for the tasks.
> >
> > -d
> >
> > On Thu, Jul 19, 2012 at 6:12 PM, Aniruddha Basak <[email protected]> wrote:
> >> Hi,
> >> I am working on a clustering problem which involves determining the
> >> largest "k" eigenvectors of a very large matrix. The matrices I work
> >> on are typically of the order of 10^6 by 10^6.
> >> Trying to do this using the Lanczos solver available in Mahout, I
> >> found it is very slow and takes around 1.5 minutes to compute each
> >> eigenvector.
> >> Hence, to get 4000 eigenvectors, it takes 100 hours, or about 4 days!
> >>
> >> So I am looking for something faster to solve the eigendecomposition
> >> problem for a very large sparse matrix. Please suggest what I should
> >> use.
> >>
> >>
> >> Thanks,
> >> Aniruddha
> >>
>
