Yeah, I see, OK. Both experiments with Mahout SSVD that I am familiar with dealt with inputs larger than yours element-wise, on a quite modest node count. So I don't think your input size will be a problem. But the number of singular values will be.
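
To put rough numbers on why k is the concern, here is a back-of-the-envelope sketch in Python. It uses only figures quoted further down this thread (about 1.5 minutes per Lanczos eigenvector, 4000 eigenvectors requested, SSVD time growing at about k^1.5). The function names are made up for illustration, and the constant per-vector cost and pure power-law scaling are simplifying assumptions, not measurements.

```python
# Figures from the thread; the rest is plain arithmetic under stated assumptions.
MINUTES_PER_EIGENVECTOR = 1.5   # Lanczos cost per eigenvector reported in the thread
K_REQUESTED = 4000              # number of eigenvectors originally asked for


def lanczos_hours(k, minutes_per_vector=MINUTES_PER_EIGENVECTOR):
    """Total Lanczos time in hours, assuming a constant cost per eigenvector."""
    return k * minutes_per_vector / 60.0


def ssvd_relative_time(k, k_ref, exponent=1.5):
    """Run time at rank k relative to rank k_ref, assuming time ~ k**exponent."""
    return (k / k_ref) ** exponent


if __name__ == "__main__":
    hours = lanczos_hours(K_REQUESTED)
    print(f"Lanczos, k={K_REQUESTED}: {hours:.0f} hours ({hours / 24:.1f} days)")
    # Going from k=1000 to k=4000 multiplies the assumed SSVD time by 4**1.5.
    print(f"SSVD, k=4000 vs k=1000: {ssvd_relative_time(4000, 1000):.1f}x")
    # Going from k=200 to k=1000 multiplies it by 5**1.5.
    print(f"SSVD, k=1000 vs k=200: {ssvd_relative_time(1000, 200):.1f}x")
```

Under these assumptions, quadrupling k costs roughly eight times the compute, which is why cutting k matters more than cutting input size.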
But I doubt any input will yield anything beyond k=200 except statistical noise, even if you have a good decay of the singular values. And I bet you don't need that many. You can fit significantly more 'clusters' into a 'fairly small' dimensional space.

On Jul 19, 2012 6:33 PM, "Aniruddha Basak" <[email protected]> wrote:

> Thanks Dmitriy for your reply.
> The matrix I am working on has 10-20 non-zero entries per row, so it's
> very sparse.
> I am trying to do spectral clustering, which involves eigendecomposition.
> I am wondering whether anyone has tried to do spectral clustering using
> Mahout for a very large affinity matrix (input).
>
> Aniruddha
>
>
> -----Original Message-----
> From: Dmitriy Lyubimov [mailto:[email protected]]
> Sent: Thursday, July 19, 2012 6:28 PM
> To: [email protected]
> Subject: Re: eigendecomposition of very large matrices
>
> Very significant sparsity may be a problem, though, for -q >= 1 parameters.
> Again, it depends on the hardware you have and the number of non-zero
> elements in the input. But -q=1 is still the most recommended setting here.
>
>
> On Thu, Jul 19, 2012 at 6:20 PM, Dmitriy Lyubimov <[email protected]>
> wrote:
> > You may try SSVD.
> > https://cwiki.apache.org/confluence/display/MAHOUT/Stochastic+Singular+Value+Decomposition
> >
> > But 4k eigenvectors (or, rather, singular values) is still kind of a
> > lot, and may push the precision out of the error estimates. I don't
> > think we had a precision study for that many. It also needs quite a
> > bit of memory to compute (not to mention flops). More realistically,
> > you could probably try 1k singular values. You may try more if you
> > have access to more powerful hardware than we did in the studies, but
> > distributed computation time will grow at about k^1.5, i.e. faster
> > than linear, even if you have enough nodes for the tasks.
> >
> > -d
> >
> > On Thu, Jul 19, 2012 at 6:12 PM, Aniruddha Basak <[email protected]>
> > wrote:
> >> Hi,
> >> I am working on a clustering problem which involves determining the
> >> largest "k" eigenvectors of a very large matrix. The matrices I work
> >> on are typically of the order of 10^6 by 10^6.
> >> Trying to do this using the Lanczos solver available in Mahout, I
> >> found it is very slow: it takes around 1.5 minutes to compute each
> >> eigenvector. Hence, to get 4000 eigenvectors, it takes 100 hours,
> >> or about 4 days!
> >>
> >> So I am looking for something faster to solve the eigendecomposition
> >> problem for a very large sparse matrix. Please suggest what I
> >> should use.
> >>
> >>
> >> Thanks,
> >> Aniruddha
> >>
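
For readers who want to see the general idea behind a stochastic SVD, here is a minimal single-machine sketch in Python with NumPy. It is not Mahout's implementation and says nothing about how Mahout distributes the work. The function name `randomized_svd_sketch` and the parameters `k` (rank), `p` (oversampling) and `q` (power iterations) are illustrative choices; `q` only plays a role loosely analogous to the -q option mentioned in the thread.

```python
import numpy as np


def randomized_svd_sketch(A, k, p=10, q=1, seed=0):
    """Approximate the top-k singular triplets of A (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    # Random test matrix with k + p columns; p is the oversampling amount.
    omega = rng.standard_normal((n, k + p))
    # Orthonormal basis for the sampled range of A.
    Q, _ = np.linalg.qr(A @ omega)
    # Each power iteration multiplies by A^T and A again, which helps
    # when the singular values decay slowly; re-orthonormalize each time.
    for _ in range(q):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    # Project A onto the basis and take an exact SVD of the small matrix.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_small)[:, :k], s[:k], Vt[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Small exactly rank-5 test matrix, so k=5 should recover it closely.
    A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
    U, s, Vt = randomized_svd_sketch(A, k=5)
    exact = np.linalg.svd(A, compute_uv=False)[:5]
    print(np.allclose(s, exact))
```

Note that every power iteration is two more passes over the input, and the dense intermediate matrices have k + p columns, which is one way to see why both q and k drive the cost.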
