Thanks, Dmitriy, for your reply. The matrix I am working on has 10-20 non-zero entries per row, so it is very sparse. I am trying to do spectral clustering, which involves an eigendecomposition. Has anyone tried spectral clustering with Mahout on a very large affinity matrix as input?
Aniruddha

-----Original Message-----
From: Dmitriy Lyubimov [mailto:[email protected]]
Sent: Thursday, July 19, 2012 6:28 PM
To: [email protected]
Subject: Re: eigendecomposition of very large matrices

Very significant sparsity may be a problem, though, for -q >= 1 parameters. Again, it depends on the hardware you have and the number of non-zero elements in the input. But -q=1 is still the most recommended setting here.

On Thu, Jul 19, 2012 at 6:20 PM, Dmitriy Lyubimov <[email protected]> wrote:
> You may try SSVD.
> https://cwiki.apache.org/confluence/display/MAHOUT/Stochastic+Singular+Value+Decomposition
>
> But 4k eigenvectors (or, rather, singular values) is still kind of a
> lot and may push the precision out of the error estimates. I
> don't think we had a precision study for that many. It also needs quite
> a bit of memory to compute (not to mention flops). More realistically,
> you may try 1k singular values. You may try more if you have
> access to more powerful hardware than we did in the studies, but
> distributed computation time will grow at about k^1.5, i.e. faster
> than linear, even if you have enough nodes for the tasks.
>
> -d
>
> On Thu, Jul 19, 2012 at 6:12 PM, Aniruddha Basak <[email protected]> wrote:
>> Hi,
>> I am working on a clustering problem which involves determining the
>> largest "k" eigenvectors of a very large matrix. The matrices I work
>> on are typically of the order of 10^6 by 10^6.
>> Trying to do this using the Lanczos solver available in Mahout, I
>> found it is very slow and takes around 1.5 minutes to compute each
>> eigenvector.
>> Hence, to get 4000 eigenvectors, it takes 100 hours, or about 4 days!
>>
>> So I am looking for something faster to solve the eigendecomposition
>> problem for very large sparse matrices. Please suggest what I should use.
>>
>>
>> Thanks,
>> Aniruddha
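
For anyone skimming the thread, the figures quoted above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch in Python: the function names are invented for illustration, the 1.5 minutes per eigenvector and the roughly k^1.5 growth are simply the numbers stated in the messages, and 20 non-zeros per row is taken as the upper end of the stated range. It does not call Mahout and does not model real cluster behaviour.

```python
def lanczos_hours(num_eigenvectors, minutes_per_vector=1.5):
    """Total hours if every eigenvector costs the same fixed time (assumption)."""
    return num_eigenvectors * minutes_per_vector / 60.0


def ssvd_relative_cost(k_new, k_ref, exponent=1.5):
    """Relative distributed run time for k_new versus k_ref singular values,
    assuming the ~k^1.5 growth quoted in the thread."""
    return (k_new / k_ref) ** exponent


def density(rows, cols, nonzeros_per_row):
    """Fraction of matrix entries that are non-zero."""
    return rows * nonzeros_per_row / (rows * cols)


if __name__ == "__main__":
    hours = lanczos_hours(4000)
    print(f"Lanczos, 4000 vectors: {hours:.0f} h (~{hours / 24:.1f} days)")
    print(f"SSVD, k=4000 vs k=1000: {ssvd_relative_cost(4000, 1000):.1f}x")
    n = 10**6
    print(f"Non-zeros: {n * 20:.1e}, density: {density(n, n, 20):.1e}")
```

Under these assumptions, asking for 4k rather than 1k singular values costs about 8 times the distributed run time, which is why 1k was suggested as the more realistic starting point.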
