You may try SSVD: https://cwiki.apache.org/confluence/display/MAHOUT/Stochastic+Singular+Value+Decomposition
But 4k eigenvectors (or, rather, singular values) is still a lot, and may push the precision outside the error estimates. I don't think we had a precision study for that many. It also needs quite a bit of memory to compute (not to mention flops).

More realistically, you could probably try 1k singular values. You may try more if you have access to more powerful hardware than we did in the studies, but distributed computation time will grow at about k^1.5, i.e. faster than linear, even if you have enough nodes for the tasks.

-d

On Thu, Jul 19, 2012 at 6:12 PM, Aniruddha Basak <[email protected]> wrote:

> Hi,
> I am working on a clustering problem which involves determining the
> largest "k" eigenvectors of a very large matrix. The matrices, I work on,
> are typically of the order of 10^6 by 10^6.
> Trying to do this using the Lanczos solver available in Mahout, I found it
> is very slow and takes around 1.5 minutes to compute each eigenvectors.
> Hence to get 4000 eigenvectors, it takes 100 hours or 4 days !!
>
> So I am looking for something faster to solve the "Eigen decomposition"
> problem for very large sparse matrix. Please suggest me what should I use ?
>
>
> Thanks,
> Aniruddha
>
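
For a rough sense of the numbers in this thread, here is a back-of-the-envelope sketch in Python. It assumes the Lanczos cost stays constant at the reported 1.5 minutes per eigenvector, and that SSVD run time scales as roughly k^1.5, which is only the approximate rate mentioned above, not a measured model. The function names are hypothetical and nothing here calls Mahout.

```python
def lanczos_hours(num_vectors, minutes_per_vector=1.5):
    """Total Lanczos time in hours, assuming a constant cost per eigenvector."""
    return num_vectors * minutes_per_vector / 60.0


def ssvd_relative_cost(k, k_ref=1000, exponent=1.5):
    """Run time for k singular values relative to k_ref, assuming time ~ k**exponent."""
    return (k / k_ref) ** exponent


if __name__ == "__main__":
    # Reported case: 4000 eigenvectors at about 1.5 minutes each.
    print(f"Lanczos, 4000 vectors: {lanczos_hours(4000):.0f} hours")

    # How much longer a larger k would take than the suggested k = 1000,
    # under the assumed k^1.5 growth.
    for k in (1000, 2000, 4000):
        print(f"SSVD k={k}: {ssvd_relative_cost(k):.1f}x the k=1000 run time")
```

Under these assumptions, going from 1k to 4k singular values costs about 8 times as much run time rather than 4 times, which is the "faster than linear" growth referred to above.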
