PPS: if you do insist on having a large k, then you'll benefit from a smaller
HDFS block size, not a larger one.
On Jul 19, 2012 10:50 PM, "Dmitriy Lyubimov" <[email protected]> wrote:
> Yeah, I see, OK. Both experiments conducted with Mahout SSVD that I am
> familiar with dealt with inputs larger than yours element-wise, on a
> quite modest node count. So I don't think your input size will be a
> problem. But the number of singular values will be.
>
> I doubt any input will yield anything useful beyond k=200 but
> statistical noise, even with a good decay of the singular values. And I
> bet you don't need that many: you can fit significantly more "clusters"
> in a fairly small dimensional space.
> On Jul 19, 2012 6:33 PM, "Aniruddha Basak" <[email protected]> wrote:
>
>> Thanks, Dmitriy, for your reply.
>> The matrix I am working on has 10-20 non-zero entries per row, so it is
>> very sparse.
>> I am trying to do spectral clustering, which involves eigendecomposition.
>> I am wondering whether anyone has tried spectral clustering with Mahout
>> for a very large affinity matrix (input).
>>
>> Aniruddha
>>
>>
>> -----Original Message-----
>> From: Dmitriy Lyubimov [mailto:[email protected]]
>> Sent: Thursday, July 19, 2012 6:28 PM
>> To: [email protected]
>> Subject: Re: eigendecomposition of very large matrices
>>
>> Very significant sparsity may be a problem, though, for -q >= 1
>> parameters. Again, it depends on the hardware you have and the number of
>> non-zero elements in the input, but -q=1 is still the most recommended
>> setting here.
>>
>>
>> On Thu, Jul 19, 2012 at 6:20 PM, Dmitriy Lyubimov <[email protected]>
>> wrote:
>> > You may try SSVD:
>> > https://cwiki.apache.org/confluence/display/MAHOUT/Stochastic+Singular+Value+Decomposition
>> >
>> > But 4k eigenvectors (or, rather, singular values) is still kind of a
>> > lot and may push the precision out of the error estimates; I don't
>> > think we have had a precision study for that many. You also need quite
>> > a bit of memory to compute that many (not to mention flops). More
>> > realistically, you could probably try 1k singular values.
>> > You may try more if you have access to more powerful hardware than we
>> > did in the studies, but distributed computation time will grow at
>> > about k^1.5, i.e. faster than linearly, even if you have enough nodes
>> > for the tasks.
>> >
>> > -d
>> >
>> > On Thu, Jul 19, 2012 at 6:12 PM, Aniruddha Basak <[email protected]>
>> wrote:
>> >> Hi,
>> >> I am working on a clustering problem which involves determining the
>> >> largest "k" eigenvectors of a very large matrix. The matrices I work
>> >> on are typically of the order of 10^6 by 10^6.
>> >> Trying to do this using the Lanczos solver available in Mahout, I
>> >> found it very slow: it takes around 1.5 minutes to compute each
>> >> eigenvector. Hence, to get 4000 eigenvectors, it takes 100 hours, or
>> >> about 4 days!
>> >>
>> >> So I am looking for something faster to solve the eigendecomposition
>> >> problem for a very large sparse matrix. Please suggest what I should
>> >> use.
>> >>
>> >>
>> >> Thanks,
>> >> Aniruddha
>> >>
>> >
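[Editor's note: a single-machine sketch of the step Aniruddha describes, i.e. the top-k eigenpairs of a large sparse symmetric affinity matrix. SciPy's `eigsh` wraps ARPACK, an implicitly restarted Lanczos-type method, so this is the same family of solver the thread says becomes slow at 10^6 x 10^6 with k in the thousands; it is shown here only as a toy-scale baseline, and all sizes and the random matrix are illustrative, not from the thread.]

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 2000
# Sparse symmetric nonnegative "affinity" matrix with roughly 10 nonzeros
# per row, mimicking the 10-20 nonzeros per row mentioned in the thread.
A = sp.random(n, n, density=10 / n, format="csr", random_state=0)
A = (A + A.T) * 0.5  # symmetrize so eigsh applies

k = 8
# which="LA" asks ARPACK for the k algebraically largest eigenvalues,
# i.e. the "largest k eigenvectors" of the clustering problem.
vals, vecs = eigsh(A, k=k, which="LA")

# Residual check: each column of vecs should satisfy A v = lambda v.
residual = np.linalg.norm(A @ vecs - vecs * vals)
```

At this toy scale the solve is instantaneous; the thread's point is that per-eigenvector cost on a 10^6 x 10^6 input makes this approach impractical for k = 4000, which is why SSVD is suggested instead.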

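[Editor's note: the stochastic SVD discussed above can be sketched on a single machine as the randomized SVD of Halko, Martinsson, and Tropp, which is the algorithm Mahout's SSVD distributes. The `q` argument below plays the role of SSVD's -q power-iteration option and `p` its oversampling; the function name, parameters, and rank-5 test matrix are illustrative, not Mahout's API.]

```python
import numpy as np

def rand_svd(A, k, p=10, q=1, seed=0):
    """Approximate top-k SVD of A with oversampling p and q power iterations."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Random projection captures an approximate basis for the range of A.
    Y = A @ rng.standard_normal((n, k + p))
    Q, _ = np.linalg.qr(Y)
    # Power iterations sharpen the subspace when singular values decay
    # slowly; this is what SSVD's -q option controls.
    for _ in range(q):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    # Small dense SVD of the projected matrix B = Q^T A.
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k, :]

# Tiny demo on an exactly rank-5 matrix: the sketch recovers it to
# numerical precision.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 120))
U, s, Vt = rand_svd(A, k=5)
err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
```

For a symmetric positive semi-definite affinity matrix the singular vectors coincide with the eigenvectors, which is why the thread treats SSVD output as a stand-in for the eigendecomposition needed by spectral clustering.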