Lanczos is probably dominated by overhead and startup costs on such a small matrix. You only have 100,000 non-zero elements, which is a truly tiny problem. Stochastic projection SVD, for instance, would compute the answer for such a problem in a few milliseconds.
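To make the stochastic projection idea concrete, here is a minimal single-machine sketch in Python with NumPy. It is not Mahout's implementation; the function name `randomized_svd`, the parameters `oversample` and `power_iters`, and the helper `time_on_poster_sized_matrix` are all hypothetical names chosen for illustration, and the helper assumes SciPy is installed.

```python
import numpy as np


def randomized_svd(A, rank, oversample=10, power_iters=2, seed=0):
    """Approximate the top `rank` singular triplets of A by random projection.

    A may be a dense ndarray or a scipy.sparse matrix; only `A @ X`,
    `A.T @ X` and `A.shape` are used. All names here are illustrative.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = min(rank + oversample, m, n)

    # Project A onto a random k-dimensional subspace to capture its range.
    Q, _ = np.linalg.qr(A @ rng.standard_normal((n, k)))

    # A few power iterations sharpen the subspace when the singular
    # values decay slowly; re-orthonormalize each time for stability.
    for _ in range(power_iters):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)

    # B = Q^T A is only k x n, so an exact SVD of it is cheap.
    B = (A.T @ Q).T
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)

    # Map the left singular vectors back to the original space.
    return (Q @ U_small)[:, :rank], s[:rank], Vt[:rank]


def time_on_poster_sized_matrix(rank=20):
    """Time the sketch on a random 10000x1000 matrix with 1% non-zeros."""
    import time
    import scipy.sparse as sp  # assumption: SciPy is available

    A = sp.random(10000, 1000, density=0.01, format="csr", random_state=0)
    start = time.perf_counter()
    randomized_svd(A, rank)
    return A.nnz, time.perf_counter() - start
```

Calling `time_on_poster_sized_matrix()` returns the non-zero count and the elapsed seconds on your own machine, which gives a rough single-node baseline to compare against the cluster timings. The cost is a handful of sparse matrix products plus one small dense SVD, so none of the work needs to be distributed at this size.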
You need a much larger problem to show parallel gain. Try 100 x 10^6 non-zeros or more.

On Mon, Jul 4, 2011 at 11:27 PM, agnonchik <[email protected]> wrote:

> What could be the reason of a poor Lanczos SVD scalability on cluster? I
> don't observe any speed-up at all increasing the number of nodes. What am I
> doing wrong?
>
> I'm processing a 10000x1000 matrix with 1% non-zeros. The elapsed CPU time
> scales like this:
> 1 slave node - 89m39.399s
> 2 slave nodes - 93m47.435s
> 8 slave nodes - 89m20.821s
>
> I checked the output, cleanEigenvectors - they are mathematically correct.
>
> Cluster specs:
> Intel Core2 Duo E7200 @ 2.53 GHz CPUs
> Gigabit Ethernet
> each node has 80GB hard drive
>
> I saved the matrix in the sequential format to HDFS. Should I save it in
> another format to be processed in parallel?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Lanczos-SVD-scalability-tp3139790p3139790.html
> Sent from the Mahout User List mailing list archive at Nabble.com.
>
