Hi there,

I have been playing with Mahout lately to decompose the adjacency
matrices of large graphs. I stumbled upon a paper by Christos Faloutsos
that describes a variation of the Lanczos algorithm which the authors
run on top of Hadoop for this purpose. They even mention Mahout explicitly:

"Very recently(March 2010), the Mahout project [2] provides
SVD on top of HADOOP. Due to insufficient documentation, we were not
able to find the input format and run a head-to-head comparison. But,
reading the source code, we discovered that Mahout suffers from two
major issues: (a) it assumes that the vector (b, with n=O(billion)
entries) fits in the memory of a single machine, and (b) it implements
the full re-orthogonalization which is inefficient."

http://www.cs.cmu.edu/~ukang/papers/HeigenPAKDD2011.pdf
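
For anyone not familiar with what point (b) refers to, here is a toy,
in-memory sketch of Lanczos iteration with optional full
re-orthogonalization. It is neither Mahout's code nor the paper's, just
an illustration in plain Python; the function names and the
full_reorth flag are my own invention. With the flag on, every step
projects the new vector against all previous basis vectors, so step j
costs j extra dot products and the whole basis has to be kept around.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    # Dense matrix-vector product; A is a list of rows.
    return [dot(row, v) for row in A]

def lanczos(A, b, m, full_reorth=True):
    """Run up to m Lanczos steps on a symmetric matrix A, starting from b.

    Returns (alphas, betas, basis): the diagonal and off-diagonal of the
    tridiagonal matrix T, plus the Lanczos basis vectors.
    """
    norm = math.sqrt(dot(b, b))
    v = [x / norm for x in b]
    v_prev = [0.0] * len(b)
    beta = 0.0
    alphas, betas, basis = [], [], [v]
    for j in range(m):
        w = matvec(A, v)
        alpha = dot(w, v)
        # Three-term recurrence: remove components along v and v_prev.
        w = [wi - alpha * vi - beta * pi
             for wi, vi, pi in zip(w, v, v_prev)]
        if full_reorth:
            # Full re-orthogonalization: project against ALL earlier
            # basis vectors, which is the expensive part.
            for q in basis:
                c = dot(w, q)
                w = [wi - c * qi for wi, qi in zip(w, q)]
        alphas.append(alpha)
        beta = math.sqrt(dot(w, w))
        if j == m - 1 or beta < 1e-12:
            break  # done, or the Krylov subspace is exhausted
        betas.append(beta)
        v_prev, v = v, [wi / beta for wi in w]
        basis.append(v)
    return alphas, betas, basis

# Adjacency matrix of a path graph on 4 nodes (symmetric).
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
alphas, betas, basis = lanczos(A, [1.0, 0.0, 0.0, 0.0], 4)
print(alphas, betas)
```

The eigenvalues of the small tridiagonal matrix built from alphas and
betas then approximate those of A. Both b and the basis vectors live in
local lists here, which is exactly the single-machine assumption that
point (a) of the quote objects to.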

--sebastian
