On 17.12.2011 17:27, Dmitriy Lyubimov wrote:
> Interesting.
>
> Well so how did your decomposing go?
I tested the decomposition of the wikipedia pagelink graph (130M edges,
5.6M vertices, making approximately a quarter of a billion non-zeros in
the symmetric adjacency matrix) on a 6-machine Hadoop cluster. I got the
following running times for k = 10, p = 5 and one power iteration:

Q-job     1 min 41 sec
Bt-job    9 min 30 sec
ABt-job  37 min 22 sec
Bt-job    9 min 41 sec
U-job          30 sec

I think I'd need a couple more machines to handle the Twitter graph
though...

--sebastian

> On Dec 17, 2011 6:00 AM, "Sebastian Schelter" <[email protected]> wrote:
>
>> Hi there,
>>
>> I have been playing with Mahout to decompose the adjacency matrices of
>> large graphs lately. I stumbled on a paper by Christos Faloutsos's group
>> that describes a variation of the Lanczos algorithm they run for this
>> on top of Hadoop. They even explicitly mention Mahout:
>>
>> "Very recently(March 2010), the Mahout project [2] provides
>> SVD on top of HADOOP. Due to insufficient documentation, we were not
>> able to find the input format and run a head-to-head comparison. But,
>> reading the source code, we discovered that Mahout suffers from two
>> major issues: (a) it assumes that the vector (b, with n=O(billion)
>> entries) fits in the memory of a single machine, and (b) it implements
>> the full re-orthogonalization which is inefficient."
>>
>> http://www.cs.cmu.edu/~ukang/papers/HeigenPAKDD2011.pdf
>>
>> --sebastian
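PS: For anyone reading along who hasn't used the SSVD job pipeline, here is a
minimal single-machine NumPy sketch of the randomized (stochastic) SVD scheme
that the Q-job/Bt-job/ABt-job/U-job stages distribute, just to show what the
k (rank), p (oversampling) and power-iteration parameters mean. It is only an
illustration of the underlying algorithm, not Mahout's code; the function name
ssvd_sketch and the comments mapping lines to particular jobs are my own rough
correspondence, not the exact job boundaries.

import numpy as np

def ssvd_sketch(A, k=10, p=5, q=1, seed=0):
    """Randomized SVD with rank k, oversampling p and q power iterations.
    In-memory illustration only; Mahout's SSVD does these steps as
    map/reduce passes over a distributed matrix."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    r = k + p                                   # width of the sketch

    # Random projection Y = A * Omega, then an orthonormal basis Q of its
    # range (roughly what the Q-job materializes).
    Omega = rng.standard_normal((n, r))
    Y = A @ Omega
    Q, _ = np.linalg.qr(Y)

    # Power iterations sharpen the basis when the singular values decay
    # slowly (roughly the extra Bt-job/ABt-job passes in the timings above).
    for _ in range(q):
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)

    # Project A into the small subspace and do an exact SVD there; B is only
    # (k+p) x n, so this part fits comfortably on one machine.
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub                                  # lift back (roughly the U-job)

    return U[:, :k], s[:k], Vt[:k, :]

# tiny usage example on a random dense matrix
A = np.random.default_rng(1).standard_normal((1000, 800))
U, s, Vt = ssvd_sketch(A, k=10, p=5, q=1)
print(s)

Note that, unlike the Lanczos criticism quoted above, this scheme needs no
re-orthogonalization and only ever keeps the thin (k+p)-column factors in
memory, which is why it scales to matrices of this size.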
