On 17.12.2011 17:27, Dmitriy Lyubimov wrote:
> Interesting.
> 
> Well so how did your decomposing go?

I tested the decomposition of the Wikipedia pagelink graph (130M edges,
5.6M vertices, i.e. approx. a quarter of a billion non-zeros in the
symmetric adjacency matrix) on a 6-machine Hadoop cluster.
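
For the non-zero count, the back-of-the-envelope arithmetic is simply that
every directed pagelink contributes (at most) two entries to the symmetrized
matrix. A quick sketch (an upper bound, it ignores self-links and reciprocal
links):

    # rough non-zero count for the symmetrized adjacency matrix A + A^T
    edges = 130e6        # directed pagelinks
    vertices = 5.6e6     # pages
    nnz = 2 * edges      # ~2.6e8, roughly a quarter of a billion
    density = nnz / (vertices ** 2)   # ~8e-9, i.e. extremely sparse
    print(nnz, density)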

Got these running times for k = 10, p = 5 and one power-iteration:

Q-job    1 min, 41 sec
Bt-job   9 min, 30 sec
ABt-job  37 min, 22 sec
Bt-job   9 min, 41 sec
U-job    30 sec
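
For anyone curious what the individual jobs compute: they correspond to the
steps of the randomized SVD of Halko et al. that the stochastic SVD code
distributes over Hadoop. Below is a rough in-memory numpy sketch of the same
pipeline for k=10, p=5 and one power iteration; the job names in the comments
are my mapping, and the distributed code of course never materializes the
dense intermediates like this:

    import numpy as np

    def ssvd_sketch(A, k=10, p=5, q=1, seed=0):
        """In-memory sketch of the randomized SVD pipeline (Halko et al.).
        A: the (sparse, symmetric) adjacency matrix, k: target rank,
        p: oversampling, q: number of power iterations."""
        rng = np.random.default_rng(seed)
        n = A.shape[1]
        r = k + p

        # Q-job: random projection Y = A * Omega, then orthonormalize.
        Omega = rng.standard_normal((n, r))
        Q, _ = np.linalg.qr(A @ Omega)

        # Bt-job: B^T = A^T * Q  (B is the small (k+p) x n matrix Q^T A).
        Bt = A.T @ Q

        for _ in range(q):
            # ABt-job: power iteration, Y = A * B^T = (A A^T) Q,
            # followed by re-orthonormalization.
            Q, _ = np.linalg.qr(A @ Bt)
            # Bt-job (second pass): recompute B^T against the refreshed Q.
            Bt = A.T @ Q

        # U-job: SVD of the small matrix B, then lift back: U = Q * U_hat.
        Uhat, s, Vt = np.linalg.svd(Bt.T, full_matrices=False)
        U = Q @ Uhat
        return U[:, :k], s[:k], Vt[:k].T

This also explains why the Bt-job shows up twice in the timings above: the
power iteration re-runs it against the refreshed Q.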

I think I'd need a couple more machines to handle the Twitter graph
though...

--sebastian


> On Dec 17, 2011 6:00 AM, "Sebastian Schelter" <[email protected]>
> wrote:
> 
>> Hi there,
>>
>> Lately I've been playing with Mahout to decompose the adjacency matrices
>> of large graphs. I stumbled upon a paper by Christos Faloutsos that
>> describes a variation of the Lanczos algorithm they use for this on top
>> of Hadoop.
>> They even explicitly mention Mahout:
>>
>> "Very recently(March 2010), the Mahout project [2] provides
>> SVD on top of HADOOP. Due to insufficient documentation, we were not
>> able to find the input format and run a head-to-head comparison. But,
>> reading the source code, we discovered that Mahout suffers from two
>> major issues: (a) it assumes that the vector (b, with n=O(billion)
>> entries) fits in the memory of a single machine, and (b) it implements
>> the full re-orthogonalization which is inefficient."
>>
>> http://www.cs.cmu.edu/~ukang/papers/HeigenPAKDD2011.pdf
>>
>> --sebastian
>>
> 
