> ABt-job  37mins, 22sec
This job should run in less time than Bt-job (i.e., under 9 minutes in your
case), I think. In my tests it did. Is this with the 922 patch?

And it should be mentioned that the cluster probably couldn't accommodate
all the generated tasks at once. Is that a correct assessment?
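
For anyone mapping the job names in the timings quoted below onto the math:
the pipeline is essentially a randomized SVD in the Halko/Martinsson/Tropp
style. Here is a minimal single-machine numpy sketch of what each job
computes; names and structure are illustrative only, not the actual Mahout
API:

import numpy as np

def ssvd_sketch(A, k=10, p=5, q=1):
    """Rough analogue of the SSVD pipeline: k = rank, p = oversampling,
    q = number of power iterations."""
    n = A.shape[1]
    r = k + p
    omega = np.random.randn(n, r)            # random projection matrix
    Q, _ = np.linalg.qr(A @ omega)           # Q-job: orthonormal basis of A*Omega
    Bt = A.T @ Q                             # Bt-job: B^T = A^T * Q
    for _ in range(q):                       # power iterations
        Q, _ = np.linalg.qr(A @ Bt)          # ABt-job: A*B^T = (A A^T) Q, then re-QR
        Bt = A.T @ Q                         # the second Bt-job in the timings
    evals, Uhat = np.linalg.eigh(Bt.T @ Bt)  # small (k+p)x(k+p) eigenproblem (B B^T)
    top = np.argsort(evals)[::-1][:k]
    sigma = np.sqrt(np.maximum(evals[top], 0.0))  # approximate singular values
    U = Q @ Uhat[:, top]                     # U-job: left singular vectors
    V = (Bt @ Uhat[:, top]) / sigma          # right singular vectors
    return U, sigma, V

With q = 1 this gives exactly the job sequence in the timings below:
Q-job, Bt-job, ABt-job, Bt-job, then U-job.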


On Sat, Dec 17, 2011 at 2:58 PM, Sebastian Schelter <[email protected]> wrote:
> On 17.12.2011 17:27, Dmitriy Lyubimov wrote:
>> Interesting.
>>
>> Well so how did your decomposing go?
>
> I tested the decomposition of the wikipedia pagelink graph (130M edges,
> 5.6M vertices, which makes approximately a quarter of a billion non-zeros
> in the symmetric adjacency matrix) on a 6-machine Hadoop cluster.
>
> Got these running times for k = 10, p = 5 and one power-iteration:
>
> Q-job    1mins, 41sec
> Bt-job   9mins, 30sec
> ABt-job  37mins, 22sec
> Bt-job   9mins, 41sec
> U-job    30sec
>
> I think I'd need a couple more machines to handle the twitter graph
> though...
>
> --sebastian
>
>
>> On Dec 17, 2011 6:00 AM, "Sebastian Schelter" <[email protected]>
>> wrote:
>>
>>> Hi there,
>>>
>>> I've been playing with Mahout lately to decompose the adjacency matrices
>>> of large graphs. I stumbled upon a paper by Christos Faloutsos's group
>>> that describes a variation of the Lanczos algorithm which they run on
>>> top of Hadoop for this. They even explicitly mention Mahout:
>>>
>>> "Very recently(March 2010), the Mahout project [2] provides
>>> SVD on top of HADOOP. Due to insufficient documentation, we were not
>>> able to find the input format and run a head-to-head comparison. But,
>>> reading the source code, we discovered that Mahout suffers from two
>>> major issues: (a) it assumes that the vector (b, with n=O(billion)
>>> entries) fits in the memory of a single machine, and (b) it implements
>>> the full re-orthogonalization which is inefficient."
>>>
>>> http://www.cs.cmu.edu/~ukang/papers/HeigenPAKDD2011.pdf
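
For context on the full re-orthogonalization point: in Lanczos it means
every new basis vector gets orthogonalized against all previously generated
ones, so work and storage grow with every step. A rough numpy sketch of the
idea (not the paper's or Mahout's actual code):

import numpy as np

def lanczos_full_reortho(matvec, n, steps):
    # Lanczos iteration on a symmetric operator given by matvec(v).
    # The marked line re-orthogonalizes against ALL j+1 stored basis
    # vectors at step j: O(n*j) extra work per step, and every vector
    # must be kept around -- the inefficiency the paper refers to.
    V = np.zeros((n, steps + 1))
    v = np.random.randn(n)
    V[:, 0] = v / np.linalg.norm(v)
    alphas, betas = [], []
    for j in range(steps):
        w = matvec(V[:, j])
        alphas.append(V[:, j] @ w)
        w = w - V[:, :j + 1] @ (V[:, :j + 1].T @ w)   # full re-orthogonalization
        beta = np.linalg.norm(w)
        if beta < 1e-12:
            break
        betas.append(beta)
        V[:, j + 1] = w / beta
    return np.array(alphas), np.array(betas), V

With n = O(billion) as in the quote, keeping all the basis vectors around
for re-orthogonalization is exactly what makes this expensive.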
>>>
>>> --sebastian
>>>
>>
>
