Oh, my commit for the 922 branch did not go through in its entirety for
some reason. I need to fix this.


On Sat, Dec 17, 2011 at 3:25 PM, Dmitriy Lyubimov <[email protected]> wrote:
> Yeah, ok, I requested a slightly smaller k+p, so maybe not 4 mins for
> AB', but it should still be slightly under the Bt running time (8 mins,
> perhaps).
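>
> The ABt pass multiplies A (m x n) by B' (n x (k+p)), so its cost should
> grow roughly linearly with k+p. A back-of-the-envelope sketch of that
> scaling (purely illustrative, not measured Mahout behavior):
>
>     # The ABt-job makes one pass over A, doing (k+p) multiply-adds
>     # per non-zero, so its time should scale as nnz(A) * (k + p).
>     def abt_cost_ratio(kp_a, kp_b):
>         """Illustrative runtime ratio of two ABt runs on the same A."""
>         return kp_a / kp_b
>
>     print(abt_cost_ratio(10, 15))  # k+p of 10 vs. 15 -> ~0.67x the time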
>
> On Sat, Dec 17, 2011 at 3:13 PM, Dmitriy Lyubimov <[email protected]> wrote:
>> In my tests, the ABt-job took under 3 minutes per mapper and practically
>> no time for reducing, so it should run in about 4 minutes on a cluster
>> with sufficient capacity (in your case, something like 10-11 nodes, it
>> seemed). Ok, I'll rerun it in our QA environment on Monday to see what's
>> happening.
>>
>> On Sat, Dec 17, 2011 at 3:04 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>> ABt-job  37mins, 22sec
>>> This one should run in less time than the Bt-job (under 9 minutes in
>>> your case), I think; in my tests it did. Is this with the 922 patch?
>>>
>>> And it should be mentioned that the cluster size couldn't accommodate
>>> all the generated tasks; is that a correct assessment?
>>>
>>>
>>> On Sat, Dec 17, 2011 at 2:58 PM, Sebastian Schelter <[email protected]> wrote:
>>>> On 17.12.2011 17:27, Dmitriy Lyubimov wrote:
>>>>> Interesting.
>>>>>
>>>>> Well so how did your decomposing go?
>>>>
>>>> I tested the decomposition of the Wikipedia pagelink graph (130M edges,
>>>> 5.6M vertices, making approximately a quarter of a billion non-zeros in
>>>> the symmetric adjacency matrix) on a 6-machine Hadoop cluster.
>>>>
>>>> Got these running times for k = 10, p = 5 and one power iteration:
>>>>
>>>> Q-job    1mins, 41sec
>>>> Bt-job   9mins, 30sec
>>>> ABt-job  37mins, 22sec
>>>> Bt-job   9mins, 41sec
>>>> U-job    30sec
>>>>
>>>> I think I'd need a couple more machines to handle the Twitter graph
>>>> though...
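>>>>
>>>> For reference, here is a minimal single-node numpy analogue of what the
>>>> five jobs compute for k = 10, p = 5 and one power iteration (my reading
>>>> of the stochastic SVD pipeline; an illustrative sketch, not the Mahout
>>>> code itself):
>>>>
>>>>     import numpy as np
>>>>
>>>>     def ssvd_sketch(A, k=10, p=5, q=1, seed=0):
>>>>         rng = np.random.default_rng(seed)
>>>>         n = A.shape[1]
>>>>         # Q-job: Y = A * Omega for a random n x (k+p) Omega,
>>>>         # then orthogonalize.
>>>>         Q, _ = np.linalg.qr(A @ rng.standard_normal((n, k + p)))
>>>>         # Bt-job: B' = A' * Q, i.e. B = Q' * A, of size (k+p) x n.
>>>>         Bt = A.T @ Q
>>>>         for _ in range(q):
>>>>             Q, _ = np.linalg.qr(A @ Bt)  # ABt-job: Y_i = A * B'
>>>>             Bt = A.T @ Q                 # second Bt-job
>>>>         # U-job: SVD of the small B, projected back through Q.
>>>>         Uhat, s, Vt = np.linalg.svd(Bt.T, full_matrices=False)
>>>>         return (Q @ Uhat)[:, :k], s[:k], Vt[:k].T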
>>>>
>>>> --sebastian
>>>>
>>>>
>>>>> On Dec 17, 2011 6:00 AM, "Sebastian Schelter" <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi there,
>>>>>>
>>>>>> I've been playing with Mahout lately to decompose the adjacency
>>>>>> matrices of large graphs. I stumbled on a paper co-authored by
>>>>>> Christos Faloutsos that describes a variation of the Lanczos
>>>>>> algorithm they run for this on top of Hadoop. They even explicitly
>>>>>> mention Mahout:
>>>>>>
>>>>>> "Very recently(March 2010), the Mahout project [2] provides
>>>>>> SVD on top of HADOOP. Due to insufficient documentation, we were not
>>>>>> able to find the input format and run a head-to-head comparison. But,
>>>>>> reading the source code, we discovered that Mahout suffers from two
>>>>>> major issues: (a) it assumes that the vector (b, with n=O(billion)
>>>>>> entries) fits in the memory of a single machine, and (b) it implements
>>>>>> the full re-orthogonalization which is inefficient."
>>>>>>
>>>>>> http://www.cs.cmu.edu/~ukang/papers/HeigenPAKDD2011.pdf
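>>>>>>
>>>>>> For context: "full re-orthogonalization" means that at step i the new
>>>>>> Lanczos vector is re-orthogonalized against all i previous basis
>>>>>> vectors, which costs O(n*i) extra work per step. A bare illustrative
>>>>>> sketch for a symmetric A, not the Mahout implementation:
>>>>>>
>>>>>>     import numpy as np
>>>>>>
>>>>>>     def lanczos_full_reorth(A, steps, seed=0):
>>>>>>         rng = np.random.default_rng(seed)
>>>>>>         v = rng.standard_normal(A.shape[0])
>>>>>>         V = [v / np.linalg.norm(v)]
>>>>>>         alphas, betas = [], []
>>>>>>         for _ in range(steps):
>>>>>>             w = A @ V[-1]
>>>>>>             alphas.append(V[-1] @ w)
>>>>>>             # The expensive part: re-orthogonalize against ALL
>>>>>>             # previous basis vectors, not only the last two.
>>>>>>             for u in V:
>>>>>>                 w = w - (u @ w) * u
>>>>>>             beta = np.linalg.norm(w)
>>>>>>             if beta < 1e-12:
>>>>>>                 break
>>>>>>             betas.append(beta)
>>>>>>             V.append(w / beta)
>>>>>>         return np.array(alphas), np.array(betas)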
>>>>>>
>>>>>> --sebastian
>>>>>>
>>>>>
>>>>
