Also make sure you are not hitting swap. If you specify 3G per task,
and you have typical quad-core commodity nodes, it probably means that
you have at least 4 map / 4 reduce task slots configured. That's 8 tasks
total, so unless you have ~32G of RAM on these machines, I think you are
in danger of hitting swap with settings that high once GC gets all
excited. I never thought I would say this, but maybe you can run it a
little more conservatively.
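
A rough back-of-the-envelope check, using the 4 map + 4 reduce slots and
3G per task from above:

  8 task slots x 3G heap                      = 24G of task heap alone
  + non-heap JVM overhead per task, the TaskTracker/DataNode daemons,
    and some room for the OS page cache

so you really want something north of 24G per node, hence the ~32G
figure, before swap is safely out of the picture.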

My cluster is cdh3u0.

-Dmitriy

On Sat, Dec 17, 2011 at 7:24 PM, Dmitriy Lyubimov <[email protected]> wrote:
> Sebastian,
>
> I think my commit got screwed up somehow. Here's a benchmark very close
> to yours (I also added a broadcast option for B through the distributed
> cache, which seems to have been about a 30% win in my case, so I leave
> it on by default).
>
> input: 4.5M x 4.5M sparse matrix with 40 non-zero elements per row; total size:
>
> -rw-r--r--   3 dmitriy supergroup 2353201134 2011-12-17 17:17
> /user/dmitriy/drm-sparse.seq
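>
> (Rough sanity check on that size, assuming something like 10 to 13
> bytes per stored entry for an index plus an 8-byte value: 4.5M rows x
> 40 non-zeros/row is about 180M entries, i.e. on the order of 2G on
> disk, which is in the same ballpark as the file above.)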
>
> This is probably a little smaller than your wikipedia test in terms of
> dimensions but not in size.
>
> ssvd -i /user/dmitriy/drm-sparse.seq -o /user/dmitriy/SSVD-OUT
> --tempDir /user/dmitriy/temp -ow -br false -k 10 -p 1 -t 20 -q 1
> -Dmapred.child.java.opts="-Xmx800m"
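>
> (Side note: the same heap cap can also be set cluster-wide in
> mapred-site.xml instead of per job; a minimal sketch using the same
> property as above:
>
>   <property>
>     <name>mapred.child.java.opts</name>
>     <value>-Xmx800m</value>
>   </property>
> )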
>
> So this is for 10 useful eigenvalues and U, V with 1 power iteration:
>
> with broadcast option:
> QJob 1 min 11s
> BtJob 1 min 58s
> AB' Job 2 min 6s
> B'Job 1 min 51 s
>
> U Job 0 min 12 s, V Job 0 min 20 s (U and V run in parallel, not sequentially)
>
> without broadcast option (only affects the AB' step):
>
> QJob 1 min 12s
> BtJob 1 min 59s
> AB' Job 3 min 3s
> B'Job 2 min 0 s
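>
> (That lines up with the ~30% win mentioned above: the AB' step goes
> from 3 min 3 s down to 2 min 6 s, i.e. 183 s -> 126 s, roughly a 31%
> reduction.)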
>
> 10 nodes, 4 CPUs each. (I had 35 splits of the original input, so
> everything got allocated.) The cluster also runs our QA stuff, which
> kicks off something new almost every minute, so while it is not heavily
> loaded, the job/task trackers are fairly busy setting up and tearing
> down other jobs.
>
> I'll commit shortly.
>
> On Sat, Dec 17, 2011 at 2:58 PM, Sebastian Schelter <[email protected]> wrote:
>> On 17.12.2011 17:27, Dmitriy Lyubimov wrote:
>>> Interesting.
>>>
>>> Well so how did your decomposing go?
>>
>> I tested the decomposition of the wikipedia pagelink graph (130M edges,
>> 5.6M vertices, making approximately a quarter of a billion non-zeros in
>> the symmetric adjacency matrix) on a 6-machine Hadoop cluster.
>>
>> Got these running times for k = 10, p = 5 and one power-iteration:
>>
>> Q-job    1mins, 41sec
>> Bt-job   9mins, 30sec
>> ABt-job  37mins, 22sec
>> Bt-job   9mins, 41sec
>> U-job    30sec
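>>
>> (In terms of the ssvd CLI flags, that run maps to roughly the
>> following; the paths here are just placeholders:
>>
>>   ssvd -i <adjacency-matrix.seq> -o <ssvd-output-dir> -k 10 -p 5 -q 1
>> )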
>>
>> I think I'd need a couple more machines to handle the twitter graph
>> though...
>>
>> --sebastian
>>
>>
>>> On Dec 17, 2011 6:00 AM, "Sebastian Schelter" <[email protected]>
>>> wrote:
>>>
>>>> Hi there,
>>>>
>>>> I've been playing with Mahout lately to decompose the adjacency
>>>> matrices of large graphs. I stumbled on a paper by Christos Faloutsos
>>>> that describes a variation of the Lanczos algorithm they use for this
>>>> on top of Hadoop.
>>>> They even explicitly mention Mahout:
>>>>
>>>> "Very recently(March 2010), the Mahout project [2] provides
>>>> SVD on top of HADOOP. Due to insufficient documentation, we were not
>>>> able to find the input format and run a head-to-head comparison. But,
>>>> reading the source code, we discovered that Mahout suffers from two
>>>> major issues: (a) it assumes that the vector (b, with n=O(billion)
>>>> entries) fits in the memory of a single machine, and (b) it implements
>>>> the full re-orthogonalization which is inefficient."
>>>>
>>>> http://www.cs.cmu.edu/~ukang/papers/HeigenPAKDD2011.pdf
>>>>
>>>> --sebastian
>>>>
>>>
>>
