I looked, and this job already uses a combiner, OuterProductCombiner.
In fact it was right there in the stack trace, oops.  At least that shows
the failure is happening on the map side, while the combiner is trying to
do its job.

I am still pretty sure both io.sort.* parameters are relevant here.
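For reference, a sketch of passing those settings on the command line, using the Hadoop 1.x parameter names (io.sort.mb sizes the map-side sort buffer; io.sort.factor sets the merge fan-in). Whether the ssvd driver forwards generic -D options is an assumption here; setting them in mapred-site.xml works regardless:

```shell
# Sketch only: assumes the ssvd driver forwards Hadoop generic -D options.
# io.sort.mb     - map-side sort buffer size in MB (bigger buffer = fewer spills)
# io.sort.factor - number of streams merged at once in the sort/merge phase
mahout ssvd \
  -Dio.sort.mb=256 \
  -Dio.sort.factor=100 \
  --rank 400 --computeU true --computeV true --reduceTasks 3 \
  --input ${INPUT} --output ${OUTPUT} -ow --tempDir /tmp/ssvdtmp/
```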

Anyway, I found what I was thinking of: yes, this appears to be a known
bug that is about to be fixed:

https://issues.apache.org/jira/browse/MAPREDUCE-5028
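For anyone following along, the point of a combiner is to pre-aggregate map output locally before it is spilled and shuffled. A minimal plain-Java sketch of that idea (this is not the Hadoop API; in a real MapReduce job you register one with job.setCombinerClass(...), which is valid when the reduce function is associative and commutative):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of what a combiner does: pre-aggregate (key, value)
// pairs emitted by a mapper so less data is spilled to disk and shuffled
// to the reducers. NOT the Hadoop API; see job.setCombinerClass(...) for
// enabling a real combiner.
public class CombinerSketch {
    // Collapse the mapper's output into per-key partial sums.
    static Map<String, Long> combine(List<Map.Entry<String, Long>> mapOutput) {
        Map<String, Long> combined = new HashMap<>();
        for (Map.Entry<String, Long> kv : mapOutput) {
            combined.merge(kv.getKey(), kv.getValue(), Long::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> out = List.of(
            Map.entry("a", 1L), Map.entry("b", 2L), Map.entry("a", 3L));
        // Three map records shrink to two keys before the shuffle (a=4, b=2).
        System.out.println(combine(out));
    }
}
```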


On Wed, May 22, 2013 at 6:35 PM, Dmitriy Lyubimov <[email protected]> wrote:

> i am actually not sure how to manipulate use of combiners in hadoop. All i
> can say that the code does make extensive use of combiners but they were
> always "on" for me. I had no idea one might turn their use off.
>
>
> On Wed, May 22, 2013 at 6:17 AM, Jakub Pawłowski
> <[email protected]>wrote:
>
> > Yes, I was tuning io.sort.factor too; it speeds up the reducer, and
> > values around 30 give good results for me.
> > But my problem is not the reducer; my problem is the Bt-job map tasks
> > that spill to disk.
> >
> > You mentioned a Combiner; how can I turn it on? I'm running my job from
> > the console like this:
> >
> > mahout ssvd --rank 400 --computeU true --computeV true --reduceTasks 3
> >  --input ${INPUT} --output ${OUTPUT} -ow --tempDir /tmp/ssvdtmp/
> >
> > The document at
> > https://cwiki.apache.org/MAHOUT/stochastic-singular-value-decomposition.data/SSVD-CLI.pdf
> > doesn't mention anything about a combiner.
> >
> > Thanks for your answer.
> >
> >
> >
> > On 22.05.2013 14:59, Sean Owen wrote:
> >
> >> I feel like I've seen this too and it's just a bug. You're not running
> >> out of memory.
> >>
> >> Are you also setting io.sort.factor? That can help too. You might try
> >> values as high as 100.
> >>
> >> Also, have you tried a Combiner? If you can apply one, it should help
> >> too, as it is designed to reduce the amount of data spilled.
> >>
> >>
> >
>
