Re: Why Yarn has worse performance for terasort, than MRv1?

Robert Evans Wed, 12 Jun 2013 07:24:24 -0700

It is rather difficult to tell without looking at the details of your
configuration.  In our benchmarks on a 350-node cluster, terasort on 0.23.3
was about 5% faster than on 1.0.2.  How many map and reduce tasks were
launched for each?  How long did the map, shuffle, and reduce phases take?
Did you flush the file system caches between runs?  How many runs did you
do on each system?


--Bobby

On 6/6/13 9:11 PM, "sam liu" <[email protected]> wrote:

>Hi Experts,
>
>We are thinking about whether to use Yarn in the near future, so I
>ran teragen/terasort on Yarn and MRv1 for comparison.
>
>My environment is a three-node cluster, and each node has similar
>hardware: 2 CPUs (4 cores), 32 mem. Both the Yarn and MRv1 clusters are
>set up on the same environment. To be fair, I did no performance tuning on
>either configuration and used the default configuration values.
>
>Before testing, I expected Yarn to be much better than MRv1 when both use
>the default configuration, because Yarn is a better framework than MRv1.
>However, the test results show some differences:
>
>MRv1: Hadoop-1.1.1
>Yarn: Hadoop-2.0.4
>
>(A) Teragen: generate 10 GB data:
>- MRv1: 193 sec
>- Yarn: 69 sec
>*Yarn is 2.8 times better than MRv1*
>
>(B) Terasort: sort 10 GB data:
>- MRv1: 451 sec
>- Yarn: 1136 sec
>*Yarn is 2.5 times worse than MRv1*
>
>After a quick analysis, I think the direct cause might be that Yarn is much
>faster than MRv1 in the map phase, but much slower in the reduce phase.
>
>Here I have two questions:
>*- Why do my tests show that Yarn is worse than MRv1 for terasort?*
>*- What is the strategy for tuning Yarn performance? Are there any
>materials?*
>
>Thanks!
>-- 
>
>Sam Liu
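
For reference, the ratios quoted in the original post can be re-derived
from the reported wall-clock times. The sketch below is only an
illustration: the dictionary layout and the summarize() helper are
assumptions introduced here, not part of either message. summarize() is
meant for the repeated runs Bobby asks about, and the run times passed to
it would have to come from your own measurements.

```python
from statistics import mean, stdev

# Wall-clock times in seconds, copied from the original post (one run each).
reported = {
    "teragen": {"mrv1": 193, "yarn": 69},
    "terasort": {"mrv1": 451, "yarn": 1136},
}


def summarize(runs):
    """Return (mean, sample standard deviation) for a list of run times.

    Hypothetical helper: a single run says little, so collect several
    runs per system before comparing them.
    """
    if len(runs) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    return mean(runs), stdev(runs)


# Teragen: MRv1 time divided by Yarn time (how many times faster Yarn was).
teragen_speedup = reported["teragen"]["mrv1"] / reported["teragen"]["yarn"]

# Terasort: Yarn time divided by MRv1 time (how many times slower Yarn was).
terasort_slowdown = reported["terasort"]["yarn"] / reported["terasort"]["mrv1"]

print(f"teragen:  Yarn is {teragen_speedup:.1f}x faster than MRv1")
print(f"terasort: Yarn is {terasort_slowdown:.1f}x slower than MRv1")
```

Both figures come from a single run per system, so they do not show how
much of the gap is run-to-run noise.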
