Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Marcos Luis Ortiz Valmaseda Thu, 06 Jun 2013 20:27:32 -0700

I'm not an expert at tuning YARN, but you can run Terasort in a similar way
on both MRv1 and YARN.
I think Arun and his team could be a very good help with this.
Some links:
http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/
http://www.slideshare.net/tungld/terasort
http://sortbenchmark.org/
http://www.mapr.com/press-release/mapr-and-google-compute-engine-set-new-world-record-for-hadoop-terasort
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/examples/terasort/package-summary.html


If you do this, it would be nice to share your results in a blog post or
in a research article, to spread the word about your findings.
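
When writing up the numbers, it may help to compute the ratios in one place
so they stay consistent. Here is a minimal sketch in Python that uses only
the timings reported in the quoted message below; the names timings, speedup
and summarize are just illustrative, not part of any Hadoop tool.

```python
# Timings in seconds, as reported in the quoted message (10 GB of data).
timings = {
    "teragen": {"MRv1": 193, "Yarn": 69},
    "terasort": {"MRv1": 451, "Yarn": 1136},
}


def speedup(mrv1_sec, yarn_sec):
    """Return how many times faster Yarn is than MRv1.

    A value below 1 means Yarn was slower than MRv1.
    """
    return mrv1_sec / yarn_sec


def summarize(results):
    """Map each job name to the Yarn-vs-MRv1 speedup, rounded to 2 places."""
    return {
        job: round(speedup(t["MRv1"], t["Yarn"]), 2)
        for job, t in results.items()
    }


if __name__ == "__main__":
    for job, ratio in summarize(timings).items():
        print(f"{job}: Yarn runs at {ratio}x the speed of MRv1")
```

A ratio above 1 means Yarn finished sooner; for terasort the ratio is below
1, and its inverse gives the "times worse" figure.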

Best wishes.


2013/6/6 sam liu <[email protected]>

> At the beginning, I just wanted to do a quick comparison of MRv1 and Yarn.
> But they have many differences, and to keep the comparison fair I did not
> tune their configurations at all. That is how I got the above test results.
> After analyzing the test results, I will certainly tune them and run the
> comparison again.
>
> Do you have any thoughts on the current test results? I think that, compared
> with MRv1, Yarn is better on the Map phase (teragen test), but worse on the
> Reduce phase (terasort test).
> Also, any detailed suggestions/comments/materials on Yarn performance
> tuning?
>
> Thanks!
>
>
> 2013/6/7 Marcos Luis Ortiz Valmaseda <[email protected]>
>
>> Why not tune the configurations?
>> Both frameworks have many areas to tune:
>> - Combiners, Shuffle optimization, Block size, etc.
>>
>>
>>
>> 2013/6/6 sam liu <[email protected]>
>>
>>> Hi Experts,
>>>
>>> We are considering whether to use Yarn in the near future, and
>>> I ran teragen/terasort on Yarn and MRv1 for comparison.
>>>
>>> My env is a three-node cluster, and each node has similar hardware: 2
>>> cpu (4 core), 32 mem. Both the Yarn and MRv1 clusters are set up on the
>>> same env. To be fair, I did not do any performance tuning of their
>>> configurations, but used the default configuration values.
>>>
>>> Before testing, I expected Yarn to be much better than MRv1 when both
>>> use the default configuration, because Yarn is a better framework than MRv1.
>>> However, the test results show some differences:
>>>
>>> MRv1: Hadoop-1.1.1
>>> Yarn: Hadoop-2.0.4
>>>
>>> (A) Teragen: generate 10 GB data:
>>> - MRv1: 193 sec
>>> - Yarn: 69 sec
>>> *Yarn is 2.8 times better than MRv1*
>>>
>>> (B) Terasort: sort 10 GB data:
>>> - MRv1: 451 sec
>>> - Yarn: 1136 sec
>>> *Yarn is 2.5 times worse than MRv1*
>>>
>>> After a quick analysis, I think the direct cause might be that Yarn is
>>> much faster than MRv1 on the Map phase, but much slower on the Reduce phase.
>>>
>>> Here I have two questions:
>>> *- Why do my tests show Yarn is worse than MRv1 for terasort?*
>>> *- What's the strategy for tuning Yarn performance? Are there any materials?*
>>>
>>> Thanks!
>>>
>>
>>
>>
>> --
>> Marcos Ortiz Valmaseda
>> Product Manager at PDVSA
>> http://about.me/marcosortiz
>>
>>
>


-- 
Marcos Ortiz Valmaseda
Product Manager at PDVSA
http://about.me/marcosortiz
