I'm not an expert at tuning YARN, but you can try TeraSort, running the same benchmark on both MRv1 and YARN. I think that Arun and his team could be a very good help with this.

Some links:
http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/
http://www.slideshare.net/tungld/terasort
http://sortbenchmark.org/
http://www.mapr.com/press-release/mapr-and-google-compute-engine-set-new-world-record-for-hadoop-terasort
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/examples/terasort/package-summary.html
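If it helps when you rerun the tests after tuning, here is a minimal Python sketch for comparing the two frameworks side by side. It only does arithmetic on the wall-clock times quoted further down in this thread. The names `TIMINGS`, `yarn_speedup` and `summarize` are my own illustrative choices, not part of any Hadoop tool.

```python
# Wall-clock times in seconds, copied from the results quoted in this thread.
# Replace them with your own numbers after each tuning round.
TIMINGS = {
    "teragen": {"MRv1": 193, "YARN": 69},
    "terasort": {"MRv1": 451, "YARN": 1136},
}


def yarn_speedup(mrv1_seconds, yarn_seconds):
    """Return how many times faster YARN is than MRv1 (below 1.0 means slower)."""
    return mrv1_seconds / yarn_seconds


def summarize(timings):
    """Return one (benchmark, speedup, delta_seconds) tuple per benchmark.

    delta_seconds is the YARN time minus the MRv1 time, so a negative
    value means YARN finished earlier.
    """
    rows = []
    for name, t in timings.items():
        speedup = yarn_speedup(t["MRv1"], t["YARN"])
        delta = t["YARN"] - t["MRv1"]
        rows.append((name, speedup, delta))
    return rows


if __name__ == "__main__":
    for name, speedup, delta in summarize(TIMINGS):
        print(f"{name}: YARN speedup x{speedup:.2f}, delta {delta:+d} s")
```

Keeping the numbers in one table like this makes it easy to add a row per configuration change and see which setting actually moves the terasort time.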
If you do this, it would be nice to share your results in a blog post or a research article, to spread the word about your findings.

Best wishes.

2013/6/6 sam liu <[email protected]>

> At the beginning, I just wanted to do a quick comparison of MRv1 and Yarn.
> But they have many differences, and to keep the comparison fair I did not
> tune their configurations at all. That is how I got the above test results.
> After analyzing the test results, no doubt, I will configure them and do
> the comparison again.
>
> Do you have any thoughts on the current test results? I think that,
> compared with MRv1, Yarn is better in the Map phase (teragen test), but
> worse in the Reduce phase (terasort test).
> And any detailed suggestions/comments/materials on Yarn performance
> tuning?
>
> Thanks!
>
>
> 2013/6/7 Marcos Luis Ortiz Valmaseda <[email protected]>
>
>> Why not tune the configurations?
>> Both frameworks have many areas to tune:
>> - Combiners, Shuffle optimization, Block size, etc.
>>
>>
>>
>> 2013/6/6 sam liu <[email protected]>
>>
>>> Hi Experts,
>>>
>>> We are thinking about whether to use Yarn or not in the near future, and
>>> I ran teragen/terasort on Yarn and MRv1 for comparison.
>>>
>>> My env is a three-node cluster, and each node has similar hardware: 2
>>> cpu(4 core), 32 mem. Both the Yarn and MRv1 clusters are set up on the
>>> same env. To be fair, I did not do any performance tuning on their
>>> configurations, but used the default configuration values.
>>>
>>> Before testing, I thought Yarn would be much better than MRv1 if they both
>>> used the default configuration, because Yarn is a better framework than MRv1.
>>> However, the test results show some differences:
>>>
>>> MRv1: Hadoop-1.1.1
>>> Yarn: Hadoop-2.0.4
>>>
>>> (A) Teragen: generate 10 GB data:
>>> - MRv1: 193 sec
>>> - Yarn: 69 sec
>>> *Yarn is 2.8 times faster than MRv1*
>>>
>>> (B) Terasort: sort 10 GB data:
>>> - MRv1: 451 sec
>>> - Yarn: 1136 sec
>>> *Yarn is 2.5 times slower than MRv1*
>>>
>>> After a quick analysis, I think the direct cause might be that Yarn is
>>> much faster than MRv1 in the Map phase, but much slower in the Reduce phase.
>>>
>>> Here I have two questions:
>>> *- Why do my tests show Yarn is worse than MRv1 for terasort?*
>>> *- What's the strategy for tuning Yarn performance? Are there any materials?*
>>>
>>> Thanks!
>>>
>>
>>
>>
>> --
>> Marcos Ortiz Valmaseda
>> Product Manager at PDVSA
>> http://about.me/marcosortiz
>>
>

--
Marcos Ortiz Valmaseda
Product Manager at PDVSA
http://about.me/marcosortiz
