Thanks very much! I agree that Arun and their team could be a great help with this.
Could any expert give more comments or analysis on my tests above? Thanks in advance!

2013/6/7 Marcos Luis Ortiz Valmaseda

> I'm not an expert in tuning YARN, but you can try Terasort, doing something
> similar with MRv1 and YARN.
> I think that Arun and their team could be a great help with this.
> Some links:
>
> http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/
> http://www.slideshare.net/tungld/terasort
> http://sortbenchmark.org/
> http://www.mapr.com/press-release/mapr-and-google-compute-engine-set-new-world-record-for-hadoop-terasort
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/examples/terasort/package-summary.html
>
> If you do this, it would be nice to share your results in a blog post or
> a research article, to spread the word about your findings.
>
> Best wishes.
>
> 2013/6/6 sam liu
>
>> At the beginning, I just wanted to do a quick comparison of MRv1 and YARN.
>> But they have many differences, and to keep the comparison fair I did not
>> tune their configurations at all, which produced the results above. Having
>> analyzed them, I will certainly tune both configurations and run the
>> comparison again.
>>
>> Do you have any thoughts on the current results? My reading is that,
>> compared with MRv1, YARN is better in the map phase (teragen test) but
>> worse in the reduce phase (terasort test).
>> Do you have any detailed suggestions, comments, or materials on YARN
>> performance tuning?
>>
>> Thanks!
>>
>> 2013/6/7 Marcos Luis Ortiz Valmaseda
>>
>>> Why not tune the configurations?
>>> Both frameworks have many areas to tune:
>>> - Combiners, shuffle optimization, block size, etc.
>>>
>>> 2013/6/6 sam liu
>>>
>>>> Hi Experts,
>>>>
>>>> We are considering whether to use YARN in the near future, so I ran
>>>> teragen/terasort on YARN and MRv1 for comparison.
>>>>
>>>> My environment is a three-node cluster, and each node has similar
>>>> hardware: 2 CPUs (4 cores), 32 GB of memory. The YARN and MRv1 clusters
>>>> are set up on the same environment. To be fair, I did not do any
>>>> performance tuning on their configurations and used the default values.
>>>>
>>>> Before testing, I expected YARN to be much better than MRv1 with default
>>>> configurations on both, because YARN is a better framework than MRv1.
>>>> However, the test results show otherwise:
>>>>
>>>> MRv1: Hadoop-1.1.1
>>>> YARN: Hadoop-2.0.4
>>>>
>>>> (A) Teragen: generate 10 GB of data:
>>>> - MRv1: 193 sec
>>>> - YARN: 69 sec
>>>> *YARN is 2.8 times faster than MRv1*
>>>>
>>>> (B) Terasort: sort 10 GB of data:
>>>> - MRv1: 451 sec
>>>> - YARN: 1136 sec
>>>> *YARN is 2.5 times slower than MRv1*
>>>>
>>>> After a quick analysis, I think the direct cause might be that YARN is
>>>> much faster than MRv1 in the map phase, but much slower in the reduce
>>>> phase.
>>>>
>>>> I have two questions:
>>>> *- Why do my tests show YARN is worse than MRv1 for terasort?*
>>>> *- What is the strategy for tuning YARN performance? Are there any materials?*
>>>>
>>>> Thanks!
>>>
>>> --
>>> Marcos Ortiz Valmaseda
>>> Product Manager at PDVSA
>>> http://about.me/marcosortiz
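As a quick check on the ratios quoted in the thread, the reported wall-clock times can be turned into speedup and average-throughput figures. This is a minimal Python sketch: the timings are the ones reported above, while the throughput calculation assumes 1 GB = 1024 MB and treats each job as handling exactly 10 GB. That is a simplification, since terasort reads, shuffles, and writes the data rather than touching it once.

```python
# Wall-clock times (seconds) as reported in the thread: default configs,
# Hadoop-1.1.1 (MRv1) vs Hadoop-2.0.4 (YARN), 3-node cluster, 10 GB per job.
RESULTS = {
    "teragen": {"mrv1": 193, "yarn": 69},
    "terasort": {"mrv1": 451, "yarn": 1136},
}

DATA_GB = 10  # data volume per job, as stated in the thread
MB_PER_GB = 1024  # assumption: binary units for the throughput figure


def compare(job: str) -> tuple[float, str]:
    """Return (ratio, faster framework), where ratio = slower time / faster time."""
    mrv1 = RESULTS[job]["mrv1"]
    yarn = RESULTS[job]["yarn"]
    if yarn < mrv1:
        return mrv1 / yarn, "yarn"
    return yarn / mrv1, "mrv1"


def throughput_mb_per_s(job: str, framework: str) -> float:
    """Average MB/s, naively assuming the job processes DATA_GB exactly once."""
    return DATA_GB * MB_PER_GB / RESULTS[job][framework]


if __name__ == "__main__":
    for job in RESULTS:
        ratio, winner = compare(job)
        print(f"{job}: {winner} is {ratio:.1f}x faster")
        for fw in ("mrv1", "yarn"):
            print(f"  {fw}: {throughput_mb_per_s(job, fw):.1f} MB/s")
```

Expressing both results as "slower time / faster time" keeps the two ratios directly comparable, which is why the teragen and terasort figures come out as 2.8x and 2.5x respectively.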
