Why not tune the configurations? Both frameworks have many areas to tune:
- Combiners, shuffle optimization, block size, etc.
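As a rough illustration of where such tuning lives, here is a minimal Python sketch that renders a few Hadoop 2.x properties as site-file XML. The property names are the Hadoop 2.x ones for sort buffer size, map output compression, shuffle parallel copies, and HDFS block size; the values are placeholders chosen only for illustration, not recommendations, and the helper name `render_site_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative values only (assumptions, not tuned recommendations).
# The first three properties belong in mapred-site.xml, the last in hdfs-site.xml.
ILLUSTRATIVE_SETTINGS = {
    "mapred-site.xml": {
        "mapreduce.task.io.sort.mb": "256",               # map-side sort buffer
        "mapreduce.map.output.compress": "true",          # compress shuffle data
        "mapreduce.reduce.shuffle.parallelcopies": "10",  # parallel shuffle fetches
    },
    "hdfs-site.xml": {
        "dfs.blocksize": "268435456",                     # 256 MB blocks
    },
}


def render_site_xml(settings):
    """Build a <configuration> document from a name -> value mapping."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


rendered = {fname: render_site_xml(props)
            for fname, props in ILLUSTRATIVE_SETTINGS.items()}
for fname, xml_text in rendered.items():
    print(fname)
    print(xml_text)
```

Any such change should be applied to both clusters and re-measured, since the right values depend on the hardware and the job.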
2013/6/6 sam liu <[email protected]>

> Hi Experts,
>
> We are thinking about whether to use Yarn in the near future, and I
> ran teragen/terasort on Yarn and MRv1 for comparison.
>
> My env is a three-node cluster, and each node has similar hardware: 2 CPUs
> (4 cores), 32 mem. Both the Yarn and MRv1 clusters are set up on the same
> env. To be fair, I did not do any performance tuning on their
> configurations, but used the default configuration values.
>
> Before testing, I thought Yarn would be much better than MRv1 if both
> used the default configuration, because Yarn is a better framework than
> MRv1. However, the test results show some differences:
>
> MRv1: Hadoop-1.1.1
> Yarn: Hadoop-2.0.4
>
> (A) Teragen: generate 10 GB data:
> - MRv1: 193 sec
> - Yarn: 69 sec
> *Yarn is 2.8 times better than MRv1*
>
> (B) Terasort: sort 10 GB data:
> - MRv1: 451 sec
> - Yarn: 1136 sec
> *Yarn is 2.5 times worse than MRv1*
>
> After a quick analysis, I think the direct cause might be that Yarn is much
> faster than MRv1 in the Map phase, but much worse in the Reduce phase.
>
> Here I have two questions:
> *- Why do my tests show Yarn is worse than MRv1 for terasort?*
> *- What's the strategy for tuning Yarn performance? Are there any materials?*
>
> Thanks!
>

-- 
Marcos Ortiz Valmaseda
Product Manager at PDVSA
http://about.me/marcosortiz
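The "2.8 times" and "2.5 times" figures in the quoted message follow directly from the reported timings. A small Python check, using only the numbers given there (the helper name `ratio` is just for illustration):

```python
def ratio(slower_sec, faster_sec):
    """How many times longer the slower run took than the faster one."""
    return slower_sec / faster_sec


# Timings in seconds, as reported in the quoted message.
teragen = ratio(193, 69)     # MRv1 time / Yarn time
terasort = ratio(1136, 451)  # Yarn time / MRv1 time

print(f"teragen:  MRv1 took {teragen:.1f}x as long as Yarn")
print(f"terasort: Yarn took {terasort:.1f}x as long as MRv1")
```

Both ratios come from single runs, so they say nothing about run-to-run variance.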
