It is rather difficult to tell without looking at the details of your configuration. In our benchmarks on a 350-node cluster, terasort on 0.23.3 was about 5% faster than on 1.0.2. How many map and reduce tasks were launched for each run? How long did each phase (map, shuffle, reduce) take? Did you flush the file system caches between runs? How many runs did you do on each system?
--Bobby

On 6/6/13 9:11 PM, "sam liu" <[email protected]> wrote:

>Hi Experts,
>
>We are thinking about whether to use Yarn or not in the near future, and I
>ran teragen/terasort on Yarn and MRv1 for comparison.
>
>My environment is a three-node cluster, and each node has similar hardware:
>2 cpu (4 core), 32 mem. Both the Yarn and MRv1 clusters are set up on the
>same environment. To be fair, I did not do any performance tuning on their
>configurations, but used the default configuration values.
>
>Before testing, I thought Yarn would be much better than MRv1 if both used
>the default configuration, because Yarn is a better framework than MRv1.
>However, the test results show some differences:
>
>MRv1: Hadoop-1.1.1
>Yarn: Hadoop-2.0.4
>
>(A) Teragen: generate 10 GB data:
>- MRv1: 193 sec
>- Yarn: 69 sec
>*Yarn is 2.8 times better than MRv1*
>
>(B) Terasort: sort 10 GB data:
>- MRv1: 451 sec
>- Yarn: 1136 sec
>*Yarn is 2.5 times worse than MRv1*
>
>After a quick analysis, I think the direct cause might be that Yarn is much
>faster than MRv1 in the Map phase, but much worse in the Reduce phase.
>
>Here I have two questions:
>*- Why do my tests show Yarn is worse than MRv1 for terasort?*
>*- What is the strategy for tuning Yarn performance? Are there any
>materials?*
>
>Thanks!
>--
>
>Sam Liu
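
For reference, the ratios quoted above can be reproduced from the reported wall-clock times. This is only a minimal sketch: the names `reported` and `summarize` are illustrative and not part of any Hadoop tooling. The `summarize` helper is one assumed way to compare repeated runs, since a single run per system says little about run-to-run variance.

```python
from statistics import mean, stdev

# Wall-clock times in seconds, copied from the quoted message (one run each).
reported = {
    "teragen": {"MRv1": 193, "Yarn": 69},
    "terasort": {"MRv1": 451, "Yarn": 1136},
}

# How many times faster Yarn was for teragen, and slower for terasort.
teragen_speedup = reported["teragen"]["MRv1"] / reported["teragen"]["Yarn"]
terasort_slowdown = reported["terasort"]["Yarn"] / reported["terasort"]["MRv1"]


def summarize(times):
    """Return (mean, sample standard deviation) of a list of run times.

    With a single run the spread is unknown; 0.0 is returned as a placeholder.
    """
    spread = stdev(times) if len(times) > 1 else 0.0
    return mean(times), spread


if __name__ == "__main__":
    print(f"teragen: Yarn {teragen_speedup:.2f}x faster than MRv1")
    print(f"terasort: Yarn {terasort_slowdown:.2f}x slower than MRv1")
```

Passing the timings of several repeated runs per system to `summarize` gives a mean and spread for each, which makes it easier to judge whether a difference between the two frameworks exceeds the noise between runs.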
