Hi,

We've just upgraded our cluster from Hadoop 0.20.203 to 1.0.4 and have hit 
performance problems.  Before the upgrade a 15TB terasort took about 45 
minutes; afterwards it takes just over an hour.  Looking in more detail, it 
appears the shuffle phase has increased from 20 minutes to 40 minutes.  Does 
anyone have any thoughts about what's changed between these releases that may 
have caused this?

The only change to the system has been to Hadoop.  We moved from a tarball 
install of 0.20.203 with all processes running as hadoop to an RPM deployment 
of 1.0.4 with processes running as hdfs and mapred.  Nothing else has changed.

As a related question, we're still running with a configuration that was tuned 
for version 0.20.1. Are there any recommendations for tuning properties that 
have been introduced in recent versions that are worth investigating?

Thanks,
Jon

Reply via email to