Hi,

We've just upgraded our cluster from Hadoop 0.20.203 to 1.0.4 and have hit performance problems. Before the upgrade a 15TB terasort took about 45 minutes; afterwards it takes just over an hour. Looking in more detail, it appears the shuffle phase has increased from 20 minutes to 40 minutes. Does anyone have any thoughts about what changed between these releases that may have caused this?
The only change to the system has been to Hadoop: we moved from a tarball install of 0.20.203, with all processes running as the hadoop user, to an RPM deployment of 1.0.4, with processes running as hdfs and mapred. Nothing else has changed.

As a related question, we're still running with a configuration that was tuned for version 0.20.1. Are there any tuning properties introduced in more recent versions that are worth investigating?

Thanks,
Jon