I'm experiencing the following crash during reduce tasks: https://gist.github.com/slingamn/04ff3ff3412af23aa50d
on Hadoop 1.0.3 (specifically, I'm using Amazon's EMR, AMI version 2.2.1). The crash is triggered by especially unbalanced reducer inputs, i.e., when one reducer receives a disproportionately large share of the records. (The reduce task is retried three times, but since its input is the same each time, it crashes in the same place every time and the job fails.)

From the following links:

https://issues.apache.org/jira/browse/MAPREDUCE-1182
http://hadoop-common.472056.n3.nabble.com/Shuffle-In-Memory-OutOfMemoryError-td433197.html

it seems that Hadoop is supposed to prevent this from happening by intelligently managing the amount of memory provided to the shuffle, but I don't know how ironclad that guarantee is.

Can anyone advise me on how robust I can expect Hadoop to be against this issue when reducer inputs are highly unbalanced? (For reference, I've pasted below the shuffle-related settings I understand to be involved.)

Thanks very much for your time.
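My understanding of the relevant Hadoop 1.x knobs is sketched below. This is only one way of lowering the shuffle's in-memory footprint from the job configuration; the property names and defaults are my reading of the 1.0.3 documentation rather than something I've verified against my EMR cluster, and the concrete values are just illustrations.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ShuffleTuningSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Fraction of the reduce task's heap used to hold map outputs in
        // memory during the copy phase (default 0.70, I believe).
        conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.50f);
        // Usage threshold of that buffer at which an in-memory merge to
        // disk is triggered (default 0.66, I believe).
        conf.setFloat("mapred.job.shuffle.merge.percent", 0.60f);
        // Number of in-memory map outputs that also triggers a merge
        // (default 1000, I believe).
        conf.setInt("mapred.inmem.merge.threshold", 500);
        // Task JVM heap; -Xmx1024m is an illustrative value, not what EMR
        // actually configures for my instance type.
        conf.set("mapred.child.java.opts", "-Xmx1024m");

        Job job = new Job(conf, "shuffle-tuning-sketch");
        // ... set mapper/reducer/input/output paths as usual, then:
        // job.waitForCompletion(true);
    }
}

In particular, I'd like to know whether I'm expected to hand-tune these settings for skewed keys, or whether the framework's memory accounting should keep the shuffle within the reducer's heap regardless of skew.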
