Hi Hadoop users and developers,

I have a use case where I need to produce a single large SequenceFile, about 1 TB in size. Each datanode has only 200 GB of storage, but I have 30 datanodes.

The problem is that generating one SequenceFile means a single-reducer job, and no single reducer can hold 1 TB of data during the reduce phase, even with aggressive compression. Whichever datanode runs that lone reducer will run out of space. Any comments and help are appreciated.

Jerry
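For context, the job setup I'm describing is roughly the following sketch (the mapper/reducer classes and paths are placeholders, not my actual code) — the point is the `setNumReduceTasks(1)` call that a single output file forces:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class SingleSequenceFileJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "build single sequence file");
        job.setJarByClass(SingleSequenceFileJob.class);

        // Placeholder classes -- the real job's map/reduce logic differs.
        // job.setMapperClass(MyMapper.class);
        // job.setReducerClass(MyReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // One output file requires one reducer -- this is what pushes
        // ~1 TB of merged map output through a single node's local disk.
        job.setNumReduceTasks(1);

        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```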
