Hi Hadoop users and developers,

I have a use case where I need to produce a large sequence file, about 1 TB
in size. Each datanode has 200 GB of storage, but I have 30 datanodes.

The problem is that no single reducer can hold 1 TB of data during the
reduce phase to generate a single sequence file, even with aggressive
compression. Since this is a single-reducer job, the datanode running the
reducer will run out of space.
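
For context, the single-reducer setup described above would be submitted
roughly like this (the jar, driver class, and paths are hypothetical
placeholders; the -D properties are the standard Hadoop 2.x names for the
reducer count and block-compressed output):

```shell
# Hypothetical submission of the single-reducer job described above:
# one reduce task, block-compressed SequenceFile output.
hadoop jar my-job.jar MyDriver \
  -D mapreduce.job.reduces=1 \
  -D mapreduce.output.fileoutputformat.compress=true \
  -D mapreduce.output.fileoutputformat.compress.type=BLOCK \
  -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec \
  /input/path /output/path
```

Note that the final output file itself is split into HDFS blocks spread
across datanodes; it is the single reducer's local disk, used during the
shuffle and merge, that hits the 200 GB limit.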

Any comment and help is appreciated.

Jerry
