I'm running a MapReduce job to create HFileOutputFormat output from CSVs (a rough sketch of the driver is below).

* The job operates on 75 files, each containing 1 million rows; the total comes to 16GB. (With a replication factor of 2, the total DFS used is 32GB.)
* There are 300 map tasks.
* The map phase completes without problems.
* There are 3 slave nodes (each with a 145GB hard disk), so job.setNumReduceTasks(3) gives 3 reducers.
* When the reduce phase is about to finish, the disk space on all the slave nodes runs out.
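For reference, here is a minimal sketch of the kind of driver I mean, written against the HBase 0.90-era API. The mapper class, column family "f", qualifier "q", table name, and paths are placeholders, not my actual code:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CsvBulkLoadDriver {

  // Placeholder mapper: parses "rowkey,value" CSV lines into KeyValues.
  static class CsvToKeyValueMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] fields = line.toString().split(",", 2);
      byte[] row = Bytes.toBytes(fields[0]);
      KeyValue kv = new KeyValue(row, Bytes.toBytes("f"),
          Bytes.toBytes("q"), Bytes.toBytes(fields[1]));
      ctx.write(new ImmutableBytesWritable(row), kv);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "csv-to-hfiles");
    job.setJarByClass(CsvBulkLoadDriver.class);

    job.setInputFormatClass(TextInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0])); // the 75 CSV files, ~16GB

    job.setMapperClass(CsvToKeyValueMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(KeyValue.class);

    job.setNumReduceTasks(3); // one reducer per slave node, as described above

    // configureIncrementalLoad wires in the sorting reducer, the
    // TotalOrderPartitioner and HFileOutputFormat; note that it also
    // resets the reducer count to match the target table's region count.
    HTable table = new HTable(conf, "mytable"); // placeholder table name
    HFileOutputFormat.configureIncrementalLoad(job, table);

    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

(One caveat on the sketch: since configureIncrementalLoad sets the reducer count from the table's regions, an explicit setNumReduceTasks only sticks if it is applied afterwards and matches the partitioning.)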
I am confused: why does my disk space run out during the reduce phase (in the shuffle)?
