Have you taken a look at the contents of the 'mapred' directory?
St.Ack
On Wed, Dec 22, 2010 at 10:34 AM, rajgopalv <[email protected]> wrote:
> Jean-Daniel Cryans <jdcry...@...> writes:
>>
>> Look on your disks, using du -hs, to see what eats all that space.
>>
>> J-D
>>
>> On Tue, Dec 21, 2010 at 11:12 PM, rajgopalv <raja.f...@...> wrote:
>> >
>> > I'm doing a map reduce job to create the HFileOutputFormat output out of CSVs.
>> >
>> > * The mapreduce job operates on 75 files, each containing 1 million rows.
>> >   The total comes to 16GB. [With a replication factor of 2, the total DFS
>> >   used is 32GB.]
>> > * There are 300 map tasks.
>> > * The map phase completes perfectly.
>> > * There are 3 slave nodes (each with a 145GB hard disk), so
>> >   job.setNumReduceTasks(3) gives 3 reducers.
>> > * When the reduce phase is about to end, the space on all the slave nodes
>> >   runs out.
>> >
>> > I am confused. Why does my space run out during reduce (in the shuffle
>> > phase)?
>> > --
>> > View this message in context:
>> > http://old.nabble.com/Non-DFS-space-usage-blows-up.-tp30511999p30511999.html
>> > Sent from the HBase User mailing list archive at Nabble.com.
>>
>
> Dear Jean,
>
> The "mapred" directory eats the space. It occupies around 75GB on every
> machine. Why is that?
>
> As per my understanding, every map task takes a block (which is local to the
> machine) and spills its output to the local disk. The reducer then shuffles
> the maps' output, copies everything to its local disk, and reduces it. So in
> the worst case, each reducer should hold 16GB (because the seed data is 16GB).
>
> I have no idea why the disk is getting full.
>
> I'm trying a variant of this code:
> https://issues.apache.org/jira/browse/HBASE-2378
>
> Thanks and regards,
> Rajgopal V
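
A likely culprit is the intermediate map output that gets spilled under
mapred.local.dir and is only cleaned up once the job finishes. One common way
to shrink it is to compress the map output. The sketch below is only a rough
illustration, not the HBASE-2378 code itself; it uses the Hadoop 0.20-era
property names and the org.apache.hadoop.mapreduce API, and the class and job
names are placeholders:

// Minimal sketch: enable compression of intermediate map output so the
// spill files under mapred.local.dir stay smaller. Placeholder names only.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;

public class CsvToHFilesJobSetup {
  public static Job createJob(Configuration conf) throws Exception {
    // Compress map output before it is spilled and shuffled
    // (Hadoop 0.20-era configuration keys).
    conf.setBoolean("mapred.compress.map.output", true);
    conf.setClass("mapred.map.output.compression.codec",
                  GzipCodec.class, CompressionCodec.class);

    Job job = new Job(conf, "csv-to-hfiles");
    job.setNumReduceTasks(3);
    // Mapper, HFileOutputFormat, and input/output paths would be configured
    // here exactly as in the original bulk-load job.
    return job;
  }
}

Compressing the map output cuts both the spill files on the map side and the
copies each reducer pulls to its local disk during the shuffle.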
