Dear Stack,

The reduce attempt, attempt_201012211759_0003_r_000002_0, blows up to 42GB on each of the slave machines [this is in the middle of the job: the map is 92% complete and the reduce is 30% complete].
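For reference, here is a rough sketch of the kind of driver described further down this thread (CSV in, HFiles out via HFileOutputFormat, three reducers). It is not my exact code: the mapper, column family/qualifier, table layout and paths are placeholders, and the partitioner and other table-specific setup of the actual HBASE-2378 variant are left out.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.KeyValueSortReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CsvToHFiles {

  // Placeholder mapper: first CSV field is the row key, second is the value.
  static class CsvMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] fields = line.toString().split(",");
      if (fields.length < 2) return;
      byte[] row = Bytes.toBytes(fields[0]);
      KeyValue kv = new KeyValue(row, Bytes.toBytes("cf"),
          Bytes.toBytes("q"), Bytes.toBytes(fields[1]));
      ctx.write(new ImmutableBytesWritable(row), kv);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "csv-to-hfiles");
    job.setJarByClass(CsvToHFiles.class);

    job.setInputFormatClass(TextInputFormat.class);
    job.setMapperClass(CsvMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(KeyValue.class);

    // Sort the KeyValues per row key and write HFiles directly.
    job.setReducerClass(KeyValueSortReducer.class);
    job.setOutputFormatClass(HFileOutputFormat.class);

    // 3 slave nodes, so 3 reducers; every reducer pulls its share of the
    // map output onto its local disk during the shuffle.
    job.setNumReduceTasks(3);

    FileInputFormat.addInputPath(job, new Path(args[0]));   // the 75 CSV files (~16GB)
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HFile output directory

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The shuffle in question happens between the mapper's output and KeyValueSortReducer: each reducer copies its share of the map output onto its local disk (under the mapred directory) before the reduce starts.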
Regards,
Rajgopal V


rajgopalv wrote:
> 
> Dear Stack,
> 
> I browsed through the directory structure, and this is how it looks:
> http://pastebin.com/vqW0FZZp
> 
> Inside every map task directory:
> 
> 24K   job.xml
> 554M  output
> 8.0K  pid
> 8.0K  split.dta
> 
> Then I did `cd output; du -sh *`:
> 
> 554M  file.out
> 8.0K  file.out.index
> 
> I guess this `file.out` is the HFileOutputFormat output of the CSV data.
> 
> Stack <st...@...> writes:
>> 
>> Have you taken a look at the content of the 'mapred' directory?
>> St.Ack
>> 
>> On Wed, Dec 22, 2010 at 10:34 AM, rajgopalv <raja.f...@...> wrote:
>>> Jean-Daniel Cryans <jdcry...@...> writes:
>>>> 
>>>> Look on your disks, using du -hs, to see what eats all that space.
>>>> 
>>>> J-D
>>>> 
>>>> On Tue, Dec 21, 2010 at 11:12 PM, rajgopalv <raja.f...@...> wrote:
>>>>> 
>>>>> I'm doing a mapreduce job to create HFileOutputFormat output out of CSVs.
>>>>> 
>>>>> * The mapreduce job operates on 75 files, each containing 1 million rows.
>>>>>   The total comes to 16GB. [With a replication factor of 2, the total DFS
>>>>>   used is 32GB.]
>>>>> * There are 300 map tasks.
>>>>> * The map phase ends perfectly.
>>>>> * There are 3 slave nodes (each with a 145GB hard disk), so
>>>>>   job.setNumReduceTasks(3) gives 3 reducers.
>>>>> * When the reduce job is about to end, the space on all the slave nodes
>>>>>   runs out.
>>>>> 
>>>>> I am confused. Why does my space run out during the reduce (in the shuffle
>>>>> phase)?
>>>>> --
>>>>> View this message in context:
>>>>> http://old.nabble.com/Non-DFS-space-usage-blows-up.-tp30511999p30511999.html
>>>>> Sent from the HBase User mailing list archive at Nabble.com.
>>>>> 
>>>> 
>>> 
>>> Dear Jean,
>>> 
>>> The "mapred" directory eats the space. It occupies around 75GB on every
>>> machine. Why is that?
>>> 
>>> As per my understanding, every map task takes a block (which is local to
>>> the machine) and spills its output to the hard disk. The reducer then
>>> shuffles the map output, brings everything local, and reduces it. So in
>>> the worst case, each reducer should hold 16GB (because the seed data is
>>> 16GB).
>>> 
>>> I have no idea why the disk is getting full.
>>> 
>>> I'm trying a variant of this code:
>>> https://issues.apache.org/jira/browse/HBASE-2378
>>> 
>>> Thanks and regards,
>>> Rajgopal V
>>> 
>> 
> 

--
View this message in context: http://old.nabble.com/Non-DFS-space-usage-blows-up.-tp30511999p30520059.html
Sent from the HBase User mailing list archive at Nabble.com.
