Dear Stack,

I browsed through the directory structure, and this is how it looks: http://pastebin.com/vqW0FZZp
Inside every map-job directory:

    24K   job.xml
    554M  output
    8.0K  pid
    8.0K  split.dta

Then I did `cd output; du -sh *`:

    554M  file.out
    8.0K  file.out.index

I guess this `file.out` is the HFileOutputFormat output of the CSV data.

Stack <st...@...> writes:
>
> Have you taken a look at the content of the 'mapred' directory?
> St.Ack
>
> On Wed, Dec 22, 2010 at 10:34 AM, rajgopalv <raja.f...@...> wrote:
> > Jean-Daniel Cryans <jdcry...@...> writes:
> >>
> >> Look on your disks, using du -hs, to see what eats all that space.
> >>
> >> J-D
> >>
> >> On Tue, Dec 21, 2010 at 11:12 PM, rajgopalv <raja.f...@...> wrote:
> >> >
> >> > I'm doing a MapReduce job to create the HFileOutputFormat output
> >> > from CSVs.
> >> >
> >> > * The MapReduce job operates on 75 files, each containing 1 million
> >> >   rows; the total comes to 16GB. [With a replication factor of 2,
> >> >   the total DFS used is 32GB.]
> >> > * There are 300 map tasks.
> >> > * The map jobs end perfectly.
> >> > * There are 3 slave nodes (each with a 145GB hard disk), so
> >> >   job.setNumReduceTasks(3) gives 3 reducers.
> >> > * When the reduce job is about to end, the space on all the slave
> >> >   nodes runs out.
> >> >
> >> > I am confused. Why does my space run out at reduce time (in the
> >> > shuffle phase)?
> >
> > Dear Jean,
> >
> > The "mapred" directory eats the space. It occupies around 75GB on
> > every machine. Why is that?
> >
> > As per my understanding, every map task takes a block (which is local
> > to the machine) and spills its output to the hard disk. The reducer
> > then shuffles the maps' output, bringing everything local before
> > reducing it. So in the worst case, each reducer will have 16GB
> > (because the seed data is 16GB).
> >
> > I have no idea why the disk is getting full.
> >
> > I'm trying a variant of this code:
> > https://issues.apache.org/jira/browse/HBASE-2378
> >
> > Thanks and regards,
> > Rajgopal V
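For reference, below is a minimal sketch of how a bulk-load job like the one described above might be wired up, assuming an HBase 0.90-era API. The class name, mapper, CSV layout, table name "mytable", and column family "cf" are all hypothetical, not taken from this thread. The setting most relevant to the non-DFS blow-up is `mapred.compress.map.output`: intermediate map output spilled under `mapred.local.dir` is uncompressed by default, so shuffle data can take substantially more local disk than the input itself.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CsvBulkLoad {

      // Hypothetical mapper: assumes each CSV line is "rowkey,value".
      static class CsvToPutMapper
          extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
            throws IOException, InterruptedException {
          String[] fields = line.toString().split(",", 2);
          byte[] row = Bytes.toBytes(fields[0]);
          Put put = new Put(row);
          put.add(Bytes.toBytes("cf"), Bytes.toBytes("v"),
                  Bytes.toBytes(fields[1]));
          ctx.write(new ImmutableBytesWritable(row), put);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Compress intermediate map output; the spill files under
        // mapred.local.dir are uncompressed by default, and they are
        // what shows up as non-DFS usage during the shuffle.
        conf.setBoolean("mapred.compress.map.output", true);

        Job job = new Job(conf, "csv-bulkload");
        job.setJarByClass(CsvBulkLoad.class);
        job.setMapperClass(CsvToPutMapper.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(Put.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // CSV input dir
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HFile output dir

        // Sets TotalOrderPartitioner, a sorting reducer, and one reducer
        // per region of the target table, instead of a hand-picked
        // setNumReduceTasks(3).
        HFileOutputFormat.configureIncrementalLoad(job, new HTable(conf, "mytable"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

With map-output compression on, and with configureIncrementalLoad choosing one reducer per region, the per-node spill should stay much closer to each reducer's actual share of the data rather than an uncompressed copy of the whole 16GB seed set.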
