Dear Stack, 

I browsed through the directory structure, and this is what it looks like:
http://pastebin.com/vqW0FZZp

and inside every map-job directory:

24K     job.xml
554M    output
8.0K    pid
8.0K    split.dta

Then I did `cd output; du -sh *`:

554M    file.out
8.0K    file.out.index

I guess this `file.out` is the HFileOutputFormat output of the CSV data.
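
For context, here is roughly how the job is wired up (a minimal sketch of
my setup, following the HFileOutputFormat pattern from HBASE-2378; the
CsvToKeyValueMapper class, the "f"/"q" column names, and the paths are
illustrative stand-ins, not the actual code):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CsvToHFiles {

      // Hypothetical mapper: parses each "rowkey,value" CSV line into a
      // KeyValue. Column family "f" and qualifier "q" are made-up names.
      public static class CsvToKeyValueMapper extends
          Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
          String[] fields = line.toString().split(",", 2);
          byte[] row = Bytes.toBytes(fields[0]);
          KeyValue kv = new KeyValue(row, Bytes.toBytes("f"),
              Bytes.toBytes("q"), Bytes.toBytes(fields[1]));
          context.write(new ImmutableBytesWritable(row), kv);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "csv-to-hfiles");
        job.setJarByClass(CsvToHFiles.class);
        job.setMapperClass(CsvToKeyValueMapper.class);
        // Map output is shuffled to the reducers sorted by row key.
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(KeyValue.class);
        job.setOutputKeyClass(ImmutableBytesWritable.class);
        job.setOutputValueClass(KeyValue.class);
        // HFileOutputFormat writes the reducers' output as HFiles.
        job.setOutputFormatClass(HFileOutputFormat.class);
        job.setNumReduceTasks(3); // one reducer per slave node
        FileInputFormat.addInputPath(job, new Path("/input/csvs"));
        FileOutputFormat.setOutputPath(job, new Path("/output/hfiles"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }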

Stack <st...@...> writes:
> 
> Have you taken a look at the content of the 'mapred' directory?
> St.Ack
> 
> On Wed, Dec 22, 2010 at 10:34 AM, rajgopalv <raja.f...@...> wrote:
> > Jean-Daniel Cryans <jdcry...@...> writes:
> >>
> >> Look on your disks, using du -hs, to see what eats all that space.
> >>
> >> J-D
> >>
> >> On Tue, Dec 21, 2010 at 11:12 PM, rajgopalv <raja.f...@...> wrote:
> >> >
> >> > I'm doing a map reduce job to create the HFileOutputFormat out of
> >> > CSVs.
> >> >
> >> > * The mapreduce job operates on 75 files, each containing 1 million
> >> > rows. The total comes to 16GB. [With a replication factor of 2,
> >> > the total DFS used is 32GB.]
> >> > * There are 300 Map jobs.
> >> > * The map job ends perfectly.
> >> > * There are 3 slave nodes (each with a 145GB hard disk), and
> >> > job.setNumReduceTasks(3) is set, so there are 3 reducers.
> >> > * When the reduce job is about to end, the space on all the slave
> >> > nodes runs out.
> >> >
> >> > I am confused. Why does my space run out at reduce time (in the
> >> > shuffle phase)?
> >> >
> >> >
> >>
> >>
> >
> > Dear Jean,
> >
> > The "mapred" directory eats the space.It occupies around 75GB in every
> machine.
> > Why is that.?
> >
> > As per my understanding, every map job takes a block (which is local
> > to the machine) and spills its output to the hard disk. The reducer
> > then shuffles the maps' output, brings everything local, and reduces
> > it. So in the worst case, each reducer will have 16GB (because the
> > seed data is 16GB).
> >
> > I have no idea why the disk is getting full.
> >
> > I'm trying a variant of this code:
> > https://issues.apache.org/jira/browse/HBASE-2378
> >
> > Thanks and regards,
> > Rajgopal V
> >
> >
> 
> 
-- 
View this message in context: 
http://old.nabble.com/Non-DFS-space-usage-blows-up.-tp30511999p30519923.html
Sent from the HBase User mailing list archive at Nabble.com.
