Dear Stack,
The reduce attempt, attempt_201012211759_0003_r_000002_0, blows up to 42GB
on each of the slave machines. [This is in the middle of the job; the map is
92% complete and the reduce is 30% complete.]
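
For reference, the job is set up roughly along these lines (a minimal
sketch of the setup described further down the thread, assuming the HBase
0.89/0.90-era API; the mapper, column family/qualifier, and paths are made
up for illustration):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.hbase.mapreduce.KeyValueSortReducer;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CsvToHFiles {

      // Hypothetical mapper: turns "rowkey,value" CSV lines into KeyValues.
      public static class CsvMapper
          extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
            throws IOException, InterruptedException {
          String[] fields = line.toString().split(",", 2);
          byte[] row = Bytes.toBytes(fields[0]);
          // Family "f" and qualifier "q" are placeholders.
          KeyValue kv = new KeyValue(row, Bytes.toBytes("f"),
              Bytes.toBytes("q"), Bytes.toBytes(fields[1]));
          ctx.write(new ImmutableBytesWritable(row), kv);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "csv-to-hfiles");
        job.setJarByClass(CsvToHFiles.class);
        job.setMapperClass(CsvMapper.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(KeyValue.class);
        // Sorts KeyValues within each row before HFileOutputFormat writes them.
        job.setReducerClass(KeyValueSortReducer.class);
        job.setNumReduceTasks(3);          // one reducer per slave node
        job.setOutputFormatClass(HFileOutputFormat.class);
        // NB: a real bulk load also needs rows totally ordered across
        // reducers; HFileOutputFormat.configureIncrementalLoad wires that up.
        FileInputFormat.addInputPath(job, new Path("/input/csvs"));
        FileOutputFormat.setOutputPath(job, new Path("/output/hfiles"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }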

Regards,
Rajgopal V

rajgopalv wrote:
> 
> Dear Stack, 
> 
> I browsed through the directory structure, and this is how it looks:
> http://pastebin.com/vqW0FZZp
> 
> and inside every map-job's directory:
> 
> 24K     job.xml
> 554M    output
> 8.0K    pid
> 8.0K    split.dta
> 
> Then I did `cd output; du -sh *`:
> 
> 554M    file.out
> 8.0K    file.out.index
> 
> I guess this `file.out` is the HFileOutputFormat output of the CSV data. 
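
If that file.out spill is what eats the disk, compressing the map output
should shrink it on local disk during the shuffle. A minimal sketch
(untested here; these are the 0.20-era property names, and gzip is just one
codec choice):

    import org.apache.hadoop.conf.Configuration;

    public class MapOutputCompression {
      // Build a job Configuration with compressed map output, so the
      // per-attempt file.out (and its spills) are written gzip-compressed.
      public static Configuration compressedMapOutput() {
        Configuration conf = new Configuration();
        conf.setBoolean("mapred.compress.map.output", true);
        conf.set("mapred.map.output.compression.codec",
            "org.apache.hadoop.io.compress.GzipCodec");
        return conf;
      }
    }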
> 
> Stack <st...@...> writes:
>> 
>> Have you taken a look at the contents of the 'mapred' directory?
>> St.Ack
>> 
>> On Wed, Dec 22, 2010 at 10:34 AM, rajgopalv <raja.f...@...> wrote:
>> > Jean-Daniel Cryans <jdcry...@...> writes:
>> >>
>> >> Look on your disks, using du -hs, to see what eats all that space.
>> >>
>> >> J-D
>> >>
>> >> On Tue, Dec 21, 2010 at 11:12 PM, rajgopalv <raja.f...@...> wrote:
>> >> >
>> >> > I'm doing a MapReduce job to create HFiles (via HFileOutputFormat)
>> >> > out of CSVs.
>> >> >
>> >> > * The MapReduce job operates on 75 files, each containing 1 million
>> >> > rows; the total comes to 16GB. [With a replication factor of 2, the
>> >> > total DFS used is 32GB.]
>> >> > * There are 300 map tasks.
>> >> > * The map phase ends perfectly.
>> >> > * There are 3 slave nodes (each with a 145GB hard disk), so
>> >> > job.setNumReduceTasks(3) gives 3 reducers.
>> >> > * When the reduce phase is about to end, the space on all the slave
>> >> > nodes runs out.
>> >> >
>> >> > I am confused. Why does my space run out during the reduce (in the
>> >> > shuffle phase)?
>> >>
>> >>
>> >
>> > Dear Jean-Daniel,
>> >
>> > The "mapred" directory eats the space.It occupies around 75GB in every
>> machine.
>> > Why is that.?
>> >
>> > As per my understanding, every map task takes a block (which is local
>> > to the machine) and spills its output to the hard disk. The reducers
>> > then shuffle the map outputs, bring everything local, and reduce them.
>> > So in the worst case, each reducer will have 16GB (because the seed
>> > data is 16GB).
>> >
>> > I have no idea why the disk is getting full.
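
A rough back-of-envelope of where 75GB per machine could come from (the
multipliers here are guesses, not measurements):

    16GB          CSV seed data
    x2-4          expansion when each CSV field becomes a KeyValue (the row
                  key, family, qualifier, and timestamp are stored with
                  every cell), so map output alone could be 32-64GB
    x2-3 again    during the shuffle the same bytes exist in several places:
                  the map-side file.out is kept until the whole job
                  finishes, each reducer holds its copied map segments, and
                  the sort/merge passes need scratch space

So intermediate "mapred" usage of several times the input is expected while
the job runs; the space is only reclaimed once the job completes.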
>> >
>> > I'm trying a variant of this code:
>> > https://issues.apache.org/jira/browse/HBASE-2378
>> >
>> > Thanks and regards,
>> > Rajgopal V
>> >
>> >
>> 
>> 
> 
