Have you taken a look at the contents of the 'mapred' directory?
St.Ack
On Wed, Dec 22, 2010 at 10:34 AM, rajgopalv <[email protected]> wrote:
> Jean-Daniel Cryans <jdcry...@...> writes:
>>
>> Look on your disks, using du -hs, to see what eats all that space.
>>
>> J-D
>>
>> On Tue, Dec 21, 2010 at 11:12 PM, rajgopalv <raja.f...@...> wrote:
>> >
>> > I'm doing a map reduce job to create the HFileOutputFormat output out of CSVs.
>> >
>> > * The mapreduce job operates on 75 files, each containing 1 million rows.
>> >   The total comes to 16GB. [With a replication factor of 2, the total DFS
>> >   used is 32GB.]
>> > * There are 300 map tasks.
>> > * The map phase completes perfectly.
>> > * There are 3 slave nodes (each with a 145GB hard disk), so
>> >   job.setNumReduceTasks(3) gives 3 reducers.
>> > * When the reduce phase is about to end, the space on all the slave nodes
>> >   runs out.
>> >
>> > I am confused. Why does my space run out during reduce (in the shuffle
>> > phase)?
>> > --
>> > View this message in context:
>> > http://old.nabble.com/Non-DFS-space-usage-blows-up.-tp30511999p30511999.html
>> > Sent from the HBase User mailing list archive at Nabble.com.
>>
>
> Dear Jean,
>
> The "mapred" directory eats the space. It occupies around 75GB on every
> machine. Why is that?
>
> As per my understanding, every map task takes a block (which is local to the
> machine) and spills its output to the local disk. The reducer then shuffles
> the maps' output, copies everything to its local disk, and reduces it. So in
> the worst case, each reducer should hold 16GB (because the seed data is 16GB).
>
> I have no idea why the disk is getting full.
>
> I'm trying a variant of this code:
> https://issues.apache.org/jira/browse/HBASE-2378
>
> Thanks and regards,
> Rajgopal V
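
A likely culprit is the intermediate map output that gets spilled under
mapred.local.dir and is only cleaned up once the job finishes. One common way
to shrink it is to compress the map output. The sketch below is only a rough
illustration, not the HBASE-2378 code itself; it uses the Hadoop 0.20-era
property names and the org.apache.hadoop.mapreduce API, and the class and job
names are placeholders:

// Minimal sketch: enable compression of intermediate map output so the
// spill files under mapred.local.dir stay smaller. Placeholder names only.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;

public class CsvToHFilesJobSetup {
  public static Job createJob(Configuration conf) throws Exception {
    // Compress map output before it is spilled and shuffled
    // (Hadoop 0.20-era configuration keys).
    conf.setBoolean("mapred.compress.map.output", true);
    conf.setClass("mapred.map.output.compression.codec",
                  GzipCodec.class, CompressionCodec.class);

    Job job = new Job(conf, "csv-to-hfiles");
    job.setNumReduceTasks(3);
    // Mapper, HFileOutputFormat, and input/output paths would be configured
    // here exactly as in the original bulk-load job.
    return job;
  }
}

Compressing the map output cuts both the spill files on the map side and the
copies each reducer pulls to its local disk during the shuffle.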
