Re: Hadoop shuffling traffic

karthikeyan S Thu, 25 Sep 2014 22:01:54 -0700

The reducer starts fetching as soon as output is available from any one of the mappers.
The reducer keeps polling the AM (ApplicationMaster), asking whether any mapper has
completed. If one has, it fetches that mapper's output.
So it's not necessary for all the mappers of a job to complete for
the reducer to start fetching data (the reduce function itself only runs
once every map output has been fetched).
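
To make the polling behaviour concrete, here is a toy Java model (not Hadoop's actual shuffle code). A queue stands in for the map-completion events the reducer learns about from the AM, and the reducer fetches each map's output as soon as that map reports completion, in whatever order the maps finish. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Toy model only, not Hadoop code. EarlyFetchSketch and runReducer are hypothetical names.
public class EarlyFetchSketch {

    // Takes map IDs from the queue as maps "complete" and fetches each one immediately.
    // Returns the IDs in the order their outputs were fetched.
    static List<Integer> runReducer(BlockingQueue<Integer> completedMaps, int totalMaps) {
        List<Integer> fetched = new ArrayList<>();
        try {
            while (fetched.size() < totalMaps) {
                // Blocks until some map reports completion (stands in for polling the AM).
                int mapId = completedMaps.take();
                // Fetch this map's output now; other maps may still be running.
                fetched.add(mapId);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag for the caller
        }
        // Only once every map output is here could the reduce() calls begin.
        return fetched;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> completedMaps = new LinkedBlockingQueue<>();
        int totalMaps = 3;
        // Simulated map tasks that finish one after another, in no particular ID order.
        Thread mapTasks = new Thread(() -> {
            for (int mapId : new int[] {2, 0, 1}) {
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    return;
                }
                completedMaps.add(mapId);
            }
        });
        mapTasks.start();
        List<Integer> order = runReducer(completedMaps, totalMaps);
        mapTasks.join();
        System.out.println("fetch order: " + order);
    }
}
```

Running it prints `fetch order: [2, 0, 1]`: the reducer handled map 2's output before maps 0 and 1 had finished.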


When the reducer starts fetching data from the mappers, it logs that
in its syslog, from what I have seen.
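
If you want to pull that moment out of a reducer's syslog programmatically, a minimal Java sketch could scan the file for the first line containing some marker text and return that line's timestamp. Two assumptions, both to be checked against your own logs: the marker is whatever fetch message you actually see in the reducer's syslog (it is passed in, not hard-coded), and each line starts with a log4j ISO8601 timestamp such as `2014-09-25 22:01:54,123`. ShuffleStartFinder is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: finds when a marker first appears in a reducer's syslog.
public class ShuffleStartFinder {
    // Assumption: lines begin with "yyyy-MM-dd HH:mm:ss,SSS" (log4j ISO8601), 23 characters.
    private static final int TIMESTAMP_LENGTH = 23;

    // Returns the timestamp prefix of the first line containing marker, if any.
    static Optional<String> firstTimestampContaining(List<String> lines, String marker) {
        for (String line : lines) {
            if (line.length() >= TIMESTAMP_LENGTH && line.contains(marker)) {
                return Optional.of(line.substring(0, TIMESTAMP_LENGTH));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ShuffleStartFinder <reducer-syslog> <marker-text>");
            System.exit(1);
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println(firstTimestampContaining(lines, args[1]).orElse("marker not found"));
    }
}
```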

Thanks,
Karthik

On Thu, Sep 25, 2014 at 8:27 PM, Bing Jiang <[email protected]> wrote:
> See mapreduce.job.reduce.slowstart.completedmaps.
> It controls when reduce tasks can be launched: the fraction of map tasks
> that must complete before reducers are scheduled.
>
> 2014-09-26 8:36 GMT+08:00 Abdul Navaz <[email protected]>:
>>
>> Hello,
>>
>> I have a Hadoop cluster with 1 NameNode and 3 DataNodes. I am running
>> the sample word count job on a 1 GB file stored in HDFS.
>>
>> When I run the MapReduce job, the reduce phase starts before the map
>> phase reaches 100%, e.g. map 40% reduce 10%.
>>
>> I would like to know when the shuffling traffic starts.
>>
>> -> Is there any way to find out exactly when shuffling started? Does it
>> generate any entry in the syslog?
>> -> How can I find the total amount of shuffling traffic?
>>
>>
>>
>> Thanks & Regards,
>>
>> Abdul Navaz
>> Research Assistant
>> University of Houston Main Campus, Houston TX
>> Ph: 281-685-0388
>>
>
>
>
> --
> Bing Jiang
> Tel：(86)134-2619-1361
> weibo: http://weibo.com/jiangbinglover
> BLOG: www.binospace.com
> BLOG: http://blog.sina.com.cn/jiangbinglover
> Focus on distributed computing, HDFS/HBase
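
The slow-start setting Bing mentions is a fraction of the job's map tasks (the default in mapred-default.xml is 0.05). A simplified Java model of the check, assuming reducers become eligible once completedMaps / totalMaps reaches that fraction, shows how many maps must finish first. The real YARN scheduler applies further logic on top of this, so treat it as a sketch. SlowStartSketch and its methods are hypothetical, and the 8-map job in main assumes a 1 GB input split into 128 MB blocks.

```java
// Simplified model of the slow-start check; not Hadoop's actual scheduler code.
// SlowStartSketch, reducersMayStart and mapsNeededBeforeReducers are hypothetical names.
public class SlowStartSketch {

    // Reducers become eligible once the completed-map fraction reaches slowStart
    // (the value of mapreduce.job.reduce.slowstart.completedmaps).
    static boolean reducersMayStart(int completedMaps, int totalMaps, float slowStart) {
        // A job with no maps has nothing to wait for.
        float completedFraction = totalMaps == 0 ? 1f : (float) completedMaps / totalMaps;
        return completedFraction >= slowStart;
    }

    // Smallest number of completed maps that passes the check above.
    static int mapsNeededBeforeReducers(int totalMaps, float slowStart) {
        for (int k = 0; k <= totalMaps; k++) {
            if (reducersMayStart(k, totalMaps, slowStart)) {
                return k;
            }
        }
        // Only reached if slowStart > 1.0, which this model never satisfies.
        return -1;
    }

    public static void main(String[] args) {
        int totalMaps = 8; // assumption: 1 GB input / 128 MB blocks
        for (float slowStart : new float[] {0.05f, 0.5f, 1.0f}) {
            System.out.println("slowstart=" + slowStart + " -> reducers wait for "
                    + mapsNeededBeforeReducers(totalMaps, slowStart) + " of " + totalMaps + " maps");
        }
    }
}
```

With 8 maps, 0.05 lets reducers start after the first map completes, which is why reduce progress can appear while maps are still running; 0.5 waits for 4 maps and 1.0 for all 8.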
