A reducer can start fetching (shuffling) map output as soon as any one of the mappers has finished. The reducer keeps polling the ApplicationMaster (AM) to ask whether any mapper has completed, and if so, it fetches the output of that mapper. So not all the mappers of a job need to complete before the reducers start; only the reduce function itself has to wait until every map output has been fetched.
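To make the timing concrete, here is a rough Python sketch of how the mapreduce.job.reduce.slowstart.completedmaps fraction mentioned further down in this thread translates into a number of finished maps. It is only an illustration: the function name is made up, the ceil() rounding is my assumption about how the AM applies the fraction, and the 8 map tasks (1 GB split into 128 MB blocks) are a hypothetical figure for the word count job. The 0.05 default is the value shipped in mapred-default.xml.

```python
import math


def maps_needed_before_reduces(total_maps: int, slowstart: float = 0.05) -> int:
    """Estimate how many maps must finish before reduce tasks are scheduled.

    Hypothetical helper, not part of Hadoop. `slowstart` stands for the value of
    mapreduce.job.reduce.slowstart.completedmaps; rounding up with ceil() is an
    assumption about how the fraction is applied.
    """
    if total_maps < 0:
        raise ValueError("total_maps must be non-negative")
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("slowstart must be a fraction between 0.0 and 1.0")
    return math.ceil(slowstart * total_maps)


if __name__ == "__main__":
    total_maps = 8  # assumed: 1 GB input split into 128 MB blocks
    for fraction in (0.05, 0.5, 1.0):
        needed = maps_needed_before_reduces(total_maps, fraction)
        print(f"slowstart={fraction}: reduces may be scheduled after {needed} of {total_maps} maps")
```

Under these assumptions a single finished map is enough with the default setting, which would fit output such as "map 40% reduce 10%". Setting the fraction to 1.0 would hold the reducers back until every map is done.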
When the reducers start fetching data from the mappers, they print that information in their syslog, from what I have seen.

Thanks,
Karthik

On Thu, Sep 25, 2014 at 8:27 PM, Bing Jiang <[email protected]> wrote:
> see mapreduce.job.reduce.slowstart.completedmaps
> It gives hint of when reduce tasks could kick off.
>
> 2014-09-26 8:36 GMT+08:00 Abdul Navaz <[email protected]>:
>>
>> Hello,
>>
>> I am having a Hadoop cluster with 1 name node and 3 data nodes. I running
>> sample word count job on 1GB of file which is distributed among the HDFS.
>>
>> When I run the map reduce job, before even completing the mapping 100 %
>> reduce starts. Say for eg map 40% reduce 10% etc.
>>
>> I would like to know when the shuffling traffic starts ?
>>
>> -> Is there any way to find out when exactly shuffling started ? Does it
>> generate any syslog in the logs .
>> -> How to find the total amount of shuffling traffic?
>>
>>
>>
>> Thanks & Regards,
>>
>> Abdul Navaz
>> Research Assistant
>> University of Houston Main Campus, Houston TX
>> Ph: 281-685-0388
>>
>
>
>
> --
> Bing Jiang
> Tel:(86)134-2619-1361
> weibo: http://weibo.com/jiangbinglover
> BLOG: www.binospace.com
> BLOG: http://blog.sina.com.cn/jiangbinglover
> Focus on distributed computing, HDFS/HBase
