Hello,

This is the portion of the output displayed on the console when I run the sample word count job.

map 0% reduce 0%
14/10/01 18:37:52 INFO mapred.JobClient:  map 100% reduce 0%
14/10/01 18:38:10 INFO mapred.JobClient:  map 100% reduce 100%
14/10/01 18:38:12 INFO mapred.JobClient: Job complete: job_201409262002_0003
14/10/01 18:38:12 INFO mapred.JobClient: Counters: 29
14/10/01 18:38:12 INFO mapred.JobClient:   Job Counters
14/10/01 18:38:12 INFO mapred.JobClient:     Launched reduce tasks=1
14/10/01 18:38:12 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=23511
14/10/01 18:38:12 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/10/01 18:38:12 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/10/01 18:38:12 INFO mapred.JobClient:     Launched map tasks=1
14/10/01 18:38:12 INFO mapred.JobClient:     Data-local map tasks=1
14/10/01 18:38:12 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=14193
14/10/01 18:38:12 INFO mapred.JobClient:   File Output Format Counters
14/10/01 18:38:12 INFO mapred.JobClient:     Bytes Written=1106
14/10/01 18:38:12 INFO mapred.JobClient:   FileSystemCounters
14/10/01 18:38:12 INFO mapred.JobClient:     FILE_BYTES_READ=3059
14/10/01 18:38:12 INFO mapred.JobClient:     HDFS_BYTES_READ=1601
14/10/01 18:38:12 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=108400
14/10/01 18:38:12 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1106
14/10/01 18:38:12 INFO mapred.JobClient:   File Input Format Counters
14/10/01 18:38:12 INFO mapred.JobClient:     Bytes Read=1486
14/10/01 18:38:12 INFO mapred.JobClient:   Map-Reduce Framework
14/10/01 18:38:12 INFO mapred.JobClient:     Map output materialized bytes=3059
14/10/01 18:38:12 INFO mapred.JobClient:     Map input records=6
14/10/01 18:38:12 INFO mapred.JobClient:     Reduce shuffle bytes=3059
14/10/01 18:38:12 INFO mapred.JobClient:     Spilled Records=544
14/10/01 18:38:12 INFO mapred.JobClient:     Map output bytes=2509
14/10/01 18:38:12 INFO mapred.JobClient:     Total committed heap usage

I am trying to find the shuffling traffic, that is, the total traffic generated when the mappers exchange their key-value pairs with the reducer.

Does the highlighted portion give the shuffling traffic?

Thanks & Regards,

Abdul Navaz
Research Assistant
University of Houston Main Campus, Houston TX
Ph: 281-685-0388

On 9/26/14, 12:00 AM, "karthikeyan S" <[email protected]> wrote:

> The reducer starts as soon as it has data available from any one of the
> mappers.
> The reducer keeps polling the AM and asks whether any mapper has completed
> processing. If so, it fetches data from that mapper.
> So it's not necessary for all the mappers of a job to complete for
> the reducer to start processing.
>
> When the reducers start fetching data from the mappers, they print
> that info in their syslog, from what I have seen.
>
> Thanks,
> Karthik
>
> On Thu, Sep 25, 2014 at 8:27 PM, Bing Jiang <[email protected]> wrote:
>> See mapreduce.job.reduce.slowstart.completedmaps
>> It gives a hint of when reduce tasks can kick off.
>>
>> 2014-09-26 8:36 GMT+08:00 Abdul Navaz <[email protected]>:
>>>
>>> Hello,
>>>
>>> I have a Hadoop cluster with 1 name node and 3 data nodes. I am running
>>> the sample word count job on a 1 GB file which is distributed across HDFS.
>>>
>>> When I run the MapReduce job, the reduce phase starts before the map
>>> phase has reached 100%, e.g. map 40% reduce 10%.
>>>
>>> I would like to know when the shuffling traffic starts.
>>>
>>> -> Is there any way to find out exactly when shuffling started? Does it
>>>    generate any entry in the logs?
>>> -> How can I find the total amount of shuffling traffic?
>>>
>>>
>>>
>>> Thanks & Regards,
>>>
>>> Abdul Navaz
>>> Research Assistant
>>> University of Houston Main Campus, Houston TX
>>> Ph: 281-685-0388
>>>
>>
>>
>>
>> --
>> Bing Jiang
>> Tel:(86)134-2619-1361
>> weibo: http://weibo.com/jiangbinglover
>> BLOG: www.binospace.com
>> BLOG: http://blog.sina.com.cn/jiangbinglover
>> Focus on distributed computing, HDFS/HBase
>
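
For anyone who wants to pull a single counter out of console output like the one at the top of this message, here is a minimal sketch in Python. It assumes the "name=value" line format shown in the log above, and it assumes that "Reduce shuffle bytes" (the number of map output bytes copied to the reducers) is the counter of interest for shuffle traffic. The function name parse_counters and the shortened sample text are hypothetical, made up for illustration only.

```python
import re

# Matches counter lines such as
# "14/10/01 18:38:12 INFO mapred.JobClient: Reduce shuffle bytes=3059".
# Group headers ("Job Counters") and progress lines have no "=<digits>"
# suffix, so they are skipped.
COUNTER_RE = re.compile(r"mapred\.JobClient:\s+(?P<name>[^=]+?)=(?P<value>\d+)\s*$")


def parse_counters(log_text):
    """Return a dict mapping counter name to integer value."""
    counters = {}
    for line in log_text.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            counters[match.group("name").strip()] = int(match.group("value"))
    return counters


# A few lines copied from the console output above.
sample = """\
14/10/01 18:38:12 INFO mapred.JobClient:   Map-Reduce Framework
14/10/01 18:38:12 INFO mapred.JobClient:     Map output materialized bytes=3059
14/10/01 18:38:12 INFO mapred.JobClient:     Reduce shuffle bytes=3059
14/10/01 18:38:12 INFO mapred.JobClient:     Map output bytes=2509
"""

counters = parse_counters(sample)
print(counters.get("Reduce shuffle bytes"))  # prints 3059
```

This only reads the job-level total reported at the end of the run; it says nothing about when the shuffle started, which would still have to come from the reducer task logs.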
