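Hi Abdul,

Yes, "Reduce shuffle bytes" is the right metric: it is the total number of bytes of map output fetched by the reducers during the shuffle. You can take a look at the report we made on this earlier:
http://www.slideshare.net/pramodbiligiri/shuffle-phase-as-the-bottleneck-in-hadoop-terasort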
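If you want to pull that counter out of the console output rather than read it by eye, here is a minimal sketch in Python. It assumes the JobClient line format shown in your output ("... INFO mapred.JobClient: <name>=<value>"). The function name parse_counters and the regular expression are my own illustration, not part of Hadoop.

```python
import re

# Matches counter lines such as
# "14/10/01 18:38:12 INFO mapred.JobClient:     Reduce shuffle bytes=3059".
# Assumption: every counter is printed as "<name>=<integer>" at the end of the line.
COUNTER_RE = re.compile(r"mapred\.JobClient:\s+(?P<name>.+?)=(?P<value>\d+)\s*$")


def parse_counters(log_text):
    """Return {counter name: integer value} for every counter line found."""
    counters = {}
    for line in log_text.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            counters[match.group("name").strip()] = int(match.group("value"))
    return counters


if __name__ == "__main__":
    sample = """\
14/10/01 18:38:12 INFO mapred.JobClient:     Map output materialized bytes=3059
14/10/01 18:38:12 INFO mapred.JobClient:     Reduce shuffle bytes=3059
14/10/01 18:38:12 INFO mapred.JobClient:     Map output bytes=2509
"""
    counters = parse_counters(sample)
    print("Reduce shuffle bytes:", counters["Reduce shuffle bytes"])
```

Lines without a "name=value" pair (for example "Counters: 29" or "Job complete: ...") are simply skipped, so you can feed it the whole console log.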
Pramod

On Wed, Oct 1, 2014 at 6:06 PM, Abdul Navaz <[email protected]> wrote:

> Hello,
>
> This is the portion of the output that is displayed on the console when I
> run the sample word count job.
>
> map 0% reduce 0%
> 14/10/01 18:37:52 INFO mapred.JobClient: map 100% reduce 0%
> 14/10/01 18:38:10 INFO mapred.JobClient: map 100% reduce 100%
> 14/10/01 18:38:12 INFO mapred.JobClient: Job complete: job_201409262002_0003
> 14/10/01 18:38:12 INFO mapred.JobClient: Counters: 29
> 14/10/01 18:38:12 INFO mapred.JobClient: Job Counters
> 14/10/01 18:38:12 INFO mapred.JobClient: Launched reduce tasks=1
> 14/10/01 18:38:12 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=23511
> 14/10/01 18:38:12 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
> 14/10/01 18:38:12 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
> 14/10/01 18:38:12 INFO mapred.JobClient: Launched map tasks=1
> 14/10/01 18:38:12 INFO mapred.JobClient: Data-local map tasks=1
> 14/10/01 18:38:12 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=14193
> 14/10/01 18:38:12 INFO mapred.JobClient: File Output Format Counters
> 14/10/01 18:38:12 INFO mapred.JobClient: Bytes Written=1106
> 14/10/01 18:38:12 INFO mapred.JobClient: FileSystemCounters
> 14/10/01 18:38:12 INFO mapred.JobClient: FILE_BYTES_READ=3059
> 14/10/01 18:38:12 INFO mapred.JobClient: HDFS_BYTES_READ=1601
> 14/10/01 18:38:12 INFO mapred.JobClient: FILE_BYTES_WRITTEN=108400
> 14/10/01 18:38:12 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1106
> 14/10/01 18:38:12 INFO mapred.JobClient: File Input Format Counters
> 14/10/01 18:38:12 INFO mapred.JobClient: Bytes Read=1486
> 14/10/01 18:38:12 INFO mapred.JobClient: Map-Reduce Framework
> 14/10/01 18:38:12 INFO mapred.JobClient: Map output materialized bytes=3059
> 14/10/01 18:38:12 INFO mapred.JobClient: Map input records=6
> 14/10/01 18:38:12 INFO mapred.JobClient: *Reduce shuffle bytes=3059*
> 14/10/01 18:38:12 INFO mapred.JobClient: Spilled Records=544
> 14/10/01 18:38:12 INFO mapred.JobClient: Map output bytes=2509
> 14/10/01 18:38:12 INFO mapred.JobClient: Total committed heap usage
>
> I am trying to find the shuffling traffic, that is, the total traffic generated
> when mappers exchange their key-value pairs with the reducer. Does the
> highlighted portion give the shuffling traffic?
>
> Thanks & Regards,
>
> Abdul Navaz
> Research Assistant
> University of Houston Main Campus, Houston TX
> Ph: 281-685-0388
>
> On 9/26/14, 12:00 AM, "karthikeyan S" <[email protected]> wrote:
>
> The reducer starts as soon as it has data available from any one of the
> mappers.
> The reducer keeps polling the AM and asks if any mapper has completed
> processing. If so, it fetches data from that mapper.
> So it's not necessary for all the mappers of a task to complete for
> the reducer to start processing.
>
> When the reducers start fetching the data from the mappers, they print
> that info in their syslog, from what I have seen.
>
> Thanks,
> Karthik
>
> On Thu, Sep 25, 2014 at 8:27 PM, Bing Jiang <[email protected]> wrote:
>
> see mapreduce.job.reduce.slowstart.completedmaps
> It gives a hint of when reduce tasks could kick off.
>
> 2014-09-26 8:36 GMT+08:00 Abdul Navaz <[email protected]>:
>
> Hello,
>
> I have a Hadoop cluster with 1 name node and 3 data nodes. I am running the
> sample word count job on a 1 GB file which is distributed across HDFS.
>
> When I run the MapReduce job, the reduce phase starts even before the map
> phase reaches 100%, e.g. map 40% reduce 10%.
>
> I would like to know when the shuffling traffic starts.
>
> -> Is there any way to find out exactly when shuffling started? Does it
> generate any syslog entry in the logs?
> -> How can I find the total amount of shuffling traffic?
>
> Thanks & Regards,
>
> Abdul Navaz
> Research Assistant
> University of Houston Main Campus, Houston TX
> Ph: 281-685-0388
>
> --
> Bing Jiang
> Tel:(86)134-2619-1361
> weibo: http://weibo.com/jiangbinglover
> BLOG: www.binospace.com
> BLOG: http://blog.sina.com.cn/jiangbinglover
> Focus on distributed computing, HDFS/HBase
