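I think it refers to the number of bytes the reducers fetch from the mappers, i.e. your option 2: the shuffle traffic, not the final output written to HDFS. In your run the HDFS output shows up separately as "Bytes Written=1106" (HDFS_BYTES_WRITTEN=1106), while "Reduce shuffle bytes" matches "Map output materialized bytes" at 3059.

Pramod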
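P.S. If you want to pull that counter out of the console output instead of reading it by eye, here is a minimal sketch in Python. It assumes the log lines look like the mapred.JobClient lines you pasted ("... mapred.JobClient: <name>=<integer>"); the function name parse_counters and the SAMPLE text are only for illustration and are not part of Hadoop.

```python
import re

# Matches "mapred.JobClient: <counter name>=<integer>" at the end of a log line.
# Assumption: the format is the one shown in the pasted console output.
COUNTER_RE = re.compile(r"mapred\.JobClient:\s+(.+?)=(\d+)\s*$")


def parse_counters(log_text):
    """Return {counter name: int value} for every counter line in log_text."""
    counters = {}
    for line in log_text.splitlines():
        # Drop the "*" emphasis markers that the mail client added.
        match = COUNTER_RE.search(line.replace("*", ""))
        if match:
            counters[match.group(1).strip()] = int(match.group(2))
    return counters


# A few lines copied from the job output quoted in this thread.
SAMPLE = """\
14/10/01 18:38:12 INFO mapred.JobClient: map 100% reduce 100%
14/10/01 18:38:12 INFO mapred.JobClient:     Bytes Written=1106
14/10/01 18:38:12 INFO mapred.JobClient:     Map output materialized bytes=3059
14/10/01 18:38:12 INFO mapred.JobClient:     *Reduce shuffle bytes=3059*
14/10/01 18:38:12 INFO mapred.JobClient:     Map output bytes=2509
"""

counters = parse_counters(SAMPLE)
print("shuffle bytes:", counters["Reduce shuffle bytes"])
print("HDFS output bytes:", counters["Bytes Written"])
```

Progress lines such as "map 100% reduce 100%" contain no "=" and are skipped, so only real counters end up in the dictionary.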
On Wed, Oct 8, 2014 at 10:17 PM, Abdul Navaz <[email protected]> wrote:

> Hello,
>
> First of all, thank you very much for your help. :)
>
> I still have some doubt about this.
>
> Is the highlighted metric "*Reduce shuffle bytes=3059*"
>
> 1. the total bytes after the reduce phase (that is, the output
>    file which is written into HDFS),
>
> or
>
> 2. the actual shuffle traffic which is exchanged between
>    mappers and reducers before the reduce is performed?
>
> Please clarify!
>
> Thanks & Regards,
>
> Abdul Navaz
>
> From: Pramod Biligiri <[email protected]>
> Reply-To: <[email protected]>
> Date: Thursday, October 2, 2014 at 12:44 AM
> To: "[email protected]" <[email protected]>
> Subject: Re: Hadoop shuffling traffic
>
> Hi Abdul,
> That is the right metric. You can take a look at this report we made on
> this earlier:
> http://www.slideshare.net/pramodbiligiri/shuffle-phase-as-the-bottleneck-in-hadoop-terasort
>
> Pramod
>
> On Wed, Oct 1, 2014 at 6:06 PM, Abdul Navaz <[email protected]> wrote:
>
>> Hello,
>>
>> This is the portion of the output which is displayed on the console when
>> I run the sample word count job.
>>
>> map 0% reduce 0%
>> 14/10/01 18:37:52 INFO mapred.JobClient: map 100% reduce 0%
>> 14/10/01 18:38:10 INFO mapred.JobClient: map 100% reduce 100%
>> 14/10/01 18:38:12 INFO mapred.JobClient: Job complete: job_201409262002_0003
>> 14/10/01 18:38:12 INFO mapred.JobClient: Counters: 29
>> 14/10/01 18:38:12 INFO mapred.JobClient:   Job Counters
>> 14/10/01 18:38:12 INFO mapred.JobClient:     Launched reduce tasks=1
>> 14/10/01 18:38:12 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=23511
>> 14/10/01 18:38:12 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
>> 14/10/01 18:38:12 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
>> 14/10/01 18:38:12 INFO mapred.JobClient:     Launched map tasks=1
>> 14/10/01 18:38:12 INFO mapred.JobClient:     Data-local map tasks=1
>> 14/10/01 18:38:12 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=14193
>> 14/10/01 18:38:12 INFO mapred.JobClient:   File Output Format Counters
>> 14/10/01 18:38:12 INFO mapred.JobClient:     Bytes Written=1106
>> 14/10/01 18:38:12 INFO mapred.JobClient:   FileSystemCounters
>> 14/10/01 18:38:12 INFO mapred.JobClient:     FILE_BYTES_READ=3059
>> 14/10/01 18:38:12 INFO mapred.JobClient:     HDFS_BYTES_READ=1601
>> 14/10/01 18:38:12 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=108400
>> 14/10/01 18:38:12 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1106
>> 14/10/01 18:38:12 INFO mapred.JobClient:   File Input Format Counters
>> 14/10/01 18:38:12 INFO mapred.JobClient:     Bytes Read=1486
>> 14/10/01 18:38:12 INFO mapred.JobClient:   Map-Reduce Framework
>> 14/10/01 18:38:12 INFO mapred.JobClient:     Map output materialized bytes=3059
>> 14/10/01 18:38:12 INFO mapred.JobClient:     Map input records=6
>> 14/10/01 18:38:12 INFO mapred.JobClient:     *Reduce shuffle bytes=3059*
>> 14/10/01 18:38:12 INFO mapred.JobClient:     Spilled Records=544
>> 14/10/01 18:38:12 INFO mapred.JobClient:     Map output bytes=2509
>> 14/10/01 18:38:12 INFO mapred.JobClient:     Total committed heap usage
>>
>> I am trying to find the shuffling traffic, that is, the total traffic
>> generated when the mappers exchange their key-value pairs with the reducer.
>> Does the highlighted portion give the shuffling traffic?
>>
>> Thanks & Regards,
>>
>> Abdul Navaz
>> Research Assistant
>> University of Houston Main Campus, Houston TX
>> Ph: 281-685-0388
>>
>> On 9/26/14, 12:00 AM, "karthikeyan S" <[email protected]> wrote:
>>
>> The reducer starts as soon as it has data available from any one of the
>> mappers.
>> The reducer keeps polling the AM and asks whether any mapper has completed
>> processing. If so, it fetches data from that mapper.
>> So it's not necessary for all the mappers of a job to complete for
>> the reducer to start fetching.
>>
>> When the reducer starts fetching the data from the mappers, it prints
>> that info in its syslog, from what I have seen.
>>
>> Thanks,
>> Karthik
>>
>> On Thu, Sep 25, 2014 at 8:27 PM, Bing Jiang <[email protected]>
>> wrote:
>>
>> See mapreduce.job.reduce.slowstart.completedmaps.
>> It gives a hint of when reduce tasks can kick off.
>>
>> 2014-09-26 8:36 GMT+08:00 Abdul Navaz <[email protected]>:
>>
>> Hello,
>>
>> I have a Hadoop cluster with 1 name node and 3 data nodes. I am running the
>> sample word count job on a 1 GB file which is distributed across HDFS.
>>
>> When I run the MapReduce job, the reduce starts before the mapping has even
>> reached 100%, e.g. map 40% reduce 10%.
>>
>> I would like to know when the shuffling traffic starts.
>>
>> -> Is there any way to find out exactly when shuffling started? Does it
>> generate any entry in the syslog?
>> -> How do I find the total amount of shuffling traffic?
>>
>> Thanks & Regards,
>>
>> Abdul Navaz
>> Research Assistant
>> University of Houston Main Campus, Houston TX
>> Ph: 281-685-0388
>>
>> --
>> Bing Jiang
>> Tel:(86)134-2619-1361
>> weibo: http://weibo.com/jiangbinglover
>> BLOG: www.binospace.com
>> BLOG: http://blog.sina.com.cn/jiangbinglover
>> Focus on distributed computing, HDFS/HBase
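
A note on the mapreduce.job.reduce.slowstart.completedmaps setting quoted in the thread: it is a fraction between 0 and 1, and reducers (and therefore the shuffle) can start once that share of the map tasks has finished. The Python sketch below only illustrates the arithmetic. It assumes the scheduler rounds the product up to a whole number of maps and that the default fraction is 0.05; the function name and the figure of 16 map tasks are hypothetical, so check the behaviour of your Hadoop version before relying on it.

```python
import math


def maps_needed_before_reduce(total_maps, slowstart_fraction=0.05):
    """Map tasks that must finish before reducers may be scheduled.

    Assumption (not taken from this thread): the threshold is
    ceil(slowstart_fraction * total_maps).
    """
    if not 0.0 <= slowstart_fraction <= 1.0:
        raise ValueError("slowstart fraction must be between 0 and 1")
    return math.ceil(slowstart_fraction * total_maps)


# Hypothetical job with 16 map tasks, tried with three different fractions.
for fraction in (0.05, 0.5, 1.0):
    needed = maps_needed_before_reduce(16, fraction)
    print(f"fraction={fraction}: reducers may start after {needed} of 16 maps")
```

Setting the fraction to 1.0 makes the reducers wait until every map has finished, which gives a clean boundary between the map phase and the shuffle if you want to measure the shuffle traffic in isolation.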
