Hi Abdul,
Yes, "Reduce shuffle bytes" is the right metric. You can take a look at a
report we put together on this earlier:
http://www.slideshare.net/pramodbiligiri/shuffle-phase-as-the-bottleneck-in-hadoop-terasort
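
If you want to pull that counter out of the console output automatically,
here is a minimal sketch. It assumes the Hadoop 1.x JobClient line format
shown in your output below; the function name parse_counters and the regular
expression are only illustrations, not part of any Hadoop tooling.

```python
import re

# Matches console lines such as
#   "14/10/01 18:38:12 INFO mapred.JobClient:     Reduce shuffle bytes=3059"
# and tolerates the "*...*" highlighting used in this thread.
# Assumption: each counter sits on a single line (no mail-client wrapping).
COUNTER_RE = re.compile(r"mapred\.JobClient:\s+\*?([^=*]+?)=(\d+)\*?\s*$")


def parse_counters(console_output):
    """Return {counter name: integer value} parsed from JobClient output."""
    counters = {}
    for line in console_output.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            counters[match.group(1).strip()] = int(match.group(2))
    return counters


# A few lines copied from the job output quoted in this thread.
sample = """\
14/10/01 18:38:12 INFO mapred.JobClient:     Map output materialized bytes=3059
14/10/01 18:38:12 INFO mapred.JobClient:     Map input records=6
14/10/01 18:38:12 INFO mapred.JobClient:     *Reduce shuffle bytes=3059*
14/10/01 18:38:12 INFO mapred.JobClient:     Map output bytes=2509
"""

counters = parse_counters(sample)
print(counters["Reduce shuffle bytes"])
```

In your run, "Reduce shuffle bytes" and "Map output materialized bytes" are
both 3059, which is what I would expect with a single reducer fetching all of
the map output.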

Pramod

On Wed, Oct 1, 2014 at 6:06 PM, Abdul Navaz <[email protected]> wrote:

> Hello,
>
> This is the portion of the output that is displayed on the console when I
> run the sample word count job.
>
> map 0% reduce 0%
> 14/10/01 18:37:52 INFO mapred.JobClient:  map 100% reduce 0%
> 14/10/01 18:38:10 INFO mapred.JobClient:  map 100% reduce 100%
> 14/10/01 18:38:12 INFO mapred.JobClient: Job complete: job_201409262002_0003
> 14/10/01 18:38:12 INFO mapred.JobClient: Counters: 29
> 14/10/01 18:38:12 INFO mapred.JobClient:   Job Counters
> 14/10/01 18:38:12 INFO mapred.JobClient:     Launched reduce tasks=1
> 14/10/01 18:38:12 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=23511
> 14/10/01 18:38:12 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> 14/10/01 18:38:12 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> 14/10/01 18:38:12 INFO mapred.JobClient:     Launched map tasks=1
> 14/10/01 18:38:12 INFO mapred.JobClient:     Data-local map tasks=1
> 14/10/01 18:38:12 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=14193
> 14/10/01 18:38:12 INFO mapred.JobClient:   File Output Format Counters
> 14/10/01 18:38:12 INFO mapred.JobClient:     Bytes Written=1106
> 14/10/01 18:38:12 INFO mapred.JobClient:   FileSystemCounters
> 14/10/01 18:38:12 INFO mapred.JobClient:     FILE_BYTES_READ=3059
> 14/10/01 18:38:12 INFO mapred.JobClient:     HDFS_BYTES_READ=1601
> 14/10/01 18:38:12 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=108400
> 14/10/01 18:38:12 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1106
> 14/10/01 18:38:12 INFO mapred.JobClient:   File Input Format Counters
> 14/10/01 18:38:12 INFO mapred.JobClient:     Bytes Read=1486
> 14/10/01 18:38:12 INFO mapred.JobClient:   Map-Reduce Framework
> 14/10/01 18:38:12 INFO mapred.JobClient:     Map output materialized bytes=3059
> 14/10/01 18:38:12 INFO mapred.JobClient:     Map input records=6
> 14/10/01 18:38:12 INFO mapred.JobClient:     *Reduce shuffle bytes=3059*
> 14/10/01 18:38:12 INFO mapred.JobClient:     Spilled Records=544
> 14/10/01 18:38:12 INFO mapred.JobClient:     Map output bytes=2509
> 14/10/01 18:38:12 INFO mapred.JobClient:     Total committed heap usage
>
>
> I am trying to find the shuffling traffic, that is, the total traffic
> generated when the mappers send their key-value pairs to the reducer. Does
> the highlighted line give the shuffling traffic?
>
>
> Thanks & Regards,
>
> Abdul Navaz
> Research Assistant
> University of Houston Main Campus, Houston TX
> Ph: 281-685-0388
>
>
>
>
> On 9/26/14, 12:00 AM, "karthikeyan S" <[email protected]> wrote:
>
> The reducer starts fetching as soon as data is available from any one of
> the mappers.
> The reducer keeps polling the AM to ask whether any mapper has completed
> processing. If so, it fetches data from that mapper.
> So it is not necessary for all the mappers of a job to complete before
> the reducer starts fetching map output.
>
> When a reducer starts fetching data from the mappers, it prints
> that information in its syslog, from what I have seen.
>
> Thanks,
> Karthik
>
> On Thu, Sep 25, 2014 at 8:27 PM, Bing Jiang <[email protected]>
> wrote:
>
> See mapreduce.job.reduce.slowstart.completedmaps.
> It sets the fraction of map tasks that must complete before reduce tasks
> can kick off.
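
A rough illustration of how that setting plays out, as a sketch under stated
assumptions rather than Hadoop's actual scheduler code: the threshold is
taken to be the slowstart fraction times the number of map tasks, rounded up;
0.05 is the stock default; 16 maps is a guess for a 1 GB input with 64 MB
blocks. The function name maps_needed_before_reduces is hypothetical. On
Hadoop 1.x the same property is called mapred.reduce.slowstart.completed.maps.

```python
import math


def maps_needed_before_reduces(total_maps, slowstart=0.05):
    """Approximate count of completed maps before reduces may be scheduled.

    Assumption: threshold = ceil(slowstart * total_maps).
    """
    return math.ceil(slowstart * total_maps)


# Assumed job size: 16 map tasks (1 GB input, 64 MB blocks).
for fraction in (0.05, 0.5, 1.0):
    print(fraction, maps_needed_before_reduces(16, fraction))
```

With the fraction set to 1.0, no reducer is scheduled (and so no shuffle
traffic starts) until every map has finished.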
>
> 2014-09-26 8:36 GMT+08:00 Abdul Navaz <[email protected]>:
>
>
> Hello,
>
> I have a Hadoop cluster with 1 name node and 3 data nodes. I am running the
> sample word count job on a 1 GB file that is distributed across HDFS.
>
> When I run the MapReduce job, the reduce phase starts before the map phase
> reaches 100%, e.g. map 40% reduce 10%.
>
> I would like to know when the shuffling traffic starts.
>
> -> Is there any way to find out exactly when shuffling started? Does it
> generate any entry in the logs?
> -> How can I find the total amount of shuffling traffic?
>
>
>
> Thanks & Regards,
>
> Abdul Navaz
> Research Assistant
> University of Houston Main Campus, Houston TX
> Ph: 281-685-0388
>
>
>
>
> --
> Bing Jiang
> Tel:(86)134-2619-1361
> weibo: http://weibo.com/jiangbinglover
> BLOG: www.binospace.com
> BLOG: http://blog.sina.com.cn/jiangbinglover
> Focus on distributed computing, HDFS/HBase
>
>
>
