Re: Hadoop shuffling traffic

Abdul Navaz Wed, 01 Oct 2014 18:07:24 -0700

Hello,

This is the portion of the output that is displayed on the console when I
run the sample word count job.


map 0% reduce 0%
14/10/01 18:37:52 INFO mapred.JobClient:  map 100% reduce 0%
14/10/01 18:38:10 INFO mapred.JobClient:  map 100% reduce 100%
14/10/01 18:38:12 INFO mapred.JobClient: Job complete: job_201409262002_0003
14/10/01 18:38:12 INFO mapred.JobClient: Counters: 29
14/10/01 18:38:12 INFO mapred.JobClient:   Job Counters
14/10/01 18:38:12 INFO mapred.JobClient:     Launched reduce tasks=1
14/10/01 18:38:12 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=23511
14/10/01 18:38:12 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/10/01 18:38:12 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/10/01 18:38:12 INFO mapred.JobClient:     Launched map tasks=1
14/10/01 18:38:12 INFO mapred.JobClient:     Data-local map tasks=1
14/10/01 18:38:12 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=14193
14/10/01 18:38:12 INFO mapred.JobClient:   File Output Format Counters
14/10/01 18:38:12 INFO mapred.JobClient:     Bytes Written=1106
14/10/01 18:38:12 INFO mapred.JobClient:   FileSystemCounters
14/10/01 18:38:12 INFO mapred.JobClient:     FILE_BYTES_READ=3059
14/10/01 18:38:12 INFO mapred.JobClient:     HDFS_BYTES_READ=1601
14/10/01 18:38:12 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=108400
14/10/01 18:38:12 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1106
14/10/01 18:38:12 INFO mapred.JobClient:   File Input Format Counters
14/10/01 18:38:12 INFO mapred.JobClient:     Bytes Read=1486
14/10/01 18:38:12 INFO mapred.JobClient:   Map-Reduce Framework
14/10/01 18:38:12 INFO mapred.JobClient:     Map output materialized bytes=3059
14/10/01 18:38:12 INFO mapred.JobClient:     Map input records=6
14/10/01 18:38:12 INFO mapred.JobClient:     Reduce shuffle bytes=3059
14/10/01 18:38:12 INFO mapred.JobClient:     Spilled Records=544
14/10/01 18:38:12 INFO mapred.JobClient:     Map output bytes=2509
14/10/01 18:38:12 INFO mapred.JobClient:     Total committed heap usage



I am trying to find the shuffling traffic, that is, the total traffic generated
when the mappers send their key-value pairs to the reducer. Does the
highlighted portion give the shuffling traffic?


Thanks & Regards,

Abdul Navaz
Research Assistant
University of Houston Main Campus, Houston TX
Ph: 281-685-0388
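A minimal sketch, assuming the console format shown above with each counter on a single line (the email wrapped a few of them): it collects the name=value counter lines so that Reduce shuffle bytes, the counter for map output copied by the reducers during the shuffle, can be compared with Map output materialized bytes. The helper name parse_counters is hypothetical and not part of Hadoop.

```python
import re

# Matches one JobClient counter line, e.g.
# "14/10/01 18:38:12 INFO mapred.JobClient:     Reduce shuffle bytes=3059".
# Assumes each counter is on a single line, as printed on the console.
COUNTER_RE = re.compile(r"INFO mapred\.JobClient:\s+(?P<name>[^=]+?)=(?P<value>\d+)\s*$")


def parse_counters(console_output: str) -> dict[str, int]:
    """Collect every 'name=value' counter line into a dict (hypothetical helper)."""
    counters = {}
    for line in console_output.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            counters[match.group("name").strip()] = int(match.group("value"))
    return counters


if __name__ == "__main__":
    sample = (
        "14/10/01 18:38:12 INFO mapred.JobClient:     Map output materialized bytes=3059\n"
        "14/10/01 18:38:12 INFO mapred.JobClient:     Reduce shuffle bytes=3059\n"
        "14/10/01 18:38:12 INFO mapred.JobClient:     Map output bytes=2509\n"
    )
    counters = parse_counters(sample)
    print(counters["Reduce shuffle bytes"])  # prints 3059
```

In this run the two counters are equal (3059), which fits a job with a single reducer that fetched all of the map output. Because a reducer can also fetch output from maps running on its own node, the counter is best read as bytes shuffled rather than as an exact measure of bytes that crossed the network.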




On 9/26/14, 12:00 AM, "karthikeyan S" <[email protected]> wrote:

> The reducer starts as soon as it has data available from any one of the
> mappers. The reducer keeps polling the AM and asks whether any mapper has
> finished. If so, it fetches data from that mapper.
> So it's not necessary for all the mappers of a job to complete before the
> reducer starts fetching (shuffling) their output; the reduce function
> itself runs only after all map outputs have been copied and merged.
> 
> When a reducer starts fetching data from the mappers, it prints that
> information in its syslog, from what I have seen.
> 
> Thanks,
> Karthik
> 
> On Thu, Sep 25, 2014 at 8:27 PM, Bing Jiang <[email protected]> wrote:
>>  See mapreduce.job.reduce.slowstart.completedmaps: the fraction of a job's
>>  map tasks that must complete before its reduce tasks are scheduled. It
>>  tells you when reduce tasks can kick off.
>> 
>>  2014-09-26 8:36 GMT+08:00 Abdul Navaz <[email protected]>:
>>> 
>>>  Hello,
>>> 
>>>  I have a Hadoop cluster with 1 NameNode and 3 DataNodes. I am running
>>>  the sample word count job on a 1 GB file stored in HDFS.
>>> 
>>>  When I run the MapReduce job, the reduce phase starts before the map
>>>  phase reaches 100%, e.g. map 40% reduce 10%.
>>> 
>>>  I would like to know when the shuffling traffic starts.
>>> 
>>>  -> Is there any way to find out exactly when shuffling started? Does it
>>>  write anything to the syslog?
>>>  -> How can I find the total amount of shuffling traffic?
>>> 
>>> 
>>> 
>>>  Thanks & Regards,
>>> 
>>>  Abdul Navaz
>>>  Research Assistant
>>>  University of Houston Main Campus, Houston TX
>>>  Ph: 281-685-0388
>>> 
>> 
>> 
>> 
>>  --
>>  Bing Jiang
>>  Tel：(86)134-2619-1361
>>  weibo: http://weibo.com/jiangbinglover
>>  BLOG: www.binospace.com
>>  BLOG: http://blog.sina.com.cn/jiangbinglover
>>  Focus on distributed computing, HDFS/HBase
>
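To tie the two replies together, here is a toy model of when shuffle traffic can start. It is not Hadoop's scheduler code. It assumes reduces are scheduled once ceil(slowstart × number of maps) maps have finished, which is how the slowstart fraction is described above (default 0.05; on older MR1 clusters the property is called mapred.reduce.slowstart.completed.maps), that copies take no time, and that merge time can be ignored. The function names, the 16-map / 64 MB-block setup and the finish times are all made up for illustration.

```python
import math


def slowstart_threshold(num_maps: int, slowstart: float = 0.05) -> int:
    """Completed maps needed before reduces are scheduled (toy model)."""
    return math.ceil(slowstart * num_maps)


def shuffle_timeline(map_finish_times: list[float], slowstart: float = 0.05) -> tuple[float, float, float]:
    """Return (reducer_start, first_fetch, reduce_phase_start) for one reducer.

    Toy assumptions: the reducer starts the moment the slowstart threshold is
    met, copies take no time, merge time is ignored, and every map produces
    output for this reducer.
    """
    finish = sorted(map_finish_times)
    needed = slowstart_threshold(len(finish), slowstart)
    # With needed == 0 the reducer may start at time 0, before any map is done.
    reducer_start = finish[needed - 1] if needed > 0 else 0.0
    # Shuffling begins once the reducer is running AND some map output exists.
    first_fetch = max(reducer_start, finish[0])
    # reduce() can run only after the last map output has been copied.
    reduce_phase_start = max(reducer_start, finish[-1])
    return reducer_start, first_fetch, reduce_phase_start


if __name__ == "__main__":
    # Made-up finish times (seconds) for 16 maps, e.g. a 1 GB file split
    # into 64 MB blocks; illustrative only, not measured.
    times = [10 + 2 * i for i in range(16)]
    print(shuffle_timeline(times))       # prints (10, 10, 40)
    print(shuffle_timeline(times, 1.0))  # prints (40, 40, 40)
```

With the default 0.05 the reducer is scheduled as soon as the first map finishes, so shuffling overlaps the map phase, which is consistent with the "map 40% reduce 10%" progress described in the thread. With 1.0 the reducer waits for every map, so the shuffle begins only after the map phase ends.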
