A MapReduce job contains a shuffle phase, in which the intermediate map outputs are copied to the reducer nodes. This phase is counted as part of the reduce phase, so the reduce counter starts advancing before the map phase has finished. The actual reduce function will only be started, just as you have heard, once all the map tasks are finished.
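To make the percentage in your log line more concrete, here is a minimal Python sketch of how such a reduce counter can be composed. It assumes the usual convention that reduce progress is split into three equally weighted sub-phases (copy/shuffle, sort/merge, reduce). The function name `reduce_progress` and the 3% figure are made up for illustration and are not taken from Hadoop or from your job.

```python
def reduce_progress(copied: float, sorted_: float, reduced: float) -> float:
    """Overall reduce progress as the mean of three equally weighted sub-phases.

    Each argument is the completed fraction (0.0 to 1.0) of one sub-phase:
    copy/shuffle, sort/merge, and the reduce function itself.
    The equal weighting is an assumption of this sketch.
    """
    for value in (copied, sorted_, reduced):
        if not 0.0 <= value <= 1.0:
            raise ValueError("fractions must be between 0 and 1")
    return (copied + sorted_ + reduced) / 3.0


# Hypothetical snapshot: the maps are still running, the reducers have
# copied 3% of the map outputs, and no sorting or reducing has happened yet.
progress = reduce_progress(copied=0.03, sorted_=0.0, reduced=0.0)
print(f"reduce = {progress:.0%}")
```

Under these assumptions, a small non-zero reduce value while the maps are still running only means that some map output has already been copied, not that the reduce function has been called.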
On Wednesday, April 23, 2014 1:18:40 PM UTC+2, Kishore kumar wrote:
>
> Hi All,
>
> I heard about the reduce job, it will be started after all map tasks finished 100%, but in my hive query the reduce job started at below stage, please explain why is this.(I copied below line when the job is running).
>
> 2014-04-22 21:15:12,803 Stage-1 map = 83%, reduce = 1%, Cumulative CPU 4194.4 sec
>
> --
>
> *Kishore *