What does the key distribution in your data look like? The 2 slow reducers may be receiving the bulk of your data because of a skewed key/data distribution.
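One way to check this is to count rows per grouping key. The following is only a minimal sketch: it assumes the relation ASET and the field platform from your script, and the aliases BYPLAT, KEYCNT and SORTED are made up for illustration.

```pig
-- Hypothetical aliases; ASET and platform are taken from the quoted script.
-- Group by the same key the slow job groups on.
BYPLAT = GROUP ASET BY platform;

-- Count the rows that fall under each platform value.
KEYCNT = FOREACH BYPLAT GENERATE group AS platform, COUNT(ASET) AS n;

-- Largest groups first, so any skew is visible at the top of the output.
SORTED = ORDER KEYCNT BY n DESC;
DUMP SORTED;
```

All rows of one group go to a single reducer, so if only one or two platform values account for nearly all rows, the reducers that receive those keys end up doing nearly all the work.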
From the counters themselves, you can see that the 2 reducers have much higher values than the other 14.

Regards,
Shahab

On Tue, Sep 3, 2013 at 5:39 AM, centerqi hu <[email protected]> wrote:

> hi all
>
> I want to know why two reduce tasks take so long to run
> while the other 14 finish so quickly.
>
> The code is as follows:
>
> SITEG = GROUP ASET by (platform);
> RES = FOREACH SITEG{
>     UV = DISTINCT ASET.ukey;
>
>     LISTTMP = FILTER ASET BY requesturl == 'list';
>     LISTUV = DISTINCT LISTTMP.ukey;
>
>     ITEMTMP = FILTER ASET BY requesturl == 'item';
>     ITEMUV = DISTINCT ITEMTMP.ukey;
>
>     TAOKETMP = FILTER ASET BY requesturl == 'jump';
>     TAOKEUV = DISTINCT TAOKETMP.ukey;
>
>     COLTMP = FILTER ASET BY requesturl == 'favorite';
>     COLUV = DISTINCT COLTMP.ukey;
>
>     GENERATE FLATTEN(group),COUNT(UV),COUNT(LISTTMP),COUNT(LISTUV),
>         COUNT(ITEMTMP),COUNT(ITEMUV), COUNT(TAOKETMP),
>         COUNT(TAOKEUV),COUNT(COLTMP),COUNT(COLUV);
> };
>
> There are 16 reducers in total.
> The counters for the 14 fast reducers are as follows:
>
> File Output Format Counters
>     Bytes Written                         0
> FileSystemCounters
>     FILE_BYTES_READ                       22
>     FILE_BYTES_WRITTEN                    124,031
> Map-Reduce Framework
>     Reduce input groups                   0
>     Combine output records                0
>     Reduce shuffle bytes                  1,710
>     Physical memory (bytes) snapshot      305,762,304
>     Reduce output records                 0
>     Spilled Records                       0
>     CPU time spent (ms)                   8,850
>     Total committed heap usage (bytes)    757,137,408
>     Virtual memory (bytes) snapshot       2,749,321,216
>     Combine input records                 0
>     Reduce input records                  0
>
> The counters for the other two reducers:
>
> org.apache.pig.PigCounters
>     PROACTIVE_SPILL_COUNT_RECS            31,154,190
>     SPILLABLE_MEMORY_MANAGER_SPILL_COUNT  3
>     PROACTIVE_SPILL_COUNT_BAGS            1
> File Output Format Counters
>     Bytes Written                         0
> FileSystemCounters
>     FILE_BYTES_READ                       181,863,945
>     FILE_BYTES_WRITTEN                    181,987,953
>     HDFS_BYTES_WRITTEN                    70
> Map-Reduce Framework
>     Reduce input groups                   1
>     Combine output records                0
>     Reduce shuffle bytes                  225,663,351
>     Physical memory (bytes) snapshot      2,039,889,920
>     Reduce output records                 1
>     Spilled Records                       32,370,070
>     Total committed heap usage (bytes)    1,903,493,120
>     CPU time spent (ms)                   925,630
>     Virtual memory (bytes) snapshot       2,727,219,200
>     Combine input records                 0
>
> [email protected]
