Hi all,

I'm parsing a segment with 1061 URLs on a Hadoop cluster (4 nodes).
The ParseSegment job runs 20 map tasks and 20 reduce tasks. Each map task takes about 14 seconds to parse about 50 documents. The reduce phase takes longer: 2 minutes for the first 4 reduce tasks, about 30 seconds for the other 16.

Any ideas why the reduce phase is relatively slow? Could it be the I/O on the DFS?

Counter                  Map          Reduce   Total
Map input records        1,061        0        1,061
Map output records       844          0        844
Map input bytes          3,549,700    0        3,549,700
Map output bytes         675,599      0        675,599
Reduce input records     0            844      844
Reduce output records    0            844      844
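
For a rough sense of scale, here is a small sketch that turns the counters above into per-task figures. The variable and function names are only illustrative, and the even split of records across tasks is an assumption; the real distribution depends on how the keys are partitioned.

```python
# Counters copied from the job report above.
map_input_records = 1061
map_output_records = 844
map_output_bytes = 675_599
num_map_tasks = 20
num_reduce_tasks = 20


def per_task_stats():
    """Average load per task, ASSUMING an even split across tasks."""
    return {
        "docs_per_map_task": map_input_records / num_map_tasks,
        "records_per_reduce_task": map_output_records / num_reduce_tasks,
        "bytes_per_reduce_task": map_output_bytes / num_reduce_tasks,
        "bytes_per_output_record": map_output_bytes / map_output_records,
    }


if __name__ == "__main__":
    for name, value in per_task_stats().items():
        print(f"{name}: {value:.1f}")
```

Under that assumption each reduce task handles roughly 42 records, or about 33 KB of map output, which is a small amount of data per task.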

Thanks,
Mathijs

--
Knowlogy
Helperpark 290 C
9723 ZA Groningen

[EMAIL PROTECTED]
+31 (0)6 15312977
http://www.knowlogy.nl

