Hi all,
I'm parsing a segment of 1,061 URLs on a 4-node Hadoop cluster.
The job (ParseSegment) runs 20 map tasks and 20 reduce tasks. Each map
task takes about 14 seconds to parse about 50 documents. The reduce
phase takes longer: about 2 minutes for each of the first 4 reduce
tasks, and about 30 seconds for each of the other 16.
Any ideas why the reduce phase is relatively slow? Could it be I/O
on the DFS?
Job counters:

Counter                   Map        Reduce   Total
Map input records         1,061      0        1,061
Map output records        844        0        844
Map input bytes           3,549,700  0        3,549,700
Map output bytes          675,599    0        675,599
Reduce input records      0          844      844
Reduce output records     0          844      844
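
For scale, here is a rough per-task breakdown of the counters above. It
is only a sketch and assumes records and bytes are split evenly across
the 20 map and 20 reduce tasks; the actual partitioning may well be
uneven.

```python
# Back-of-the-envelope averages from the job counters above.
# Assumption: work is spread evenly over the tasks; real hash
# partitioning of the reduce input may be uneven.
MAP_TASKS = 20
REDUCE_TASKS = 20

map_input_records = 1_061
map_output_bytes = 675_599
reduce_input_records = 844


def per_task(total: int, tasks: int) -> float:
    """Average amount of work per task, assuming an even split."""
    return total / tasks


docs_per_map = per_task(map_input_records, MAP_TASKS)               # ~53 documents
records_per_reduce = per_task(reduce_input_records, REDUCE_TASKS)   # ~42 records
bytes_per_reduce = per_task(map_output_bytes, REDUCE_TASKS)         # ~34 KB

print(f"documents per map task:  {docs_per_map:.1f}")
print(f"records per reduce task: {records_per_reduce:.1f}")
print(f"bytes per reduce task:   {bytes_per_reduce:,.0f}")
```

If the split is roughly even, each reduce task only receives about
42 records (around 34 KB), so the volume of data per reducer looks small
compared with the time the reduce tasks take.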
Thanks,
Mathijs
--
Knowlogy
Helperpark 290 C
9723 ZA Groningen
+31 (0)6 15312977
http://www.knowlogy.nl