[Nutch-general] ParseSegment: slow reduce phase

Mathijs Homminga Mon, 14 May 2007 04:14:55 -0700

Hi all,

I'm parsing a segment with 1,061 URLs on a Hadoop cluster (4 nodes),
using 20 map tasks and 20 reduce tasks (ParseSegment). Each map task
takes about 14 seconds to parse about 50 documents. The reduce phase is
slower: about 2 minutes for each of the first 4 reduce tasks and about
30 seconds for each of the other 16.
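
A rough tally of total task time from the figures above puts the reduce work at more than three times the map work. This is only a sketch: the durations are the approximate values quoted in this message, and the class and method names are hypothetical.

```java
// Rough task-time arithmetic from the approximate timings in this message.
// Assumption: 20 map tasks at ~14 s each; 4 reduce tasks at ~120 s and
// 16 reduce tasks at ~30 s. Names here are illustrative only.
public class TaskTimeEstimate {

    // Total task-seconds for `count` tasks that each take `secondsEach`.
    static int taskSeconds(int count, int secondsEach) {
        return count * secondsEach;
    }

    public static void main(String[] args) {
        int mapSeconds = taskSeconds(20, 14);
        int reduceSeconds = taskSeconds(4, 120) + taskSeconds(16, 30);
        System.out.printf("map task-seconds:    %d%n", mapSeconds);
        System.out.printf("reduce task-seconds: %d%n", reduceSeconds);
        System.out.printf("reduce/map ratio:    %.1f%n",
                (double) reduceSeconds / mapSeconds);
    }
}
```

This counts summed task time, not wall-clock time; how the tasks overlap on 4 nodes depends on the number of task slots per node, which isn't stated here.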

Any ideas why the reduce phase is relatively slow? Could it be I/O
on the DFS?

Counter                   Map         Reduce    Total
Map input records         1,061       0         1,061
Map output records        844         0         844
Map input bytes           3,549,700   0         3,549,700
Map output bytes          675,599     0         675,599
Reduce input records      0           844       844
Reduce output records     0           844       844
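
Spreading the counters above over the 20 reduce tasks gives a sense of how little data each reducer handles. This sketch assumes the map output is divided roughly evenly across reducers (as a hash-based partitioner would tend to do); the actual split isn't shown in these counters, and the class name is hypothetical.

```java
// Per-reduce-task volume, computed from the job counters in this message.
// Assumption: map output is spread roughly evenly over the 20 reducers.
public class ReduceVolume {

    // Average share of `total` handled by each of `tasks` tasks.
    static double perTask(long total, int tasks) {
        return (double) total / tasks;
    }

    public static void main(String[] args) {
        long reduceInputRecords = 844;
        long mapOutputBytes = 675_599;
        int reduceTasks = 20;
        System.out.printf("records per reduce task: %.1f%n",
                perTask(reduceInputRecords, reduceTasks));
        System.out.printf("bytes per reduce task:   %.0f%n",
                perTask(mapOutputBytes, reduceTasks));
    }
}
```

At roughly 42 records and about 33 KB per reducer, raw data volume alone seems an unlikely cause of multi-minute reduce tasks, though these counters by themselves can't show where the time actually goes.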

Thanks,
Mathijs

-- 
Knowlogy
Helperpark 290 C
9723 ZA Groningen

[EMAIL PROTECTED]
+31 (0)6 15312977
http://www.knowlogy.nl
