Hi all,

I'm parsing a segment with 1,061 URLs on a 4-node Hadoop cluster. The ParseSegment job runs 20 map tasks and 20 reduce tasks. Each map task takes about 14 seconds to parse about 50 documents. The reduce phase takes longer: about 2 minutes for the first 4 reduce tasks, and about 30 seconds for the other 16.
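To put those timings side by side, here is a rough back-of-the-envelope sketch in Python. It uses only the approximate figures quoted above and assumes the reduce times are per task, which may not be exact. The variable names are purely illustrative, and the totals are summed task time, not wall-clock time, since tasks run in parallel.

```python
# Approximate figures from the description above; names are illustrative only.
urls = 1061
map_tasks = 20
map_secs_per_task = 14            # "about 14 secs" per map task

# Assumption: the quoted reduce times are per task, not per group.
slow_reduce_tasks, slow_reduce_secs = 4, 120
fast_reduce_tasks, fast_reduce_secs = 16, 30

docs_per_map_task = urls / map_tasks
total_map_secs = map_tasks * map_secs_per_task
total_reduce_secs = (slow_reduce_tasks * slow_reduce_secs
                     + fast_reduce_tasks * fast_reduce_secs)

print(f"docs per map task:      {docs_per_map_task:.2f}")
print(f"summed map task time:    {total_map_secs} s")
print(f"summed reduce task time: {total_reduce_secs} s")
print(f"reduce/map ratio:        {total_reduce_secs / total_map_secs:.2f}")
```

Under these assumptions the reduce tasks account for several times more summed task time than the map tasks, even though the reduce side only handles the parsed output.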
Any ideas why the reduce phase is relatively slow? Could it be the I/O on the DFS?

Counter                  Map          Reduce   Total
Map input records        1,061        0        1,061
Map output records       844          0        844
Map input bytes          3,549,700    0        3,549,700
Map output bytes         675,599      0        675,599
Reduce input records     0            844      844
Reduce output records    0            844      844

Thanks,
Mathijs

--
Knowlogy
Helperpark 290 C
9723 ZA Groningen
[EMAIL PROTECTED]
+31 (0)6 15312977
http://www.knowlogy.nl

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
