Hi,
Currently I am running Nutch on a single Linux box with 1 GB of memory and one
3 GHz Intel P4 CPU. Hadoop is running in local mode. Now I am trying to
reparse the HTML pages that were fetched. The process is very slow: it requires more
than 10 days to process nearly 20M pages. I am wondering whether either of the two
solutions below would improve the performance:
1. Increase the memory size?
2. Run Hadoop in distributed mode, and run more than one map/reduce task on each
machine?
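For reference, this is the kind of setting I mean for option 2 — a sketch of a hadoop-site.xml fragment based on my understanding of the Hadoop 0.x property names (they may differ in other versions), raising the number of concurrent tasks per node and the per-task heap:

```xml
<!-- hadoop-site.xml sketch; property names assumed from the Hadoop 0.x era -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value> <!-- concurrent map tasks per TaskTracker -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value> <!-- concurrent reduce tasks per TaskTracker -->
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value> <!-- heap size for each child task JVM -->
</property>
```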
Any suggestions for improving the performance are welcome! Thanks in advance!
-chee
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general