In my situation, I find that linkdb merge takes much more time than fetch and parse combined, even though fetch is fully polite.
What is the standard advice for making linkdb-merge go faster?

I call invertlinks like this:

    __bin_nutch invertlinks "$CRAWL_PATH"/linkdb "$CRAWL_PATH"/segments/$SEGMENT

invertlinks seems to call mergelinkdb automatically.

I currently have about 3-6 slaves for fetching, though that will increase soon. I am currently using small segment sizes (3000 urls) but I can increase that if it would help.

I have the following properties that may be relevant.

    <property>
      <name>db.max.outlinks.per.page</name>
      <value>1000</value>
    </property>

    <property>
      <name>db.ignore.external.links</name>
      <value>false</value>
    </property>

The following props are left as default in nutch-default.xml

    <property>
      <name>db.update.max.inlinks</name>
      <value>10000</value>
    </property>

    <property>
      <name>db.ignore.internal.links</name>
      <value>false</value>
    </property>

    <property>
      <name>db.ignore.external.links</name>
      <value>false</value>
    </property>

