Chirag & ir,

Thank you for your help!

Since threads are handled automatically, it seems that ir's solution of starting up separate instances of "nutch crawl" is the only way to get things done in parallel.

First question: How do I get multiple instances of "nutch crawl" running at the same time? Do I simply start the same "nutch" script repeatedly with different parameters, or do I need completely separate folders, each with a complete Nutch installation, so that each instance can have its own urls file and crawl-urlfilter.txt?

Second question: Starting 20 instances of Nutch will lead to 20 indexes. Can I combine them with "merge" or "mergesegs" without losing any data?

* Ir: I am definitely interested in your ideas. Please explain further!

* Chirag: My version of Nutch is 0.6, just downloaded from the project's homepage. I searched the web for patches for Nutch; the best hit was http://issues.apache.org/jira/secure/IssueNavigator.jspa. Most of these issues seem to have files attached to them, but I can't tell whether they are all actual patches. Is one of these (Nutch-54?) the patch by Andrzej that you suggested, and if so, how do I install it?

Kind regards,

Jon
