Chirag & Ir,
Thank you for your help!
Since threads are handled automatically, it seems that Ir's solution of
starting up separate instances of "nutch crawl" is the only way to get
things done in parallel.
First question: How do I get multiple instances of "nutch crawl" running
at the same time? Do I simply start the same "nutch" script repeatedly
with different parameters, or do I need completely separate folders, each
with a full Nutch installation, so that every instance can have its own
urls file and crawl-urlfilter.txt?
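To make that concrete, this is roughly what I am imagining; the file
names, directory names and flags below are only my guesses from the
tutorial, so please correct me if the real invocation looks different:

  # instance 1, with its own seed list and its own output directory
  bin/nutch crawl urls-1 -dir crawl-1 -depth 3 > crawl-1.log 2>&1 &
  # instance 2, same command pointed at a different seed list and directory
  bin/nutch crawl urls-2 -dir crawl-2 -depth 3 > crawl-2.log 2>&1 &

What I cannot see is how the second instance would get its own
crawl-urlfilter.txt, since that file lives in the conf directory of the
installation itself.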
Second question: Starting 20 instances of Nutch will lead to 20 indexes. Can
I combine them with "merge" or "mergesegs" without losing any data?
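Concretely, I picture each instance leaving behind its own crawl
directory (db, segments and index), and then merging all the segments
into one place before building a single index. The command below is only
my guess at the usage of "mergesegs", so please correct me if the
arguments are wrong:

  # merge the per-instance segment directories into one output directory
  bin/nutch mergesegs -o crawl-all/segments crawl-1/segments crawl-2/segments

What worries me is whether any fetch data or link information gets
dropped in that step.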
* Ir: I am definitely interested in your ideas; please explain further!
* Chirag: My version of Nutch is 0.6, just downloaded from the project's
homepage. I searched the web for patches for Nutch; the best hit was
http://issues.apache.org/jira/secure/IssueNavigator.jspa
Most of these issues seem to have files attached to them, but I can't
decide whether they are all actual patches. Is one of these (Nutch-54?)
the patch by Andrzej that you suggested, and if so, how do I install it?
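If one of those attachments is the right file, I guess applying it would
go something like this, run from the root of the Nutch source tree (the
patch file name is just an example, and I have never applied a JIRA
patch before, so I may well be missing steps):

  # apply the downloaded patch against the source and rebuild with ant
  cd nutch-0.6
  patch -p0 < NUTCH-54.patch
  ant

Is that the intended way, or is there a nicer procedure?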
Kind regards,
Jon