Eric wrote:
Andrzej,

Just to make sure I have this straight: set the generate.update.db property to true, then run

bin/nutch generate crawl/crawldb crawl/segments -topN 100000

16 times?

Yes. When this property is set to true, each fetchlist will be different, because records for pages that are already on another fetchlist are temporarily locked. Please note that this lock holds for only one week, so you need to fetch all segments within one week of generating them.
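
For example (a sketch, not from the original thread): assuming the property is set in conf/nutch-site.xml, the generate step can then simply be repeated in a shell loop; the topN value and loop count below just mirror Eric's numbers, and the property name is the one used in this thread.

  <!-- in conf/nutch-site.xml -->
  <property>
    <name>generate.update.db</name>
    <value>true</value>
  </property>

  # generate 16 fetchlists; each run temporarily locks its selected
  # records in the crawldb, so the lists don't overlap
  for i in $(seq 1 16); do
    bin/nutch generate crawl/crawldb crawl/segments -topN 100000
  done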

You can fetch and run updatedb in arbitrary order, so once you have fetched some segments you can run parsing and updatedb on just those segments, without waiting for all 16 segments to be processed.
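
For instance, a single segment can be carried through the whole cycle on its own (the segment name below is a made-up example, and this assumes fetcher.parse is false so parsing is a separate step):

  # pick one of the generated segments (example name)
  s=crawl/segments/20060501120000
  bin/nutch fetch $s                  # fetch just this segment
  bin/nutch parse $s                  # parse its fetched content
  bin/nutch updatedb crawl/crawldb $s # fold its results back into the crawldb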


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
