Eric wrote:
Andrzej,
Just to make sure I have this straight: set the generate.update.db
property to true, then run
bin/nutch generate crawl/crawldb crawl/segments -topN 100000 16 times?
Yes. When this property is set to true, each fetchlist will be
different, because the records for pages that are already on another
fetchlist are temporarily locked. Please note that this lock lasts for
only one week, so you need to fetch all segments within one week of
generating them.
You can fetch and run updatedb in arbitrary order, so once you have
fetched some segments you can run parsing and updatedb on just those
segments, without waiting for all 16 segments to be processed.
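For reference, the workflow described above might look like the following
shell sketch. This assumes Nutch 1.x-style commands and that
generate.update.db has been set to true in conf/nutch-site.xml; the paths
and -topN value are taken from Eric's example, so adjust them for your
setup:

```shell
# Generate 16 fetchlists. With generate.update.db=true, URLs placed on
# one fetchlist are locked for a week and so are excluded from the
# fetchlists produced by subsequent generate runs.
for i in $(seq 1 16); do
  bin/nutch generate crawl/crawldb crawl/segments -topN 100000
done

# Fetch, parse, and updatedb can then proceed segment by segment, in any
# order, as long as each segment is fetched within a week of generation.
for segment in crawl/segments/*; do
  bin/nutch fetch "$segment"
  bin/nutch parse "$segment"
  bin/nutch updatedb crawl/crawldb "$segment"
done
```

The second loop illustrates the point about ordering: each segment is
parsed and merged back into the crawldb as soon as its fetch finishes,
rather than after all 16 generates are done.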
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com