Hey Andrzej,

Can you tell me where to set this property (generate.update.db)? I am trying
to run a crawl scenario similar to the one Eric is running.

-Gaurang

2009/10/5 Andrzej Bialecki <a...@getopt.org>

> Eric wrote:
>
>> Andrzej,
>>
>> Just to make sure I have this straight: set the generate.update.db
>> property to true, then run
>>
>> bin/nutch generate crawl/crawldb crawl/segments -topN 100000
>>
>> 16 times?
>>
>
> Yes. When this property is set to true, each fetchlist will be
> different, because the records for pages that are already on another
> fetchlist will be temporarily locked. Please note that this lock holds only
> for 1 week, so you need to fetch all segments within one week of
> generating them.
>
> You can fetch and updatedb in arbitrary order, so once you have fetched some
> segments you can run the parsing and updatedb on just those segments,
> without waiting for all 16 segments to be processed.
>
>
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
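
For reference, a minimal sketch of where a property like this is normally
set, assuming a standard Nutch 1.x layout: site-specific overrides go in
conf/nutch-site.xml, which takes precedence over conf/nutch-default.xml.
The property name below is copied from this thread as written; it is worth
checking it against the nutch-default.xml shipped with your version, where
it may be spelled generate.update.crawldb.

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: site-specific overrides of conf/nutch-default.xml -->
<configuration>
  <property>
    <!-- Name as written in this thread (assumption); verify it against
         nutch-default.xml, where it may appear as generate.update.crawldb. -->
    <name>generate.update.db</name>
    <value>true</value>
  </property>
</configuration>
```

With this in place, repeated runs of the generate command quoted above
should produce different fetchlists, as described in the reply.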
