Hey Andrzej,

Can you tell me where to set this property (generate.update.db)? I am trying to run a crawl scenario similar to the one Eric is running.
-Gaurang

2009/10/5 Andrzej Bialecki <a...@getopt.org>

> Eric wrote:
>
>> Andrzej,
>>
>> Just to make sure I have this straight, set the generate.update.db
>> property to true then
>>
>> bin/nutch generate crawl/crawldb crawl/segments -topN 100000: 16 times?
>>
>
> Yes. When this property is set to true, then each fetchlist will be
> different, because the records for those pages that are already on another
> fetchlist will be temporarily locked. Please note that this lock holds only
> for 1 week, so you need to fetch all segments within one week from
> generating them.
>
> You can fetch and updatedb in arbitrary order, so once you fetched some
> segments you can run the parsing and updatedb just from these segments,
> without waiting for all 16 segments to be processed.
>
> --
> Best regards,
> Andrzej Bialecki <><
> ___. ___ ___ ___ _ _ __________________________________
> [__ || __|__/|__||\/| Information Retrieval, Semantic Web
> ___|||__|| \| || | Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com
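
For reference, the cycle described in the quoted reply could be sketched roughly as the shell script below. This is only a sketch under assumptions: the `DRY_RUN` switch, the `run`, `generate_all` and `process_segment` helpers, and the environment variable names are hypothetical and not from the thread. It also assumes generate.update.db is already set to true in the Nutch configuration (the thread does not say where; conf/nutch-site.xml is the usual place for property overrides) and that parsing is run as a separate step after fetching. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the generate / fetch / parse / updatedb cycle from the quoted reply.
# Hypothetical names (not from the thread): DRY_RUN, run, generate_all,
# process_segment, and the NUTCH/CRAWLDB/SEGMENTS/ROUNDS variables.

NUTCH="${NUTCH:-bin/nutch}"
CRAWLDB="${CRAWLDB:-crawl/crawldb}"
SEGMENTS="${SEGMENTS:-crawl/segments}"
ROUNDS="${ROUNDS:-16}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Generate ROUNDS fetchlists back to back. Per the reply, this yields
# different fetchlists only when generate.update.db is set to true.
generate_all() {
  i=1
  while [ "$i" -le "$ROUNDS" ]; do
    run "$NUTCH" generate "$CRAWLDB" "$SEGMENTS" -topN 100000
    i=$((i + 1))
  done
}

# Fetch, parse and update the crawldb from a single segment. Segments can be
# processed in any order, but all within one week of being generated.
process_segment() {
  run "$NUTCH" fetch "$1"
  run "$NUTCH" parse "$1"
  run "$NUTCH" updatedb "$CRAWLDB" "$1"
}

generate_all
```

With `DRY_RUN=0`, each generated segment directory under crawl/segments could then be passed to `process_segment` as soon as it is ready, without waiting for the other segments.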