Don't change options in nutch-default.xml - copy the option into nutch-site.xml and change it there. That way the change will (hopefully) survive an upgrade.
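For example, overriding the property discussed below in conf/nutch-site.xml (rather than editing nutch-default.xml) would look something like this sketch; it follows the standard Hadoop-style configuration format that Nutch uses:

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: local overrides. Values here take precedence
     over nutch-default.xml, so they survive an upgrade. -->
<configuration>
  <property>
    <name>generate.update.db</name>
    <value>true</value>
    <description>Mark generated URLs in the crawldb so that successive
    generate runs produce non-overlapping fetchlists.</description>
  </property>
</configuration>
```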
On Tue, Oct 6, 2009 at 1:01 AM, Gaurang Patel <gaurangtpa...@gmail.com> wrote:
> Hey,
>
> Never mind. I found *generate.update.db* in *nutch-default.xml* and set it
> to true.
>
> Regards,
> Gaurang
>
> 2009/10/5 Gaurang Patel <gaurangtpa...@gmail.com>
>
>> Hey Andrzej,
>>
>> Can you tell me where to set this property (generate.update.db)? I am
>> trying to run a crawl scenario similar to the one Eric is running.
>>
>> -Gaurang
>>
>> 2009/10/5 Andrzej Bialecki <a...@getopt.org>
>>
>> Eric wrote:
>>>
>>>> Andrzej,
>>>>
>>>> Just to make sure I have this straight: set the generate.update.db
>>>> property to true, then run
>>>>
>>>> bin/nutch generate crawl/crawldb crawl/segments -topN 100000
>>>>
>>>> 16 times?
>>>
>>> Yes. When this property is set to true, each fetchlist will be
>>> different, because the records for pages that are already on another
>>> fetchlist will be temporarily locked. Please note that this lock holds
>>> only for 1 week, so you need to fetch all segments within one week of
>>> generating them.
>>>
>>> You can fetch and updatedb in arbitrary order, so once you have fetched
>>> some segments you can run parsing and updatedb on just those segments,
>>> without waiting for all 16 segments to be processed.
>>>
>>> --
>>> Best regards,
>>> Andrzej Bialecki <><
>>>  ___. ___ ___ ___ _ _   __________________________________
>>> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
>>> ___|||__||  \|  || |   Embedded Unix, System Integration
>>> http://www.sigram.com  Contact: info at sigram dot com

--
http://www.linkedin.com/in/paultomblin
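The workflow in the thread above can be sketched as a short script. This is only an illustration under the assumptions in the thread: a standard Nutch layout with crawl/crawldb and crawl/segments, generate.update.db already set to true, and Eric's -topN value; paths and counts will differ for your install.

```shell
#!/bin/sh
# Sketch of the generate/fetch cycle discussed above.
# Assumes generate.update.db=true in conf/nutch-site.xml, so each
# generate run produces a fetchlist that does not overlap the others.

# Generate 16 segments of up to 100000 URLs each (Eric's parameters).
for i in $(seq 1 16); do
  bin/nutch generate crawl/crawldb crawl/segments -topN 100000
done

# Segments can be processed in any order, but each must be fetched
# within one week of generation (the generate.update.db lock expires).
for seg in crawl/segments/*; do
  bin/nutch fetch "$seg"
  bin/nutch parse "$seg"
  bin/nutch updatedb crawl/crawldb "$seg"
done
```

As Andrzej notes, the second loop need not wait for the first to finish all 16 segments: each segment can be fetched, parsed, and merged into the crawldb independently.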