Thank you, Lewis and Rémy, for your replies. I'll modify my scripts to use the individual commands and switch to the crawl script you mentioned; a rough sketch of what that cycle might look like follows the quoted message below.
Thanks a lot for your help,
Max

-----Original Message-----
From: Lewis John Mcgibbney [mailto:[email protected]]
Sent: 30 August 2012 12:26
To: [email protected]
Subject: Re: recrawl a URL?

Hi Max,

On Tue, Aug 28, 2012 at 3:24 PM, Max Dzyuba <[email protected]> wrote:
> Is it possible to use the same crawldb but store segment data in a
> different directory for consecutive crawls using the "bin/nutch crawl"
> command? I thought that there is no option to specify the path to
> crawldb or linkdb, but only the path to a directory where to save all
> crawl data into. I'm using Nutch 1.5. If it's possible, what would the
> crawl command look like?

No, this is not possible out of the box, as it would make the generic
command-line solution too convoluted. As you mention, in the past we only
specified one directory for all crawl data, and this is still the same.

Please note that the crawl command is now deprecated in trunk and will not
be supported via convenience commands from the nutch script in future
releases. Julian and others implemented a crawl script which gives you much
more control over your crawl cycles.

I must finally add that it would be a piece of cake to edit the script for
your purposes, e.g. set a variable to today's date, create a directory named
after the variable, and then move your data there via the script... or
something similar. For reference, the script can be seen here:

http://svn.apache.org/repos/asf/nutch/trunk/src/bin/crawl
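
For my own notes, here is a minimal sketch of one crawl cycle using the
individual Nutch 1.x commands along the lines Lewis suggests: a single shared
crawldb, with segments written under a directory named after today's date.
The directory names (crawl/crawldb, crawl/segments, crawl/linkdb, urls) and
the -topN value are placeholders of my own, not from this thread, and the
exact arguments should be double-checked against the bin/nutch usage output
for 1.5:

#!/bin/bash
# Sketch of one crawl cycle with individual commands; shared crawldb,
# per-run (dated) segment directory. Paths below are assumptions.

CRAWLDB=crawl/crawldb
LINKDB=crawl/linkdb
SEGMENTS=crawl/segments/$(date +%Y%m%d)   # e.g. crawl/segments/20120830

mkdir -p "$SEGMENTS"

# Seed the crawldb from the urls directory (only needed on the first run).
bin/nutch inject "$CRAWLDB" urls

# Generate a fetch list; this creates a timestamped segment under $SEGMENTS.
bin/nutch generate "$CRAWLDB" "$SEGMENTS" -topN 1000

# Pick the newest segment that generate just created.
SEGMENT=$(ls -d "$SEGMENTS"/* | sort | tail -1)

bin/nutch fetch "$SEGMENT"                 # fetch the pages
bin/nutch parse "$SEGMENT"                 # parse them (unless fetcher.parse=true)
bin/nutch updatedb "$CRAWLDB" "$SEGMENT"   # fold results back into the crawldb
bin/nutch invertlinks "$LINKDB" -dir "$SEGMENTS"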

