Thanks Markus, this was as I suspected, but I just wanted to make sure before I went and implemented it myself :)
Chris

On 18 July 2011 18:29, Markus Jelsma <[email protected]> wrote:

> I think you need to implement your own fetch schedule. But it might also be
> worthwhile checking out the adaptive fetch schedule that comes with Nutch.
>
> > Hi all,
> >
> > Currently I am aware that the time between a page being re-crawled is
> > specified in the nutch-site.xml configuration. However I am interested in
> > modifying this value on a per-crawl basis, the reasons for which will later
> > become apparent. However I see no method of modifying this value at runtime
> > e.g. through a command line argument to one of the whole-web crawl
> > commands, is it modifiable in the configuration file only?
> >
> > If it is only available in the configuration file, from a few experiments
> > running crawls, it seems that the configuration is read when the page is
> > crawled, then some kind of next-crawl-time is specified. I.e. I can modify
> > the configuration value, run a crawl, then even if I shorten the
> > configuration's recrawl period it won't affect previously crawled pages'
> > next crawl time. Is this analysis correct? If so I can just work around the
> > lack of command line option by modifying the config file for each crawl.
> >
> > Thanks
> >
> > Chris
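
For anyone finding this thread later, here is a minimal sketch of the behaviour described in the original question: the next crawl time is fixed when a page is fetched, so shortening the configured interval afterwards does not touch pages already in the database. This is a toy model only. `RecrawlSketch`, `Entry`, `recordFetch` and `isDue` are hypothetical names made up for illustration, not Nutch classes or methods, and the stored-at-fetch-time behaviour is an assumption taken from the experiments described above.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy crawl database. Hypothetical names; this is NOT the Nutch API. */
public class RecrawlSketch {

    /** Per-URL record: interval and next fetch time are frozen at fetch time. */
    static final class Entry {
        final long fetchIntervalSec;
        final long nextFetchTimeSec;

        Entry(long fetchIntervalSec, long nextFetchTimeSec) {
            this.fetchIntervalSec = fetchIntervalSec;
            this.nextFetchTimeSec = nextFetchTimeSec;
        }
    }

    private final Map<String, Entry> db = new HashMap<>();

    /** Record a fetch, copying the currently configured interval into the entry. */
    public void recordFetch(String url, long nowSec, long configuredIntervalSec) {
        db.put(url, new Entry(configuredIntervalSec, nowSec + configuredIntervalSec));
    }

    /** A URL is due if it has never been fetched or its stored time has passed. */
    public boolean isDue(String url, long nowSec) {
        Entry e = db.get(url);
        return e == null || nowSec >= e.nextFetchTimeSec;
    }

    public static void main(String[] args) {
        long thirtyDays = 30L * 24 * 60 * 60;
        long oneDay = 24L * 60 * 60;

        RecrawlSketch crawlDb = new RecrawlSketch();

        // First crawl at t = 0 with a 30-day configured interval.
        crawlDb.recordFetch("http://example.com/", 0L, thirtyDays);

        // The configured interval is then shortened to one day, but the stored
        // entry still carries the old next fetch time, so the page is not due.
        System.out.println(crawlDb.isDue("http://example.com/", oneDay));     // false

        // A page fetched after the change picks up the new interval.
        crawlDb.recordFetch("http://example.com/new", oneDay, oneDay);
        System.out.println(crawlDb.isDue("http://example.com/new", 2 * oneDay)); // true
    }
}
```

In this model, changing the config between crawls only affects pages fetched after the change. Changing the schedule for pages already in the database would need a custom fetch schedule, as Markus suggests.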

