I think you will need to implement your own fetch schedule. It may also be worth checking out the adaptive fetch schedule that ships with Nutch before you write one.
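To illustrate the idea, here is a rough sketch in Python. It is a simplified model, not Nutch's actual code: the names `PageRecord`, `fixed_schedule` and `adaptive_schedule` are made up for this example, and the rates and bounds are illustrative values only. It assumes the interval and next fetch time are stored with each page record when the page is fetched, which would be consistent with what you observed: changing the configured interval afterwards does not touch records that were already stamped. The adaptive variant shortens the interval when a page has changed and lengthens it when it has not.

```python
from dataclasses import dataclass

DAY = 24 * 60 * 60  # seconds in a day


@dataclass
class PageRecord:
    """Hypothetical per-page state kept in the crawl database."""
    fetch_interval: int   # seconds between fetches
    next_fetch_time: int  # epoch seconds of the next due fetch


def fixed_schedule(record: PageRecord, fetch_time: int,
                   default_interval: int) -> PageRecord:
    """Stamp the record with whatever interval is configured at fetch time."""
    record.fetch_interval = default_interval
    record.next_fetch_time = fetch_time + default_interval
    return record


def adaptive_schedule(record: PageRecord, fetch_time: int, modified: bool,
                      inc_rate: float = 0.4, dec_rate: float = 0.2,
                      min_interval: int = 60,
                      max_interval: int = 365 * DAY) -> PageRecord:
    """Shrink the interval for changed pages, grow it for unchanged ones."""
    factor = (1 - dec_rate) if modified else (1 + inc_rate)
    interval = round(record.fetch_interval * factor)
    # Keep the interval inside the configured bounds.
    interval = max(min_interval, min(max_interval, interval))
    record.fetch_interval = interval
    record.next_fetch_time = fetch_time + interval
    return record


if __name__ == "__main__":
    page = fixed_schedule(PageRecord(0, 0), fetch_time=1000,
                          default_interval=30 * DAY)
    print(page)
    print(adaptive_schedule(page, fetch_time=2000, modified=True))
```

A per-crawl interval could be modelled the same way, by passing a different `default_interval` for each run. In Nutch itself the schedule implementation is, if I remember right, selected through the `db.fetch.schedule.class` property, so a custom schedule would be plugged in there.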
> Hi all,
>
> Currently I am aware that the time between a page being re-crawled is
> specified in the nutch-site.xml configuration. However I am interested in
> modifying this value on a per-crawl basis, the reasons for which will later
> become apparent. However I see no method of modifying this value at runtime,
> e.g. through a command line argument to one of the whole-web crawl
> commands. Is it modifiable in the configuration file only?
>
> If it is only available in the configuration file, from a few experiments
> running crawls, it seems that the configuration is read when the page is
> crawled, then some kind of next-crawl-time is specified. I.e. I can modify
> the configuration value, run a crawl, then even if I shorten the
> configuration's recrawl period it won't affect previously crawled pages'
> next crawl time. Is this analysis correct? If so I can just work around the
> lack of command line option by modifying the config file for each crawl.
>
> Thanks
>
> Chris

