I think you will need to implement your own fetch schedule. It may also be
worth checking out the adaptive fetch schedule (AdaptiveFetchSchedule) that
ships with Nutch.
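
For reference, the schedule is selected in nutch-site.xml. The fragment below
is only a sketch: it assumes the property names used in nutch-default.xml of
Nutch 1.x (db.fetch.schedule.class, db.fetch.interval.default), so check them
against the nutch-default.xml of your version. The interval value is purely
illustrative.

```xml
<!-- nutch-site.xml (sketch; property names assumed from Nutch 1.x nutch-default.xml) -->
<configuration>
  <!-- Replace the default schedule with the adaptive one bundled with Nutch -->
  <property>
    <name>db.fetch.schedule.class</name>
    <value>org.apache.nutch.crawl.AdaptiveFetchSchedule</value>
  </property>
  <!-- Starting re-fetch interval, in seconds (illustrative value: 7 days) -->
  <property>
    <name>db.fetch.interval.default</name>
    <value>604800</value>
  </property>
</configuration>
```

A custom schedule would be plugged in the same way, by pointing
db.fetch.schedule.class at your own class.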

> Hi all,
> 
> I understand that the interval between re-crawls of a page is
> specified in the nutch-site.xml configuration. However, I am interested in
> modifying this value on a per-crawl basis, for reasons that will become
> apparent later. I see no way of modifying this value at runtime,
> e.g. through a command-line argument to one of the whole-web crawl
> commands. Is it modifiable only in the configuration file?
> 
> If it is only available in the configuration file, from a few experiments
> running crawls, it seems that the configuration is read when the page is
> crawled, then some kind of next-crawl-time is specified. I.e. I can modify
> the configuration value, run a crawl, then even if I shorten the
> configuration's recrawl period it won't affect previously crawled pages'
> next crawl time. Is this analysis correct? If so, I can just work around the
> lack of a command-line option by modifying the config file for each crawl.
> 
> Thanks
> 
> Chris
