Hi all,

I am aware that the interval before a page is re-crawled is specified in
the nutch-site.xml configuration. I would like to change this value on a
per-crawl basis, for reasons that will become apparent later. However, I
see no way of modifying it at runtime, e.g. through a command line argument
to one of the whole-web crawl commands. Is it modifiable in the
configuration file only?

If it is only available in the configuration file, then from a few
experimental crawls it seems that the configuration is read when a page is
crawled, and some kind of next-crawl time is recorded for that page at that
point. I.e. I can modify the configuration value and run a crawl, and even
if I later shorten the configured recrawl period it won't affect the next
crawl time of pages that have already been crawled. Is this analysis
correct? If so, I can just work around the lack of a command line option by
modifying the config file before each crawl.
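
To make the behaviour I think I am seeing concrete, here is a rough sketch
in Python. It is not Nutch code: PageRecord, crawl and the interval_days
parameter are names I made up for illustration, and the logic only models my
guess that the interval is read at fetch time and baked into each page's
next crawl time.

```python
from dataclasses import dataclass

DAY = 24 * 60 * 60  # seconds in one day


@dataclass
class PageRecord:
    """Hypothetical stand-in for a crawl database entry (not a Nutch class)."""
    url: str
    next_fetch_time: int = 0  # seconds; 0 means the page is due immediately


def crawl(db, now, interval_days):
    """Fetch every page that is due and stamp it with now + interval.

    The interval is consulted only at fetch time, which is the behaviour
    I am assuming here, not something I have confirmed in the Nutch source.
    """
    fetched = []
    for page in db:
        if page.next_fetch_time <= now:
            page.next_fetch_time = now + interval_days * DAY
            fetched.append(page.url)
    return fetched


db = [PageRecord("http://example.com/")]

# Crawl 1: interval is 30 days, so the page becomes due again on day 30.
first = crawl(db, now=0, interval_days=30)

# Crawl 2: interval shortened to 1 day, but the stored time is unchanged,
# so on day 5 the page is still not due.
second = crawl(db, now=5 * DAY, interval_days=1)

# Crawl 3: on day 30 the page is due; the shorter interval applies from here.
third = crawl(db, now=30 * DAY, interval_days=1)
```

If this model is right, changing the config value between crawls only
affects pages fetched after the change, which is exactly what the
workaround relies on.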

Thanks

Chris
