Thanks Markus, this was as I suspected, but I just wanted to make sure before
I went and implemented it myself :)

Chris

On 18 July 2011 18:29, Markus Jelsma <[email protected]> wrote:

> I think you need to implement your own fetch schedule. But it might also be
> worthwhile checking out the adaptive fetch schedule that comes with Nutch.
>
> > Hi all,
> >
> > I am aware that the interval between re-crawls of a page is specified in
> > the nutch-site.xml configuration. However, I am interested in modifying
> > this value on a per-crawl basis, for reasons which will become apparent
> > later. I see no way of modifying this value at runtime, e.g. through a
> > command line argument to one of the whole-web crawl commands. Is it
> > modifiable in the configuration file only?
> >
> > If it is only available in the configuration file, then from a few
> > experiments running crawls, it seems that the configuration is read when
> > the page is crawled, and some kind of next-crawl-time is stored at that
> > point. I.e. I can modify the configuration value and run a crawl, and even
> > if I later shorten the configuration's recrawl period, it won't affect
> > the next crawl time of previously crawled pages. Is this analysis correct?
> > If so, I can work around the lack of a command line option by modifying
> > the config file for each crawl.
> >
> > Thanks
> >
> > Chris
>
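
For anyone reading the archive, the behaviour Chris describes (the next crawl
time is stamped when a page is fetched, using whatever interval is configured
at that moment) can be modelled with a small sketch. This is a toy model only
and does not use Nutch's fetch schedule classes; RecrawlSketch, recordFetch
and isDue are hypothetical names invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: NOT Nutch's fetch schedule API. All names are hypothetical.
public class RecrawlSketch {
    // Stand-in for the crawl db: URL -> stored next fetch time (epoch seconds).
    private final Map<String, Long> nextFetchTime = new HashMap<>();

    // Record a fetch, stamping the next fetch time with the interval
    // that is in effect at the moment of the fetch.
    public void recordFetch(String url, long nowSeconds, long intervalSeconds) {
        nextFetchTime.put(url, nowSeconds + intervalSeconds);
    }

    // A page is due if it was never fetched or its stored time has passed.
    public boolean isDue(String url, long nowSeconds) {
        Long next = nextFetchTime.get(url);
        return next == null || next <= nowSeconds;
    }

    public static void main(String[] args) {
        RecrawlSketch db = new RecrawlSketch();
        long day = 86_400L;

        // First crawl: interval configured as 30 days.
        db.recordFetch("http://example.com/a", 0L, 30 * day);

        // Config shortened to 1 day before the second crawl; only pages
        // fetched from now on pick up the new interval.
        db.recordFetch("http://example.com/b", 0L, 1 * day);

        System.out.println(db.isDue("http://example.com/a", 2 * day)); // false
        System.out.println(db.isDue("http://example.com/b", 2 * day)); // true
    }
}
```

In this model, changing the interval between crawls only affects pages fetched
afterwards, which is why editing the config file per crawl works as a
workaround for new fetches but does not reschedule pages already stamped.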
