2007/4/19, Briggs <[EMAIL PROTECTED]>:
Nutch 0.9
Does anyone know whether it is possible to be more granular about crawl
frequency? That is, I would like some sites to be crawled more
often than others. For example, a news site should be crawled every day, but
your average business website only needs crawling every 30 days. So, is
it possible to specify a crawl frequency for specific URLs, or is it
only a global setting for the whole crawldb? I suppose I could keep several
crawldbs or something like that and deal with it, but I'm just curious.
There's something like that in the Nutch JIRA (I couldn't find it,
though), but the JIRA issue is about an adaptive algorithm (as
opposed to user-provided settings) that would estimate the rate of
content change at any given URL and adapt the crawl frequency
accordingly. I don't know if it's more than a wish at this point.
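
To make the adaptive idea concrete, here is a rough sketch of what such a
scheme could look like. This is not Nutch code and not what the JIRA issue
proposes: the class name, the shrink/grow factors and the 1-day and 30-day
bounds are all assumptions picked to match the example in the question. The
interval for a URL shrinks when the page was found changed at the last fetch,
grows when it was unchanged, and is clamped to the bounds.

```java
// Hypothetical sketch only; none of these names come from Nutch.
public class AdaptiveIntervalSketch {
    // Assumed bounds, taken from the "every day" / "every 30 days" example.
    public static final long MIN_INTERVAL_SECS = 24L * 60 * 60;
    public static final long MAX_INTERVAL_SECS = 30L * 24 * 60 * 60;

    // Assumed tuning factors: halve the interval on change, grow it by 50% otherwise.
    public static final double SHRINK_FACTOR = 0.5;
    public static final double GROW_FACTOR = 1.5;

    /**
     * Returns the next re-fetch interval in seconds for a single URL.
     *
     * @param currentSecs interval used before the latest fetch
     * @param changed     whether the content differed from the previous fetch
     */
    public static long nextInterval(long currentSecs, boolean changed) {
        double factor = changed ? SHRINK_FACTOR : GROW_FACTOR;
        long next = Math.round(currentSecs * factor);
        // Clamp so a URL is never fetched more than daily or less than monthly.
        return Math.max(MIN_INTERVAL_SECS, Math.min(MAX_INTERVAL_SECS, next));
    }

    public static void main(String[] args) {
        long interval = MAX_INTERVAL_SECS;          // start at 30 days
        boolean[] observed = {true, true, false};   // changed, changed, unchanged
        for (boolean changed : observed) {
            interval = nextInterval(interval, changed);
            System.out.println(interval / 86400.0 + " days");
        }
    }
}
```

With something like this, a news site that changes on every fetch would drift
toward the daily minimum, while a static business site would stay at the
30-day maximum, without anyone setting per-URL values by hand.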
Cheers,
t.n.a.