Already in 1.x: https://issues.apache.org/jira/browse/NUTCH-1388
Also see: https://issues.apache.org/jira/browse/NUTCH-1405

You can already inject with fetchInterval, but you need a fixedFetchInterval added to the metadata and a FetchScheduler that supports it.

-----Original message-----
> From:Otis Gospodnetic <[email protected]>
> Sent: Tuesday 10th December 2013 15:56
> To: [email protected]
> Subject: New feature: Seed URL high fetch frequency
>
> Hi,
>
> While working for a client we came across a use case that seems like it
> might not be uncommon. We may have some code to contribute.
>
> The use case is that we have a few seed URLs that we need to fetch at
> relatively high frequency (e.g. every N minutes). These URLs have pointers
> to news type of content. Thus, these seed URLs are used primarily for URL
> discovery. From there we do a relatively shallow crawl. But the
> important thing is that we need to make sure we get to refetching seed URLs
> (depth=0) at some high frequency, while all other URLs can be refetched at
> their default frequency. In case of news that actually probably means
> "fetch once and never again".
>
> So I'm wondering if a simple custom "seed URL scheduler" would be of
> interest. Something like:
>
> if (URL is seed)
>   fetch at seed URL fetch freq
> else
>   fetch at standard freq
>
> ?
>
> .... or if this can already be done without a custom scheduler, I'd love to
> know how!
>
> Thanks,
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
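
For illustration only, the "if seed, else standard" logic from the quoted message could be driven by per-URL metadata roughly as below. This is a standalone sketch, not Nutch code: the class SeedAwareSchedule, its methods, and the metadata key string "fixedFetchInterval" are hypothetical names chosen to mirror the wording above, and the actual key and scheduler hooks are whatever NUTCH-1388 defines.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, self-contained sketch; it does not use or extend any Nutch class.
class SeedAwareSchedule {
    // Assumed metadata key, named after the wording in this thread.
    static final String FIXED_INTERVAL_KEY = "fixedFetchInterval";

    /** Returns the refetch interval in seconds for a URL with the given metadata. */
    static long intervalSeconds(Map<String, String> metadata, long defaultIntervalSeconds) {
        String fixed = metadata.get(FIXED_INTERVAL_KEY);
        if (fixed == null) {
            return defaultIntervalSeconds; // ordinary URL: standard frequency
        }
        try {
            long value = Long.parseLong(fixed.trim());
            // Ignore non-positive overrides and fall back to the default.
            return value > 0 ? value : defaultIntervalSeconds;
        } catch (NumberFormatException e) {
            return defaultIntervalSeconds; // malformed override: fall back
        }
    }

    /** Computes the next fetch time in epoch milliseconds. */
    static long nextFetchTimeMillis(long lastFetchMillis, Map<String, String> metadata,
                                    long defaultIntervalSeconds) {
        return lastFetchMillis + intervalSeconds(metadata, defaultIntervalSeconds) * 1000L;
    }

    public static void main(String[] args) {
        long defaultInterval = 30L * 24 * 3600; // 30 days, in seconds

        Map<String, String> seed = new HashMap<>();
        seed.put(FIXED_INTERVAL_KEY, "600"); // seed URL: refetch every 10 minutes

        Map<String, String> discovered = new HashMap<>(); // no override

        System.out.println("seed interval: " + intervalSeconds(seed, defaultInterval));
        System.out.println("other interval: " + intervalSeconds(discovered, defaultInterval));
    }
}
```

The point of keeping the override in metadata rather than testing "is this a seed" is that the scheduler needs no knowledge of crawl depth: any URL injected with the key gets the fixed interval, and everything else falls through to the default.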

