Already in 1.x:
https://issues.apache.org/jira/browse/NUTCH-1388

Also see:
https://issues.apache.org/jira/browse/NUTCH-1405

You can already inject seed URLs with a fetchInterval, but to keep that
interval fixed you need a fixedFetchInterval added to the metadata and a
FetchScheduler that supports it.
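
To illustrate the idea, here is a minimal stand-alone sketch of the decision such a scheduler would make. It does not use any Nutch classes: the class and method names are hypothetical, and the metadata key "nutch.fetchInterval.fixed" is an assumption on my part, so check the issues above for the actual name.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical, stand-alone sketch; not part of Nutch.
final class SeedIntervalSketch {

    // Assumed metadata key carrying a per-URL fixed interval in seconds.
    static final String FIXED_INTERVAL_KEY = "nutch.fetchInterval.fixed";

    // Returns the fixed interval if the URL's metadata carries a valid
    // positive one, otherwise the default interval.
    static int fetchIntervalSeconds(Map<String, String> metadata,
                                    int defaultIntervalSeconds) {
        String fixed = metadata.get(FIXED_INTERVAL_KEY);
        if (fixed == null) {
            return defaultIntervalSeconds;
        }
        try {
            int seconds = Integer.parseInt(fixed.trim());
            return seconds > 0 ? seconds : defaultIntervalSeconds;
        } catch (NumberFormatException e) {
            // Malformed value: fall back to the default interval.
            return defaultIntervalSeconds;
        }
    }

    // Next fetch time = last fetch time + chosen interval.
    static long nextFetchTimeMillis(long lastFetchMillis,
                                    Map<String, String> metadata,
                                    int defaultIntervalSeconds) {
        return lastFetchMillis
            + 1000L * fetchIntervalSeconds(metadata, defaultIntervalSeconds);
    }

    public static void main(String[] args) {
        int defaultInterval = 30 * 24 * 3600; // 30 days, an arbitrary example
        Map<String, String> seed =
            Collections.singletonMap(FIXED_INTERVAL_KEY, "300");
        Map<String, String> other = Collections.emptyMap();

        System.out.println(fetchIntervalSeconds(seed, defaultInterval));
        System.out.println(fetchIntervalSeconds(other, defaultInterval));
    }
}
```

Only URLs injected with the fixed-interval metadata get the short interval; everything discovered from them falls through to the default, which is the seed/non-seed split described in the original message.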
 
-----Original message-----
> From:Otis Gospodnetic <[email protected]>
> Sent: Tuesday 10th December 2013 15:56
> To: [email protected]
> Subject: New feature: Seed URL high fetch frequency
> 
> Hi,
> 
> While working for a client we came across a use case that seems like it
> might not be uncommon.  We may have some code to contribute.
> 
> The use case is that we have a few seed URLs that we need to fetch at
> relatively high frequency (e.g. every N minutes).  These URLs have pointers
> to news type of content.  Thus, these seed URLs are used primarily for URL
> discovery.  From there we do a relatively shallow crawl.  But the
> important thing is that we need to make sure we get to refetching seed URLs
> (depth=0) at some high frequency, while all other URLs can be refetched at
> their default frequency.  In case of news that actually probably means
> "fetch once and never again".
> 
> So I'm wondering if a simple custom "seed URL scheduler" would be of
> interest.  Something like:
> 
> if (URL is seed)
>   fetch at seed URL fetch freq
> else
>   fetch at standard freq
> 
> ?
> 
> .... or if this can already be done without a custom scheduler, I'd love to
> know how!
> 
> Thanks,
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
> 
