By the way, if you don't use an adaptive scheduler but one that maintains the
configured or injected interval, you can already do it by simply injecting
URLs with low intervals.
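
For illustration, here is a minimal sketch of generating such a seed list. It
assumes the injector reads tab-separated name=value metadata after each URL
and treats nutch.fetchInterval as an interval in seconds; please check that
against your Nutch version. The example.com URLs, the file name seed.txt and
the helper names are made up for this example.

```python
# Sketch: build a seed list where selected URLs carry a low fetch interval.
# Assumption: the injector accepts "<url>\tnutch.fetchInterval=<seconds>".

def seed_line(url, interval_seconds=None):
    """Return one seed-list line, optionally with a fetch interval in seconds."""
    if interval_seconds is None:
        # No metadata: the URL gets the configured default interval.
        return url
    return f"{url}\tnutch.fetchInterval={int(interval_seconds)}"


def write_seed_file(path, seeds):
    """Write (url, interval_seconds_or_None) pairs, one seed per line."""
    with open(path, "w", encoding="utf-8") as f:
        for url, interval in seeds:
            f.write(seed_line(url, interval) + "\n")


if __name__ == "__main__":
    # Hypothetical seeds: one hub page refetched every 10 minutes,
    # one URL left at the default interval.
    write_seed_file("seed.txt", [
        ("http://example.com/news/", 600),
        ("http://example.com/about", None),
    ])
```

The low interval only sticks if the configured fetch schedule does not adjust
it afterwards, which is the non-adaptive case described above.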
-----Original message-----
> From:Markus Jelsma <[email protected]>
> Sent: Tuesday 10th December 2013 16:04
> To: [email protected]
> Subject: RE: New feature: Seed URL high fetch frequency
>
> Already in 1.x:
> https://issues.apache.org/jira/browse/NUTCH-1388
>
> Also see:
> https://issues.apache.org/jira/browse/NUTCH-1405
>
> You can already inject with fetchInterval but you need a fixedFetchInterval
> to be added to the metadata and a FetchScheduler that supports it.
>
> -----Original message-----
> > From:Otis Gospodnetic <[email protected]>
> > Sent: Tuesday 10th December 2013 15:56
> > To: [email protected]
> > Subject: New feature: Seed URL high fetch frequency
> >
> > Hi,
> >
> > While working for a client we came across a use case that seems like it
> > might not be uncommon. We may have some code to contribute.
> >
> > The use case is that we have a few seed URLs that we need to fetch at
> > relatively high frequency (e.g. every N minutes). These URLs have pointers
> > to news-type content. Thus, these seed URLs are used primarily for URL
> > discovery. From there we do a relatively shallow crawl. But the
> > important thing is that we need to make sure we get to refetching seed URLs
> > (depth=0) at some high frequency, while all other URLs can be refetched at
> > their default frequency. In the case of news, that probably means
> > "fetch once and never again".
> >
> > So I'm wondering if a simple custom "seed URL scheduler" would be of
> > interest. Something like:
> >
> > if (URL is seed)
> >     fetch at seed URL fetch freq
> > else
> >     fetch at standard freq
> >
> > ?
> >
> > .... or if this can already be done without a custom scheduler, I'd love to
> > know how!
> >
> > Thanks,
> > Otis
> > --
> > Performance Monitoring * Log Analytics * Search Analytics
> > Solr & Elasticsearch Support * http://sematext.com/
> >
>