You could also give the URLs a high score at injection time, without having to write a custom scoring filter and rely on metadata; see http://wiki.apache.org/nutch/bin/nutch%20inject

e.g. (URL and metadata separated by a tab)

http://www.thissimplycannotwait.com	nutch.score=10000

The trouble with this is that it might affect the weight of the document once indexed. Markus' approach is cleaner, as you'd directly control the sort value used by the generate step.

On 25 November 2015 at 21:15, Markus Jelsma <[email protected]> wrote:

> Hello - please see the freegen tool. It takes a list of URLs and
> generates a fetch list for them that you can crawl immediately. The
> drawback is that regular metadata changes differently when running updatedb
> vs regular generated URLs. We use it too to prioritize URLs while also
> running regular crawl cycles.
>
> Another possibility would be a custom scoring filter, inject them with
> certain metadata and prioritize on that.
>
> M.
>
> -----Original message-----
> > From:Gaspar Pizarro <[email protected]>
> > Sent: Wednesday 25th November 2015 21:56
> > To: [email protected]
> > Subject: Manipulate queues
> >
> > Hi,
> >
> > I am doing whole-web crawling, but from time to time I need to fetch a
> > list of urls immediately, with top priority. I was watching what happens
> > with the injected urls when there are already non-fetched urls, and, when
> > fetching, the newly injected urls are not at the top of the list. Is there
> > a way to ensure the newly injected urls are at the top of the next
> > generate-fetch queue?
> >
> > Thanks
> >

--
*Open Source Solutions for Text Engineering*

http://www.digitalpebble.com
http://digitalpebble.blogspot.com/
#digitalpebble <http://twitter.com/digitalpebble>
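PS: in case it helps, here is a minimal sketch of generating such a seed file for the injection approach above. It only writes the text file; the helper names, the default file name `priority.txt` and the default score are my own choices for illustration, not anything Nutch provides. The one thing it relies on is the injector's seed format: the URL, a tab, then `nutch.score=<value>`.

```python
from pathlib import Path


def seed_line(url, score):
    """Return one seed-file line: the URL, a tab, then nutch.score=<score>."""
    return f"{url}\tnutch.score={score}"


def write_priority_seeds(urls, seed_dir, score=10000, filename="priority.txt"):
    """Write the URLs to <seed_dir>/<filename>, one per line, each with the given score.

    seed_dir, filename and the default score are illustrative assumptions.
    """
    seed_dir = Path(seed_dir)
    seed_dir.mkdir(parents=True, exist_ok=True)
    seed_file = seed_dir / filename
    content = "".join(seed_line(u, score) + "\n" for u in urls)
    seed_file.write_text(content, encoding="utf-8")
    return seed_file


if __name__ == "__main__":
    path = write_priority_seeds(["http://www.thissimplycannotwait.com"], "urls_priority")
    print(path.read_text(encoding="utf-8"), end="")
```

With Nutch 1.x you would then inject that directory with something like `bin/nutch inject crawl/crawldb urls_priority` (the crawldb path is an assumption about your layout). The caveat above still applies: the injected score may carry over to the weight of the document once indexed.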

