Yes, you can, using the -expr option with a JEXL expression, e.g.
-expr '(status == "db_fetched")'

The fields available to the expression are defined here:
https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/crawl/CrawlDatum.java#L524

But you can also achieve this using a custom scoring filter, which is a much 
more elegant solution. Beware of spider traps: if you prioritize unfetched 
pages unconditionally, you can easily fall into such a trap and never get out 
of it.
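
To illustrate the scoring idea, here is a minimal, self-contained sketch of the kind of logic a custom scoring filter could apply when computing the generator sort value. It is not a real Nutch plugin: the class name, the boost factors, the depth parameter, and the depth cutoff are all hypothetical, and in an actual plugin this logic would presumably live in the generatorSortValue method of a ScoringFilter implementation. The depth cutoff is one assumed way to stop unfetched URLs deep inside a spider trap from being boosted forever.

```java
import java.util.Map;

public class GeneratorPriority {
    // Hypothetical boost factors keyed by CrawlDb status name.
    // Statuses not listed here keep their initial sort value.
    private static final Map<String, Float> BOOST = Map.of(
            "db_unfetched", 4.0f,
            "db_gone", 2.0f);

    // Hypothetical cutoff: URLs found deeper than this get no boost,
    // so an endless chain of unfetched URLs cannot dominate the queue.
    private static final int MAX_BOOSTED_DEPTH = 5;

    // Returns the value the generator would sort by (higher is selected first).
    static float sortValue(String status, float initSort, int depth) {
        if (depth > MAX_BOOSTED_DEPTH) {
            return initSort; // possible spider trap: do not prioritize
        }
        return initSort * BOOST.getOrDefault(status, 1.0f);
    }

    public static void main(String[] args) {
        System.out.println(sortValue("db_unfetched", 1.0f, 1));
        System.out.println(sortValue("db_gone", 1.0f, 1));
        System.out.println(sortValue("db_fetched", 1.0f, 1));
        System.out.println(sortValue("db_unfetched", 1.0f, 20));
    }
}
```

With these assumed factors, a shallow unfetched URL sorts ahead of a gone URL, which sorts ahead of everything else, while an unfetched URL beyond the depth cutoff falls back to its unboosted value.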
 
-----Original message-----
> From:Eyeris Rodriguez Rueda <[email protected]>
> Sent: Tuesday 25th October 2016 18:34
> To: [email protected]
> Subject: generator conditional by crawldb status
> 
> Hi all.
> I am using nutch 1.12 and solr 4.10.3 with linuxmint 18.
> I want to crawl pages from crawldb using this order.
> 
> 1-unfetched 
> 2-modified
> 3-gone
> and others
> 
> I know that generator process is which decides what pages are selected or not 
> from crawldb.
> Any help or advice to crawl pages in that order will be appreciated.
> 
> Greetings.
> 
