Thanks a lot, Markus, for your answer. For now I probably need to use a JEXL expression, because I have too many unfetched documents and it is important for me to crawl them first. I have been using the command bin/crawl urls/ crawl/ 5. Can you tell me how to use the JEXL parameter? An example using that command would be appreciated.
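A minimal sketch of what such a call might look like, for readers of the thread. It assumes the generate step is run directly through bin/nutch rather than through bin/crawl (whether bin/crawl in 1.12 forwards an expression to the generator is not confirmed here), that the crawl/ directory from the command above contains crawldb and segments, and that the fetch list size of 1000 is only a placeholder. Note that JEXL compares with ==, and that "db_unfetched" is the status name for unfetched entries.

```shell
#!/bin/sh
# Sketch only: generate a segment that contains just the unfetched URLs.
# Assumed layout: crawl/crawldb and crawl/segments, as created by
# "bin/crawl urls/ crawl/ 5". TOP_N is a hypothetical value.
CRAWL_DIR="crawl"
TOP_N=1000

# JEXL comparison uses "=="; single quotes keep the shell from
# touching the double quotes inside the expression.
EXPR='(status == "db_unfetched")'

if [ -x bin/nutch ]; then
  # Select up to TOP_N entries from the crawldb that match the expression.
  bin/nutch generate "$CRAWL_DIR/crawldb" "$CRAWL_DIR/segments" \
    -topN "$TOP_N" -expr "$EXPR"
else
  echo "bin/nutch not found; run this from the Nutch runtime directory" >&2
fi
```

If the generator is called by hand like this, the remaining steps of the cycle (fetch, parse, updatedb, indexing) presumably also have to be run by hand on the new segment, since bin/crawl is no longer driving the loop.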
Later I will write my own custom scoring, perhaps dedicating a percentage of the topN parameter to entries whose crawldb status is unfetched and the remaining percentage to normal scoring. This is to avoid traps. Thanks a lot.

----- Original message -----
From: "Markus Jelsma" <[email protected]>
To: [email protected]
Sent: Tuesday, 25 October 2016 12:48:06
Subject: RE: generator conditional by crawldb status

Yes, you can, using -expr with a JEXL expression, e.g. -expr '(status = "db_fetched")'

Fields are here: https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/crawl/CrawlDatum.java#L524

But you can also achieve this using a custom scoring filter, which is a much more elegant solution. Take care of spider traps: if you prioritize unfetched unconditionally, you can easily fall into such a trap and not come out of it.

-----Original message-----
> From: Eyeris Rodriguez Rueda <[email protected]>
> Sent: Tuesday 25th October 2016 18:34
> To: [email protected]
> Subject: generator conditional by crawldb status
>
> Hi all.
> I am using Nutch 1.12 and Solr 4.10.3 with Linux Mint 18.
> I want to crawl pages from the crawldb in this order:
>
> 1-unfetched
> 2-modified
> 3-gone
> and others
>
> I know that the generator process is the one that decides which pages are
> selected from the crawldb.
> Any help or advice on crawling pages in that order will be appreciated.
>
> Greetings.
>

