Hi Andres,

Check your rules in the URL filters.
Roannel

----- Original message -----
> From: "Andrés Rincón Pacheco" <[email protected]>
> To: [email protected]
> Sent: Thursday, October 8, 2015 9:26:11
> Subject: [MASSMAIL]Nutch only fetch and parse the third part of urls
>
> Hi,
>
> I am using Nutch 1.9. After reviewing the URLs added by the Injector, the
> total number of URLs is 25146.
> (Log evidence)
> crawl.Injector - Injector: Total number of urls after normalization: 25146
>
> When I checked the log file, only 7003 URLs had been fetched and 6727 URLs
> had been parsed.
>
> These are the statistics:
>
> CrawlDb statistics start: ../crawlInfo/crawldb
> Statistics for CrawlDb: ../crawlInfo/crawldb
> TOTAL urls: 30914
> retry 0: 30913
> retry 1: 1
> min score: 0.0
> avg score: 0.4359605
> max score: 100.002
> status 1 (db_unfetched): 23912
> status 2 (db_fetched): 6727
> status 3 (db_gone): 8
> status 4 (db_redir_temp): 266
> status 5 (db_redir_perm): 1
> CrawlDb statistics: done
>
> Why are only about a third of the URLs fetched and parsed?
>
> Thanks.
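
As a rough way to follow up on the URL filter suggestion, here is a small Python sketch that mimics how a regex-urlfilter.txt-style rule list is usually evaluated: rules are read top to bottom, the first rule whose pattern is found in the URL decides (+ accepts, - rejects), and a URL that matches no rule is rejected. The rules and URLs below are made up for illustration and are not taken from the configuration discussed in this thread. Nutch itself uses Java regular expressions, so Python's re module is only an approximation for unusual patterns.

```python
import re


def parse_rules(text):
    """Parse regex-urlfilter.txt-style lines into (accept, pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return the verdict of the first matching rule; no match means reject."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False


# Hypothetical rules, for illustration only.
SAMPLE_RULES = r"""
# reject URLs that contain query-like characters
-[?*!@=]
# accept anything on the (assumed) host example.org and its subdomains
+^https?://([a-z0-9-]+\.)*example\.org/
"""

rules = parse_rules(SAMPLE_RULES)

# Hypothetical URLs to test against the rules.
urls = [
    "http://example.org/docs/",
    "https://www.example.org/a",
    "http://example.org/page?id=1",
    "http://other.net/",
]

for url in urls:
    verdict = "accepted" if accepts(rules, url) else "rejected"
    print(f"{verdict}: {url}")
```

Running a sample of the injected seed URLs through a check like this, with the real rules pasted in place of SAMPLE_RULES, can show quickly whether a rule is rejecting more URLs than intended.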

