Hi Andres,

Check the rules in your URL filters (for example, conf/regex-urlfilter.txt).
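
As a rough illustration of how regex-style URL filter rules are evaluated, here is a minimal Python sketch. It is not Nutch code. It assumes the usual convention where each rule is "+" (accept) or "-" (reject) followed by a regex, the first matching rule decides, and a URL that matches no rule is rejected. The rule set and the names RULES_TEXT, parse_rules and accepts are hypothetical, and Python regex syntax can differ from Java's in corner cases.

```python
import re

# Hypothetical rule set written in the style of a regex URL filter file.
# Assumption: '+' means accept, '-' means reject, '#' starts a comment.
RULES_TEXT = """
# skip file:, ftp: and mailto: URLs
-^(file|ftp|mailto):
# skip URLs containing characters typical of queries
-[?*!@=]
# accept anything else
+.
"""


def parse_rules(text):
    """Turn rule lines into (accept, compiled_regex) pairs, in file order."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return the verdict of the first rule whose regex is found in the URL.

    A URL that matches no rule is treated as rejected.
    """
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False


if __name__ == "__main__":
    rules = parse_rules(RULES_TEXT)
    for url in (
        "http://example.com/page.html",
        "http://example.com/search?q=nutch",
        "ftp://example.com/file.txt",
    ):
        print(accepts(rules, url), url)
```

Running a seed list through a check like this and counting the rejected URLs gives a quick idea of whether the filter rules alone could explain the gap.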

Roannel

----- Original Message -----
> From: "Andrés Rincón Pacheco" <[email protected]>
> To: [email protected]
> Sent: Thursday, October 8, 2015 9:26:11
> Subject: [MASSMAIL]Nutch only fetch and parse the third part of urls
> 
> Hi,
> 
> I am using Nutch 1.9. After reviewing the URLs added by the Injector, the
> total number of URLs is 25146.
> (Log evidence)
> crawl.Injector - Injector: Total number of urls after normalization: 25146
> 
> When I checked the log file, only 7003 URLs had been fetched and 6727 URLs
> had been parsed.
> 
> And these are the statistics:
> 
> CrawlDb statistics start: ../crawlInfo/crawldb
> Statistics for CrawlDb: ../crawlInfo/crawldb
> TOTAL urls:     30914
> retry 0:        30913
> retry 1:        1
> min score:      0.0
> avg score:      0.4359605
> max score:      100.002
> status 1 (db_unfetched):        23912
> status 2 (db_fetched):  6727
> status 3 (db_gone):     8
> status 4 (db_redir_temp):       266
> status 5 (db_redir_perm):       1
> CrawlDb statistics: done
> 
> Why is only about a third of the URLs fetched and parsed?
> 
> Thanks.
> 