Hi. One possible cause: have you looked at the topN (fetch list size) parameter in the bin/crawl script (line 117)? It is set by sizeFetchlist=`expr $numSlaves \* 50`, and this number could limit your URL list.
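
To make the arithmetic concrete, here is a rough sketch of how that expression could cap the number of URLs per round. The value numSlaves=1 is an assumption (a single-node setup), and the idea that every round fetches a full list is a simplification; check your own copy of bin/crawl for the real values.

```shell
#!/bin/sh
# Assumed values for illustration only; check your own bin/crawl.
numSlaves=1        # assumption: single-node setup
seedCount=2522     # size of the seed list from the original question

# Same expression as the one quoted from bin/crawl
sizeFetchlist=`expr $numSlaves \* 50`

# Rounds needed if every round fetched a full list (ceiling division)
roundsNeeded=`expr \( $seedCount + $sizeFetchlist - 1 \) / $sizeFetchlist`

echo "topN per round: $sizeFetchlist"
echo "rounds needed for $seedCount URLs: $roundsNeeded"
```

If the crawl runs fewer rounds than that, some seed URLs would never be fetched, regardless of generate.max.count.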
Also check your URL filters. Let me know whether this solves the problem.

----- Original Message -----
From: "Chip Calhoun" <[email protected]>
To: [email protected]
Sent: Thursday, May 11, 2017 16:30:34
Subject: [MASSMAIL]Nutch not indexing all seed URLs

I'm using Nutch 1.12 to index a local site. To keep Nutch from indexing the uninteresting navigation pages on my site, I've made a URLs list of all the URLs I want crawled; the current list is 2522 URLs. However, the indexer stopped after just 1077 of these URLs. My generate.max.count is set to -1. What would cause my URLs to be skipped?

Chip Calhoun
Digital Archivist
Niels Bohr Library & Archives
American Institute of Physics
One Physics Ellipse
College Park, MD 20740-3840 USA
Tel: +1 301-209-3180
Email: [email protected]
https://www.aip.org/history-programs/niels-bohr-library

