Thank you. The problem was right below that line: I had the default "timeLimitFetch=180" (minutes), so fetching stopped after 3 hours. I'll bump that up to something ridiculous and try again.
Chip

-----Original Message-----
From: Eyeris Rodriguez Rueda [mailto:[email protected]]
Sent: Thursday, May 11, 2017 4:46 PM
To: [email protected]
Subject: Re: [MASSMAIL]Nutch not indexing all seed URLs

Hi.

Maybe one cause: have you seen the topN (fetchlist size) parameter in the bin/crawl script (line 117)?

sizeFetchlist=`expr $numSlaves \* 50`

This number could limit your URL list. Also check your URL filters. Let me know whether that solves the problem.

----- Original Message -----
From: "Chip Calhoun" <[email protected]>
To: [email protected]
Sent: Thursday, May 11, 2017 16:30:34
Subject: [MASSMAIL]Nutch not indexing all seed URLs

I'm using Nutch 1.12 to index a local site. To keep Nutch from indexing the uninteresting navigation pages on my site, I've made a seed list of all the URLs I want crawled; the current list has 2522 URLs. However, the indexer stopped after just 1077 of them. My generate.max.count is set to -1. What would cause my URLs to be skipped?

Chip Calhoun
Digital Archivist
Niels Bohr Library & Archives
American Institute of Physics
One Physics Ellipse
College Park, MD 20740-3840 USA
Tel: +1 301-209-3180
Email: [email protected]
https://www.aip.org/history-programs/niels-bohr-library
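To see how the two bin/crawl settings discussed in this thread interact with a 2522-URL seed list, here is a small back-of-the-envelope sketch. It assumes the values quoted above (numSlaves=1, 50 URLs per slave, timeLimitFetch=180 minutes); the variable names seeds and rounds are hypothetical, for illustration only. It also assumes, as a simplification, that each round fetches only seed URLs, with no newly discovered outlinks competing for fetchlist slots.

```shell
#!/bin/sh
# Hypothetical values mirroring the variables quoted in this thread;
# your copy of bin/crawl may use different defaults.
numSlaves=1
sizeFetchlist=`expr $numSlaves \* 50`   # per-round fetchlist cap (the topN limit)
timeLimitFetch=180                      # fetch time limit, in minutes
seeds=2522                              # hypothetical name: size of the seed list

# Ceiling division: rounds needed to cover every seed, assuming each
# round fetches exactly sizeFetchlist seeds and nothing else.
rounds=$(( (seeds + sizeFetchlist - 1) / sizeFetchlist ))

echo "fetchlist size per round: $sizeFetchlist"
echo "rounds to cover $seeds seeds: $rounds"
echo "fetch time limit: $(( timeLimitFetch / 60 )) hours"
```

With these values it reports 50 URLs per round, 51 rounds, and a 3-hour fetch limit. So either the per-round cap or the time limit can leave seeds unfetched unless the number of rounds passed to bin/crawl, the cap, or timeLimitFetch is raised.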

