I'm using Nutch 1.12 to index a local site. To keep Nutch from indexing the uninteresting navigation pages on my site, I've made a list of all the URLs I want crawled; it currently contains 2522 URLs. However, the indexer stopped after just 1077 of them. My generate.max.count is set to -1. What would cause the rest of my URLs to be skipped?
Chip Calhoun
Digital Archivist
Niels Bohr Library & Archives
American Institute of Physics
One Physics Ellipse
College Park, MD 20740-3840 USA
Tel: +1 301-209-3180
Email: [email protected]
https://www.aip.org/history-programs/niels-bohr-library

