I was under the impression that setting topN for crawl cycles would limit the number of items each iteration of the crawl would fetch/parse, but that repeated crawl cycles would eventually fetch ALL the URLs. My continuous crawl has now stopped fetching/parsing, and the crawldb stats show db_unfetched at 133,359.
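For context, each cycle follows the usual Nutch 1.x generate/fetch/parse/updatedb loop; the paths and the topN value below are placeholders, not my exact settings:

```shell
# One crawl cycle: generate a fetch list capped at topN URLs,
# fetch and parse that segment, then merge results into the crawldb.
bin/nutch generate crawl/crawldb crawl/segments -topN 1000
SEGMENT=crawl/segments/$(ls -t crawl/segments | head -1)
bin/nutch fetch $SEGMENT
bin/nutch parse $SEGMENT
bin/nutch updatedb crawl/crawldb $SEGMENT

# Crawldb stats, where db_unfetched is reported:
bin/nutch readdb crawl/crawldb -stats
```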
Why is it no longer fetching URLs if there are so many unfetched?

--
View this message in context: http://lucene.472066.n3.nabble.com/db-unfetched-large-number-but-crawling-not-fetching-any-longer-tp3851587p3851587.html
Sent from the Nutch - User mailing list archive at Nabble.com.