I was under the impression that setting topN for crawl cycles would limit the
number of items each iteration of the crawl would fetch/parse, but that after
running crawl cycles continuously it would eventually get ALL the
URLs. My continuous crawl has stopped fetching/parsing, yet the stats from
the crawldb indicate that db_unfetched is 133,359.
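
For reference, this is roughly the cycle I am running (a sketch using the standard Nutch 1.x command-line tools; the segment path and topN value are just examples):

```shell
# One crawl iteration: generate at most topN URLs, fetch, parse, update the db
bin/nutch generate crawl/crawldb crawl/segments -topN 1000
SEGMENT=crawl/segments/$(ls -t crawl/segments | head -1)
bin/nutch fetch "$SEGMENT"
bin/nutch parse "$SEGMENT"
bin/nutch updatedb crawl/crawldb "$SEGMENT"

# Check crawldb status counts (db_fetched, db_unfetched, etc.)
bin/nutch readdb crawl/crawldb -stats
```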

Why is it no longer fetching URLs if there are so many unfetched?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/db-unfetched-large-number-but-crawling-not-fetching-any-longer-tp3851587p3851587.html
Sent from the Nutch - User mailing list archive at Nabble.com.