NutchDeveloper wrote:
> I use this script to crawl and recrawl the web: http://wiki.apache.org/nutch/Crawl
> I noticed that the database grows very slowly (depth=2, topn=1000, adddays=30) because it fetches the same URLs several times in different recrawl loops. What should I do to force Nutch to fetch ONLY unfetched URLs from the crawldb?

Without the -topN switch you fetch the unfetched URLs and those that have expired. -topN takes only the URLs with the highest scores.

Bartosz
