To crawl news sites as promptly as possible, I want to be able to manipulate the refetch intervals and scores in the crawldb. I thought I could accomplish that by re-injecting the URLs that should be refetched most often. According to the documentation, I should be able to do that using the db.injector.overwrite property. However, it does not actually work for me.
Here is the injection command I use:

$NUTCH_HOME/runtime/deploy/bin/nutch inject -D db.score.injected=10 -D db.injector.overwrite=true -D db.fetch.interval.default=1800 /crawls/news0/data/crawldb /crawls/news0/seeds/reuters.txt

After re-injecting, I inspect the crawldb dump and see that the intervals and scores have not been overwritten. I have also tried db.injector.update=true, with similar results. I suspect that db.fetch.interval.default does not affect existing URLs. Is there any way to change the refetch intervals of existing URLs?

For a test case, one could inject a few of the following URLs, crawl several iterations, and then inject all of them. The result should be that all of them have the 1800 interval.

http://mobile.reuters.com/
http://mobile.reuters.com/business
http://mobile.reuters.com/finance
http://mobile.reuters.com/news/entertainment
http://mobile.reuters.com/news/entertainment/arts
http://mobile.reuters.com/news/environment
http://mobile.reuters.com/news/health
http://mobile.reuters.com/news/lifestyle
http://mobile.reuters.com/news/oddlyEnough
http://mobile.reuters.com/news/science
http://mobile.reuters.com/news/sports
http://mobile.reuters.com/news/technology
http://mobile.reuters.com/news/us
http://mobile.reuters.com/news/world
http://mobile.reuters.com/politics
http://www.reuters.com/subjects/healthcare
https://www.reuters.com/
https://www.reuters.com/energy-environment
https://www.reuters.com/finance
https://www.reuters.com/money
https://www.reuters.com/news/entertainment
https://www.reuters.com/news/health
https://www.reuters.com/news/technology
https://www.reuters.com/news/world
https://www.reuters.com/politics
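
A variant of this test that may be worth trying is to carry the score and interval per URL in the seed file itself, rather than relying on the -D defaults. The sketch below only builds such a seed file. It assumes the Nutch 1.x Injector reads tab-separated key=value metadata after each URL and gives special meaning to nutch.score and nutch.fetchInterval; the temporary files and the make_seeds helper are hypothetical names for illustration. Whether the values then reach already-known URLs still depends on how db.injector.overwrite and db.injector.update behave in the Nutch version in use.

```shell
#!/bin/sh
# Sketch: build a seed file whose lines carry per-URL injector metadata.
# Assumption: the Nutch 1.x Injector reads tab-separated key=value pairs
# after the URL and treats nutch.score and nutch.fetchInterval specially.

SCORE=10          # same value as db.score.injected in the question
INTERVAL=1800     # seconds, same value as db.fetch.interval.default

# make_seeds IN OUT (hypothetical helper): copy each non-empty URL line
# from IN to OUT, appending the score and fetch-interval metadata.
make_seeds() {
    awk -v s="$SCORE" -v i="$INTERVAL" \
        'NF { printf "%s\tnutch.score=%s\tnutch.fetchInterval=%s\n", $1, s, i }' \
        "$1" > "$2"
}

# Hypothetical working files for the demonstration.
PLAIN=$(mktemp)
SEEDS=$(mktemp)
printf '%s\n' \
    'https://www.reuters.com/' \
    'https://www.reuters.com/finance' > "$PLAIN"

make_seeds "$PLAIN" "$SEEDS"
cat "$SEEDS"

# The generated file could then be injected as in the question, e.g.:
#   $NUTCH_HOME/runtime/deploy/bin/nutch inject -D db.injector.overwrite=true \
#       /crawls/news0/data/crawldb <path to the generated seed file>
# and a single entry checked afterwards with:
#   $NUTCH_HOME/runtime/deploy/bin/nutch readdb /crawls/news0/data/crawldb \
#       -url https://www.reuters.com/
```

Checking one URL with readdb -url after each injection makes it easier to see whether the retry interval and score of an existing entry actually changed, without dumping the whole crawldb.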

