Hi,

I want to crawl and build an index for several domains using Nutch's intranet crawling method. Since those domains are updated from time to time, I want to re-crawl them periodically, but at different frequencies. For example, I would re-crawl domain A every week, but domain B every other day.

Two questions:

1) When I crawl again into the same directory, the old index is completely removed. Is there any way to just update the crawled URLs in the existing index?
2) How can I set a different crawl frequency for each domain? Should I crawl them individually and then merge the results, or can I configure this in Nutch?

Many thanks!
