Hi,

I'm using Nutch 2.1 for dedicated crawling of several blogs.
In my urls folder, the seed.txt lists the article list pages of those blogs.
The blogs are not updated very frequently. I don't want to re-crawl an article 
content page once it has already been crawled, but I do want the article list 
pages to be crawled every time so that new article pages can be found.
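For context, the seed file simply contains the article list page of each blog, 
roughly like this (the URLs below are only placeholders, not the real blogs):

    # urls/seed.txt
    http://blog-one.example.com/articles/
    http://blog-two.example.com/archive/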

The parameter 'db.fetch.interval.default' seems intended for this purpose, but 
I guess it will affect all URLs, including the article list pages.
So, is there any way to specify the re-crawl strategy based on the URL?
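For reference, this is how I understand the global interval is set in 
conf/nutch-site.xml (the 30-day value below is only an illustration, not my 
actual setting):

    <property>
      <name>db.fetch.interval.default</name>
      <value>2592000</value>
      <description>Default re-fetch interval in seconds (30 days here).</description>
    </property>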
Thanks.

Regards,
Rui
