Hi,

I'm using Nutch 2.1 for dedicated crawling of several blogs.
In my urls folder, the seed.txt lists the article list pages of those blogs.
The blogs are not updated very frequently. I don't want to re-crawl an article 
content page once it has already been crawled, but I do want the article list 
pages to be crawled every time so that new article pages can be found.
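For context, the seed file simply contains the article list page of each blog, 
roughly like this (the URLs below are only placeholders, not the real blogs):

    # urls/seed.txt
    http://blog-one.example.com/articles/
    http://blog-two.example.com/archive/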

The parameter 'db.fetch.interval.default' seems intended for this purpose, but 
I guess it will affect all URLs, including the article list pages.
So, is there any way to specify the re-crawl strategy based on the URL?
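For reference, this is how I understand the global interval is set in 
conf/nutch-site.xml (the 30-day value below is only an illustration, not my 
actual setting):

    <property>
      <name>db.fetch.interval.default</name>
      <value>2592000</value>
      <description>Default re-fetch interval in seconds (30 days here).</description>
    </property>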
Thanks.

Regards,
Rui
