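Hi,

You can set the fetch interval for the list page directly in your seed list file, as metadata on the URL line. The fields are separated by tabs (shown here as \t), and nutch.fetchInterval is given in seconds, so 2592000 is 30 days:

http://www.nutch.org/ \t nutch.score=10 \t nutch.fetchInterval=2592000 \t userType=open_source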
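If you would rather generate the seed file than write the tabs by hand, here is a minimal Python sketch of the line format above. The blog URL, the one-hour interval, and the helper name seed_line are made up for illustration; only the tab-separated name=value layout and the nutch.fetchInterval key come from the example. As far as I know the interval only applies to the injected seed URLs, so article pages discovered later would still use db.fetch.interval.default.

```python
# Sketch only: builds seed-list lines in the "URL <tab> name=value ..." layout.
# The URL and interval below are hypothetical placeholders, not recommendations.

def seed_line(url, metadata):
    """Return one seed-list line: the URL followed by tab-separated name=value pairs."""
    fields = [url] + [f"{name}={value}" for name, value in metadata.items()]
    return "\t".join(fields)


# Assumed value: re-fetch the article list page every hour (in seconds).
LIST_PAGE_INTERVAL = 60 * 60

seed_text = "\n".join([
    seed_line("http://blog.example.com/articles/",
              {"nutch.fetchInterval": LIST_PAGE_INTERVAL}),
]) + "\n"

# Print the result; redirect it into your urls/seed.txt if it looks right.
print(seed_text, end="")
```

You would then inject the generated file as usual and leave db.fetch.interval.default at a long value for everything else.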
On Friday, February 8, 2013, 高睿 wrote:

> Hi,
>
> I'm using nutch 2.1 for dedicated crawling for several blogs.
> In my urls folder, there are several blog article list page in seed.txt.
> The blogs are not updated very frequently. I don't want to re-crawl the
> article content page once it is already crawled, but I want the article
> list to be crawled every time so that new article page could be found.
>
> The parameter 'db.fetch.interval.default' is for this purpose, but I guess
> it will impact all urls including the article list page.
> So, is there any way to specify the re-crawling strategy based on url?
> Thanks.
>
> Regards,
> Rui
>

--
Don't Grow Old, Grow Up... :-)

