db.default.fetch.interval
db.fetch.schedule.adaptive.*
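
As a rough sketch of what that could look like in conf/nutch-site.xml (the values are made-up examples, and the exact property names are an assumption on my part: recent nutch-default.xml files spell the first one db.fetch.interval.default and measure it in seconds, so check the nutch-default.xml shipped with your release before copying anything):

```xml
<!-- conf/nutch-site.xml: illustrative values only, not recommendations -->
<configuration>
  <!-- Default time between re-fetches of a page, in seconds (here 30 days).
       Assumed name; older configs use db.default.fetch.interval. -->
  <property>
    <name>db.fetch.interval.default</name>
    <value>2592000</value>
  </property>

  <!-- Switch to the adaptive schedule so that the
       db.fetch.schedule.adaptive.* settings below are actually used. -->
  <property>
    <name>db.fetch.schedule.class</name>
    <value>org.apache.nutch.crawl.AdaptiveFetchSchedule</value>
  </property>

  <!-- Bounds, in seconds, for the interval the adaptive schedule may pick
       (here 1 day and 365 days). -->
  <property>
    <name>db.fetch.schedule.adaptive.min_interval</name>
    <value>86400</value>
  </property>
  <property>
    <name>db.fetch.schedule.adaptive.max_interval</name>
    <value>31536000</value>
  </property>
</configuration>
```

With the adaptive schedule the interval for a page should grow while it stays unchanged and shrink when it changes, within those two bounds.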
Tom
On 25/11/16 13:43, Vladimir Loubenski wrote:
Thank you Tom,
Which config XML variables control it?
Thank you in advance,
Vladimir.
-----Original Message-----
From: Tom Chiverton [mailto:[email protected]]
Sent: November-25-16 2:31 AM
To: [email protected]
Subject: Re: Nutch 2.3.1 re-crawls unchanged web pages
As I understand it, this is expected, especially if the page is in the list of seeds.
You can control this by changing the relevant config XML variables.
On 24 November 2016 20:10:02 GMT+00:00, Vladimir Loubenski
<[email protected]> wrote:
Hi,
I am using Nutch 2.3.1.
I run the generate, fetch, parse, and updateDB steps in a loop.
I noticed that during a re-crawl, even if a web page hasn't changed, Nutch
doesn't detect this from the ETag, Last-Modified, or signature fields
and continues to process all these steps for unchanged web pages.
Is this the expected behaviour?
Are there plans to fix it in future releases?
Regards,
Vladimir.