Reposting my question.
Hi All,
I have a quick question about the db.default.fetch.interval
parameter. I currently have it set to 15 days, but my crawl cycle
itself runs longer than that, up to 30 days. Since the interval is
only 15 days, is it possible that, before a full crawl has
completed, an already-fetched page gets re-fetched ahead of a page
that has never been fetched, so that fewer distinct pages are
fetched overall?
In other words, if db.default.fetch.interval is set to a value
shorter than the time it takes to do one complete crawl of the web,
can the crawl get stuck in a kind of infinite loop, where recently
fetched pages keep being re-fetched ahead of pages that have never
been fetched?
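
To make the scenario concrete, here is a toy model in Python of what I am worried about. It is not Nutch code: the simulate function, the per-day fetch budget, and both ordering policies are my own assumptions for illustration, and I do not know which ordering the generator actually applies. The numbers (3000 pages, 100 fetches per day) are made up so that one full pass takes 30 days with a 15-day interval.

```python
def simulate(total_pages, pages_per_day, interval_days, days, refetch_first):
    """Toy crawl model: return how many distinct pages were fetched.

    refetch_first=True  -> pages due for re-fetch are queued ahead of
                           never-fetched pages (the feared worst case).
    refetch_first=False -> never-fetched pages are queued first.
    """
    last_fetch = {}  # page id -> day on which it was last fetched
    for day in range(days):
        # Pages fetched at least `interval_days` ago are due again.
        due_again = [p for p, d in last_fetch.items()
                     if day - d >= interval_days]
        never_fetched = [p for p in range(total_pages)
                         if p not in last_fetch]
        if refetch_first:
            queue = due_again + never_fetched
        else:
            queue = never_fetched + due_again
        # Only a fixed budget of pages can be fetched per day.
        for page in queue[:pages_per_day]:
            last_fetch[page] = day
    return len(last_fetch)


if __name__ == "__main__":
    # 3000 pages at 100 pages/day: one full pass needs 30 days.
    print(simulate(3000, 100, 15, 60, refetch_first=True))   # 1500
    print(simulate(3000, 100, 15, 60, refetch_first=False))  # 3000
```

In this model, if due re-fetches are always taken first, the crawl never gets past the 1500 pages fetched in the first 15 days, which is the loop I am describing. If never-fetched pages are taken first, all 3000 pages are covered in 30 days.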
Thanks.
On Sun, Dec 28, 2014 at 11:18 AM, Meraj A. Khan mera...@gmail.com wrote:
Hi All,
I have a quick question about the db.default.fetch.interval
parameter. I currently have it set to 15 days, but my crawl cycle
itself runs longer than that, up to 30 days. Since the interval is
only 15 days, is it possible that, before a full crawl has
completed, an already-fetched page gets re-fetched ahead of a page
that has never been fetched, so that fewer distinct pages are
fetched overall?
I guess what I am trying to find out is whether
db.default.fetch.interval should be set to at least the duration of
one comprehensive crawl cycle.
Thanks.