Reposting my question. Hi All,
I have a quick question about the db.default.fetch.interval parameter. I currently have it set to 15 days, but my crawl cycle itself takes longer than that, up to 30 days. Since the interval is only 15 days, is it possible that, before a complete crawl has finished, an already-fetched page gets re-fetched ahead of a page that has never been fetched, so that fewer distinct pages are fetched overall?

In other words, does setting db.default.fetch.interval to a value shorter than the time it takes to do one complete crawl of the web lead to a kind of infinite loop, where recently fetched pages keep being re-fetched before the completely un-fetched ones, simply because the interval is shorter than the total crawl time?

Thanks.

On Sun, Dec 28, 2014 at 11:18 AM, Meraj A. Khan <mera...@gmail.com> wrote:
> Hi All,
>
> I have a quick question regarding the db.default.fetch.interval
> parameter , I have currently set it to 15 days , however my crawl
> cycle itself is going beyond 15 days and upto 30 days , now I was not
> sure since I have set the db.default.fetch.interval to be only 15 days
> , is there a possibility that even before a complete crawl is
> completed , an already fetched page will get re-fetched before an
> un-fetched page is fetched and there by fetching less number of
> distinct pages.
>
> I guess, I am trying to know if db.default.fetch.interval be set to
> at-least be greater than one comprehensive crawl cycle time .
>
> Thanks.
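
To make the scenario concrete, here is a toy model of the worry. It is only a sketch and not how Nutch's Generator actually selects URLs (real selection also depends on scoring and the configured fetch schedule). The function name `simulate`, the daily budget, and the `prefer_unfetched` switch are all hypothetical, made up for illustration. It assumes a fixed number of fetches per day and compares two orderings: never-fetched pages first, or pages due for re-fetch first.

```python
def simulate(total_pages, pages_per_day, interval_days, days, prefer_unfetched):
    """Toy crawl model (hypothetical, not Nutch code).

    Each day, up to pages_per_day pages are fetched. A fetched page becomes
    due for re-fetch interval_days later. Returns the number of distinct
    pages fetched at least once after the given number of days.
    """
    next_due = {}  # page id -> day on which the page is due for re-fetch
    for day in range(days):
        # Pages already fetched whose re-fetch interval has elapsed.
        due_refetch = sorted(p for p, d in next_due.items() if d <= day)
        # Pages that have never been fetched.
        unfetched = [p for p in range(total_pages) if p not in next_due]
        if prefer_unfetched:
            queue = unfetched + due_refetch
        else:
            queue = due_refetch + unfetched
        for p in queue[:pages_per_day]:
            next_due[p] = day + interval_days
    return len(next_due)


# 3000 pages at 100 per day is a 30-day full crawl; the interval is 15 days.
print(simulate(3000, 100, 15, 60, prefer_unfetched=True))   # 3000
print(simulate(3000, 100, 15, 60, prefer_unfetched=False))  # 1500
```

In this toy model, if due re-fetches always win, the crawl stalls at the 1500 pages fetched in the first 15 days, which is exactly the loop I am worried about. Whether Nutch behaves like the first or the second case is what I am trying to find out.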