Re: Question about db.default.fetch.interval.

2015-01-03 Thread Meraj A. Khan
Reposting my question.

Hi All,

I have a quick question about the db.default.fetch.interval
parameter. I currently have it set to 15 days, but my crawl cycle
itself runs longer than that, up to 30 days. Since the interval is
only 15 days, is it possible that, before a complete crawl has
finished, an already fetched page gets re-fetched ahead of a page that
has never been fetched, so that fewer distinct pages are fetched
overall?

In other words, I am trying to find out whether setting
db.default.fetch.interval to a value shorter than the time it takes to
complete one full crawl will lead to a kind of endless loop, in which
recently fetched pages are re-fetched before the pages that have never
been fetched at all.
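
To make the scenario concrete, here is a toy model of what I am worried
about. It is only a sketch and not Nutch's actual generator logic: the
simulate function, the due_first flag and all the numbers (3000 pages,
100 fetches per day) are made up for illustration. It just shows that the
outcome depends on whether pages that are due for a re-fetch are picked
ahead of pages that have never been fetched.

```python
def simulate(total_pages, pages_per_day, interval_days, days, due_first):
    """Toy crawl loop; returns how many distinct pages were fetched.

    due_first=True  -> pages due for re-fetch are picked before unfetched ones
    due_first=False -> unfetched pages are picked before due re-fetches
    (Both orderings are assumptions, not a description of Nutch.)
    """
    last_fetch = {}  # page id -> day on which it was last fetched
    for day in range(days):
        # Pages whose fetch interval has expired.
        due = [p for p, d in last_fetch.items() if day - d >= interval_days]
        # Pages that have never been fetched.
        unfetched = [p for p in range(total_pages) if p not in last_fetch]
        queue = due + unfetched if due_first else unfetched + due
        for page in queue[:pages_per_day]:
            last_fetch[page] = day
    return len(last_fetch)


# A full crawl needs 30 days (3000 pages / 100 per day); the interval is 15.
print(simulate(3000, 100, 15, 30, due_first=True))   # 1500
print(simulate(3000, 100, 15, 30, due_first=False))  # 3000
```

In this model, if re-fetches win, the second half of the 30 days is spent
entirely on pages that were already fetched, and only half of the pages
are ever reached.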


Thanks.


Question about db.default.fetch.interval.

2014-12-28 Thread Meraj A. Khan
Hi All,

I have a quick question about the db.default.fetch.interval
parameter. I currently have it set to 15 days, but my crawl cycle
itself runs longer than that, up to 30 days. Since the interval is
only 15 days, is it possible that, before a complete crawl has
finished, an already fetched page gets re-fetched ahead of a page that
has never been fetched, so that fewer distinct pages are fetched
overall?

I guess what I am asking is whether db.default.fetch.interval should be
set to a value greater than the time one comprehensive crawl cycle takes.

Thanks.