Re: Continuous crawling

Florian Schmedding Sun, 05 Jan 2014 06:10:21 -0800

Hi Karl,

Yes, in our case we need to make sure that new documents are discovered
and indexed within a certain interval. I have created a feature request
for that. In the meantime, we will try to use a scheduled job instead.
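
To make the request concrete, here is a minimal sketch of the behavior we
would like. It is only an illustration: the doubling rule, the function
name next_interval, and the values used below are my own assumptions, not
the crawler's actual formula, which has not been stated in this thread.

```python
def next_interval(current_ms, changed, base_ms, max_ms, factor=2.0):
    """Return the next recrawl interval for one document.

    Hypothetical rule for illustration only; not the crawler's real formula.
    """
    if changed:
        # A changed document goes back to the configured recrawl interval.
        return base_ms
    # An unchanged document backs off, but never beyond the requested limit.
    return min(int(current_ms * factor), max_ms)


base = 60_000   # assumed recrawl interval: 1 minute
cap = 300_000   # assumed upper limit between fetch attempts: 5 minutes

interval = base
history = []
# Four checks without a change, then one check that finds a change.
for changed in [False, False, False, False, True]:
    interval = next_interval(interval, changed, base, cap)
    history.append(interval)

print(history)
```

With such a limit, a document that never changes would still be fetched at
least once per cap period, which is the guarantee we need.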


Thanks for your help,
Florian


> Hi Florian,
>
> What you are seeing is "dynamic crawling" behavior.  The time between
> refetches of a document is based on the history of fetches of that
> document.  The recrawl interval is the initial time between document
> fetches, but if a document does not change, the interval for the document
> increases according to a formula.
>
> I would need to look at the code to be able to give you the precise
> formula, but if you need a limit on the amount of time between document
> fetch attempts, I suggest you create a ticket and I will look into adding
> that as a feature.
>
> Thanks,
> Karl
>
>
>
> On Sat, Jan 4, 2014 at 7:56 AM, Florian Schmedding <
> [email protected]> wrote:
>
>> Hello,
>>
>> the parameters "reseed interval" and "recrawl interval" of a continuous
>> crawling job are not quite clear to me. The documentation says that the
>> reseed interval is the time after which the seeds are checked again, and
>> the recrawl interval is the time after which a document is checked for
>> changes.
>>
>> However, we observed that the recrawl interval for a document increases
>> after each check. On the other hand, the reseed interval seems to be set
>> up correctly in the database metadata about the seed documents. Yet the
>> web server does not receive requests each time the interval elapses, but
>> only after several intervals have elapsed.
>>
>> We are using a web connector. The web server does not tell the client to
>> cache the documents. Any help would be appreciated.
>>
>> Best regards,
>> Florian
>>
>>
>>
>>
>
