Hi Karl,

yes, in our case we need to make sure that new documents are discovered and indexed within a certain interval. I have created a feature request for that. In the meantime we will try to use a scheduled job instead.

Thanks for your help,
Florian

> Hi Florian,
>
> What you are seeing is "dynamic crawling" behavior. The time between
> refetches of a document is based on the history of fetches of that
> document. The recrawl interval is the initial time between document
> fetches, but if a document does not change, the interval for the document
> increases according to a formula.
>
> I would need to look at the code to be able to give you the precise
> formula, but if you need a limit on the amount of time between document
> fetch attempts, I suggest you create a ticket and I will look into adding
> that as a feature.
>
> Thanks,
> Karl
>
> On Sat, Jan 4, 2014 at 7:56 AM, Florian Schmedding <[email protected]> wrote:
>
>> Hello,
>>
>> The parameters reseed interval and recrawl interval of a continuous
>> crawling job are not quite clear to me. The documentation says that the
>> reseed interval is the time after which the seeds are checked again, and
>> the recrawl interval is the time after which a document is checked for
>> changes.
>>
>> However, we observed that the recrawl interval for a document increases
>> after each check. On the other hand, the reseed interval seems to be set
>> up correctly in the database metadata about the seed documents. Yet the
>> web server does not receive requests each time the interval elapses,
>> but only after several intervals have elapsed.
>>
>> We are using a web connector. The web server does not tell the client to
>> cache the documents. Any help would be appreciated.
>>
>> Best regards,
>> Florian
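
To illustrate the "dynamic crawling" behavior described in the thread, here is a minimal Python sketch of a refetch interval that grows while a document stays unchanged, with an optional upper limit like the one requested as a feature. It is not the crawler's actual formula, which Karl says he would have to look up in the code. The doubling factor, the function name `next_recrawl_interval`, and the `max_interval` parameter are all assumptions made for illustration.

```python
def next_recrawl_interval(base_interval, unchanged_fetches,
                          growth=2.0, max_interval=None):
    """Return the wait (e.g. in minutes) before the next fetch of a document.

    base_interval     -- the configured recrawl interval (initial wait)
    unchanged_fetches -- number of consecutive fetches that found no change
    growth            -- ASSUMED multiplier per unchanged fetch, not the real one
    max_interval      -- HYPOTHETICAL cap on the wait; None means no cap
    """
    if unchanged_fetches < 0:
        raise ValueError("unchanged_fetches must be non-negative")

    # The wait starts at the configured interval and grows each time
    # the document is fetched without having changed.
    interval = base_interval * growth ** unchanged_fetches

    # An upper limit keeps the wait from growing without bound.
    if max_interval is not None:
        interval = min(interval, max_interval)
    return interval


def schedule(base_interval, fetches, max_interval=None):
    """List the successive waits for a document that never changes."""
    return [next_recrawl_interval(base_interval, n, max_interval=max_interval)
            for n in range(fetches)]
```

Under these assumptions, a call such as `schedule(10, 5)` yields waits that keep growing, which would match the observation that the server only sees requests after several configured intervals have elapsed. Passing `max_interval=60` bounds the wait instead.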
