Hi Florian,

It is impossible to *guarantee* that a document will be checked, because if load on the crawler is high enough, it will fall behind. But I will look into adding the feature you request.

Karl


On Sun, Jan 5, 2014 at 9:08 AM, Florian Schmedding <[email protected]> wrote:

> Hi Karl,
>
> yes, in our case it is necessary to make sure that new documents are
> discovered and indexed within a certain interval. I have created a feature
> request on that. In the meantime we will try to use a scheduled job
> instead.
>
> Thanks for your help,
> Florian
>
>
> > Hi Florian,
> >
> > What you are seeing is "dynamic crawling" behavior. The time between
> > refetches of a document is based on the history of fetches of that
> > document. The recrawl interval is the initial time between document
> > fetches, but if a document does not change, the interval for the document
> > increases according to a formula.
> >
> > I would need to look at the code to be able to give you the precise
> > formula, but if you need a limit on the amount of time between document
> > fetch attempts, I suggest you create a ticket and I will look into adding
> > that as a feature.
> >
> > Thanks,
> > Karl
> >
> >
> > On Sat, Jan 4, 2014 at 7:56 AM, Florian Schmedding <[email protected]> wrote:
> >
> >> Hello,
> >>
> >> the parameters reseed interval and recrawl interval of a continuous
> >> crawling job are not quite clear to me. The documentation says that the
> >> reseed interval is the time after which the seeds are checked again, and
> >> the recrawl interval is the time after which a document is checked for
> >> changes.
> >>
> >> However, we observed that the recrawl interval for a document increases
> >> after each check. On the other hand, the reseed interval seems to be set
> >> up correctly in the database metadata about the seed documents. Yet the
> >> web server does not receive requests each time the interval elapses, but
> >> only after several intervals have elapsed.
> >>
> >> We are using a web connector. The web server does not tell the client to
> >> cache the documents. Any help would be appreciated.
> >>
> >> Best regards,
> >> Florian
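
To make the "dynamic crawling" behavior discussed above more concrete, here is a rough sketch of how a per-document interval could grow while the document stays unchanged, and how an upper limit (the feature requested in this thread) would bound it. This is not the crawler's actual formula, which the thread does not give. The doubling factor, the reset to the base interval on a change, and the names next_recrawl_interval, fetch_times, growth and max_ms are all assumptions made for illustration.

```python
from typing import List, Optional


def next_recrawl_interval(current_ms: int, changed: bool, base_ms: int,
                          growth: float = 2.0,
                          max_ms: Optional[int] = None) -> int:
    """Return the wait before the next fetch of one document.

    Assumptions (not taken from the crawler's code): the interval is
    multiplied by `growth` when the document did not change, and falls
    back to the configured recrawl interval `base_ms` when it did.
    """
    interval = base_ms if changed else current_ms * growth
    if max_ms is not None:
        # Hypothetical cap: the limit on time between fetch attempts.
        interval = min(interval, max_ms)
    return int(interval)


def fetch_times(base_ms: int, checks: int, growth: float = 2.0,
                max_ms: Optional[int] = None) -> List[int]:
    """Simulate fetch times (ms since first crawl) of a never-changing document."""
    t, interval, times = 0, base_ms, []
    for _ in range(checks):
        t += interval
        times.append(t)
        interval = next_recrawl_interval(interval, False, base_ms, growth, max_ms)
    return times


if __name__ == "__main__":
    # One-minute recrawl interval, document never changes.
    print(fetch_times(60_000, 4))                   # gaps keep growing
    print(fetch_times(60_000, 4, max_ms=120_000))   # gaps bounded by the cap
```

With a one-minute base interval and no cap, the gaps between fetches in this sketch are 1, 2, 4 and 8 minutes, which matches the observation that the server only sees a request after several intervals have elapsed. With the hypothetical two-minute cap, the gaps never exceed two minutes.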
