Hi Karl, these are the values:

Priority: 5
Start method: Start at beginning of schedule window
Schedule type: Scan every document once
Minimum recrawl interval: Not applicable
Expiration interval: Not applicable
Reseed interval: Not applicable
Scheduled time: Any day of week at 12 am, 1 am, 2 am, 3 am, 4 am, 5 am, 6 am, 7 am, 8 am, 9 am, 10 am, 11 am, 12 pm, 1 pm, 2 pm, 3 pm, 4 pm, 5 pm, 6 pm, 7 pm, 8 pm, 9 pm, 10 pm, 11 pm
Maximum run time: No limit
Job invocation: Complete
Maybe it is because I've changed the job from continuous crawling to this schedule. I started it a few times manually, too. I couldn't notice anything strange in the job setup or in the respective entries in the database.

Regards,
Florian

> Hi Florian,
>
> I was unable to reproduce the behavior you described.
>
> Could you view your job, and post a screen shot of that page? I want to
> see what your schedule record(s) look like.
>
> Thanks,
> Karl
>
> On Tue, Jan 14, 2014 at 6:09 AM, Karl Wright <[email protected]> wrote:
>
>> Hi Florian,
>>
>> I've never noted this behavior before. I'll see if I can reproduce it
>> here.
>>
>> Karl
>>
>> On Tue, Jan 14, 2014 at 5:36 AM, Florian Schmedding <
>> [email protected]> wrote:
>>
>>> Hi Karl,
>>>
>>> the scheduled job seems to work as expected. However, it runs two times:
>>> it starts at the beginning of the scheduled time, finishes, and
>>> immediately starts again. After finishing the second run it waits for
>>> the next scheduled time. Why does it run two times? The start method is
>>> "Start at beginning of schedule window".
>>>
>>> Yes, you're right about the checking guarantee. Currently, our interval
>>> is long enough for a complete crawler run.
>>>
>>> Best,
>>> Florian
>>>
>>> > Hi Florian,
>>> >
>>> > It is impossible to *guarantee* that a document will be checked,
>>> > because if load on the crawler is high enough, it will fall behind.
>>> > But I will look into adding the feature you request.
>>> >
>>> > Karl
>>> >
>>> > On Sun, Jan 5, 2014 at 9:08 AM, Florian Schmedding <
>>> > [email protected]> wrote:
>>> >
>>> >> Hi Karl,
>>> >>
>>> >> yes, in our case it is necessary to make sure that new documents are
>>> >> discovered and indexed within a certain interval. I have created a
>>> >> feature request on that. In the meantime we will try to use a
>>> >> scheduled job instead.
>>> >>
>>> >> Thanks for your help,
>>> >> Florian
>>> >>
>>> >> > Hi Florian,
>>> >> >
>>> >> > What you are seeing is "dynamic crawling" behavior. The time
>>> >> > between refetches of a document is based on the history of fetches
>>> >> > of that document. The recrawl interval is the initial time between
>>> >> > document fetches, but if a document does not change, the interval
>>> >> > for the document increases according to a formula.
>>> >> >
>>> >> > I would need to look at the code to be able to give you the precise
>>> >> > formula, but if you need a limit on the amount of time between
>>> >> > document fetch attempts, I suggest you create a ticket and I will
>>> >> > look into adding that as a feature.
>>> >> >
>>> >> > Thanks,
>>> >> > Karl
>>> >> >
>>> >> > On Sat, Jan 4, 2014 at 7:56 AM, Florian Schmedding <
>>> >> > [email protected]> wrote:
>>> >> >
>>> >> >> Hello,
>>> >> >>
>>> >> >> the parameters reseed interval and recrawl interval of a
>>> >> >> continuous crawling job are not quite clear to me. The
>>> >> >> documentation tells that the reseed interval is the time after
>>> >> >> which the seeds are checked again, and the recrawl interval is the
>>> >> >> time after which a document is checked for changes.
>>> >> >>
>>> >> >> However, we observed that the recrawl interval for a document
>>> >> >> increases after each check. On the other hand, the reseed interval
>>> >> >> seems to be set up correctly in the database metadata about the
>>> >> >> seed documents. Yet the web server does not receive requests each
>>> >> >> time the interval elapses, but only after several intervals have
>>> >> >> elapsed.
>>> >> >>
>>> >> >> We are using a web connector. The web server does not tell the
>>> >> >> client to cache the documents. Any help would be appreciated.
>>> >> >>
>>> >> >> Best regards,
>>> >> >> Florian
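For anyone else reading the thread: Karl's description of "dynamic crawling" above amounts to an adaptive refetch backoff. The sketch below illustrates the general idea only; the actual ManifoldCF formula may differ (Karl himself notes he would have to check the code), and the function name and parameters here are purely illustrative.

```python
# Illustrative sketch, NOT ManifoldCF's real formula: the recrawl interval
# starts at the configured base value and widens each time a fetch finds
# the document unchanged, resetting when a change is detected.

def next_recrawl_interval(base_interval, current_interval, changed,
                          growth_factor=2.0, max_interval=None):
    """Return the interval (in minutes) to wait before the next fetch."""
    if changed:
        # Document changed: fall back to the configured base interval.
        return base_interval
    # Document unchanged: widen the interval multiplicatively.
    widened = current_interval * growth_factor
    if max_interval is not None:
        widened = min(widened, max_interval)
    return widened

# Example: base interval of 60 minutes, document never changes.
interval = 60.0
history = []
for _ in range(4):
    history.append(interval)
    interval = next_recrawl_interval(60.0, interval, changed=False)
# history is [60.0, 120.0, 240.0, 480.0]
```

This matches the observed behavior in the thread: fetches of an unchanged document grow further and further apart, which is why the web server sees requests only after several nominal intervals have elapsed.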
