hi Karl, it's a shame, but I must say - the problem was sitting in front of the monitor :) I was sure, the jobs were in non-continuous mode, and didn't look at this, but the scheduler's setting was: "rescan dynamically". Since the bin's throttling was set to 60 docs/min, the stuffer thread was doing everything right - it woke up every 2 sec and put 2 docs into the memory queue. Please correct me if I'm wrong. The world is OK again :) Thank you very much for helping with this question. -- Konstantin
2016-07-21 0:18 GMT+02:00 Karl Wright <[email protected]>: > "Another observation from the document status - when they get status > "waiting for processing" their "Scheduled time" are in the future - "job > restart time + 1 day". That could be the reason, why the job does not exit > - there are still active document, but not yet ready for processing." > > This happens ONLY because the document is getting requeued. A document is > requeued ONLY because of (a) errors, (b) the job is running in "continuous" > mode. > > The 24 hour period is suspicious because that's the default recrawl time > in continuous mode. Are you running the job in continuous mode? > > Karl > > > On Wed, Jul 20, 2016 at 4:56 PM, jetnet <[email protected]> wrote: > >> We use a Postgres instance, the crawler job is a windows share job. There >> are 3 windows file share job using the same repository. And all of them >> have the same issue. >> So, I've stopped 2 jobs and currently debugging the last one. >> Another observation from the document status - when they get status >> "waiting for processing" their "Scheduled time" are in the future - "job >> restart time + 1 day". That could be the reason, why the job does not exit >> - there are still active document, but not yet ready for processing. >> >> Thanks! >> >> >> 2016-07-20 22:08 GMT+02:00 Karl Wright <[email protected]>: >> >>> Hi, >>> >>> Can you provide more details about your setup, etc.? Specifically: >>> >>> - what database is this? >>> - how many jobs are there? >>> - are there any jobs that have lots of documents that are in an unusual >>> state? >>> >>> The query that the stuffer thread runs is not magic, and if it returns >>> only a couple of documents every now and then there must be a reason. I do >>> not see the kind of behavior you are describing here. >>> >>> Karl >>> >>> >>> On Wed, Jul 20, 2016 at 3:35 PM, jetnet <[email protected]> wrote: >>> >>>> hi Karl, >>>> >>>> actually, there are no failed documents at all. All documents (without >>>> exception) have status "Waiting for Processing" and the job is hanging >>>> around with status "Running", but nothing happens. >>>> When the job gets restarted, then all docs become status "About to >>>> Process", and then slowly (2 docs / 2 seconds) moving to "Waiting for >>>> Processing" again. >>>> I could delete the job and re-create it again, but I'd like to >>>> understand the root cause of the issue. >>>> >>>> Any hints where I could look else very much appreciated! >>>> Thanks! >>>> >>>> 2016-07-19 15:31 GMT+02:00 Karl Wright <[email protected]>: >>>> >>>>> Hi Konstantin, >>>>> >>>>> It's not possible to figure out from the log why the stuffer thread >>>>> queued only two documents at that time. Probably it's because there were >>>>> no other eligible documents. Documents in the queue have a priority and >>>>> they also have a minimum time they can be acted upon. If you have >>>>> documents that failed earlier, they get requeued depending on the >>>>> particular kind of failure (up to the connector) with specific times. >>>>> There's no way to make the queue resolve "faster" if it's full of >>>>> documents >>>>> that were requeued in this way due to errors. >>>>> >>>>> You can figure all this out by using the Document Status and Queue >>>>> Status reports. >>>>> >>>>> Karl >>>>> >>>>> >>>>> On Tue, Jul 19, 2016 at 7:29 AM, jetnet <[email protected]> wrote: >>>>> >>>>>> Hi Karl, >>>>>> >>>>>> the documentation states: "it [Document Stuffer Thread] stuffs some >>>>>> number that is a >>>>>> multiple of the number of Worker Threads at one time, up to some >>>>>> limit (which is also >>>>>> related to the number of Worker Threads)." >>>>>> >>>>>> here is the worker threads config: >>>>>> >>>>>> <property name="org.apache.manifoldcf.database.maxhandles" >>>>>> value="200"/> >>>>>> <property name="org.apache.manifoldcf.crawler.threads" value="100"/> >>>>>> >>>>>> >>>>>> So, I'd expect, that the stuffer thread would put about 200 docs in >>>>>> the queue, but it puts only 2... >>>>>> >>>>>> DEBUG 2016-07-19 13:24:02,916 (Stuffer thread) - Document stuffer >>>>>> thread woke up >>>>>> DEBUG 2016-07-19 13:24:02,963 (Stuffer thread) - Stuffing document >>>>>> 'smb://...' that has priority 10.284489143231797 onto active list >>>>>> DEBUG 2016-07-19 13:24:02,963 (Stuffer thread) - Stuffing document >>>>>> 'smb://...' that has priority 10.28520622168017 onto active list >>>>>> DEBUG 2016-07-19 13:24:02,963 (Set priority thread) - Reprioritizing >>>>>> 1 documents >>>>>> DEBUG 2016-07-19 13:24:02,979 (Set priority thread) - Reprioritizing >>>>>> 1 documents >>>>>> DEBUG 2016-07-19 13:24:02,979 (Stuffer thread) - Setting new minimum >>>>>> depth value to 29273.99867577113 >>>>>> DEBUG 2016-07-19 13:24:02,979 (Stuffer thread) - Stuffer thread: >>>>>> Found 2 documents to queue >>>>>> DEBUG 2016-07-19 13:24:02,979 (Set priority thread) - Reprioritizing >>>>>> 1 documents >>>>>> DEBUG 2016-07-19 13:24:02,979 (Stuffer thread) - Putting document >>>>>> 'smb://...' with bins [SERVER ] onto active queue >>>>>> DEBUG 2016-07-19 13:24:02,979 (Stuffer thread) - Putting document >>>>>> 'smb://...' with bins [SERVER ] onto active queue >>>>>> DEBUG 2016-07-19 13:24:02,979 (Stuffer thread) - Document stuffer >>>>>> thread woke up >>>>>> >>>>>> Or did I miss something? >>>>>> >>>>>> Thanks! >>>>>> >>>>>> >>>>>> 2016-07-19 12:51 GMT+02:00 Karl Wright <[email protected]>: >>>>>> >>>>>>> Hi Konstantin, >>>>>>> >>>>>>> The stuffer thread operates independently of the worker threads. It >>>>>>> wakes up, and if anything is available, stuffs as much as it can, and >>>>>>> when >>>>>>> done sleeps for a short time. The queue is maintained at at least twice >>>>>>> the number of active worker threads. See ManifoldCF in Action. >>>>>>> >>>>>>> Karl >>>>>>> >>>>>>> >>>>>>> On Tue, Jul 19, 2016 at 6:33 AM, jetnet <[email protected]> wrote: >>>>>>> >>>>>>>> hi All, >>>>>>>> >>>>>>>> I've encountered recently an issue with the crawler (JCIFS >>>>>>>> connector): when a jobs gets started, all it's documents are being >>>>>>>> checked, >>>>>>>> and this process is taking too long. After tuning DEBUG on, I found, >>>>>>>> that >>>>>>>> there are ~2 seconds delay when processing the document queue: >>>>>>>> >>>>>>>> e.g.: >>>>>>>> >>>>>>>> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - JCIFS: Leaving >>>>>>>> wouldFileBeIncluded for 'smb://... >>>>>>>> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Worker thread >>>>>>>> done processing 1 documents >>>>>>>> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Adding >>>>>>>> 1453999232278 to finishList >>>>>>>> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Adding >>>>>>>> 1453999232278 to ingesterCheckList >>>>>>>> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Finishing >>>>>>>> documents {1453999232278 } >>>>>>>> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Requeueing >>>>>>>> documents due to carrydown {} >>>>>>>> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Requeuing >>>>>>>> {1453999232278 } >>>>>>>> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Deleting {} >>>>>>>> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Hopcount >>>>>>>> removal {} >>>>>>>> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Rescanning >>>>>>>> documents {} >>>>>>>> DEBUG 2016-07-19 10:31:06,500 (Stuffer thread) - Stuffer thread: >>>>>>>> Found 0 documents to queue >>>>>>>> DEBUG 2016-07-19 10:31:06,750 (Document cleanup stuffer thread) - >>>>>>>> Document cleanup stuffer thread woke up >>>>>>>> DEBUG 2016-07-19 10:31:06,750 (Document delete stuffer thread) - >>>>>>>> Document delete stuffer thread woke up >>>>>>>> DEBUG 2016-07-19 10:31:06,750 (Document cleanup stuffer thread) - >>>>>>>> Document cleanup stuffer thread found nothing to do >>>>>>>> DEBUG 2016-07-19 10:31:06,750 (Document delete stuffer thread) - >>>>>>>> Document delete stuffer thread found nothing to do >>>>>>>> DEBUG 2016-07-19 10:31:07,375 (Set priority thread) - Done >>>>>>>> reprioritizing because no more documents to reprioritize >>>>>>>> DEBUG 2016-07-19 10:31:07,750 (Document cleanup stuffer thread) - >>>>>>>> Document cleanup stuffer thread woke up >>>>>>>> DEBUG 2016-07-19 10:31:07,750 (Document delete stuffer thread) - >>>>>>>> Document delete stuffer thread woke up >>>>>>>> DEBUG 2016-07-19 10:31:07,750 (Document delete stuffer thread) - >>>>>>>> Document delete stuffer thread found nothing to do >>>>>>>> DEBUG 2016-07-19 10:31:07,750 (Document cleanup stuffer thread) - >>>>>>>> Document cleanup stuffer thread found nothing to do >>>>>>>> DEBUG 2016-07-19 10:31:08,500 (Stuffer thread) - Document stuffer >>>>>>>> thread woke up >>>>>>>> DEBUG 2016-07-19 10:31:08,516 (Stuffer thread) - Stuffer thread: >>>>>>>> Found 2 documents to queue >>>>>>>> DEBUG 2016-07-19 10:31:08,531 (Stuffer thread) - Document stuffer >>>>>>>> thread woke up >>>>>>>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '62') - Worker thread >>>>>>>> processing documents: 1453999191642 >>>>>>>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '57') - Worker thread >>>>>>>> processing documents: 1453999188326 >>>>>>>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '62') - Worker thread >>>>>>>> starting document count is 1 >>>>>>>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '57') - Worker thread >>>>>>>> starting document count is 1 >>>>>>>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '62') - >>>>>>>> Post-relationship document count is 1 >>>>>>>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '57') - >>>>>>>> Post-relationship document count is 1 >>>>>>>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '62') - Post-hopcount >>>>>>>> pruned document count is 1 >>>>>>>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '57') - Post-hopcount >>>>>>>> pruned document count is 1 >>>>>>>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '62') - Worker thread >>>>>>>> about to process {1453999191642 } >>>>>>>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '57') - Worker thread >>>>>>>> about to process {1453999188326 } >>>>>>>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '62') - JCIFS: >>>>>>>> Processing 'smb://... >>>>>>>> >>>>>>>> >>>>>>>> As one can see, there is a delay between the thread-61 has left the >>>>>>>> checks and next 2 threads have started. >>>>>>>> Any idea why that happens? Why the document stuffer thread does not >>>>>>>> wake up immediately? Or - why the "document queue batch size" is only >>>>>>>> 2? >>>>>>>> >>>>>>>> P.S. MCF version 2.3 >>>>>>>> >>>>>>>> Thanks! >>>>>>>> >>>>>>>> -- >>>>>>>> Konstantin >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >
