Hi Karl, the documentation states: "it [Document Stuffer Thread] stuffs some number that is a multiple of the number of Worker Threads at one time, up to some limit (which is also related to the number of Worker Threads)."
here is the worker threads config: <property name="org.apache.manifoldcf.database.maxhandles" value="200"/> <property name="org.apache.manifoldcf.crawler.threads" value="100"/> So, I'd expect, that the stuffer thread would put about 200 docs in the queue, but it puts only 2... DEBUG 2016-07-19 13:24:02,916 (Stuffer thread) - Document stuffer thread woke up DEBUG 2016-07-19 13:24:02,963 (Stuffer thread) - Stuffing document 'smb://...' that has priority 10.284489143231797 onto active list DEBUG 2016-07-19 13:24:02,963 (Stuffer thread) - Stuffing document 'smb://...' that has priority 10.28520622168017 onto active list DEBUG 2016-07-19 13:24:02,963 (Set priority thread) - Reprioritizing 1 documents DEBUG 2016-07-19 13:24:02,979 (Set priority thread) - Reprioritizing 1 documents DEBUG 2016-07-19 13:24:02,979 (Stuffer thread) - Setting new minimum depth value to 29273.99867577113 DEBUG 2016-07-19 13:24:02,979 (Stuffer thread) - Stuffer thread: Found 2 documents to queue DEBUG 2016-07-19 13:24:02,979 (Set priority thread) - Reprioritizing 1 documents DEBUG 2016-07-19 13:24:02,979 (Stuffer thread) - Putting document 'smb://...' with bins [SERVER ] onto active queue DEBUG 2016-07-19 13:24:02,979 (Stuffer thread) - Putting document 'smb://...' with bins [SERVER ] onto active queue DEBUG 2016-07-19 13:24:02,979 (Stuffer thread) - Document stuffer thread woke up Or did I miss something? Thanks! 2016-07-19 12:51 GMT+02:00 Karl Wright <[email protected]>: > Hi Konstantin, > > The stuffer thread operates independently of the worker threads. It wakes > up, and if anything is available, stuffs as much as it can, and when done > sleeps for a short time. The queue is maintained at at least twice the > number of active worker threads. See ManifoldCF in Action. > > Karl > > > On Tue, Jul 19, 2016 at 6:33 AM, jetnet <[email protected]> wrote: > >> hi All, >> >> I've encountered recently an issue with the crawler (JCIFS connector): >> when a jobs gets started, all it's documents are being checked, and this >> process is taking too long. After tuning DEBUG on, I found, that there are >> ~2 seconds delay when processing the document queue: >> >> e.g.: >> >> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - JCIFS: Leaving >> wouldFileBeIncluded for 'smb://... >> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Worker thread done >> processing 1 documents >> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Adding >> 1453999232278 to finishList >> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Adding >> 1453999232278 to ingesterCheckList >> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Finishing documents >> {1453999232278 } >> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Requeueing >> documents due to carrydown {} >> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Requeuing >> {1453999232278 } >> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Deleting {} >> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Hopcount removal {} >> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Rescanning documents >> {} >> DEBUG 2016-07-19 10:31:06,500 (Stuffer thread) - Stuffer thread: Found 0 >> documents to queue >> DEBUG 2016-07-19 10:31:06,750 (Document cleanup stuffer thread) - >> Document cleanup stuffer thread woke up >> DEBUG 2016-07-19 10:31:06,750 (Document delete stuffer thread) - Document >> delete stuffer thread woke up >> DEBUG 2016-07-19 10:31:06,750 (Document cleanup stuffer thread) - >> Document cleanup stuffer thread found nothing to do >> DEBUG 2016-07-19 10:31:06,750 (Document delete stuffer thread) - Document >> delete stuffer thread found nothing to do >> DEBUG 2016-07-19 10:31:07,375 (Set priority thread) - Done reprioritizing >> because no more documents to reprioritize >> DEBUG 2016-07-19 10:31:07,750 (Document cleanup stuffer thread) - >> Document cleanup stuffer thread woke up >> DEBUG 2016-07-19 10:31:07,750 (Document delete stuffer thread) - Document >> delete stuffer thread woke up >> DEBUG 2016-07-19 10:31:07,750 (Document delete stuffer thread) - Document >> delete stuffer thread found nothing to do >> DEBUG 2016-07-19 10:31:07,750 (Document cleanup stuffer thread) - >> Document cleanup stuffer thread found nothing to do >> DEBUG 2016-07-19 10:31:08,500 (Stuffer thread) - Document stuffer thread >> woke up >> DEBUG 2016-07-19 10:31:08,516 (Stuffer thread) - Stuffer thread: Found 2 >> documents to queue >> DEBUG 2016-07-19 10:31:08,531 (Stuffer thread) - Document stuffer thread >> woke up >> DEBUG 2016-07-19 10:31:08,531 (Worker thread '62') - Worker thread >> processing documents: 1453999191642 >> DEBUG 2016-07-19 10:31:08,531 (Worker thread '57') - Worker thread >> processing documents: 1453999188326 >> DEBUG 2016-07-19 10:31:08,531 (Worker thread '62') - Worker thread >> starting document count is 1 >> DEBUG 2016-07-19 10:31:08,531 (Worker thread '57') - Worker thread >> starting document count is 1 >> DEBUG 2016-07-19 10:31:08,531 (Worker thread '62') - Post-relationship >> document count is 1 >> DEBUG 2016-07-19 10:31:08,531 (Worker thread '57') - Post-relationship >> document count is 1 >> DEBUG 2016-07-19 10:31:08,531 (Worker thread '62') - Post-hopcount >> pruned document count is 1 >> DEBUG 2016-07-19 10:31:08,531 (Worker thread '57') - Post-hopcount >> pruned document count is 1 >> DEBUG 2016-07-19 10:31:08,531 (Worker thread '62') - Worker thread about >> to process {1453999191642 } >> DEBUG 2016-07-19 10:31:08,531 (Worker thread '57') - Worker thread about >> to process {1453999188326 } >> DEBUG 2016-07-19 10:31:08,531 (Worker thread '62') - JCIFS: Processing >> 'smb://... >> >> >> As one can see, there is a delay between the thread-61 has left the >> checks and next 2 threads have started. >> Any idea why that happens? Why the document stuffer thread does not wake >> up immediately? Or - why the "document queue batch size" is only 2? >> >> P.S. MCF version 2.3 >> >> Thanks! >> >> -- >> Konstantin >> > >
