Thanks Karl. The stuffer thread query isn't doing too badly. Judging by stats from the pg_stat_activity view in PostgreSQL, the stuffer query usually takes < 2 seconds to return.
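For anyone who wants to reproduce the measurement, something like the following works (a minimal JDBC sketch; the connection URL and credentials are placeholders, and the filter is generic rather than specific to the stuffer query):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ActiveQueryCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; adjust for your environment.
        String url = "jdbc:postgresql://localhost:5432/manifoldcf";
        try (Connection conn = DriverManager.getConnection(url, "postgres", "password");
             Statement stmt = conn.createStatement();
             // pg_stat_activity shows one row per backend, with the query text,
             // its state, and when it started running.
             ResultSet rs = stmt.executeQuery(
                 "SELECT pid, state, now() - query_start AS runtime, query " +
                 "FROM pg_stat_activity " +
                 "WHERE state <> 'idle' " +
                 "ORDER BY runtime DESC")) {
            while (rs.next()) {
                System.out.printf("pid=%d state=%s runtime=%s%n  %s%n",
                    rs.getInt("pid"), rs.getString("state"),
                    rs.getString("runtime"), rs.getString("query"));
            }
        }
    }
}
```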
>> In a continuous job, documents may well be scheduled to be crawled at some time in the future, and are ineligible for crawling until that future time arrives.

Such documents would be excluded by the stuffer query, right?

Thanks for the pointer to the queue status page. Using the root server name as an identifier class, I get the bulk of documents grouped under the "About to Process" and "Waiting for Processing" categories. For example, one job has 677,856 and 102,342 docs respectively; another has 320,804 and 443,596 docs respectively. All other status categories have 0 docs.

>> If there are tons of idle worker threads AND your stuffer thread is waiting on Postgresql, that's a good sign it is not keeping up due to database reasons.

Interestingly, the stuffer thread spends the majority of its time trying to acquire the stuffer lock. I have 3 nodes in the cluster, and each node's stuffer thread spends roughly 2/3 of its time blocked waiting for the lock. The SQL query itself, as well as grabbing and releasing the connection, all happen within the scope of the lock. The effect is that the more nodes there are in the cluster, the less time each node has for stuffing documents.
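To make the scaling effect concrete, here is a simplified sketch of the pattern as I understand it (not the actual ManifoldCF code; ClusterLock and runStufferQuery are placeholders). With N nodes contending for the same global lock, each node can hold it for at most roughly 1/N of the time:

```java
import java.sql.Connection;
import javax.sql.DataSource;

// Simplified sketch only. "ClusterLock" and runStufferQuery() stand in for the
// real cross-cluster stuffer lock and the real stuffer SQL; the point is just
// the scope of the lock around the database work.
public class StufferSketch {

    // Hypothetical cross-cluster lock (in a real multi-node setup this would be
    // backed by a shared lock service).
    interface ClusterLock {
        void acquire() throws InterruptedException;
        void release();
    }

    void stuffingLoop(ClusterLock stufferLock, DataSource pool) throws Exception {
        while (true) {
            stufferLock.acquire();                         // node blocks here ~2/3 of the time
            try (Connection conn = pool.getConnection()) { // connection grab inside the lock
                runStufferQuery(conn);                     // the stuffer query, also inside the lock
            } finally {
                stufferLock.release();
            }
            // Only one node at a time can be inside the critical section above,
            // so adding nodes adds lock waiters rather than stuffing throughput.
        }
    }

    void runStufferQuery(Connection conn) {
        // placeholder for the real stuffer SQL
    }
}
```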
