Yes, I use the recommended DB settings:

shared_buffers = 1024MB    # min 128kB
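For completeness, a quick way to double-check that the running server has actually picked that value up and is not still on the default (a sketch only, using the standard pg_settings view):

    -- Show the value the server is currently running with; it should report 1024MB.
    SHOW shared_buffers;

    -- Show where the effective setting comes from (default, postgresql.conf, etc.).
    SELECT name, setting, unit, source, sourcefile
    FROM pg_settings
    WHERE name = 'shared_buffers';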
The indexes:

select * from pg_indexes where tablename = 'jobqueue';

"public";"jobqueue";"jobqueue_pkey";"";"CREATE UNIQUE INDEX jobqueue_pkey ON jobqueue USING btree (id)"
"public";"jobqueue";"i1459943614205";"";"CREATE INDEX i1459943614205 ON jobqueue USING btree (jobid, status)"
"public";"jobqueue";"i1459943614204";"";"CREATE INDEX i1459943614204 ON jobqueue USING btree (needpriority)"
"public";"jobqueue";"i1459943614203";"";"CREATE INDEX i1459943614203 ON jobqueue USING btree (isseed, jobid)"
"public";"jobqueue";"i1459943614202";"";"CREATE INDEX i1459943614202 ON jobqueue USING btree (failtime, jobid)"
"public";"jobqueue";"i1459943614201";"";"CREATE INDEX i1459943614201 ON jobqueue USING btree (status, checkaction, checktime)"
"public";"jobqueue";"i1459943614200";"";"CREATE INDEX i1459943614200 ON jobqueue USING btree (docpriority, status, checkaction, checktime)"
"public";"jobqueue";"i1459943614199";"";"CREATE UNIQUE INDEX i1459943614199 ON jobqueue USING btree (dochash, jobid)"

Thanks!
Konstantin

2016-04-27 14:57 GMT+02:00 Karl Wright <[email protected]>:
> Also, have you looked at increasing the amount of shared buffers in your postgresql configuration? See the "how-to-build-and-deploy" page for that; it can make a lot of difference for MCF. So even if the plan is not perfect the database simply performs better.
>
> Karl
>
> On Wed, Apr 27, 2016 at 8:53 AM, Karl Wright <[email protected]> wrote:
>>
>> Hi Konstantin,
>>
>> Can you list the indexes defined on the jobqueue table?
>>
>> The query we do for stuffing is precisely aligned with one of the indexes we create on the jobqueue table. It should be doing an ordered index read with complex filtering conditions; instead it is doing this:
>>
>> WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan: -> Bitmap Heap Scan on jobqueue t0 (cost=2223.73..3352.95 rows=286 width=235) (actual time=46.705..93.152 rows=20858 loops=1)
>>
>> ... with a hash join and a hash sort.
>>
>> Postgres is being way too smart for its own good here. Probably that's at least partly because its time estimates for its chosen plan are far less than reality. There may be a way to configure postgresql not to make that mistake but I don't know offhand how you do it.
>>
>> In order for us to be able to control the plan via hints, which would make this problem go away once and for all, you need the version of Postgres that you pay for. Sigh.
>>
>> Anyhow, let's verify that the right indexes are in place, at least...
>>
>> Karl
>>
>>
>> On Wed, Apr 27, 2016 at 8:38 AM, jetnet <[email protected]> wrote:
>>>
>>> yet another long-running query:
>>>
>>> WARN 2016-04-27 14:23:31,562 (Agents thread) - Found a long-running query (215465 ms): [UPDATE jobqueue SET docpriority=?,needpriorityprocessid=NULL,needpriority=? WHERE docpriority<?]
>>> WARN 2016-04-27 14:23:31,562 (Agents thread) - Parameter 0: '1.000000001E9'
>>> WARN 2016-04-27 14:23:31,562 (Agents thread) - Parameter 1: 'T'
>>> WARN 2016-04-27 14:23:31,562 (Agents thread) - Parameter 2: '1.000000001E9'
>>> WARN 2016-04-27 14:23:31,562 (Agents thread) - Plan: Update on jobqueue (cost=0.56..37802.01 rows=19647 width=352)
>>> WARN 2016-04-27 14:23:31,562 (Agents thread) - Plan:   -> Index Scan using i1459943614200 on jobqueue (cost=0.56..37802.01 rows=19647 width=352)
>>> WARN 2016-04-27 14:23:31,562 (Agents thread) - Plan:        Index Cond: (docpriority < 1000000001::double precision)
>>> WARN 2016-04-27 14:23:31,562 (Agents thread) -
>>> WARN 2016-04-27 14:23:31,578 (Agents thread) - Stats: n_distinct=4.0 most_common_vals={Z,C,G,P} most_common_freqs={0.64193332,0.21796666,0.13086666,0.0092333332}
>>>
>>>
>>>
>>> 2016-04-27 14:32 GMT+02:00 jetnet <[email protected]>:
>>> > Thank you for the fast reply!
>>> >
>>> > I provided the query to show how many docs have been processed by the instances so far.
>>> > I do not use the UI to check the job statuses - it takes 10-15 minutes to display the page :( (ui.maxstatuscount=100000).
>>> > A full vacuum and re-index of the database are running as a scheduled job on a daily basis.
>>> > So I just tried to analyze the "problem" table - jobqueue - but I don't think it helped a lot: the same "status" query took 250 seconds now, improved, but still very long.
>>> >
>>> > The long-running queries reported in the log are all about the jobqueue table, e.g.:
>>> >
>>> > WARN 2016-04-27 13:49:48,596 (Stuffer thread) - Found a long-running query (75328 ms): [SELECT
>>> > t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount FROM jobqueue t0
>>> > WHERE t0.docpriority<? AND (t0.status=? OR t0.status=?) AND t0.checkaction=? AND t0.checktime<=?
>>> > AND EXISTS(SELECT 'x' FROM jobs t1 WHERE (t1.status=? OR t1.status=?) AND t1.id=t0.jobid AND t1.priority=?)
>>> > AND NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash
>>> > AND (t2.status=? OR t2.status=? OR t2.status=? OR t2.status=? OR t2.status=? OR t2.status=?)
>>> > AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM prereqevents t3,events t4 WHERE t0.id=t3.owner AND t3.eventname=t4.name) ORDER BY t0.docpriority ASC LIMIT 400]
>>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 0: '1.000000001E9'
>>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 1: 'P'
>>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 2: 'G'
>>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 3: 'R'
>>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 4: '1461757713112'
>>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 5: 'A'
>>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 6: 'a'
>>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 7: '5'
>>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 8: 'A'
>>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 9: 'F'
>>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 10: 'a'
>>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 11: 'f'
>>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 12: 'D'
>>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 13: 'd'
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan: Limit (cost=4016.80..4016.98 rows=72 width=235) (actual time=738.775..738.842 rows=400 loops=1)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:   -> Sort (cost=4016.80..4016.98 rows=72 width=235) (actual time=738.774..738.792 rows=400 loops=1)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:        Sort Key: t0.docpriority
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:        Sort Method: top-N heapsort Memory: 139kB
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:        -> Nested Loop Anti Join (cost=2252.06..4014.58 rows=72 width=235) (actual time=46.841..723.735 rows=20215 loops=1)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:             -> Hash Anti Join (cost=2251.51..3387.34 rows=72 width=235) (actual time=46.770..108.460 rows=20215 loops=1)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                  Hash Cond: (t0.id = t3.owner)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                  -> Nested Loop (cost=2223.73..3358.66 rows=72 width=235) (actual time=46.747..102.890 rows=20215 loops=1)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                       Join Filter: (t0.jobid = t1.id)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                       Rows Removed by Join Filter: 643
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                       -> Seq Scan on jobs t1 (cost=0.00..2.14 rows=1 width=8) (actual time=0.015..0.024 rows=1 loops=1)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                            Filter: ((priority = 5::bigint) AND ((status = 'A'::bpchar) OR (status = 'a'::bpchar)))
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                            Rows Removed by Filter: 3
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                       -> Bitmap Heap Scan on jobqueue t0 (cost=2223.73..3352.95 rows=286 width=235) (actual time=46.705..93.152 rows=20858 loops=1)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                            Recheck Cond: (((docpriority < 1000000001::double precision) AND (status = 'P'::bpchar) AND (checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint)) OR ((docpriority < 1000000001::double precision) AND (status = 'G'::bpchar) AND (checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint)))
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                            -> BitmapOr (cost=2223.73..2223.73 rows=288 width=0) (actual time=45.349..45.349 rows=0 loops=1)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                                 -> Bitmap Index Scan on i1459943614200 (cost=0.00..1111.80 rows=20 width=0) (actual time=29.325..29.325 rows=333 loops=1)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                                      Index Cond: ((docpriority < 1000000001::double precision) AND (status = 'P'::bpchar) AND (checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint))
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                                 -> Bitmap Index Scan on i1459943614200 (cost=0.00..1111.80 rows=268 width=0) (actual time=16.020..16.020 rows=20555 loops=1)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                                      Index Cond: ((docpriority < 1000000001::double precision) AND (status = 'G'::bpchar) AND (checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint))
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                  -> Hash (cost=26.15..26.15 rows=130 width=8) (actual time=0.004..0.004 rows=0 loops=1)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                       Buckets: 1024 Batches: 1 Memory Usage: 0kB
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                       -> Hash Join (cost=12.93..26.15 rows=130 width=8) (actual time=0.003..0.003 rows=0 loops=1)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                            Hash Cond: ((t3.eventname)::text = (t4.name)::text)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                            -> Seq Scan on prereqevents t3 (cost=0.00..11.40 rows=140 width=524) (actual time=0.001..0.001 rows=0 loops=1)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                            -> Hash (cost=11.30..11.30 rows=130 width=516) (never executed)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                                 -> Seq Scan on events t4 (cost=0.00..11.30 rows=130 width=516) (never executed)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:             -> Index Scan using i1459943614199 on jobqueue t2 (cost=0.56..8.59 rows=1 width=49) (actual time=0.030..0.030 rows=0 loops=20215)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                  Index Cond: ((dochash)::text = (t0.dochash)::text)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                  Filter: ((jobid <> t0.jobid) AND ((status = 'A'::bpchar) OR (status = 'F'::bpchar) OR (status = 'a'::bpchar) OR (status = 'f'::bpchar) OR (status = 'D'::bpchar) OR (status = 'd'::bpchar)))
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                  Rows Removed by Filter: 1
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan: Total runtime: 747.031 ms
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) -
>>> > WARN 2016-04-27 13:49:50,206 (Stuffer thread) - Stats: n_distinct=5.0 most_common_vals={Z,C,G,P} most_common_freqs={0.64013332,0.21616666,0.13356666,0.0101}
>>> >
>>> >
>>> > 2016-04-27 13:43 GMT+02:00 Karl Wright <[email protected]>:
>>> >> Hi Konstantin,
>>> >>
>>> >> The query you are looking at is performed by the UI only, and there is a parameter you can set which applies a limit to the number of documents so that the count is reported as "<limit>+" in the UI.
>>> >> This is the parameter:
>>> >>
>>> >> org.apache.manifoldcf.ui.maxstatuscount
>>> >>
>>> >> As for why the database gets slow for crawling: unless you are seeing reports in the log of long-running queries, there's a good chance you need to vacuum your database instance. I generally recommend that a vacuum full be done periodically for database instances. Autovacuuming has gotten a lot better in postgres than it used to be, but at least in the past the autovacuuming process would get far behind ManifoldCF and so the database would get quite bloated anyway. So I'd give that a try.
>>> >>
>>> >> If you are seeing logging output mentioning slow queries, you may need to tune how often MCF analyzes certain tables. There are parameters that control that as well. In general, if there is a slow query with a bad plan, and analyzing the tables involved makes it come up with a much better plan, analysis is not happening often enough. But first, before you get to that point, have a look at the log and see whether this is likely to be the problem. (Usually it is the stuffer query that gets slow when there's an issue with table analysis, FWIW.) Please feel free to post the plans of the queries being reported here.
>>> >>
>>> >> Thanks,
>>> >> Karl
>>> >>
>>> >>
>>> >> On Wed, Apr 27, 2016 at 7:33 AM, jetnet <[email protected]> wrote:
>>> >>>
>>> >>> Hi Karl,
>>> >>>
>>> >>> I set up two MCF instances (quick setup) on the same machine, using the same Postgres 9.3 instance (with a different database name, "org.apache.manifoldcf.database.name", for each, of course).
>>> >>> After a couple of days I've got a performance issue: one MCF instance has become very slow - it processes only a few docs per hour. I guess the bottleneck is the database:
>>> >>>
>>> >>> "normal" instance:
>>> >>> SELECT status, count(*) AS count FROM jobqueue GROUP BY status  -- 738,311 rows in the table, took 1.2 sec
>>> >>> "G";50674
>>> >>> "F";68
>>> >>> "P";149179
>>> >>> "C";402367
>>> >>> "A";33
>>> >>> "Z";136676
>>> >>>
>>> >>> "slow" instance (currently with a single active job):
>>> >>> SELECT status, count(*) AS count FROM jobqueue GROUP BY status  -- 2,745,329 rows in the table, took 350 sec
>>> >>> "G";337922   --STATUS_PENDINGPURGATORY
>>> >>> "F";449      --STATUS_ACTIVEPURGATORY
>>> >>> "P";25909    --STATUS_PENDING
>>> >>> "C";562772   --STATUS_COMPLETE
>>> >>> "A";9        --STATUS_ACTIVE
>>> >>> "Z";1644927  --STATUS_PURGATORY
>>> >>>
>>> >>> Since "count(*)" is terribly slow in Postgres, I used the following SQL to count jobqueue's rows:
>>> >>> SELECT reltuples::bigint AS approximate_row_count FROM pg_class WHERE relname = 'jobqueue';
>>> >>>
>>> >>> Both MCF instances have the same number of working threads, database handles, etc.
>>> >>> Is the database "full"? What could you recommend to improve the performance?
>>> >>>
>>> >>> Thank you!
>>> >>> Konstantin
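P.S. For anyone hitting the same issue, these are the maintenance statements the thread refers to, spelled out. This is a sketch only: the daily scheduled job runs the full vacuum and reindex database-wide (shown here per table), the ANALYZE is what I ran by hand, and I am assuming the "Stats:" lines in the MCF log correspond to the jobqueue.status column, given the {Z,C,G,P} values.

    -- Refresh planner statistics for the table the stuffer query reads.
    ANALYZE jobqueue;

    -- Heavier, lock-taking maintenance; currently run by the daily scheduled job.
    VACUUM FULL jobqueue;
    REINDEX TABLE jobqueue;

    -- Inspect the statistics the planner currently has for the status column;
    -- these should line up with the n_distinct / most_common_vals values reported in the log.
    SELECT attname, n_distinct, most_common_vals, most_common_freqs
    FROM pg_stats
    WHERE schemaname = 'public' AND tablename = 'jobqueue' AND attname = 'status';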
