Yes, I use the recommended DB settings:

shared_buffers = 1024MB    # min 128kB
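For completeness, a quick way to double-check that the running server has actually picked that value up and is not still on the default (a sketch only, using the standard pg_settings view):

    -- Show the value the server is currently running with; it should report 1024MB.
    SHOW shared_buffers;

    -- Show where the effective setting comes from (default, postgresql.conf, etc.).
    SELECT name, setting, unit, source, sourcefile
    FROM pg_settings
    WHERE name = 'shared_buffers';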
The indexes:

select * from pg_indexes where tablename = 'jobqueue';

"public";"jobqueue";"jobqueue_pkey";"";"CREATE UNIQUE INDEX jobqueue_pkey ON jobqueue USING btree (id)"
"public";"jobqueue";"i1459943614205";"";"CREATE INDEX i1459943614205 ON jobqueue USING btree (jobid, status)"
"public";"jobqueue";"i1459943614204";"";"CREATE INDEX i1459943614204 ON jobqueue USING btree (needpriority)"
"public";"jobqueue";"i1459943614203";"";"CREATE INDEX i1459943614203 ON jobqueue USING btree (isseed, jobid)"
"public";"jobqueue";"i1459943614202";"";"CREATE INDEX i1459943614202 ON jobqueue USING btree (failtime, jobid)"
"public";"jobqueue";"i1459943614201";"";"CREATE INDEX i1459943614201 ON jobqueue USING btree (status, checkaction, checktime)"
"public";"jobqueue";"i1459943614200";"";"CREATE INDEX i1459943614200 ON jobqueue USING btree (docpriority, status, checkaction, checktime)"
"public";"jobqueue";"i1459943614199";"";"CREATE UNIQUE INDEX i1459943614199 ON jobqueue USING btree (dochash, jobid)"

Thanks!
Konstantin

2016-04-27 14:57 GMT+02:00 Karl Wright <[email protected]>:
> Also, have you looked at increasing the amount of shared buffers in your postgresql configuration? See the "how-to-build-and-deploy" page for that; it can make a lot of difference for MCF. So even if the plan is not perfect the database simply performs better.
>
> Karl
>
> On Wed, Apr 27, 2016 at 8:53 AM, Karl Wright <[email protected]> wrote:
>>
>> Hi Konstantin,
>>
>> Can you list the indexes defined on the jobqueue table?
>>
>> The query we do for stuffing is precisely aligned with one of the indexes we create on the jobqueue table. It should be doing an ordered index read with complex filtering conditions; instead it is doing this:
>>
>> WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan: -> Bitmap Heap Scan on jobqueue t0 (cost=2223.73..3352.95 rows=286 width=235) (actual time=46.705..93.152 rows=20858 loops=1)
>>
>> ... with a hash join and a hash sort.
>>
>> Postgres is being way too smart for its own good here. Probably that's at least partly because its time estimates for its chosen plan are far less than reality. There may be a way to configure postgresql not to make that mistake but I don't know offhand how you do it.
>>
>> In order for us to be able to control the plan via hints, which would make this problem go away once and for all, you need the version of Postgres that you pay for. Sigh.
>>
>> Anyhow, let's verify that the right indexes are in place, at least...
>>
>> Karl
>>
>>
>> On Wed, Apr 27, 2016 at 8:38 AM, jetnet <[email protected]> wrote:
>>>
>>> yet another long-running query:
>>>
>>> WARN 2016-04-27 14:23:31,562 (Agents thread) - Found a long-running query (215465 ms): [UPDATE jobqueue SET docpriority=?,needpriorityprocessid=NULL,needpriority=? WHERE docpriority<?]
>>> WARN 2016-04-27 14:23:31,562 (Agents thread) - Parameter 0: '1.000000001E9'
>>> WARN 2016-04-27 14:23:31,562 (Agents thread) - Parameter 1: 'T'
>>> WARN 2016-04-27 14:23:31,562 (Agents thread) - Parameter 2: '1.000000001E9'
>>> WARN 2016-04-27 14:23:31,562 (Agents thread) - Plan: Update on jobqueue (cost=0.56..37802.01 rows=19647 width=352)
>>> WARN 2016-04-27 14:23:31,562 (Agents thread) - Plan:   -> Index Scan using i1459943614200 on jobqueue (cost=0.56..37802.01 rows=19647 width=352)
>>> WARN 2016-04-27 14:23:31,562 (Agents thread) - Plan:        Index Cond: (docpriority < 1000000001::double precision)
>>> WARN 2016-04-27 14:23:31,562 (Agents thread) -
>>> WARN 2016-04-27 14:23:31,578 (Agents thread) - Stats: n_distinct=4.0 most_common_vals={Z,C,G,P} most_common_freqs={0.64193332,0.21796666,0.13086666,0.0092333332}
>>>
>>>
>>>
>>> 2016-04-27 14:32 GMT+02:00 jetnet <[email protected]>:
>>> > Thank you for the fast reply!
>>> >
>>> > I provided the query to show how many docs have been processed by the instances so far.
>>> > I do not use the UI to check the job statuses - it takes 10-15 minutes to display the page :( (ui.maxstatuscount=100000).
>>> > A full vacuum and re-index of the database are running as a scheduled job on a daily basis.
>>> > So I just tried to analyze the "problem" table - jobqueue - but I don't think it helped a lot: the same "status" query took 250 seconds now, improved, but still very long.
>>> >
>>> > The long-running queries reported in the log are all about the jobqueue table, e.g.:
>>> >
>>> > WARN 2016-04-27 13:49:48,596 (Stuffer thread) - Found a long-running query (75328 ms): [SELECT
>>> > t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount FROM jobqueue t0
>>> > WHERE t0.docpriority<? AND (t0.status=? OR t0.status=?) AND t0.checkaction=? AND t0.checktime<=?
>>> > AND EXISTS(SELECT 'x' FROM jobs t1 WHERE (t1.status=? OR t1.status=?) AND t1.id=t0.jobid AND t1.priority=?)
>>> > AND NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash
>>> > AND (t2.status=? OR t2.status=? OR t2.status=? OR t2.status=? OR t2.status=? OR t2.status=?)
>>> > AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM prereqevents t3,events t4 WHERE t0.id=t3.owner AND t3.eventname=t4.name) ORDER BY t0.docpriority ASC LIMIT 400]
>>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 0: '1.000000001E9'
>>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 1: 'P'
>>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 2: 'G'
>>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 3: 'R'
>>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 4: '1461757713112'
>>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 5: 'A'
>>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 6: 'a'
>>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 7: '5'
>>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 8: 'A'
>>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 9: 'F'
>>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 10: 'a'
>>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 11: 'f'
>>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 12: 'D'
>>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 13: 'd'
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan: Limit (cost=4016.80..4016.98 rows=72 width=235) (actual time=738.775..738.842 rows=400 loops=1)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:   -> Sort (cost=4016.80..4016.98 rows=72 width=235) (actual time=738.774..738.792 rows=400 loops=1)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:        Sort Key: t0.docpriority
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:        Sort Method: top-N heapsort Memory: 139kB
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:        -> Nested Loop Anti Join (cost=2252.06..4014.58 rows=72 width=235) (actual time=46.841..723.735 rows=20215 loops=1)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:             -> Hash Anti Join (cost=2251.51..3387.34 rows=72 width=235) (actual time=46.770..108.460 rows=20215 loops=1)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                  Hash Cond: (t0.id = t3.owner)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                  -> Nested Loop (cost=2223.73..3358.66 rows=72 width=235) (actual time=46.747..102.890 rows=20215 loops=1)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                       Join Filter: (t0.jobid = t1.id)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                       Rows Removed by Join Filter: 643
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                       -> Seq Scan on jobs t1 (cost=0.00..2.14 rows=1 width=8) (actual time=0.015..0.024 rows=1 loops=1)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                            Filter: ((priority = 5::bigint) AND ((status = 'A'::bpchar) OR (status = 'a'::bpchar)))
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                            Rows Removed by Filter: 3
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                       -> Bitmap Heap Scan on jobqueue t0 (cost=2223.73..3352.95 rows=286 width=235) (actual time=46.705..93.152 rows=20858 loops=1)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                            Recheck Cond: (((docpriority < 1000000001::double precision) AND (status = 'P'::bpchar) AND (checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint)) OR ((docpriority < 1000000001::double precision) AND (status = 'G'::bpchar) AND (checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint)))
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                            -> BitmapOr (cost=2223.73..2223.73 rows=288 width=0) (actual time=45.349..45.349 rows=0 loops=1)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                                 -> Bitmap Index Scan on i1459943614200 (cost=0.00..1111.80 rows=20 width=0) (actual time=29.325..29.325 rows=333 loops=1)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                                      Index Cond: ((docpriority < 1000000001::double precision) AND (status = 'P'::bpchar) AND (checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint))
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                                 -> Bitmap Index Scan on i1459943614200 (cost=0.00..1111.80 rows=268 width=0) (actual time=16.020..16.020 rows=20555 loops=1)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                                      Index Cond: ((docpriority < 1000000001::double precision) AND (status = 'G'::bpchar) AND (checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint))
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                  -> Hash (cost=26.15..26.15 rows=130 width=8) (actual time=0.004..0.004 rows=0 loops=1)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                       Buckets: 1024 Batches: 1 Memory Usage: 0kB
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                       -> Hash Join (cost=12.93..26.15 rows=130 width=8) (actual time=0.003..0.003 rows=0 loops=1)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                            Hash Cond: ((t3.eventname)::text = (t4.name)::text)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                            -> Seq Scan on prereqevents t3 (cost=0.00..11.40 rows=140 width=524) (actual time=0.001..0.001 rows=0 loops=1)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                            -> Hash (cost=11.30..11.30 rows=130 width=516) (never executed)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                                 -> Seq Scan on events t4 (cost=0.00..11.30 rows=130 width=516) (never executed)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:             -> Index Scan using i1459943614199 on jobqueue t2 (cost=0.56..8.59 rows=1 width=49) (actual time=0.030..0.030 rows=0 loops=20215)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                  Index Cond: ((dochash)::text = (t0.dochash)::text)
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                  Filter: ((jobid <> t0.jobid) AND ((status = 'A'::bpchar) OR (status = 'F'::bpchar) OR (status = 'a'::bpchar) OR (status = 'f'::bpchar) OR (status = 'D'::bpchar) OR (status = 'd'::bpchar)))
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:                  Rows Removed by Filter: 1
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan: Total runtime: 747.031 ms
>>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) -
>>> > WARN 2016-04-27 13:49:50,206 (Stuffer thread) - Stats: n_distinct=5.0 most_common_vals={Z,C,G,P} most_common_freqs={0.64013332,0.21616666,0.13356666,0.0101}
>>> >
>>> >
>>> > 2016-04-27 13:43 GMT+02:00 Karl Wright <[email protected]>:
>>> >> Hi Konstantin,
>>> >>
>>> >> The query you are looking at is performed by the UI only, and there is a parameter you can set which applies a limit to the number of documents so that the count is reported as "<limit>+" in the UI.
>>> >> This is the parameter:
>>> >>
>>> >> org.apache.manifoldcf.ui.maxstatuscount
>>> >>
>>> >> As for why the database gets slow for crawling: unless you are seeing reports in the log of long-running queries, there's a good chance you need to vacuum your database instance. I generally recommend that a vacuum full be done periodically for database instances. Autovacuuming has gotten a lot better in postgres than it used to be, but at least in the past the autovacuuming process would get far behind ManifoldCF and so the database would get quite bloated anyway. So I'd give that a try.
>>> >>
>>> >> If you are seeing logging output mentioning slow queries, you may need to tune how often MCF analyzes certain tables. There are parameters that control that as well. In general, if there is a slow query with a bad plan, and analyzing the tables involved makes it come up with a much better plan, analysis is not happening often enough. But first, before you get to that point, have a look at the log and see whether this is likely to be the problem. (Usually it is the stuffer query that gets slow when there's an issue with table analysis, FWIW.) Please feel free to post the plans of the queries being reported here.
>>> >>
>>> >> Thanks,
>>> >> Karl
>>> >>
>>> >>
>>> >> On Wed, Apr 27, 2016 at 7:33 AM, jetnet <[email protected]> wrote:
>>> >>>
>>> >>> Hi Karl,
>>> >>>
>>> >>> I set up two MCF instances (quick setup) on the same machine, using the same Postgres 9.3 instance (with a different database name, "org.apache.manifoldcf.database.name", for each, of course).
>>> >>> After a couple of days I've got a performance issue: one MCF instance has become very slow - it processes only a few docs per hour. I guess the bottleneck is the database:
>>> >>>
>>> >>> "normal" instance:
>>> >>> SELECT status, count(*) AS count FROM jobqueue GROUP BY status  -- 738,311 rows in the table, took 1.2 sec
>>> >>> "G";50674
>>> >>> "F";68
>>> >>> "P";149179
>>> >>> "C";402367
>>> >>> "A";33
>>> >>> "Z";136676
>>> >>>
>>> >>> "slow" instance (currently with a single active job):
>>> >>> SELECT status, count(*) AS count FROM jobqueue GROUP BY status  -- 2,745,329 rows in the table, took 350 sec
>>> >>> "G";337922   --STATUS_PENDINGPURGATORY
>>> >>> "F";449      --STATUS_ACTIVEPURGATORY
>>> >>> "P";25909    --STATUS_PENDING
>>> >>> "C";562772   --STATUS_COMPLETE
>>> >>> "A";9        --STATUS_ACTIVE
>>> >>> "Z";1644927  --STATUS_PURGATORY
>>> >>>
>>> >>> Since "count(*)" is terribly slow in Postgres, I used the following SQL to count jobqueue's rows:
>>> >>> SELECT reltuples::bigint AS approximate_row_count FROM pg_class WHERE relname = 'jobqueue';
>>> >>>
>>> >>> Both MCF instances have the same number of working threads, database handles, etc.
>>> >>> Is the database "full"? What could you recommend to improve the performance?
>>> >>>
>>> >>> Thank you!
>>> >>> Konstantin
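P.S. For anyone hitting the same issue, these are the maintenance statements the thread refers to, spelled out. This is a sketch only: the daily scheduled job runs the full vacuum and reindex database-wide (shown here per table), the ANALYZE is what I ran by hand, and I am assuming the "Stats:" lines in the MCF log correspond to the jobqueue.status column, given the {Z,C,G,P} values.

    -- Refresh planner statistics for the table the stuffer query reads.
    ANALYZE jobqueue;

    -- Heavier, lock-taking maintenance; currently run by the daily scheduled job.
    VACUUM FULL jobqueue;
    REINDEX TABLE jobqueue;

    -- Inspect the statistics the planner currently has for the status column;
    -- these should line up with the n_distinct / most_common_vals values reported in the log.
    SELECT attname, n_distinct, most_common_vals, most_common_freqs
    FROM pg_stats
    WHERE schemaname = 'public' AND tablename = 'jobqueue' AND attname = 'status';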
