Hi Karl, just wanted to let you know: after a "full analyze" (for all tables), the DB performance seems to be OK, and I can see the crawler is moving again! :) I'll add the ANALYZE command to the daily "vacuum" script :)
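
For the record, this is roughly what the daily script will run against each instance database from now on (just a sketch; the database name in the REINDEX line is a placeholder for my setup):

VACUUM FULL;                      -- existing step: reclaim the bloated space
ANALYZE;                          -- new step: refresh the planner statistics for all tables
REINDEX DATABASE mcf_instance_db; -- existing step: rebuild the indexes (placeholder name)

That should keep the jobqueue statistics fresh enough for the stuffer query.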
Thank you very much for your time; hopefully the lack of analyzing was the root cause of that particular problem.

Konstantin

2016-04-27 15:16 GMT+02:00 Karl Wright <[email protected]>:
> Ok, here's the query:
>
> SELECT t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount
> FROM jobqueue t0 WHERE t0.docpriority<? AND (t0.status=? OR t0.status=?)
> AND t0.checkaction=? AND t0.checktime<=? ... ORDER BY t0.docpriority ASC
> LIMIT 400
>
> Here's the index:
>
> "public";"jobqueue";"i1459943614200";"";"CREATE INDEX i1459943614200
> ON jobqueue USING btree (docpriority, status, checkaction, checktime)"
>
> In earlier versions of Postgresql it uses this index for both access and
> ordering (it reports "ordered index scan" in the plan or something like
> that). We rely on the ability to use this index for fast stuffer queries.
> Maybe postgresql 9.3 broke this in some way?
>
> Here's the documentation; maybe you can figure it out. :-P You can play
> around using "EXPLAIN" to figure out what it's trying to do. My guess is,
> as I said, that it just thinks it has a better way, which isn't in fact
> better and is in fact worse...
>
> http://www.postgresql.org/docs/9.2/static/indexes-ordering.html
>
> Karl
>
>
> On Wed, Apr 27, 2016 at 9:00 AM, jetnet <[email protected]> wrote:
>>
>> yes, I use the recommended db settings:
>> shared_buffers = 1024MB # min 128kB
>>
>> The indexes:
>> select * from pg_indexes where tablename = 'jobqueue';
>>
>> "public";"jobqueue";"jobqueue_pkey";"";"CREATE UNIQUE INDEX
>> jobqueue_pkey ON jobqueue USING btree (id)"
>> "public";"jobqueue";"i1459943614205";"";"CREATE INDEX i1459943614205
>> ON jobqueue USING btree (jobid, status)"
>> "public";"jobqueue";"i1459943614204";"";"CREATE INDEX i1459943614204
>> ON jobqueue USING btree (needpriority)"
>> "public";"jobqueue";"i1459943614203";"";"CREATE INDEX i1459943614203
>> ON jobqueue USING btree (isseed, jobid)"
>> "public";"jobqueue";"i1459943614202";"";"CREATE INDEX i1459943614202
>> ON jobqueue USING btree (failtime, jobid)"
>> "public";"jobqueue";"i1459943614201";"";"CREATE INDEX i1459943614201
>> ON jobqueue USING btree (status, checkaction, checktime)"
>> "public";"jobqueue";"i1459943614200";"";"CREATE INDEX i1459943614200
>> ON jobqueue USING btree (docpriority, status, checkaction, checktime)"
>> "public";"jobqueue";"i1459943614199";"";"CREATE UNIQUE INDEX
>> i1459943614199 ON jobqueue USING btree (dochash, jobid)"
>>
>>
>> Thanks!
>> Konstantin
>>
>> 2016-04-27 14:57 GMT+02:00 Karl Wright <[email protected]>:
>> > Also, have you looked at increasing the amount of shared buffers in your
>> > postgresql configuration? See the "how-to-build-and-deploy" page for
>> > that;
>> > it can make a lot of difference for MCF. So even if the plan is not
>> > perfect
>> > the database simply performs better.
>> >
>> > Karl
>> >
>> > On Wed, Apr 27, 2016 at 8:53 AM, Karl Wright <[email protected]> wrote:
>> >>
>> >> Hi Konstantin,
>> >>
>> >> Can you list the indexes defined on the jobqueue table?
>> >>
>> >> The query we do for stuffing is precisely aligned with one of the
>> >> indexes
>> >> we create on the jobqueue table. It should be doing an ordered index
>> >> read
>> >> with complex filtering conditions; instead it is doing this:
>> >>
>> >> WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
>> >> -> Bitmap Heap Scan on jobqueue t0 (cost=2223.73..3352.95
>> >> rows=286 width=235) (actual time=46.705..93.152 rows=20858 loops=1)
>> >>
>> >> ... with a hash join and a hash sort.
>> >>
>> >> Postgres is being way too smart for its own good here. Probably that's
>> >> at
>> >> least partly because its time estimates for its chosen plan are far
>> >> less
>> >> than reality. There may be a way to configure postgresql not to make
>> >> that
>> >> mistake but I don't know offhand how you do it.
>> >>
>> >> In order for us to be able to control the plan via hints, which would
>> >> make
>> >> this problem go away once and for all, you need the version of Postgres
>> >> that
>> >> you pay for. Sigh.
>> >>
>> >> Anyhow, let's verify that the right indexes are in place, at least...
>> >>
>> >> Karl
>> >>
>> >>
>> >> On Wed, Apr 27, 2016 at 8:38 AM, jetnet <[email protected]> wrote:
>> >>>
>> >>> yet another long running query:
>> >>>
>> >>> WARN 2016-04-27 14:23:31,562 (Agents thread) - Found a long-running
>> >>> query (215465 ms): [UPDATE jobqueue SET
>> >>> docpriority=?,needpriorityprocessid=NULL,needpriority=? WHERE
>> >>> docpriority<?]
>> >>> WARN 2016-04-27 14:23:31,562 (Agents thread) - Parameter 0:
>> >>> '1.000000001E9'
>> >>> WARN 2016-04-27 14:23:31,562 (Agents thread) - Parameter 1: 'T'
>> >>> WARN 2016-04-27 14:23:31,562 (Agents thread) - Parameter 2:
>> >>> '1.000000001E9'
>> >>> WARN 2016-04-27 14:23:31,562 (Agents thread) - Plan: Update on
>> >>> jobqueue (cost=0.56..37802.01 rows=19647 width=352)
>> >>> WARN 2016-04-27 14:23:31,562 (Agents thread) - Plan: -> Index
>> >>> Scan using i1459943614200 on jobqueue (cost=0.56..37802.01 rows=19647
>> >>> width=352)
>> >>> WARN 2016-04-27 14:23:31,562 (Agents thread) - Plan: Index
>> >>> Cond: (docpriority < 1000000001::double precision)
>> >>> WARN 2016-04-27 14:23:31,562 (Agents thread) -
>> >>> WARN 2016-04-27 14:23:31,578 (Agents thread) - Stats: n_distinct=4.0
>> >>> most_common_vals={Z,C,G,P}
>> >>> most_common_freqs={0.64193332,0.21796666,0.13086666,0.0092333332}
>> >>>
>> >>>
>> >>> 2016-04-27 14:32 GMT+02:00 jetnet <[email protected]>:
>> >>> > Thank you for the fast reply!
>> >>> >
>> >>> > I provided the query to show, how many docs have been processed by
>> >>> > the
>> >>> > instances so far.
>> >>> > I do not use UI to check the jobs statues - it takes 10-15 minutes
>> >>> > to
>> >>> > display the page :( (ui.maxstatuscount=100.000).
>> >>> > The full vacuum and re-index database are running as a scheduled job
>> >>> > on a daily basis.
>> >>> > So, just tried to analyze the "problem" table - jobqueue, but I
>> >>> > don't
>> >>> > think it's helped a lot, the same "status" query took 250 second
>> >>> > now,
>> >>> > improved, but it's still very long.
>> >>> >
>> >>> > The long-running queries reported in the log are all about jobqueue
>> >>> > table, e.g.:
>> >>> >
>> >>> > WARN 2016-04-27 13:49:48,596 (Stuffer thread) - Found a
>> >>> > long-running
>> >>> > query (75328 ms): [SELECT
>> >>> >
>> >>> > t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount
>> >>> > FROM jobqueue t0 WHERE t0.docpriority<? AND (t0.status=? OR
>> >>> > t0.status=?) AND t0.checkaction=? AND t0.checktime<=? AND
>> >>> > EXISTS(SELECT 'x' FROM jobs t1 WHERE (t1.status=? OR t1.status=?)
>> >>> > AND
>> >>> > t1.id=t0.jobid AND t1.priority=?) AND NOT EXISTS(SELECT 'x' FROM
>> >>> > jobqueue t2 WHERE t2.dochash=t0.dochash AND (t2.status=? OR
>> >>> > t2.status=? OR t2.status=? OR t2.status=? OR t2.status=? OR
>> >>> > t2.status=?) AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM
>> >>> > prereqevents t3,events t4 WHERE t0.id=t3.owner AND
>> >>> > t3.eventname=t4.name) ORDER BY t0.docpriority ASC LIMIT 400]
>> >>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 0:
>> >>> > '1.000000001E9'
>> >>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 1: 'P'
>> >>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 2: 'G'
>> >>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 3: 'R'
>> >>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 4:
>> >>> > '1461757713112'
>> >>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 5: 'A'
>> >>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 6: 'a'
>> >>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 7: '5'
>> >>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 8: 'A'
>> >>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 9: 'F'
>> >>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 10: 'a'
>> >>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 11: 'f'
>> >>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 12: 'D'
>> >>> > WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 13: 'd'
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan: Limit
>> >>> > (cost=4016.80..4016.98 rows=72 width=235) (actual
>> >>> > time=738.775..738.842 rows=400 loops=1)
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan: -> Sort
>> >>> > (cost=4016.80..4016.98 rows=72 width=235) (actual
>> >>> > time=738.774..738.792 rows=400 loops=1)
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan: Sort
>> >>> > Key: t0.docpriority
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan: Sort
>> >>> > Method: top-N heapsort Memory: 139kB
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan: ->
>> >>> > Nested Loop Anti Join (cost=2252.06..4014.58 rows=72 width=235)
>> >>> > (actual time=46.841..723.735 rows=20215 loops=1)
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
>> >>> > -> Hash Anti Join (cost=2251.51..3387.34 rows=72 width=235)
>> >>> > (actual
>> >>> > time=46.770..108.460 rows=20215 loops=1)
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
>> >>> > Hash Cond: (t0.id = t3.owner)
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
>> >>> > -> Nested Loop (cost=2223.73..3358.66 rows=72 width=235)
>> >>> > (actual time=46.747..102.890 rows=20215 loops=1)
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
>> >>> > Join Filter: (t0.jobid = t1.id)
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
>> >>> > Rows Removed by Join Filter: 643
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
>> >>> > -> Seq Scan on jobs t1 (cost=0.00..2.14 rows=1 width=8)
>> >>> > (actual time=0.015..0.024 rows=1 loops=1)
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
>> >>> > Filter: ((priority = 5::bigint) AND ((status =
>> >>> > 'A'::bpchar) OR (status = 'a'::bpchar)))
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
>> >>> > Rows Removed by Filter: 3
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
>> >>> > -> Bitmap Heap Scan on jobqueue t0
>> >>> > (cost=2223.73..3352.95
>> >>> > rows=286 width=235) (actual time=46.705..93.152 rows=20858 loops=1)
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
>> >>> > Recheck Cond: (((docpriority < 1000000001::double
>> >>> > precision) AND (status = 'P'::bpchar) AND (checkaction =
>> >>> > 'R'::bpchar)
>> >>> > AND (checktime <= 1461757713112::bigint)) OR ((docpriority <
>> >>> > 1000000001::double precision) AND (status = 'G'::bpchar) AND
>> >>> > (checkaction = 'R'::bpchar) AND (checktime <=
>> >>> > 1461757713112::bigint)))
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
>> >>> > -> BitmapOr (cost=2223.73..2223.73 rows=288
>> >>> > width=0) (actual time=45.349..45.349 rows=0 loops=1)
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
>> >>> > -> Bitmap Index Scan on i1459943614200
>> >>> > (cost=0.00..1111.80 rows=20 width=0) (actual time=29.325..29.325
>> >>> > rows=333 loops=1)
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
>> >>> > Index Cond: ((docpriority <
>> >>> > 1000000001::double precision) AND (status = 'P'::bpchar) AND
>> >>> > (checkaction = 'R'::bpchar) AND (checktime <=
>> >>> > 1461757713112::bigint))
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
>> >>> > -> Bitmap Index Scan on i1459943614200
>> >>> > (cost=0.00..1111.80 rows=268 width=0) (actual time=16.020..16.020
>> >>> > rows=20555 loops=1)
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
>> >>> > Index Cond: ((docpriority <
>> >>> > 1000000001::double precision) AND (status = 'G'::bpchar) AND
>> >>> > (checkaction = 'R'::bpchar) AND (checktime <=
>> >>> > 1461757713112::bigint))
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
>> >>> > -> Hash (cost=26.15..26.15 rows=130 width=8) (actual
>> >>> > time=0.004..0.004 rows=0 loops=1)
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
>> >>> > Buckets: 1024 Batches: 1 Memory Usage: 0kB
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
>> >>> > -> Hash Join (cost=12.93..26.15 rows=130 width=8)
>> >>> > (actual
>> >>> > time=0.003..0.003 rows=0 loops=1)
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
>> >>> > Hash Cond: ((t3.eventname)::text = (t4.name)::text)
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
>> >>> > -> Seq Scan on prereqevents t3 (cost=0.00..11.40
>> >>> > rows=140 width=524) (actual time=0.001..0.001 rows=0 loops=1)
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
>> >>> > -> Hash (cost=11.30..11.30 rows=130 width=516)
>> >>> > (never executed)
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
>> >>> > -> Seq Scan on events t4 (cost=0.00..11.30
>> >>> > rows=130 width=516) (never executed)
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
>> >>> > -> Index Scan using i1459943614199 on jobqueue t2 (cost=0.56..8.59
>> >>> > rows=1 width=49) (actual time=0.030..0.030 rows=0 loops=20215)
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
>> >>> > Index Cond: ((dochash)::text = (t0.dochash)::text)
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
>> >>> > Filter: ((jobid <> t0.jobid) AND ((status = 'A'::bpchar) OR
>> >>> > (status = 'F'::bpchar) OR (status = 'a'::bpchar) OR (status =
>> >>> > 'f'::bpchar) OR (status = 'D'::bpchar) OR (status = 'd'::bpchar)))
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
>> >>> > Rows Removed by Filter: 1
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan: Total
>> >>> > runtime:
>> >>> > 747.031 ms
>> >>> > WARN 2016-04-27 13:49:49,440 (Stuffer thread) -
>> >>> > WARN 2016-04-27 13:49:50,206 (Stuffer thread) - Stats:
>> >>> > n_distinct=5.0 most_common_vals={Z,C,G,P}
>> >>> > most_common_freqs={0.64013332,0.21616666,0.13356666,0.0101}
>> >>> >
>> >>> > 2016-04-27 13:43 GMT+02:00 Karl Wright <[email protected]>:
>> >>> >> Hi Konstantin,
>> >>> >>
>> >>> >> The query you are looking at is performed by the UI only, and there
>> >>> >> is
>> >>> >> a
>> >>> >> parameter you can set which applies a limit to the number of
>> >>> >> documents
>> >>> >> so
>> >>> >> that the count is reported as "<limit>+" in the UI. This is the
>> >>> >> parameter:
>> >>> >>
>> >>> >> org.apache.manifoldcf.ui.maxstatuscount
>> >>> >>
>> >>> >> As for why the database gets slow for crawling, unless you are
>> >>> >> seeing
>> >>> >> reports in the log of long-running queries, then it's a good chance
>> >>> >> you need
>> >>> >> to vacuum your database instance. I generally recommend that a
>> >>> >> vacuum
>> >>> >> full
>> >>> >> be done periodically for database instances. Autovacuuming has
>> >>> >> gotten
>> >>> >> a lot
>> >>> >> better in postgres than it used to be but at least in the past the
>> >>> >> autovacuuming process would get far behind ManifoldCF and so the
>> >>> >> database
>> >>> >> would get quite bloated anyway. So I'd give that a try.
>> >>> >>
>> >>> >> If you are seeing logging output mentioning slow queries, you may
>> >>> >> need
>> >>> >> to
>> >>> >> tune how often MCF analyzes certain tables. There are parameters
>> >>> >> that
>> >>> >> control that as well. In general, if there is a slow query with a
>> >>> >> bad
>> >>> >> plan,
>> >>> >> and analyzing the tables involved makes it come up with a much
>> >>> >> better
>> >>> >> plan,
>> >>> >> analysis is not happening often enough. But first, before you get
>> >>> >> to
>> >>> >> that
>> >>> >> point, have a look at the log and see whether this is likely to be
>> >>> >> the
>> >>> >> problem. (Usually it is the stuffer query that gets slow when
>> >>> >> there's
>> >>> >> an
>> >>> >> issue with table analysis, FWIW). Please feel free to post the
>> >>> >> plan
>> >>> >> of the
>> >>> >> queries being reported here.
>> >>> >>
>> >>> >> Thanks,
>> >>> >> Karl
>> >>> >>
>> >>> >>
>> >>> >> On Wed, Apr 27, 2016 at 7:33 AM, jetnet <[email protected]> wrote:
>> >>> >>>
>> >>> >>> Hi Karl,
>> >>> >>>
>> >>> >>> I set up two MCF instances (quick setup) on the same machine,
>> >>> >>> using
>> >>> >>> the same Postgres 9.3 instance (with different databases
>> >>> >>> "org.apache.manifoldcf.database.name" of course).
>> >>> >>> After a couple of days I've got a performance issue: one MCF
>> >>> >>> instance
>> >>> >>> has become very slow - it processes a few docs per hour only. I
>> >>> >>> guess, the bottleneck is the database:
>> >>> >>>
>> >>> >>> "normal" instance:
>> >>> >>> SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
>> >>> >>> 738.311 rows in the table, took 1,2 sec
>> >>> >>> "G";50674
>> >>> >>> "F";68
>> >>> >>> "P";149179
>> >>> >>> "C";402367
>> >>> >>> "A";33
>> >>> >>> "Z";136676
>> >>> >>>
>> >>> >>> "slow" instance (currently with a single active job):
>> >>> >>> SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
>> >>> >>> 2.745.329 rows in the table, took 350 sec
>> >>> >>> "G";337922 --STATUS_PENDINGPURGATORY
>> >>> >>> "F";449 --STATUS_ACTIVEPURGATORY
>> >>> >>> "P";25909 --STATUS_PENDING
>> >>> >>> "C";562772 --STATUS_COMPLETE
>> >>> >>> "A";9 --STATUS_ACTIVE
>> >>> >>> "Z";1644927 --STATUS_PURGATORY
>> >>> >>>
>> >>> >>> Since "count(*)" is terrible slow in Postgres, I used the
>> >>> >>> following
>> >>> >>> sql to count jobqueue's rows:
>> >>> >>> SELECT reltuples::bigint AS approximate_row_count FROM pg_class
>> >>> >>> WHERE
>> >>> >>> relname = 'jobqueue';
>> >>> >>>
>> >>> >>> Both MCF instances have the same number of working threads,
>> >>> >>> database
>> >>> >>> handles etc.
>> >>> >>> Is the database "full"? What could you recommend to improve the
>> >>> >>> performance?
>> >>> >>>
>> >>> >>> Thank you!
>> >>> >>> Konstantin
>> >>> >>
>> >>> >>
>> >>
>> >>
>> >
>
>
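
P.S. Regarding the EXPLAIN suggestion: once the nightly ANALYZE is in place I'll double-check the stuffer plan myself. A minimal sketch of the check I have in mind (this is just the stuffer query stripped down to its indexed conditions, with the literal values taken from the logged parameters above):

EXPLAIN ANALYZE
SELECT t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount
FROM jobqueue t0
WHERE t0.docpriority < 1000000001
  AND (t0.status = 'P' OR t0.status = 'G')
  AND t0.checkaction = 'R'
  AND t0.checktime <= 1461757713112
ORDER BY t0.docpriority ASC
LIMIT 400;

If the plan shows an index scan on i1459943614200 with no separate Sort node, I'll assume the missing statistics were indeed the problem; if the bitmap scan plus sort comes back, I'll follow up here.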
