Hi Konstantin,

Can you list the indexes defined on the jobqueue table?
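
If it helps, a query along these lines against pg_indexes should list them
(assuming the table lives in your default search path):

  -- list every index defined on jobqueue, with its definition
  SELECT indexname, indexdef
    FROM pg_indexes
   WHERE tablename = 'jobqueue'
   ORDER BY indexname;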

The query we do for stuffing is precisely aligned with one of the indexes
we create on the jobqueue table.  It should be doing an ordered index read
with complex filtering conditions; instead it is doing this:

WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
           ->  Bitmap Heap Scan on jobqueue t0  (cost=2223.73..3352.95
rows=286 width=235) (actual time=46.705..93.152 rows=20858 loops=1)

... with a hash anti-join and a top-N heapsort.

Postgres is being way too smart for its own good here.  Probably that's at
least partly because its row estimates for the chosen plan are far below
reality (286 estimated vs. 20,858 actual rows on that bitmap heap scan).
There may be a way to configure PostgreSQL not to make that mistake, but I
don't know offhand how to do it.
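
If you want to experiment, a couple of standard knobs that sometimes help in
cases like this - just possibilities to try, not something I've verified
against this workload - are raising the statistics target on the columns the
stuffer query filters on and re-analyzing, or turning off bitmap scans for a
single session to see what plan it falls back to:

  -- gather finer-grained statistics for the filtered columns, then refresh them
  ALTER TABLE jobqueue ALTER COLUMN status SET STATISTICS 1000;
  ALTER TABLE jobqueue ALTER COLUMN docpriority SET STATISTICS 1000;
  ANALYZE jobqueue;

  -- session-only experiment: see which plan the optimizer picks without bitmap scans
  SET enable_bitmapscan = off;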

In order for us to be able to control the plan via hints, which would make
this problem go away once and for all, you need the version of Postgres
that you pay for.  Sigh.

Anyhow, let's verify that the right indexes are in place, at least...

Karl


On Wed, Apr 27, 2016 at 8:38 AM, jetnet <[email protected]> wrote:

> Yet another long-running query:
>
>  WARN 2016-04-27 14:23:31,562 (Agents thread) - Found a long-running
> query (215465 ms): [UPDATE jobqueue SET
> docpriority=?,needpriorityprocessid=NULL,needpriority=? WHERE
> docpriority<?]
>  WARN 2016-04-27 14:23:31,562 (Agents thread) -   Parameter 0:
> '1.000000001E9'
>  WARN 2016-04-27 14:23:31,562 (Agents thread) -   Parameter 1: 'T'
>  WARN 2016-04-27 14:23:31,562 (Agents thread) -   Parameter 2:
> '1.000000001E9'
>  WARN 2016-04-27 14:23:31,562 (Agents thread) -  Plan: Update on
> jobqueue  (cost=0.56..37802.01 rows=19647 width=352)
>  WARN 2016-04-27 14:23:31,562 (Agents thread) -  Plan:   ->  Index
> Scan using i1459943614200 on jobqueue  (cost=0.56..37802.01 rows=19647
> width=352)
>  WARN 2016-04-27 14:23:31,562 (Agents thread) -  Plan:         Index
> Cond: (docpriority < 1000000001::double precision)
>  WARN 2016-04-27 14:23:31,562 (Agents thread) -
>  WARN 2016-04-27 14:23:31,578 (Agents thread) -  Stats: n_distinct=4.0
> most_common_vals={Z,C,G,P}
> most_common_freqs={0.64193332,0.21796666,0.13086666,0.0092333332}
>
>
>
> 2016-04-27 14:32 GMT+02:00 jetnet <[email protected]>:
> > Thank you for the fast reply!
> >
> > I provided the query to show how many docs have been processed by the
> > instances so far.
> > I do not use the UI to check the job statuses - it takes 10-15 minutes to
> > display the page :( (ui.maxstatuscount=100000).
> > A full vacuum and re-index of the database are running as a scheduled job
> > on a daily basis.
> > So I just tried to analyze the "problem" table - jobqueue - but I don't
> > think it helped a lot: the same "status" query now took 250 seconds -
> > improved, but still very long.
> >
> > The long-running queries reported in the log are all about the jobqueue
> > table, e.g.:
> >
> >  WARN 2016-04-27 13:49:48,596 (Stuffer thread) - Found a long-running
> > query (75328 ms): [SELECT
> > t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount
> > FROM jobqueue t0  WHERE t0.docpriority<? AND  (t0.status=? OR
> > t0.status=?) AND t0.checkaction=? AND t0.checktime<=? AND
> > EXISTS(SELECT 'x' FROM jobs t1 WHERE  (t1.status=? OR t1.status=?) AND
> > t1.id=t0.jobid AND t1.priority=?) AND NOT EXISTS(SELECT 'x' FROM
> > jobqueue t2 WHERE t2.dochash=t0.dochash AND  (t2.status=? OR
> > t2.status=? OR t2.status=? OR t2.status=? OR t2.status=? OR
> > t2.status=?) AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM
> > prereqevents t3,events t4 WHERE t0.id=t3.owner AND
> > t3.eventname=t4.name) ORDER BY t0.docpriority ASC LIMIT 400]
> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 0: '1.000000001E9'
> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 1: 'P'
> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 2: 'G'
> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 3: 'R'
> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 4: '1461757713112'
> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 5: 'A'
> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 6: 'a'
> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 7: '5'
> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 8: 'A'
> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 9: 'F'
> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 10: 'a'
> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 11: 'f'
> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 12: 'D'
> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 13: 'd'
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan: Limit
> > (cost=4016.80..4016.98 rows=72 width=235) (actual
> > time=738.775..738.842 rows=400 loops=1)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:   ->  Sort
> > (cost=4016.80..4016.98 rows=72 width=235) (actual
> > time=738.774..738.792 rows=400 loops=1)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:         Sort
> > Key: t0.docpriority
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:         Sort
> > Method: top-N heapsort  Memory: 139kB
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:         ->
> > Nested Loop Anti Join  (cost=2252.06..4014.58 rows=72 width=235)
> > (actual time=46.841..723.735 rows=20215 loops=1)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> > ->  Hash Anti Join  (cost=2251.51..3387.34 rows=72 width=235) (actual
> > time=46.770..108.460 rows=20215 loops=1)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >      Hash Cond: (t0.id = t3.owner)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >      ->  Nested Loop  (cost=2223.73..3358.66 rows=72 width=235)
> > (actual time=46.747..102.890 rows=20215 loops=1)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >            Join Filter: (t0.jobid = t1.id)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >            Rows Removed by Join Filter: 643
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >            ->  Seq Scan on jobs t1  (cost=0.00..2.14 rows=1 width=8)
> > (actual time=0.015..0.024 rows=1 loops=1)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >                  Filter: ((priority = 5::bigint) AND ((status =
> > 'A'::bpchar) OR (status = 'a'::bpchar)))
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >                  Rows Removed by Filter: 3
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >            ->  Bitmap Heap Scan on jobqueue t0  (cost=2223.73..3352.95
> > rows=286 width=235) (actual time=46.705..93.152 rows=20858 loops=1)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >                  Recheck Cond: (((docpriority < 1000000001::double
> > precision) AND (status = 'P'::bpchar) AND (checkaction = 'R'::bpchar)
> > AND (checktime <= 1461757713112::bigint)) OR ((docpriority <
> > 1000000001::double precision) AND (status = 'G'::bpchar) AND
> > (checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint)))
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >                  ->  BitmapOr  (cost=2223.73..2223.73 rows=288
> > width=0) (actual time=45.349..45.349 rows=0 loops=1)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >                        ->  Bitmap Index Scan on i1459943614200
> > (cost=0.00..1111.80 rows=20 width=0) (actual time=29.325..29.325
> > rows=333 loops=1)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >                              Index Cond: ((docpriority <
> > 1000000001::double precision) AND (status = 'P'::bpchar) AND
> > (checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint))
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >                        ->  Bitmap Index Scan on i1459943614200
> > (cost=0.00..1111.80 rows=268 width=0) (actual time=16.020..16.020
> > rows=20555 loops=1)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >                              Index Cond: ((docpriority <
> > 1000000001::double precision) AND (status = 'G'::bpchar) AND
> > (checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint))
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >      ->  Hash  (cost=26.15..26.15 rows=130 width=8) (actual
> > time=0.004..0.004 rows=0 loops=1)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >            Buckets: 1024  Batches: 1  Memory Usage: 0kB
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >            ->  Hash Join  (cost=12.93..26.15 rows=130 width=8) (actual
> > time=0.003..0.003 rows=0 loops=1)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >                  Hash Cond: ((t3.eventname)::text = (t4.name)::text)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >                  ->  Seq Scan on prereqevents t3  (cost=0.00..11.40
> > rows=140 width=524) (actual time=0.001..0.001 rows=0 loops=1)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >                  ->  Hash  (cost=11.30..11.30 rows=130 width=516)
> > (never executed)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >                        ->  Seq Scan on events t4  (cost=0.00..11.30
> > rows=130 width=516) (never executed)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> > ->  Index Scan using i1459943614199 on jobqueue t2  (cost=0.56..8.59
> > rows=1 width=49) (actual time=0.030..0.030 rows=0 loops=20215)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >      Index Cond: ((dochash)::text = (t0.dochash)::text)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >      Filter: ((jobid <> t0.jobid) AND ((status = 'A'::bpchar) OR
> > (status = 'F'::bpchar) OR (status = 'a'::bpchar) OR (status =
> > 'f'::bpchar) OR (status = 'D'::bpchar) OR (status = 'd'::bpchar)))
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >      Rows Removed by Filter: 1
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan: Total runtime:
> > 747.031 ms
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -
> >  WARN 2016-04-27 13:49:50,206 (Stuffer thread) -  Stats:
> > n_distinct=5.0 most_common_vals={Z,C,G,P}
> > most_common_freqs={0.64013332,0.21616666,0.13356666,0.0101}
> >
> >
> > 2016-04-27 13:43 GMT+02:00 Karl Wright <[email protected]>:
> >> Hi Konstantin,
> >>
> >> The query you are looking at is performed by the UI only, and there is a
> >> parameter you can set which applies a limit to the number of documents so
> >> that the count is reported as "<limit>+" in the UI.  This is the parameter:
> >>
> >> org.apache.manifoldcf.ui.maxstatuscount
> >>
> >> As for why the database gets slow for crawling, unless you are seeing
> >> reports in the log of long-running queries, there's a good chance you need
> >> to vacuum your database instance.  I generally recommend that a vacuum full
> >> be done periodically for database instances.  Autovacuuming has gotten a lot
> >> better in Postgres than it used to be, but at least in the past the
> >> autovacuuming process would get far behind ManifoldCF and so the database
> >> would get quite bloated anyway.  So I'd give that a try.
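> >>
> >> For example, from psql against the MCF database while the crawler is
> >> quiet (just a sketch - note that VACUUM FULL takes exclusive locks):
> >>
> >>   -- rewrite every table in the current database and reclaim the bloat
> >>   VACUUM FULL;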
> >>
> >> If you are seeing logging output mentioning slow queries, you may need to
> >> tune how often MCF analyzes certain tables.  There are parameters that
> >> control that as well.  In general, if there is a slow query with a bad plan,
> >> and analyzing the tables involved makes it come up with a much better plan,
> >> analysis is not happening often enough.  But first, before you get to that
> >> point, have a look at the log and see whether this is likely to be the
> >> problem.  (Usually it is the stuffer query that gets slow when there's an
> >> issue with table analysis, FWIW.)  Please feel free to post the plan of the
> >> queries being reported here.
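> >>
> >> A quick manual check in the meantime is to analyze the tables by hand and
> >> see whether the reported plan improves, e.g.:
> >>
> >>   -- refresh planner statistics for the tables used by the stuffer query
> >>   ANALYZE jobqueue;
> >>   ANALYZE jobs;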
> >>
> >> Thanks,
> >> Karl
> >>
> >>
> >> On Wed, Apr 27, 2016 at 7:33 AM, jetnet <[email protected]> wrote:
> >>>
> >>> Hi Karl,
> >>>
> >>> I set up two MCF instances (quick setup) on the same machine, using
> >>> the same Postgres 9.3 instance (with different
> >>> "org.apache.manifoldcf.database.name" values, of course).
> >>> After a couple of days I've got a performance issue: one MCF instance
> >>> has become very slow - it processes only a few docs per hour. I guess
> >>> the bottleneck is the database:
> >>>
> >>> "normal" instance:
> >>> SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
> >>> 738,311 rows in the table, took 1.2 sec
> >>> "G";50674
> >>> "F";68
> >>> "P";149179
> >>> "C";402367
> >>> "A";33
> >>> "Z";136676
> >>>
> >>> "slow" instance (currently with a single active job):
> >>> SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
> >>> 2,745,329 rows in the table, took 350 sec
> >>> "G";337922  --STATUS_PENDINGPURGATORY
> >>> "F";449     --STATUS_ACTIVEPURGATORY
> >>> "P";25909   --STATUS_PENDING
> >>> "C";562772  --STATUS_COMPLETE
> >>> "A";9       --STATUS_ACTIVE
> >>> "Z";1644927 --STATUS_PURGATORY
> >>>
> >>> Since "count(*)" is terribly slow in Postgres, I used the following
> >>> SQL to count jobqueue's rows:
> >>> SELECT reltuples::bigint AS approximate_row_count FROM pg_class WHERE
> >>> relname = 'jobqueue';
> >>>
> >>> Both MCF instances have the same number of working threads, database
> >>> handles etc.
> >>> Is the database "full"? What could you recommend to improve the
> >>> performance?
> >>>
> >>> Thank you!
> >>> Konstantin
> >>
> >>
>
