Also, have you looked at increasing the amount of shared buffers in your
postgresql configuration?  See the "how-to-build-and-deploy" page for that;
it can make a lot of difference for MCF.  Even if the plan is not perfect,
the database simply performs better.
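
For reference, on 9.3 that's a postgresql.conf edit plus a restart; the
value below is purely illustrative and depends on how much RAM you have:

  # postgresql.conf (restart postgres afterwards)
  shared_buffers = 2GB        # example value only; default is often 128MB

  -- verify the running value from psql:
  SHOW shared_buffers;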

Karl

On Wed, Apr 27, 2016 at 8:53 AM, Karl Wright <[email protected]> wrote:

> Hi Konstantin,
>
> Can you list the indexes defined on the jobqueue table?
>
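> One easy way to pull that list is a standard catalog query (nothing
> MCF-specific here):
>
>   SELECT indexname, indexdef
>   FROM pg_indexes
>   WHERE tablename = 'jobqueue';
>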
> The query we do for stuffing is precisely aligned with one of the indexes
> we create on the jobqueue table.  It should be doing an ordered index read
> with complex filtering conditions; instead it is doing this:
>
> WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>            ->  Bitmap Heap Scan on jobqueue t0  (cost=2223.73..3352.95
> rows=286 width=235) (actual time=46.705..93.152 rows=20858 loops=1)
>
> ... with a hash join and a hash sort.
>
> Postgres is being way too smart for its own good here.  Probably that's at
> least partly because its time estimates for its chosen plan are far less
> than reality.  There may be a way to configure postgresql not to make that
> mistake but I don't know offhand how you do it.
>
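> As a session-local diagnostic only (not something to leave on), you can
> forbid individual plan node types and see what the planner falls back to,
> e.g.:
>
>   SET enable_hashjoin = off;        -- affects this session only
>   EXPLAIN ANALYZE <the stuffer query>;
>   RESET enable_hashjoin;
>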
> In order for us to be able to control the plan via hints, which would make
> this problem go away once and for all, you need the version of Postgres
> that you pay for.  Sigh.
>
> Anyhow, let's verify that the right indexes are in place, at least...
>
> Karl
>
>
> On Wed, Apr 27, 2016 at 8:38 AM, jetnet <[email protected]> wrote:
>
>> yet another long running query:
>>
>>  WARN 2016-04-27 14:23:31,562 (Agents thread) - Found a long-running
>> query (215465 ms): [UPDATE jobqueue SET
>> docpriority=?,needpriorityprocessid=NULL,needpriority=? WHERE
>> docpriority<?]
>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -   Parameter 0:
>> '1.000000001E9'
>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -   Parameter 1: 'T'
>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -   Parameter 2:
>> '1.000000001E9'
>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -  Plan: Update on
>> jobqueue  (cost=0.56..37802.01 rows=19647 width=352)
>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -  Plan:   ->  Index
>> Scan using i1459943614200 on jobqueue  (cost=0.56..37802.01 rows=19647
>> width=352)
>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -  Plan:         Index
>> Cond: (docpriority < 1000000001::double precision)
>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -
>>  WARN 2016-04-27 14:23:31,578 (Agents thread) -  Stats: n_distinct=4.0
>> most_common_vals={Z,C,G,P}
>> most_common_freqs={0.64193332,0.21796666,0.13086666,0.0092333332}
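>>
>> (I believe that "Stats" line is just pg_stats for jobqueue.status; the
>> same numbers can be queried directly to watch them change after an
>> ANALYZE:
>>
>>   SELECT n_distinct, most_common_vals, most_common_freqs
>>   FROM pg_stats
>>   WHERE tablename = 'jobqueue' AND attname = 'status';
>> )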
>>
>>
>>
>> 2016-04-27 14:32 GMT+02:00 jetnet <[email protected]>:
>> > Thank you for the fast reply!
>> >
>> > I provided the query to show how many docs have been processed by the
>> > instances so far.
>> > I do not use the UI to check the job statuses - it takes 10-15 minutes
>> > to display the page :( (ui.maxstatuscount=100,000).
>> > A full vacuum and database re-index run as a scheduled job on a daily
>> > basis.
>> > So I just tried to analyze the "problem" table, jobqueue, but I don't
>> > think it helped a lot: the same "status" query took 250 seconds now -
>> > improved, but still very long.
>> >
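>> > (That was a plain "ANALYZE jobqueue;".  One thing I have not tried yet:
>> > raising the statistics target on the skewed status column before
>> > re-analyzing, e.g.:
>> >
>> >   ALTER TABLE jobqueue ALTER COLUMN status SET STATISTICS 1000;
>> >   ANALYZE jobqueue;
>> > )
>> >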
>> > The long-running queries reported in the log are all about the
>> > jobqueue table, e.g.:
>> >
>> >  WARN 2016-04-27 13:49:48,596 (Stuffer thread) - Found a long-running
>> > query (75328 ms): [SELECT
>> > t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount
>> > FROM jobqueue t0  WHERE t0.docpriority<? AND  (t0.status=? OR
>> > t0.status=?) AND t0.checkaction=? AND t0.checktime<=? AND
>> > EXISTS(SELECT 'x' FROM jobs t1 WHERE  (t1.status=? OR t1.status=?) AND
>> > t1.id=t0.jobid AND t1.priority=?) AND NOT EXISTS(SELECT 'x' FROM
>> > jobqueue t2 WHERE t2.dochash=t0.dochash AND  (t2.status=? OR
>> > t2.status=? OR t2.status=? OR t2.status=? OR t2.status=? OR
>> > t2.status=?) AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM
>> > prereqevents t3,events t4 WHERE t0.id=t3.owner AND
>> > t3.eventname=t4.name) ORDER BY t0.docpriority ASC LIMIT 400]
>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 0: '1.000000001E9'
>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 1: 'P'
>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 2: 'G'
>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 3: 'R'
>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 4: '1461757713112'
>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 5: 'A'
>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 6: 'a'
>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 7: '5'
>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 8: 'A'
>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 9: 'F'
>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 10: 'a'
>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 11: 'f'
>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 12: 'D'
>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 13: 'd'
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan: Limit
>> > (cost=4016.80..4016.98 rows=72 width=235) (actual
>> > time=738.775..738.842 rows=400 loops=1)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:   ->  Sort
>> > (cost=4016.80..4016.98 rows=72 width=235) (actual
>> > time=738.774..738.792 rows=400 loops=1)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:         Sort
>> > Key: t0.docpriority
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:         Sort
>> > Method: top-N heapsort  Memory: 139kB
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:         ->
>> > Nested Loop Anti Join  (cost=2252.06..4014.58 rows=72 width=235)
>> > (actual time=46.841..723.735 rows=20215 loops=1)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> > ->  Hash Anti Join  (cost=2251.51..3387.34 rows=72 width=235) (actual
>> > time=46.770..108.460 rows=20215 loops=1)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >      Hash Cond: (t0.id = t3.owner)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >      ->  Nested Loop  (cost=2223.73..3358.66 rows=72 width=235)
>> > (actual time=46.747..102.890 rows=20215 loops=1)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >            Join Filter: (t0.jobid = t1.id)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >            Rows Removed by Join Filter: 643
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >            ->  Seq Scan on jobs t1  (cost=0.00..2.14 rows=1 width=8)
>> > (actual time=0.015..0.024 rows=1 loops=1)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >                  Filter: ((priority = 5::bigint) AND ((status =
>> > 'A'::bpchar) OR (status = 'a'::bpchar)))
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >                  Rows Removed by Filter: 3
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >            ->  Bitmap Heap Scan on jobqueue t0  (cost=2223.73..3352.95
>> > rows=286 width=235) (actual time=46.705..93.152 rows=20858 loops=1)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >                  Recheck Cond: (((docpriority < 1000000001::double
>> > precision) AND (status = 'P'::bpchar) AND (checkaction = 'R'::bpchar)
>> > AND (checktime <= 1461757713112::bigint)) OR ((docpriority <
>> > 1000000001::double precision) AND (status = 'G'::bpchar) AND
>> > (checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint)))
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >                  ->  BitmapOr  (cost=2223.73..2223.73 rows=288
>> > width=0) (actual time=45.349..45.349 rows=0 loops=1)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >                        ->  Bitmap Index Scan on i1459943614200
>> > (cost=0.00..1111.80 rows=20 width=0) (actual time=29.325..29.325
>> > rows=333 loops=1)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >                              Index Cond: ((docpriority <
>> > 1000000001::double precision) AND (status = 'P'::bpchar) AND
>> > (checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint))
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >                        ->  Bitmap Index Scan on i1459943614200
>> > (cost=0.00..1111.80 rows=268 width=0) (actual time=16.020..16.020
>> > rows=20555 loops=1)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >                              Index Cond: ((docpriority <
>> > 1000000001::double precision) AND (status = 'G'::bpchar) AND
>> > (checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint))
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >      ->  Hash  (cost=26.15..26.15 rows=130 width=8) (actual
>> > time=0.004..0.004 rows=0 loops=1)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >            Buckets: 1024  Batches: 1  Memory Usage: 0kB
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >            ->  Hash Join  (cost=12.93..26.15 rows=130 width=8) (actual
>> > time=0.003..0.003 rows=0 loops=1)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >                  Hash Cond: ((t3.eventname)::text = (t4.name)::text)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >                  ->  Seq Scan on prereqevents t3  (cost=0.00..11.40
>> > rows=140 width=524) (actual time=0.001..0.001 rows=0 loops=1)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >                  ->  Hash  (cost=11.30..11.30 rows=130 width=516)
>> > (never executed)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >                        ->  Seq Scan on events t4  (cost=0.00..11.30
>> > rows=130 width=516) (never executed)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> > ->  Index Scan using i1459943614199 on jobqueue t2  (cost=0.56..8.59
>> > rows=1 width=49) (actual time=0.030..0.030 rows=0 loops=20215)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >      Index Cond: ((dochash)::text = (t0.dochash)::text)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >      Filter: ((jobid <> t0.jobid) AND ((status = 'A'::bpchar) OR
>> > (status = 'F'::bpchar) OR (status = 'a'::bpchar) OR (status =
>> > 'f'::bpchar) OR (status = 'D'::bpchar) OR (status = 'd'::bpchar)))
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >      Rows Removed by Filter: 1
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan: Total runtime:
>> > 747.031 ms
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -
>> >  WARN 2016-04-27 13:49:50,206 (Stuffer thread) -  Stats:
>> > n_distinct=5.0 most_common_vals={Z,C,G,P}
>> > most_common_freqs={0.64013332,0.21616666,0.13356666,0.0101}
>> >
>> >
>> > 2016-04-27 13:43 GMT+02:00 Karl Wright <[email protected]>:
>> >> Hi Konstantin,
>> >>
>> >> The query you are looking at is performed by the UI only, and there is
>> >> a parameter you can set which applies a limit to the number of
>> >> documents, so that the count is reported as "<limit>+" in the UI.
>> >> This is the parameter:
>> >>
>> >> org.apache.manifoldcf.ui.maxstatuscount
>> >>
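>> >> In properties.xml that would be something along these lines (the value
>> >> here is just an example):
>> >>
>> >>   <property name="org.apache.manifoldcf.ui.maxstatuscount" value="10000"/>
>> >>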
>> >> As for why the database gets slow for crawling: unless you are seeing
>> >> reports in the log of long-running queries, there's a good chance you
>> >> need to vacuum your database instance.  I generally recommend that a
>> >> vacuum full be done periodically for database instances.  Autovacuuming
>> >> has gotten a lot better in postgres than it used to be, but at least in
>> >> the past the autovacuuming process would get far behind ManifoldCF and
>> >> so the database would get quite bloated anyway.  So I'd give that a
>> >> try.
>> >>
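>> >> (The maintenance pass can be as simple as the following, ideally run
>> >> while the crawler is idle, since VACUUM FULL takes an exclusive lock:
>> >>
>> >>   VACUUM FULL ANALYZE jobqueue;
>> >>   REINDEX TABLE jobqueue;
>> >> )
>> >>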
>> >> If you are seeing logging output mentioning slow queries, you may need
>> >> to tune how often MCF analyzes certain tables.  There are parameters
>> >> that control that as well (sketched below).  In general, if there is a
>> >> slow query with a bad plan, and analyzing the tables involved makes it
>> >> come up with a much better plan, analysis is not happening often
>> >> enough.  But first, before you get to that point, have a look at the
>> >> log and see whether this is likely to be the problem.  (Usually it is
>> >> the stuffer query that gets slow when there's an issue with table
>> >> analysis, FWIW.)  Please feel free to post the plan of the queries
>> >> being reported here.
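>> >>
>> >> If memory serves, those are per-table properties of roughly this shape
>> >> (please check the how-to-build-and-deploy page for the exact names and
>> >> defaults; the value is a change count between analyzes):
>> >>
>> >>   <property name="org.apache.manifoldcf.db.postgres.analyze.jobqueue" value="5000"/>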
>> >>
>> >> Thanks,
>> >> Karl
>> >>
>> >>
>> >> On Wed, Apr 27, 2016 at 7:33 AM, jetnet <[email protected]> wrote:
>> >>>
>> >>> Hi Karl,
>> >>>
>> >>> I set up two MCF instances (quick setup) on the same machine, using
>> >>> the same Postgres 9.3 instance (with a different
>> >>> "org.apache.manifoldcf.database.name" for each, of course).
>> >>> After a couple of days I've got a performance issue: one MCF instance
>> >>> has become very slow - it processes only a few docs per hour. I guess
>> >>> the bottleneck is the database:
>> >>>
>> >>> "normal" instance:
>> >>> SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
>> >>> 738,311 rows in the table, took 1.2 sec
>> >>> "G";50674
>> >>> "F";68
>> >>> "P";149179
>> >>> "C";402367
>> >>> "A";33
>> >>> "Z";136676
>> >>>
>> >>> "slow" instance (currently with a single active job):
>> >>> SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
>> >>> 2,745,329 rows in the table, took 350 sec
>> >>> "G";337922  --STATUS_PENDINGPURGATORY
>> >>> "F";449     --STATUS_ACTIVEPURGATORY
>> >>> "P";25909   --STATUS_PENDING
>> >>> "C";562772  --STATUS_COMPLETE
>> >>> "A";9       --STATUS_ACTIVE
>> >>> "Z";1644927 --STATUS_PURGATORY
>> >>>
>> >>> Since "count(*)" is terribly slow in Postgres, I used the following
>> >>> SQL to count jobqueue's rows:
>> >>> SELECT reltuples::bigint AS approximate_row_count FROM pg_class WHERE
>> >>> relname = 'jobqueue';
>> >>>
>> >>> Both MCF instances have the same number of working threads, database
>> >>> handles etc.
>> >>> Is the database "full"? What could you recommend to improve the
>> >>> performance?
>> >>>
>> >>> Thank you!
>> >>> Konstantin
>> >>
>> >>
>>
>
>
