Thank you for the fast reply!
I provided the query to show how many docs have been processed by the
instances so far.
I do not use the UI to check the job statuses - it takes 10-15 minutes to
display the page :( (ui.maxstatuscount=100000).
The full vacuum and database re-index run as a scheduled job on a
daily basis.
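For reference, the job boils down to something like this (a minimal
sketch; "mcf_db" is a placeholder for the actual database name):

-- Rewrite tables to reclaim dead-row bloat (takes exclusive locks).
VACUUM FULL;
-- Rebuild all indexes of the current database.
REINDEX DATABASE mcf_db;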
So I just tried to analyze the "problem" table, jobqueue, but I don't
think it helped much: the same "status" query took 250 seconds now -
improved, but still very slow.
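The analyze itself was essentially just the plain statement:

-- Refresh the planner statistics for the problem table only.
ANALYZE VERBOSE jobqueue;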
The long-running queries reported in the log all involve the jobqueue
table, e.g.:
WARN 2016-04-27 13:49:48,596 (Stuffer thread) - Found a long-running
query (75328 ms): [SELECT
t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount
FROM jobqueue t0 WHERE t0.docpriority<? AND (t0.status=? OR
t0.status=?) AND t0.checkaction=? AND t0.checktime<=? AND
EXISTS(SELECT 'x' FROM jobs t1 WHERE (t1.status=? OR t1.status=?) AND
t1.id=t0.jobid AND t1.priority=?) AND NOT EXISTS(SELECT 'x' FROM
jobqueue t2 WHERE t2.dochash=t0.dochash AND (t2.status=? OR
t2.status=? OR t2.status=? OR t2.status=? OR t2.status=? OR
t2.status=?) AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM
prereqevents t3,events t4 WHERE t0.id=t3.owner AND
t3.eventname=t4.name) ORDER BY t0.docpriority ASC LIMIT 400]
WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 0: '1.000000001E9'
WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 1: 'P'
WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 2: 'G'
WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 3: 'R'
WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 4: '1461757713112'
WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 5: 'A'
WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 6: 'a'
WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 7: '5'
WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 8: 'A'
WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 9: 'F'
WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 10: 'a'
WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 11: 'f'
WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 12: 'D'
WARN 2016-04-27 13:49:48,628 (Stuffer thread) - Parameter 13: 'd'
WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan: Limit
(cost=4016.80..4016.98 rows=72 width=235) (actual
time=738.775..738.842 rows=400 loops=1)
WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan: -> Sort
(cost=4016.80..4016.98 rows=72 width=235) (actual
time=738.774..738.792 rows=400 loops=1)
WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan: Sort
Key: t0.docpriority
WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan: Sort
Method: top-N heapsort Memory: 139kB
WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan: ->
Nested Loop Anti Join (cost=2252.06..4014.58 rows=72 width=235)
(actual time=46.841..723.735 rows=20215 loops=1)
WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
-> Hash Anti Join (cost=2251.51..3387.34 rows=72 width=235) (actual
time=46.770..108.460 rows=20215 loops=1)
WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
Hash Cond: (t0.id = t3.owner)
WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
-> Nested Loop (cost=2223.73..3358.66 rows=72 width=235)
(actual time=46.747..102.890 rows=20215 loops=1)
WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
Join Filter: (t0.jobid = t1.id)
WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
Rows Removed by Join Filter: 643
WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
-> Seq Scan on jobs t1 (cost=0.00..2.14 rows=1 width=8)
(actual time=0.015..0.024 rows=1 loops=1)
WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
Filter: ((priority = 5::bigint) AND ((status =
'A'::bpchar) OR (status = 'a'::bpchar)))
WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
Rows Removed by Filter: 3
WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
-> Bitmap Heap Scan on jobqueue t0 (cost=2223.73..3352.95
rows=286 width=235) (actual time=46.705..93.152 rows=20858 loops=1)
WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
Recheck Cond: (((docpriority < 1000000001::double
precision) AND (status = 'P'::bpchar) AND (checkaction = 'R'::bpchar)
AND (checktime <= 1461757713112::bigint)) OR ((docpriority <
1000000001::double precision) AND (status = 'G'::bpchar) AND
(checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint)))
WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
-> BitmapOr (cost=2223.73..2223.73 rows=288
width=0) (actual time=45.349..45.349 rows=0 loops=1)
WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
-> Bitmap Index Scan on i1459943614200
(cost=0.00..1111.80 rows=20 width=0) (actual time=29.325..29.325
rows=333 loops=1)
WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
Index Cond: ((docpriority <
1000000001::double precision) AND (status = 'P'::bpchar) AND
(checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint))
WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
-> Bitmap Index Scan on i1459943614200
(cost=0.00..1111.80 rows=268 width=0) (actual time=16.020..16.020
rows=20555 loops=1)
WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
Index Cond: ((docpriority <
1000000001::double precision) AND (status = 'G'::bpchar) AND
(checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint))
WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
-> Hash (cost=26.15..26.15 rows=130 width=8) (actual
time=0.004..0.004 rows=0 loops=1)
WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
Buckets: 1024 Batches: 1 Memory Usage: 0kB
WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
-> Hash Join (cost=12.93..26.15 rows=130 width=8) (actual
time=0.003..0.003 rows=0 loops=1)
WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
Hash Cond: ((t3.eventname)::text = (t4.name)::text)
WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
-> Seq Scan on prereqevents t3 (cost=0.00..11.40
rows=140 width=524) (actual time=0.001..0.001 rows=0 loops=1)
WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
-> Hash (cost=11.30..11.30 rows=130 width=516)
(never executed)
WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
-> Seq Scan on events t4 (cost=0.00..11.30
rows=130 width=516) (never executed)
WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
-> Index Scan using i1459943614199 on jobqueue t2 (cost=0.56..8.59
rows=1 width=49) (actual time=0.030..0.030 rows=0 loops=20215)
WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
Index Cond: ((dochash)::text = (t0.dochash)::text)
WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
Filter: ((jobid <> t0.jobid) AND ((status = 'A'::bpchar) OR
(status = 'F'::bpchar) OR (status = 'a'::bpchar) OR (status =
'f'::bpchar) OR (status = 'D'::bpchar) OR (status = 'd'::bpchar)))
WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan:
Rows Removed by Filter: 1
WARN 2016-04-27 13:49:49,440 (Stuffer thread) - Plan: Total runtime:
747.031 ms
WARN 2016-04-27 13:49:49,440 (Stuffer thread) -
WARN 2016-04-27 13:49:50,206 (Stuffer thread) - Stats:
n_distinct=5.0 most_common_vals={Z,C,G,P}
most_common_freqs={0.64013332,0.21616666,0.13356666,0.0101}
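For what it's worth, that stats line can be pulled straight from
Postgres with a standard pg_stats query (assuming it refers to the
status column of jobqueue):

-- The status-value distribution as the planner sees it.
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'jobqueue' AND attname = 'status';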
2016-04-27 13:43 GMT+02:00 Karl Wright <[email protected]>:
> Hi Konstantin,
>
> The query you are looking at is performed by the UI only, and there is a
> parameter you can set which applies a limit to the number of documents so
> that the count is reported as "<limit>+" in the UI. This is the parameter:
>
> org.apache.manifoldcf.ui.maxstatuscount
>
> As for why the database gets slow for crawling: unless you are seeing
> reports of long-running queries in the log, there's a good chance you need
> to vacuum your database instance. I generally recommend that a vacuum full
> be done periodically for database instances. Autovacuuming has gotten a lot
> better in Postgres than it used to be, but at least in the past the
> autovacuuming process would fall far behind ManifoldCF, and so the database
> would get quite bloated anyway. So I'd give that a try.
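> A minimal sketch of that maintenance, run from psql against the MCF
> database (jobqueue is usually the biggest offender; adjust table names
> as needed):
>
> -- Reclaim dead-row bloat; note this takes an exclusive lock on the table.
> VACUUM FULL jobqueue;
> -- Rebuild the table's indexes, which bloat along with the table.
> REINDEX TABLE jobqueue;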
>
> If you are seeing logging output mentioning slow queries, you may need to
> tune how often MCF analyzes certain tables. There are parameters that
> control that as well. In general, if there is a slow query with a bad plan,
> and analyzing the tables involved makes it come up with a much better plan,
> analysis is not happening often enough. But first, before you get to that
> point, have a look at the log and see whether this is likely to be the
> problem. (Usually it is the stuffer query that gets slow when there's an
> issue with table analysis, FWIW). Please feel free to post the plan of the
> queries being reported here.
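> To check whether analysis is keeping up, Postgres itself can tell you
> (standard catalog views, nothing MCF-specific):
>
> -- When were the statistics for jobqueue last refreshed?
> SELECT relname, last_analyze, last_autoanalyze
> FROM pg_stat_user_tables WHERE relname = 'jobqueue';
>
> -- Refresh them by hand and see whether the stuffer plan improves.
> ANALYZE jobqueue;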
>
> Thanks,
> Karl
>
>
> On Wed, Apr 27, 2016 at 7:33 AM, jetnet <[email protected]> wrote:
>>
>> Hi Karl,
>>
>> I set up two MCF instances (quick setup) on the same machine, using
>> the same Postgres 9.3 instance (with a different
>> "org.apache.manifoldcf.database.name" for each, of course).
>> After a couple of days I ran into a performance issue: one MCF instance
>> has become very slow - it processes only a few docs per hour. I guess
>> the bottleneck is the database:
>>
>> "normal" instance:
>> SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
>> 738,311 rows in the table; took 1.2 sec
>> "G";50674
>> "F";68
>> "P";149179
>> "C";402367
>> "A";33
>> "Z";136676
>>
>> "slow" instance (currently with a single active job):
>> SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
>> 2,745,329 rows in the table; took 350 sec
>> "G";337922 --STATUS_PENDINGPURGATORY
>> "F";449 --STATUS_ACTIVEPURGATORY
>> "P";25909 --STATUS_PENDING
>> "C";562772 --STATUS_COMPLETE
>> "A";9 --STATUS_ACTIVE
>> "Z";1644927 --STATUS_PURGATORY
>>
>> Since "count(*)" is terrible slow in Postgres, I used the following
>> sql to count jobqueue's rows:
>> SELECT reltuples::bigint AS approximate_row_count FROM pg_class WHERE
>> relname = 'jobqueue';
>>
>> Both MCF instances have the same number of worker threads, database
>> handles, etc.
>> Is the database "full"? What would you recommend to improve the
>> performance?
>>
>> Thank you!
>> Konstantin
>
>