Re: Long running queries on jobqueue

Karl Wright Tue, 28 May 2019 06:38:15 -0700

Oh, and you might want to check the JDBC driver to be sure it's rated as
compatible with the version of the POstgresql database you are using.  I
imagine that can matter too.


Karl


On Tue, May 28, 2019 at 9:35 AM Karl Wright <[email protected]> wrote:

> When it fails again, I expect that the diagnostics will demonstrate
> conclusively that we had a database transactional integrity failure, as I
> have always seen when I look at the generated logs from this debug mode.
> In the past this has happened most often with mysql, but once in a great
> while with postgresql too.  It hasn't been generally worth pursuing because
> it is so infrequent, but in your case it sounds like it might be worth
> opening a Postgresql ticket, provided we can describe the failure.  Be
> prepared to furnish a test case they can use to reproduce it.
>
> Bugs of this kind generally take forever to fix, which is why I'm not
> thrilled with the prospect of having to push it through the process with
> the Postgresql people.
>
> Karl
>
>
> On Tue, May 28, 2019 at 9:24 AM Julien <[email protected]>
> wrote:
>
>> For MCF I am using a Postgres 10.1 database which is located on the same
>> machine that performs the crawl, so no reason there is a latency problem.
>>
>>
>>
>> The MCF threads are configured like this :
>>
>> <property name="org.apache.manifoldcf.database.maxhandles" value="200"/>
>>
>>   <property name="org.apache.manifoldcf.crawler.threads" value="50"/>
>>
>>   <property name="org.apache.manifoldcf.crawler.expirethreads"
>> value="10"/>
>>
>>   <property name="org.apache.manifoldcf.crawler.cleanupthreads"
>> value="10"/>
>>
>>   <property name="org.apache.manifoldcf.crawler.deletethreads"
>> value="10"/>
>>
>>
>>
>> For info, following the recommended settings from the MCF documentation,
>> I applied the following Postgres properties :
>>
>>
>>
>> max_connections = 405
>>
>> shared_buffers = 1024MB
>>
>> checkpoint_timeout = 900s
>>
>> max_wal_size = 14GB
>>
>> autovacuum = off
>>
>>
>>
>> Julien
>>
>>
>>
>> *De : *Karl Wright <[email protected]>
>> *Envoyé le :*mardi 28 mai 2019 15:02
>> *À : *[email protected]
>> *Objet :*Re: Long running queries on jobqueue
>>
>>
>>
>> Hi Julien,
>>
>>
>>
>> "Error : Unexpected jobqueue status - record id 15588697113928, expecting
>> active status, saw 0"
>>
>> As you said, the only thing that can be done here is to turn on
>> diagnostic logging.  Essentially, the status returned is not possible if
>> the database is truly honoring transactional integrity, which is why the
>> system gives up at that point.  I have noted that this may be due to
>> network latency reaching the database from the machine that is doing the
>> crawling, but no way to tell for sure.  Since it's a database-related
>> issue, it's essential also to know which database you are using here.
>>
>>
>>
>> "I don’t really understand what is causing those long running queries..."
>>
>> The plan that is being executed looks OK but not great, which is bad news
>> for a database that is large.  The solution there is to hand-analyze the
>> jobqueue and jobs tables.  ManifoldCF does try to keep up with that but it
>> is always possible that at some point the plan will go bad; it's heuristic
>> at best.
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Tue, May 28, 2019 at 8:37 AM Julien <[email protected]>
>> wrote:
>>
>> Hi,
>>
>>
>>
>> I need some help to better understand what I am experiencing with a job
>> using a JDBC connector. The job crawls a table containing approximately 9M
>> of documents. I am currently unable to complete this job as it randomly
>> fails with the Following error :
>>
>>
>>
>> Error : Unexpected jobqueue status - record id 15588697113928, expecting
>> active status, saw 0
>>
>>
>>
>> The logs are showing a lot of long running queries, here is an example
>> but there are a lot and on different query kind, but all of them concern
>> the jobqueue table  :
>>
>>
>>
>> WARN 2019-05-27T01:16:56,048 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db|Found a long-running query (92568 ms):
>> [SELECT t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount
>> FROM jobqueue t0  WHERE t0.docpriority<? AND  (t0.status=? OR t0.status=?)
>> AND t0.checkaction=? AND t0.checktime<=? AND EXISTS(SELECT 'x' FROM jobs t1
>> WHERE  (t1.status=? OR t1.status=?) AND t1.id=t0.jobid AND
>> t1.priority=?) AND NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE
>> t2.dochash=t0.dochash AND  (t2.status=? OR t2.status=? OR t2.status=? OR
>> t2.status=? OR t2.status=? OR t2.status=?) AND t2.jobid!=t0.jobid) AND NOT
>> EXISTS(SELECT 'x' FROM prereqevents t3,events t4 WHERE t0.id=t3.owner
>> AND t3.eventname=t4.name) ORDER BY t0.docpriority ASC LIMIT 40000]
>>
>> WARN 2019-05-27T01:16:56,053 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db|  Parameter 0: '1.000000001E9'
>>
>> WARN 2019-05-27T01:16:56,053 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db|  Parameter 1: 'P'
>>
>> WARN 2019-05-27T01:16:56,053 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db|  Parameter 2: 'G'
>>
>> WARN 2019-05-27T01:16:56,053 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db|  Parameter 3: 'R'
>>
>> WARN 2019-05-27T01:16:56,053 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db|  Parameter 4: '1558919723463'
>>
>> WARN 2019-05-27T01:16:56,053 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db|  Parameter 5: 'A'
>>
>> WARN 2019-05-27T01:16:56,053 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db|  Parameter 6: 'a'
>>
>> WARN 2019-05-27T01:16:56,053 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db|  Parameter 7: '5'
>>
>> WARN 2019-05-27T01:16:56,053 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db|  Parameter 8: 'A'
>>
>> WARN 2019-05-27T01:16:56,053 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db|  Parameter 9: 'F'
>>
>> WARN 2019-05-27T01:16:56,053 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db|  Parameter 10: 'a'
>>
>> WARN 2019-05-27T01:16:56,053 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db|  Parameter 11: 'f'
>>
>> WARN 2019-05-27T01:16:56,053 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db|  Parameter 12: 'D'
>>
>> WARN 2019-05-27T01:16:56,053 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db|  Parameter 13: 'd'
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db| Plan: Limit  (cost=89494.71..89495.11
>> rows=160 width=91) (actual time=15468.494..15481.412 rows=40000 loops=1)
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db| Plan:   ->  Sort
>> (cost=89494.71..89495.11 rows=160 width=91) (actual
>> time=15468.493..15477.328 rows=40000 loops=1)
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db| Plan:         Sort Key: t0.docpriority
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db| Plan:         Sort Method: external
>> sort  Disk: 102584kB
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db| Plan:         ->  Nested Loop Anti
>> Join  (cost=42423.35..89488.86 rows=160 width=91) (actual
>> time=234.622..13792.727 rows=1164882 loops=1)
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db| Plan:               ->  Hash Anti Join
>> (cost=42422.79..88190.34 rows=160 width=91) (actual time=234.593..1316.236
>> rows=1164882 loops=1)
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db| Plan:                     Hash Cond: (
>> t0.id = t3.owner)
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db| Plan:                     ->  Nested
>> Loop  (cost=42395.08..88160.62 rows=160 width=91) (actual
>> time=234.568..1002.111 rows=1164882 loops=1)
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db| Plan:                           Join
>> Filter: (t0.jobid = t1.id)
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db| Plan:                           ->  Seq
>> Scan on jobs t1  (cost=0.00..11.40 rows=1 width=8) (actual
>> time=0.015..0.016 rows=1 loops=1)
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db| Plan:
>> Filter: ((priority = '5'::bigint) AND ((status = 'A'::bpchar) OR (status =
>> 'a'::bpchar)))
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db| Plan:
>> Rows Removed by Filter: 6
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db| Plan:                           ->
>> Bitmap Heap Scan on jobqueue t0  (cost=42395.08..87989.60 rows=12769
>> width=91) (actual time=234.548..656.692 rows=1164882 loops=1)
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db| Plan:
>> Recheck Cond: (((docpriority < '1000000001'::double precision) AND (status
>> = 'P'::bpchar) AND (checkaction = 'R'::bpchar) AND (checktime <=
>> '1558919723463'::bigint)) OR ((status = 'G'::bpchar) AND (checkaction =
>> 'R'::bpchar) AND (checktime <= '1558919723463'::bigint)))
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db| Plan:
>> Filter: (docpriority < '1000000001'::double precision)
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db| Plan:
>> Heap Blocks: exact=20094
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db| Plan:
>>                ->  BitmapOr  (cost=42395.08..42395.08 rows=12769 width=0)
>> (actual time=229.219..229.219 rows=0 loops=1)
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db|
>> Plan:                                       ->  Bitmap Index Scan on
>> i1556284697879  (cost=0.00..42384.25 rows=12769 width=0) (actual
>> time=229.176..229.176 rows=1164882 loops=1)
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db|
>> Plan:                                             Index Cond: ((docpriority
>> < '1000000001'::double precision) AND (status = 'P'::bpchar) AND
>> (checkaction = 'R'::bpchar) AND (checktime <= '1558919723463'::bigint))
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db|
>> Plan:                                       ->  Bitmap Index Scan on
>> i1556284697880  (cost=0.00..4.45 rows=1 width=0) (actual time=0.041..0.041
>> rows=93 loops=1)
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db| Plan:
>>                                       Index Cond: ((status = 'G'::bpchar)
>> AND (checkaction = 'R'::bpchar) AND (checktime <= '1558919723463'::bigint))
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db| Plan:                     ->  Hash
>> (cost=26.08..26.08 rows=130 width=8) (actual time=0.007..0.007 rows=0
>> loops=1)
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db| Plan:
>> Buckets: 1024  Batches: 1  Memory Usage: 8kB
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db| Plan:                           ->
>> Hash Join  (cost=12.93..26.08 rows=130 width=8) (actual time=0.006..0.006
>> rows=0 loops=1)
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db| Plan:
>> Hash Cond: ((t3.eventname)::text = (t4.name)::text)
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db| Plan:
>> ->  Seq Scan on prereqevents t3  (cost=0.00..11.40 rows=140 width=524)
>> (actual time=0.005..0.005 rows=0 loops=1)
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db| Plan:
>> ->  Hash  (cost=11.30..11.30 rows=130 width=516) (never executed)
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db|
>> Plan:                                       ->  Seq Scan on events t4
>> (cost=0.00..11.30 rows=130 width=516) (never executed)
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db| Plan:               ->  Index Scan
>> using i1556284697878 on jobqueue t2  (cost=0.56..8.11 rows=1 width=49)
>> (actual time=0.010..0.010 rows=0 loops=1164882)
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db| Plan:                     Index Cond:
>> ((dochash)::text = (t0.dochash)::text)
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db| Plan:                     Filter:
>> ((jobid <> t0.jobid) AND ((status = 'A'::bpchar) OR (status = 'F'::bpchar)
>> OR (status = 'a'::bpchar) OR (status = 'f'::bpchar) OR (status =
>> 'D'::bpchar) OR (status = 'd'::bpchar)))
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db| Plan:                     Rows Removed
>> by Filter: 1
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db| Plan: Planning time: 1.455 ms
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db| Plan: Execution time: 15518.479 ms
>>
>> WARN 2019-05-27T01:17:11,635 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db|
>>
>> WARN 2019-05-27T01:17:11,749 (Stuffer thread) -
>> MCF|MCF-agent|apache.manifoldcf.db| Stats: n_distinct=3.0
>> most_common_vals={C,P,A}
>> most_common_freqs={0.779200017,0.220666662,0.000133333335}
>>
>>
>>
>> The same happens if I split the job into several jobs of 1.5M of docs at
>> the difference that if I restart the jobs, some may end but with no
>> guarantee.
>>
>>
>>
>> I don’t really understand what is causing those long running queries and
>> the fact that the job/jobs can fail on an Error : Unexpected status…
>>
>>
>>
>> I tried to add the following parameter to the properties.xml to have
>> better logs but it changed nothing : <property name="
>> org.apache.manifoldcf.diagnostics" value="DEBUG"/>
>>
>>
>>
>> The only thing I did that can explain a lack of performances is that I
>> raised the org.apache.manifoldcf.ui.maxstatuscount from 500k to 2M. But I
>> was only expecting low performances on job status UI. Is it possible that
>> it is responsible of the long running queries ?
>>
>>
>>
>> Thanks for the help,
>> Julien
>>
>>
>>
>> [image:
>> https://ipmcdn.avast.com/images/icons/icon-envelope-tick-round-orange-animated-no-repeat-v1.gif]
>> <https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=emailclient>
>>
>> Garanti sans virus. www.avast.com
>> <https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=emailclient>
>>
>>
>>
>>
>>
>

Re: Long running queries on jobqueue

Reply via email to