When it fails again, I expect that the diagnostics will demonstrate conclusively that we had a database transactional integrity failure, as I have always seen when I look at the generated logs from this debug mode. In the past this has happened most often with mysql, but once in a great while with postgresql too. It hasn't been generally worth pursuing because it is so infrequent, but in your case it sounds like it might be worth opening a Postgresql ticket, provided we can describe the failure. Be prepared to furnish a test case they can use to reproduce it.
Bugs of this kind generally take forever to fix, which is why I'm not thrilled with the prospect of having to push it through the process with the Postgresql people. Karl On Tue, May 28, 2019 at 9:24 AM Julien <[email protected]> wrote: > For MCF I am using a Postgres 10.1 database which is located on the same > machine that performs the crawl, so no reason there is a latency problem. > > > > The MCF threads are configured like this : > > <property name="org.apache.manifoldcf.database.maxhandles" value="200"/> > > <property name="org.apache.manifoldcf.crawler.threads" value="50"/> > > <property name="org.apache.manifoldcf.crawler.expirethreads" value="10"/> > > <property name="org.apache.manifoldcf.crawler.cleanupthreads" > value="10"/> > > <property name="org.apache.manifoldcf.crawler.deletethreads" value="10"/> > > > > For info, following the recommended settings from the MCF documentation, I > applied the following Postgres properties : > > > > max_connections = 405 > > shared_buffers = 1024MB > > checkpoint_timeout = 900s > > max_wal_size = 14GB > > autovacuum = off > > > > Julien > > > > *De : *Karl Wright <[email protected]> > *Envoyé le :*mardi 28 mai 2019 15:02 > *À : *[email protected] > *Objet :*Re: Long running queries on jobqueue > > > > Hi Julien, > > > > "Error : Unexpected jobqueue status - record id 15588697113928, expecting > active status, saw 0" > > As you said, the only thing that can be done here is to turn on diagnostic > logging. Essentially, the status returned is not possible if the database > is truly honoring transactional integrity, which is why the system gives up > at that point. I have noted that this may be due to network latency > reaching the database from the machine that is doing the crawling, but no > way to tell for sure. Since it's a database-related issue, it's essential > also to know which database you are using here. > > > > "I don’t really understand what is causing those long running queries..." > > The plan that is being executed looks OK but not great, which is bad news > for a database that is large. The solution there is to hand-analyze the > jobqueue and jobs tables. ManifoldCF does try to keep up with that but it > is always possible that at some point the plan will go bad; it's heuristic > at best. > > > > Karl > > > > > > On Tue, May 28, 2019 at 8:37 AM Julien <[email protected]> > wrote: > > Hi, > > > > I need some help to better understand what I am experiencing with a job > using a JDBC connector. The job crawls a table containing approximately 9M > of documents. I am currently unable to complete this job as it randomly > fails with the Following error : > > > > Error : Unexpected jobqueue status - record id 15588697113928, expecting > active status, saw 0 > > > > The logs are showing a lot of long running queries, here is an example but > there are a lot and on different query kind, but all of them concern the > jobqueue table : > > > > WARN 2019-05-27T01:16:56,048 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db|Found a long-running query (92568 ms): > [SELECT t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount > FROM jobqueue t0 WHERE t0.docpriority<? AND (t0.status=? OR t0.status=?) > AND t0.checkaction=? AND t0.checktime<=? AND EXISTS(SELECT 'x' FROM jobs t1 > WHERE (t1.status=? OR t1.status=?) AND t1.id=t0.jobid AND t1.priority=?) > AND NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND > (t2.status=? OR t2.status=? OR t2.status=? OR t2.status=? OR t2.status=? OR > t2.status=?) AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM > prereqevents t3,events t4 WHERE t0.id=t3.owner AND t3.eventname=t4.name) > ORDER BY t0.docpriority ASC LIMIT 40000] > > WARN 2019-05-27T01:16:56,053 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Parameter 0: '1.000000001E9' > > WARN 2019-05-27T01:16:56,053 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Parameter 1: 'P' > > WARN 2019-05-27T01:16:56,053 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Parameter 2: 'G' > > WARN 2019-05-27T01:16:56,053 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Parameter 3: 'R' > > WARN 2019-05-27T01:16:56,053 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Parameter 4: '1558919723463' > > WARN 2019-05-27T01:16:56,053 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Parameter 5: 'A' > > WARN 2019-05-27T01:16:56,053 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Parameter 6: 'a' > > WARN 2019-05-27T01:16:56,053 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Parameter 7: '5' > > WARN 2019-05-27T01:16:56,053 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Parameter 8: 'A' > > WARN 2019-05-27T01:16:56,053 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Parameter 9: 'F' > > WARN 2019-05-27T01:16:56,053 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Parameter 10: 'a' > > WARN 2019-05-27T01:16:56,053 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Parameter 11: 'f' > > WARN 2019-05-27T01:16:56,053 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Parameter 12: 'D' > > WARN 2019-05-27T01:16:56,053 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Parameter 13: 'd' > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Plan: Limit (cost=89494.71..89495.11 > rows=160 width=91) (actual time=15468.494..15481.412 rows=40000 loops=1) > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Plan: -> Sort > (cost=89494.71..89495.11 rows=160 width=91) (actual > time=15468.493..15477.328 rows=40000 loops=1) > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Plan: Sort Key: t0.docpriority > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Plan: Sort Method: external > sort Disk: 102584kB > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Plan: -> Nested Loop Anti > Join (cost=42423.35..89488.86 rows=160 width=91) (actual > time=234.622..13792.727 rows=1164882 loops=1) > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Plan: -> Hash Anti Join > (cost=42422.79..88190.34 rows=160 width=91) (actual time=234.593..1316.236 > rows=1164882 loops=1) > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Plan: Hash Cond: ( > t0.id = t3.owner) > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Plan: -> Nested > Loop (cost=42395.08..88160.62 rows=160 width=91) (actual > time=234.568..1002.111 rows=1164882 loops=1) > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Plan: Join > Filter: (t0.jobid = t1.id) > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Plan: -> Seq > Scan on jobs t1 (cost=0.00..11.40 rows=1 width=8) (actual > time=0.015..0.016 rows=1 loops=1) > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Plan: > Filter: ((priority = '5'::bigint) AND ((status = 'A'::bpchar) OR (status = > 'a'::bpchar))) > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Plan: > Rows Removed by Filter: 6 > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Plan: -> > Bitmap Heap Scan on jobqueue t0 (cost=42395.08..87989.60 rows=12769 > width=91) (actual time=234.548..656.692 rows=1164882 loops=1) > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Plan: > Recheck Cond: (((docpriority < '1000000001'::double precision) AND (status > = 'P'::bpchar) AND (checkaction = 'R'::bpchar) AND (checktime <= > '1558919723463'::bigint)) OR ((status = 'G'::bpchar) AND (checkaction = > 'R'::bpchar) AND (checktime <= '1558919723463'::bigint))) > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Plan: > Filter: (docpriority < '1000000001'::double precision) > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Plan: > Heap Blocks: exact=20094 > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Plan: > -> BitmapOr (cost=42395.08..42395.08 rows=12769 width=0) > (actual time=229.219..229.219 rows=0 loops=1) > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| > Plan: -> Bitmap Index Scan on > i1556284697879 (cost=0.00..42384.25 rows=12769 width=0) (actual > time=229.176..229.176 rows=1164882 loops=1) > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| > Plan: Index Cond: ((docpriority > < '1000000001'::double precision) AND (status = 'P'::bpchar) AND > (checkaction = 'R'::bpchar) AND (checktime <= '1558919723463'::bigint)) > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| > Plan: -> Bitmap Index Scan on > i1556284697880 (cost=0.00..4.45 rows=1 width=0) (actual time=0.041..0.041 > rows=93 loops=1) > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Plan: > Index Cond: ((status = 'G'::bpchar) > AND (checkaction = 'R'::bpchar) AND (checktime <= '1558919723463'::bigint)) > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Plan: -> Hash > (cost=26.08..26.08 rows=130 width=8) (actual time=0.007..0.007 rows=0 > loops=1) > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Plan: > Buckets: 1024 Batches: 1 Memory Usage: 8kB > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Plan: -> > Hash Join (cost=12.93..26.08 rows=130 width=8) (actual time=0.006..0.006 > rows=0 loops=1) > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Plan: > Hash Cond: ((t3.eventname)::text = (t4.name)::text) > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Plan: > -> Seq Scan on prereqevents t3 (cost=0.00..11.40 rows=140 width=524) > (actual time=0.005..0.005 rows=0 loops=1) > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Plan: > -> Hash (cost=11.30..11.30 rows=130 width=516) (never executed) > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| > Plan: -> Seq Scan on events t4 > (cost=0.00..11.30 rows=130 width=516) (never executed) > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Plan: -> Index Scan > using i1556284697878 on jobqueue t2 (cost=0.56..8.11 rows=1 width=49) > (actual time=0.010..0.010 rows=0 loops=1164882) > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Plan: Index Cond: > ((dochash)::text = (t0.dochash)::text) > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Plan: Filter: > ((jobid <> t0.jobid) AND ((status = 'A'::bpchar) OR (status = 'F'::bpchar) > OR (status = 'a'::bpchar) OR (status = 'f'::bpchar) OR (status = > 'D'::bpchar) OR (status = 'd'::bpchar))) > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Plan: Rows Removed > by Filter: 1 > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Plan: Planning time: 1.455 ms > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Plan: Execution time: 15518.479 ms > > WARN 2019-05-27T01:17:11,635 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| > > WARN 2019-05-27T01:17:11,749 (Stuffer thread) - > MCF|MCF-agent|apache.manifoldcf.db| Stats: n_distinct=3.0 > most_common_vals={C,P,A} > most_common_freqs={0.779200017,0.220666662,0.000133333335} > > > > The same happens if I split the job into several jobs of 1.5M of docs at > the difference that if I restart the jobs, some may end but with no > guarantee. > > > > I don’t really understand what is causing those long running queries and > the fact that the job/jobs can fail on an Error : Unexpected status… > > > > I tried to add the following parameter to the properties.xml to have > better logs but it changed nothing : <property name=" > org.apache.manifoldcf.diagnostics" value="DEBUG"/> > > > > The only thing I did that can explain a lack of performances is that I > raised the org.apache.manifoldcf.ui.maxstatuscount from 500k to 2M. But I > was only expecting low performances on job status UI. Is it possible that > it is responsible of the long running queries ? > > > > Thanks for the help, > Julien > > > > [image: > https://ipmcdn.avast.com/images/icons/icon-envelope-tick-round-orange-animated-no-repeat-v1.gif] > <https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=emailclient> > > Garanti sans virus. www.avast.com > <https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=emailclient> > > > > >
