Hi Paul, The query looks right; the database driver determines the maximum number of clauses in a conjunction OR list, just like it does for an IN() list. In the case of Postgresql and OR, the limit is 25; for IN()'s it's 100.
The standard integration tests generally run small jobs but that is typically sufficient to find query generation problems. I have load tests I can also run but they take several hours to complete. I'll start one now, but I may need to abort it before it finishes. Karl On Fri, Sep 12, 2014 at 11:26 AM, Paul Boichat <[email protected]> wrote: > Hi, > > I'm looking through the logs - can see the change from IN to OR in each > query - and there's clearly a difference in execution path but it's quite > verbose so will take a while. > > It may well be that document state has not been reprioritised or in some > way inconsistent. However, I don't think it's that which is causing the > issue - I can switch this behaviour on and off over by changing the > DBInterfacePostgres class and restarting Manifold. That seems to suggest a > query isn't behaving the same way between IN and OR - I just can't isolate > the particular query (yet). > > Have you tested with a job already in running state (on a restart) with a > large document count? For example am seeing this kind of thing which looks > messy but appears to execute as you'd expect: > > SELECT id,dockey,lastversion,lastoutputversion,authorityname,forcedparams > FROM ingeststatus WHERE (dockey=? OR dockey=? OR dockey=? OR dockey=? OR > dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR > dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR > dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR > dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR > dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR > dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR > dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR > dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR > dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR > dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR > dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR > dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR > dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR > dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR > dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR > dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=?) AND > connectionname=?] > DEBUG 2014-09-12 15:01:27,052 (Thread-542) - Parameter 0: > '1407144048431:F42CD76D66FA6BAD396FF8F8A409DD211C184E6A' > DEBUG 2014-09-12 15:01:27,052 (Thread-542) - Parameter 1: > '1407144048431:FE66CC4054300E4EB2A84138DC9B62B80F59F5B9' > > > > > VP Engineering, > Exonar Ltd > > T: +44 7940 567724 > > twitter:@exonarco @pboichat > W: http://www.exonar.com > Nothing is secure. Now what? Exonar Raven <http://video.exonar.com/> > > Exonar Limited, registered in the UK, registration number 06439969 at 14 > West Mills, Newbury, Berkshire, RG14 5HG > DISCLAIMER: This email and any attachments to it may be confidential and > are intended solely for the use of the individual to whom it is addressed. > Any views or opinions expressed are solely those of the author and do not > necessarily represent those of Exonar Ltd. If you are not the intended > recipient of this email, you must neither take any action based upon its > contents, nor copy or show it to anyone. Please contact the sender if you > believe you have received this email in error. > > On Fri, Sep 12, 2014 at 4:20 PM, Karl Wright <[email protected]> wrote: > >> Hi Paul, >> >> The tests in fact do multiple complete crawls, so it is extremely >> unlikely that the stuffer query is broken. If you look at the queries >> generated, you should note that the only difference is that whenever an xxx >> IN(?,?) was generated before, a (xxx=? OR xxx=?) is generated instead. >> These should be completely equivalent; if they don't look equivalent to you >> in the log, then I will fix whatever is broken. I'll make sure here that >> the queries look right visually too. >> >> One possibility is that when you restarted the agents process, the >> jobqueue records did not yet finish getting reprioritized. Stuffer queries >> are fired all the time, but the running jobs must complete reprioritization >> before the stuffer query will pick up any records. I wonder if they may >> not have managed to get to the right state before you aborted the >> experiment? You can tell what is happening by using jstack to get a thread >> dump of the agents process. >> >> Thanks, >> Karl >> >> >> On Fri, Sep 12, 2014 at 11:05 AM, Paul Boichat <[email protected]> >> wrote: >> >>> I stayed with base 1.6.1 and manually patched the code to include the >>> two new methods in DBInterfacePostgreSQL >>> >>> Paul >>> >>> >>> >>> VP Engineering, >>> Exonar Ltd >>> >>> T: +44 7940 567724 >>> >>> twitter:@exonarco @pboichat >>> W: http://www.exonar.com >>> Nothing is secure. Now what? Exonar Raven <http://video.exonar.com/> >>> >>> Exonar Limited, registered in the UK, registration number 06439969 at 14 >>> West Mills, Newbury, Berkshire, RG14 5HG >>> DISCLAIMER: This email and any attachments to it may be confidential >>> and are intended solely for the use of the individual to whom it is >>> addressed. Any views or opinions expressed are solely those of the author >>> and do not necessarily represent those of Exonar Ltd. If you are not >>> the intended recipient of this email, you must neither take any action >>> based upon its contents, nor copy or show it to anyone. Please contact >>> the sender if you believe you have received this email in error. >>> >>> On Fri, Sep 12, 2014 at 4:01 PM, Karl Wright <[email protected]> wrote: >>> >>>> The changes pass all tests here. Is it possible that you attempted >>>> some upgrade that failed (or didn't attempt upgrade but went to a new code >>>> version)? >>>> >>>> If you could let me know as exactly as possible what you did, I can let >>>> you know if that should have worked or not. >>>> >>>> Thanks! >>>> Karl >>>> >>>> >>>> >>>> >>>> >>>> On Fri, Sep 12, 2014 at 10:57 AM, Paul Boichat <[email protected] >>>> > wrote: >>>> >>>>> Karl, >>>>> >>>>> We appear to be seeing an issue with the performance change to use an >>>>> OR clause rather than IN. After making the change, when we restart >>>>> manifoldcf (with one job in running state) documents in the running job >>>>> are >>>>> not picked up for processing by the stuffer thread. If we redploy base >>>>> 1.6.1 and restart documents are processed. This is consistently switchable >>>>> depending on which code base is deployed. >>>>> >>>>> We have logs that I could upload to the ticket if you recommend that >>>>> we reopen the issue (or create a new one)? >>>>> >>>>> Paul >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> VP Engineering, >>>>> Exonar Ltd >>>>> >>>>> T: +44 7940 567724 >>>>> >>>>> twitter:@exonarco @pboichat >>>>> W: http://www.exonar.com >>>>> Nothing is secure. Now what? Exonar Raven <http://video.exonar.com/> >>>>> >>>>> Exonar Limited, registered in the UK, registration number 06439969 at 14 >>>>> West Mills, Newbury, Berkshire, RG14 5HG >>>>> DISCLAIMER: This email and any attachments to it may be confidential >>>>> and are intended solely for the use of the individual to whom it is >>>>> addressed. Any views or opinions expressed are solely those of the author >>>>> and do not necessarily represent those of Exonar Ltd. If you are not >>>>> the intended recipient of this email, you must neither take any action >>>>> based upon its contents, nor copy or show it to anyone. Please >>>>> contact the sender if you believe you have received this email in error. >>>>> >>>>> On Fri, Sep 12, 2014 at 6:05 AM, Karl Wright <[email protected]> >>>>> wrote: >>>>> >>>>>> Hi Paul -- >>>>>> >>>>>> Just to be clear -- the branch for CONNECTORS-1027 is a branch of >>>>>> trunk, which is MCF 2.0. MCF 2.0 is not backwards compatible with any >>>>>> previous MCF release, and indeed there is no upgrade from any 1.x release >>>>>> to 2.0. That's why I said to use the patches, and try to stay on 1.6.1 >>>>>> or >>>>>> at most to migrate to 1.7. >>>>>> >>>>>> IF you ALREADY tried an upgrade with the branch code, then you would >>>>>> have wound up in a schema state where the schema had more columns in it >>>>>> than the branch knew how to deal with. That's bad, and you will need to >>>>>> do >>>>>> things to fix the situation. I believe you should still be able to do >>>>>> the >>>>>> following: >>>>>> >>>>>> - Download 1.7 source, or check out >>>>>> https://svn.apache.org/repos/asf/manifoldcf/branches/release-1.7-branch >>>>>> - Apply the patches >>>>>> - Build >>>>>> - Modify your properties.xml to point to your postgresql instance >>>>>> - Run the upgrade (initialize.bat on the multi-process example, or >>>>>> start the single-process example) >>>>>> >>>>>> You should then have a working 1.7 release, with code patches applied. >>>>>> >>>>>> Thanks, >>>>>> Karl >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Thu, Sep 11, 2014 at 11:34 AM, Paul Boichat < >>>>>> [email protected]> wrote: >>>>>> >>>>>>> Thanks - we've pulled down the branch and will test the changes. It >>>>>>> looks like a branch of 1.7 so it's going to take us a little while to >>>>>>> test. >>>>>>> We need to migrate our connectors (there's some deprecated stuff that's >>>>>>> now >>>>>>> been cleared in 1.7 .eg. getShareACL) and we'll need to patch the >>>>>>> database >>>>>>> to include the pipeline and any other schema changes. We'll have some >>>>>>> environment contention over the next week as our performance test >>>>>>> environment needs to remain on 1.6.1 while we test a release. Once >>>>>>> that's >>>>>>> clear I'll move to 1.7 >>>>>>> >>>>>>> On the database schema patch moving from 1.6.1 to 1.7 - is there a >>>>>>> simple way to migrate and existing database? >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Paul >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> VP Engineering, >>>>>>> Exonar Ltd >>>>>>> >>>>>>> T: +44 7940 567724 >>>>>>> >>>>>>> twitter:@exonarco @pboichat >>>>>>> W: http://www.exonar.com >>>>>>> Nothing is secure. Now what? Exonar Raven <http://video.exonar.com/> >>>>>>> >>>>>>> Exonar Limited, registered in the UK, registration number 06439969 >>>>>>> at 14 West Mills, Newbury, Berkshire, RG14 5HG >>>>>>> DISCLAIMER: This email and any attachments to it may be >>>>>>> confidential and are intended solely for the use of the individual to >>>>>>> whom >>>>>>> it is addressed. Any views or opinions expressed are solely those of the >>>>>>> author and do not necessarily represent those of Exonar Ltd. If you >>>>>>> are not the intended recipient of this email, you must neither take any >>>>>>> action based upon its contents, nor copy or show it to anyone. Please >>>>>>> contact the sender if you believe you have received this email in error. >>>>>>> >>>>>>> On Thu, Sep 11, 2014 at 1:27 PM, Karl Wright <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> Thanks -- I'll include that change as well then, in ticket >>>>>>>> CONNECTORS-1027. >>>>>>>> >>>>>>>> >>>>>>>> Karl >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Sep 11, 2014 at 7:45 AM, Paul Boichat < >>>>>>>> [email protected]> wrote: >>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> That comes back immediately with 10001 rows: >>>>>>>>> >>>>>>>>> explain analyze SELECT count(*) FROM (SELECT 'x' FROM jobqueue >>>>>>>>> LIMIT 10001) t; >>>>>>>>> >>>>>>>>> QUERY PLAN >>>>>>>>> >>>>>>>>> >>>>>>>>> ----------------------------------------------------------------------------------------------------------------------- >>>>>>>>> ---------------------------------- >>>>>>>>> Aggregate (cost=544.08..544.09 rows=1 width=0) (actual >>>>>>>>> time=9.125..9.125 rows=1 loops=1) >>>>>>>>> -> Limit (cost=0.00..419.07 rows=10001 width=0) (actual >>>>>>>>> time=0.033..6.945 rows=10001 loops=1) >>>>>>>>> -> Index Only Scan using jobqueue_pkey on jobqueue >>>>>>>>> (cost=0.00..431189.31 rows=10290271 width=0) (actual time >>>>>>>>> =0.031..3.257 rows=10001 loops=1) >>>>>>>>> Heap Fetches: 725 >>>>>>>>> Total runtime: 9.157 ms >>>>>>>>> (5 rows) >>>>>>>>> >>>>>>>>> >>>>>>>>> Whereas: >>>>>>>>> >>>>>>>>> explain analyze SELECT count(*) FROM jobqueue limit 10001; >>>>>>>>> >>>>>>>>> QUERY PLAN >>>>>>>>> >>>>>>>>> >>>>>>>>> ----------------------------------------------------------------------------------------------------------------------- >>>>>>>>> ---------------------------------------- >>>>>>>>> Limit (cost=456922.99..456923.00 rows=1 width=0) (actual >>>>>>>>> time=5225.107..5225.109 rows=1 loops=1) >>>>>>>>> -> Aggregate (cost=456922.99..456923.00 rows=1 width=0) >>>>>>>>> (actual time=5225.105..5225.106 rows=1 loops=1) >>>>>>>>> -> Index Only Scan using jobqueue_pkey on jobqueue >>>>>>>>> (cost=0.00..431197.31 rows=10290271 width=0) (actual time >>>>>>>>> =0.108..3090.848 rows=10370209 loops=1) >>>>>>>>> Heap Fetches: 684297 >>>>>>>>> Total runtime: 5225.151 ms >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Paul >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> VP Engineering, >>>>>>>>> Exonar Ltd >>>>>>>>> >>>>>>>>> T: +44 7940 567724 >>>>>>>>> >>>>>>>>> twitter:@exonarco @pboichat >>>>>>>>> W: http://www.exonar.com >>>>>>>>> Nothing is secure. Now what? Exonar Raven >>>>>>>>> <http://video.exonar.com/> >>>>>>>>> >>>>>>>>> Exonar Limited, registered in the UK, registration number 06439969 >>>>>>>>> at 14 West Mills, Newbury, Berkshire, RG14 5HG >>>>>>>>> DISCLAIMER: This email and any attachments to it may be >>>>>>>>> confidential and are intended solely for the use of the individual to >>>>>>>>> whom >>>>>>>>> it is addressed. Any views or opinions expressed are solely those of >>>>>>>>> the >>>>>>>>> author and do not necessarily represent those of Exonar Ltd. If >>>>>>>>> you are not the intended recipient of this email, you must neither >>>>>>>>> take any >>>>>>>>> action based upon its contents, nor copy or show it to anyone. Please >>>>>>>>> contact the sender if you believe you have received this email in >>>>>>>>> error. >>>>>>>>> >>>>>>>>> On Thu, Sep 11, 2014 at 12:25 PM, Karl Wright <[email protected]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hi Paul, >>>>>>>>>> >>>>>>>>>> Could you try this query on your database please and tell me if >>>>>>>>>> it executes promptly: >>>>>>>>>> >>>>>>>>>> SELECT count(*) FROM (SELECT 'x' FROM jobqueue LIMIT 10001) t >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I vaguely remember that I had to change the form of this query in >>>>>>>>>> order to support MySQL -- but first let's see if this helps. >>>>>>>>>> >>>>>>>>>> Karl >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Thu, Sep 11, 2014 at 6:01 AM, Karl Wright <[email protected]> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> I've created a ticket (CONNECTORS-1027) and a trunk-based branch >>>>>>>>>>> (branches/CONNECTORS-1027) for looking at any changes we do for >>>>>>>>>>> large-scale >>>>>>>>>>> Postgresql optimization work. >>>>>>>>>>> >>>>>>>>>>> Please note that trunk code already has schema changes relative >>>>>>>>>>> to MCF 1.7, so you will not be able to work directly with this >>>>>>>>>>> branch >>>>>>>>>>> code. I'll have to create patches for whatever changes you would >>>>>>>>>>> need to >>>>>>>>>>> try. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Karl >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, Sep 11, 2014 at 5:56 AM, Paul Boichat < >>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> We're on Postgres 9.2. I'll get the query plans and add them to >>>>>>>>>>>> the doc. >>>>>>>>>>>> >>>>>>>>>>>> Thanks >>>>>>>>>>>> >>>>>>>>>>>> Paul >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> VP Engineering, >>>>>>>>>>>> Exonar Ltd >>>>>>>>>>>> >>>>>>>>>>>> T: +44 7940 567724 >>>>>>>>>>>> >>>>>>>>>>>> twitter:@exonarco @pboichat >>>>>>>>>>>> W: http://www.exonar.com >>>>>>>>>>>> Nothing is secure. Now what? Exonar Raven >>>>>>>>>>>> <http://video.exonar.com/> >>>>>>>>>>>> >>>>>>>>>>>> Exonar Limited, registered in the UK, registration number >>>>>>>>>>>> 06439969 at 14 West Mills, Newbury, Berkshire, RG14 5HG >>>>>>>>>>>> DISCLAIMER: This email and any attachments to it may be >>>>>>>>>>>> confidential and are intended solely for the use of the individual >>>>>>>>>>>> to whom >>>>>>>>>>>> it is addressed. Any views or opinions expressed are solely those >>>>>>>>>>>> of the >>>>>>>>>>>> author and do not necessarily represent those of Exonar Ltd. If >>>>>>>>>>>> you are not the intended recipient of this email, you must neither >>>>>>>>>>>> take any >>>>>>>>>>>> action based upon its contents, nor copy or show it to anyone. >>>>>>>>>>>> Please >>>>>>>>>>>> contact the sender if you believe you have received this email in >>>>>>>>>>>> error. >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Sep 11, 2014 at 10:51 AM, Karl Wright < >>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Paul, >>>>>>>>>>>>> >>>>>>>>>>>>> Can you include the logged plan for this query; this is an >>>>>>>>>>>>> actual query encountered during crawling: >>>>>>>>>>>>> >>>>>>>>>>>>> WARN 2014-09-05 12:43:39,897 (Worker thread '61') - Found a >>>>>>>>>>>>> long-running query (596499 ms): [SELECT t0.id,t0.dochash,t0.docid >>>>>>>>>>>>> FROM carrydown t1, jobqueue t0 WHERE t1.jobid=? AND >>>>>>>>>>>>> t1.parentidhash=? AND >>>>>>>>>>>>> t0.dochash=t1.childidhash AND t0.jobid=t1.jobid AND t1.isnew=?] >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> These queries are all from the UI; it is what gets generated >>>>>>>>>>>>> when no limits are in place: >>>>>>>>>>>>> >>>>>>>>>>>>> WARN 2014-09-05 12:33:47,445 (http-apr-8081-exec-2) - Found a >>>>>>>>>>>>> long-running query (166845 ms): [SELECT jobid,COUNT(dochash) AS >>>>>>>>>>>>> doccount >>>>>>>>>>>>> FROM jobqueue t1 GROUP BY jobid] >>>>>>>>>>>>> WARN 2014-09-05 12:33:47,908 (http-apr-8081-exec-3) - Found a >>>>>>>>>>>>> long-running query (107222 ms): [SELECT jobid,COUNT(dochash) AS >>>>>>>>>>>>> doccount >>>>>>>>>>>>> FROM jobqueue t1 GROUP BY jobid] >>>>>>>>>>>>> >>>>>>>>>>>>> This query is from the UI with a limit of 1000000: >>>>>>>>>>>>> >>>>>>>>>>>>> WARN 2014-09-05 12:33:45,390 (http-apr-8081-exec-10) - Found a >>>>>>>>>>>>> long-running query (254851 ms): [SELECT COUNT(dochash) AS >>>>>>>>>>>>> doccount FROM >>>>>>>>>>>>> jobqueue t1 LIMIT 1000001] >>>>>>>>>>>>> >>>>>>>>>>>>> I honestly don't understand why PostgreSQL would execute a >>>>>>>>>>>>> sequential scan of the entire table when given a limit clause. It >>>>>>>>>>>>> certainly didn't used to do that. If you have any other >>>>>>>>>>>>> suggestions please >>>>>>>>>>>>> let me know. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Some queries show up in this list because MCF periodically >>>>>>>>>>>>> reindexes tables. For example, this query goes only against the >>>>>>>>>>>>> (small) >>>>>>>>>>>>> jobs table. Its poor performance on occasion is likely due to >>>>>>>>>>>>> something >>>>>>>>>>>>> else happening to the database, probably a reindex: >>>>>>>>>>>>> >>>>>>>>>>>>> WARN 2014-09-05 12:43:40,404 (Finisher thread) - Found a >>>>>>>>>>>>> long-running query (592474 ms): [SELECT id FROM jobs WHERE status >>>>>>>>>>>>> IN >>>>>>>>>>>>> (?,?,?,?,?) FOR UPDATE] >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> The final query is the document stuffing query, which is >>>>>>>>>>>>> perhaps the most critical query in the whole system: >>>>>>>>>>>>> >>>>>>>>>>>>> SELECT >>>>>>>>>>>>> t0.id >>>>>>>>>>>>> ,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount, >>>>>>>>>>>>> t0.priorityset FROM jobqueue t0 >>>>>>>>>>>>> WHERE t0.status IN ('P','G') AND t0.checkaction='R' AND >>>>>>>>>>>>> t0.checktime >>>>>>>>>>>>> <= 1407246846166 >>>>>>>>>>>>> AND EXISTS ( >>>>>>>>>>>>> SELECT 'x' FROM jobs t1 >>>>>>>>>>>>> WHERE t1.status IN ('A','a') AND t1.id=t0.jobid AND >>>>>>>>>>>>> t1.priority=5 >>>>>>>>>>>>> ) >>>>>>>>>>>>> AND NOT EXISTS ( >>>>>>>>>>>>> SELECT 'x' FROM jobqueue t2 >>>>>>>>>>>>> WHERE t2.dochash=t0.dochash AND t2.status IN >>>>>>>>>>>>> ('A','F','a','f','D','d') AND t2.jobid!=t0.jobid >>>>>>>>>>>>> ) >>>>>>>>>>>>> AND NOT EXISTS ( >>>>>>>>>>>>> SELECT 'x' FROM prereqevents t3,events t4 >>>>>>>>>>>>> WHERE t0.id=t3.owner AND t3.eventname=t4.name >>>>>>>>>>>>> ) >>>>>>>>>>>>> ORDER BY t0.docpriority ASC >>>>>>>>>>>>> LIMIT 480; >>>>>>>>>>>>> >>>>>>>>>>>>> Your analysis of whether IN beats OR does not agree with >>>>>>>>>>>>> experiments I did on postgresql 8.7 which showed no difference. >>>>>>>>>>>>> What >>>>>>>>>>>>> Postgresql version are you using? Also, I trust you have query >>>>>>>>>>>>> plans that >>>>>>>>>>>>> demonstrate your claim? In any case, whether IN vs. OR is >>>>>>>>>>>>> generated is a >>>>>>>>>>>>> function of the MCF database driver, so this is trivial to >>>>>>>>>>>>> experiment >>>>>>>>>>>>> with. I'll create a ticket and a branch for experimentation. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Karl >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Sep 11, 2014 at 5:32 AM, Paul Boichat < >>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Karl, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Changing maxcountstatus to something much smaller (10,000) >>>>>>>>>>>>>> doesn't seem to buy us that much on the table scan - in the >>>>>>>>>>>>>> attached you'll >>>>>>>>>>>>>> see that it's still taking a long time to return the job status >>>>>>>>>>>>>> page. Also >>>>>>>>>>>>>> in the attached are some sample other long running queries that >>>>>>>>>>>>>> we're >>>>>>>>>>>>>> beginning to see more frequently. There's also an example of a >>>>>>>>>>>>>> query that's >>>>>>>>>>>>>> frequently executed and regularly takes > 4 secs (plus a >>>>>>>>>>>>>> suggested change >>>>>>>>>>>>>> to improve performance). This one in particular would certainly >>>>>>>>>>>>>> benefit >>>>>>>>>>>>>> from a change to SSDs which should relieve the I/O bound >>>>>>>>>>>>>> bottleneck on >>>>>>>>>>>>>> postgres. >>>>>>>>>>>>>> >>>>>>>>>>>>>> We're loading the system from 10mil towards 100mil so would >>>>>>>>>>>>>> be keen to work with you to optimise where possible. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Paul >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> VP Engineering, >>>>>>>>>>>>>> Exonar Ltd >>>>>>>>>>>>>> >>>>>>>>>>>>>> T: +44 7940 567724 >>>>>>>>>>>>>> >>>>>>>>>>>>>> twitter:@exonarco @pboichat >>>>>>>>>>>>>> W: http://www.exonar.com >>>>>>>>>>>>>> Nothing is secure. Now what? Exonar Raven >>>>>>>>>>>>>> <http://video.exonar.com/> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Exonar Limited, registered in the UK, registration number >>>>>>>>>>>>>> 06439969 at 14 West Mills, Newbury, Berkshire, RG14 5HG >>>>>>>>>>>>>> DISCLAIMER: This email and any attachments to it may be >>>>>>>>>>>>>> confidential and are intended solely for the use of the >>>>>>>>>>>>>> individual to whom >>>>>>>>>>>>>> it is addressed. Any views or opinions expressed are solely >>>>>>>>>>>>>> those of the >>>>>>>>>>>>>> author and do not necessarily represent those of Exonar Ltd. If >>>>>>>>>>>>>> you are not the intended recipient of this email, you must >>>>>>>>>>>>>> neither take any >>>>>>>>>>>>>> action based upon its contents, nor copy or show it to anyone. >>>>>>>>>>>>>> Please >>>>>>>>>>>>>> contact the sender if you believe you have received this email >>>>>>>>>>>>>> in error. >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Sep 10, 2014 at 6:34 PM, Karl Wright < >>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Paul, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The jobstatus query that uses count(*) should be doing >>>>>>>>>>>>>>> something like this when the maxdocumentstatuscount value is >>>>>>>>>>>>>>> set: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> select count(*) from jobqueue where xxx limit 500001 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This will still do a sequential scan, but it will be an >>>>>>>>>>>>>>> aborted one, so you can control the maximum amount of time >>>>>>>>>>>>>>> spent doing the >>>>>>>>>>>>>>> query. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, Sep 10, 2014 at 1:23 PM, Paul Boichat < >>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> We've had a play with maxstatuscount and couldn't stop it >>>>>>>>>>>>>>>> from count(*)-ing but I'll certainly have another look to see >>>>>>>>>>>>>>>> if we've >>>>>>>>>>>>>>>> missed something. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> We're increasingly seeing long running threads and I'll put >>>>>>>>>>>>>>>> together some samples. As an example, on a job that's >>>>>>>>>>>>>>>> currently aborting: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> WARN 2014-09-10 18:37:29,900 (Job reset thread) - Found a >>>>>>>>>>>>>>>> long-running query (72902 ms): [UPDATE jobqueue SET >>>>>>>>>>>>>>>> docpriority=?,priorityset=NULL WHERE jobid=?] >>>>>>>>>>>>>>>> WARN 2014-09-10 18:37:29,900 (Job reset thread) - >>>>>>>>>>>>>>>> Parameter 0: '1.000000001E9' >>>>>>>>>>>>>>>> WARN 2014-09-10 18:37:29,900 (Job reset thread) - >>>>>>>>>>>>>>>> Parameter 1: '1407144048075' >>>>>>>>>>>>>>>> WARN 2014-09-10 18:37:29,960 (Job reset thread) - Plan: >>>>>>>>>>>>>>>> Update on jobqueue (cost=18806.08..445770.39 rows=764916 >>>>>>>>>>>>>>>> width=287) >>>>>>>>>>>>>>>> WARN 2014-09-10 18:37:29,960 (Job reset thread) - Plan: >>>>>>>>>>>>>>>> -> Bitmap Heap Scan on jobqueue (cost=18806.08..445770.39 >>>>>>>>>>>>>>>> rows=764916 >>>>>>>>>>>>>>>> width=287) >>>>>>>>>>>>>>>> WARN 2014-09-10 18:37:29,960 (Job reset thread) - >>>>>>>>>>>>>>>> Plan: Recheck Cond: (jobid = 1407144048075::bigint) >>>>>>>>>>>>>>>> WARN 2014-09-10 18:37:29,960 (Job reset thread) - >>>>>>>>>>>>>>>> Plan: -> Bitmap Index Scan on i1392985450177 >>>>>>>>>>>>>>>> (cost=0.00..18614.85 >>>>>>>>>>>>>>>> rows=764916 width=0) >>>>>>>>>>>>>>>> WARN 2014-09-10 18:37:29,960 (Job reset thread) - >>>>>>>>>>>>>>>> Plan: Index Cond: (jobid = 1407144048075::bigint) >>>>>>>>>>>>>>>> WARN 2014-09-10 18:37:29,960 (Job reset thread) - >>>>>>>>>>>>>>>> WARN 2014-09-10 18:37:30,140 (Job reset thread) - Stats: >>>>>>>>>>>>>>>> n_distinct=4.0 most_common_vals={G,C,Z,P} >>>>>>>>>>>>>>>> most_common_freqs={0.40676665,0.36629999,0.16606666,0.060866665} >>>>>>>>>>>>>>>> WARN 2014-09-10 18:37:30,140 (Job reset thread) - >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Paul >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> VP Engineering, >>>>>>>>>>>>>>>> Exonar Ltd >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> T: +44 7940 567724 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> twitter:@exonarco @pboichat >>>>>>>>>>>>>>>> W: http://www.exonar.com >>>>>>>>>>>>>>>> Nothing is secure. Now what? Exonar Raven >>>>>>>>>>>>>>>> <http://video.exonar.com/> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Exonar Limited, registered in the UK, registration number >>>>>>>>>>>>>>>> 06439969 at 14 West Mills, Newbury, Berkshire, RG14 5HG >>>>>>>>>>>>>>>> DISCLAIMER: This email and any attachments to it may be >>>>>>>>>>>>>>>> confidential and are intended solely for the use of the >>>>>>>>>>>>>>>> individual to whom >>>>>>>>>>>>>>>> it is addressed. Any views or opinions expressed are solely >>>>>>>>>>>>>>>> those of the >>>>>>>>>>>>>>>> author and do not necessarily represent those of Exonar Ltd. If >>>>>>>>>>>>>>>> you are not the intended recipient of this email, you must >>>>>>>>>>>>>>>> neither take any >>>>>>>>>>>>>>>> action based upon its contents, nor copy or show it to anyone. >>>>>>>>>>>>>>>> Please >>>>>>>>>>>>>>>> contact the sender if you believe you have received this email >>>>>>>>>>>>>>>> in error. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, Sep 10, 2014 at 6:14 PM, Karl Wright < >>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi Paul, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> For the jobqueue scans from the UI, there is a parameter >>>>>>>>>>>>>>>>> you can set which limits the number of documents counted to >>>>>>>>>>>>>>>>> at most a >>>>>>>>>>>>>>>>> specified amount. This uses a limit clause, which should >>>>>>>>>>>>>>>>> prevent unbounded >>>>>>>>>>>>>>>>> time doing these kinds of queries: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> org.apache.manifoldcf.ui.maxstatuscount >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The documentation says that the default value for this >>>>>>>>>>>>>>>>> parameter is 10000, which however is incorrect. The actual >>>>>>>>>>>>>>>>> true default is >>>>>>>>>>>>>>>>> 500000. You could set that lower for better UI performance >>>>>>>>>>>>>>>>> (losing some >>>>>>>>>>>>>>>>> information, of course.) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> As for long-running queries, a lot of time and effort has >>>>>>>>>>>>>>>>> been spent in MCF to insure that this doesn't happen. >>>>>>>>>>>>>>>>> Specifically, the >>>>>>>>>>>>>>>>> main document queuing query is structured to read directly >>>>>>>>>>>>>>>>> out of a >>>>>>>>>>>>>>>>> specific jobqueue index. This is the crucial query that must >>>>>>>>>>>>>>>>> work properly >>>>>>>>>>>>>>>>> for scalability, since doing a query that is effectively just >>>>>>>>>>>>>>>>> a sort on the >>>>>>>>>>>>>>>>> entire jobqueue would be a major problem. There are some >>>>>>>>>>>>>>>>> times where >>>>>>>>>>>>>>>>> Postgresql's optimizer fails to do the right thing here, >>>>>>>>>>>>>>>>> mostly because it >>>>>>>>>>>>>>>>> makes a huge distinction between whether there's zero of >>>>>>>>>>>>>>>>> something or one >>>>>>>>>>>>>>>>> of something, but you can work around that particular issue >>>>>>>>>>>>>>>>> by setting the >>>>>>>>>>>>>>>>> analyze count to 1 if you start to see this problem -- which >>>>>>>>>>>>>>>>> basically >>>>>>>>>>>>>>>>> means that reanalysis of the table has to occur on every >>>>>>>>>>>>>>>>> stuffing query. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I'd appreciate seeing the queries that are long-running in >>>>>>>>>>>>>>>>> your case so that I can see if that is what you are >>>>>>>>>>>>>>>>> encountering or not. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Wed, Sep 10, 2014 at 1:01 PM, Paul Boichat < >>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi Karl, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> We're beginning to see issues with a document count > 10 >>>>>>>>>>>>>>>>>> million. At that point, even with good postgres >>>>>>>>>>>>>>>>>> vacuuming the jobqueue table is starting to become a >>>>>>>>>>>>>>>>>> bottleneck. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> For example select count(*) from jobqueue, which is >>>>>>>>>>>>>>>>>> executed when querying job status will do a full table scan >>>>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>>> jobqueue which has more than 10 million rows. That's >>>>>>>>>>>>>>>>>> going to take some time in postgres. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> SSDs will certainly make a big difference to document >>>>>>>>>>>>>>>>>> processing through-put (which we see is largely I/O bound in >>>>>>>>>>>>>>>>>> postgres) but we are increasingly seeing long running >>>>>>>>>>>>>>>>>> queries in the logs. Our current thinking is that we'll need >>>>>>>>>>>>>>>>>> to refactor >>>>>>>>>>>>>>>>>> JobQueue somewhat to optimise queries and, potentially >>>>>>>>>>>>>>>>>> partition jobqueue into a subset of tables (table per >>>>>>>>>>>>>>>>>> queue for example). >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Paul >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> VP Engineering, >>>>>>>>>>>>>>>>>> Exonar Ltd >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> T: +44 7940 567724 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> twitter:@exonarco @pboichat >>>>>>>>>>>>>>>>>> W: http://www.exonar.com >>>>>>>>>>>>>>>>>> Nothing is secure. Now what? Exonar Raven >>>>>>>>>>>>>>>>>> <http://video.exonar.com/> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Exonar Limited, registered in the UK, registration number >>>>>>>>>>>>>>>>>> 06439969 at 14 West Mills, Newbury, Berkshire, RG14 5HG >>>>>>>>>>>>>>>>>> DISCLAIMER: This email and any attachments to it may be >>>>>>>>>>>>>>>>>> confidential and are intended solely for the use of the >>>>>>>>>>>>>>>>>> individual to whom >>>>>>>>>>>>>>>>>> it is addressed. Any views or opinions expressed are solely >>>>>>>>>>>>>>>>>> those of the >>>>>>>>>>>>>>>>>> author and do not necessarily represent those of Exonar Ltd. >>>>>>>>>>>>>>>>>> If >>>>>>>>>>>>>>>>>> you are not the intended recipient of this email, you must >>>>>>>>>>>>>>>>>> neither take any >>>>>>>>>>>>>>>>>> action based upon its contents, nor copy or show it to >>>>>>>>>>>>>>>>>> anyone. Please >>>>>>>>>>>>>>>>>> contact the sender if you believe you have received this >>>>>>>>>>>>>>>>>> email in error. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Wed, Sep 10, 2014 at 3:15 PM, Karl Wright < >>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hi Baptiste, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> ManifoldCF is not limited by the number of agents >>>>>>>>>>>>>>>>>>> processes or parallel connectors. Overall database >>>>>>>>>>>>>>>>>>> performance is the >>>>>>>>>>>>>>>>>>> limiting factor. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I would read this: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> http://manifoldcf.apache.org/release/trunk/en_US/performance-tuning.html >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Also, there's a section in ManifoldCF (I believe Chapter >>>>>>>>>>>>>>>>>>> 2) that discusses this issue. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Some five years ago, I successfully crawled 5 million >>>>>>>>>>>>>>>>>>> web documents, using Postgresql 8.3. Postgresql 9.x is >>>>>>>>>>>>>>>>>>> faster, and with >>>>>>>>>>>>>>>>>>> modern SSD's, I expect that you will do even better. In >>>>>>>>>>>>>>>>>>> general, I'd say >>>>>>>>>>>>>>>>>>> it was fine to shoot for 10M - 100M documents on >>>>>>>>>>>>>>>>>>> ManifoldCF, provided that >>>>>>>>>>>>>>>>>>> you use a good database, and provided that you maintain it >>>>>>>>>>>>>>>>>>> properly. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Wed, Sep 10, 2014 at 10:07 AM, Baptiste Berthier < >>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Hi >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I would like to know what is the maximum number of >>>>>>>>>>>>>>>>>>>> documents that you managed to crawl with ManifoldCF and >>>>>>>>>>>>>>>>>>>> with how many >>>>>>>>>>>>>>>>>>>> connectors in parallel it could works ? >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks for your answer >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Baptiste >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >
