A longer (63-minute) PostgreSQL load test just completed, also without any problems.
Are you sure you included the patch code correctly?

Karl

On Fri, Sep 12, 2014 at 12:51 PM, Karl Wright <[email protected]> wrote:

> It's done actually; my new laptop with PostgreSQL 9.3 can do 111111
> documents in roughly 15 minutes. No problems encountered; roughly 120
> documents per second.
>
> For sanity's sake, could you try the following:
>
> - check out or unpack 1.6.1 sources
> - lay down downloaded lib dependencies
> - build using "ant build"
> - modify properties.xml and start the appropriate example
>
> Please see if you have any problems doing this process *without* any
> patches.
>
> Also, what kind of synchronization are you using? File based, zookeeper,
> or single-process?
>
> Thanks,
> Karl
>
>
> On Fri, Sep 12, 2014 at 12:29 PM, Karl Wright <[email protected]> wrote:
>
>> Hi Paul,
>>
>> The query looks right; the database driver determines the maximum number
>> of clauses in an OR list, just like it does for an IN() list.
>> In the case of PostgreSQL and OR, the limit is 25; for IN()'s it's 100.
>>
>> The standard integration tests generally run small jobs but that is
>> typically sufficient to find query generation problems. I have load tests
>> I can also run but they take several hours to complete. I'll start one
>> now, but I may need to abort it before it finishes.
>>
>> Karl
>>
>>
>> On Fri, Sep 12, 2014 at 11:26 AM, Paul Boichat <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> I'm looking through the logs - can see the change from IN to OR in each
>>> query - and there's clearly a difference in execution path but it's quite
>>> verbose so will take a while.
>>>
>>> It may well be that document state has not been reprioritised or is in some
>>> way inconsistent. However, I don't think it's that which is causing the
>>> issue - I can switch this behaviour on and off by changing the
>>> DBInterfacePostgres class and restarting Manifold.
That seems to suggest a >>> query isn't behaving the same way between IN and OR - I just can't isolate >>> the particular query (yet). >>> >>> Have you tested with a job already in running state (on a restart) with >>> a large document count? For example am seeing this kind of thing which >>> looks messy but appears to execute as you'd expect: >>> >>> SELECT >>> id,dockey,lastversion,lastoutputversion,authorityname,forcedparams FROM >>> ingeststatus WHERE (dockey=? OR dockey=? OR dockey=? OR dockey=? OR >>> dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR >>> dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR >>> dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR >>> dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR >>> dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR >>> dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR >>> dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR >>> dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR >>> dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR >>> dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR >>> dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR >>> dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR >>> dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR >>> dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR >>> dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR >>> dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=?) AND >>> connectionname=?] 
>>> DEBUG 2014-09-12 15:01:27,052 (Thread-542) - Parameter 0: >>> '1407144048431:F42CD76D66FA6BAD396FF8F8A409DD211C184E6A' >>> DEBUG 2014-09-12 15:01:27,052 (Thread-542) - Parameter 1: >>> '1407144048431:FE66CC4054300E4EB2A84138DC9B62B80F59F5B9' >>> >>> >>> >>> >>> VP Engineering, >>> Exonar Ltd >>> >>> T: +44 7940 567724 >>> >>> twitter:@exonarco @pboichat >>> W: http://www.exonar.com >>> Nothing is secure. Now what? Exonar Raven <http://video.exonar.com/> >>> >>> Exonar Limited, registered in the UK, registration number 06439969 at 14 >>> West Mills, Newbury, Berkshire, RG14 5HG >>> DISCLAIMER: This email and any attachments to it may be confidential >>> and are intended solely for the use of the individual to whom it is >>> addressed. Any views or opinions expressed are solely those of the author >>> and do not necessarily represent those of Exonar Ltd. If you are not >>> the intended recipient of this email, you must neither take any action >>> based upon its contents, nor copy or show it to anyone. Please contact >>> the sender if you believe you have received this email in error. >>> >>> On Fri, Sep 12, 2014 at 4:20 PM, Karl Wright <[email protected]> wrote: >>> >>>> Hi Paul, >>>> >>>> The tests in fact do multiple complete crawls, so it is extremely >>>> unlikely that the stuffer query is broken. If you look at the queries >>>> generated, you should note that the only difference is that whenever an xxx >>>> IN(?,?) was generated before, a (xxx=? OR xxx=?) is generated instead. >>>> These should be completely equivalent; if they don't look equivalent to you >>>> in the log, then I will fix whatever is broken. I'll make sure here that >>>> the queries look right visually too. >>>> >>>> One possibility is that when you restarted the agents process, the >>>> jobqueue records did not yet finish getting reprioritized. 
>>>> Stuffer queries are fired all the time, but the running jobs must complete
>>>> reprioritization before the stuffer query will pick up any records. I
>>>> wonder if they may not have managed to get to the right state before you
>>>> aborted the experiment? You can tell what is happening by using jstack to
>>>> get a thread dump of the agents process.
>>>>
>>>> Thanks,
>>>> Karl
>>>>
>>>>
>>>> On Fri, Sep 12, 2014 at 11:05 AM, Paul Boichat <[email protected]> wrote:
>>>>
>>>>> I stayed with base 1.6.1 and manually patched the code to include the
>>>>> two new methods in DBInterfacePostgreSQL
>>>>>
>>>>> Paul
>>>>>
>>>>> On Fri, Sep 12, 2014 at 4:01 PM, Karl Wright <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> The changes pass all tests here. Is it possible that you attempted
>>>>>> some upgrade that failed (or didn't attempt upgrade but went to a new
>>>>>> code version)?
>>>>>>
>>>>>> If you could let me know as exactly as possible what you did, I can
>>>>>> let you know if that should have worked or not.
>>>>>>
>>>>>> Thanks!
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>> On Fri, Sep 12, 2014 at 10:57 AM, Paul Boichat <[email protected]> wrote:
>>>>>>
>>>>>>> Karl,
>>>>>>>
>>>>>>> We appear to be seeing an issue with the performance change to use
>>>>>>> an OR clause rather than IN. After making the change, when we restart
>>>>>>> ManifoldCF (with one job in running state), documents in the running job
>>>>>>> are not picked up for processing by the stuffer thread. If we redeploy
>>>>>>> base 1.6.1 and restart, documents are processed. This is consistently
>>>>>>> switchable depending on which code base is deployed.
>>>>>>>
>>>>>>> We have logs that I could upload to the ticket if you recommend that
>>>>>>> we reopen the issue (or create a new one)?
>>>>>>>
>>>>>>> Paul
>>>>>>>
>>>>>>> On Fri, Sep 12, 2014 at 6:05 AM, Karl Wright <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Paul --
>>>>>>>>
>>>>>>>> Just to be clear -- the branch for CONNECTORS-1027 is a branch of
>>>>>>>> trunk, which is MCF 2.0.
>>>>>>>> MCF 2.0 is not backwards compatible with any
>>>>>>>> previous MCF release, and indeed there is no upgrade from any 1.x
>>>>>>>> release to 2.0. That's why I said to use the patches, and try to stay
>>>>>>>> on 1.6.1 or at most to migrate to 1.7.
>>>>>>>>
>>>>>>>> IF you ALREADY tried an upgrade with the branch code, then you
>>>>>>>> would have wound up in a schema state where the schema had more
>>>>>>>> columns in it than the branch knew how to deal with. That's bad, and
>>>>>>>> you will need to do things to fix the situation. I believe you should
>>>>>>>> still be able to do the following:
>>>>>>>>
>>>>>>>> - Download 1.7 source, or check out
>>>>>>>>   https://svn.apache.org/repos/asf/manifoldcf/branches/release-1.7-branch
>>>>>>>> - Apply the patches
>>>>>>>> - Build
>>>>>>>> - Modify your properties.xml to point to your postgresql instance
>>>>>>>> - Run the upgrade (initialize.bat on the multi-process example, or
>>>>>>>>   start the single-process example)
>>>>>>>>
>>>>>>>> You should then have a working 1.7 release, with code patches
>>>>>>>> applied.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Sep 11, 2014 at 11:34 AM, Paul Boichat <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Thanks - we've pulled down the branch and will test the changes.
>>>>>>>>> It looks like a branch of 1.7 so it's going to take us a little while
>>>>>>>>> to test. We need to migrate our connectors (there's some deprecated
>>>>>>>>> stuff that's now been cleared in 1.7, e.g. getShareACL) and we'll need
>>>>>>>>> to patch the database to include the pipeline and any other schema
>>>>>>>>> changes. We'll have some environment contention over the next week as
>>>>>>>>> our performance test environment needs to remain on 1.6.1 while we
>>>>>>>>> test a release.
>>>>>>>>> Once that's clear I'll move to 1.7.
>>>>>>>>>
>>>>>>>>> On the database schema patch moving from 1.6.1 to 1.7 - is there a
>>>>>>>>> simple way to migrate an existing database?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Paul
>>>>>>>>>
>>>>>>>>> On Thu, Sep 11, 2014 at 1:27 PM, Karl Wright <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks -- I'll include that change as well then, in ticket
>>>>>>>>>> CONNECTORS-1027.
>>>>>>>>>>
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Sep 11, 2014 at 7:45 AM, Paul Boichat <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> That comes back immediately with 10001 rows:
>>>>>>>>>>>
>>>>>>>>>>> explain analyze SELECT count(*) FROM (SELECT 'x' FROM jobqueue
>>>>>>>>>>> LIMIT 10001) t;
>>>>>>>>>>>
>>>>>>>>>>> QUERY PLAN
>>>>>>>>>>> ------------------------------------------------------------
>>>>>>>>>>> Aggregate (cost=544.08..544.09 rows=1 width=0) (actual time=9.125..9.125 rows=1 loops=1)
>>>>>>>>>>>   -> Limit (cost=0.00..419.07 rows=10001 width=0) (actual time=0.033..6.945 rows=10001 loops=1)
>>>>>>>>>>>     -> Index Only Scan using jobqueue_pkey on jobqueue (cost=0.00..431189.31 rows=10290271 width=0) (actual time=0.031..3.257 rows=10001 loops=1)
>>>>>>>>>>>       Heap Fetches: 725
>>>>>>>>>>> Total runtime: 9.157 ms
>>>>>>>>>>> (5 rows)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Whereas:
>>>>>>>>>>>
>>>>>>>>>>> explain analyze SELECT count(*) FROM jobqueue limit 10001;
>>>>>>>>>>>
>>>>>>>>>>> QUERY PLAN
>>>>>>>>>>> ------------------------------------------------------------
>>>>>>>>>>> Limit (cost=456922.99..456923.00 rows=1 width=0) (actual time=5225.107..5225.109 rows=1 loops=1)
>>>>>>>>>>>   -> Aggregate (cost=456922.99..456923.00 rows=1 width=0) (actual time=5225.105..5225.106 rows=1 loops=1)
>>>>>>>>>>>     -> Index Only Scan using jobqueue_pkey on jobqueue (cost=0.00..431197.31 rows=10290271 width=0) (actual time=0.108..3090.848 rows=10370209 loops=1)
>>>>>>>>>>>       Heap Fetches: 684297
>>>>>>>>>>> Total runtime: 5225.151 ms
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Paul
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Sep 11, 2014 at 12:25 PM, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Paul,
>>>>>>>>>>>>
>>>>>>>>>>>> Could you try this query on your database please and tell me if
>>>>>>>>>>>> it executes promptly:
>>>>>>>>>>>>
>>>>>>>>>>>> SELECT count(*) FROM (SELECT 'x' FROM jobqueue LIMIT 10001) t
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I vaguely remember that I had to change the form of this query
>>>>>>>>>>>> in order to support MySQL -- but first let's see if this helps.
>>>>>>>>>>>>
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Sep 11, 2014 at 6:01 AM, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I've created a ticket (CONNECTORS-1027) and a trunk-based
>>>>>>>>>>>>> branch (branches/CONNECTORS-1027) for looking at any changes we
>>>>>>>>>>>>> do for large-scale Postgresql optimization work.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please note that trunk code already has schema changes
>>>>>>>>>>>>> relative to MCF 1.7, so you will not be able to work directly
>>>>>>>>>>>>> with this branch code. I'll have to create patches for whatever
>>>>>>>>>>>>> changes you would need to try.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Sep 11, 2014 at 5:56 AM, Paul Boichat <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We're on Postgres 9.2. I'll get the query plans and add them
>>>>>>>>>>>>>> to the doc.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Paul
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Sep 11, 2014 at 10:51 AM, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Paul,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can you include the logged plan for this query; this is an
>>>>>>>>>>>>>>> actual query encountered during crawling:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> WARN 2014-09-05 12:43:39,897 (Worker thread '61') - Found a
>>>>>>>>>>>>>>> long-running query (596499 ms): [SELECT t0.id,t0.dochash,t0.docid
>>>>>>>>>>>>>>> FROM carrydown t1, jobqueue t0 WHERE t1.jobid=? AND
>>>>>>>>>>>>>>> t1.parentidhash=? AND t0.dochash=t1.childidhash AND
>>>>>>>>>>>>>>> t0.jobid=t1.jobid AND t1.isnew=?]
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> These queries are all from the UI; it is what gets generated
>>>>>>>>>>>>>>> when no limits are in place:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> WARN 2014-09-05 12:33:47,445 (http-apr-8081-exec-2) - Found
>>>>>>>>>>>>>>> a long-running query (166845 ms): [SELECT jobid,COUNT(dochash)
>>>>>>>>>>>>>>> AS doccount FROM jobqueue t1 GROUP BY jobid]
>>>>>>>>>>>>>>> WARN 2014-09-05 12:33:47,908 (http-apr-8081-exec-3) - Found
>>>>>>>>>>>>>>> a long-running query (107222 ms): [SELECT jobid,COUNT(dochash)
>>>>>>>>>>>>>>> AS doccount FROM jobqueue t1 GROUP BY jobid]
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This query is from the UI with a limit of 1000000:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> WARN 2014-09-05 12:33:45,390 (http-apr-8081-exec-10) - Found
>>>>>>>>>>>>>>> a long-running query (254851 ms): [SELECT COUNT(dochash) AS
>>>>>>>>>>>>>>> doccount FROM jobqueue t1 LIMIT 1000001]
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I honestly
>>>>>>>>>>>>>>> don't understand why PostgreSQL would execute a
>>>>>>>>>>>>>>> sequential scan of the entire table when given a limit clause.
>>>>>>>>>>>>>>> It certainly didn't use to do that. If you have any other
>>>>>>>>>>>>>>> suggestions please let me know.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Some queries show up in this list because MCF periodically
>>>>>>>>>>>>>>> reindexes tables. For example, this query goes only against
>>>>>>>>>>>>>>> the (small) jobs table. Its poor performance on occasion is
>>>>>>>>>>>>>>> likely due to something else happening to the database,
>>>>>>>>>>>>>>> probably a reindex:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> WARN 2014-09-05 12:43:40,404 (Finisher thread) - Found a
>>>>>>>>>>>>>>> long-running query (592474 ms): [SELECT id FROM jobs WHERE
>>>>>>>>>>>>>>> status IN (?,?,?,?,?) FOR UPDATE]
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The final query is the document stuffing query, which is
>>>>>>>>>>>>>>> perhaps the most critical query in the whole system:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> SELECT t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,
>>>>>>>>>>>>>>> t0.priorityset FROM jobqueue t0
>>>>>>>>>>>>>>> WHERE t0.status IN ('P','G') AND t0.checkaction='R' AND
>>>>>>>>>>>>>>> t0.checktime <= 1407246846166
>>>>>>>>>>>>>>> AND EXISTS (
>>>>>>>>>>>>>>>   SELECT 'x' FROM jobs t1
>>>>>>>>>>>>>>>   WHERE t1.status IN ('A','a') AND t1.id=t0.jobid AND
>>>>>>>>>>>>>>>   t1.priority=5
>>>>>>>>>>>>>>> )
>>>>>>>>>>>>>>> AND NOT EXISTS (
>>>>>>>>>>>>>>>   SELECT 'x' FROM jobqueue t2
>>>>>>>>>>>>>>>   WHERE t2.dochash=t0.dochash AND t2.status IN
>>>>>>>>>>>>>>>   ('A','F','a','f','D','d') AND t2.jobid!=t0.jobid
>>>>>>>>>>>>>>> )
>>>>>>>>>>>>>>> AND NOT EXISTS (
>>>>>>>>>>>>>>>   SELECT 'x' FROM prereqevents t3,events t4
>>>>>>>>>>>>>>>   WHERE t0.id=t3.owner AND t3.eventname=t4.name
>>>>>>>>>>>>>>> )
>>>>>>>>>>>>>>> ORDER BY t0.docpriority ASC
>>>>>>>>>>>>>>> LIMIT 480;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Your analysis of whether IN beats OR does not agree with
>>>>>>>>>>>>>>> experiments I did on postgresql 8.7 which showed no difference.
>>>>>>>>>>>>>>> What Postgresql version are you using? Also, I trust you have
>>>>>>>>>>>>>>> query plans that demonstrate your claim? In any case, whether
>>>>>>>>>>>>>>> IN vs. OR is generated is a function of the MCF database
>>>>>>>>>>>>>>> driver, so this is trivial to experiment with. I'll create a
>>>>>>>>>>>>>>> ticket and a branch for experimentation.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Sep 11, 2014 at 5:32 AM, Paul Boichat <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Changing maxstatuscount to something much smaller (10,000)
>>>>>>>>>>>>>>>> doesn't seem to buy us that much on the table scan - in the
>>>>>>>>>>>>>>>> attached you'll see that it's still taking a long time to
>>>>>>>>>>>>>>>> return the job status page. Also in the attached are some
>>>>>>>>>>>>>>>> sample other long running queries that we're beginning to see
>>>>>>>>>>>>>>>> more frequently. There's also an example of a query that's
>>>>>>>>>>>>>>>> frequently executed and regularly takes > 4 secs (plus a
>>>>>>>>>>>>>>>> suggested change to improve performance). This one in
>>>>>>>>>>>>>>>> particular would certainly benefit from a change to SSDs
>>>>>>>>>>>>>>>> which should relieve the I/O bound bottleneck on postgres.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We're loading the system from 10mil towards 100mil so would
>>>>>>>>>>>>>>>> be keen to work with you to optimise where possible.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Paul
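A note on the two count queries compared in this thread: a LIMIT placed on an aggregate query applies to the single result row, so every row is still counted, whereas a LIMIT inside a subquery bounds the rows that count(*) ever sees. The sketch below shows only that difference in result semantics. It uses Python's built-in sqlite3 module and a toy jobqueue table as stand-ins (both are assumptions for illustration; the helper names are hypothetical), and it says nothing about PostgreSQL plans or timings.

```python
import sqlite3


def make_toy_jobqueue(rows):
    """Create an in-memory table standing in for jobqueue (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE jobqueue (id INTEGER PRIMARY KEY, dochash TEXT)")
    conn.executemany(
        "INSERT INTO jobqueue (dochash) VALUES (?)",
        [(f"h{i}",) for i in range(rows)],
    )
    return conn


def count_outer_limit(conn, cap):
    # LIMIT applies to the one-row aggregate result, so all rows are counted.
    sql = "SELECT count(*) FROM jobqueue LIMIT ?"
    return conn.execute(sql, (cap + 1,)).fetchone()[0]


def count_bounded(conn, cap):
    # LIMIT inside the subquery caps the rows fed to count(*) at cap + 1.
    sql = "SELECT count(*) FROM (SELECT 'x' FROM jobqueue LIMIT ?) t"
    return conn.execute(sql, (cap + 1,)).fetchone()[0]


if __name__ == "__main__":
    conn = make_toy_jobqueue(50)
    print(count_outer_limit(conn, 10))  # full table count
    print(count_bounded(conn, 10))      # at most cap + 1
```

A result of cap + 1 from the bounded form signals "more than cap rows", which is all a status page with a display ceiling needs to know.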
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Sep 10, 2014 at 6:34 PM, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Paul,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The jobstatus query that uses count(*) should be doing
>>>>>>>>>>>>>>>>> something like this when the maxstatuscount value is set:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> select count(*) from jobqueue where xxx limit 500001
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This will still do a sequential scan, but it will be an
>>>>>>>>>>>>>>>>> aborted one, so you can control the maximum amount of time
>>>>>>>>>>>>>>>>> spent doing the query.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Sep 10, 2014 at 1:23 PM, Paul Boichat <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We've had a play with maxstatuscount and couldn't stop it
>>>>>>>>>>>>>>>>>> from count(*)-ing but I'll certainly have another look to
>>>>>>>>>>>>>>>>>> see if we've missed something.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We're increasingly seeing long running threads and I'll
>>>>>>>>>>>>>>>>>> put together some samples. As an example, on a job that's
>>>>>>>>>>>>>>>>>> currently aborting:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> WARN 2014-09-10 18:37:29,900 (Job reset thread) - Found a
>>>>>>>>>>>>>>>>>> long-running query (72902 ms): [UPDATE jobqueue SET
>>>>>>>>>>>>>>>>>> docpriority=?,priorityset=NULL WHERE jobid=?]
>>>>>>>>>>>>>>>>>> WARN 2014-09-10 18:37:29,900 (Job reset thread) - Parameter 0: '1.000000001E9'
>>>>>>>>>>>>>>>>>> WARN 2014-09-10 18:37:29,900 (Job reset thread) - Parameter 1: '1407144048075'
>>>>>>>>>>>>>>>>>> WARN 2014-09-10 18:37:29,960 (Job reset thread) - Plan: Update on jobqueue (cost=18806.08..445770.39 rows=764916 width=287)
>>>>>>>>>>>>>>>>>> WARN 2014-09-10 18:37:29,960 (Job reset thread) - Plan: -> Bitmap Heap Scan on jobqueue (cost=18806.08..445770.39 rows=764916 width=287)
>>>>>>>>>>>>>>>>>> WARN 2014-09-10 18:37:29,960 (Job reset thread) - Plan: Recheck Cond: (jobid = 1407144048075::bigint)
>>>>>>>>>>>>>>>>>> WARN 2014-09-10 18:37:29,960 (Job reset thread) - Plan: -> Bitmap Index Scan on i1392985450177 (cost=0.00..18614.85 rows=764916 width=0)
>>>>>>>>>>>>>>>>>> WARN 2014-09-10 18:37:29,960 (Job reset thread) - Plan: Index Cond: (jobid = 1407144048075::bigint)
>>>>>>>>>>>>>>>>>> WARN 2014-09-10 18:37:29,960 (Job reset thread) -
>>>>>>>>>>>>>>>>>> WARN 2014-09-10 18:37:30,140 (Job reset thread) - Stats: n_distinct=4.0 most_common_vals={G,C,Z,P} most_common_freqs={0.40676665,0.36629999,0.16606666,0.060866665}
>>>>>>>>>>>>>>>>>> WARN 2014-09-10 18:37:30,140 (Job reset thread) -
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Paul
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Sep 10, 2014 at 6:14 PM, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Paul,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> For the jobqueue scans from the UI, there is a parameter
>>>>>>>>>>>>>>>>>>> you can set which limits the number of documents counted to
>>>>>>>>>>>>>>>>>>> at most a specified amount. This uses a limit clause, which
>>>>>>>>>>>>>>>>>>> should prevent unbounded time doing these kinds of queries:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> org.apache.manifoldcf.ui.maxstatuscount
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The documentation says that the default value for this
>>>>>>>>>>>>>>>>>>> parameter is 10000, which however is incorrect. The actual
>>>>>>>>>>>>>>>>>>> true default is 500000. You could set that lower for better
>>>>>>>>>>>>>>>>>>> UI performance (losing some information, of course.)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> As for long-running queries, a lot of time and effort
>>>>>>>>>>>>>>>>>>> has been spent in MCF to ensure that this doesn't happen.
>>>>>>>>>>>>>>>>>>> Specifically, the main document queuing query is structured
>>>>>>>>>>>>>>>>>>> to read directly out of a specific jobqueue index. This is
>>>>>>>>>>>>>>>>>>> the crucial query that must work properly for scalability,
>>>>>>>>>>>>>>>>>>> since doing a query that is effectively just a sort on the
>>>>>>>>>>>>>>>>>>> entire jobqueue would be a major problem. There are some
>>>>>>>>>>>>>>>>>>> times where Postgresql's optimizer fails to do the right
>>>>>>>>>>>>>>>>>>> thing here, mostly because it makes a huge distinction
>>>>>>>>>>>>>>>>>>> between whether there's zero of something or one of
>>>>>>>>>>>>>>>>>>> something, but you can work around that particular issue by
>>>>>>>>>>>>>>>>>>> setting the analyze count to 1 if you start to see this
>>>>>>>>>>>>>>>>>>> problem -- which basically means that reanalysis of the
>>>>>>>>>>>>>>>>>>> table has to occur on every stuffing query.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I'd appreciate seeing the queries that are long-running
>>>>>>>>>>>>>>>>>>> in your case so that I can see if that is what you are
>>>>>>>>>>>>>>>>>>> encountering or not.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, Sep 10, 2014 at 1:01 PM, Paul Boichat <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> We're beginning to see issues with a document count >
>>>>>>>>>>>>>>>>>>>> 10 million.
>>>>>>>>>>>>>>>>>>>> At that point, even with good postgres vacuuming, the jobqueue table is starting to become a bottleneck.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> For example, select count(*) from jobqueue, which is executed when querying job status, will do a full table scan of jobqueue, which has more than 10 million rows. That's going to take some time in postgres.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> SSDs will certainly make a big difference to document processing throughput (which we see is largely I/O bound in postgres), but we are increasingly seeing long-running queries in the logs. Our current thinking is that we'll need to refactor JobQueue somewhat to optimise queries and, potentially, partition jobqueue into a subset of tables (table per queue, for example).
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Paul
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> VP Engineering,
>>>>>>>>>>>>>>>>>>>> Exonar Ltd
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> T: +44 7940 567724
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> twitter:@exonarco @pboichat
>>>>>>>>>>>>>>>>>>>> W: http://www.exonar.com
>>>>>>>>>>>>>>>>>>>> Nothing is secure. Now what? Exonar Raven <http://video.exonar.com/>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Wed, Sep 10, 2014 at 3:15 PM, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi Baptiste,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> ManifoldCF is not limited by the number of agents processes or parallel connectors. Overall database performance is the limiting factor.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I would read this:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> http://manifoldcf.apache.org/release/trunk/en_US/performance-tuning.html
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Also, there's a section in ManifoldCF (I believe Chapter 2) that discusses this issue.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Some five years ago, I successfully crawled 5 million web documents, using Postgresql 8.3. Postgresql 9.x is faster, and with modern SSDs, I expect that you will do even better. In general, I'd say it is fine to shoot for 10M - 100M documents on ManifoldCF, provided that you use a good database and maintain it properly.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, Sep 10, 2014 at 10:07 AM, Baptiste Berthier <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I would like to know the maximum number of documents that you have managed to crawl with ManifoldCF, and how many connectors it can work with in parallel.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks for your answer
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Baptiste
