Re: Apache ManifoldCF Performance

Karl Wright Fri, 12 Sep 2014 09:52:09 -0700

It's done actually; my new laptop with PostgreSQL 9.3 can do 111111
documents in roughly 15 minutes.  No problems encountered; roughly 120
documents per second.
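As a quick check of that figure, the arithmetic can be sketched in Python. The function name is only illustrative, and the run is taken as exactly 15 minutes even though the timing above is approximate.

```python
def docs_per_second(documents: int, minutes: float) -> float:
    """Average throughput for a crawl of `documents` that took `minutes`."""
    return documents / (minutes * 60.0)


# Figures quoted above: 111111 documents in roughly 15 minutes.
rate = docs_per_second(111111, 15)
print(f"{rate:.1f} documents/second")
```

With exactly 900 seconds this comes out slightly above the rounded figure of 120 per second.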


For sanity's sake, could you try the following:

- check out or unpack 1.6.1 sources
- lay down downloaded lib dependencies
- build using "ant build"
- modify properties.xml and start the appropriate example

Please see if you have any problems doing this process *without* any
patches.

Also, what kind of synchronization are you using?  File based, zookeeper,
or single-process?

Thanks,
Karl


On Fri, Sep 12, 2014 at 12:29 PM, Karl Wright <[email protected]> wrote:

> Hi Paul,
>
> The query looks right; the database driver determines the maximum number
> of clauses in an OR list, just as it does for an IN() list.
> In the case of PostgreSQL and OR, the limit is 25; for IN() lists it's 100.
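To illustrate the idea of a driver-imposed clause limit, here is a minimal Python sketch of how a list of keys might be split into batches and rendered either as an IN() list or as an OR list. It is not ManifoldCF's Java code: the function names are hypothetical, and the limits of 25 and 100 are simply the values quoted in this thread.

```python
from typing import Iterator, List, Sequence, Tuple

# Limits quoted in this thread for PostgreSQL; treated here as assumptions.
MAX_OR_CLAUSES = 25
MAX_IN_CLAUSES = 100


def chunked(values: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `values` holding at most `size` items."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def in_clause(column: str, values: Sequence[str]) -> Tuple[str, List[str]]:
    """Build `column IN (?,?,...)` together with its bound parameters."""
    placeholders = ",".join("?" for _ in values)
    return f"{column} IN ({placeholders})", list(values)


def or_clause(column: str, values: Sequence[str]) -> Tuple[str, List[str]]:
    """Build `(column=? OR column=? ...)` together with its bound parameters."""
    body = " OR ".join(f"{column}=?" for _ in values)
    return f"({body})", list(values)


keys = [f"key{i}" for i in range(60)]
or_batches = [or_clause("dockey", c) for c in chunked(keys, MAX_OR_CLAUSES)]
in_batches = [in_clause("dockey", c) for c in chunked(keys, MAX_IN_CLAUSES)]

print(len(or_batches), len(in_batches))    # 3 batches for OR, 1 for IN
print(or_clause("dockey", ["a", "b"])[0])  # (dockey=? OR dockey=?)
print(in_clause("dockey", ["a", "b"])[0])  # dockey IN (?,?)
```

Both forms bind the same parameters in the same order, so for a given batch they should select the same rows; only the batch size differs.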
>
> The standard integration tests generally run small jobs but that is
> typically sufficient to find query generation problems.  I have load tests
> I can also run but they take several hours to complete.  I'll start one
> now, but I may need to abort it before it finishes.
>
> Karl
>
>
> On Fri, Sep 12, 2014 at 11:26 AM, Paul Boichat <[email protected]>
> wrote:
>
>> Hi,
>>
>> I'm looking through the logs - can see the change from IN to OR in each
>> query - and there's clearly a difference in execution path but it's quite
>> verbose so will take a while.
>>
>> It may well be that document state has not been reprioritised or is in some
>> way inconsistent. However, I don't think that is what's causing the
>> issue - I can switch this behaviour on and off by changing the
>> DBInterfacePostgreSQL class and restarting Manifold. That seems to suggest a
>> query isn't behaving the same way between IN and OR - I just can't isolate
>> the particular query (yet).
>>
>> Have you tested with a job already in running state (on a restart) with a
>> large document count? For example, I am seeing this kind of thing, which
>> looks messy but appears to execute as you'd expect:
>>
>> SELECT id,dockey,lastversion,lastoutputversion,authorityname,forcedparams
>> FROM ingeststatus WHERE  (dockey=? OR dockey=? OR dockey=? OR dockey=? OR
>> dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR
>> dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR
>> dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR
>> dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR
>> dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR
>> dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR
>> dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR
>> dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR
>> dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR
>> dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR
>> dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR
>> dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR
>> dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR
>> dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR
>> dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR
>> dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=? OR dockey=?) AND
>> connectionname=?]
>> DEBUG 2014-09-12 15:01:27,052 (Thread-542) -   Parameter 0:
>> '1407144048431:F42CD76D66FA6BAD396FF8F8A409DD211C184E6A'
>> DEBUG 2014-09-12 15:01:27,052 (Thread-542) -   Parameter 1:
>> '1407144048431:FE66CC4054300E4EB2A84138DC9B62B80F59F5B9'
>>
>>
>>
>>
>> VP Engineering,
>> Exonar Ltd
>>
>> T: +44 7940 567724
>>
>> twitter:@exonarco @pboichat
>> W: http://www.exonar.com
>> Nothing is secure. Now what? Exonar Raven <http://video.exonar.com/>
>>
>> Exonar Limited, registered in the UK, registration number 06439969 at 14
>> West Mills, Newbury, Berkshire, RG14 5HG
>> DISCLAIMER: This email and any attachments to it may be confidential and
>> are intended solely for the use of the individual to whom it is addressed.
>> Any views or opinions expressed are solely those of the author and do not
>> necessarily represent those of Exonar Ltd. If you are not the intended
>> recipient of this email, you must neither take any action based upon its
>> contents, nor copy or show it to anyone. Please contact the sender if
>> you believe you have received this email in error.
>>
>> On Fri, Sep 12, 2014 at 4:20 PM, Karl Wright <[email protected]> wrote:
>>
>>> Hi Paul,
>>>
>>> The tests in fact do multiple complete crawls, so it is extremely
>>> unlikely that the stuffer query is broken.  If you look at the queries
>>> generated, you should note that the only difference is that whenever an xxx
>>> IN(?,?) was generated before, a (xxx=? OR xxx=?) is generated instead.
>>> These should be completely equivalent; if they don't look equivalent to you
>>> in the log, then I will fix whatever is broken.  I'll make sure here that
>>> the queries look right visually too.
>>>
>>> One possibility is that when you restarted the agents process, the
>>> jobqueue records did not yet finish getting reprioritized.  Stuffer queries
>>> are fired all the time, but the running jobs must complete reprioritization
>>> before the stuffer query will pick up any records.  I wonder if they may
>>> not have managed to get to the right state before you aborted the
>>> experiment?  You can tell what is happening by using jstack to get a thread
>>> dump of the agents process.
>>>
>>> Thanks,
>>> Karl
>>>
>>>
>>> On Fri, Sep 12, 2014 at 11:05 AM, Paul Boichat <[email protected]>
>>> wrote:
>>>
>>>> I stayed with base 1.6.1 and manually patched the code to include the
>>>> two new methods in DBInterfacePostgreSQL
>>>>
>>>> Paul
>>>>
>>>>
>>>> On Fri, Sep 12, 2014 at 4:01 PM, Karl Wright <[email protected]>
>>>> wrote:
>>>>
>>>>> The changes pass all tests here.  Is it possible that you attempted
>>>>> some upgrade that failed (or didn't attempt upgrade but went to a new code
>>>>> version)?
>>>>>
>>>>> If you could let me know as exactly as possible what you did, I can
>>>>> let you know if that should have worked or not.
>>>>>
>>>>> Thanks!
>>>>> Karl
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Sep 12, 2014 at 10:57 AM, Paul Boichat <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Karl,
>>>>>>
>>>>>> We appear to be seeing an issue with the performance change to use an
>>>>>> OR clause rather than IN. After making the change, when we restart
>>>>>> ManifoldCF (with one job in running state), documents in the running job
>>>>>> are not picked up for processing by the stuffer thread. If we redeploy
>>>>>> base 1.6.1 and restart, documents are processed. This is consistently
>>>>>> switchable depending on which code base is deployed.
>>>>>>
>>>>>> We have logs that I could upload to the ticket if you recommend that
>>>>>> we reopen the issue (or create a new one)?
>>>>>>
>>>>>> Paul
>>>>>>
>>>>>>
>>>>>> On Fri, Sep 12, 2014 at 6:05 AM, Karl Wright <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Paul --
>>>>>>>
>>>>>>> Just to be clear -- the branch for CONNECTORS-1027 is a branch of
>>>>>>> trunk, which is MCF 2.0.  MCF 2.0 is not backwards compatible with any
>>>>>>> previous MCF release, and indeed there is no upgrade from any 1.x
>>>>>>> release to 2.0.  That's why I said to use the patches, and try to stay
>>>>>>> on 1.6.1 or at most to migrate to 1.7.
>>>>>>>
>>>>>>> IF you ALREADY tried an upgrade with the branch code, then you would
>>>>>>> have wound up in a schema state where the schema had more columns in it
>>>>>>> than the branch knew how to deal with.  That's bad, and you will need
>>>>>>> to do things to fix the situation.  I believe you should still be able
>>>>>>> to do the following:
>>>>>>>
>>>>>>> - Download 1.7 source, or check out
>>>>>>> https://svn.apache.org/repos/asf/manifoldcf/branches/release-1.7-branch
>>>>>>> - Apply the patches
>>>>>>> - Build
>>>>>>> - Modify your properties.xml to point to your postgresql instance
>>>>>>> - Run the upgrade (initialize.bat on the multi-process example, or
>>>>>>> start the single-process example)
>>>>>>>
>>>>>>> You should then have a working 1.7 release, with code patches
>>>>>>> applied.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Sep 11, 2014 at 11:34 AM, Paul Boichat <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Thanks - we've pulled down the branch and will test the changes. It
>>>>>>>> looks like a branch of 1.7 so it's going to take us a little while to
>>>>>>>> test. We need to migrate our connectors (there's some deprecated stuff
>>>>>>>> that's now been cleared in 1.7, e.g. getShareACL) and we'll need to
>>>>>>>> patch the database to include the pipeline and any other schema
>>>>>>>> changes. We'll have some environment contention over the next week as
>>>>>>>> our performance test environment needs to remain on 1.6.1 while we
>>>>>>>> test a release. Once that's clear I'll move to 1.7.
>>>>>>>>
>>>>>>>> On the database schema patch moving from 1.6.1 to 1.7 - is there a
>>>>>>>> simple way to migrate an existing database?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Paul
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Sep 11, 2014 at 1:27 PM, Karl Wright <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks -- I'll include that change as well then, in ticket
>>>>>>>>> CONNECTORS-1027.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Sep 11, 2014 at 7:45 AM, Paul Boichat <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> That comes back immediately with 10001 rows:
>>>>>>>>>>
>>>>>>>>>> explain analyze SELECT count(*) FROM (SELECT 'x' FROM jobqueue
>>>>>>>>>> LIMIT 10001) t;
>>>>>>>>>>
>>>>>>>>>> QUERY PLAN
>>>>>>>>>> ------------------------------------------------------------
>>>>>>>>>>  Aggregate  (cost=544.08..544.09 rows=1 width=0)
>>>>>>>>>>    (actual time=9.125..9.125 rows=1 loops=1)
>>>>>>>>>>    ->  Limit  (cost=0.00..419.07 rows=10001 width=0)
>>>>>>>>>>        (actual time=0.033..6.945 rows=10001 loops=1)
>>>>>>>>>>          ->  Index Only Scan using jobqueue_pkey on jobqueue
>>>>>>>>>>              (cost=0.00..431189.31 rows=10290271 width=0)
>>>>>>>>>>              (actual time=0.031..3.257 rows=10001 loops=1)
>>>>>>>>>>                Heap Fetches: 725
>>>>>>>>>>  Total runtime: 9.157 ms
>>>>>>>>>> (5 rows)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Whereas:
>>>>>>>>>>
>>>>>>>>>> explain analyze SELECT count(*) FROM jobqueue limit 10001;
>>>>>>>>>>
>>>>>>>>>> QUERY PLAN
>>>>>>>>>> ------------------------------------------------------------
>>>>>>>>>>  Limit  (cost=456922.99..456923.00 rows=1 width=0)
>>>>>>>>>>    (actual time=5225.107..5225.109 rows=1 loops=1)
>>>>>>>>>>    ->  Aggregate  (cost=456922.99..456923.00 rows=1 width=0)
>>>>>>>>>>        (actual time=5225.105..5225.106 rows=1 loops=1)
>>>>>>>>>>          ->  Index Only Scan using jobqueue_pkey on jobqueue
>>>>>>>>>>              (cost=0.00..431197.31 rows=10290271 width=0)
>>>>>>>>>>              (actual time=0.108..3090.848 rows=10370209 loops=1)
>>>>>>>>>>                Heap Fetches: 684297
>>>>>>>>>>  Total runtime: 5225.151 ms
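The difference between the two plans follows from where LIMIT is applied: placed after an aggregate, it only limits the single result row, so every row is still counted, whereas inside a subquery it bounds the rows fed to count(*). The sketch below shows that semantic difference using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The table size of 50000 is arbitrary, and the example demonstrates only the results, not PostgreSQL's timings or plans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobqueue (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO jobqueue (id) VALUES (?)",
    ((i,) for i in range(50000)),
)

# LIMIT after the aggregate: count(*) sees every row, and LIMIT then
# applies to the one-row result, so it bounds nothing.
outer = conn.execute(
    "SELECT count(*) FROM jobqueue LIMIT 10001"
).fetchone()[0]

# LIMIT inside the subquery: at most 10001 rows reach count(*).
inner = conn.execute(
    "SELECT count(*) FROM (SELECT 'x' FROM jobqueue LIMIT 10001) t"
).fetchone()[0]

print(outer, inner)  # 50000 10001
conn.close()
```

A result of exactly 10001 from the bounded form means "more than 10000", which is enough for a capped status display.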
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Paul
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Sep 11, 2014 at 12:25 PM, Karl Wright <[email protected]
>>>>>>>>>> > wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Paul,
>>>>>>>>>>>
>>>>>>>>>>> Could you try this query on your database please and tell me if
>>>>>>>>>>> it executes promptly:
>>>>>>>>>>>
>>>>>>>>>>> SELECT count(*) FROM (SELECT 'x' FROM jobqueue LIMIT 10001) t
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I vaguely remember that I had to change the form of this query
>>>>>>>>>>> in order to support MySQL -- but first let's see if this helps.
>>>>>>>>>>>
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Sep 11, 2014 at 6:01 AM, Karl Wright <[email protected]
>>>>>>>>>>> > wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I've created a ticket (CONNECTORS-1027) and a trunk-based
>>>>>>>>>>>> branch (branches/CONNECTORS-1027) for looking at any changes we do
>>>>>>>>>>>> for large-scale Postgresql optimization work.
>>>>>>>>>>>>
>>>>>>>>>>>> Please note that trunk code already has schema changes relative
>>>>>>>>>>>> to MCF 1.7, so you will not be able to work directly with this
>>>>>>>>>>>> branch code.  I'll have to create patches for whatever changes you
>>>>>>>>>>>> would need to try.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Sep 11, 2014 at 5:56 AM, Paul Boichat <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> We're on Postgres 9.2. I'll get the query plans and add them
>>>>>>>>>>>>> to the doc.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>
>>>>>>>>>>>>> Paul
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Sep 11, 2014 at 10:51 AM, Karl Wright <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Paul,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can you include the logged plan for this query?  This is an
>>>>>>>>>>>>>> actual query encountered during crawling:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> WARN 2014-09-05 12:43:39,897 (Worker thread '61') - Found a
>>>>>>>>>>>>>> long-running query (596499 ms): [SELECT t0.id,t0.dochash,t0.docid
>>>>>>>>>>>>>> FROM carrydown t1, jobqueue t0 WHERE t1.jobid=? AND
>>>>>>>>>>>>>> t1.parentidhash=? AND t0.dochash=t1.childidhash AND
>>>>>>>>>>>>>> t0.jobid=t1.jobid AND t1.isnew=?]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> These queries are all from the UI; they are what gets generated
>>>>>>>>>>>>>> when no limits are in place:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  WARN 2014-09-05 12:33:47,445 (http-apr-8081-exec-2) - Found
>>>>>>>>>>>>>> a long-running query (166845 ms): [SELECT jobid,COUNT(dochash)
>>>>>>>>>>>>>> AS doccount FROM jobqueue t1 GROUP BY jobid]
>>>>>>>>>>>>>>  WARN 2014-09-05 12:33:47,908 (http-apr-8081-exec-3) - Found
>>>>>>>>>>>>>> a long-running query (107222 ms): [SELECT jobid,COUNT(dochash)
>>>>>>>>>>>>>> AS doccount FROM jobqueue t1 GROUP BY jobid]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This query is from the UI with a limit of 1000000:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> WARN 2014-09-05 12:33:45,390 (http-apr-8081-exec-10) - Found
>>>>>>>>>>>>>> a long-running query (254851 ms): [SELECT COUNT(dochash) AS
>>>>>>>>>>>>>> doccount FROM jobqueue t1 LIMIT 1000001]
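When the log is verbose, warnings of this shape can be pulled out and ranked by duration with a short script. The following is a rough Python sketch, assuming each warning matches the "Found a long-running query (N ms): [...]" form shown above and that the SQL text contains no "]" character. The function name and sample text are illustrative only.

```python
import re
from typing import List, Tuple

# Matches the duration and the bracketed SQL of a long-running query warning.
LONG_QUERY = re.compile(
    r"Found a long-running query \((\d+) ms\): \[(.*?)\]", re.DOTALL
)


def slowest_queries(log_text: str) -> List[Tuple[int, str]]:
    """Return (milliseconds, sql) pairs, slowest first."""
    found = []
    for match in LONG_QUERY.finditer(log_text):
        millis = int(match.group(1))
        sql = " ".join(match.group(2).split())  # collapse wrapped lines
        found.append((millis, sql))
    return sorted(found, reverse=True)


sample = """
 WARN 2014-09-05 12:33:47,445 (http-apr-8081-exec-2) - Found a long-running
 query (166845 ms): [SELECT jobid,COUNT(dochash) AS doccount
 FROM jobqueue t1 GROUP BY jobid]
 WARN 2014-09-05 12:33:45,390 (http-apr-8081-exec-10) - Found a long-running
 query (254851 ms): [SELECT COUNT(dochash) AS doccount FROM jobqueue t1
 LIMIT 1000001]
"""

# Rejoin wrapped lines so the fixed phrase is contiguous, then rank.
ranked = slowest_queries(" ".join(sample.split()))
for millis, sql in ranked:
    print(millis, sql)
```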
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I honestly don't understand why PostgreSQL would execute a
>>>>>>>>>>>>>> sequential scan of the entire table when given a limit clause.
>>>>>>>>>>>>>> It certainly didn't use to do that.  If you have any other
>>>>>>>>>>>>>> suggestions please let me know.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Some queries show up in this list because MCF periodically
>>>>>>>>>>>>>> reindexes tables.  For example, this query goes only against the
>>>>>>>>>>>>>> (small) jobs table.  Its poor performance on occasion is likely
>>>>>>>>>>>>>> due to something else happening to the database, probably a
>>>>>>>>>>>>>> reindex:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  WARN 2014-09-05 12:43:40,404 (Finisher thread) - Found a
>>>>>>>>>>>>>> long-running query (592474 ms): [SELECT id FROM jobs WHERE
>>>>>>>>>>>>>> status IN (?,?,?,?,?) FOR UPDATE]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The final query is the document stuffing query, which is
>>>>>>>>>>>>>> perhaps the most critical query in the whole system:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  SELECT t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,
>>>>>>>>>>>>>>    t0.failtime,t0.failcount,t0.priorityset
>>>>>>>>>>>>>>  FROM jobqueue t0
>>>>>>>>>>>>>>  WHERE t0.status IN ('P','G') AND t0.checkaction='R'
>>>>>>>>>>>>>>    AND t0.checktime <= 1407246846166
>>>>>>>>>>>>>>  AND EXISTS (
>>>>>>>>>>>>>>    SELECT 'x' FROM jobs t1
>>>>>>>>>>>>>>    WHERE t1.status IN ('A','a') AND t1.id=t0.jobid
>>>>>>>>>>>>>>      AND t1.priority=5
>>>>>>>>>>>>>>  )
>>>>>>>>>>>>>>  AND NOT EXISTS (
>>>>>>>>>>>>>>    SELECT 'x' FROM jobqueue t2
>>>>>>>>>>>>>>    WHERE t2.dochash=t0.dochash
>>>>>>>>>>>>>>      AND t2.status IN ('A','F','a','f','D','d')
>>>>>>>>>>>>>>      AND t2.jobid!=t0.jobid
>>>>>>>>>>>>>>  )
>>>>>>>>>>>>>>  AND NOT EXISTS (
>>>>>>>>>>>>>>    SELECT 'x' FROM prereqevents t3,events t4
>>>>>>>>>>>>>>    WHERE t0.id=t3.owner AND t3.eventname=t4.name
>>>>>>>>>>>>>>  )
>>>>>>>>>>>>>>  ORDER BY t0.docpriority ASC
>>>>>>>>>>>>>>  LIMIT 480;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Your analysis of whether IN beats OR does not agree with
>>>>>>>>>>>>>> experiments I did on postgresql 8.7 which showed no difference.
>>>>>>>>>>>>>> What Postgresql version are you using?  Also, I trust you have
>>>>>>>>>>>>>> query plans that demonstrate your claim?  In any case, whether IN
>>>>>>>>>>>>>> vs. OR is generated is a function of the MCF database driver, so
>>>>>>>>>>>>>> this is trivial to experiment with.  I'll create a ticket and a
>>>>>>>>>>>>>> branch for experimentation.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Sep 11, 2014 at 5:32 AM, Paul Boichat <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Changing maxstatuscount to something much smaller (10,000)
>>>>>>>>>>>>>>> doesn't seem to buy us that much on the table scan - in the
>>>>>>>>>>>>>>> attached you'll see that it's still taking a long time to
>>>>>>>>>>>>>>> return the job status page. Also in the attached are some
>>>>>>>>>>>>>>> other sample long-running queries that we're beginning to see
>>>>>>>>>>>>>>> more frequently. There's also an example of a query that's
>>>>>>>>>>>>>>> frequently executed and regularly takes > 4 secs (plus a
>>>>>>>>>>>>>>> suggested change to improve performance). This one in
>>>>>>>>>>>>>>> particular would certainly benefit from a change to SSDs,
>>>>>>>>>>>>>>> which should relieve the I/O-bound bottleneck on postgres.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We're loading the system from 10mil towards 100mil so would
>>>>>>>>>>>>>>> be keen to work with you to optimise where possible.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Paul
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Sep 10, 2014 at 6:34 PM, Karl Wright <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Paul,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The jobstatus query that uses count(*) should be doing
>>>>>>>>>>>>>>>> something like this when the maxdocumentstatuscount value is
>>>>>>>>>>>>>>>> set:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> select count(*) from jobqueue where xxx limit 500001
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This will still do a sequential scan, but it will be an
>>>>>>>>>>>>>>>> aborted one, so you can control the maximum amount of time
>>>>>>>>>>>>>>>> spent doing the query.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Sep 10, 2014 at 1:23 PM, Paul Boichat <
>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We've had a play with maxstatuscount and couldn't stop it
>>>>>>>>>>>>>>>>> from count(*)-ing but I'll certainly have another look to
>>>>>>>>>>>>>>>>> see if we've missed something.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We're increasingly seeing long running threads and I'll
>>>>>>>>>>>>>>>>> put together some samples. As an example, on a job that's
>>>>>>>>>>>>>>>>> currently aborting:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> WARN 2014-09-10 18:37:29,900 (Job reset thread) - Found a
>>>>>>>>>>>>>>>>> long-running query (72902 ms): [UPDATE jobqueue SET
>>>>>>>>>>>>>>>>> docpriority=?,priorityset=NULL WHERE jobid=?]
>>>>>>>>>>>>>>>>>  WARN 2014-09-10 18:37:29,900 (Job reset thread) -
>>>>>>>>>>>>>>>>> Parameter 0: '1.000000001E9'
>>>>>>>>>>>>>>>>>  WARN 2014-09-10 18:37:29,900 (Job reset thread) -
>>>>>>>>>>>>>>>>> Parameter 1: '1407144048075'
>>>>>>>>>>>>>>>>>  WARN 2014-09-10 18:37:29,960 (Job reset thread) -  Plan:
>>>>>>>>>>>>>>>>> Update on jobqueue  (cost=18806.08..445770.39 rows=764916 width=287)
>>>>>>>>>>>>>>>>>  WARN 2014-09-10 18:37:29,960 (Job reset thread) -
>>>>>>>>>>>>>>>>> Plan:   ->  Bitmap Heap Scan on jobqueue
>>>>>>>>>>>>>>>>> (cost=18806.08..445770.39 rows=764916 width=287)
>>>>>>>>>>>>>>>>>  WARN 2014-09-10 18:37:29,960 (Job reset thread) -
>>>>>>>>>>>>>>>>> Plan:         Recheck Cond: (jobid = 1407144048075::bigint)
>>>>>>>>>>>>>>>>>  WARN 2014-09-10 18:37:29,960 (Job reset thread) -
>>>>>>>>>>>>>>>>> Plan:         ->  Bitmap Index Scan on i1392985450177
>>>>>>>>>>>>>>>>> (cost=0.00..18614.85 rows=764916 width=0)
>>>>>>>>>>>>>>>>>  WARN 2014-09-10 18:37:29,960 (Job reset thread) -
>>>>>>>>>>>>>>>>> Plan:               Index Cond: (jobid = 1407144048075::bigint)
>>>>>>>>>>>>>>>>>  WARN 2014-09-10 18:37:29,960 (Job reset thread) -
>>>>>>>>>>>>>>>>>  WARN 2014-09-10 18:37:30,140 (Job reset thread) -  Stats:
>>>>>>>>>>>>>>>>> n_distinct=4.0 most_common_vals={G,C,Z,P}
>>>>>>>>>>>>>>>>> most_common_freqs={0.40676665,0.36629999,0.16606666,0.060866665}
>>>>>>>>>>>>>>>>>  WARN 2014-09-10 18:37:30,140 (Job reset thread) -
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Paul
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Sep 10, 2014 at 6:14 PM, Karl Wright <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Paul,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> For the jobqueue scans from the UI, there is a parameter
>>>>>>>>>>>>>>>>>> you can set which limits the number of documents counted to
>>>>>>>>>>>>>>>>>> at most a specified amount.  This uses a limit clause, which
>>>>>>>>>>>>>>>>>> should prevent unbounded time doing these kinds of queries:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> org.apache.manifoldcf.ui.maxstatuscount
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The documentation says that the default value for this
>>>>>>>>>>>>>>>>>> parameter is 10000, which however is incorrect.  The actual
>>>>>>>>>>>>>>>>>> true default is 500000.  You could set that lower for better
>>>>>>>>>>>>>>>>>> UI performance (losing some information, of course.)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> As for long-running queries, a lot of time and effort has
>>>>>>>>>>>>>>>>>> been spent in MCF to ensure that this doesn't happen.
>>>>>>>>>>>>>>>>>> Specifically, the main document queuing query is structured
>>>>>>>>>>>>>>>>>> to read directly out of a specific jobqueue index.  This is
>>>>>>>>>>>>>>>>>> the crucial query that must work properly for scalability,
>>>>>>>>>>>>>>>>>> since doing a query that is effectively just a sort on the
>>>>>>>>>>>>>>>>>> entire jobqueue would be a major problem.  There are some
>>>>>>>>>>>>>>>>>> times where Postgresql's optimizer fails to do the right
>>>>>>>>>>>>>>>>>> thing here, mostly because it makes a huge distinction
>>>>>>>>>>>>>>>>>> between whether there's zero of something or one of
>>>>>>>>>>>>>>>>>> something, but you can work around that particular issue by
>>>>>>>>>>>>>>>>>> setting the analyze count to 1 if you start to see this
>>>>>>>>>>>>>>>>>> problem -- which basically means that reanalysis of the
>>>>>>>>>>>>>>>>>> table has to occur on every stuffing query.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'd appreciate seeing the queries that are long-running
>>>>>>>>>>>>>>>>>> in your case so that I can see if that is what you are
>>>>>>>>>>>>>>>>>> encountering or not.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Sep 10, 2014 at 1:01 PM, Paul Boichat <
>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> We're beginning to see issues with a document count > 10
>>>>>>>>>>>>>>>>>>> million. At that point, even with good postgres
>>>>>>>>>>>>>>>>>>> vacuuming, the jobqueue table is starting to become a
>>>>>>>>>>>>>>>>>>> bottleneck.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> For example, select count(*) from jobqueue, which is
>>>>>>>>>>>>>>>>>>> executed when querying job status, will do a full table
>>>>>>>>>>>>>>>>>>> scan of jobqueue, which has more than 10 million rows.
>>>>>>>>>>>>>>>>>>> That's going to take some time in postgres.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> SSDs will certainly make a big difference to document
>>>>>>>>>>>>>>>>>>> processing throughput (which we see is largely I/O bound in
>>>>>>>>>>>>>>>>>>> postgres), but we are increasingly seeing long-running
>>>>>>>>>>>>>>>>>>> queries in the logs. Our current thinking is that we'll
>>>>>>>>>>>>>>>>>>> need to refactor JobQueue somewhat to optimise queries and,
>>>>>>>>>>>>>>>>>>> potentially, partition jobqueue into a set of tables (table
>>>>>>>>>>>>>>>>>>> per queue, for example).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Paul
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> VP Engineering,
>>>>>>>>>>>>>>>>>>> Exonar Ltd
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> T: +44 7940 567724
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> twitter:@exonarco @pboichat
>>>>>>>>>>>>>>>>>>> W: http://www.exonar.com
>>>>>>>>>>>>>>>>>>> Nothing is secure. Now what? Exonar Raven
>>>>>>>>>>>>>>>>>>> <http://video.exonar.com/>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Exonar Limited, registered in the UK, registration
>>>>>>>>>>>>>>>>>>> number 06439969 at 14 West Mills, Newbury, Berkshire,
>>>>>>>>>>>>>>>>>>> RG14 5HG
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, Sep 10, 2014 at 3:15 PM, Karl Wright <
>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi Baptiste,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ManifoldCF is not limited by the number of agents
>>>>>>>>>>>>>>>>>>>> processes or parallel connectors.  Overall database 
>>>>>>>>>>>>>>>>>>>> performance is the
>>>>>>>>>>>>>>>>>>>> limiting factor.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I would read this:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> http://manifoldcf.apache.org/release/trunk/en_US/performance-tuning.html
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Also, there's a section in ManifoldCF (I believe
>>>>>>>>>>>>>>>>>>>> Chapter 2) that discusses this issue.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Some five years ago, I successfully crawled 5 million
>>>>>>>>>>>>>>>>>>>> web documents, using Postgresql 8.3.  Postgresql 9.x is
>>>>>>>>>>>>>>>>>>>> faster, and with modern SSDs, I expect that you will do
>>>>>>>>>>>>>>>>>>>> even better.  In general, I'd say it is fine to shoot for
>>>>>>>>>>>>>>>>>>>> 10M - 100M documents on ManifoldCF, provided that you use
>>>>>>>>>>>>>>>>>>>> a good database, and provided that you maintain it
>>>>>>>>>>>>>>>>>>>> properly.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Wed, Sep 10, 2014 at 10:07 AM, Baptiste Berthier <
>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I would like to know the maximum number of documents
>>>>>>>>>>>>>>>>>>>>> that you have managed to crawl with ManifoldCF, and how
>>>>>>>>>>>>>>>>>>>>> many connectors it can run in parallel.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks for your answer
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Baptiste
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
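
The quoted discussion above contrasts an unbounded select count(*) on
jobqueue with a count that is capped by a limit clause. A minimal sketch of
that idea follows. It is not ManifoldCF's actual query: the bounded_count
helper, the two-column jobqueue table, and the use of SQLite in place of
PostgreSQL are all assumptions made purely for illustration.

```python
import sqlite3


def bounded_count(conn, table, max_count):
    """Count rows in `table`, giving up once `max_count` rows have been seen.

    Hypothetical helper for illustration only; not part of ManifoldCF.
    """
    # Table names cannot be bound as parameters, so validate before formatting.
    if not table.isidentifier():
        raise ValueError("unexpected table name: %r" % table)
    # The inner SELECT stops after max_count rows, so the outer COUNT(*)
    # never has to walk the whole table.
    sql = "SELECT COUNT(*) FROM (SELECT 1 FROM {} LIMIT ?)".format(table)
    (count,) = conn.execute(sql, (max_count,)).fetchone()
    return count


# Stand-in for the real jobqueue table (assumed schema, in-memory SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobqueue (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO jobqueue (status) VALUES (?)", [("P",)] * 1000)

print(bounded_count(conn, "jobqueue", 100))   # capped at the limit
print(bounded_count(conn, "jobqueue", 5000))  # exact, table is smaller
```

When the result equals the cap, the caller only knows that there are at
least that many rows, which is the loss of information mentioned in the
thread when the limit is lowered.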
