Hi Konstantin,

The query you are looking at is performed by the UI only, and there is a parameter you can set which applies a limit to the number of documents so that the count is reported as "<limit>+" in the UI. This is the parameter:
org.apache.manifoldcf.ui.maxstatuscount

As for why the database gets slow for crawling: unless you are seeing reports of long-running queries in the log, there's a good chance you need to vacuum your database instance. I generally recommend that a vacuum full be done periodically for database instances. Autovacuuming has gotten a lot better in Postgres than it used to be, but at least in the past the autovacuuming process would fall far behind ManifoldCF, so the database would get quite bloated anyway. So I'd give that a try.

If you are seeing logging output mentioning slow queries, you may need to tune how often MCF analyzes certain tables. There are parameters that control that as well. In general, if there is a slow query with a bad plan, and analyzing the tables involved makes it come up with a much better plan, analysis is not happening often enough. But first, before you get to that point, have a look at the log and see whether this is likely to be the problem. (Usually it is the stuffer query that gets slow when there's an issue with table analysis, FWIW.) Please feel free to post the plan of the queries being reported here.

Thanks,
Karl

On Wed, Apr 27, 2016 at 7:33 AM, jetnet <[email protected]> wrote:

> Hi Karl,
>
> I set up two MCF instances (quick setup) on the same machine, using
> the same Postgres 9.3 instance (with different databases
> "org.apache.manifoldcf.database.name" of course).
> After a couple of days I've got a performance issue: one MCF instance
> has become very slow - it processes a few docs per hour only.
> I guess, the bottleneck is the database:
>
> "normal" instance:
> SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
> 738.311 rows in the table, took 1,2 sec
> "G";50674
> "F";68
> "P";149179
> "C";402367
> "A";33
> "Z";136676
>
> "slow" instance (currently with a single active job):
> SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
> 2.745.329 rows in the table, took 350 sec
> "G";337922 --STATUS_PENDINGPURGATORY
> "F";449 --STATUS_ACTIVEPURGATORY
> "P";25909 --STATUS_PENDING
> "C";562772 --STATUS_COMPLETE
> "A";9 --STATUS_ACTIVE
> "Z";1644927 --STATUS_PURGATORY
>
> Since "count(*)" is terrible slow in Postgres, I used the following
> sql to count jobqueue's rows:
> SELECT reltuples::bigint AS approximate_row_count FROM pg_class WHERE
> relname = 'jobqueue';
>
> Both MCF instances have the same number of working threads, database
> handles etc.
> Is the database "full"? What could you recommend to improve the
> performance?
>
> Thank you!
> Konstantin
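
For reference, the vacuum and analyze checks suggested in the reply could look roughly like the following on Postgres 9.3. This is only a sketch under assumptions: limiting it to the jobqueue table is taken from the quoted query (the recommendation in the reply is for the whole database instance), and since VACUUM FULL holds an exclusive lock on the table while it runs, it is probably best done with the MCF processes stopped.

```sql
-- How bloated is jobqueue, and when was it last vacuumed/analyzed?
-- A large n_dead_tup relative to n_live_tup suggests autovacuum is behind.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum,
       last_analyze,
       last_autoanalyze,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
WHERE relname = 'jobqueue';

-- Rewrite the table to reclaim space (takes an ACCESS EXCLUSIVE lock),
-- then refresh planner statistics.
VACUUM FULL VERBOSE jobqueue;
ANALYZE jobqueue;

-- Capture the plan and actual timing of the slow status query,
-- so it can be compared before and after, or posted to the list.
EXPLAIN ANALYZE
SELECT status, count(*) AS count FROM jobqueue GROUP BY status;
```

Running the first query before and after the vacuum gives a quick indication of whether bloat was the issue; if the plan from EXPLAIN ANALYZE only improves after ANALYZE, that points at table analysis not happening often enough instead.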
