Re: Database performance

Karl Wright Wed, 27 Apr 2016 04:44:08 -0700

Hi Konstantin,

The query you are looking at is performed by the UI only, and there is a
parameter you can set which applies a limit to the number of documents so
that the count is reported as "<limit>+" in the UI.  This is the parameter:

org.apache.manifoldcf.ui.maxstatuscount

As for why the database gets slow for crawling: unless you are seeing
reports in the log of long-running queries, there's a good chance you
need to vacuum your database instance.  I generally recommend that a vacuum
full be done periodically for database instances.  Autovacuuming has gotten
a lot better in Postgres than it used to be, but at least in the past the
autovacuuming process would fall far behind ManifoldCF, and so the database
would get quite bloated anyway.  So I'd give that a try.
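
For example, something along these lines shows whether a table has
accumulated a lot of dead rows and then reclaims the space.  This is only a
sketch: it assumes the jobqueue table from your message and that you run it
with psql against the MCF database in question.  Keep in mind that VACUUM
FULL holds an exclusive lock on the table while it rewrites it, so it is
probably best run while the crawler is shut down.

```sql
-- How many dead rows has jobqueue accumulated, and when was it last vacuumed?
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'jobqueue';

-- On-disk size of the table including its indexes; compare before and after.
SELECT pg_size_pretty(pg_total_relation_size('jobqueue'));

-- Rewrite the table and return the freed space to the operating system.
-- Takes an ACCESS EXCLUSIVE lock on jobqueue for the duration.
VACUUM FULL VERBOSE jobqueue;
```

Running VACUUM FULL without a table name processes every table in the
current database that you have permission to vacuum.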

If you are seeing logging output mentioning slow queries, you may need to
tune how often MCF analyzes certain tables.  There are parameters that
control that as well.  In general, if there is a slow query with a bad
plan, and analyzing the tables involved makes it come up with a much better
plan, analysis is not happening often enough.  But first, before you get to
that point, have a look at the log and see whether this is likely to be the
problem.  (Usually it is the stuffer query that gets slow when there's an
issue with table analysis, FWIW.)  Please feel free to post here the plans
of any queries that are reported as slow.
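
To check whether stale statistics are the cause, you can compare the plan
before and after a manual analyze.  A sketch, using the status count query
from your message purely as a stand-in; the actual slow query from your log
would go in its place:

```sql
-- Plan and actual timings with the current statistics.
-- Note that EXPLAIN ANALYZE really executes the query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT status, count(*) AS count FROM jobqueue GROUP BY status;

-- Refresh the planner statistics for the table involved.
ANALYZE VERBOSE jobqueue;

-- Same query again; a much better plan here suggests that
-- analysis is not happening often enough.
EXPLAIN (ANALYZE, BUFFERS)
SELECT status, count(*) AS count FROM jobqueue GROUP BY status;
```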

Thanks,
Karl

On Wed, Apr 27, 2016 at 7:33 AM, jetnet <[email protected]> wrote:

> Hi Karl,
>
> I set up two MCF instances (quick setup) on the same machine, using
> the same Postgres 9.3 instance (with different databases
> "org.apache.manifoldcf.database.name" of course).
> After a couple of days I've got a performance issue: one MCF instance
> has become very slow - it processes a few docs per hour only. I guess,
> the bottleneck is the database:
>
> "normal" instance:
> SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
> 738.311 rows in the table, took 1,2 sec
> "G";50674
> "F";68
> "P";149179
> "C";402367
> "A";33
> "Z";136676
>
> "slow" instance (currently with a single active job):
> SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
> 2.745.329 rows in the table, took 350 sec
> "G";337922  --STATUS_PENDINGPURGATORY
> "F";449     --STATUS_ACTIVEPURGATORY
> "P";25909   --STATUS_PENDING
> "C";562772  --STATUS_COMPLETE
> "A";9       --STATUS_ACTIVE
> "Z";1644927 --STATUS_PURGATORY
>
> Since "count(*)" is terribly slow in Postgres, I used the following
> SQL to estimate the number of rows in jobqueue:
> SELECT reltuples::bigint AS approximate_row_count FROM pg_class WHERE
> relname = 'jobqueue';
>
> Both MCF instances have the same number of working threads, database
> handles etc.
> Is the database "full"? What could you recommend to improve the
> performance?
>
> Thank you!
> Konstantin
>
