Re: [PERFORM] Trouble with plan statistics for behaviour for query.

Trevor Campbell Thu, 31 May 2012 16:02:01 -0700

On 01/06/12 08:55, Craig James wrote:



On Thu, May 31, 2012 at 3:29 PM, Trevor Campbell <[email protected] 
<mailto:[email protected]>> wrote:

    We are having trouble with a particular query being slow in a strange 
manner.

    The query is a join over two large tables that are suitably indexed.

    select CG.ID <http://CG.ID>, CG.ISSUEID, CG.AUTHOR, CG.CREATED, CI.ID 
<http://CI.ID>, CI.FIELDTYPE, CI.FIELD,
    CI.OLDVALUE, CI.OLDSTRING, CI.NEWVALUE, CI.NEWSTRING
       from PUBLIC.CHANGEGROUP CG inner join PUBLIC.CHANGEITEM CI on CG.ID 
<http://CG.ID> = CI.GROUPID where
    CG.ISSUEID=? order by CG.CREATED asc, CI.ID <http://CI.ID> asc


This has an unbound variable '?' in it.

These queries are being run from a java application using JDBC and when run the variable is bound to an long integervalue. While trying to investigate the problem, I have been just hard coding a value in the statement.



    For some tasks we run this particular query a very large number of times 
and it has a significant performance
    impact when it runs slowly.

    If we run ANALYSE over the CHANGEITEM table then the performance picks up 
by a factor of 5 or more.  The problem
    is that a day later the performance will have dropped back to its 
previously slow state.

    The reason this is so hard to understand is that the activity on this table 
is very low, with no updates and only
    a relatively small number of inserts each day, < 0.1% of the table size.

    Explain output:
    Sort  (cost=86.90..86.93 rows=11 width=118) (actual time=0.086..0.087 
rows=14 loops=1)
      Sort Key: cg.created, ci.id <http://ci.id>
      Sort Method: quicksort  Memory: 26kB
      ->  Nested Loop  (cost=0.00..86.71 rows=11 width=118) (actual 
time=0.022..0.061 rows=14 loops=1)
            ->  Index Scan using chggroup_issue on changegroup cg  
(cost=0.00..17.91 rows=8 width=33) (actual
    time=0.012..0.015 rows=7 loops=1)
                  Index Cond: (issueid = 81001::numeric)
            ->  Index Scan using chgitem_chggrp on changeitem ci  
(cost=0.00..8.58 rows=2 width=91) (actual
    time=0.005..0.005 rows=2 loops=7)
                  Index Cond: (groupid = cg.id <http://cg.id>)
    Total runtime: 0.116 ms

What's the exact SQL you used to get this ... did you use a specific CG.ISSUEID to run your test? If that's the case,this EXPLAIN ANALYZE won't be the same as the one generated for your actual application.


Craig


    The explain output always seems the same even when the performance is poor, 
but I can't be sure of that.

    Overall it seems like PostgreSQL just forgets about the statistics it has 
gathered after a short while.

    Schema details:
    CREATE TABLE changegroup
    (
      id numeric(18,0) NOT NULL,
      issueid numeric(18,0),
      author character varying(255),
      created timestamp with time zone,
      CONSTRAINT pk_changegroup PRIMARY KEY (id )
    )
    WITH (
      OIDS=FALSE
    );
    CREATE INDEX chggroup_issue
      ON changegroup
      USING btree
      (issueid );

    CREATE TABLE changeitem
    (
      id numeric(18,0) NOT NULL,
      groupid numeric(18,0),
      fieldtype character varying(255),
      field character varying(255),
      oldvalue text,
      oldstring text,
      newvalue text,
      newstring text,
      CONSTRAINT pk_changeitem PRIMARY KEY (id )
    )
    WITH (
      OIDS=FALSE
    );

    CREATE INDEX chgitem_chggrp
      ON changeitem
      USING btree
      (groupid );

    CREATE INDEX chgitem_field
      ON changeitem
      USING btree
      (field COLLATE pg_catalog."default" );

    Table sizes
    changegroup  : 2,000,000 rows
    changeitem    :  2,500,000  rows

    The changegroup table has on average about 4 rows per issueid value, which 
is the query parameter.

    We run autovacuum and autoanalyse, but as the activity in the table is low 
these are rarely if ever invoked on
    these tables.

    Environment.
    Testing using PostgreSQL 9.1.3 on x86_64-redhat-linux-gnu,  although this 
is a problem across a variety of
    postgres versions.

Re: [PERFORM] Trouble with plan statistics for behaviour for query.

Reply via email to