[HACKERS] ANALYZE and index/stats degradation

2007-07-02 Thread Jeroen T. Vermeulen
Hi all,

I've run into a case where I get bad performance that doesn't sound too
hard to solve.  Question is: is it worth solving?

The situation is this: I have a table that can grow to a large number of
rows, then shrink to zero over a large number of quick, consecutive
transactions.  The primary key index for the table is getting a lot of use
in the process.

But whenever perhaps one-third or so of the rows have been deleted, the
planner stops using that index and resorts to sequential scans.  I tried
suppressing that by toggling enable_seqscan: works as advertised, but
performance is still terrible until (as far as I can make out) the next
analyze run has completed!
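
For what it's worth, the toggling is just the usual session-level switch,
something like this (foo and the id range are only stand-ins for the real
thing):

    SET enable_seqscan = off;
    -- foo, x and y are placeholders for the real table and key range
    SELECT * FROM foo WHERE id >= x AND id < y;
    SET enable_seqscan = on;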

So I suppose the planner has a good reason to ignore the index at that
point.  I'm assuming that this is something to do with the correlation
between the index and the column's statistics degrading in some way.

I also tried doing my own analyze runs on just the primary key index. 
That will complete very quickly, and performance is restored for a while. 
But as far as I can tell, a regular automatic analyze run will block my
own, more limited one on the same table.  So performance is still bad, and
now it's irregular to boot.
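
To be concrete, by a limited analyze I mean the column-level form, again
with placeholder names:

    -- foo/id are placeholders; gathers statistics for that one column only
    ANALYZE foo (id);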

This makes me wonder: when the planner finds that an index is no longer
worth using because its corresponding statistics are out of date, and it's
cheap to update those same stats, maybe it should do so?  Even if there's
also going to be a full analyze on the table, it could be worthwhile to do
this quick limited run first.  (Though not if one is already underway, of
course).

All this is based largely on guesswork, so if I've got it all wrong,
please enlighten me!


Jeroen





Re: [HACKERS] ANALYZE and index/stats degradation

2007-07-02 Thread Gregory Stark

Jeroen T. Vermeulen [EMAIL PROTECTED] writes:

> So I suppose the planner has a good reason to ignore the index at that
> point.  I'm assuming that this is something to do with the correlation
> between the index and the column's statistics degrading in some way.

Best to post EXPLAIN ANALYZE output for the query when the performance is
good and when it's bad. Perhaps also an EXPLAIN ANALYZE for the query with
enable_seqscan off when it's bad.
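
That is, for the query in question, something like:

    EXPLAIN ANALYZE SELECT ... ;   -- while performance is still good
    EXPLAIN ANALYZE SELECT ... ;   -- again once it has gone bad
    SET enable_seqscan = off;
    EXPLAIN ANALYZE SELECT ... ;   -- the bad case with seqscans disabled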

Also, which version of Postgres is this? 

It's possible you just need vacuum to run more frequently on this table and
autovacuum isn't doing it often enough, in which case you might have a cron
job run vacuum (or vacuum analyze) on this table.
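
For example, a crontab entry calling vacuumdb (database and table names are
placeholders here):

    # every ten minutes; mydb/foo are placeholders for your database and table
    */10 * * * *  vacuumdb --analyze --table=foo mydb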

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] ANALYZE and index/stats degradation

2007-07-02 Thread Jeroen T. Vermeulen
On Mon, July 2, 2007 18:15, Gregory Stark wrote:

>> So I suppose the planner has a good reason to ignore the index at that
>> point.  I'm assuming that this is something to do with the correlation
>> between the index and the column's statistics degrading in some way.
>
> Best to post EXPLAIN ANALYZE output for the query when the performance is
> good and when it's bad. Perhaps also an EXPLAIN ANALYZE for the query with
> enable_seqscan off when it's bad.

Can't easily do that anymore...  AFAIR the plans were all identical anyway,
except in the bad case (before disabling enable_seqscan), which used a
sequential scan instead of the index.  The queries are very simple, along
the lines of select * from foo where id >= x and id < y.


> Also, which version of Postgres is this?

It was an 8.2 version.


> It's possible you just need vacuum to run more frequently on this table and
> autovacuum isn't doing it often enough, in which case you might have a cron
> job run vacuum (or vacuum analyze) on this table.

Actually, come to think of it, I don't think I'd want any vacuums at all
on this particular table.  Just the analyze on the primary key, no
vacuums, no statistics on anything else.  Unfortunately it's not just one
table, but a set of tables that can be created dynamically.  I could
change that, but in this particular case I don't think I should.
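
If it helps to picture it, per table I'd want something in the spirit of

    -- foo/payload/id are placeholders; keep stats only for the primary key
    ALTER TABLE foo ALTER COLUMN payload SET STATISTICS 0;
    ANALYZE foo (id);

where foo and payload stand in for whatever each dynamically created table
happens to contain.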


Jeroen





Re: [HACKERS] ANALYZE and index/stats degradation

2007-07-02 Thread Gregory Stark
Jeroen T. Vermeulen [EMAIL PROTECTED] writes:

> Actually, come to think of it, I don't think I'd want any vacuums at all
> on this particular table.  Just the analyze on the primary key, no
> vacuums, no statistics on anything else.  Unfortunately it's not just one
> table, but a set of tables that can be created dynamically.  I could
> change that, but in this particular case I don't think I should.

The way you described it, there were records being inserted and later deleted.
Why wouldn't you need vacuums?

Or are all the records eventually deleted and then the table truncated or
dropped before the next batch of inserts?
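
That is, something like

    -- foo is a placeholder; reclaims the whole table in one cheap operation
    TRUNCATE foo;

rather than leaving the dead rows around for vacuum to deal with.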

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] ANALYZE and index/stats degradation

2007-07-02 Thread Jeroen T. Vermeulen
On Mon, July 2, 2007 22:17, Gregory Stark wrote:

> The way you described it, there were records being inserted and later
> deleted.  Why wouldn't you need vacuums?
>
> Or are all the records eventually deleted and then the table truncated or
> dropped before the next batch of inserts?

In a nutshell, yes.  The problem is that I can't delete them all at once; it
happens in batches, and that means the stats degrade in the meantime.
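
Roughly speaking, think of each batch as a range delete on the primary key
(again with placeholder names):

    -- foo, x, y are placeholders; one slice of rows per transaction
    DELETE FROM foo WHERE id >= x AND id < y;

so between the first batch and the last one the table sits in that
half-deleted state.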


Jeroen


