On Oct 12, 2012, at 22:13, Stephen Frost <sfr...@snowman.net> wrote:

> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> [ shrug... ]  You're attacking a straw man, or more precisely putting
>> words into my mouth about what the percentage-based thresholds might be.
>> Notice the examples I gave involved update percentages quite far north
>> of 100%.  It's possible and maybe likely that we need a sliding scale.
> 
> I was just discussing such a sliding scale approach w/ Josh on IRC, my
> thinking was that we could use a logarithmic approach based on table
> size.
> 
>> Also, I don't necessarily accept the conclusion you seem to be drawing,
>> that it's okay to have complete turnover of a small table and not redo
>> its stats.  If you don't like the current behavior when there's no
>> stats, why would you like the behavior when there are some stats but
>> they no longer have the remotest relationship to reality?
> 
> Josh's concern is about autovacuum causing lots of stats churn, which is
> understandable, we don't want it constantly rescanning a table, but
> perhaps we could use some kind of threshold for preventing autovac from
> rescanning a table it just scanned?  Note that I did *not* say 'GUC',
> but I don't know what the 'right' answer is for how frequently is
> good-but-not-too-frequent.  I'd also like to try and avoid adding GUCs.
> 
> 
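The logarithmic sliding scale mentioned above could work roughly like this; a minimal sketch, where `analyze_threshold`, the base/scale constants, and the log10 damping are all illustrative choices, not proposed values:

```python
import math

def analyze_threshold(reltuples, base=50, scale=0.1):
    """Hypothetical sliding-scale threshold: the number of changed tuples
    needed to trigger auto-analyze still grows with table size, but the
    *fraction* of the table that must change shrinks as the table grows."""
    if reltuples <= 0:
        return base
    # log10 damping: a 1M-row table needs a much smaller fraction of
    # changed rows than a 1k-row table before re-analyze kicks in.
    return base + scale * reltuples / math.log10(reltuples + 10)

def needs_analyze(changes_since_analyze, reltuples):
    return changes_since_analyze >= analyze_threshold(reltuples)
```

Under this sketch a 1,000-row table re-analyzes after roughly 8% churn while a 1,000,000-row table re-analyzes after under 2%, which is the shape of curve being discussed.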

Instead of a global setting, could you attach an interface function to the table and have 
the auto-analyzer call that function to ask the table whether it needs to be analyzed?  
You would still need to deal with defaults and provide a decent supply of built-in 
algorithms, but at least the system could be made tunable.  The default algorithm could 
simply hand off to a table-size-specific handler.  The CREATE TABLE and ALTER TABLE 
commands could be used to change the assigned algorithm if desired, and new algorithms 
could be supplied via extensions.
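A minimal sketch of that "ask the table" interface, where the policy functions, the registry, and the 10,000-row cutoff are all hypothetical stand-ins for whatever CREATE/ALTER TABLE would record:

```python
def small_table_policy(stats):
    # Small tables: re-analyze on any nontrivial churn.
    return stats["changes"] >= 50

def large_table_policy(stats):
    # Large tables: percentage-based threshold.
    return stats["changes"] >= 0.1 * stats["reltuples"]

def default_policy(stats):
    # The default algorithm hands off to a table-size-specific handler.
    handler = small_table_policy if stats["reltuples"] < 10_000 else large_table_policy
    return handler(stats)

# Per-table assignments, as ALTER TABLE might record them; tables not
# listed here fall back to the default algorithm.
table_policies = {"audit_log": large_table_policy}

def table_needs_analyze(relname, stats):
    # What the auto-analyzer would call for each candidate table.
    return table_policies.get(relname, default_policy)(stats)
```

An extension could then add entries to the registry the same way ALTER TABLE does, which is the tunability being argued for.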

The 1000-row default seems unusual at first glance and contributes to the problem 
described.

If the table is going to have many rows, the first INSERT sent following the CREATE 
TABLE is likely to be a bulk load.  In the case where rows are inserted individually, 
the expected row count will likely be closer to 1 than 1000.

One useful algorithm to provide the user is analyze-on-insert and, though maybe less 
so, analyze-on-update, so that any insert or update causes the table to be re-analyzed.  
That is not a good default, but combined with "delayed analyze" logic to establish a 
minimum frequency, it is a possible option for some use cases.
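The analyze-on-change plus "delayed analyze" combination could be sketched like this; the class name, `min_interval`, and the injected clock are illustrative, not a proposed API:

```python
import time

class AnalyzeOnChange:
    """Every insert/update requests a re-analyze, but a minimum interval
    between actual ANALYZE runs prevents constant rescanning."""

    def __init__(self, min_interval=60.0, now=time.monotonic):
        self.min_interval = min_interval
        self.now = now                    # injectable clock, for testing
        self.last_analyze = float("-inf")
        self.pending = False

    def on_change(self):
        # Any insert or update marks the table as wanting an analyze...
        self.pending = True

    def maybe_analyze(self):
        # ...but the delayed-analyze logic enforces a minimum spacing.
        if self.pending and self.now() - self.last_analyze >= self.min_interval:
            self.last_analyze = self.now()
            self.pending = False
            return True                   # caller would run ANALYZE here
        return False
```

A change arriving inside the interval is not lost; it stays pending and fires on the first check after the interval elapses, which gives the minimum-frequency guarantee without a GUC.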

Temporary table creation should have special attention given if changes are 
going to be made here.

Another idea is to have system after-[command] trigger(s) that can be used to call 
ANALYZE without waiting for the auto-vacuum process.  Provide some way for CREATE/ALTER 
TABLE, and maybe auto-vacuum, to enable and disable the trigger.
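A rough model of that enable/disable behavior; `make_after_command_trigger`, the log list, and the flag mechanism are hypothetical stand-ins for what the system trigger and CREATE/ALTER TABLE would actually do:

```python
analyze_log = []

def run_analyze(relname):
    # Stand-in for immediately executing ANALYZE on the table.
    analyze_log.append(relname)

def make_after_command_trigger(relname, enabled=True):
    """Returns a trigger function that fires after a command on relname,
    plus a setter that CREATE/ALTER TABLE (or auto-vacuum) could use to
    enable or disable it."""
    state = {"enabled": enabled}

    def fire():
        if state["enabled"]:
            run_analyze(relname)

    def set_enabled(flag):
        state["enabled"] = flag

    return fire, set_enabled
```

Disabling the trigger lets auto-vacuum temporarily take back control without dropping the user's configuration.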

David J.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers