On Wed, 2007-08-08 at 09:55 +0100, Heikki Linnakangas wrote:

> >> 3. Statistics and autovacuum integration. How should HOT updates be
> >> taken into account when deciding when to autovacuum and autoanalyze?
> >> There's a FIXME comment in analyze.c related to this as well. What
> >> additional statio counters do we need? The patch adds counters for HOT
> >> updates and for following a HOT chain. Should we have counters for
> >> pruning and defraging a page as well?
> > 
> > ISTM we should have stat counters (not statio counters) that describe
> > the number of row versions defragged. Statio counters refer to block
> > accesses, so HOT has nothing at all to do with that. Not sure we need to
> > know about pruning, any more than we need to know about hint bits
> > setting.
> > 
> > We should then perform a vacuum if
> > ( number of cold updates
> > + number of hot updates 
> > + number of deletes
> > - number of row versions removed by defragging)
> >> (autovacuum threshold * size of table)
> > 
> > Defragging could remove deletes and cold updates as well as hot updates,
> > so we must take that into account.
> 
> It seems you're confusing pruning and defragging. I should probably
> update the glossary I wrote earlier...

Yes, I was. Glossary much appreciated.

> I think the formula for triggering autovacuum should be left unchanged,
> (# of dead tuples > autovacuum threshold).
> 
> # of dead tuples should be increased by cold update and deletes, as
> before. A hot update shouldn't increase it, because the old version can
> be removed by pruning.

So pruning removes dead hot updated tuples, while defragging will remove
dead cold updated tuples and deletes, as well as rearranging space.

It's easily possible that 100% of the cold updates and deletes are
removed by defragging because of HOT updates. It would be pointless to
trigger a VACUUM when it might find nothing to clear up. Its also
possible that the deletes are at one end of the table and the updates at
the other. So we really need to keep track of the effects of defragging
dead cold updates and deletes, so they can be subtracted from the cold
update + deletes. 

The argument for including HOT updates is weaker, but I'm still thinking
that they should eventually cause a VACUUM to take place. A very slowly
updated table with spread out updates might bloat a table to 200 times
its size, even with HOT, in the worst case. Perhaps we should count each
HOT update as 1/5th of a row for autovacuum purposes, or some other
discounting factor?

> The analyze threshold is trickier. HOT updates should probably be taken
> into account, but with a smaller weight than other updates.

Sorry, but I disagree here. Any change to a row can effect the
distribution. HOT updates don't change indexed cols but they do change
some columns. The ANALYZE collects stats for all columns, not just the
indexed ones; there is no way you can say that non-indexed columns are
less important than indexed ones, so we need to analyze just as
frequently as we do with cold updates. 

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

Reply via email to