On Wed, Mar 16, 2011 at 6:36 PM, Jim Nasby <j...@nasby.net> wrote:
> One way to look at this is that any system will have a limit on how quickly 
> it can vacuum everything. If it's having trouble dedicating enough IO to 
> vacuum, then autovac is going to have a long list of tables that it wants to 
> vacuum. When you're in that situation, you want to get to the next table that 
> needs vacuuming as quickly as possible, so if you've run through the first 
> heap scan and found only a limited number of dead tuples, it doesn't make 
> sense to spend a bunch of time scanning indexes and making a second heap scan 
> (though, IIRC the second scan doesn't hit the entire heap; it only hits the 
> tuples that were remembered as being dead).

I mostly agree with this, but you also can't postpone vacuuming
indefinitely just because you're too busy; that's going to blow up in
your face.

> Of course, going along the lines of an autovac-based tuning mechanism, you 
> have to question how a table would show up for autovac if there's not 
> actually a number of dead tuples. One scenario is freezing (though I'm not 
> sure if your super-lazy vacuum could freeze tuples or not). Another is 
> inserts. That might become a big win; you might want to aggressively scan a 
> table that gets data loaded into it in order to set hint/all visible bits.

Right.  Really-lazy vacuum could freeze tuples.  Unlike regular
vacuum, it can also sensibly be done incrementally.  One thing I was
thinking about is counting the number of times that we fetched a tuple
that was older than RecentGlobalXmin and had a committed xmin and an
invalid xmax, but where the page was not PD_ALL_VISIBLE.  If that's
happening a lot, it probably means that some vacuuming would speed
things up, by getting those PD_ALL_VISIBLE bits set.  Perhaps you
could work out some formula where you do a variable amount of
super-lazy vacuuming depending on the number of such tuple fetches.
The trick would be to avoid overdoing it (so that you swamp the I/O
system) or underdoing it (so that the system never converges).  It
would be really nice (for this and for other things) if we had some
way of measuring the I/O saturation of the system, so that we could
automatically adjust the aggressiveness of background processes
accordingly.

Note also that if and when we get index-only scans, making sure the
PD_ALL_VISIBLE bits (and thus the visibility map bits) actually get
set is going to be a lot more important.

> From a manual standpoint, ISTM that super-lazy vac would be extremely useful 
> for dealing with hint bits after a bulk insert to a table that also has some 
> update activity. Using a regular vacuum in that case would result in a lot of 
> extra work to deal with the small number of dead tuples.

I can see that.

> Perhaps it would be useful to write a script that analyzed the output of 
> vacuum verbose looking for tables where a super-lazy vacuum would have made 
> sense (assuming vacuum verbose provides the needed info). If we had such a 
> script we could ask folks to run it and see how much super-lazy vacuuming 
> would help in the real world.

I'm a bit doubtful about this part.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to