We have run across a problem with autovacuum that occurs when it can truncate off a large amount of empty blocks. It behaves different in version 9.0 than previous versions. Both behaviors are bad.

Consider a relation receives constant inserts/updates that are satisfied using freespace at the beginning of the heap. Delete operations now have removed large amounts of tuples at the end of the relation. The following autovacuum will find a large amount of blocks at the end, that can be truncated.

Vacuumlazy now takes out an access exclusive lock and scans the relation *backwards* to find out if concurrent access has created new tuples in the to be truncated space. Apparently such a backward scan in 8K blocks isn't really a good access strategy.

Up to 8.4, it simply holds the lock until it is done, which in our case stalled a production system for 12 minutes! This is obviously bad.

In 9.0, the autovacuum process will be aborted about 1 second after another transaction starts waiting for a lock. The result is that even a simple INSERT will take 1 second. The autovacuum restarts shortly after and somehow gets to a point, where it will cause this 1 second hiccup ever 2 minutes. This is slightly better but still far from optimal in a world, where transaction response times are measured in milliseconds.

My current idea for a fix is to modify lazy_truncate_heap(). It does acquire and release the exclusive lock, so it should be possible to do this in smaller chunks, releasing and reacquiring the lock so that client transactions can get their work done as well. At the same time I would change count_nondeletable_pages() so that it uses a forward scan direction (if that leads to a speedup).


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to