On 3/27/2011 9:51 PM, Robert Haas wrote:
> On Sun, Mar 27, 2011 at 9:41 PM, Jan Wieck <janwi...@yahoo.com> wrote:
>> On 3/27/2011 6:21 PM, Robert Haas wrote:
>>> On Sun, Mar 27, 2011 at 3:25 PM, Jan Wieck <janwi...@yahoo.com> wrote:
>>>> Since we are talking about stable releases, I think just releasing
>>>> and reacquiring the exclusive lock is enough. We can then try to
>>>> further improve things for future releases.
>>>
>>> That seems unsafe - things can change under you while you don't hold
>>> the lock...
>>
>> The only change relevant in this case would be some concurrent client
>> extending the relation while we don't hold the lock. A call to
>> RelationGetNumberOfBlocks() after reacquiring the lock will tell. Safety
>> reestablished.
>
> I thought that the risk was that someone might write tuples into the
> blocks that we're thinking of truncating.

Currently the risk is that while vacuum is doing its main work, someone can either extend the relation or reuse space inside one of the empty blocks that are about to be truncated away. That is why lazy_truncate_heap() does the following (sketched in code after the list):

    1) acquire the exclusive lock
    2) check via RelationGetNumberOfBlocks() whether the relation was
       extended before the lock was acquired - abort if so
    3) check via count_nondeletable_pages() for the highest block in
       the to-be-truncated range that contains a (newly created) tuple
    4) truncate the relation
    5) release the lock
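
In condensed form that looks roughly like this (a simplified sketch of the function, not the verbatim source; error paths, stats bookkeeping and FSM updates are omitted):

static void
lazy_truncate_heap(Relation onerel, LVRelStats *vacrelstats)
{
    BlockNumber old_rel_pages = vacrelstats->rel_pages;
    BlockNumber new_rel_pages;

    /* 1) acquire exclusive lock */
    LockRelation(onerel, AccessExclusiveLock);

    /* 2) abort if someone extended the relation before we locked it */
    new_rel_pages = RelationGetNumberOfBlocks(onerel);
    if (new_rel_pages != old_rel_pages)
    {
        UnlockRelation(onerel, AccessExclusiveLock);
        return;
    }

    /* 3) find the highest block that (still or newly) contains a tuple */
    new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
    if (new_rel_pages >= old_rel_pages)
    {
        UnlockRelation(onerel, AccessExclusiveLock);
        return;                 /* nothing to truncate after all */
    }

    /* 4) truncate the relation */
    RelationTruncate(onerel, new_rel_pages);

    /* 5) release the lock */
    UnlockRelation(onerel, AccessExclusiveLock);
}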

The function count_nondeletable_pages() is the one doing the block-wise reverse scan. It does check for interrupts, and that is the place where the deadlock detection code will boot vacuum.

What I am proposing is to put all five steps into a loop that tries to bite off smaller bits from the end of the table, instead of trying to swallow the whole dead space at once.
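
A hypothetical shape for that loop (just a sketch, not a worked-out patch; TRUNCATE_CHUNK_PAGES and count_nondeletable_pages_range() are made-up names, the latter standing in for a variant of the existing function that scans only the given range):

/* placeholder chunk size; the right value would need benchmarking */
#define TRUNCATE_CHUNK_PAGES    256

static void
lazy_truncate_heap_chunked(Relation onerel, LVRelStats *vacrelstats)
{
    while (vacrelstats->rel_pages > vacrelstats->nonempty_pages)
    {
        BlockNumber cur_pages;
        BlockNumber scan_from;
        BlockNumber new_pages;

        /* 1) acquire the exclusive lock for this bite only */
        LockRelation(onerel, AccessExclusiveLock);

        /* 2) a concurrent extension while we were unlocked ends the game */
        cur_pages = RelationGetNumberOfBlocks(onerel);
        if (cur_pages != vacrelstats->rel_pages)
        {
            UnlockRelation(onerel, AccessExclusiveLock);
            return;
        }

        /* 3) look for tuples in the last chunk only (hypothetical helper) */
        scan_from = cur_pages > vacrelstats->nonempty_pages + TRUNCATE_CHUNK_PAGES
            ? cur_pages - TRUNCATE_CHUNK_PAGES
            : vacrelstats->nonempty_pages;
        new_pages = count_nondeletable_pages_range(onerel, scan_from, cur_pages);
        if (new_pages >= cur_pages)
        {
            UnlockRelation(onerel, AccessExclusiveLock);
            return;             /* live tuples right at the end; done */
        }

        /* 4) truncate this bite */
        RelationTruncate(onerel, new_pages);
        vacrelstats->rel_pages = new_pages;

        /* 5) release the lock, so waiters can get in before the next bite */
        UnlockRelation(onerel, AccessExclusiveLock);
    }
}

Each iteration holds the lock only for as long as one chunk takes, so a blocked client waits a short moment instead of the whole truncation.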

count_nondeletable_pages() is a static function, only called from lazy_truncate_heap(), so fiddling with the scan direction inside of it would be totally safe as far as functional side effects go. Whether to do so depends on whether reversing its scan direction actually yields a performance benefit. I agree with Tom that at some "chunk" size the effect might be negative. That is because currently it scans backwards and returns at the first block containing a tuple. To scan forward, it has to visit all the blocks, remembering the last one that contained a tuple.
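
Schematically (page_has_tuples() is a stand-in for the per-page inspection the real function performs):

/* stand-in for the per-page live-tuple check */
extern bool page_has_tuples(Relation rel, BlockNumber blkno);

/* backward: may return at the first non-empty page it meets */
static BlockNumber
scan_backward(Relation rel, BlockNumber nonempty_pages, BlockNumber rel_pages)
{
    BlockNumber blkno = rel_pages;

    while (blkno > nonempty_pages)
    {
        blkno--;
        if (page_has_tuples(rel, blkno))
            return blkno + 1;   /* everything above this is deletable */
    }
    return nonempty_pages;
}

/* forward: must visit the whole range, remembering the last hit */
static BlockNumber
scan_forward(Relation rel, BlockNumber nonempty_pages, BlockNumber rel_pages)
{
    BlockNumber result = nonempty_pages;
    BlockNumber blkno;

    for (blkno = nonempty_pages; blkno < rel_pages; blkno++)
    {
        if (page_has_tuples(rel, blkno))
            result = blkno + 1;
    }
    return result;
}

The backward version can stop early; the forward version trades that chance for sequential access but always has to cover the whole range.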

> I don't like a 1,000 ms hiccup in my system, regardless of how many
> transaction hoops you make it go through.

I can't argue with that.

I assume we have a consensus that both locking a system for 10+ minutes and having a 1,000 ms hiccup every 2 minutes are problems we need to fix.


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
