On Fri, 5 May 2006, Tom Lane wrote:

> I wrote:
>> BTW, I just realized another bug in the patch: btbulkdelete fails to
>> guarantee that it visits every page in the index.  It was OK for
>> btvacuumcleanup to ignore pages added to the index after it starts,
>> but btbulkdelete has to deal with such pages.
>
> Actually, as written this patch does not work.  At all.  btbulkdelete
> has to guarantee that it removes every index entry it's told to, and
> it cannot guarantee that in the presence of concurrent page splits.
> A split could move index items from a page that btbulkdelete hasn't
> visited to one it's already passed over.  This is not possible with an
> index-order traversal (because splits only move items to the right)
> but it's definitely possible with a physical-order traversal.

True. :(
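
For concreteness, here's a minimal standalone sketch of the race (not
PostgreSQL code; the page numbers, the split timing, and the "moved"
entry are all hypothetical):

#include <stdio.h>
#include <stdbool.h>

#define NPAGES 6

static bool dead_entry_on[NPAGES];  /* which page holds the entry VACUUM must remove */

int main(void)
{
    dead_entry_on[4] = true;        /* the doomed entry starts on page 4 */

    /* physical-order traversal: visit each block number exactly once */
    for (int blkno = 0; blkno < NPAGES; blkno++)
    {
        /*
         * Concurrent split while the scan sits between pages: page 4
         * splits and its items move to recycled page 1, which the scan
         * has already passed and will never revisit.
         */
        if (blkno == 3)
        {
            dead_entry_on[4] = false;
            dead_entry_on[1] = true;
        }

        if (dead_entry_on[blkno])
        {
            dead_entry_on[blkno] = false;
            printf("removed dead entry on page %d\n", blkno);
        }
    }

    for (int blkno = 0; blkno < NPAGES; blkno++)
        if (dead_entry_on[blkno])
            printf("BUG: dead entry survived on page %d\n", blkno);

    return 0;
}

Running it prints only the BUG line: the entry escaped to a page behind
the scan, which is exactly the failure described above.  An index-order
scan never hits this, since a split can only move items rightward in key
order, ahead of the scan position.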

The first solution that occurs to me is to force page splits, while a
vacuum is in progress, to choose a target page whose blkno is greater
than the original page's blkno.  That would cause the index to become
fragmented more quickly, which is bad but perhaps tolerable.
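
A minimal sketch of that rule, assuming it's applied where the split
chooses its right-hand page (the function and parameter names here are
hypothetical, not the real nbtree code):

#include <stdio.h>
#include <stdbool.h>

typedef unsigned int BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/*
 * Hypothetical chooser for the right-hand page of a split.  freeblkno is
 * a recycled page offered by the free space map (or InvalidBlockNumber
 * if none); nextblkno is the page a relation extension would create.
 */
static BlockNumber
choose_split_target(BlockNumber origblkno, BlockNumber freeblkno,
                    BlockNumber nextblkno, bool vacuum_running)
{
    /*
     * While a vacuum's physical-order scan is running, refuse any
     * recycled page to the left of the page being split: items moved
     * there could land behind the scan.  Extending the relation always
     * yields a block number beyond every existing page, so it is safe.
     */
    if (freeblkno == InvalidBlockNumber ||
        (vacuum_running && freeblkno < origblkno))
        return nextblkno;

    return freeblkno;
}

int main(void)
{
    /* vacuum running: recycled page 2 is left of page 7, so extend to 20 */
    printf("%u\n", choose_split_target(7, 2, 20, true));   /* -> 20 */
    /* no vacuum running: the recycled page is acceptable */
    printf("%u\n", choose_split_target(7, 2, 20, false));  /* -> 2 */
    return 0;
}

Skipping left-of-origin free pages during vacuum is also where the extra
fragmentation comes from: recycled space to the left simply goes unused
until the vacuum finishes.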

> I was toying with the idea of remembering deletable pages (which
> btvacuumcleanup does anyway), which are the only ones that page splits
> could move items to, and then rescanning those after the completion
> of the primary pass.  This has a couple of pretty unpleasant
> consequences though:
> * We have to remember *every* deletable page for correctness, compared
> to the current situation where btvacuumcleanup can bound the number of
> pages it tracks.  This creates a situation where VACUUM may fail
> outright if it doesn't have gobs of memory.  Since one of the main
> reasons for developing lazy VACUUM was to ensure we could vacuum
> arbitrarily large tables in bounded memory, I'm not happy with this.
> * The rescan could be far from cheap if there are many such pages.

Yep, that's not good.
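
To make the memory concern above concrete, here's a minimal sketch
(hypothetical names, not PostgreSQL code) of the tracking structure
that scheme would need.  Since correctness forbids forgetting any
deletable page, the only option when the array fills is to grow it, so
memory use is proportional to the number of such pages rather than
bounded:

#include <stdio.h>
#include <stdlib.h>

typedef unsigned int BlockNumber;

typedef struct
{
    BlockNumber *pages;     /* every deletable page seen so far */
    int          npages;
    int          maxpages;
} DeletablePageList;

static void
remember_deletable(DeletablePageList *list, BlockNumber blkno)
{
    if (list->npages >= list->maxpages)
    {
        /*
         * A concurrent split may have moved items onto any remembered
         * page, so none can be dropped; the list must grow without
         * bound, unlike btvacuumcleanup's tracker, which can just stop
         * recording once its fixed budget is used up.
         */
        list->maxpages = (list->maxpages > 0) ? list->maxpages * 2 : 64;
        list->pages = realloc(list->pages,
                              list->maxpages * sizeof(BlockNumber));
        if (list->pages == NULL)
        {
            fprintf(stderr, "out of memory: VACUUM would fail here\n");
            exit(1);
        }
    }
    list->pages[list->npages++] = blkno;
}

int main(void)
{
    DeletablePageList list = {NULL, 0, 0};

    /* pretend the primary pass found a million deletable pages */
    for (BlockNumber blkno = 0; blkno < 1000000; blkno++)
        remember_deletable(&list, blkno);

    printf("tracking %d pages (%zu bytes) for the rescan\n",
           list.npages, (size_t) list.maxpages * sizeof(BlockNumber));
    free(list.pages);
    return 0;
}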

- Heikki
