I wrote: > BTW, I just realized another bug in the patch: btbulkdelete fails to > guarantee that it visits every page in the index. It was OK for > btvacuumcleanup to ignore pages added to the index after it starts, > but btbulkdelete has to deal with such pages.
Actually, as written this patch does not work. At all. btbulkdelete has to guarantee that it removes every index entry it's told to, and it cannot guarantee that in the presence of concurrent page splits. A split could move index items from a page that btbulkdelete hasn't visited to one it's already passed over. This is not possible with an index-order traversal (because splits only move items to the right) but it's definitely possible with a physical-order traversal. I was toying with the idea of remembering deletable pages (which btvacuumcleanup does anyway), which are the only ones that page splits could move items to, and then rescanning those after the completion of the primary pass. This has a couple of pretty unpleasant consequences though: * We have to remember *every* deletable page for correctness, compared to the current situation where btvacuumcleanup can bound the number of pages it tracks. This creates a situation where VACUUM may fail outright if it doesn't have gobs of memory. Since one of the main reasons for developing lazy VACUUM was to ensure we could vacuum arbitrarily large tables in bounded memory, I'm not happy with this. * The rescan could be far from cheap if there are many such pages. Also, I'm not sure when you can stop: it certainly seems possible that items could be moved down during the primary scan and then moved down again while btbulkdelete is reconsidering the deletable pages. Without locking out splits entirely, it's far from obvious that we can make it work. Thoughts? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org