On Fri, Mar 3, 2017 at 2:41 PM, Peter Geoghegan <p...@bowt.ie> wrote: > In other words, the number of B-Tree pages that the last VACUUM > deleted, and thus made eligible to recycle by the next VACUUM has no > relationship with the number of pages the next VACUUM will itself end > up deleting, in general, or how long it will be before that next > VACUUM comes, if it comes at all, or anything else that seems at all > relevant.
This raises another question, though: Why have this GUC at all? Why use *any* threshold that is to be compared against the number of heap pages that were processed by VACUUM this time? B-Tree page deletion isn't really part of the ordinary life cycle of a B-Tree index. In order for that to be the case, somebody would have to delete large ranges of indexed values (typically hundreds of logically contiguous values -- no gaps), without anyone else ever inserting new tuples that are in the same range before the next VACUUM. It's very unlikely that this would happen again and again in the real world. So, even if we never freeze, the number of B-Tree pages that we delete when we VACUUM today is generally a useless predictor of how many will be deleted by a VACUUM that occurs tomorrow. This is true despite the fact that the number of dead heap tuples is probably almost identical for each VACUUM (or the number of heap pages that end up being processed by VACUUM, if you prefer). Barring any concerns about crash safety, we can be completely certain that any "recycling-orientated B-Tree VACUUM" (a btvacuumcleanup() call to btvacuumscan(), which happens because there are no tuples in the index to kill) will end up recycling however many pages the last VACUUM managed to delete, which is a precisely knowable number (or could be made knowable if we stashed that number somewhere, like the meta-page). It will typically only take seconds or minutes after the VACUUM finishes for its RecentGlobalXmin interlock to stop being a problem (that is, for _bt_page_recyclable() to return "true" for any pages that that VACUUM deleted). From that point on, those deleted pages are "money in the bank" for the FSM. The only reason why we'd want to tie "the FSM withdrawing that money" to VACUUM is because that might be needed to clean up regular bloat anyway. The test performed by this patch within lazy_scan_heap(), to determine whether we should avoid calling lazy_cleanup_index() would therefore look like this, ideally: Do I want to go to the trouble of scanning this index (a cost that is proportionate to the size of the index) in order to recycle this number of known-deleted pages (a benefit that is proportionate to that number)? (I still think that the important thing is that we don't let the number of unclaimed-by-FSM recyclable pages grow forever, though.) (Thinks about it some more...) Unfortunately, I just saw a whole new problem with this patch: _bt_page_recyclable() is the one place in the B-Tree AM where we stash an XID. We don't need to set this to FrozenTransactionId at any point, because this is stashed for deleted pages only, pages that are likely to be recycled very soon. It might be that the VACUUM that ends up deleting any such page is an anti-wraparound VACUUM, especially in the case that this patch really wants to improve. However, with this patch, that recycling won't happen, of course. As a result, _bt_page_recyclable() will falsely report that the page is not recyclable if it is ever asked again. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers