On Fri, Feb 12, 2021 at 8:38 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> I agree that there already are huge problems in that case. But I think
> we need to consider an append-only case as well; after bulk deletion
> on an append-only table, vacuum deletes heap tuples and index tuples,
> marking some index pages as dead and setting an XID into btpo.xact.
> Since we trigger autovacuums even by insertions based on
> autovacuum_vacuum_insert_scale_factor/threshold autovacuum will run on
> the table again. But if there is a long-running query a "wasted"
> cleanup scan could happen many times depending on the values of
> autovacuum_vacuum_insert_scale_factor/threshold and
> vacuum_cleanup_index_scale_factor. This should not happen in the old
> code. I agree this is DBA problem but it also means this could bring
> another new problem in a long-running query case.

I see your point.

The old code only avoids this problem because the oldest XID in the metapage happens to restrict VACUUM in exactly the right way. But why assume that? It's actually rather unlikely that we won't be able to free even one block, even in this scenario. The oldest XID isn't truly special -- at least not without the restrictions that go with 32-bit XIDs.

The other thing is that vacuum_cleanup_index_scale_factor is mostly about limiting how long we'll go before having stale statistics, and so presumably the user gets the benefit of not having stale statistics (maybe that theory is a bit questionable in some cases, but that doesn't have all that much to do with page deletion -- in fact the problem exists without page deletion ever occurring).

BTW, I am thinking about making recycling take place for pages that were deleted during the same VACUUM. We can just use a work_mem-limited array to remember a list of blocks that are deleted but not yet recyclable (plus the XID found in the block). At the end of the VACUUM (just before calling IndexFreeSpaceMapVacuum() from within btvacuumscan()), we can then determine which blocks are now safe to recycle, and recycle them after all using some "late" calls to RecordFreeIndexPage() (and without revisiting the pages a second time). No need to wait for the next VACUUM to recycle pages this way, at least in many common cases. The reality is that it usually doesn't take very long for a deleted page to become recyclable -- why wait?

This idea is enabled by commit c79f6df75dd from 2018. I think it's the next logical step.

--
Peter Geoghegan
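
The same-VACUUM recycling idea described above can be modeled roughly as follows. This is only a toy Python sketch of the bookkeeping, not PostgreSQL code: PendingRecycleList, remember_deleted(), recycle_safe(), max_entries and the record_free_page callback are all hypothetical names, the entry limit merely stands in for a work_mem-derived budget, and the plain integer comparison assumes XIDs that never wrap around.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class PendingRecycleList:
    """Bounded list of (block, safexid) pairs for pages deleted by this VACUUM."""

    max_entries: int  # hypothetical stand-in for a work_mem-derived limit
    entries: List[Tuple[int, int]] = field(default_factory=list)

    def remember_deleted(self, block: int, safexid: int) -> bool:
        """Record a newly deleted page; return False once the budget is used up."""
        if len(self.entries) >= self.max_entries:
            return False  # this page is simply left for a later VACUUM
        self.entries.append((block, safexid))
        return True

    def recycle_safe(
        self,
        oldest_running_xid: int,
        record_free_page: Callable[[int], None],
    ) -> List[int]:
        """Late pass at the end of the scan, without revisiting any page.

        Assumption: a page is safe to reuse once its stored XID is older
        than every transaction that could still be running.
        """
        recycled: List[int] = []
        still_pending: List[Tuple[int, int]] = []
        for block, safexid in self.entries:
            if safexid < oldest_running_xid:
                record_free_page(block)  # plays the role of a "late" FSM call
                recycled.append(block)
            else:
                still_pending.append((block, safexid))
        self.entries = still_pending
        return recycled


if __name__ == "__main__":
    fsm: List[int] = []  # toy free space map: just a list of block numbers
    pending = PendingRecycleList(max_entries=2)
    pending.remember_deleted(10, safexid=100)
    pending.remember_deleted(11, safexid=205)
    pending.remember_deleted(12, safexid=101)  # over budget, not remembered
    pending.recycle_safe(oldest_running_xid=200, record_free_page=fsm.append)
    print(fsm)
```

In this model a page that does not fit in the array, or that is still not safe at the end of the scan, is no worse off than today: it waits for a later VACUUM.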