On Fri, Jul 22, 2022 at 8:09 AM Jonathan S. Katz <jk...@postgresql.org> wrote: > Thanks! This is great. Probably the most concise and clear explanation I > have seen for HOT, which is exactly what we need for the docs :)
I'm delighted that this is finally happening! > I think we should expand on this and explain how adjusting "fillfactor" > will affect this. I think that may change the final sentence too. Definitely -- this is by far the most important reason to tune heap fill factor, which can make a big difference. There should be a link from the fill factor docs to the new HOT section, at a minimum. Another thing: would be good to point out how to monitor the number of HOT updates vs non-HOT updates using views like pg_stat_user_tables. One minor correction: Opportunistic pruning isn't limited to heap-only tuples -- "HOT pruning" is actually a bit of a misnomer, that somehow caught on in the Postgres community. Opportunistic pruning can and will happen even with non-HOT updates -- the only difference on the heap page level is that 4 byte LP_DEAD line pointers will accumulate over time, which only VACUUM is able to clean up (we need LP_DEAD items to stick around until then as tombstones, so that index scans don't ever get confused). But pruning will still be able to free most of the space on the heap page, in almost the same way. Of course the indexes on the table are another matter -- each and every index will accumulate versions when there are many non-HOT updates, which can be a huge problem. My point is that *from the point of view of an affected heap page*, pruning (whether opportunistic or performed by VACUUM) can go ahead without HOT tuples/updates, and isn't really that different in terms of the amount of space freed each time. Bottom-up index deletion is effective is no small part because pruning was always pretty effective even with non-HOT updates, except within indexes -- it pretty much compensates for that remaining problem. The heuristics that it uses in the nbtree code work quite well because of the natural tendency of those 4 byte LP_DEAD item pointers to concentrate on individual heap pages. We can fit so many more 4 byte LP_DEAD items than full sized live heap tuples that it becomes pretty obvious (from the point of view of the nbtree code that starts out by looking at an index leaf page) which pointed-to heap blocks are likely to enable it to free space via index tuple deletes -- it often visits the heap pages with the most TIDs. Before bottom-up index deletion was added, I would notice that pgbench variant workloads with non-HOT updates would result in a pgbench_accounts table that is maybe 1%-2% larger after a few hours, while the indexes on the same table would grow by 100% - 200%. Opportunistic pruning would work pretty well for the heap, but wouldn't do anything at all to control bloat in indexes. So opportunistic pruning was always very effective with non-HOT updates in one important way, but totally ineffective in another way. There is a section about bottom-up index deletion in the docs (in the B-Tree internals chapter) that already references HOT. It should link to this new chapter now, I think. -- Peter Geoghegan