On Sat, Jan 29, 2022 at 11:43 PM Peter Geoghegan <p...@bowt.ie> wrote:
> When VACUUM sees that all remaining/unpruned tuples on a page are
> all-visible, it isn't just important because of cost control
> considerations. It's deeper than that. It's also treated as a
> tentative signal from the application itself, about the data itself.
> Which is: this page looks "settled" -- it may never be updated again,
> but if there is an update it likely won't change too much about the
> whole page.

While I agree that there's some case to be made for leaving settled
pages well enough alone, your criterion for "settled" seems pretty much
accidental. Imagine a system where there are two applications running,
A and B. Application A runs all the time, and all the transactions it
performs are short. Therefore, when a certain page is not modified by
application A for a short period of time, the page will become
all-visible and will be considered settled. Application B runs once a
month and performs various transactions, all of which are long, perhaps
on a completely separate set of tables. While application B is running,
pages take longer to settle not only for application B but also for
application A. It doesn't make sense to say that the application is in
control of the behavior when, in reality, it may be some completely
separate application that is controlling the behavior.

> The application is in charge, really -- not VACUUM. This is already
> the case, whether we like it or not. VACUUM needs to learn to live in
> that reality, rather than fighting it. When VACUUM considers a page
> settled, and the physical page still has a relatively large amount of
> free space (say 45% of BLCKSZ, a borderline case in the new FSM
> patch), "losing" so much free space certainly is unappealing. We set
> the free space to 0 in the free space map all the same, because we're
> cutting our losses at that point. While the exact threshold I've
> proposed is tentative, the underlying theory seems pretty sound to me.
> The BLCKSZ/2 cutoff (and the way that it extends the general rules for
> whole-page freezing) is intended to catch pages that are qualitatively
> different, as well as quantitatively different. It is a balancing act,
> between not wasting space, and the risk of systemic problems involving
> excessive amounts of non-HOT updates that must move a successor
> version to another page.
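Just to make sure I'm reading the proposal correctly: as I understand
it, the combination of "settled" and the BLCKSZ/2 cutoff boils down to
something like the standalone toy sketch below. The struct layout, the
names, and the numbers are mine, not the patch's, and the real code
obviously works off the visibility map and the xmin horizon rather than
this caricature.

/*
 * Toy model of "settled" pages and the BLCKSZ/2 cutoff as I understand
 * them. Standalone C, invented names -- not the actual patch.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define BLCKSZ 8192

typedef uint32_t ToyXid;

typedef struct ToyTuple
{
    ToyXid      xmin;           /* inserting transaction */
    bool        deleted;        /* committed deleter exists? */
} ToyTuple;

typedef struct ToyPage
{
    ToyTuple    tuples[16];
    int         ntuples;
    int         freespace;      /* unused bytes on the page */
} ToyPage;

/*
 * A tuple only counts as all-visible once no still-running transaction
 * could need an older version, i.e. its xmin precedes the oldest
 * running XID. This is where application B's month-long transactions
 * delay "settling" even for pages that only application A ever touches.
 */
static bool
page_all_visible(const ToyPage *page, ToyXid oldest_running_xid)
{
    for (int i = 0; i < page->ntuples; i++)
    {
        if (page->tuples[i].deleted)
            return false;
        if (page->tuples[i].xmin >= oldest_running_xid)
            return false;
    }
    return true;
}

/*
 * The cutoff as quoted above: once a page looks settled, report zero
 * free space to the FSM unless at least half the block is still free.
 */
static int
freespace_to_report(const ToyPage *page, ToyXid oldest_running_xid)
{
    if (page_all_visible(page, oldest_running_xid) &&
        page->freespace < BLCKSZ / 2)
        return 0;               /* cut our losses; keep the page settled */

    return page->freespace;
}

int
main(void)
{
    ToyPage     page = {
        .tuples = {{.xmin = 100}, {.xmin = 150}},
        .ntuples = 2,
        .freespace = 3000,      /* ~37% of the block is free */
    };

    /* Only short transactions running: the page settles, FSM sees 0. */
    printf("short xacts only: report %d bytes\n",
           freespace_to_report(&page, 1000));

    /* A long transaction from "application B" holds back the horizon. */
    printf("long xact at xid 120: report %d bytes\n",
           freespace_to_report(&page, 120));

    return 0;
}

The second case is the application A/B problem again: nothing on the
page has changed, but whether it counts as settled -- and therefore
whether its free space survives in the FSM -- depends entirely on who
happens to be running the oldest transaction when VACUUM visits it.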
I can see that this could have significant advantages under some
circumstances. But I think it could easily be far worse under other
circumstances. I mean, you can have workloads where you do some amount
of read-write work on a table and then go read-only and sequential-scan
it an infinite number of times. An algorithm that causes the table to
be smaller at the point where we switch to read-only operations, even
by a modest amount, wins infinitely over anything else. But even if you
have no change in the access pattern, is it a good idea to allow the
table to be, say, 5% larger if it means that correlated data is
colocated? In general, probably yes. If that means that the table fails
to fit in shared_buffers instead of fitting, no. If that means that the
table fails to fit in the OS cache instead of fitting, definitely no.

And to me, that kind of effect is why it's hard to gain much confidence
about stuff like this via laboratory testing. I mean, I'm glad you're
doing such tests. But in a laboratory test, you tend not to have things
like a sudden and complete change in the workload, or a random other
application sometimes sharing the machine, or only being on the edge of
running out of memory. I think people generally tend to avoid such
things in benchmarking scenarios, but even if you include stuff like
this, it's hard to know what to include that would be representative of
real life, because just about anything *could* happen in real life.

--
Robert Haas
EDB: http://www.enterprisedb.com