On Fri, Jan 21, 2022 at 12:07 PM Greg Stark <st...@mit.edu> wrote: > This confuses me. "Transactions per second" is a headline database > metric that lots of users actually focus on quite heavily -- rather > too heavily imho.
But transactions per second is for the whole database, not for individual tables. It's also really a benchmarking thing, where the size and variety of transactions is fixed. With something like pgbench it actually is exactly the same thing, but such a workload is not at all realistic. Even BenchmarkSQL/TPC-C isn't like that, despite the fact that it is a fairly synthetic workload (it's just not super synthetic). > Ok, XID consumption is only a subset of transactions > that are not read-only but that's a detail that's pretty easy to > explain and users get pretty quickly. My point was mostly this: the number of distinct extant unfrozen tuple headers (and the range of the relevant XIDs) is generally highly unpredictable today. And the number of tuples we'll have to freeze to be able to advance relfrozenxid by a good amount is quite variable, in general. For example, if we bulk extend a relation as part of an ETL process, then the number of distinct XIDs could be as low as 1, even though we can expect a great deal of "freeze debt" that will have to be paid off at some point (with the current design, in the common case where the user doesn't account for this effect because they're not already an expert). There are other common cases that are not quite as extreme as that, that still have the same effect -- even an expert will find it hard or impossible to tune autovacuum_freeze_min_age for that. Another case of interest (that illustrates the general principle) is something like pgbench_tellers. We'll never have an aggressive VACUUM of the table with the patch, and we shouldn't ever need to freeze any tuples. But, owing to workload characteristics, we'll constantly be able to keep its relfrozenxid very current, because (even if we introduce skew) each individual row cannot go very long without being updated, allowing old XIDs to age out that way. There is also an interesting middle ground, where you get a mixture of both tendencies due to skew. The tuple that's most likely to get updated was the one that was just updated. How are you as a DBA ever supposed to tune autovacuum_freeze_min_age if tuples happen to be qualitatively different in this way? > What I find confuses people much more is the concept of the > oldestxmin. I think most of the autovacuum problems I've seen come > from cases where autovacuum is happily kicking off useless vacuums > because the oldestxmin hasn't actually advanced enough for them to do > any useful work. As it happens, the proposed log output won't use the term oldestxmin anymore -- I think that it makes sense to rename it to "removable cutoff". Here's an example: LOG: automatic vacuum of table "regression.public.bmsql_oorder": index scans: 1 pages: 0 removed, 317308 remain, 250258 skipped using visibility map (78.87% of total) tuples: 70 removed, 34105925 remain (6830471 newly frozen), 2528 are dead but not yet removable removable cutoff: 37574752, which is 230115 xids behind next new relfrozenxid: 35221275, which is 5219310 xids ahead of previous value index scan needed: 55540 pages from table (17.50% of total) had 3339809 dead item identifiers removed index "bmsql_oorder_pkey": pages: 144257 in total, 0 newly deleted, 0 currently deleted, 0 reusable index "bmsql_oorder_idx2": pages: 330083 in total, 0 newly deleted, 0 currently deleted, 0 reusable I/O timings: read: 7928.207 ms, write: 1386.662 ms avg read rate: 33.107 MB/s, avg write rate: 26.218 MB/s buffer usage: 220825 hits, 443331 misses, 351084 dirtied WAL usage: 576110 records, 364797 full page images, 2046767817 bytes system usage: CPU: user: 10.62 s, system: 7.56 s, elapsed: 104.61 s Note also that I deliberately made the "new relfrozenxid" line that immediately follows (information that we haven't shown before now) similar, to highlight that they're now closely related concepts. Now if you VACUUM a table that is either empty or has only frozen tuples, VACUUM will set relfrozenxid to oldestxmin/removable cutoff. Internally, oldestxmin is the "starting point" for our final/target relfrozenxid for the table. We ratchet it back dynamically, whenever we see an older-than-current-target XID that cannot be immediately frozen (e.g., when we can't easily get a cleanup lock on the page). -- Peter Geoghegan