On Thu, Jan 23, 2025 at 1:31 PM Robert Haas <robertmh...@gmail.com> wrote: > > On Wed, Jan 22, 2025 at 5:48 PM Melanie Plageman > <melanieplage...@gmail.com> wrote: > > Thanks! Attached v9 incorporates all your suggested changes. > > I'm not exactly sure what to do about it, but I feel like the > documentation of vacuum_eager_scan_max_fails is going to be > incomprehensible to someone who doesn't already have a deep knowledge > of how this patch works.
[ hit "Send" WAY too early ] There's a lot of magic numbers here: 4096 blocks per region, 128 fails per region, some other number of successes per region. I don't know why this particular one is configurable and none of the others are. I'm not saying it's the wrong decision, but it's not clear to me what the reasoning is. Also, the units of this parameter are pretty terrible. Like, it could be measured in some unit that involves bytes or kilobytes or megabytes, or, maybe better, it could be measured as a percentage. Instead of either of those things, it's an integer that only makes sense in reference to another (hard-coded, magical) integer. If you made this a percentage, then you wouldn't necessarily even have to tell people that the magic internal number is 4096. Or at least that knowledge would be a lot less critical. + <command>VACUUM</command> typically scans pages that have been + modified since the last vacuum. While some all-visible but not + all-frozen pages are eagerly scanned to try and freeze them, the + <structfield>relfrozenxid</structfield> can only be advanced when + every page of the table that might contain unfrozen XIDs is scanned. The phrase "While some all-visible but not all-frozen pages are eagerly scanned to try and freeze them" seems odd here, because what follows is true either way. This is like saying "When it's Tuesday, I get hungry if I don't eat any food." + /* + * A normal vacuum that has failed to freeze too many eagerly scanned + * blocks in a row suspends eager scanning. next_eager_scan_region_start + * is the block number of the first block eligible for resumed eager + * scanning. + * + * When eager scanning is permanently disabled, either initially + * (including for aggressive vacuum) or due to hitting the success limit, + * this is set to InvalidBlockNumber. + */ This and the other comments for the new LVRelState members are quite long, but I really like them. Arguably we ought to have more of this kind of thing. + /* + * We only want to enable eager scanning if we are likely to be able to + * freeze some of the pages in the relation. We can freeze tuples older + * than the visibility horizon calculated at the beginning of vacuum, but + * we are only guaranteed to freeze them if at least one tuple on the page + * precedes the freeze limit or multixact cutoff (calculated from + * vacuum_[multixact_]freeze_min_age). So, if the oldest unfrozen xid + * (relfrozenxid/relminmxid) does not precede the freeze cutoff, we aren't + * likely to freeze many tuples. + */ I think this could be clearer. And I wonder if "We can freeze tuples older" is actually a typo for "We can freeze tuples newer", because otherwise I can't make sense of the rest of the sentence. Right now it seems to boil down to something like: We can freeze tuples older...but we are only guaranteed to freeze them if they are older. Maybe what we want to say here is something like: We only want to enable eager scanning if we are likely to be able to freeze some of the pages in the relation. If FreezeLimit has not advanced past relfrozenxid, or if MultiXactCutoff has not advanced passed relminmxid, then there's a good chance we won't be able to freeze anything more than we already have. Success would not be impossible, because there may be pages we didn't try to freeze previously, and it's also possible that some XID greater than FreezeLimit has ended, allowing for freezing. But as a heuristic, we wait until the FreezeLimit advances, so that we won't repeatedly attempt eager freezing while the same long-running transaction blocks progress every time. + ereport(INFO, + (errmsg("Vacuum successfully froze %u eager scanned blocks of \"%s.%s.%s\". Now disabling eager scanning.", I predict that if Tom sees this, you will get complaints about both the wording of the message, which pretty clearly does not conform to style guidelines for a primary error message, and also about the use of the INFO level. Allow me to suggest DEBUG1 or DEBUG2 and "disabling eager scanning after freezing %u eagerly scanned blocks". + /* + * Aggressive vacuums cannot skip all-visible pages that are not also + * all-frozen. Normal vacuums with eager scanning enabled only skip + * such pages if they have hit the failure limit for the current eager + * scan region. + */ + if (vacrel->aggressive || + vacrel->eager_scan_remaining_fails > 0) + { + if (!vacrel->aggressive) + next_unskippable_eager_scanned = true; + break; } I think this would be clearer as two separate if statements. if (aggressive) break; if (vacrel->eager_scan_remaining_fails) { next_unskippable_eager_scanned = true; break; } -- Robert Haas EDB: http://www.enterprisedb.com