Thanks for the review! Attached v10 addresses the review feedback except that about the GUC/table option meaning and format.
On Thu, Jan 23, 2025 at 2:22 PM Robert Haas <robertmh...@gmail.com> wrote: > > On Thu, Jan 23, 2025 at 1:31 PM Robert Haas <robertmh...@gmail.com> wrote: > > > > On Wed, Jan 22, 2025 at 5:48 PM Melanie Plageman > > <melanieplage...@gmail.com> wrote: > > > Thanks! Attached v9 incorporates all your suggested changes. > > > > I'm not exactly sure what to do about it, but I feel like the > > documentation of vacuum_eager_scan_max_fails is going to be > > incomprehensible to someone who doesn't already have a deep knowledge > > of how this patch works. > > There's a lot of magic numbers here: 4096 blocks per region, 128 fails > per region, some other number of successes per region. I don't know > why this particular one is configurable and none of the others are. > I'm not saying it's the wrong decision, but it's not clear to me what > the reasoning is. The reasoning for making the delete cap configurable and not the success cap is that it made more sense to me to configure your failure tolerance and not your success tolerance. If you make the success cap configurable, you can end up scanning and failing to freeze just as many pages when it is 0 as when it is 100. This thought exercise made me realize something is wrong with my current patch, though. If you set the failure tolerance (vacuum_eager_scan_max_fails) to 0 right now, it disables eager scanning altogether. That might be unexpected. You would probably expect setting that to 0 to still allow eager scanning if it is only succeeding. That made me think that perhaps I should have a -1 value that disables eager scanning altogether and 0 just sets the failure tolerance to 0. I don't think the region size should be configurable because it is unclear how you might tune that based on your workload to get different results. We knew we wanted some kind of retry interval, so we picked one that seemed reasonable. > Also, the units of this parameter are pretty terrible. Like, it could > be measured in some unit that involves bytes or kilobytes or > megabytes, or, maybe better, it could be measured as a percentage. > Instead of either of those things, it's an integer that only makes > sense in reference to another (hard-coded, magical) integer. If you > made this a percentage, then you wouldn't necessarily even have to > tell people that the magic internal number is 4096. Or at least that > knowledge would be a lot less critical. I think you're right. I would go with a percentage. I don't see many other GUCs that are percents. What would you call it? Perhaps vacuum_eager_scan_fail_threshold? The % of the blocks in the table vacuum may scan and fail to freeze. There is an issue I see with making it a percentage. The current default vacuum_eager_scan_max_fails is 128 out of 4096. That means you are allowed to scan about 3% of the blocks in the table even if you fail to freeze every one. I don't think there are very many reasonable values above 256, personally. So, it might be weird to add a percentage GUC value with only very low acceptable values. Part of this is that we are not making the success cap configurable, so that means that you might have lots of extra I/O if you are both failing and succeeding. Someone configuring this GUC might think this is controlling the amount of extra I/O they are incurring. Note attached v10 does not change the GUC. I thought I would wait for additional discussion. > + <command>VACUUM</command> typically scans pages that have been > + modified since the last vacuum. While some all-visible but not > + all-frozen pages are eagerly scanned to try and freeze them, the > + <structfield>relfrozenxid</structfield> can only be advanced when > + every page of the table that might contain unfrozen XIDs is scanned. > > The phrase "While some all-visible but not all-frozen pages are > eagerly scanned to try and freeze them" seems odd here, because what > follows is true either way. This is like saying "When it's Tuesday, I > get hungry if I don't eat any food." The while was meant to contrast the sentence before it about typically only scanning pages that have been modified since the previous vacuum. But, I see how that didn't work out for me. How about this: """ <command>VACUUM</command> typically scans pages that have been modified since the last vacuum. However, some all-visible but not all-frozen pages are eagerly scanned to try and freeze them. Note that the <structfield>relfrozenxid</structfield> can only be advanced when every page of the table that might contain unfrozen XIDs is scanned. """ I know the last sentence seems a bit unrelated to the sentence before it. But I am trying to contrast the three situations. Most of the pages vacuum scanned are ones that have been modified since the previous vacuum. Some of them might be unmodified and we are scanning them because we want to freeze them. But, we also need to advance the relfrozenxid, so occasionally a large number of the pages that we scan might be unmodified. > + /* > + * We only want to enable eager scanning if we are likely to be able to > + * freeze some of the pages in the relation. We can freeze tuples older > + * than the visibility horizon calculated at the beginning of vacuum, but > + * we are only guaranteed to freeze them if at least one tuple on the page > + * precedes the freeze limit or multixact cutoff (calculated from > + * vacuum_[multixact_]freeze_min_age). So, if the oldest unfrozen xid > + * (relfrozenxid/relminmxid) does not precede the freeze cutoff, we aren't > + * likely to freeze many tuples. > + */ > > I think this could be clearer. And I wonder if "We can freeze tuples > older" is actually a typo for "We can freeze tuples newer", because > otherwise I can't make sense of the rest of the sentence. Right now it > seems to boil down to something like: We can freeze tuples older...but > we are only guaranteed to freeze them if they are older. I was trying to contrast opportunistic freezing with "guaranteed freezing". I was trying to say that we can freeze tuples older than FreezeLimit but we are only guaranteed to freeze tuples older than OldestXmin. > Maybe what we want to say here is something like: We only want to > enable eager scanning if we are likely to be able to freeze some of > the pages in the relation. If FreezeLimit has not advanced past > relfrozenxid, or if MultiXactCutoff has not advanced passed > relminmxid, then there's a good chance we won't be able to freeze > anything more than we already have. Success would not be impossible, > because there may be pages we didn't try to freeze previously, and > it's also possible that some XID greater than FreezeLimit has ended, > allowing for freezing. But as a heuristic, we wait until the > FreezeLimit advances, so that we won't repeatedly attempt eager > freezing while the same long-running transaction blocks progress every > time. I rewrote the comment using some of your input and trying to clarify my initial point about opportunistic vs guaranteed freezing. """ If FreezeLimit has not advanced past the relfrozenxid or MultiXactCutoff has not advanced past relminmxid, we are unlikely to be able to freeze anything new. Tuples with XIDs older than OldestXmin or MXIDs older than OldestMxact are technically freezable, but we won't freeze them unless some other criteria for opportunistic freezing is met. It is also possible than a transaction newer than the FreezeLimit has ended, rendering additional tuples freezable. As a heuristic, however, it makes sense to wait until the FreezeLimit or MultiXactCutoff has advanced before eager scanning. """ > + ereport(INFO, > + (errmsg("Vacuum successfully froze %u eager scanned blocks of > \"%s.%s.%s\". Now disabling eager scanning.", > > I predict that if Tom sees this, you will get complaints about both > the wording of the message, which pretty clearly does not conform to > style guidelines for a primary error message, and also about the use > of the INFO level. Allow me to suggest DEBUG1 or DEBUG2 and "disabling > eager scanning after freezing %u eagerly scanned blocks". I've used your wording. Just for future reference, are the style guidelines I was violating the capitalization and punctuation? Are these documented somewhere? Also, what is a primary error message? INFO level? Or ones that use ereport and are translated? I looked at other messages and saw that they don't capitalize the first word or use punctuation, so I assume that those were problems. > + /* > + * Aggressive vacuums cannot skip all-visible pages that are not also > + * all-frozen. Normal vacuums with eager scanning enabled only skip > + * such pages if they have hit the failure limit for the current eager > + * scan region. > + */ > + if (vacrel->aggressive || > + vacrel->eager_scan_remaining_fails > 0) > + { > + if (!vacrel->aggressive) > + next_unskippable_eager_scanned = true; > + break; > } > > I think this would be clearer as two separate if statements. if > (aggressive) break; if (vacrel->eager_scan_remaining_fails) { > next_unskippable_eager_scanned = true; break; } Very true. Fixed. - Melanie
v10-0001-Eagerly-scan-all-visible-pages-to-amortize-aggre.patch
Description: Binary data