On Fri, Feb 7, 2025 at 3:38 PM Nathan Bossart <nathandboss...@gmail.com> wrote:
>
> On Fri, Feb 07, 2025 at 02:21:07PM -0500, Melanie Plageman wrote:
> > On Fri, Feb 7, 2025 at 12:37 PM Nathan Bossart <nathandboss...@gmail.com> wrote:
> >>
> >> Wouldn't relallvisible be sufficient here? We'll skip all-visible pages
> >> unless this is an anti-wraparound vacuum, at which point I would think the
> >> insert threshold goes out the window.
> >
> > It's a great question. There are a couple reasons why I don't think so.
> >
> > I think this might lead to triggering vacuums too often for
> > insert-mostly tables. For those tables, the pages that are not
> > all-visible will largely be just those with data that is new since the
> > last vacuum. And if we trigger vacuums based off of the % not
> > all-visible, we might decrease the number of cases where we are able
> > to vacuum inserted data and freeze it the first time it is vacuumed --
> > thereby increasing the total amount of work.
>
> Rephrasing to make sure I understand correctly: you're saying that using
> all-frozen would trigger less frequent insert vacuums, which would give us
> a better chance of freezing more than more frequent insert vacuums
> triggered via all-visible? My suspicion is that the difference would tend
> to be quite subtle in practice, but I have no concrete evidence to back
> that up.

You understood me correctly. As for relallfrozen, one of the
justifications for adding it to pg_class is actually for the visibility
it would provide. We have no way of knowing how many all-visible but
not all-frozen pages there are on users' systems without pg_visibility.
If users had this information, they could potentially tune their
freeze-related settings more aggressively. Regularly reading the whole
visibility map with pg_visibilitymap_summary() is pretty hard to
justify on most production systems. But querying pg_class every 10
minutes or something is much more reasonable.

- Melanie
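
For illustration only, here is a minimal sketch of the arithmetic that such periodic pg_class monitoring might perform. It assumes the relallfrozen column proposed in this thread exists alongside relpages and relallvisible. The RelStats type, the helper names, and the query text are hypothetical and are not part of any patch.

```python
from dataclasses import dataclass

# Hypothetical monitoring query. relallfrozen is the column proposed in
# this thread and is assumed to exist; the other columns are in pg_class.
STATS_QUERY = """
SELECT relname, relpages, relallvisible, relallfrozen
FROM pg_class
WHERE relkind = 'r'
"""


@dataclass
class RelStats:
    """One row of the query above (hypothetical container type)."""
    relname: str
    relpages: int
    relallvisible: int
    relallfrozen: int  # assumed column, see lead-in


def visible_not_frozen_pages(stats: RelStats) -> int:
    """Pages marked all-visible but not all-frozen.

    The catalog values are estimates, so clamp at zero in case the
    two counters are briefly inconsistent with each other.
    """
    return max(stats.relallvisible - stats.relallfrozen, 0)


def visible_not_frozen_fraction(stats: RelStats) -> float:
    """The same quantity as a fraction of the table's pages."""
    if stats.relpages <= 0:
        return 0.0
    return visible_not_frozen_pages(stats) / stats.relpages


if __name__ == "__main__":
    # Made-up numbers, purely to show the calculation.
    sample = RelStats("events", relpages=1000, relallvisible=800, relallfrozen=300)
    print(visible_not_frozen_pages(sample), visible_not_frozen_fraction(sample))
```

A table whose fraction stays high across samples has many pages that have been vacuumed but not yet frozen, which is the kind of signal that could inform more aggressive freeze-related settings.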