On Thu, Dec 18, 2025 at 3:55 AM Kirill Reshke <[email protected]> wrote:
>
> On Thu, 18 Dec 2025 at 05:30, Melanie Plageman
> <[email protected]> wrote:
>
> > If I was trying to guess how empty pages with PD_ALL_VISIBLE set are
> > getting vacuumed, I would think it is due to SKIP_PAGES_THRESHOLD
> > causing us to vacuum an all-frozen empty page.
>
> Yes, vacuum (disable_page_skipping);

Ah, right, that would be a reliable way for it to happen.

> > Then the question is, why wouldn't we have coverage of the empty page
> > first being set all-visible/all-frozen? It can't be COPY FREEZE
> > because the page is empty. And it can't be vacuum, because then we
> > would have coverage. It's very mysterious.

<--snip-->

> I am currently inclined to think that we cannot see an empty page that
> has PD_ALL_VISIBLE not-set. This is because when we make a page empty,
> we are in a critical section, and we WAL-log everything we do, so our
> changes should not be half-made. Maybe as of 608195a3a365, there was a
> case with empty-page-without-PD_ALL_VISIBLE, but I don't think this
> happens on HEAD.

Right, so the way empty pages get PD_ALL_VISIBLE set is this: when a
page has all its tuples deleted, the next vacuum of that page sets it
all-visible and all-frozen in the VM and sets PD_ALL_VISIBLE. (A
trailing page will be truncated instead, but any non-trailing page
will end up like this.)

But you are right, I don't see any non-error code path where a heap
page would become empty (all line pointers set unused) and then not be
set all-visible. Only vacuum sets line pointers unused and if all the
line pointers are unused it will always set the page all-visible.

I think, though, that if we error out in lazy_scan_prune() after
returning from heap_page_prune_and_freeze() such that we don't set the
empty page all-visible, we can end up with an empty page without
PD_ALL_VISIBLE set. You can see how this might work by patching the VM
set code in lazy_scan_prune() to skip empty pages.
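To make the ordering problem concrete, here is a toy C model of that error
path. It is not PostgreSQL code: ToyPage, toy_prune() and toy_vacuum_page()
are made-up names, and pruning plus setting line pointers unused are
collapsed into a single step. It only sketches the idea that bailing out
after pruning but before the all-visible step leaves an empty page with the
flag clear, and that the next successful pass repairs it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only; none of these are real PostgreSQL structures. */
#define TOY_MAX_ITEMS 4

typedef struct ToyPage
{
    bool lp_used[TOY_MAX_ITEMS];    /* true = line pointer still in use */
    bool lp_dead[TOY_MAX_ITEMS];    /* true = tuple deleted and removable */
    bool pd_all_visible;            /* models the PD_ALL_VISIBLE flag */
} ToyPage;

/* Models pruning: every removable item becomes an unused line pointer. */
static void
toy_prune(ToyPage *page)
{
    assert(page != NULL);
    for (size_t i = 0; i < TOY_MAX_ITEMS; i++)
    {
        if (page->lp_used[i] && page->lp_dead[i])
        {
            page->lp_used[i] = false;
            page->lp_dead[i] = false;
        }
    }
}

/* A page is "empty" here when no line pointer is in use. */
static bool
toy_page_is_empty(const ToyPage *page)
{
    for (size_t i = 0; i < TOY_MAX_ITEMS; i++)
    {
        if (page->lp_used[i])
            return false;
    }
    return true;
}

/*
 * Models one vacuum pass over a page. When fail_after_prune is true we
 * simulate an ERROR between pruning and the all-visible step and return
 * false; the pruning already done is kept.
 */
static bool
toy_vacuum_page(ToyPage *page, bool fail_after_prune)
{
    toy_prune(page);
    if (fail_after_prune)
        return false;
    if (toy_page_is_empty(page))
        page->pd_all_visible = true;
    return true;
}
```

Calling toy_vacuum_page() once with fail_after_prune set and then again
without it walks through both states: empty without the flag, then empty
with it.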
> I did a bit of archaeology, and this "if (PageIsEmpty(page)) { if
> (!PageIsAllVisible(page)) { .... }}" code dates back to
> 608195a3a365. The comment about relation extension not being
> WAL-logged is from a6370fd9ed3d, and I don't think we need to think
> about this case.

Thanks for looking into this. Even if this code was added to handle
the error codepath I mentioned above, it seems like it would have been
good enough to just let lazy_scan_prune() handle setting the empty
page all-visible the next time the page was vacuumed. Since there is
no non-error code path where this can happen, it doesn't seem like it
would merit its own special case.

It is possible it was more common as of 608195a3a365, as you say.

I don't understand how the bug fixed by a6370fd9ed3d can happen. When
a new page is initialized, flags are set to 0, so regardless of WAL
logging of the extension not happening, how would the new page have
been set PD_ALL_VISIBLE? We'll have to ask Andres or Robert about how
this was hit.
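As a rough illustration of why a freshly initialized page should not have
the flag, here is a small self-contained C sketch. ToyPageHeader,
toy_page_init() and toy_page_is_all_visible() are hypothetical names with a
much-reduced header layout; only the 0x0004 bit value is taken from the
PD_ALL_VISIBLE definition. The point is just that initialization zeroes the
page and writes flags = 0, whatever garbage was in the buffer before.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE       8192
#define TOY_PD_ALL_VISIBLE  0x0004  /* same bit value as PD_ALL_VISIBLE */

/* Much-reduced, hypothetical page header; not the real PageHeaderData. */
typedef struct ToyPageHeader
{
    uint16_t pd_flags;   /* flag bits */
    uint16_t pd_lower;   /* offset to start of free space */
    uint16_t pd_upper;   /* offset to end of free space */
} ToyPageHeader;

/* Zero the whole page, then write a header whose flags are 0. */
static void
toy_page_init(unsigned char *page)
{
    ToyPageHeader hdr;

    assert(page != NULL);
    memset(page, 0, TOY_PAGE_SIZE);
    hdr.pd_flags = 0;
    hdr.pd_lower = (uint16_t) sizeof(ToyPageHeader);
    hdr.pd_upper = TOY_PAGE_SIZE;
    memcpy(page, &hdr, sizeof(hdr));
}

/* Returns nonzero when the toy all-visible bit is set in the header. */
static int
toy_page_is_all_visible(const unsigned char *page)
{
    ToyPageHeader hdr;

    memcpy(&hdr, page, sizeof(hdr));
    return (hdr.pd_flags & TOY_PD_ALL_VISIBLE) != 0;
}
```

Filling a buffer with 0xFF bytes and then calling toy_page_init() on it
leaves toy_page_is_all_visible() returning 0, which is the behavior that
makes the a6370fd9ed3d scenario hard for me to picture.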
> Also, after the whole set is committed, we should then never
> experience a discrepancy between PD_ALL_VISIBLE and VM bits? Because
> they will be set in a single WAL record. The only cases where heap and
> VM disagree on all-visibility are then corruption,
> pg_visibilitymap_truncate and old data (data before v19+ upgrade?)
> If my understanding is correct, should we document this?

Even on current master, I don't see a scenario other than VM
corruption or truncation where PD_ALL_VISIBLE can be set but not the
VM bit (or vice versa). The only way would be to error out after
setting PD_ALL_VISIBLE but before setting the VM bit. Setting
PD_ALL_VISIBLE is not in a critical section in lazy_scan_prune(), so
an error there won't escalate to a PANIC and discard shared memory,
and the buffer with PD_ALL_VISIBLE set may later get written out. But
the only obvious way I see to error out of MarkBufferDirty() is if the
buffer is not valid, which would have kept us from doing previous
operations on the buffer, I would think.

It's true this will no longer happen after my patches, as
PageSetAllVisible() will happen in a critical section. We could add a
comment about this particular scenario in the code somewhere. But I
don't think we should document it in any user-facing documentation
since you could still truncate the VM and have the two out of sync.
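A toy sketch of why truncation still matters, under the assumption that the
page flag and the VM bit are always set together: ToyRelation and the
toy_* functions below are invented for illustration and do not correspond
to real PostgreSQL structures or functions. Setting both bits in one step
keeps them in sync, but clearing only the VM side afterwards leaves every
previously all-visible page mismatched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NBLOCKS 4

/* Hypothetical relation: one page-header flag and one VM bit per block. */
typedef struct ToyRelation
{
    bool pd_all_visible[TOY_NBLOCKS];   /* models PD_ALL_VISIBLE per page */
    bool vm_all_visible[TOY_NBLOCKS];   /* models the VM all-visible bit */
} ToyRelation;

/* Models setting the page flag and the VM bit together, as one step. */
static void
toy_set_all_visible(ToyRelation *rel, size_t blkno)
{
    assert(blkno < TOY_NBLOCKS);
    rel->pd_all_visible[blkno] = true;
    rel->vm_all_visible[blkno] = true;
}

/* Models truncating the VM: every VM bit is cleared, pages are untouched. */
static void
toy_truncate_vm(ToyRelation *rel)
{
    for (size_t i = 0; i < TOY_NBLOCKS; i++)
        rel->vm_all_visible[i] = false;
}

/* Counts blocks whose page flag and VM bit disagree. */
static size_t
toy_count_mismatches(const ToyRelation *rel)
{
    size_t n = 0;

    for (size_t i = 0; i < TOY_NBLOCKS; i++)
    {
        if (rel->pd_all_visible[i] != rel->vm_all_visible[i])
            n++;
    }
    return n;
}
```

With two blocks marked all-visible, toy_count_mismatches() is zero until
toy_truncate_vm() runs, after which both of those blocks disagree.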
- Melanie