Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

Kirill Reshke Thu, 18 Dec 2025 00:56:24 -0800

On Thu, 18 Dec 2025 at 05:30, Melanie Plageman
<[email protected]> wrote:
>
> > in v27-0005. This patch changes code which is not exercised in
> > tests[0]. I spent some time understanding the conditions when we
> > entered this. There is a comment about non-finished relation
> > extension, but I got no success trying to reproduce this. I ended up
> > modifying code to lose PageSetAllVisible in proper places and running
> > vacuum. Looks like everything works as expected. I will spend some
> > more time on this, maybe I will be successful in writing an
> > injection-point-based TAP test which hits this...
>
> Based on the coverage report link you provided, that code is changed
> by v27 0007, not 0005. 0005 is about moving an assertion out of
> lazy_scan_prune(). 0007 changes lazy_scan_new_or_empty() (the code in
> question).
>
> Regarding 0007, it looks like what is uncovered (the orange bits in
> the coverage report are uncovered, I assume) is empty pages _without_
> PD_ALL_VISIBLE set. I don't see anywhere where PageSetAllVisible() is
> called except vacuum and COPY FREEZE.


Sure, I meant 0007.

> If I was trying to guess how empty pages with PD_ALL_VISIBLE set are
> getting vacuumed, I would think it is due to SKIP_PAGES_THRESHOLD
> causing us to vacuum an all-frozen empty page.

Yes, vacuum (disable_page_skipping);

> Then the question is, why wouldn't we have coverage of the empty page
> first being set all-visible/all-frozen? It can't be COPY FREEZE
> because the page is empty. And it can't be vacuum, because then we
> would have coverage. It's very mysterious.
>
> It would be good to have coverage for this case. I don't think you'll
> need an injection point for the main case of "empty page not yet set
> all-visible is vacuumed for the first time" (unless I'm
> misunderstanding something).
>
> I'm not sure how you'll test the "vacuuming an empty, previously
> uninitialized page" case described in this comment, though.
>
>              * It's possible that another backend has extended the heap,
>              * initialized the page, and then failed to WAL-log the page due
>              * to an ERROR.  Since heap extension is not WAL-logged, recovery
>              * might try to replay our record setting the page all-visible and
>              * find that the page isn't initialized, which will cause a PANIC.
>              * To prevent that, check whether the page has been previously
>              * WAL-logged, and if not, do that now.
>
> You'd want to force an error during relation extension and then vacuum
> the page. I don't know if you need an injection point to force the
> error -- depends on what kind of error, I think.

I did small archeology and this "if (PageIsEmpty(page)) {   if
(!PageIsAllVisible(page)) { .... }}" code  originates back to
608195a3a365. Comment about not WAL-logged relation extension is from
a6370fd9ed3d, and I don't think we need to think about this case.

I am currently inclined to think that we cannot see an empty page that
has PD_ALL_VISIBLE not-set. This is because when we make a page empty,
we are in a critical section, and we WAL-log everything we do, so our
changes should not be half-made. Maybe as of 608195a3a365, there was a
case with empry-page-without-PD_ALL_VISIBLE, but I dont think this
happens on HEAD.

> So that I know for attribution, did you review 0003-0005?

yes, but I did not have any valuable review points for them.


Also, after the whole set is committed, we should then never
experience discrepancy between  PD_ALL_VISIBLE and VM bits? Because
they will be set in a single WAL record. The only cases when heap and
VM disagrees on all-visibility then are corruption,
pg_visibilitymap_truncate and old data (data before v19+ upgrade?)
If my understanding is correct, should we add document this?

-- 
Best regards,
Kirill Reshke

Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

Reply via email to