On Thu, Mar 28, 2024 at 4:49 AM Heikki Linnakangas <[email protected]> wrote:
>
> On 28/03/2024 03:53, Melanie Plageman wrote:
> > On Thu, Mar 28, 2024 at 01:04:04AM +0200, Heikki Linnakangas wrote:
> >> One change with this is that live_tuples and many of the other fields are
> >> now again updated, even if the caller doesn't need them. It was hard to
> >> skip
> >> them in a way that would save any cycles, with the other refactorings.
> >
> > I am worried we are writing checks that are going to have to come out of
> > SELECT queries' bank accounts, but I'll do some benchmarking when we're
> > all done with major refactoring.
>
> Sounds good, thanks.
Actually, after having looked at it again, there really are only a few
more increments of various counters, the memory consumed by revisit,
and the additional setting of offsets in marked. I think a few
carefully constructed queries which do on-access pruning could test
the impact of this (as opposed to a bigger benchmarking endeavor).
I also wonder if there would be any actual impact of marking the
various record_*() routines inline.
> >> @@ -728,10 +832,12 @@ htsv_get_valid_status(int status)
> >> * DEAD, our visibility test is just too coarse to detect it.
> >> *
> >> * In general, pruning must never leave behind a DEAD tuple that still
> >> has
> >> - * tuple storage. VACUUM isn't prepared to deal with that case. That's
> >> why
> >> + * tuple storage. VACUUM isn't prepared to deal with that case (FIXME:
> >> it no longer cares, right?).
> >> + * That's why
> >> * VACUUM prunes the same heap page a second time (without dropping its
> >> lock
> >> * in the interim) when it sees a newly DEAD tuple that we initially saw
> >> as
> >> - * in-progress. Retrying pruning like this can only happen when an
> >> inserting
> >> + * in-progress (FIXME: Really? Does it still do that?).
> >
> > Yea, I think all that is no longer true. I missed this comment back
> > then.
>
> Committed a patch to remove it.
>
> >> @@ -981,7 +1069,18 @@ heap_prune_chain(Buffer buffer, OffsetNumber
> >> rootoffnum,
> >> * RECENTLY_DEAD tuples just in case there's a DEAD one after
> >> them;
> >> * but we can't advance past anything else. We have to make
> >> sure that
> >> * we don't miss any DEAD tuples, since DEAD tuples that
> >> still have
> >> - * tuple storage after pruning will confuse VACUUM.
> >> + * tuple storage after pruning will confuse VACUUM (FIXME:
> >> not anymore
> >> + * I think?).
> >
> > Meaning, it won't confuse vacuum anymore or there won't be DEAD tuples
> > with storage after pruning anymore?
>
> I meant that it won't confuse VACUUM anymore. lazy_scan_prune() doesn't
> loop through the items on the page checking their visibility anymore.
>
> Hmm, one confusion remains though: In the 2nd phase of vacuum, we remove
> all the dead line pointers that we have now removed from the indexes.
> When we do that, we assume them all to be dead line pointers, without
> storage, rather than normal tuples that happen to be HEAPTUPLE_DEAD. So
> it's important that if pruning would leave behind HEAPTUPLE_DEAD tuples,
> they are not included in 'deadoffsets'.
It seems like master only adds items it is marking LP_DEAD to
deadoffsets. And things marked LP_DEAD have lp_len set to 0.
> In any case, let's just make sure that pruning doesn't leave
> HEAPTUPLE_DEAD tuples. There's no reason it should.
Maybe worth adding an assert to
static void
heap_prune_record_unchanged_lp_dead(ItemId itemid, PruneState
*prstate, OffsetNumber offnum)
{
...
Assert(!ItemIdHasStorage(itemid));
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
}
By the way, I wasn't quite sure about the purpose of
heap_prune_record_unchanged_lp_dead(). It is called in
heap_prune_chain() in a place where we didn't add things to
deadoffsets before, no?
/*
* Likewise, a dead line pointer can't be part of the chain. (We
* already eliminated the case of dead root tuple outside this
* function.)
*/
if (ItemIdIsDead(lp))
{
/*
* If the caller set PRUNE_DO_MARK_UNUSED_NOW, we can set dead
* line pointers LP_UNUSED now.
*/
if (unlikely(prstate->actions & PRUNE_DO_MARK_UNUSED_NOW))
heap_prune_record_unused(prstate, offnum, false);
else
heap_prune_record_unchanged_lp_dead(lp, prstate, offnum);
break;
}
> >> @@ -1224,10 +1327,21 @@ heap_prune_record_live_or_recently_dead(Page page,
> >> PruneState *prstate, OffsetNu
> >> * ensure the math works out. The assumption that the transaction will
> >> * commit and update the counters after we report is a bit shaky; but
> >> it
> >> * is what acquire_sample_rows() does, so we do the same to be
> >> consistent.
> >> + *
> >> + * HEAPTUPLE_DEAD are handled by the other heap_prune_record_*()
> >> + * subroutines. They don't count dead items like
> >> acquire_sample_rows()
> >> + * does, because we assume that all dead items will become LP_UNUSED
> >> + * before VACUUM finishes. This difference is only superficial.
> >> VACUUM
> >> + * effectively agrees with ANALYZE about DEAD items, in the end.
> >> VACUUM
> >> + * won't remember LP_DEAD items, but only because they're not
> >> supposed to
> >> + * be left behind when it is done. (Cases where we bypass index
> >> vacuuming
> >> + * will violate this optimistic assumption, but the overall impact of
> >> that
> >> + * should be negligible.) FIXME: I don't understand that last
> >> sentence in
> >> + * parens. It was copied from elsewhere.
> >
> > If we bypass index vacuuming, there will be LP_DEAD items left behind
> > when we are done because we didn't do index vacuuming and then reaping
> > of those dead items. All of these comments are kind of a copypasta,
> > though.
>
> Ah, gotcha, makes sense now. I didn't remember that we sometimes by pass
> index vacuuming if there are very few dead items. I thought this talked
> about the case that there are no indexes, but that case is OK.
These comments could use another pass. I had added some extra
(probably redundant) content when I thought I was refactoring it a
certain way and then changed my mind.
Attached is a diff with some ideas I had for a bit of code simplification.
Are you working on cleaning this patch up or should I pick it up?
I wonder if it makes sense to move some of the changes to
heap_prune_chain() with setting things in marked/revisited to the
start of the patch set (to be committed first).
- Melanie
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a8ed11a1858..2f477aa43b1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -143,7 +143,7 @@ static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber o
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
static void heap_prune_record_unchanged(Page page, PruneState *prstate, OffsetNumber offnum);
-static void heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_dead(ItemId itemid, PruneState *prstate, OffsetNumber offnum);
static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum);
static void page_verify_redirects(Page page);
@@ -766,7 +766,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* we can advance relfrozenxid and relminmxid to the values in
* pagefrz->FreezePageRelfrozenXid and pagefrz->FreezePageRelminMxid.
* MFIXME: which one should be pick if presult->nfrozen == 0 and
- * presult->all_frozen = True.
+ * presult->all_frozen = True. MTODO: see Peter's response here
+ * https://www.postgresql.org/message-id/CAH2-Wz%3DLmOs%3DiJ%3DFfCERnma0q7QjaNSnCgWEp7zOK7hD24YC_w%40mail.gmail.com
*/
if (new_relfrozen_xid)
{
@@ -868,9 +869,12 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
OffsetNumber latestdead = InvalidOffsetNumber,
maxoff = PageGetMaxOffsetNumber(dp),
offnum;
+
+ /* TODO: while maybe self-explanatory, I would prefer if chainitems and */
+ /* nchain had a comment up here */
OffsetNumber chainitems[MaxHeapTuplesPerPage];
- int nchain = 0,
- i;
+ int nchain = 0;
+ int i = 0;
rootlp = PageGetItemId(dp, rootoffnum);
@@ -943,6 +947,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
prstate->revisit[prstate->nrevisit++] = rootoffnum;
+ /* TODO: I don't like this comment now */
/* Nothing more to do */
return;
}
@@ -1020,7 +1025,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
if (unlikely(prstate->actions & PRUNE_DO_MARK_UNUSED_NOW))
heap_prune_record_unused(prstate, offnum, false);
else
- heap_prune_record_unchanged_lp_dead(prstate, offnum);
+ heap_prune_record_unchanged_lp_dead(lp, prstate, offnum);
break;
}
@@ -1044,6 +1049,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
tupdead = recent_dead = false;
+ /* TODO: maybe this should just be an if statement now */
switch (htsv_get_valid_status(prstate->htsv[offnum]))
{
case HEAPTUPLE_DEAD:
@@ -1132,42 +1138,34 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* dead and the root line pointer can be marked dead. Otherwise just
* redirect the root to the correct chain member.
*/
- if (i >= nchain)
- heap_prune_record_dead_or_unused(prstate, rootoffnum, ItemIdIsNormal(rootlp));
- else
- {
+ if (i < nchain)
heap_prune_record_redirect(prstate, rootoffnum, chainitems[i], ItemIdIsNormal(rootlp));
-
- /* the rest of tuples in the chain are normal, unchanged tuples */
- for (; i < nchain; i++)
- heap_prune_record_unchanged(dp, prstate, chainitems[i]);
- }
+ else
+ heap_prune_record_dead_or_unused(prstate, rootoffnum, ItemIdIsNormal(rootlp));
}
- else
+ else if ((i = ItemIdIsRedirected(rootlp)))
{
- i = 0;
- if (ItemIdIsRedirected(rootlp))
+ if (i < nchain)
{
- if (nchain < 2)
- {
- /*
- * We found a redirect item that doesn't point to a valid
- * follow-on item. This can happen if the loop in
- * heap_page_prune_and_freeze() caused us to visit the dead
- * successor of a redirect item before visiting the redirect
- * item. We can clean up by setting the redirect item to DEAD
- * state or LP_UNUSED if the caller indicated.
- */
- heap_prune_record_dead_or_unused(prstate, rootoffnum, false);
- }
- else
- heap_prune_record_unchanged_lp_redirect(prstate, rootoffnum);
- i++;
+ heap_prune_record_unchanged_lp_redirect(prstate, rootoffnum);
+ }
+ else
+ {
+ /*
+ * We found a redirect item that doesn't point to a valid
+ * follow-on item. This can happen if the loop in
+ * heap_page_prune_and_freeze() caused us to visit the dead
+ * successor of a redirect item before visiting the redirect item.
+ * We can clean up by setting the redirect item to DEAD state or
+ * LP_UNUSED if the caller indicated.
+ */
+ heap_prune_record_dead_or_unused(prstate, rootoffnum, false);
}
- /* the rest of tuples in the chain are normal, unchanged tuples */
- for (; i < nchain; i++)
- heap_prune_record_unchanged(dp, prstate, chainitems[i]);
}
+
+ /* the rest of tuples in the chain are normal, unchanged tuples */
+ for (; i < nchain; i++)
+ heap_prune_record_unchanged(dp, prstate, chainitems[i]);
}
/* Record lowest soon-prunable XID */
@@ -1486,7 +1484,7 @@ heap_prune_record_unchanged(Page page, PruneState *prstate, OffsetNumber offnum)
* Record line pointer that was already LP_DEAD and is left unchanged.
*/
static void
-heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_unchanged_lp_dead(ItemId itemid, PruneState *prstate, OffsetNumber offnum)
{
/*
* Deliberately don't set hastup for LP_DEAD items. We make the soft
@@ -1506,6 +1504,7 @@ heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum)
Assert(!prstate->marked[offnum]);
prstate->marked[offnum] = true;
+ Assert(!ItemIdHasStorage(itemid));
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
}