On 01/04/2024 19:08, Melanie Plageman wrote:
> On Mon, Apr 01, 2024 at 05:17:51PM +0300, Heikki Linnakangas wrote:
>> diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
>> @@ -256,15 +270,16 @@ heap_page_prune(Relation relation, Buffer buffer,
>>  	tup.t_tableOid = RelationGetRelid(relation);
>>  	/*
>> -	 * Determine HTSV for all tuples.
>> +	 * Determine HTSV for all tuples, and queue them up for processing as HOT
>> +	 * chain roots or as heap-only items.
> Reading this comment now as a whole, I would add something like
> "Determining HTSV for all tuples once is required for correctness" to
> the start of the second paragraph. The new conjunction on the first
> paragraph sentence followed by the next paragraph is a bit confusing
> because it sounds like queuing them up for processing is required for
> correctness (which, I suppose it is on some level). Basically, I'm just
> saying that it is now less clear what is required for correctness.
Fixed.
> While I recognize this is a matter of style and not important, I
> personally prefer this for reverse looping:
>
>     for (int i = prstate.nheaponly_items; i --> 0;)
I don't think we use that style anywhere in the Postgres source tree
currently. (And I don't like it ;-) )
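(For anyone unfamiliar with the trick: "i --> 0" is not an operator, it
just parses as "(i--) > 0". A quick sketch of the two equivalent
spellings, with a made-up count n and a made-up process():

    /* "goes-to" style: test i > 0, then decrement, so the body sees n-1 .. 0 */
    for (int i = n; i --> 0;)
        process(i);

    /* the conventional spelling used in the tree */
    for (int i = n - 1; i >= 0; i--)
        process(i);

Both visit the same indices in the same order.)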
> I do think a comment about the reverse order would be nice. I know it
> says something above the first loop to this effect:
>
>     * Processing the items in reverse order (and thus the tuples in
>     * increasing order) increases prefetching efficiency significantly /
>     * decreases the number of cache misses.
>
> So perhaps we could just say "as above, process the items in reverse
> order"
I'm actually not sure why it makes a difference. I would assume all the
data to already be in CPU cache at this point, since the first loop
already accessed it, so I think there's something else going on. But I
didn't investigate it deeper. Anyway, added a comment.
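For what it's worth, the geometry behind the existing comment: the line
pointer array grows from the start of the page while the tuple data is
stacked from the end, so walking the line pointers backwards tends to
touch the tuple bytes in ascending address order. A rough sketch of that
access pattern (examine_tuple() is a made-up stand-in, this is not the
actual pruneheap.c code):

    /*
     * Simplified page layout:
     *
     *   [header][lp 1][lp 2]...[lp N] ...free... [tup N]...[tup 2][tup 1]
     *
     * so decreasing offnum ~= increasing tuple address.
     */
    for (OffsetNumber offnum = maxoff; offnum >= FirstOffsetNumber; offnum--)
    {
        ItemId      lp = PageGetItemId(page, offnum);
        HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, lp);

        examine_tuple(htup);    /* made-up per-tuple work */
    }

Why the second loop would benefit as well is less obvious, as I said.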
Committed the first of the remaining patches with those changes. And
also this, which is worth calling out:
    if (prstate.htsv[offnum] == HEAPTUPLE_DEAD)
    {
        ItemId      itemid = PageGetItemId(page, offnum);
        HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, itemid);

        if (likely(!HeapTupleHeaderIsHotUpdated(htup)))
        {
            HeapTupleHeaderAdvanceConflictHorizon(htup,
                                                  &prstate.latest_xid_removed);
            heap_prune_record_unused(&prstate, offnum, true);
        }
        else
        {
            /*
             * This tuple should've been processed and removed as part of
             * a HOT chain, so something's wrong.  To preserve evidence,
             * we don't dare to remove it.  We cannot leave behind a DEAD
             * tuple either, because that will cause VACUUM to error out.
             * Throwing an error with a distinct error message seems like
             * the least bad option.
             */
            elog(ERROR, "dead heap-only tuple (%u, %d) is not linked to from any HOT chain",
                 blockno, offnum);
        }
    }
    else
        heap_prune_record_unchanged_lp_normal(page, &prstate, offnum);
As you can see, I turned that into a hard error. Previously, that code
was at the top of heap_prune_chain(), and it was normal to see DEAD
heap-only tuples there, because they would presumably get processed
later as part of a HOT chain. But now this is done after all the HOT
chains have already been processed.
Previously, if there was a dead heap-only tuple like that on the page for
some reason, it was silently not processed by heap_prune_chain()
(because it was assumed that it would be processed later as part of a
HOT chain), and was left behind as a HEAPTUPLE_DEAD tuple. If the
pruning was done as part of VACUUM, VACUUM would fail with "ERROR:
unexpected HeapTupleSatisfiesVacuum result". Or am I missing something?
Now you get the above error also with on-access pruning, which is not
ideal. But I don't remember hearing about corruption like that, and
you'd get the error on VACUUM anyway.
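For context, the failure mode on the VACUUM side is the catch-all arm of
the switch over the HTSV result in the heap-vacuuming code; roughly (a
simplified sketch, not the verbatim source):

    switch (htsv)
    {
        case HEAPTUPLE_LIVE:
        case HEAPTUPLE_RECENTLY_DEAD:
        case HEAPTUPLE_INSERT_IN_PROGRESS:
        case HEAPTUPLE_DELETE_IN_PROGRESS:
            /* normal per-tuple accounting */
            break;

        default:
            /*
             * Pruning is expected to have removed every HEAPTUPLE_DEAD
             * tuple, so a DEAD tuple left behind lands here.
             */
            elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
            break;
    }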
With the next patches, heap_prune_record_unchanged() will do more, and
will also throw an error on a HEAPTUPLE_LIVE tuple, so even though in
the first patch we could print just a WARNING and move on, it gets more
awkward with the rest of the patches.
(Continuing with the remaining patches..)
--
Heikki Linnakangas
Neon (https://neon.tech)