On Wed, 31 Mar 2021 at 05:35, Peter Geoghegan <p...@bowt.ie> wrote:
>
> On Wed, Mar 10, 2021 at 6:01 AM Matthias van de Meent
> <boekewurm+postg...@gmail.com> wrote:
> > > The case I was concerned about back when is that there are various bits of
> > > code that may visit a page with a predetermined TID in mind to look at.
> > > An index lookup is an obvious example, and another one is chasing an
> > > update chain's t_ctid link.  You might argue that if the tuple was dead
> > > enough to be removed, there should be no such in-flight references to
> > > worry about, but I think such an assumption is unsafe.  There is not, for
> > > example, any interlock that ensures that a process that has obtained a TID
> > > from an index will have completed its heap visit before a VACUUM that
> > > subsequently removed that index entry also removes the heap tuple.
> >
> > I am aware of this problem. I will admit that I did not detect all
> > potentially problematic accesses, so I'll show you my work.
>
> Attached is a trivial rebase of your v3, which I've called v4. I am
> interested in getting this patch into Postgres 14.

Thanks, and that'd be great! PFA v5, which fixes one issue mentioned
later in this mail, and adds some comments on existing checks that are
now in a critical path.

> > In my search for problematic accesses, I make the following assumptions:
> > * PageRepairFragmentation as found in bufpage is only applicable to
> > heap pages; this function is not applied to other pages in core
> > postgres. So, any problems that occur are due to access with an
> > OffsetNumber > PageGetMaxOffsetNumber.
> > * Items [on heap pages] are only extracted after using PageGetItemId
> > for that item on the page, whilst holding a lock.
>
> > The 3 problem cases were classified based on the origin of the
> > potentially invalid pointer.
> >
> > Problem case 1: table scan; heapam.c lines 678 and 811, in heapgettup
>
> I think that it boils down to this: 100% of the cases where this could
> be a problem involve either old TIDs or old line pointers -- in
> principle these could be invalidated in some way, like your backwards
> scan example. But that's it. Bear in mind that we always call
> PageRepairFragmentation() with a super-exclusive lock.

Yeah, that's the gist of what I found out. All accesses using old line
pointers need revalidation, and there were some cases in which this
was not yet done correctly.
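To make the revalidation pattern concrete, here is a standalone
miniature of the clamp-and-recheck step: an offset remembered from a
previous visit may now lie past the page's max offset if VACUUM has
truncated the line pointer array in the meantime. The MiniPage struct
and function names are invented for illustration only; this is not
PostgreSQL code.

```c
#include <assert.h>

/* Invented miniature of a heap page; only tracks the max offset. */
typedef struct MiniPage
{
	int		max_offset;		/* analogue of PageGetMaxOffsetNumber() */
} MiniPage;

/*
 * Compute the offset at which a backward scan should resume, given the
 * offset of the previously returned tuple.  The clamp re-establishes
 * the lineoff <= max_offset invariant after a possible truncation.
 */
static int
mini_resume_offset(const MiniPage *page, int remembered_offset)
{
	int		prev = remembered_offset - 1;	/* OffsetNumberPrev analogue */

	if (prev > page->max_offset)
		prev = page->max_offset;
	return prev;
}
```

Without the clamp, a remembered offset of 9 against a page truncated to
5 line pointers would read past the array; with it, the scan simply
resumes at the last remaining entry.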

> > This is in the replay of transaction logs, in heap_xlog_freeze_page.
> > As I am unaware whether or not pages to which these transaction logs
> > are applied can contain changes from the xlog-generating instance, I
> > flagged this as a potential problem. The application of the xlogs is
> > probably safe (it assumes the existence of a HeapTupleHeader for that
> > ItemId), but my limited knowledge put this on the 'potential problem'
> > list.
> >
> > Please advise on this; I cannot find the right documentation
>
> Are you aware of wal_consistency_checking?

I was vaguely aware that an option with that name exists, but that was
about the extent. Thanks for pointing me in that direction.

> I think that this should be fine. There are differences that are
> possible between a replica and primary, but they're very minor and
> don't seem relevant.

OK, then I'll assume that WAL replay shouldn't cause problems here.

> > Problem case 3: invalid redirect pointers; pruneheap.c line 949, in
> > heap_get_root_tuples
> >
> > The heap pruning mechanism currently assumes that all redirects are
> > valid. Currently, if a HOT root points to !ItemIdIsNormal, it breaks
> > out of the loop, but doesn't actually fail. This code is already
> > potentially problematic because it has no bounds or sanity checking
> > for the nextoffnum, but with shrinking pd_linp it would also add the
> > failure mode of HOT tuples pointing into what is now arbitrary data.
>
> heap_prune_chain() is less trusting than heap_get_root_tuples(),
> though -- it doesn't trust redirects (because there is a generic
> offnum sanity check at the start of its loop). I think that the
> inconsistency between these two functions probably isn't hugely
> significant.
>
> Ideally it would be 100% clear which of the defenses in code like this
> is merely extra hardening. The assumptions should be formalized. There
> is nothing wrong with hardening, but we should know it when we see it.

I realized one of my Assert()s was incorrectly asserting an actually
valid page state, so I've updated and documented that case.

> > This failure mode is now accompanied by an Assert, which fails when
> > the redirect is to an invalid OffsetNumber. This is not enough to
> > prevent arbitrary behaviour when accessing corrupted data in
> > non-assert builds, but I think that is fine; we already do not
> > guarantee any particular behaviour for this failure mode.
>
> What about my "set would-be LP_UNUSED space to all-0x7F bytes in
> CLOBBER_FREED_MEMORY builds" idea? Did you think about that?

I had implemented it locally, but was waiting for more feedback before
posting it, and have since been busy with other things; it's now
attached.

I've also played around with marking the free space on the page as
undefined for valgrind, but later realized that that would make the
test for defined memory in PageAddItemExtended fail. This is
documented in the commit message of the attached patch 0002.
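For clarity, the clobbering idea amounts to filling the hole between
pd_lower and pd_upper with 0x7f bytes so that stale pointers into
reclaimed space fail fast. Below is a standalone miniature; the
MiniPageHeader struct is invented for illustration and is not the real
PageHeaderData layout.

```c
#include <string.h>
#include <stdint.h>

#define MINI_PAGE_SIZE 64

/* Invented miniature page header: only the free-space bounds. */
typedef struct MiniPageHeader
{
	uint16_t	pd_lower;	/* offset to start of free space */
	uint16_t	pd_upper;	/* offset to end of free space */
} MiniPageHeader;

/*
 * Fill the free space between pd_lower and pd_upper with 0x7f, the
 * CLOBBER_FREED_MEMORY convention.  Note the span is computed as
 * pd_upper - pd_lower; data at and beyond pd_upper is left intact.
 */
static void
mini_clobber_free_space(char *page)
{
	MiniPageHeader *hdr = (MiniPageHeader *) page;

	memset(page + hdr->pd_lower, 0x7f,
		   hdr->pd_upper - hdr->pd_lower);
}
```

In assert-enabled builds this makes any read through a dangling line
pointer produce a recognizable 0x7f pattern instead of silently valid-
looking bytes.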


With regards,

Matthias van de Meent
From 736d285bd9b9918b1803da63359c6537aedbc863 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekew...@gmail.com>
Date: Tue, 9 Mar 2021 14:42:52 +0100
Subject: [PATCH v5 1/2] Truncate a page's line pointer array when it has
 trailing unused ItemIds.

This will allow reuse of what is effectively free space for data as well as
new line pointers, instead of keeping it reserved for line pointers only.

An additional benefit is that the HasFreeLinePointers hint-bit optimization
no longer hints for unused line pointers at the end of the array, slightly
increasing the specificity of where the free line pointers are, and saving
us from searching to the end of the array when all earlier entries are
already filled.
---
 src/backend/access/heap/heapam.c    | 16 +++++++++++++++-
 src/backend/access/heap/pruneheap.c |  8 ++++++++
 src/backend/storage/page/bufpage.c  | 13 ++++++++++++-
 3 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 90711b2fcd..9f9c9c13b5 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -635,8 +635,15 @@ heapgettup(HeapScanDesc scan,
 		}
 		else
 		{
+			/*
+			 * The previously returned tuple may have been vacuumed away
+			 * since we last saw this page when using a non-MVCC snapshot,
+			 * so we must re-establish the invariant that
+			 * lineoff <= PageGetMaxOffsetNumber(dp).
+			 */
 			lineoff =			/* previous offnum */
-				OffsetNumberPrev(ItemPointerGetOffsetNumber(&(tuple->t_self)));
+				Min(lines,
+					OffsetNumberPrev(ItemPointerGetOffsetNumber(&(tuple->t_self))));
 		}
 		/* page and lineoff now reference the physically previous tid */
 
@@ -678,6 +685,13 @@ heapgettup(HeapScanDesc scan,
 	lpp = PageGetItemId(dp, lineoff);
 	for (;;)
 	{
+		/*
+		 * Only continue scanning the page while we have lines left.
+		 *
+		 * Note that this protects us from accessing line pointers past
+		 * PageGetMaxOffsetNumber(); both for forward scans when we resume
+		 * the table scan, and for when we start scanning a new page.
+		 */
 		while (linesleft > 0)
 		{
 			if (ItemIdIsNormal(lpp))
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8bb38d6406..ae40f7d07e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -946,6 +946,14 @@ heap_get_root_tuples(Page page, OffsetNumber *root_offsets)
 		 */
 		for (;;)
 		{
+			/*
+			 * Check for broken chains: VACUUM may have removed a
+			 * never-committed HOT update at some point and has since
+			 * truncated the line pointer array.
+			 */
+			if (nextoffnum > maxoff)
+				break;
+
 			lp = PageGetItemId(page, nextoffnum);
 
 			/* Check for broken chains */
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 5d5989c2f5..a591055699 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -676,7 +676,11 @@ compactify_tuples(itemIdCompact itemidbase, int nitems, Page page, bool presorte
  * PageRepairFragmentation
  *
  * Frees fragmented space on a page.
- * It doesn't remove unused line pointers! Please don't change this.
+ * It doesn't remove intermediate unused line pointers (that would mean
+ * moving ItemIds, which would invalidate indexed TIDs), but it does
+ * truncate the page->pd_linp array past the last used line pointer, so
+ * that the trailing space may also be reused for data, instead of only
+ * for line pointers.
  *
  * This routine is usable for heap pages only, but see PageIndexMultiDelete.
  *
@@ -697,6 +701,7 @@ PageRepairFragmentation(Page page)
 	int			nline,
 				nstorage,
 				nunused;
+	OffsetNumber lastUsed = InvalidOffsetNumber;
 	int			i;
 	Size		totallen;
 	bool		presorted = true;	/* For now */
@@ -730,6 +735,7 @@ PageRepairFragmentation(Page page)
 		lp = PageGetItemId(page, i);
 		if (ItemIdIsUsed(lp))
 		{
+			lastUsed = i;
 			if (ItemIdHasStorage(lp))
 			{
 				itemidptr->offsetindex = i - 1;
@@ -777,6 +783,11 @@ PageRepairFragmentation(Page page)
 		compactify_tuples(itemidbase, nstorage, page, presorted);
 	}
 
+	if (lastUsed != nline)
+	{
+		((PageHeader) page)->pd_lower = SizeOfPageHeaderData + (sizeof(ItemIdData) * lastUsed);
+		nunused = nunused - (nline - lastUsed);
+	}
+
 	/* Set hint bit for PageAddItemExtended */
 	if (nunused > 0)
 		PageSetHasFreeLinePointers(page);
-- 
2.20.1

From 65e5cb5d5dfa17f6373a5df7d24ce93a1f1fc1f5 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekew...@gmail.com>
Date: Wed, 31 Mar 2021 10:16:28 +0200
Subject: [PATCH v5 2/2] Clobber free page space in PageRepairFragmentation.

We cannot mark this undefined for valgrind because PageAddItemExtended
has checks that it is inserting into defined memory.
---
 src/backend/storage/page/bufpage.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index a591055699..e3fc1ca9cd 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -788,6 +788,12 @@ PageRepairFragmentation(Page page)
 		nunused = nunused - (nline - lastUsed);
 	}
 
+#ifdef CLOBBER_FREED_MEMORY
+	memset(((char *) page) + ((PageHeader) page)->pd_lower,
+		0x7f,
+		((PageHeader) page)->pd_upper - ((PageHeader) page)->pd_lower);
+#endif
+
 	/* Set hint bit for PageAddItemExtended */
 	if (nunused > 0)
 		PageSetHasFreeLinePointers(page);
-- 
2.20.1
