Hi,

On 2023-04-03 14:25:59 +0200, Tomas Vondra wrote:
> On 4/3/23 00:40, Andres Freund wrote:
> > Hi,
> >
> > On 2023-03-28 19:17:21 -0700, Andres Freund wrote:
> >> On 2023-03-28 18:21:02 -0700, Andres Freund wrote:
> >>> Here's a draft patch.
> >>
> >> Attached is v2, with a stupid bug fixed and a bit of comment / pgindent
> >> polish.
> >
> > I'd welcome some review (Tomas?), but otherwise I'm planning to push ahead
> > with this.
> >
>
> I guess the 0001 part was already pushed, so I should be looking only at
> 0002, correct?

Yes.


> I think 0002 makes RelationGetBufferForTuple() harder to understand. I'm
> not saying it's incorrect, but I find it hard to reason about the new
> combinations of conditions :-(

> I mean, it only had this condition:
>
>     if (otherBuffer != InvalidBuffer)
>     {
>         ...
>     }
>
> but now it has
>
>     if (unlockedTargetBuffer)
>     {
>        ...
>     }
>     else if (otherBuffer != InvalidBuffer)
>     {
>         ...
>     }
>
>     if (unlockedTargetBuffer || otherBuffer != InvalidBuffer)
>     {
>         ...
>     }
>
> Not sure how to improve that :-/ but not exactly trivial to figure out
> what's going to happen.

It's not great, I agree. I tried to make it easier to read in this version by
a) changing GetVisibilityMapPins() as I proposed
b) adding a new variable "recheckVmPins" that gets set in both the
   if (unlockedTargetBuffer)
   and
   if (otherBuffer != InvalidBuffer)
   branches
c) reformulating the comments


> Maybe this
>
>  * If we unlocked the target buffer above, it's unlikely, but possible,
>  * that another backend used space on this page.
>
> might say what we're going to do in this case. I mean - I understand
> some backend may use space in unlocked page, but what does that mean for
> this code? What is it going to do? (The same comment talks about the
> next condition in much more detail, for example.)

There's a comment about that detail further below. But you're right, it wasn't
clear as-is. How about now?


> Also, isn't it a bit strange the free space check now happens outside
> any if condition? It used to happen in the
>
>     if (otherBuffer != InvalidBuffer)
>     {
>         ...
>     }
>
> block, but now it happens outside.

Well, the alternative is to repeat it in the two branches, which doesn't seem
great either. Particularly because there'll be a third branch after the bulk
extension patch.

Greetings,

Andres Freund
From 49183283d6701d22cf12f6a6280ed1aba02fc063 Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Mon, 3 Apr 2023 11:18:10 -0700
Subject: [PATCH v3 1/2] hio: Relax rules for calling GetVisibilityMapPins()

Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/backend/access/heap/hio.c | 37 ++++++++++++++++++++++++-----------
 1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 7479212d4e0..e8aea42f8a9 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -131,9 +131,9 @@ ReadBufferBI(Relation relation, BlockNumber targetBlock,
  * For each heap page which is all-visible, acquire a pin on the appropriate
  * visibility map page, if we haven't already got one.
  *
- * buffer2 may be InvalidBuffer, if only one buffer is involved.  buffer1
- * must not be InvalidBuffer.  If both buffers are specified, block1 must
- * be less than block2.
+ * To avoid complexity in the callers, either buffer1 or buffer2 may be
+ * InvalidBuffer if only one buffer is involved. For the same reason, block2
+ * may be smaller than block1.
  */
 static void
 GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
@@ -143,6 +143,26 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
 	bool		need_to_pin_buffer1;
 	bool		need_to_pin_buffer2;
 
+	/*
+	 * Swap buffers around to handle case of a single block/buffer, and to
+	 * handle if lock ordering rules require to lock block2 first.
+	 */
+	if (!BufferIsValid(buffer1) ||
+		(BufferIsValid(buffer2) && block1 > block2))
+	{
+		Buffer tmpbuf = buffer1;
+		Buffer *tmpvmbuf = vmbuffer1;
+		BlockNumber tmpblock = block1;
+
+		buffer1 = buffer2;
+		vmbuffer1 = vmbuffer2;
+		block1 = block2;
+
+		buffer2 = tmpbuf;
+		vmbuffer2 = tmpvmbuf;
+		block2 = tmpblock;
+	}
+
 	Assert(BufferIsValid(buffer1));
 	Assert(buffer2 == InvalidBuffer || block1 <= block2);
 
@@ -502,14 +522,9 @@ loop:
 		 * done a bit of extra work for no gain, but there's no real harm
 		 * done.
 		 */
-		if (otherBuffer == InvalidBuffer || targetBlock <= otherBlock)
-			GetVisibilityMapPins(relation, buffer, otherBuffer,
-								 targetBlock, otherBlock, vmbuffer,
-								 vmbuffer_other);
-		else
-			GetVisibilityMapPins(relation, otherBuffer, buffer,
-								 otherBlock, targetBlock, vmbuffer_other,
-								 vmbuffer);
+		GetVisibilityMapPins(relation, buffer, otherBuffer,
+							 targetBlock, otherBlock, vmbuffer,
+							 vmbuffer_other);
 
 		/*
 		 * Now we can check to see if there's enough free space here. If so,
-- 
2.38.0

From 27f9ace30d9862f05b73676526a1b0233ef1faea Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Mon, 3 Apr 2023 11:26:16 -0700
Subject: [PATCH v3 2/2] hio: Don't pin the VM while holding buffer lock while
 extending

Discussion: http://postgr.es/m/20230325025740.wzvchp2kromw4...@awork3.anarazel.de
---
 src/backend/access/heap/hio.c | 126 +++++++++++++++++++++++-----------
 1 file changed, 85 insertions(+), 41 deletions(-)

diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index e8aea42f8a9..15912f6415f 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -134,14 +134,17 @@ ReadBufferBI(Relation relation, BlockNumber targetBlock,
  * To avoid complexity in the callers, either buffer1 or buffer2 may be
  * InvalidBuffer if only one buffer is involved. For the same reason, block2
  * may be smaller than block1.
+ *
+ * Returns whether buffer locks were temporarily released.
  */
-static void
+static bool
 GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
 					 BlockNumber block1, BlockNumber block2,
 					 Buffer *vmbuffer1, Buffer *vmbuffer2)
 {
 	bool		need_to_pin_buffer1;
 	bool		need_to_pin_buffer2;
+	bool		released_locks = false;
 
 	/*
 	 * Swap buffers around to handle case of a single block/buffer, and to
@@ -175,9 +178,10 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
 			&& PageIsAllVisible(BufferGetPage(buffer2))
 			&& !visibilitymap_pin_ok(block2, *vmbuffer2);
 		if (!need_to_pin_buffer1 && !need_to_pin_buffer2)
-			return;
+			break;
 
 		/* We must unlock both buffers before doing any I/O. */
+		released_locks = true;
 		LockBuffer(buffer1, BUFFER_LOCK_UNLOCK);
 		if (buffer2 != InvalidBuffer && buffer2 != buffer1)
 			LockBuffer(buffer2, BUFFER_LOCK_UNLOCK);
@@ -203,6 +207,8 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
 			|| (need_to_pin_buffer1 && need_to_pin_buffer2))
 			break;
 	}
+
+	return released_locks;
 }
 
 /*
@@ -365,6 +371,8 @@ RelationGetBufferForTuple(Relation relation, Size len,
 	BlockNumber targetBlock,
 				otherBlock;
 	bool		needLock;
+	bool		unlockedTargetBuffer;
+	bool		recheckVmPins;
 
 	len = MAXALIGN(len);		/* be conservative */
 
@@ -645,6 +653,9 @@ loop:
 	if (needLock)
 		UnlockRelationForExtension(relation, ExclusiveLock);
 
+	unlockedTargetBuffer = false;
+	targetBlock = BufferGetBlockNumber(buffer);
+
 	/*
 	 * We need to initialize the empty new page.  Double-check that it really
 	 * is empty (this should never happen, but if it does we don't want to
@@ -654,75 +665,108 @@ loop:
 
 	if (!PageIsNew(page))
 		elog(ERROR, "page %u of relation \"%s\" should be empty but is not",
-			 BufferGetBlockNumber(buffer),
+			 targetBlock,
 			 RelationGetRelationName(relation));
 
 	PageInit(page, BufferGetPageSize(buffer), 0);
 	MarkBufferDirty(buffer);
 
 	/*
-	 * The page is empty, pin vmbuffer to set all_frozen bit.
+	 * The page is empty, pin vmbuffer to set all_frozen bit. We don't want to
+	 * do IO while the buffer is locked, so we unlock the page first if IO is
+	 * needed (necessitating checks below).
 	 */
 	if (options & HEAP_INSERT_FROZEN)
 	{
-		Assert(PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0);
-		visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+		Assert(PageGetMaxOffsetNumber(page) == 0);
+
+		if (!visibilitymap_pin_ok(targetBlock, *vmbuffer))
+		{
+			unlockedTargetBuffer = true;
+			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+			visibilitymap_pin(relation, targetBlock, vmbuffer);
+		}
 	}
 
 	/*
-	 * Lock the other buffer. It's guaranteed to be of a lower page number
-	 * than the new page. To conform with the deadlock prevent rules, we ought
-	 * to lock otherBuffer first, but that would give other backends a chance
-	 * to put tuples on our page. To reduce the likelihood of that, attempt to
-	 * lock the other buffer conditionally, that's very likely to work.
-	 * Otherwise we need to lock buffers in the correct order, and retry if
-	 * the space has been used in the mean time.
+	 * Reacquire locks if necessary.
 	 *
-	 * Alternatively, we could acquire the lock on otherBuffer before
-	 * extending the relation, but that'd require holding the lock while
-	 * performing IO, which seems worse than an unlikely retry.
+	 * If the target buffer was unlocked above, or is unlocked while
+	 * reacquiring the lock on otherBuffer below, it's unlikely, but possible,
+	 * that another backend used space on this page. We check for that below,
+	 * and retry if necessary.
 	 */
-	if (otherBuffer != InvalidBuffer)
+	recheckVmPins = false;
+	if (unlockedTargetBuffer)
 	{
+		/* released lock on target buffer above */
+		if (otherBuffer != InvalidBuffer)
+			LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
+		LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+		recheckVmPins = true;
+	}
+	else if (otherBuffer != InvalidBuffer)
+	{
+		/*
+		 * We did not release the target buffer, and otherBuffer is valid, so
+		 * we need to lock the other buffer. It's guaranteed to be of a lower
+		 * page number than the new page.  To conform with the deadlock
+		 * prevention rules, we ought to lock otherBuffer first, but that
+		 * would give other backends a chance to put tuples on our page. To
+		 * reduce the likelihood of that, attempt to lock the other buffer
+		 * conditionally, as that's very likely to work.
+		 *
+		 * Alternatively, we could acquire the lock on otherBuffer before
+		 * extending the relation, but that'd require holding the lock while
+		 * performing IO, which seems worse than an unlikely retry.
+		 */
 		Assert(otherBuffer != buffer);
-		targetBlock = BufferGetBlockNumber(buffer);
 		Assert(targetBlock > otherBlock);
 
 		if (unlikely(!ConditionalLockBuffer(otherBuffer)))
 		{
+			unlockedTargetBuffer = true;
 			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 			LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
 			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 		}
+		recheckVmPins = true;
+	}
 
-		/*
-		 * Because the buffers were unlocked for a while, it's possible,
-		 * although unlikely, that an all-visible flag became set or that
-		 * somebody used up the available space in the new page.  We can use
-		 * GetVisibilityMapPins to deal with the first case.  In the second
-		 * case, just retry from start.
-		 */
-		GetVisibilityMapPins(relation, otherBuffer, buffer,
-							 otherBlock, targetBlock, vmbuffer_other,
-							 vmbuffer);
+	/*
+	 * If one of the buffers was unlocked (always the case if otherBuffer is
+	 * valid), it's possible, although unlikely, that an all-visible flag
+	 * became set.  We can use GetVisibilityMapPins to deal with that. It's
+	 * possible that GetVisibilityMapPins() might need to temporarily release
+	 * buffer locks, in which case we'll need to check if there's still enough
+	 * space on the page below.
+	 */
+	if (recheckVmPins)
+	{
+		if (GetVisibilityMapPins(relation, otherBuffer, buffer,
+								 otherBlock, targetBlock, vmbuffer_other,
+								 vmbuffer))
+			unlockedTargetBuffer = true;
+	}
 
-		/*
-		 * Note that we have to check the available space even if our
-		 * conditional lock succeeded, because GetVisibilityMapPins might've
-		 * transiently released lock on the target buffer to acquire a VM pin
-		 * for the otherBuffer.
-		 */
-		if (len > PageGetHeapFreeSpace(page))
+	/*
+	 * If the target buffer was temporarily unlocked since the relation
+	 * extension, it's possible, although unlikely, that all the space on the
+	 * page was already used. If so, we just retry from the start.  If we
+	 * didn't unlock, something has gone wrong if there's not enough space -
+	 * the test at the top should have prevented reaching this case.
+	 */
+	pageFreeSpace = PageGetHeapFreeSpace(page);
+	if (len > pageFreeSpace)
+	{
+		if (unlockedTargetBuffer)
 		{
-			LockBuffer(otherBuffer, BUFFER_LOCK_UNLOCK);
+			if (otherBuffer != InvalidBuffer)
+				LockBuffer(otherBuffer, BUFFER_LOCK_UNLOCK);
 			UnlockReleaseBuffer(buffer);
 
 			goto loop;
 		}
-	}
-	else if (len > PageGetHeapFreeSpace(page))
-	{
-		/* We should not get here given the test at the top */
 		elog(PANIC, "tuple is too big: size %zu", len);
 	}
 
@@ -735,7 +779,7 @@ loop:
 	 * current backend to make more insertions or not, which is probably a
 	 * good bet most of the time.  So for now, don't add it to FSM yet.
 	 */
-	RelationSetTargetBlock(relation, BufferGetBlockNumber(buffer));
+	RelationSetTargetBlock(relation, targetBlock);
 
 	return buffer;
 }
-- 
2.38.0
