Hi,
On 2023-04-03 14:25:59 +0200, Tomas Vondra wrote:
> On 4/3/23 00:40, Andres Freund wrote:
> > Hi,
> >
> > On 2023-03-28 19:17:21 -0700, Andres Freund wrote:
> >> On 2023-03-28 18:21:02 -0700, Andres Freund wrote:
> >>> Here's a draft patch.
> >>
> >> Attached is v2, with a stupid bug fixed and a bit of comment / pgindent
> >> polish.
> >
> > I'd welcome some review (Tomas?), but otherwise I'm planning to push ahead
> > with this.
> >
>
> I guess the 0001 part was already pushed, so I should be looking only at
> 0002, correct?
Yes.
> I think 0002 makes RelationGetBufferForTuple() harder to understand. I'm
> not saying it's incorrect, but I find it hard to reason about the new
> combinations of conditions :-(
> I mean, it only had this condition:
>
> if (otherBuffer != InvalidBuffer)
> {
> ...
> }
>
> but now it has
>
> if (unlockedTargetBuffer)
> {
> ...
> }
> else if (otherBuffer != InvalidBuffer)
> {
> ...
> }
>
> if (unlockedTargetBuffer || otherBuffer != InvalidBuffer)
> {
> ...
> }
>
> Not sure how to improve that :-/ but not exactly trivial to figure out
> what's going to happen.
It's not great, I agree. I tried to make it easier to read in this version by
a) changing GetVisibilityMapPins() as I proposed
b) adding a new variable "recheckVmPins" that gets set in both the
   if (unlockedTargetBuffer)
and the
   if (otherBuffer != InvalidBuffer)
branches
c) reformulating the comments
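To spell out what a) amounts to: the swap at the top of GetVisibilityMapPins()
(patch 0001 below) normalizes the two buffer/block pairs so the rest of the
function only has to handle one ordering. A minimal standalone sketch of that
normalization, with Buffer/BlockNumber reduced to plain typedefs rather than
the real PostgreSQL definitions, and the vmbuffer pointers omitted for
brevity:

```c
typedef int Buffer;
typedef unsigned int BlockNumber;
#define InvalidBuffer 0

/*
 * Normalize the two pairs so that buffer1 is always valid and, when both
 * buffers are valid, block1 <= block2. Mirrors the swap added at the top
 * of GetVisibilityMapPins() in 0001.
 */
void
normalize_buffer_order(Buffer *buffer1, Buffer *buffer2,
					   BlockNumber *block1, BlockNumber *block2)
{
	if (*buffer1 == InvalidBuffer ||
		(*buffer2 != InvalidBuffer && *block1 > *block2))
	{
		Buffer		tmpbuf = *buffer1;
		BlockNumber	tmpblock = *block1;

		*buffer1 = *buffer2;
		*block1 = *block2;

		*buffer2 = tmpbuf;
		*block2 = tmpblock;
	}
}
```

(In the real function the corresponding vmbuffer pointers are swapped along
with the buffers, so the callers no longer need the two mirrored call sites.)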
> Maybe this
>
> * If we unlocked the target buffer above, it's unlikely, but possible,
> * that another backend used space on this page.
>
> might say what we're going to do in this case. I mean - I understand
> some backend may use space in unlocked page, but what does that mean for
> this code? What is it going to do? (The same comment talks about the
> next condition in much more detail, for example.)
There's a comment about that detail further below. But you're right, it wasn't
clear as-is. How about now?
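Condensed, the decision the code makes after the locks are reacquired is:
re-check the free space and either proceed, retry from "loop:", or treat it
as can't-happen. A toy model of just that decision (the enum and function
names here are illustrative, not from the patch):

```c
#include <stddef.h>
#include <stdbool.h>

/*
 * If the tuple no longer fits, retry from "loop:" when the target buffer
 * was unlocked in between (another backend may have used the space), and
 * treat it as a can't-happen error otherwise.
 */
typedef enum
{
	USE_PAGE,					/* enough space, insert here */
	RETRY_FROM_LOOP,			/* goto loop; pick/extend another page */
	CANT_HAPPEN					/* elog(PANIC) territory */
} SpaceCheckOutcome;

SpaceCheckOutcome
check_target_page(size_t len, size_t pageFreeSpace, bool unlockedTargetBuffer)
{
	if (len <= pageFreeSpace)
		return USE_PAGE;
	return unlockedTargetBuffer ? RETRY_FROM_LOOP : CANT_HAPPEN;
}
```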
> Also, isn't it a bit strange the free space check now happens outside
> any if condition? It used to happen in the
>
> if (otherBuffer != InvalidBuffer)
> {
> ...
> }
>
> block, but now it happens outside.
Well, the alternative is to repeat it in the two branches, which doesn't seem
great either. Particularly because there'll be a third branch after the bulk
extension patch.
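To illustrate why hoisting wins here: when every branch would end in the same
check, moving the check below the branches is behavior-preserving, and each
new branch (like the bulk extension one) gets it for free. A toy equivalence
with stand-in names, where fits() plays the role of the
len > PageGetHeapFreeSpace() test:

```c
#include <stddef.h>
#include <stdbool.h>

bool
fits(size_t len, size_t free_space)
{
	return len <= free_space;
}

/* check duplicated into every branch */
bool
duplicated(int branch, size_t len, size_t free_space)
{
	if (branch == 0)
		return fits(len, free_space);	/* unlockedTargetBuffer path */
	else if (branch == 1)
		return fits(len, free_space);	/* otherBuffer path */
	else
		return fits(len, free_space);	/* future bulk-extension path */
}

/* single check hoisted below the branches */
bool
hoisted(int branch, size_t len, size_t free_space)
{
	if (branch == 0)
	{
		/* branch-specific locking would go here */
	}
	else if (branch == 1)
	{
		/* ... */
	}
	else
	{
		/* ... */
	}
	return fits(len, free_space);
}
```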
Greetings,
Andres Freund
From 49183283d6701d22cf12f6a6280ed1aba02fc063 Mon Sep 17 00:00:00 2001
From: Andres Freund <[email protected]>
Date: Mon, 3 Apr 2023 11:18:10 -0700
Subject: [PATCH v3 1/2] hio: Relax rules for calling GetVisibilityMapPins()
Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch:
---
src/backend/access/heap/hio.c | 37 ++++++++++++++++++++++++-----------
1 file changed, 26 insertions(+), 11 deletions(-)
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 7479212d4e0..e8aea42f8a9 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -131,9 +131,9 @@ ReadBufferBI(Relation relation, BlockNumber targetBlock,
* For each heap page which is all-visible, acquire a pin on the appropriate
* visibility map page, if we haven't already got one.
*
- * buffer2 may be InvalidBuffer, if only one buffer is involved. buffer1
- * must not be InvalidBuffer. If both buffers are specified, block1 must
- * be less than block2.
+ * To avoid complexity in the callers, either buffer1 or buffer2 may be
+ * InvalidBuffer if only one buffer is involved. For the same reason, block2
+ * may be smaller than block1.
*/
static void
GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
@@ -143,6 +143,26 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
bool need_to_pin_buffer1;
bool need_to_pin_buffer2;
+ /*
+ * Swap buffers around to handle case of a single block/buffer, and to
+ * handle if lock ordering rules require to lock block2 first.
+ */
+ if (!BufferIsValid(buffer1) ||
+ (BufferIsValid(buffer2) && block1 > block2))
+ {
+ Buffer tmpbuf = buffer1;
+ Buffer *tmpvmbuf = vmbuffer1;
+ BlockNumber tmpblock = block1;
+
+ buffer1 = buffer2;
+ vmbuffer1 = vmbuffer2;
+ block1 = block2;
+
+ buffer2 = tmpbuf;
+ vmbuffer2 = tmpvmbuf;
+ block2 = tmpblock;
+ }
+
Assert(BufferIsValid(buffer1));
Assert(buffer2 == InvalidBuffer || block1 <= block2);
@@ -502,14 +522,9 @@ loop:
* done a bit of extra work for no gain, but there's no real harm
* done.
*/
- if (otherBuffer == InvalidBuffer || targetBlock <= otherBlock)
- GetVisibilityMapPins(relation, buffer, otherBuffer,
- targetBlock, otherBlock, vmbuffer,
- vmbuffer_other);
- else
- GetVisibilityMapPins(relation, otherBuffer, buffer,
- otherBlock, targetBlock, vmbuffer_other,
- vmbuffer);
+ GetVisibilityMapPins(relation, buffer, otherBuffer,
+ targetBlock, otherBlock, vmbuffer,
+ vmbuffer_other);
/*
* Now we can check to see if there's enough free space here. If so,
--
2.38.0
From 27f9ace30d9862f05b73676526a1b0233ef1faea Mon Sep 17 00:00:00 2001
From: Andres Freund <[email protected]>
Date: Mon, 3 Apr 2023 11:26:16 -0700
Subject: [PATCH v3 2/2] hio: Don't pin the VM while holding buffer lock while
extending
Discussion: http://postgr.es/m/[email protected]
---
src/backend/access/heap/hio.c | 126 +++++++++++++++++++++++-----------
1 file changed, 85 insertions(+), 41 deletions(-)
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index e8aea42f8a9..15912f6415f 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -134,14 +134,17 @@ ReadBufferBI(Relation relation, BlockNumber targetBlock,
* To avoid complexity in the callers, either buffer1 or buffer2 may be
* InvalidBuffer if only one buffer is involved. For the same reason, block2
* may be smaller than block1.
+ *
+ * Returns whether buffer locks were temporarily released.
*/
-static void
+static bool
GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
BlockNumber block1, BlockNumber block2,
Buffer *vmbuffer1, Buffer *vmbuffer2)
{
bool need_to_pin_buffer1;
bool need_to_pin_buffer2;
+ bool released_locks = false;
/*
* Swap buffers around to handle case of a single block/buffer, and to
@@ -175,9 +178,10 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
&& PageIsAllVisible(BufferGetPage(buffer2))
&& !visibilitymap_pin_ok(block2, *vmbuffer2);
if (!need_to_pin_buffer1 && !need_to_pin_buffer2)
- return;
+ break;
/* We must unlock both buffers before doing any I/O. */
+ released_locks = true;
LockBuffer(buffer1, BUFFER_LOCK_UNLOCK);
if (buffer2 != InvalidBuffer && buffer2 != buffer1)
LockBuffer(buffer2, BUFFER_LOCK_UNLOCK);
@@ -203,6 +207,8 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
|| (need_to_pin_buffer1 && need_to_pin_buffer2))
break;
}
+
+ return released_locks;
}
/*
@@ -365,6 +371,8 @@ RelationGetBufferForTuple(Relation relation, Size len,
BlockNumber targetBlock,
otherBlock;
bool needLock;
+ bool unlockedTargetBuffer;
+ bool recheckVmPins;
len = MAXALIGN(len); /* be conservative */
@@ -645,6 +653,9 @@ loop:
if (needLock)
UnlockRelationForExtension(relation, ExclusiveLock);
+ unlockedTargetBuffer = false;
+ targetBlock = BufferGetBlockNumber(buffer);
+
/*
* We need to initialize the empty new page. Double-check that it really
* is empty (this should never happen, but if it does we don't want to
@@ -654,75 +665,108 @@ loop:
if (!PageIsNew(page))
elog(ERROR, "page %u of relation \"%s\" should be empty but is not",
- BufferGetBlockNumber(buffer),
+ targetBlock,
RelationGetRelationName(relation));
PageInit(page, BufferGetPageSize(buffer), 0);
MarkBufferDirty(buffer);
/*
- * The page is empty, pin vmbuffer to set all_frozen bit.
+ * The page is empty, pin vmbuffer to set all_frozen bit. We don't want to
+ * do IO while the buffer is locked, so we unlock the page first if IO is
+ * needed (necessitating checks below).
*/
if (options & HEAP_INSERT_FROZEN)
{
- Assert(PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0);
- visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ Assert(PageGetMaxOffsetNumber(page) == 0);
+
+ if (!visibilitymap_pin_ok(targetBlock, *vmbuffer))
+ {
+ unlockedTargetBuffer = true;
+ LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+ visibilitymap_pin(relation, targetBlock, vmbuffer);
+ }
}
/*
- * Lock the other buffer. It's guaranteed to be of a lower page number
- * than the new page. To conform with the deadlock prevent rules, we ought
- * to lock otherBuffer first, but that would give other backends a chance
- * to put tuples on our page. To reduce the likelihood of that, attempt to
- * lock the other buffer conditionally, that's very likely to work.
- * Otherwise we need to lock buffers in the correct order, and retry if
- * the space has been used in the mean time.
+ * Reacquire locks if necessary.
*
- * Alternatively, we could acquire the lock on otherBuffer before
- * extending the relation, but that'd require holding the lock while
- * performing IO, which seems worse than an unlikely retry.
+ * If the target buffer was unlocked above, or is unlocked while
+ * reacquiring the lock on otherBuffer below, it's unlikely, but possible,
+ * that another backend used space on this page. We check for that below,
+ * and retry if necessary.
*/
- if (otherBuffer != InvalidBuffer)
+ recheckVmPins = false;
+ if (unlockedTargetBuffer)
{
+ /* released lock on target buffer above */
+ if (otherBuffer != InvalidBuffer)
+ LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
+ LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+ recheckVmPins = true;
+ }
+ else if (otherBuffer != InvalidBuffer)
+ {
+ /*
+ * We did not release the target buffer, and otherBuffer is valid,
+ * need to lock the other buffer. It's guaranteed to be of a lower
+ * page number than the new page. To conform with the deadlock
+ * prevention rules, we ought to lock otherBuffer first, but that would
+ * give other backends a chance to put tuples on our page. To reduce
+ * the likelihood of that, attempt to lock the other buffer
+ * conditionally, that's very likely to work.
+ *
+ * Alternatively, we could acquire the lock on otherBuffer before
+ * extending the relation, but that'd require holding the lock while
+ * performing IO, which seems worse than an unlikely retry.
+ */
Assert(otherBuffer != buffer);
- targetBlock = BufferGetBlockNumber(buffer);
Assert(targetBlock > otherBlock);
if (unlikely(!ConditionalLockBuffer(otherBuffer)))
{
+ unlockedTargetBuffer = true;
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
}
+ recheckVmPins = true;
+ }
- /*
- * Because the buffers were unlocked for a while, it's possible,
- * although unlikely, that an all-visible flag became set or that
- * somebody used up the available space in the new page. We can use
- * GetVisibilityMapPins to deal with the first case. In the second
- * case, just retry from start.
- */
- GetVisibilityMapPins(relation, otherBuffer, buffer,
- otherBlock, targetBlock, vmbuffer_other,
- vmbuffer);
+ /*
+ * If one of the buffers was unlocked (always the case if otherBuffer is
+ * valid), it's possible, although unlikely, that an all-visible flag
+ * became set. We can use GetVisibilityMapPins to deal with that. It's
+ * possible that GetVisibilityMapPins() might need to temporarily release
+ * buffer locks, in which case we'll need to check if there's still enough
+ * space on the page below.
+ */
+ if (recheckVmPins)
+ {
+ if (GetVisibilityMapPins(relation, otherBuffer, buffer,
+ otherBlock, targetBlock, vmbuffer_other,
+ vmbuffer))
+ unlockedTargetBuffer = true;
+ }
- /*
- * Note that we have to check the available space even if our
- * conditional lock succeeded, because GetVisibilityMapPins might've
- * transiently released lock on the target buffer to acquire a VM pin
- * for the otherBuffer.
- */
- if (len > PageGetHeapFreeSpace(page))
+ /*
+ * If the target buffer was temporarily unlocked since the relation
+ * extension, it's possible, although unlikely, that all the space on the
+ * page was already used. If so, we just retry from the start. If we
+ * didn't unlock, something has gone wrong if there's not enough space -
+ * the test at the top should have prevented reaching this case.
+ */
+ pageFreeSpace = PageGetHeapFreeSpace(page);
+ if (len > pageFreeSpace)
+ {
+ if (unlockedTargetBuffer)
{
- LockBuffer(otherBuffer, BUFFER_LOCK_UNLOCK);
+ if (otherBuffer != InvalidBuffer)
+ LockBuffer(otherBuffer, BUFFER_LOCK_UNLOCK);
UnlockReleaseBuffer(buffer);
goto loop;
}
- }
- else if (len > PageGetHeapFreeSpace(page))
- {
- /* We should not get here given the test at the top */
elog(PANIC, "tuple is too big: size %zu", len);
}
@@ -735,7 +779,7 @@ loop:
* current backend to make more insertions or not, which is probably a
* good bet most of the time. So for now, don't add it to FSM yet.
*/
- RelationSetTargetBlock(relation, BufferGetBlockNumber(buffer));
+ RelationSetTargetBlock(relation, targetBlock);
return buffer;
}
--
2.38.0