Hi,

On 2023-04-03 14:25:59 +0200, Tomas Vondra wrote:
> On 4/3/23 00:40, Andres Freund wrote:
> > Hi,
> >
> > On 2023-03-28 19:17:21 -0700, Andres Freund wrote:
> >> On 2023-03-28 18:21:02 -0700, Andres Freund wrote:
> >>> Here's a draft patch.
> >>
> >> Attached is v2, with a stupid bug fixed and a bit of comment / pgindent
> >> polish.
> >
> > I'd welcome some review (Tomas?), but otherwise I'm planning to push ahead
> > with this.
>
> I guess the 0001 part was already pushed, so I should be looking only at
> 0002, correct?
Yes.


> I think 0002 makes RelationGetBufferForTuple() harder to understand. I'm
> not saying it's incorrect, but I find it hard to reason about the new
> combinations of conditions :-(
>
> I mean, it only had this condition:
>
>     if (otherBuffer != InvalidBuffer)
>     {
>         ...
>     }
>
> but now it has
>
>     if (unlockedTargetBuffer)
>     {
>         ...
>     }
>     else if (otherBuffer != InvalidBuffer)
>     {
>         ...
>     }
>
>     if (unlockedTargetBuffer || otherBuffer != InvalidBuffer)
>     {
>         ...
>     }
>
> Not sure how to improve that :-/ but not exactly trivial to figure out
> what's going to happen.

It's not great, I agree. In this version I tried to make it easier to read by

a) changing GetVisibilityMapPins() as I proposed
b) adding a new variable "recheckVmPins", which gets set in both the
   if (unlockedTargetBuffer) and the if (otherBuffer != InvalidBuffer) branches
c) reformulating the comments


> Maybe this
>
>  * If we unlocked the target buffer above, it's unlikely, but possible,
>  * that another backend used space on this page.
>
> might say what we're going to do in this case. I mean - I understand
> some backend may use space in unlocked page, but what does that mean for
> this code? What is it going to do? (The same comment talks about the
> next condition in much more detail, for example.)

There's a comment about that detail further below. But you're right, it
wasn't clear as-is. How about now?


> Also, isn't it a bit strange the free space check now happens outside
> any if condition? It used to happen in the
>
>     if (otherBuffer != InvalidBuffer)
>     {
>         ...
>     }
>
> block, but now it happens outside.

Well, the alternative is to repeat it in the two branches, which doesn't
seem great either. Particularly because there'll be a third branch after
the bulk extension patch.

Greetings,

Andres Freund
>From 49183283d6701d22cf12f6a6280ed1aba02fc063 Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Mon, 3 Apr 2023 11:18:10 -0700
Subject: [PATCH v3 1/2] hio: Relax rules for calling GetVisibilityMapPins()

Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/backend/access/heap/hio.c | 37 ++++++++++++++++++++++++-----------
 1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 7479212d4e0..e8aea42f8a9 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -131,9 +131,9 @@ ReadBufferBI(Relation relation, BlockNumber targetBlock,
  * For each heap page which is all-visible, acquire a pin on the appropriate
  * visibility map page, if we haven't already got one.
  *
- * buffer2 may be InvalidBuffer, if only one buffer is involved.  buffer1
- * must not be InvalidBuffer.  If both buffers are specified, block1 must
- * be less than block2.
+ * To avoid complexity in the callers, either buffer1 or buffer2 may be
+ * InvalidBuffer if only one buffer is involved.  For the same reason, block2
+ * may be smaller than block1.
  */
 static void
 GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
@@ -143,6 +143,26 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
 	bool		need_to_pin_buffer1;
 	bool		need_to_pin_buffer2;
 
+	/*
+	 * Swap buffers around to handle case of a single block/buffer, and to
+	 * handle if lock ordering rules require to lock block2 first.
+	 */
+	if (!BufferIsValid(buffer1) ||
+		(BufferIsValid(buffer2) && block1 > block2))
+	{
+		Buffer		tmpbuf = buffer1;
+		Buffer	   *tmpvmbuf = vmbuffer1;
+		BlockNumber tmpblock = block1;
+
+		buffer1 = buffer2;
+		vmbuffer1 = vmbuffer2;
+		block1 = block2;
+
+		buffer2 = tmpbuf;
+		vmbuffer2 = tmpvmbuf;
+		block2 = tmpblock;
+	}
+
 	Assert(BufferIsValid(buffer1));
 	Assert(buffer2 == InvalidBuffer || block1 <= block2);
 
@@ -502,14 +522,9 @@ loop:
		 * done a bit of extra work for no gain, but there's no real harm
		 * done.
		 */
-		if (otherBuffer == InvalidBuffer || targetBlock <= otherBlock)
-			GetVisibilityMapPins(relation, buffer, otherBuffer,
-								 targetBlock, otherBlock, vmbuffer,
-								 vmbuffer_other);
-		else
-			GetVisibilityMapPins(relation, otherBuffer, buffer,
-								 otherBlock, targetBlock, vmbuffer_other,
-								 vmbuffer);
+		GetVisibilityMapPins(relation, buffer, otherBuffer,
+							 targetBlock, otherBlock, vmbuffer,
+							 vmbuffer_other);
 
		/*
		 * Now we can check to see if there's enough free space here. If so,
-- 
2.38.0
>From 27f9ace30d9862f05b73676526a1b0233ef1faea Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Mon, 3 Apr 2023 11:26:16 -0700
Subject: [PATCH v3 2/2] hio: Don't pin the VM while holding buffer lock while
 extending

Discussion: http://postgr.es/m/20230325025740.wzvchp2kromw4...@awork3.anarazel.de
---
 src/backend/access/heap/hio.c | 126 +++++++++++++++++++++++-----------
 1 file changed, 85 insertions(+), 41 deletions(-)

diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index e8aea42f8a9..15912f6415f 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -134,14 +134,17 @@ ReadBufferBI(Relation relation, BlockNumber targetBlock,
  * To avoid complexity in the callers, either buffer1 or buffer2 may be
  * InvalidBuffer if only one buffer is involved.  For the same reason, block2
  * may be smaller than block1.
+ *
+ * Returns whether buffer locks were temporarily released.
  */
-static void
+static bool
 GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
					 BlockNumber block1, BlockNumber block2,
					 Buffer *vmbuffer1, Buffer *vmbuffer2)
 {
 	bool		need_to_pin_buffer1;
 	bool		need_to_pin_buffer2;
+	bool		released_locks = false;
 
	/*
	 * Swap buffers around to handle case of a single block/buffer, and to
@@ -175,9 +178,10 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
			&& PageIsAllVisible(BufferGetPage(buffer2))
			&& !visibilitymap_pin_ok(block2, *vmbuffer2);
 		if (!need_to_pin_buffer1 && !need_to_pin_buffer2)
-			return;
+			break;
 
 		/* We must unlock both buffers before doing any I/O. */
+		released_locks = true;
 		LockBuffer(buffer1, BUFFER_LOCK_UNLOCK);
 		if (buffer2 != InvalidBuffer && buffer2 != buffer1)
 			LockBuffer(buffer2, BUFFER_LOCK_UNLOCK);
@@ -203,6 +207,8 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
			|| (need_to_pin_buffer1 && need_to_pin_buffer2))
 			break;
 	}
+
+	return released_locks;
 }
 
 /*
@@ -365,6 +371,8 @@ RelationGetBufferForTuple(Relation relation, Size len,
 	BlockNumber targetBlock,
				otherBlock;
 	bool		needLock;
+	bool		unlockedTargetBuffer;
+	bool		recheckVmPins;
 
 	len = MAXALIGN(len);		/* be conservative */
 
@@ -645,6 +653,9 @@ loop:
 	if (needLock)
 		UnlockRelationForExtension(relation, ExclusiveLock);
 
+	unlockedTargetBuffer = false;
+	targetBlock = BufferGetBlockNumber(buffer);
+
	/*
	 * We need to initialize the empty new page.  Double-check that it really
	 * is empty (this should never happen, but if it does we don't want to
@@ -654,75 +665,108 @@ loop:
 
 	if (!PageIsNew(page))
 		elog(ERROR, "page %u of relation \"%s\" should be empty but is not",
-			 BufferGetBlockNumber(buffer),
+			 targetBlock,
			 RelationGetRelationName(relation));
 
 	PageInit(page, BufferGetPageSize(buffer), 0);
 	MarkBufferDirty(buffer);
 
	/*
-	 * The page is empty, pin vmbuffer to set all_frozen bit.
+	 * The page is empty, pin vmbuffer to set all_frozen bit. We don't want to
+	 * do IO while the buffer is locked, so we unlock the page first if IO is
+	 * needed (necessitating checks below).
	 */
 	if (options & HEAP_INSERT_FROZEN)
 	{
-		Assert(PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0);
-		visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+		Assert(PageGetMaxOffsetNumber(page) == 0);
+
+		if (!visibilitymap_pin_ok(targetBlock, *vmbuffer))
+		{
+			unlockedTargetBuffer = true;
+			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+			visibilitymap_pin(relation, targetBlock, vmbuffer);
+		}
 	}
 
	/*
-	 * Lock the other buffer. It's guaranteed to be of a lower page number
-	 * than the new page. To conform with the deadlock prevent rules, we ought
-	 * to lock otherBuffer first, but that would give other backends a chance
-	 * to put tuples on our page. To reduce the likelihood of that, attempt to
-	 * lock the other buffer conditionally, that's very likely to work.
-	 * Otherwise we need to lock buffers in the correct order, and retry if
-	 * the space has been used in the mean time.
+	 * Reacquire locks if necessary.
	 *
-	 * Alternatively, we could acquire the lock on otherBuffer before
-	 * extending the relation, but that'd require holding the lock while
-	 * performing IO, which seems worse than an unlikely retry.
+	 * If the target buffer was unlocked above, or is unlocked while
+	 * reacquiring the lock on otherBuffer below, it's unlikely, but possible,
+	 * that another backend used space on this page. We check for that below,
+	 * and retry if necessary.
	 */
-	if (otherBuffer != InvalidBuffer)
+	recheckVmPins = false;
+	if (unlockedTargetBuffer)
 	{
+		/* released lock on target buffer above */
+		if (otherBuffer != InvalidBuffer)
+			LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
+		LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+		recheckVmPins = true;
+	}
+	else if (otherBuffer != InvalidBuffer)
+	{
+		/*
+		 * We did not release the target buffer, and otherBuffer is valid,
+		 * need to lock the other buffer. It's guaranteed to be of a lower
+		 * page number than the new page. To conform with the deadlock
+		 * prevent rules, we ought to lock otherBuffer first, but that would
+		 * give other backends a chance to put tuples on our page. To reduce
+		 * the likelihood of that, attempt to lock the other buffer
+		 * conditionally, that's very likely to work.
+		 *
+		 * Alternatively, we could acquire the lock on otherBuffer before
+		 * extending the relation, but that'd require holding the lock while
+		 * performing IO, which seems worse than an unlikely retry.
+		 */
 		Assert(otherBuffer != buffer);
-		targetBlock = BufferGetBlockNumber(buffer);
 		Assert(targetBlock > otherBlock);
 
 		if (unlikely(!ConditionalLockBuffer(otherBuffer)))
 		{
+			unlockedTargetBuffer = true;
 			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 			LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
 			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 		}
+		recheckVmPins = true;
+	}
 
-		/*
-		 * Because the buffers were unlocked for a while, it's possible,
-		 * although unlikely, that an all-visible flag became set or that
-		 * somebody used up the available space in the new page.  We can use
-		 * GetVisibilityMapPins to deal with the first case.  In the second
-		 * case, just retry from start.
-		 */
-		GetVisibilityMapPins(relation, otherBuffer, buffer,
-							 otherBlock, targetBlock, vmbuffer_other,
-							 vmbuffer);
+	/*
+	 * If one of the buffers was unlocked (always the case if otherBuffer is
+	 * valid), it's possible, although unlikely, that an all-visible flag
+	 * became set. We can use GetVisibilityMapPins to deal with that. It's
+	 * possible that GetVisibilityMapPins() might need to temporarily release
+	 * buffer locks, in which case we'll need to check if there's still enough
+	 * space on the page below.
+	 */
+	if (recheckVmPins)
+	{
+		if (GetVisibilityMapPins(relation, otherBuffer, buffer,
+								 otherBlock, targetBlock, vmbuffer_other,
+								 vmbuffer))
+			unlockedTargetBuffer = true;
+	}
 
-		/*
-		 * Note that we have to check the available space even if our
-		 * conditional lock succeeded, because GetVisibilityMapPins might've
-		 * transiently released lock on the target buffer to acquire a VM pin
-		 * for the otherBuffer.
-		 */
-		if (len > PageGetHeapFreeSpace(page))
+	/*
+	 * If the target buffer was temporarily unlocked since the relation
+	 * extension, it's possible, although unlikely, that all the space on the
+	 * page was already used. If so, we just retry from the start. If we
+	 * didn't unlock, something has gone wrong if there's not enough space -
+	 * the test at the top should have prevented reaching this case.
+	 */
+	pageFreeSpace = PageGetHeapFreeSpace(page);
+	if (len > pageFreeSpace)
+	{
+		if (unlockedTargetBuffer)
 		{
-			LockBuffer(otherBuffer, BUFFER_LOCK_UNLOCK);
+			if (otherBuffer != InvalidBuffer)
+				LockBuffer(otherBuffer, BUFFER_LOCK_UNLOCK);
 			UnlockReleaseBuffer(buffer);
 			goto loop;
 		}
-	}
-	else if (len > PageGetHeapFreeSpace(page))
-	{
-		/* We should not get here given the test at the top */
+
 		elog(PANIC, "tuple is too big: size %zu", len);
 	}
 
@@ -735,7 +779,7 @@ loop:
	 * current backend to make more insertions or not, which is probably a
	 * good bet most of the time. So for now, don't add it to FSM yet.
	 */
-	RelationSetTargetBlock(relation, BufferGetBlockNumber(buffer));
+	RelationSetTargetBlock(relation, targetBlock);
 
 	return buffer;
 }
-- 
2.38.0