On Wed, Mar 14, 2018 at 12:05 PM, Claudio Freire <klaussfre...@gmail.com> wrote:
> On Wed, Mar 14, 2018 at 1:36 AM, Pavan Deolasee
> <pavan.deola...@gmail.com> wrote:
>>
>> On Sun, Mar 11, 2018 at 9:18 PM, Claudio Freire <klaussfre...@gmail.com>
>> wrote:
>>>
>>> On Sun, Mar 11, 2018 at 2:27 AM, Pavan Deolasee
>>>
>>> >
>>> > Yes, I will try that next - it seems like a good idea. So the idea would be:
>>> > check if the block is still the rightmost block and the insertion key is
>>> > greater than the first key in the page. If those conditions are satisfied,
>>> > then we do a regular binary search within the page to find the correct
>>> > location. This might add the overhead of a binary search when keys are
>>> > strictly ordered and a single client is inserting the data. If that
>>> > becomes a concern, we might be able to look for that special case and
>>> > optimise for it too.
>>>
>>> Yeah, pretty much that's the idea. Beware, if the new item doesn't
>>> fall in the rightmost place, you still need to check for serialization
>>> conflicts.
>>
>> So I've been toying with this idea since yesterday and I am quite puzzled
>> by the results. See the attached patch, which compares the insertion key
>> with the last key inserted by this backend, if the cached block is still
>> the rightmost block in the tree. I initially only compared with the first
>> key in the page, but I tried this version because of the strange
>> performance regression, for which I still have no answer.
>>
>> For a small number of clients, the patched version does better. But as the
>> number of clients goes up, the patched version significantly underperforms
>> master. I roughly counted the number of times the fastpath is taken and
>> noticed that almost 98% of inserts take the fastpath.
>> I first thought that the
>> "firstkey" location in the page might be becoming a hot spot for
>> concurrent processes, and hence changed that to track the per-backend last
>> offset and compare against that the next time. But that did not help much.

> +                _bt_compare(rel, natts, itup_scankey, page,
> +                            RelationGetLastOffset(rel)) >= 0)
>
> Won't this access garbage if the last offset is stale and beyond the
> current rightmost page's last offset?
>
> I'd suggest simply using P_FIRSTDATAKEY after checking that the page
> isn't empty (see _bt_binsrch).

On Wed, Mar 14, 2018 at 11:12 AM, Pavan Deolasee
<pavan.deola...@gmail.com> wrote:
>>> > Hmm. I can try that. It's quite puzzling though that slowing things
>>> > down actually makes them better.
>>>
>>> That's not what is happening, though.
>>>
>>> The cached path is: 1) try to lock the cached block, 2) once the lock is
>>> acquired, check relevance; if stale, 3) recheck from the top.
>>>
>>> The non-cached path is just 3) recheck from the top.
>>>
>>> The overall path is longer in the cached case, but it provides a benefit
>>> if we can skip step 3 in a high percentage of cases. The non-cached path
>>> is more flexible because it locates the correct rightmost block even if
>>> it changes dynamically while we recheck from the top.
>>
>> So, as I noted in one of the previous emails, the revised patch still
>> takes the fast path in 98% of cases. So it's not clear why taking steps 1,
>> 2 and 3 in just 2% of cases should cause such a dramatic slowdown.

> Real-world workloads will probably take the slow path more often, so
> it's probably worth keeping the failure path as contention-free as
> possible.
>
> Besides, even though it may be just 2% of the time that it lands there,
> it could still block for a considerable amount of time for no benefit.
>
> So I guess a conditional lock is not a bad idea.
Re all of the above, I did some tests. I couldn't reproduce the regression
you showed so clearly (in fact, not at all), but I was able to get
consistently lock-bound with 8 clients by setting fsync=off.

Something noteworthy is that both master and patched are lock-bound. It just
must be that when that happens, the extra check for the rightmost page
really hurts.

So I tried the conditional locks, and indeed (at least on my meager
hardware) they turn the lock-bound test into an I/O-bound one.

I applied the attached patches on top of your patch, so it would be nice if
you could give them a try on your test hardware to see whether they help.
From 95d558c50b95729d109aa0135795c1887812ee1d Mon Sep 17 00:00:00 2001
From: Claudio Freire <klaussfre...@gmail.com>
Date: Wed, 14 Mar 2018 19:22:29 -0300
Subject: [PATCH 1/2] Ignore last offset

---
 src/backend/access/nbtree/nbtinsert.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index 2525667..ab239c4 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -159,8 +159,9 @@ top:
 		!P_INCOMPLETE_SPLIT(lpageop) &&
 		!P_IGNORE(lpageop) &&
 		(PageGetFreeSpace(page) > itemsz) &&
+		PageGetMaxOffsetNumber(page) >= P_FIRSTDATAKEY(lpageop) &&
 		_bt_compare(rel, natts, itup_scankey, page,
-					RelationGetLastOffset(rel)) >= 0)
+					P_FIRSTDATAKEY(lpageop)) > 0)
 	{
 		offset = InvalidOffsetNumber;
 		fastpath = true;
--
1.8.4.5
From 4ce6abc456414345c3d04101776c50504d345768 Mon Sep 17 00:00:00 2001
From: Claudio Freire <klaussfre...@gmail.com>
Date: Wed, 14 Mar 2018 22:15:31 -0300
Subject: [PATCH 2/2] Conditional locks

---
 src/backend/access/nbtree/nbtinsert.c | 68 +++++++++++++++++++++--------------
 1 file changed, 42 insertions(+), 26 deletions(-)

diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index ab239c4..f9afb4d 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -142,38 +142,54 @@ top:
 	 * ensures that the index state cannot change, as far as the rightmost
 	 * part of the index is concerned.
 	 */
-	buf = _bt_getbuf(rel, RelationGetTargetBlock(rel), BT_WRITE);
-	page = BufferGetPage(buf);
+	/* buf = _bt_getbuf(rel, RelationGetTargetBlock(rel), BT_WRITE); */
+	buf = ReadBuffer(rel, RelationGetTargetBlock(rel));
+
+	if (ConditionalLockBuffer(buf))
+	{
+		_bt_checkpage(rel, buf);
 
-	lpageop = (BTPageOpaque) PageGetSpecialPointer(page);
-	itemsz = IndexTupleSize(itup);
-	itemsz = MAXALIGN(itemsz);	/* be safe, PageAddItem will do this but we
+		page = BufferGetPage(buf);
+
+		lpageop = (BTPageOpaque) PageGetSpecialPointer(page);
+		itemsz = IndexTupleSize(itup);
+		itemsz = MAXALIGN(itemsz);	/* be safe, PageAddItem will do this but we
 								 * need to be consistent */
 
-	/*
-	 * Check if the page is still the rightmost leaf page, has enough free
-	 * space to accommodate the new tuple, no split is in progress and the
-	 * scankey is greater than or equal to the first key on the page.
-	 */
-	if (P_ISLEAF(lpageop) && P_RIGHTMOST(lpageop) &&
-		!P_INCOMPLETE_SPLIT(lpageop) &&
-		!P_IGNORE(lpageop) &&
-		(PageGetFreeSpace(page) > itemsz) &&
-		PageGetMaxOffsetNumber(page) >= P_FIRSTDATAKEY(lpageop) &&
-		_bt_compare(rel, natts, itup_scankey, page,
-					P_FIRSTDATAKEY(lpageop)) > 0)
-	{
-		offset = InvalidOffsetNumber;
-		fastpath = true;
+		/*
+		 * Check if the page is still the rightmost leaf page, has enough free
+		 * space to accommodate the new tuple, no split is in progress and the
+		 * scankey is greater than or equal to the first key on the page.
+		 */
+		if (P_ISLEAF(lpageop) && P_RIGHTMOST(lpageop) &&
+			!P_INCOMPLETE_SPLIT(lpageop) &&
+			!P_IGNORE(lpageop) &&
+			(PageGetFreeSpace(page) > itemsz) &&
+			PageGetMaxOffsetNumber(page) >= P_FIRSTDATAKEY(lpageop) &&
+			_bt_compare(rel, natts, itup_scankey, page,
+						P_FIRSTDATAKEY(lpageop)) > 0)
+		{
+			offset = InvalidOffsetNumber;
+			fastpath = true;
+		}
+		else
+		{
+			_bt_relbuf(rel, buf);
+
+			/*
+			 * Something did not work out. Just forget about the cached block
+			 * and follow the normal path. It might be set again if the
+			 * conditions are favourable.
+			 */
+			RelationSetTargetBlock(rel, InvalidBlockNumber);
+		}
 	}
-	else
-	{
-		_bt_relbuf(rel, buf);
+	else
+	{
+		ReleaseBuffer(buf);
 
 		/*
-		 * Something did not workout. Just forget about the cached block
-		 * and follow the normal path. It might be set again if the
-		 * conditions are favourble.
+		 * If someone's holding a lock, it's likely to change anyway,
+		 * so don't try again until we get an updated rightmost leaf.
 		 */
 		RelationSetTargetBlock(rel, InvalidBlockNumber);
 	}
--
1.8.4.5