On Wed, Mar 14, 2018 at 12:05 PM, Claudio Freire <klaussfre...@gmail.com> wrote:
> On Wed, Mar 14, 2018 at 1:36 AM, Pavan Deolasee
> <pavan.deola...@gmail.com> wrote:
>>
>>
>> On Sun, Mar 11, 2018 at 9:18 PM, Claudio Freire <klaussfre...@gmail.com>
>> wrote:
>>>
>>> On Sun, Mar 11, 2018 at 2:27 AM, Pavan Deolasee
>>> <pavan.deola...@gmail.com> wrote:
>>> >
>>> > Yes, I will try that next - it seems like a good idea. So the idea would be:
>>> > check if the block is still the rightmost block and the insertion key is
>>> > greater than the first key in the page. If those conditions are satisfied,
>>> > then we do a regular binary search within the page to find the correct
>>> > location. This might add the overhead of a binary search when keys are
>>> > strictly ordered and a single client is inserting the data. If that
>>> > becomes a concern, we might be able to detect that special case and
>>> > optimise for it too.
>>>
>>> Yeah, pretty much that's the idea. Beware: if the new item doesn't
>>> fall in the rightmost place, you still need to check for serialization
>>> conflicts.
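
For context, the conflict check in question is the
CheckForSerializableConflictIn() call in _bt_doinsert(). A rough sketch
of that block, paraphrased from memory rather than copied from source,
so treat the exact shape as an assumption:

    if (checkUnique != UNIQUE_CHECK_EXISTING)
    {
        /* predicate locking only cares about the "first valid" page here */
        CheckForSerializableConflictIn(rel, NULL, buf);
        /* do the insertion */
        _bt_findinsertloc(rel, &buf, &offset, natts, itup_scankey, itup,
                          stack, heapRel);
        _bt_insertonpg(rel, buf, InvalidBuffer, stack, itup, offset, false);
    }

A fastpath that only skips the _bt_search() descent still falls through
to this block, so the check is preserved.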
>>
>>
>> So I've been toying with this idea since yesterday and I am quite puzzled
>> by the results. See the attached patch, which compares the insertion key
>> with the last key inserted by this backend, if the cached block is still the
>> rightmost block in the tree. I initially only compared with the first key in
>> the page, but I tried this version because of the strange performance
>> regression, for which I still have no explanation.
>>
>> For a small number of clients, the patched version does better. But as the
>> number of clients goes up, the patched version significantly underperforms
>> master. I roughly counted the number of times the fastpath is taken and
>> noticed that almost 98% of inserts take the fastpath. I first thought that
>> the "firstkey" location in the page might become a hot spot for concurrent
>> processes, and hence changed that to track the per-backend last offset and
>> compare against that the next time. But that did not help much.
>
> +            _bt_compare(rel, natts, itup_scankey, page,
> +                        RelationGetLastOffset(rel)) >= 0)
>
> Won't this access garbage if the last offset is stale and beyond the
> current rightmost page's last offset?
>
> I'd suggest simply using P_FIRSTDATAKEY after checking that the page
> isn't empty (see _bt_binsrch).
>
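
In concrete terms, that guard looks like this (it's just the check from
the first patch attached below, reproduced here for context; the
variable names are from your patch):

    PageGetMaxOffsetNumber(page) >= P_FIRSTDATAKEY(lpageop) &&
    _bt_compare(rel, natts, itup_scankey, page,
                P_FIRSTDATAKEY(lpageop)) > 0

That is, take the fastpath only when the page has at least one data key
and the new key sorts strictly after the first one.
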
> On Wed, Mar 14, 2018 at 11:12 AM, Pavan Deolasee
> <pavan.deola...@gmail.com> wrote:
>>> > Hmm. I can try that. It's quite puzzling though that slowing down things
>>> > actually makes them better.
>>>
>>> That's not what is happening though.
>>>
>>> The cached path is 1) try to lock the cached block, 2) once the lock
>>> is acquired, check relevance, and if stale 3) recheck from the top.
>>>
>>> The non-cached path is just 3) recheck from the top.
>>>
>>> The overall path length is longer in the cached case but provides a
>>> benefit if we can skip step 3 in a high percentage of cases. The
>>> non-cached path is more flexible because it locates the correct RHS
>>> block, even if it changes dynamically while the recheck from the top
>>> is in progress.
>>>
>>
>> So as I noted in one of the previous emails, the revised patch still takes
>> the fast path in 98% of cases. So it's not clear why taking steps 1, 2 and 3
>> in just 2% of cases should cause such a dramatic slowdown.
>
> Real-world workloads will probably take the slow path more often, so
> it's probably worth keeping the failure path as contention-free as
> possible.
>
> Besides, even though it may land there just 2% of the time, it could
> still block for a considerable amount of time for no benefit.
>
> So I guess a conditional lock is not a bad idea.

Re all of the above, I did some tests.

I couldn't reproduce the regression you showed so clearly (in fact, not
at all), but I was able to make the test consistently lock-bound with 8
clients by setting fsync=off.

Something noteworthy is that both master and patched are lock-bound. It
must just be that, when that happens, the extra check against the
rightmost page really hurts.

So I tried the conditional locks, and indeed (at least on my meager
hardware) they turn the lock-bound test into an I/O-bound one.
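
The gist of the change is just this locking pattern (the full version is
in the second attached patch):

    buf = ReadBuffer(rel, RelationGetTargetBlock(rel));
    if (ConditionalLockBuffer(buf))
    {
        _bt_checkpage(rel, buf);
        /* ... revalidate the cached rightmost page, try the fastpath ... */
    }
    else
    {
        /* someone else holds the lock; don't wait, just drop the cache */
        ReleaseBuffer(buf);
        RelationSetTargetBlock(rel, InvalidBlockNumber);
    }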

I applied the attached patches on top of your patch; it would be nice if
you could give them a try on your test hardware to see whether they help.
From 95d558c50b95729d109aa0135795c1887812ee1d Mon Sep 17 00:00:00 2001
From: Claudio Freire <klaussfre...@gmail.com>
Date: Wed, 14 Mar 2018 19:22:29 -0300
Subject: [PATCH 1/2] Ignore last offset

---
 src/backend/access/nbtree/nbtinsert.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index 2525667..ab239c4 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -159,8 +159,9 @@ top:
 			!P_INCOMPLETE_SPLIT(lpageop) &&
 			!P_IGNORE(lpageop) &&
 			(PageGetFreeSpace(page) > itemsz) &&
+			PageGetMaxOffsetNumber(page) >= P_FIRSTDATAKEY(lpageop) &&
 			_bt_compare(rel, natts, itup_scankey, page,
-						RelationGetLastOffset(rel)) >= 0)
+						P_FIRSTDATAKEY(lpageop)) > 0)
 		{
 			offset = InvalidOffsetNumber;
 			fastpath = true;
-- 
1.8.4.5

From 4ce6abc456414345c3d04101776c50504d345768 Mon Sep 17 00:00:00 2001
From: Claudio Freire <klaussfre...@gmail.com>
Date: Wed, 14 Mar 2018 22:15:31 -0300
Subject: [PATCH 2/2] Conditional locks

---
 src/backend/access/nbtree/nbtinsert.c | 69 +++++++++++++++++++++--------------
 1 file changed, 43 insertions(+), 26 deletions(-)

diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index ab239c4..f9afb4d 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -142,38 +142,55 @@ top:
 		 * ensures that the index state cannot change, as far as the rightmost
 		 * part of the index is concerned.
 		 */
-		buf = _bt_getbuf(rel, RelationGetTargetBlock(rel), BT_WRITE);
-		page = BufferGetPage(buf);
+		/* buf = _bt_getbuf(rel, RelationGetTargetBlock(rel), BT_WRITE); */
+		buf = ReadBuffer(rel, RelationGetTargetBlock(rel));
+
+		if (ConditionalLockBuffer(buf))
+		{
+			_bt_checkpage(rel, buf);
 
-		lpageop = (BTPageOpaque) PageGetSpecialPointer(page);
-		itemsz = IndexTupleSize(itup);
-		itemsz = MAXALIGN(itemsz);	/* be safe, PageAddItem will do this but we
+			page = BufferGetPage(buf);
+
+			lpageop = (BTPageOpaque) PageGetSpecialPointer(page);
+			itemsz = IndexTupleSize(itup);
+			itemsz = MAXALIGN(itemsz);	/* be safe, PageAddItem will do this but we
 									 * need to be consistent */
 
-		/*
-		 * Check if the page is still the rightmost leaf page, has enough free
-		 * space to accommodate the new tuple, no split is in progress and the
-		 * scankey is greater than or equal to the first key on the page.
-		 */
-		if (P_ISLEAF(lpageop) && P_RIGHTMOST(lpageop) &&
-			!P_INCOMPLETE_SPLIT(lpageop) &&
-			!P_IGNORE(lpageop) &&
-			(PageGetFreeSpace(page) > itemsz) &&
-			PageGetMaxOffsetNumber(page) >= P_FIRSTDATAKEY(lpageop) &&
-			_bt_compare(rel, natts, itup_scankey, page,
-						P_FIRSTDATAKEY(lpageop)) > 0)
-		{
-			offset = InvalidOffsetNumber;
-			fastpath = true;
+			/*
+			 * Check if the page is still the rightmost leaf page, has enough free
+			 * space to accommodate the new tuple, no split is in progress and the
+			 * scankey is strictly greater than the first key on the page.
+			 */
+			if (P_ISLEAF(lpageop) && P_RIGHTMOST(lpageop) &&
+				!P_INCOMPLETE_SPLIT(lpageop) &&
+				!P_IGNORE(lpageop) &&
+				(PageGetFreeSpace(page) > itemsz) &&
+				PageGetMaxOffsetNumber(page) >= P_FIRSTDATAKEY(lpageop) &&
+				_bt_compare(rel, natts, itup_scankey, page,
+							P_FIRSTDATAKEY(lpageop)) > 0)
+			{
+				offset = InvalidOffsetNumber;
+				fastpath = true;
+			}
+			else
+			{
+				_bt_relbuf(rel, buf);
+
+				/*
+				 * Something did not work out. Just forget about the cached block
+				 * and follow the normal path. It might be set again if the
+				 * conditions are favourable.
+				 */
+				RelationSetTargetBlock(rel, InvalidBlockNumber);
+			}
 		}
-		else
-		{
-			_bt_relbuf(rel, buf);
+		else
+		{
+			ReleaseBuffer(buf);
 
 			/*
-			 * Something did not workout. Just forget about the cached block
-			 * and follow the normal path. It might be set again if the
-			 * conditions are favourble.
+			 * If someone's holding a lock, it's likely to change anyway,
+			 * so don't try again until we get an updated rightmost leaf.
 			 */
 			RelationSetTargetBlock(rel, InvalidBlockNumber);
 		}
-- 
1.8.4.5
