Hi,

On 2023-08-08 12:45:05 +0900, Masahiko Sawada wrote:
> > I think there could be a quite simple fix: Track by how much we've extended
> > the relation previously in the same bistate. If we already extended by many
> > blocks, it's very likey that we'll do so further.
> >
> > A simple prototype patch attached. The results for me are promising. I 
> > copied
> > a smaller file [1], to have more accurate throughput results at shorter runs
> > (15s).
> 
> Thank you for the patch!

Attached is a mildly updated version of that patch. No functional changes,
just polished comments and added a commit message.


> > Any chance you could your benchmark? I don't see as much of a regression vs 
> > 16
> > as you...
> 
> Sure. The results are promising for me too:
>
>  nclients = 1, execution time = 13.743
>  nclients = 2, execution time = 7.552
>  nclients = 4, execution time = 4.758
>  nclients = 8, execution time = 3.035
>  nclients = 16, execution time = 2.172
>  nclients = 32, execution time = 1.959
> nclients = 64, execution time = 1.819
> nclients = 128, execution time = 1.583
> nclients = 256, execution time = 1.631

Nice. We are consistently better than both 15 and "post integer parsing 16".


I'm really a bit baffled at myself for not using this approach from the get
go.

This also would make it much more beneficial to use a BulkInsertState in
nodeModifyTable.c, even without converting to table_multi_insert().


I was tempted to optimize RelationAddBlocks() a bit, by not calling
RelationExtensionLockWaiterCount() if we are already extending by
MAX_BUFFERS_TO_EXTEND_BY. Before this patch, it was pretty much impossible to
reach that case, because of the MAX_BUFFERED_* limits in copyfrom.c, but now
it's more common. But that probably ought to be done only HEAD, not 16.

A related oddity: RelationExtensionLockWaiterCount()->LockWaiterCount() uses
an exclusive lock on the lock partition - which seems not at all necessary?


Unless somebody sees a reason not to, I'm planning to push this?

Greetings,

Andres Freund
>From 26e7faad65273beefeb7b40a169c1b631e204501 Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Sat, 12 Aug 2023 12:31:43 -0700
Subject: [PATCH v2] hio: Take number of prior relation extensions into account

The new relation extension logic, introduced in 00d1e02be24, could lead to
slowdowns in some scenarios. E.g., when loading narrow rows into a table using
COPY, the caller of RelationGetBufferForTuple() will only request a small
number of pages. Without concurrency, we just extended using pwritev() in that
case. However, if there is *some* concurrency, we switched between extending
by a small number of pages and a larger number of pages, depending on the
number of waiters for the relation extension logic.  However, some
filesystems, XFS in particular, do not perform well when switching between
extending files using fallocate() and pwritev().

To avoid that issue, remember the number of prior relation extensions in
BulkInsertState and extend more aggressively if there were prior relation
extensions. That not just avoids the aforementioned slowdown, but also leads
to noticeable performance gains in other situations, primarily due to
extending more aggressively when there is no concurrency. I should have done
it this way from the get go.

Reported-by: Masahiko Sawada <sawada.m...@gmail.com>
Author: Andres Freund <and...@anarazel.de>
Discussion: https://postgr.es/m/CAD21AoDvDmUQeJtZrau1ovnT_smN940=kp6mszngk3bq9yr...@mail.gmail.com
Backpatch: 16-, where the new relation extension code was added
---
 src/include/access/hio.h         | 13 ++++++++++---
 src/backend/access/heap/heapam.c |  1 +
 src/backend/access/heap/hio.c    | 19 +++++++++++++++++++
 3 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/src/include/access/hio.h b/src/include/access/hio.h
index 228433ee4a2..9bc563b7628 100644
--- a/src/include/access/hio.h
+++ b/src/include/access/hio.h
@@ -32,15 +32,22 @@ typedef struct BulkInsertStateData
 	Buffer		current_buf;	/* current insertion target page */
 
 	/*
-	 * State for bulk extensions. Further pages that were unused at the time
-	 * of the extension. They might be in use by the time we use them though,
-	 * so rechecks are needed.
+	 * State for bulk extensions.
+	 *
+	 * last_free..next_free are further pages that were unused at the time of
+	 * the last extension. They might be in use by the time we use them
+	 * though, so rechecks are needed.
 	 *
 	 * XXX: Eventually these should probably live in RelationData instead,
 	 * alongside targetblock.
+	 *
+	 * already_extended_by is the number of pages that this bulk inserted
+	 * extended by. If we already extended by a significant number of pages,
+	 * we can be more aggressive about extending going forward.
 	 */
 	BlockNumber next_free;
 	BlockNumber last_free;
+	uint32		already_extended_by;
 } BulkInsertStateData;
 
 
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 7ed72abe597..6a66214a580 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1776,6 +1776,7 @@ GetBulkInsertState(void)
 	bistate->current_buf = InvalidBuffer;
 	bistate->next_free = InvalidBlockNumber;
 	bistate->last_free = InvalidBlockNumber;
+	bistate->already_extended_by = 0;
 	return bistate;
 }
 
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index c275b08494d..21f808fecb5 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -283,6 +283,24 @@ RelationAddBlocks(Relation relation, BulkInsertState bistate,
 		 */
 		extend_by_pages += extend_by_pages * waitcount;
 
+		/* ---
+		 * If we previously extended using the same bistate, it's very likely
+		 * we'll extend some more. Try to extend by as many pages as
+		 * before. This can be important for performance for several reasons,
+		 * including:
+		 *
+		 * - It prevents mdzeroextend() switching between extending the
+		 *   relation in different ways, which is inefficient for some
+		 *   filesystems.
+		 *
+		 * - Contention is often intermittent. Even if we currently don't see
+		 *   other waiters (see above), extending by larger amounts can
+		 *   prevent future contention.
+		 * ---
+		 */
+		if (bistate)
+			extend_by_pages = Max(extend_by_pages, bistate->already_extended_by);
+
 		/*
 		 * Can't extend by more than MAX_BUFFERS_TO_EXTEND_BY, we need to pin
 		 * them all concurrently.
@@ -409,6 +427,7 @@ RelationAddBlocks(Relation relation, BulkInsertState bistate,
 		/* maintain bistate->current_buf */
 		IncrBufferRefCount(buffer);
 		bistate->current_buf = buffer;
+		bistate->already_extended_by += extend_by_pages;
 	}
 
 	return buffer;
-- 
2.38.0

Reply via email to