On 12/21/21 03:49, Tomas Vondra wrote:
On 12/21/21 02:01, Tom Lane wrote:
Tomas Vondra <tomas.von...@enterprisedb.com> writes:
OK, I did a quick test with two very simple benchmarks - simple select
from a sequence, and 'pgbench -N' on scale 1. Benchmark was on current
master, patched means SEQ_LOG_VALS was set to 1.

But ... pgbench -N doesn't use sequences at all, does it?

Probably inserts into a table with a serial column would constitute a
plausible real-world case.


D'oh! For some reason I thought pgbench had a sequence on the history table, but clearly I was mistaken. There's another thinko: after inspecting pg_waldump output I realized "SEQ_LOG_VALS 1" actually logs only every 2nd increment, so it should have been "SEQ_LOG_VALS 0".

So I repeated the test fixing SEQ_LOG_VALS, and doing the pgbench with a table like this:

   create table test (a serial, b int);

and a script doing

   insert into test (b) values (1);

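For reference, a run like this might be driven as follows (the exact pgbench invocation isn't in the thread, so the flags and script/database names here are an assumption):

```shell
# Hypothetical invocation: -n skips vacuuming, -f points at the custom
# script containing the single INSERT above, -T sets the run duration.
pgbench -n -c 1 -T 60 -f insert.sql bench

# 4-client variant (4 clients, 4 worker threads)
pgbench -n -c 4 -j 4 -T 60 -f insert.sql bench
```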
The results look like this:

1) select nextval('s');

      clients          1         4
     ------------------------------
      master       39533    124998
      patched       3748      9114
     ------------------------------
      diff          -91%      -93%


2) insert into test (b) values (1);

      clients          1         4
     ------------------------------
      master        3718      9188
      patched       3698      9209
     ------------------------------
      diff            0%        0%

So the nextval() results are a bit worse than before, because we no longer cache even every other nextval call. The roughly -90% drop is expected, due to generating about 32x more WAL (and having to wait for commit).

But results for the more realistic insert workload are about the same as before (i.e. no measurable difference). Also kinda expected, because those transactions have to wait for WAL anyway.


Attached is a patch tweaking WAL logging - in wal_level=minimal we do the same thing as now, in higher levels we log every sequence fetch.

After thinking about this a bit more, I think even the nextval workload is not such a big issue, because we can set a CACHE value for the sequences. Until now this had fairly limited impact, but it can significantly reduce the performance drop caused by WAL-logging every sequence fetch.
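Concretely, the mitigation looks like this (a sketch; `test_a_seq` is the name PostgreSQL would give the sequence behind the serial column in the earlier test table, not something shown in the thread):

```sql
-- explicit sequence with a larger per-backend cache
CREATE SEQUENCE s CACHE 128;

-- or adjust the sequence behind the serial column from the earlier test
ALTER SEQUENCE test_a_seq CACHE 128;
```

Each backend then hands out up to 128 values from its local cache, so only every 128th nextval() call has to advance the on-disk sequence and emit a WAL record.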

I've repeated the nextval test on a different machine (the one I used before is busy with something else), and the results look like this:

1) 1 client

    cache          1         32        128
    --------------------------------------
    master     13975      14425      19886
    patched      886       7900      18397
    --------------------------------------
    diff        -94%       -45%        -7%

2) 4 clients

    cache             1         32        128
    -----------------------------------------
    master         8338      12849      18248
    patched         331       8124      18983
    -----------------------------------------
    diff           -96%       -37%        +4%

So I think this makes it acceptable / manageable. Of course, this means the values are much less monotonic (across backends), but I don't think we ever really promised that. And I doubt anyone is using sequences like this (just nextval) in performance-critical use cases.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From eeaa7cb36c69af048f0321e4883864ebe2542429 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.von...@postgresql.org>
Date: Wed, 22 Dec 2021 03:18:46 +0100
Subject: [PATCH 1/6] WAL-log individual sequence fetches

---
 src/backend/commands/sequence.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index 72bfdc07a4..0f309d0a4e 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -52,6 +52,9 @@
  * We don't want to log each fetching of a value from a sequence,
  * so we pre-log a few fetches in advance. In the event of
  * crash we can lose (skip over) as many values as we pre-logged.
+ *
+ * We only pre-log fetches in wal_level=minimal. For higher levels we
+ * WAL-log every individual sequence increment, as if SEQ_LOG_VALS were 0.
  */
 #define SEQ_LOG_VALS	32
 
@@ -666,11 +669,18 @@ nextval_internal(Oid relid, bool check_permissions)
 	 * WAL record to be written anyway, else replay starting from the
 	 * checkpoint would fail to advance the sequence past the logged values.
 	 * In this case we may as well fetch extra values.
+	 *
+	 * We only pre-log fetches in wal_level=minimal. For higher levels we
+	 * WAL-log every individual sequence increment.
 	 */
 	if (log < fetch || !seq->is_called)
 	{
 		/* forced log to satisfy local demand for values */
-		fetch = log = fetch + SEQ_LOG_VALS;
+		if (XLogIsNeeded())
+			fetch = log = fetch;
+		else
+			fetch = log = fetch + SEQ_LOG_VALS;
+
 		logit = true;
 	}
 	else
@@ -680,7 +690,11 @@ nextval_internal(Oid relid, bool check_permissions)
 		if (PageGetLSN(page) <= redoptr)
 		{
 			/* last update of seq was before checkpoint */
-			fetch = log = fetch + SEQ_LOG_VALS;
+			if (XLogIsNeeded())
+				fetch = log = fetch;
+			else
+				fetch = log = fetch + SEQ_LOG_VALS;
+
 			logit = true;
 		}
 	}
-- 
2.31.1
