Hi, Heikki! On Mon, Oct 28, 2024 at 9:42 PM Alexander Korotkov <aekorot...@gmail.com> wrote: > > On Mon, Oct 28, 2024 at 11:36 AM Heikki Linnakangas <hlinn...@iki.fi> wrote: > > > > On 25/10/2024 14:56, Alexander Korotkov wrote: > > > I see that pg_wal_replay_wait_status() might look weird, but it seems > > > to me like the best of feasible solutions. > > > > I haven't written many procedures, but our docs say: > > > > > Procedures do not return a function value; hence CREATE PROCEDURE > > lacks a RETURNS clause. However, procedures can instead return data to > > their callers via output parameters. > > > > Did you consider using an output parameter? > > Yes I did consider them and found two issues. > 1) You still need to pass something to them. And that couldn't be > default values. That's a bit awkward. > 2) Usage of them causes extra snapshot to be held. > I'll recheck if it's possible to workaround any of these two.
I've rechecked the output parameters for stored procedures. And I think the behavior I previously discovered is an anomaly. CREATE PROCEDURE test_proc(a integer, out b integer) LANGUAGE plpgsql AS $$ BEGIN b := a; END; $$; # call test_proc(1); ERROR: procedure test_proc(integer) does not exist LINE 1: call test_proc(1); ^ HINT: No procedure matches the given name and argument types. You might need to add explicit type casts. # call test_proc(1,2); b --- 1 (1 row) Looks weird that we have to pass in some (ignored?) values for output parameters. In contrast, functions don't require this. CREATE FUNCTION test_func(a integer, out b integer) LANGUAGE plpgsql AS $$ BEGIN b := a; END; $$; # select test_func(1); test_func ----------- 1 (1 row) This makes me think we have an issue with stored procedures here. I'll try to investigate it further. Also when we have output parameters, PortalRun() have portal->strategy == PORTAL_UTIL_SELECT (it's PORTAL_MULTI_QUERY without them). This eventually makes PortalRunUtility() to do RegisterSnapshot(). This is not clear whether we can release this snapshot without a stored procedure without the consequences. > > > Given that > > > pg_wal_replay_wait() procedure can't work concurrently to a query > > > involving pg_wal_replay_wait_status() function, I think > > > pg_wal_replay_wait_status() should be stable and parallel safe. > > > > If you call pg_wal_replay_wait() in the backend process, and > > pg_wal_replay_wait_status() in a parallel worker process, it won't > > return the result of the wait. Probably not what you'd expect. So I'd > > argue that it should be parallel unsafe. > > Oh, sorry. You're absolutely correct. That should be parallel unsafe. > > > > This is the brief answer. I will be able to come back with more > > > details on Monday. > > > > Thanks. A few more minor issues I spotted while playing with this: > > > > - If you pass a very high value as the timeout, e.g. INT_MAX-1, it wraps > > around and doesn't wait at all > > - You can pass NULLs as arguments. That should probably not be allowed, > > or we need to document what it means. > > > > This is disappointing: > > > > > postgres=# set default_transaction_isolation ='repeatable read'; > > > SET > > > postgres=# call pg_wal_replay_wait('0/55DA24F'); > > > ERROR: pg_wal_replay_wait() must be only called without an active or registered snapshot > > > DETAIL: Make sure pg_wal_replay_wait() isn't called within a transaction with an isolation level higher than READ COMMITTED, another procedure, or a function. > > > > Is there any way we could make that work? Otherwise, the feature just > > basically doesn't work if you use repeatable read. > > Thank you for catching this. The last one is really disappointing. > I'm exploring on what could be done there. If we set repeatable read as the default transaction isolation level, then the catalog snapshot used to find pg_wal_replay_wait() procedure got saved as the transaction snapshot. As I get the correct way to release that snapshot is to finish current transaction and start another one. Implemented in the attached patch. ------ Regards, Alexander Korotkov Supabase
From 96a5df8ed096a117276c89da1968139e1d20c997 Mon Sep 17 00:00:00 2001 From: Alexander Korotkov <akorotkov@postgresql.org> Date: Sun, 3 Nov 2024 22:47:05 +0200 Subject: [PATCH v1] Teach pg_wal_replay_wait() to handle REPEATABLE READ default level Reported-by: Bug: Discussion: Author: Reviewed-by: Tested-by: Backpatch-through: --- src/backend/access/transam/xlogfuncs.c | 24 ++++++++++++++++++++-- src/test/recovery/t/043_wal_replay_wait.pl | 5 ++++- 2 files changed, 26 insertions(+), 3 deletions(-) diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c index bca1d395683..8e90e6ae444 100644 --- a/src/backend/access/transam/xlogfuncs.c +++ b/src/backend/access/transam/xlogfuncs.c @@ -34,6 +34,7 @@ #include "utils/builtins.h" #include "utils/memutils.h" #include "utils/pg_lsn.h" +#include "utils/portal.h" #include "utils/snapmgr.h" #include "utils/timestamp.h" @@ -763,6 +764,8 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS) XLogRecPtr target_lsn = PG_GETARG_LSN(0); int64 timeout = PG_GETARG_INT64(1); bool no_error = PG_GETARG_BOOL(2); + CallContext *context = (CallContext *) fcinfo->context; + bool start_tranaction = false; if (timeout < 0) ereport(ERROR, @@ -776,12 +779,26 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS) * implying a kind of self-deadlock. This is the reason why * pg_wal_replay_wait() is a procedure, not a function. * - * At first, we should check there is no active snapshot. According to + * If we're in REPEATABLE READ isolation level or higher, the catalog + * snapshot used to find this procedure will be saved as the transaction + * snapshot. Thus, we need to finish current transaction in order to + * release this snapshot. We can do this in non-atomic context. + * Additionally check we're in the first command of the transaction. + * + * Also, we should check there is no active snapshot. According to * PlannedStmtRequiresSnapshot(), even in an atomic context, CallStmt is * processed with a snapshot. Thankfully, we can pop this snapshot, * because PortalRunUtility() can tolerate this. */ - if (ActiveSnapshotSet()) + if (IsolationUsesXactSnapshot() && + GetCurrentCommandId(false) == 0 && + !context->atomic) + { + ForgetPortalSnapshots(); + CommitTransactionCommand(); + start_tranaction = true; + } + else if (ActiveSnapshotSet()) PopActiveSnapshot(); /* @@ -805,6 +822,9 @@ pg_wal_replay_wait(PG_FUNCTION_ARGS) lastWaitLSNResult = WaitForLSNReplay(target_lsn, timeout); + if (start_tranaction) + StartTransactionCommand(); + if (no_error) PG_RETURN_VOID(); diff --git a/src/test/recovery/t/043_wal_replay_wait.pl b/src/test/recovery/t/043_wal_replay_wait.pl index 5857b943711..388741ee0aa 100644 --- a/src/test/recovery/t/043_wal_replay_wait.pl +++ b/src/test/recovery/t/043_wal_replay_wait.pl @@ -47,13 +47,16 @@ my $output = $node_standby->safe_psql( ok($output >= 0, "standby reached the same LSN as primary after pg_wal_replay_wait()"); -# 2. Check that new data is visible after calling pg_wal_replay_wait() +# 2. Check that new data is visible after calling pg_wal_replay_wait(). +# Do this check with REPEATABLE READ as the default transaction isolation +# mode. $node_primary->safe_psql('postgres', "INSERT INTO wait_test VALUES (generate_series(21, 30))"); my $lsn2 = $node_primary->safe_psql('postgres', "SELECT pg_current_wal_insert_lsn()"); $output = $node_standby->safe_psql( 'postgres', qq[ + SET default_transaction_isolation = 'repeatable read'; CALL pg_wal_replay_wait('${lsn2}'); SELECT count(*) FROM wait_test; ]); -- 2.39.5 (Apple Git-154)