Hi,
On 25/11/2025 17:16, Fujii Masao wrote:
Thanks for writing the test case and turning it into a patch. I agree that
we should add a regression test to ensure the reported issue doesn't recur.
Thanks for your feedback, updated patch is attached. Again, I checked
that it fails in master, but passes with your patch.
It looks like the v1 patch you attached accidentally includes
the patch file itself. Could you remove that?
Whoops, not sure what happened there, fixed.
+ '--fsync-interval', '1',
+ '--status-interval', '1',
Wouldn't it be safer to use a larger value (e.g., 100) for --status-interval?
With a very small interval, the status feedback might happen before
the walsender is terminated and pg_recvlogical reconnects, which could
prevent the duplicate data from appearing even without the patch.
Yes indeed good one. I actually had it set to 60 in the previous version
I sent earlier.
+use IPC::Run qw(start);
<snip>
+my $recv = start [
For simplicity, would it be better to avoid "use IPC::Run qw(start)" and
just call "IPC::Run::start" directly?
Indeed, done.
+# Wait only for initial connection
+$node->poll_query_until('postgres',
+ "SELECT active_pid IS NOT NULL FROM pg_replication_slots WHERE
slot_name = 'reconnect_test'");
This is unlikely, but pg_recvlogical's connection could be terminated
immediately after connecting, before receiving any data. If that happens,
the test might behave unexpectedly. To make the test more robust,
should we instead poll on:
SELECT pg_read_file('$outfile') ~ 'INSERT'
instead, to ensure that some data has actually been received before
terminating the backend?
+# Wait only for initial connection
+$node->poll_query_until('postgres',
+ "SELECT active_pid IS NOT NULL FROM pg_replication_slots WHERE
slot_name = 'reconnect_test'");
This is unlikely, but pg_recvlogical's connection could be terminated
immediately after connecting, before receiving any data. If that happens,
the test might behave unexpectedly. To make the test more robust,
should we instead poll on:
SELECT pg_read_file('$outfile') ~ 'INSERT'
instead, to ensure that some data has actually been received before
terminating the backend?
Secondly I noticed in function sendFeedback at line 166, the startpos is
set to output_written_lsn. This seems to counter conceptually the change
you made in the patch, however it seems to not affect correctness. Shall
we remove this line as to avoid confusion?
Isn't this necessary when - is specified for --file, causing
OutputFsync() to be skipped? Regards,
Yes, for sure. Would really like to avoid introducing flake in CI due to
this test.
Isn't this necessary when - is specified for --file, causing
OutputFsync() to be skipped?
Upon another look, indeed. When writing to a regular file (--file -)
that assignment is redundant but harmless. But like you said, when
writing to stdout, without that line, startpos would never be updated.
Additionally, when the --no-loop option is used, I found that
pg_recvlogical
could previously exit without flushing written data, risking data loss.
The attached patch fixes this issue by also ensuring that all data is
flushed
to disk before exiting with --no-loop.
Should we think of some kind of test also for this part?
--
Thanks,
Mircea Cadariu
From 34842f494304f323622a5c12d46300634c0e839b Mon Sep 17 00:00:00 2001
From: Mircea Cadariu <[email protected]>
Date: Wed, 26 Nov 2025 12:44:32 +0000
Subject: [PATCH v1 2/2] add test for pg_recvlogical reconnection
---
src/bin/pg_basebackup/t/030_pg_recvlogical.pl | 63 +++++++++++++++++++
1 file changed, 63 insertions(+)
diff --git a/src/bin/pg_basebackup/t/030_pg_recvlogical.pl
b/src/bin/pg_basebackup/t/030_pg_recvlogical.pl
index 1b7a6f6f43..255c361aa6 100644
--- a/src/bin/pg_basebackup/t/030_pg_recvlogical.pl
+++ b/src/bin/pg_basebackup/t/030_pg_recvlogical.pl
@@ -151,4 +151,67 @@ my $result = $node->safe_psql('postgres',
);
is($result, 't', "failover is enabled for the new slot");
+my $outfile = $node->basedir . '/reconnect.out';
+
+$node->command_ok(
+ [
+ 'pg_recvlogical',
+ '--slot' => 'reconnect_test',
+ '--dbname' => $node->connstr('postgres'),
+ '--create-slot',
+ ],
+ 'slot created for reconnection test');
+
+$node->safe_psql('postgres', 'CREATE TABLE t(x int);');
+$node->safe_psql('postgres', 'INSERT INTO t VALUES (1);');
+
+my $recv = IPC::Run::start [
+ 'pg_recvlogical',
+ '--slot', 'reconnect_test',
+ '--dbname', $node->connstr('postgres'),
+ '--start',
+ '--file', $outfile,
+ '--fsync-interval', '1',
+ '--status-interval', '100',
+ '--verbose'
+], '>', \my $out, '2>', \my $err;
+
+# Wait for file to contain first inserts
+$node->poll_query_until('postgres',
+ "SELECT (SELECT pg_read_file('$outfile') ~ 'INSERT') AS first_insert");
+
+# Terminate the backend
+my $backend_pid = $node->safe_psql('postgres',
+ "SELECT active_pid FROM pg_replication_slots WHERE slot_name =
'reconnect_test'");
+$node->safe_psql('postgres', "SELECT pg_terminate_backend($backend_pid)");
+
+# Wait for reconnection
+$node->poll_query_until('postgres',
+ "SELECT active_pid IS NOT NULL AND active_pid != $backend_pid FROM
pg_replication_slots WHERE slot_name = 'reconnect_test'");
+
+# Insert after reconnection
+$node->safe_psql('postgres', 'INSERT INTO t VALUES (2);');
+
+# Wait for file to contain both inserts
+$node->poll_query_until('postgres',
+ "SELECT (SELECT pg_read_file('$outfile') ~ 'INSERT.*INSERT') AS has_both");
+
+$recv->signal('TERM');
+$recv->finish();
+
+open(my $file, '<', $outfile);
+my $count = () = do { local $/; <$file> } =~ /INSERT/g;
+close($file);
+
+cmp_ok($count, '==', 2, 'two INSERTs');
+
+$node->command_ok(
+ [
+ 'pg_recvlogical',
+ '--slot' => 'reconnect_test',
+ '--dbname' => $node->connstr('postgres'),
+ '--drop-slot'
+ ],
+ 'reconnect_test slot dropped');
+
done_testing();
--
2.39.5 (Apple Git-154)