On Sun, Jan 11, 2026 at 6:49 PM Dewei Dai <[email protected]> wrote: > > Hi Fujii, > > At 2026-01-11 17:21:19, "Fujii Masao" <[email protected]> wrote: > >That's possible. But TBH I'm not sure how much effort is justified here. > >The test uses pg_recvlogical to activate the slot and doesn't really test > >pg_recvlogical itself. It's unclear how valuable it is to additionally run > >this test on Windows... > > > I applied the V4 patch and tested it on a CentOS 7 x86_64 platform. The test > steps are as follows: > > 1. Create a table: > `create table test_id(id integer);` > 2. Create a function to close the connection: > `create or replace function test_f(id integer) returns integer as $$ > declare > var1 integer; > begin > SELECT active_pid into var1 FROM pg_replication_slots WHERE slot_name = > 'reconnect_test'; > perform pg_terminate_backend(var1); > return 1; > end; $$ language plpgsql;` > > 3. Execute the command to receive logs: > `./pg_recvlogical --create-slot --slot reconnect_test --dbname postgres > --start --file decoding.out --fsync-interval 200 --status-interval 100 > --verbose` > 4. Execute the following shell script: > `while true > do > ./psql -d postgres<<EOF > select test_f(1); > \q > EOF > done` > > 5. Execute data insertion using psql: > `insert into test_id values(1); > insert into test_id values(2);` > 6. `tail -f decoding.out` > I found duplicate insert statements in the file. > I don't know if this is a problem. > Additionally, I tried moving the two lines involving `Stream LogicalLog` > outside the loop > in the `main` function, and then it worked correctly. > `output_written_lsn = InvalidXLogRecPtr;` > `output_fsync_lsn = InvalidXLogRecPtr;`
Thanks for the test and the investigation! I was able to reproduce the issue as well. It occurs when the pg_recvlogical connection is terminated before it has received any messages. The problematic sequence is roughly: 1. The pg_recvlogical connection is terminated after running for some time. 2. StreamLogicalLog() is called again and initializes output_written_lsn to InvalidXLogRecPtr. 3. pg_recvlogical reconnects and starts replication from valid startpos. 4. The connection is terminated again. 5. StreamLogicalLog() exits and OutputFsync() sets startpos to output_written_lsn (i.e., InvalidXLogRecPtr). As a result, the next StreamLogicalLog() starts replication with startpos = InvalidXLogRecPtr, which can cause the server to resend already-streamed data and lead to duplicate output. The root cause is that StreamLogicalLog() reinitializes output_written_lsn and output_fsync_lsn on every call. As you suggested, removing that initialization fixes the issue. I’ve updated the 0001 patch accordingly. Attached is the updated version of the patches. Regards, -- Fujii Masao
v5-0001-pg_recvlogical-Prevent-flushed-data-from-being-re.patch
Description: Binary data
v5-0002-Add-a-new-helper-function-wait_for_file-to-Utils..patch
Description: Binary data
v5-0004-pg_recvlogical-remove-unnecessary-OutputFsync-ret.patch
Description: Binary data
v5-0003-Add-test-for-pg_recvlogical-reconnection-behavior.patch
Description: Binary data
