On Mon, Mar 8, 2021 at 6:25 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Mon, Mar 8, 2021 at 4:20 PM vignesh C <vignes...@gmail.com> wrote:
> >
> > On Mon, Mar 8, 2021 at 11:30 AM Ajin Cherian <itsa...@gmail.com> wrote:
> > >
> > > On Fri, Mar 5, 2021 at 9:25 PM vignesh C <vignes...@gmail.com> wrote:
> > >
> > >
> > > Created new patch v53:
> >
> > Thanks for the updated patch.
> > I had noticed one issue, publisher does not get stopped normally in
> > the following case:
> > # Publisher steps
> > psql -d postgres -c "CREATE TABLE do_write(id serial primary key);"
> > psql -d postgres -c "INSERT INTO do_write VALUES(generate_series(1,10));"
> > psql -d postgres -c "CREATE PUBLICATION mypub FOR TABLE do_write;"
> >
> > # Subscriber steps
> > psql -d postgres -p 9999 -c "CREATE TABLE do_write(id serial primary key);"
> > psql -d postgres -p 9999 -c "INSERT INTO do_write VALUES(1);" # to
> > cause a PK violation
> > psql -d postgres -p 9999 -c "CREATE SUBSCRIPTION mysub CONNECTION
> > 'host=localhost port=5432 dbname=postgres' PUBLICATION mypub WITH
> > (two_phase = true);"
> >
> > # prepare & commit prepared at publisher
> > psql -d postgres -c \
> > "begin; insert into do_write values (100); prepare transaction 'test1';"
> > psql -d postgres -c "commit prepared 'test1';"
> >
> > Stop publisher:
> > ./pg_ctl -D publisher stop
> > waiting for server to shut
> > down...............................................................
> > failed
> > pg_ctl: server does not shut down
> >
> > This is because the following process does not exit:
> > postgres: walsender vignesh 127.0.0.1(41550) START_REPLICATION
> >
> > It continuously loops at the below:
> >
>
> What happens if you don't set the two_phase option? If that also leads
> to the same error then can you please also check this case on the
> HEAD?

The shutdown succeeds without the two_phase option. I analyzed this
issue further; the details are below.

The following code in the WalSndDone function handles the walsender
exit:

    if (WalSndCaughtUp && sentPtr == replicatedPtr &&
        !pq_is_send_pending())
    {
        QueryCompletion qc;

        /* Inform the standby that XLOG streaming is done */
        SetQueryCompletion(&qc, CMDTAG_COPY, 0);
        EndCommand(&qc, DestRemote, false);
        pq_flush();

        proc_exit(0);
    }

With the two_phase option, replicatedPtr and sentPtr never become
equal:

    (gdb) p /x replicatedPtr
    $8 = 0x15faa70
    (gdb) p /x sentPtr
    $10 = 0x15fac50

Without the two_phase option, replicatedPtr and sentPtr become equal
and the walsender exits:

    (gdb) p /x sentPtr
    $7 = 0x15fae10
    (gdb) p /x replicatedPtr
    $8 = 0x15fae10

I think that with the two_phase option replicatedPtr never catches up
to sentPtr, so the exit condition in WalSndDone is never satisfied and
the walsender process hangs, which in turn blocks the publisher
shutdown.

Regards,
Vignesh
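
For illustration only, the exit condition can be modelled in isolation
with the values seen in gdb. This is a simplified sketch, not the
walsender code: walsender_can_exit and its parameter names are
hypothetical, and XLogRecPtr is redeclared here as a plain 64-bit
integer so the snippet compiles on its own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr (a 64-bit WAL position). */
typedef uint64_t XLogRecPtr;

/*
 * Hypothetical helper mirroring the test made in WalSndDone: the
 * walsender may exit only when it has caught up, the receiver has
 * confirmed everything that was sent, and no output is still pending.
 */
static bool
walsender_can_exit(bool caught_up, XLogRecPtr sent_ptr,
                   XLogRecPtr replicated_ptr, bool send_pending)
{
    return caught_up && sent_ptr == replicated_ptr && !send_pending;
}
```

With the two_phase values (sent 0x15fac50, replicated 0x15faa70) the
helper returns false however often it is called, which matches the
hang; with the values from the run without two_phase (both 0x15fae10)
it returns true.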