On Mon, Mar 8, 2021 at 6:25 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Mon, Mar 8, 2021 at 4:20 PM vignesh C <vignes...@gmail.com> wrote:
> >
> > On Mon, Mar 8, 2021 at 11:30 AM Ajin Cherian <itsa...@gmail.com> wrote:
> > >
> > > On Fri, Mar 5, 2021 at 9:25 PM vignesh C <vignes...@gmail.com> wrote:
> > >
> > >
> > > Created new patch v53:
> >
> > Thanks for the updated patch.
> > I had noticed one issue, publisher does not get stopped normally in
> > the following case:
> > # Publisher steps
> > psql -d postgres -c "CREATE TABLE do_write(id serial primary key);"
> > psql -d postgres -c "INSERT INTO do_write VALUES(generate_series(1,10));"
> > psql -d postgres -c "CREATE PUBLICATION mypub FOR TABLE do_write;"
> >
> > # Subscriber steps
> > psql -d postgres -p 9999 -c "CREATE TABLE do_write(id serial primary key);"
> > psql -d postgres -p 9999 -c "INSERT INTO do_write VALUES(1);" # to cause a PK violation
> > psql -d postgres -p 9999 -c "CREATE SUBSCRIPTION mysub CONNECTION
> > 'host=localhost port=5432 dbname=postgres' PUBLICATION mypub WITH
> > (two_phase = true);"
> >
> > # prepare & commit prepared at publisher
> > psql -d postgres -c \
> > "begin; insert into do_write values (100); prepare transaction 'test1';"
> > psql -d postgres -c "commit prepared 'test1';"
> >
> > Stop publisher:
> > ./pg_ctl -D publisher stop
> > waiting for server to shut down............................................................... failed
> > pg_ctl: server does not shut down
> >
> > This is because the following process does not exit:
> > postgres: walsender vignesh 127.0.0.1(41550) START_REPLICATION
> >
> > It continuously loops at the below:
> >
>
> What happens if you don't set the two_phase option? If that also leads
> to the same error then can you please also check this case on the
> HEAD?

It succeeds without the two_phase option.
I have analyzed this issue further; the details are below.
The WalSndDone function contains the following code, which handles
walsender exit:
if (WalSndCaughtUp && sentPtr == replicatedPtr &&
    !pq_is_send_pending())
{
    QueryCompletion qc;

    /* Inform the standby that XLOG streaming is done */
    SetQueryCompletion(&qc, CMDTAG_COPY, 0);
    EndCommand(&qc, DestRemote, false);
    pq_flush();

    proc_exit(0);
}

But with the two_phase option, replicatedPtr and sentPtr never
become equal:
(gdb) p /x replicatedPtr
$8 = 0x15faa70
(gdb) p /x sentPtr
$10 = 0x15fac50

Whereas without the two_phase option, replicatedPtr and sentPtr
become equal and the walsender exits:
(gdb) p /x sentPtr
$7 = 0x15fae10
(gdb) p /x replicatedPtr
$8 = 0x15fae10

I think that with the two_phase option, replicatedPtr and sentPtr never
become equal, which is what causes the walsender process to hang.
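To make the failure mode concrete, here is a minimal C sketch (not
PostgreSQL code) that reduces the WalSndDone exit condition to a pure
predicate and feeds it the LSNs from the gdb sessions above. The names
walsender_may_exit and LsnSketch are hypothetical stand-ins; in the real
walsender, replicatedPtr comes from the position reported back by the
subscriber, and WalSndCaughtUp and pq_is_send_pending() are live state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for PostgreSQL's XLogRecPtr (a 64-bit WAL position). */
typedef uint64_t LsnSketch;

/*
 * Hypothetical helper mirroring the exit test in WalSndDone(): the
 * walsender may only exit once it has caught up, everything it sent
 * has been confirmed as replicated (sent == replicated), and no output
 * is still pending in the send buffer.
 */
bool
walsender_may_exit(bool caught_up, LsnSketch sent, LsnSketch replicated,
                   bool send_pending)
{
    return caught_up && sent == replicated && !send_pending;
}
```

With the two_phase values (sentPtr = 0x15fac50, replicatedPtr = 0x15faa70)
the predicate stays false on every iteration, so the loop never reaches
proc_exit(0); with the values from the run without two_phase (both
0x15fae10) it becomes true and the walsender exits.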

Regards,
Vignesh

