On Mon, Mar 8, 2021 at 8:09 PM vignesh C <vignes...@gmail.com> wrote:
>
> On Mon, Mar 8, 2021 at 6:25 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> >
>
> I think in case of two_phase option, replicatedPtr and sentPtr never
> becomes the same which causes this process to hang.
>

The reason is that you have created a situation on the subscriber
(a PK violation) where the initial tablesync cannot proceed. The apply
worker is then waiting for tablesync to complete, so it is not able to
process new messages. I think as soon as you remove the duplicate row
from the table, it will be able to proceed.

Now, we can see a similar situation even in HEAD without 2PC, though it
is a bit tricky to reproduce. Basically, when the tablesync worker is
in SUBREL_STATE_CATCHUP state and has a lot of WAL to process, the
apply worker just waits for it to finish applying all the WAL and
won't process any messages. So if you try to stop the publisher at
that time, you will see the same behavior. I simulated a lot of WAL
processing by manually debugging the tablesync worker and not
proceeding for some time. You can also try adding a sleep after the
tablesync worker has set the state to SUBREL_STATE_CATCHUP.
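
To illustrate the reasoning above, here is a minimal toy model. It is
not PostgreSQL code: the Model class, its fields, and the step counts
are all hypothetical, and the only thing it tries to show is that while
tablesync has not finished, the apply worker confirms nothing, so
replicatedPtr cannot reach sentPtr and the publisher keeps waiting.

```python
from dataclasses import dataclass


@dataclass
class Model:
    # All fields are illustrative assumptions, not PostgreSQL structures.
    tablesync_done: bool = False
    tablesync_pending: int = 5   # WAL records tablesync still has to apply
    sent_ptr: int = 10           # position the publisher has sent up to
    replicated_ptr: int = 0      # position the apply worker has confirmed


def step(m: Model, tablesync_can_progress: bool) -> None:
    """Advance the model by one scheduling step."""
    if not m.tablesync_done:
        # The apply worker is blocked here; only tablesync may advance.
        if tablesync_can_progress and m.tablesync_pending > 0:
            m.tablesync_pending -= 1
        if m.tablesync_pending == 0:
            m.tablesync_done = True
        return
    # Tablesync finished: the apply worker processes and confirms messages.
    if m.replicated_ptr < m.sent_ptr:
        m.replicated_ptr += 1


def publisher_can_stop(m: Model) -> bool:
    # The shutdown is modelled as waiting for both positions to match.
    return m.replicated_ptr == m.sent_ptr


def run(steps: int, tablesync_can_progress: bool) -> bool:
    """Run a fresh model for `steps` steps; report if the publisher can stop."""
    m = Model()
    for _ in range(steps):
        step(m, tablesync_can_progress)
    return publisher_can_stop(m)


# Tablesync stuck (e.g. PK violation): the publisher never gets to stop.
print("stuck tablesync:", run(100, False))
# Tablesync able to proceed: the apply worker eventually catches up.
print("healthy tablesync:", run(100, True))
```

The slow-catchup case corresponds to calling run() with too few steps:
tablesync is making progress, but the publisher still appears hung
until the apply worker gets its turn.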

So, I feel this is just expected behavior, and users need to manually
fix the situation where the tablesync worker is not able to proceed
due to a PK violation. Does this make sense?

-- 
With Regards,
Amit Kapila.

