On Tue, Feb 7, 2023 at 8:18 AM shiy.f...@fujitsu.com
<shiy.f...@fujitsu.com> wrote:
>
>
> On Thu, Feb 2, 2023 11:48 AM shveta malik <shveta.ma...@gmail.com> wrote:
> >
> >
> > So to fix this, I think either we update origin and slot entries in
> > the system catalog after the creation has passed or we clean-up the
> > system catalog in case of failure. What do you think?
> >
>
> I think the first way seems better.

Yes, I agree.

>
> I reproduced the problem I reported before with latest patch (v7-0001,
> v10-0002), and looked into this problem. It is caused by a similar reason. 
> Here
> is some analysis for the problem I reported [1].#6.
>
> First, a tablesync worker (worker-1) started for "tbl1", its originname is
> "pg_16398_1". And it exited because of unique constraint. In
> LogicalRepSyncTableStart(), originname in pg_subscription_rel is updated when
> updating table state to DATASYNC, and the origin is created when updating 
> table
> state to FINISHEDCOPY. So when it exited with state DATASYNC , the origin is 
> not
> created but the originname has been updated in pg_subscription_rel.
>
> Then a tablesync worker (worker-2) started for "tbl2", its originname is
> "pg_16398_2". After tablesync of "tbl2" finished, this worker moved to sync
> table "tbl1". In LogicalRepSyncTableStart(), it got the originname of "tbl1" -
> "pg_16398_1", by calling ReplicationOriginNameForLogicalRep(), and tried to 
> drop
> the origin (although it is not actually created before). After that, it called
> replorigin_by_name to get the originid whose name is "pg_16398_1" and the 
> result
> is InvalidOid. Origin won't be created in this case because the sync worker 
> has
> created a replication slot (when it synced tbl2), so the originid was still
> invalid and it caused an assertion failure when calling replorigin_advance().
>
> It seems we don't need to drop previous origin in worker-2 because the 
> previous
> origin was not created in worker-1. I think one way to fix it is to not update
> originname of pg_subscription_rel when setting state to DATASYNC, and only do
> that when setting state to FINISHEDCOPY. If so, the originname in
> pg_subscription_rel will be set at the same time the origin is created.

+1. Update of system-catalog needs to be done carefully and only when
origin is created.

thanks
Shveta


Reply via email to