RE: row filtering for logical replication

[email protected] Mon, 29 Nov 2021 17:39:37 -0800

On Mon, Nov 29, 2021 6:11 PM Amit Kapila <[email protected]> wrote:
> On Mon, Nov 29, 2021 at 12:10 PM Greg Nancarrow <[email protected]>
> wrote:
> >
> > On Fri, Nov 26, 2021 at 12:40 AM [email protected]
> > <[email protected]> wrote:
> > >
> > > When researching and writing a top-up patch about this.
> > > I found a possible issue which I'd like to confirm first.
> > >
> > > It's possible the table is published in two publications A and B,
> > > publication A only publish "insert" , publication B publish
> > > "update". When UPDATE, both row filter in A and B will be executed. Is 
> > > this
> behavior expected?
> > >
> > > For example:
> > > ---- Publication
> > > create table tbl1 (a int primary key, b int); create publication A
> > > for table tbl1 where (b<2) with(publish='insert'); create
> > > publication B for table tbl1 where (a>1) with(publish='update');
> > >
> > > ---- Subscription
> > > create table tbl1 (a int primary key); CREATE SUBSCRIPTION sub
> > > CONNECTION 'dbname=postgres host=localhost port=10000'
> PUBLICATION
> > > A,B;
> > >
> > > ---- Publication
> > > update tbl1 set a = 2;
> > >
> > > The publication can be created, and when UPDATE, the rowfilter in A
> > > (b<2) will also been executed but the column in it is not part of replica
> identity.
> > > (I am not against this behavior just confirm)
> > >
> >
> > There seems to be problems related to allowing the row filter to
> > include columns that are not part of the replica identity (in the case
> > of publish=insert).
> > In your example scenario, the tbl1 WHERE clause "(b < 2)" for
> > publication A, that publishes inserts only, causes a problem, because
> > column "b" is not part of the replica identity.
> > To see this, follow the simple example below:
> > (and note, for the Subscription, the provided tbl1 definition has an
> > error, it should also include the 2nd column "b int", same as in the
> > publisher)
> >
> > ---- Publisher:
> > INSERT INTO tbl1 VALUES (1,1);
> > UPDATE tbl1 SET a = 2;
> >
> > Prior to the UPDATE above:
> > On pub side, tbl1 contains (1,1).
> > On sub side, tbl1 contains (1,1)
> >
> > After the above UPDATE:
> > On pub side, tbl1 contains (2,1).
> > On sub side, tbl1 contains (1,1), (2,1)
> >
> > So the UPDATE on the pub side has resulted in an INSERT of (2,1) on
> > the sub side.
> >
> > This is because when (1,1) is UPDATEd to (2,1), it attempts to use the
> > "insert" filter "(b<2)" to determine whether the old value had been
> > inserted (published to subscriber), but finds there is no "b" value
> > (because it only uses RI cols for UPDATE) and so has to assume the old
> > tuple doesn't exist on the subscriber, hence the UPDATE ends up doing
> > an INSERT.
> > INow if the use of RI cols were enforced for the insert filter case,
> > we'd properly know the answer as to whether the old row value had been
> > published and it would have correctly performed an UPDATE instead of
> > an INSERT in this case.
> >
> 
> I don't think it is a good idea to combine the row-filter from the publication
> that publishes just 'insert' with the row-filter that publishes 'updates'. We
> shouldn't apply the 'insert' filter for 'update' and similarly for publication
> operations. We can combine the filters when the published operations are the
> same. So, this means that we might need to cache multiple row-filters but I
> think that is better than having another restriction that publish operation
> 'insert'
> should also honor RI columns restriction.


Personally, I agreed that an UPDATE operation should only apply a row filter 
that
is part of a publication that has only UPDATE.

Best regards,
Hou zj

RE: row filtering for logical replication

Reply via email to