RE: Column Filtering in Logical Replication

[email protected] Wed, 09 Mar 2022 02:12:23 -0800

On Wednesday, March 9, 2022 6:04 PM Amit Kapila <[email protected]> wrote:
> On Mon, Mar 7, 2022 at 8:48 PM Tomas Vondra
> <[email protected]> wrote:
> >
> > On 3/4/22 11:42, Amit Kapila wrote:
> >
> > > *
> > > Fetching column filter info in tablesync.c is quite expensive. It
> > > seems to be using four round-trips to get the complete info whereas
> > > for row-filter we use just one round trip. I think we should try to
> > > get both row filter and column filter info in just one round trip.
> > >
> >
> > Maybe, but I really don't think this is an issue.
> >
> 
> I am not sure but it might matter for small tables. Leaving aside the
> performance issue, I think the current way will get the wrong column list in
> many cases: (a) The ALL TABLES IN SCHEMA case handling won't work for
> partitioned tables when the partitioned table is part of one schema and
> partition table is part of another schema. (b) The handling of partition
> tables in other cases will fetch incorrect lists as it tries to fetch the
> column list of all the partitions in the hierarchy.
> 
> One of my colleagues has even tested these cases both for column filters and
> row filters and we find the behavior of row filter is okay whereas for column
> filter it uses the wrong column list. We will share the tests and results
> with you in a later email. We are trying to unify the column filter queries
> with row filter to make their behavior the same and will share the findings
> once it is done. I hope if we are able to achieve this that we will reduce
> the chances of bugs in this area.
> 
> Note: I think the first two patches for tests are not required after commit
> ceb57afd3c.


Hi,

Here are some tests and results for the table sync queries of the
column filter patch and the row filter.

1) multiple publications that publish the schema of the parent table and the partition itself
----pub
create schema s1;
create table s1.t (a int, b int, c int) partition by range (a);
create table t_1 partition of s1.t for values from (1) to (10);
create publication pub1 for all tables in schema s1;
create publication pub2 for table t_1(b);

----sub
- prepare tables
CREATE SUBSCRIPTION sub CONNECTION 'port=10000 dbname=postgres'
  PUBLICATION pub1, pub2;

When doing table sync for 't_1', the column list will be (b). I think there
should be no filter, because table t_1 is also published via the ALL TABLES IN
SCHEMA publication.

For Row Filter, it will use no filter for this case.


2) one publication publishes both parent and child
----pub
create table t (a int, b int, c int) partition by range (a);
create table t_1 partition of t for values from (1) to (10)
       partition by range (a);
create table t_2 partition of t_1 for values from (1) to (10);

create publication pub2 for table t_1(a), t_2
  with (PUBLISH_VIA_PARTITION_ROOT);

----sub
- prepare tables
CREATE SUBSCRIPTION sub CONNECTION 'port=10000 dbname=postgres'
  PUBLICATION pub2;

When doing table sync for table 't_1', it has no column list. I think the
expected column list is (a).

For Row Filter, it will use the row filter of the topmost parent table (t_1) in
this case.


3) one publication publishes both parent and child, each with its own column list
----pub
create table t (a int, b int, c int) partition by range (a);
create table t_1 partition of t for values from (1) to (10)
       partition by range (a);
create table t_2 partition of t_1 for values from (1) to (10);

create publication pub2 for table t_1(a), t_2(b)
  with (PUBLISH_VIA_PARTITION_ROOT);

----sub
- prepare tables
CREATE SUBSCRIPTION sub CONNECTION 'port=10000 dbname=postgres'
  PUBLICATION pub2;

When doing table sync for table 't_1', the column list would be (a, b). I think
the expected column list is (a).

For Row Filter, it will use the row filter of the topmost parent table (t_1) in
this case.

Best regards,
Hou zj
