On Friday, January 14, 2022 7:52 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Wed, Jan 12, 2022 at 2:40 AM Justin Pryzby <pry...@telsasoft.com> wrote: > > > > Is there any coordination between the "column filter" patch and the "row > > filter" patch ? Are they both on track for PG15 ? Has anybody run them > > together ? > > > > The few things where I think we might need to define some common > behavior are as follows: >
I tried some cases about the points you mentions, which can be taken as reference. > 1. Replica Identity handling: Currently the column filter patch gives > an error during create/alter subscription if the specified column list > is invalid (Replica Identity columns are missing). It also gives an > error if the user tries to change the replica identity. However, it > doesn't deal with cases where the user drops and adds a different > primary key that has a different set of columns which can lead to > failure during apply on the subscriber. > An example for this scenario: -- publisher -- create table tbl(a int primary key, b int); create publication pub for table tbl(a); alter table tbl drop CONSTRAINT tbl_pkey; alter table tbl add primary key (b); -- subscriber -- create table tbl(a int, b int); create subscription sub connection 'port=5432 dbname=postgres' publication pub; -- publisher -- insert into tbl values (1,1); -- subscriber -- postgres=# select * from tbl; a | b ---+--- 1 | (1 row) update tbl set b=1 where a=1; alter table tbl add primary key (b); -- publisher -- delete from tbl; The subscriber reported the following error message and DELETE failed in subscriber. ERROR: publisher did not send replica identity column expected by the logical replication target relation "public.tbl" CONTEXT: processing remote data during "DELETE" for replication target relation "public.tbl" in transaction 723 at 2022-01-14 13:11:51.514261+08 -- subscriber postgres=# select * from tbl; a | b ---+--- 1 | 1 (1 row) > I think another issue w.r.t column filter patch is that even while > creating publication (even for 'insert' publications) it should check > that all primary key columns must be part of published columns, > otherwise, it can fail while applying on subscriber as it will try to > insert NULL for the primary key column. > For example: -- publisher -- create table tbl(a int primary key, b int); create publication pub for table tbl(a); alter table tbl drop CONSTRAINT tbl_pkey; alter table tbl add primary key (b); -- subscriber -- create table tbl(a int, b int primary key); create subscription sub connection 'port=5432 dbname=postgres' publication pub; -- publisher -- insert into tbl values (1,1); The subscriber reported the following error message and INSERT failed in subscriber. ERROR: null value in column "b" of relation "tbl" violates not-null constraint DETAIL: Failing row contains (1, null). -- subscriber -- postgres=# select * from tbl; a | b ---+--- (0 rows) > 2. Handling of partitioned tables vs. Replica Identity (RI): When > adding a partitioned table with a column list to the publication (with > publish_via_partition_root = false), we should check the Replica > Identity of all its leaf partition as the RI on the partition is the > one actually takes effect when publishing DML changes. We need to > check RI while attaching the partition as well, as the newly added > partitions will automatically become part of publication if the > partitioned table is part of the publication. If we don't do this the > later deletes/updates can fail. > Please see the following 3 cases about partition. Case1 (publish a parent table which has a partition table): ---------------------------- -- publisher -- create table parent (a int, b int) partition by range (a); create table child partition of parent default; create unique INDEX ON child (a,b); alter table child alter a set not null; alter table child alter b set not null; alter table child replica identity using INDEX child_a_b_idx; create publication pub for table parent(a) with(publish_via_partition_root=false); -- subscriber -- create table parent (a int, b int) partition by range (a); create table child partition of parent default; create subscription sub connection 'port=5432 dbname=postgres' publication pub; -- publisher -- insert into parent values (1,1); -- subscriber -- postgres=# select * from parent; a | b ---+--- 1 | (1 row) -- add RI in subscriber to avoid other errors update child set b=1 where a=1; create unique INDEX ON child (a,b); alter table child alter a set not null; alter table child alter b set not null; alter table child replica identity using INDEX child_a_b_idx; -- publisher -- delete from parent; The subscriber reported the following error message and DELETE failed in subscriber. ERROR: publisher did not send replica identity column expected by the logical replication target relation "public.child" CONTEXT: processing remote data during "DELETE" for replication target relation "public.child" in transaction 727 at 2022-01-14 20:29:46.50784+08 -- subscriber -- postgres=# select * from parent; a | b ---+--- 1 | 1 (1 row) Case2 (create publication for parent table, then alter table to attach partition): ---------------------------- -- publisher -- create table parent (a int, b int) partition by range (a); create table child (a int, b int); create unique INDEX ON child (a,b); alter table child alter a set not null; alter table child alter b set not null; alter table child replica identity using INDEX child_a_b_idx; create publication pub for table parent(a) with(publish_via_partition_root=false); alter table parent attach partition child default; insert into parent values (1,1); -- subscriber -- create table parent (a int, b int) partition by range (a); create table child partition of parent default; create subscription sub connection 'port=5432 dbname=postgres' publication pub; postgres=# select * from parent; a | b ---+--- 1 | (1 row) -- add RI in subscriber to avoid other errors update child set b=1 where a=1; create unique INDEX ON child (a,b); alter table child alter a set not null; alter table child alter b set not null; alter table child replica identity using INDEX child_a_b_idx; -- publisher -- delete from parent; The subscriber reported the following error message and DELETE failed in subscriber. ERROR: publisher did not send replica identity column expected by the logical replication target relation "public.child" CONTEXT: processing remote data during "DELETE" for replication target relation "public.child" in transaction 728 at 2022-01-14 20:42:16.483878+08 -- subscriber -- postgres=# select * from parent; a | b ---+--- 1 | 1 (1 row) Case3 (create publication for parent table, then using "create table partition of", and specify primary key when creating partition table): ---------------------------- -- publisher -- create table parent (a int, b int) partition by range (a); create publication pub for table parent(a) with(publish_via_partition_root=false); create table child partition of parent (primary key (a,b)) default; -- subscriber -- create table parent (a int, b int) partition by range (a); create table child partition of parent default; create subscription sub connection 'port=5432 dbname=postgres' publication pub; -- publisher -- insert into parent values (1,1); -- subscriber -- postgres=# select * from parent; a | b ---+--- 1 | (1 row) -- add PK in subscriber to avoid other errors update child set b=1 where a=1; alter table child add primary key (a,b); -- publisher -- delete from parent; The subscriber reported the following error message and DELETE failed in subscriber. ERROR: publisher did not send replica identity column expected by the logical replication target relation "public.child" CONTEXT: processing remote data during "DELETE" for replication target relation "public.child" in transaction 723 at 2022-01-14 20:45:33.622168+08 -- subscriber -- postgres=# select * from parent; a | b ---+--- 1 | 1 (1 row) > 3. Tablesync.c handling: Ideally, it would be good if we have a single > query to fetch both row filters and column filters but even if that is > not possible in the first version, the behavior should be same for > both queries w.r.t partitioned tables, For ALL Tables and For All > Tables In Schema cases. > > Currently, the column filter patch doesn't seem to respect For ALL > Tables and For All Tables In Schema cases, basically, it just copies > the columns it finds through some of the publications even if one of > the publications is defined as For All Tables. The row filter patch > ignores the row filters if one of the publications is defined as For > ALL Tables and For All Tables In Schema. > A case for the publications is defined as For ALL Tables and For All Tables In Schema: -- publisher -- create schema s1; create table s1.t1 (a int, b int); create publication p1 for table s1.t1 (a); create publication p2 for all tables; insert into s1.t1 values (1,1); -- subscriber -- create schema s1; create table s1.t1 (a int, b int); create subscription sub connection 'port=5432 dbname=postgres' publication p1, p2; postgres=# select * from s1.t1; a | b ---+--- 1 | (1 row) (I got the same result when p2 is specified as "FOR ALL TABLES IN SCHEMA s1") > For row filter patch, if the publication contains a partitioned table, > the publication parameter publish_via_partition_root determines if it > uses the partition row filter (if the parameter is false, the default) > or the root partitioned table row filter and this is taken care of > even during the initial tablesync. > > For column filter patch, if the publication contains a partitioned > table, it seems that it finds all columns that the tables in its > partition tree specified in the publications, whether > publish_via_partition_root is true or false. > Please see the following cases. Column filter ---------------------------------------- -- publisher -- create table parent (a int, b int) partition by range (a); create table child partition of parent default; create publication p1 for table parent (a) with(publish_via_partition_root=false); create publication p2 for table parent (a) with(publish_via_partition_root=true); insert into parent values (1,1); -- subscriber -- create table parent (a int, b int) partition by range (a); create table child partition of parent default; create subscription sub connection 'port=5432 dbname=postgres' publication p1; postgres=# select * from parent; -- column filter works when publish_via_partition_root=false a | b ---+--- 1 | (1 row) drop subscription sub; delete from parent; create subscription sub connection 'port=5432 dbname=postgres' publication p2; postgres=# select * from parent; -- column filter also works when publish_via_partition_root=true a | b ---+--- 1 | (1 row) Row filter ---------------------------------------- -- publisher -- create table parent (a int, b int) partition by range (a); create table child partition of parent default; create publication p1 for table parent where (a>10) with(publish_via_partition_root=false); create publication p2 for table parent where (a>10) with(publish_via_partition_root=true); insert into parent values (1,1); insert into parent values (11,11); -- subscriber create table parent (a int, b int) partition by range (a); create table child partition of parent default; create subscription sub connection 'port=5432 dbname=postgres' publication p1; postgres=# select * from parent; -- row filter doesn't work when publish_via_partition_root=false a | b ----+---- 1 | 1 11 | 11 (2 rows) drop subscription sub; delete from parent; create subscription sub connection 'port=5432 dbname=postgres' publication p2; postgres=# select * from parent; -- row filter works when publish_via_partition_root=true a | b ----+---- 11 | 11 (1 row) Regards, Tang