On Thur, Jul 14, 2022 at 12:46 PM Peter Smith <smithpb2...@gmail.com> wrote: > Here are some review comments for the v6 patch (HEAD only):
Thanks for your comments. > 1. Commit message > > If there are two publications that publish the parent table and the child > table > separately, and both specify the option PUBLISH_VIA_PARTITION_ROOT, > subscribing > to both publications from one subscription causes initial copy twice. What we > expect is to be copied only once. > > ~ > > I don’t think the parameter even works in uppercase, so maybe better to say: > PUBLISH_VIA_PARTITION_ROOT -> 'publish_via_partition_root' It seems that there are more places to use lowercase than uppercase, so improved it as suggested. > 2. > > What we expect is to be copied only once. > > SUGGESTION > It should only be copied once. > > ~~~ > > 3. > > To fix this, we extend the API of the function pg_get_publication_tables. > Now, the function pg_get_publication_tables could receive the publication > list. > And then, if we specify option viaroot, we could exclude the partitioned table > whose ancestor belongs to the publication list when getting the table > informations. > > ~ > > Don't you mean "partition table" instead of "partitioned table"? > > SUGGESTION (also reworded) > To fix this, the API function pg_get_publication_tables has been > extended to take a publication list. Now, when getting the table > information, if the publish_via_partition_root is true, the function > can exclude a partition table whose ancestor is also published by the > same publication list. Improved and fixed as suggested. > 4. src/backend/catalog/pg_publication.c - pg_get_publication_tables > > - publication = GetPublicationByName(pubname, false); > + arr = PG_GETARG_ARRAYTYPE_P(0); > + deconstruct_array(arr, TEXTOID, -1, false, TYPALIGN_INT, > + &elems, NULL, &nelems); > > Maybe should have some comment to describe that this function > parameter is now an array of publications names. Add the following comment: `/* Deconstruct the parameter into elements. */`. Also improved the comment above the function pg_get_publication_tables: `Returns information of tables in one or more publications.` --> `Returns information of the tables in the given publication array.` > 5. > > + /* get Oids of tables from each publication */ > > Uppercase comment Improved as suggested. > 6. > > + ArrayType *arr; > + Datum *elems; > + int nelems, > + i; > + Publication *publication; > + bool viaroot = false; > + List *pub_infos = NIL; > + ListCell *lc1, > + *lc2; > > The 'publication' should be declared only in the loop that uses it. > It's also not good that this is shadowing the same variable name in a > later declaration. Reverted changes to variable "publication" declarations. > 7. > > + * Publications support partitioned tables, although all changes > + * are replicated using leaf partition identity and schema, so we > + * only need those. > */ > + if (publication->alltables) > + current_tables = GetAllTablesPublicationRelations(publication->pubviaroot); > + else > + { > + List *relids, > + *schemarelids; > + > + relids = GetPublicationRelations(publication->oid, > + publication->pubviaroot ? > + PUBLICATION_PART_ROOT : > + PUBLICATION_PART_LEAF); > + schemarelids = GetAllSchemaPublicationRelations(publication->oid, > + publication->pubviaroot ? > + PUBLICATION_PART_ROOT : > + PUBLICATION_PART_LEAF); > + current_tables = list_concat(relids, schemarelids); > + } > > Somehow I was confused by this comment because it says you only need > the LEAF tables but then the subsequent code is getting ROOT relations > anyway... Can you clarify the comment some more? I think this is a slight mistake when publication parameter "publish_via_partition_root" was introduced before. I improved the comment to the following: ``` Publications support partitioned tables. If publish_via_partition_root is false, all changes are replicated using leaf partition identity and schema, so we only need those. Otherwise, If publish_via_partition_root is true, get the partitioned table itself. ``` > 8. > > + bool viaroot = false; > > I think that should have a comment something like: > /* At least one publication is using publish_via_partition_root */ Improved as suggested. > 9. > > + /* > + * Record the published table and the corresponding publication so > + * that we can get row filters and column list later. > + */ > + foreach(lc1, tables) > + { > + Oid relid = lfirst_oid(lc1); > + > + foreach(lc2, pub_infos) > + { > + pub_info *pubinfo = (pub_info *) lfirst(lc2); > + > + if (list_member_oid(pubinfo->table_list, relid)) > + { > + Oid *result = (Oid *) malloc(sizeof(Oid) * 2); > + > + result[0] = relid; > + result[1] = pubinfo->pubid; > + > + results = lappend(results, result); > + } > + } > } > > I felt a bit uneasy about the double-looping here. I wonder if these > 'results' could have been accumulated within the existing loop over > all publications. Then the results would need to be filtered to remove > the ones associated with removed partitions. Otherwise with 10000 > tables and also many publications this (current) double-looping seems > like it might be quite expensive. Improved as suggested. > 10. src/backend/commands/subscriptioncmds.c - fetch_table_list > > + if (check_columnlist && server_version >= 160000) > > This condition does not make much sense to me. Isn’t it effectively > same as saying > if (server_version >= 150000 && server_version >= 160000) > > ??? Fixed as suggested. > 11. > > + /* > + * Get the list of tables from publisher, the partitioned table whose > + * ancestor is also in this list should be ignored, otherwise the > + * initial date in the partitioned table would be replicated twice. > + */ > > 11.a > Isn't this comment all backwards? I think you mean to say "partition" > or "partition table" (not partitioned table) because partitions have > ancestors but partition-ED tables don't. > > > 11.b > "initial date" -> "initial data" Fixed as suggested. > 12. src/test/subscription/t/013_partition.pl > > -# Note: We create two separate tables, not a partitioned one, so that we can > -# easily identity through which relation were the changes replicated. > +# Note: We only create one table for the partition table (tab4) here. > +# Because we specify option PUBLISH_VIA_PARTITION_ROOT (see pub_all and > +# pub_lower_level above), all data should be replicated to the partition > table. > +# So we do not need to create table for the partitioned table. > > 12.a > AFAIK "tab4" is the *partitioned* table, not a partition. I think this > comment has all the "partitioned" and "partition" back-to-front. > > 12.b > Also please say “publish_via_partition_root" instead of > PUBLISH_VIA_PARTITION_ROOT Fixed as suggested. > 13. src/test/subscription/t/028_row_filter.pl > > @@ -723,8 +727,11 @@ is($result, qq(t|1), 'check replicated rows to > tab_rowfilter_toast'); > # - INSERT (16) YES, 16 > 15 > $result = > $node_subscriber->safe_psql('postgres', > - "SELECT a FROM tab_rowfilter_viaroot_part"); > -is($result, qq(16), 'check replicated rows to tab_rowfilter_viaroot_part'); > + "SELECT a FROM tab_rowfilter_viaroot_part ORDER BY 1"); > +is($result, qq(16 > +17), > + 'check replicated rows to tab_rowfilter_viaroot_part' > +); > > There is a comment above that code like: > # tab_rowfilter_viaroot_part filter is: (a > 15) > # - INSERT (14) NO, 14 < 15 > # - INSERT (15) NO, 15 = 15 > # - INSERT (16) YES, 16 > 15 > > I think should modify that comment to explain the new data this patch > inserts - e.g. NO for 13 and YES for 17... Improved as suggested. I also improved the patches for back-branch according to some of Peter's comments and added the back-branch patch for REL_13. In addition, in the patch (REL15_v6) I attached for REL15 in [1], I forgot to remove the modification to the function pg_get_publication_tables. I removed related modifications now (REL15_v7). Attach the new patches. [1] - https://www.postgresql.org/message-id/OS3PR01MB6275AFA91925615A4AA782D09EBD9%40OS3PR01MB6275.jpnprd01.prod.outlook.com Regards, Wang wei
HEAD_v7-0001-Fix-data-replicated-twice-when-specifying-publish.patch
Description: HEAD_v7-0001-Fix-data-replicated-twice-when-specifying-publish.patch
REL15_v7-0001-Fix-data-replicated-twice-when-specifying-publish_patch
Description: REL15_v7-0001-Fix-data-replicated-twice-when-specifying-publish_patch
REL14_v7-0001-Fix-data-replicated-twice-when-specifying-publish_patch
Description: REL14_v7-0001-Fix-data-replicated-twice-when-specifying-publish_patch
REL13_v7-0001-Fix-data-replicated-twice-when-specifying-publish_patch
Description: REL13_v7-0001-Fix-data-replicated-twice-when-specifying-publish_patch