On Mon, Nov 15, 2021 6:17 PM Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote: > On Mon, Nov 15, 2021 at 1:48 PM houzj.f...@fujitsu.com > <houzj.f...@fujitsu.com> wrote: > > > > Hi hackers, > > > > In another thread[1], we found the pg_get_publication_tables function > > will output duplicate partition relid when adding both child and > > parent table to the publication(pubviaroot = false). > > > > Example: > > create table tbl1 (a int) partition by range (a); create table > > tbl1_part1 partition of tbl1 for values from (1) to (10); create table > > tbl1_part2 partition of tbl1 for values from (10) to (20); create > > publication pub for table tbl1, tbl1_part1 with > > (publish_via_partition_root=false); > > > > select * from pg_get_publication_tables('pub'); relid > > ------- > > 16387 > > 16390 > > 16387 > > > > The reason of the duplicate output is that: > > > > pg_get_publication_tables invokes function GetPublicationRelations > internally. > > In GetPublicationRelations(), it will add the oid of partition 'tbl1_part1' > > twice. First time from extracting partitions from the specified parent > > table 'tbl1', second time from the explicitly specified partition > > 'tbl1_part1'. > > > > > > I am not sure is this behavior expected as it seems natural for > > pg_get_publication_tables to return duplicate-free relid list. OTOH, > > there seems no harm for the current behavior(duplicate output), it > > doesn't affect the initial sync and change replication when using logical > replication. > > > > Personally, I think it might be better to make the output of > > pg_get_publication_tables duplicate-free, because the change happened > > on each output relid will only be replicated once. So, it seems more > > consistent to output each relid only once. > > > > Thoughts ? > > > > (Attach a patch which make the output duplicate-free) > > > > [1] > > > https://www.postgresql.org/message-id/CAJcOf-eURu03QNmD%3D37Ptsxu > NW4nB > > GN3G_FdRMBx_tpkeyzDUw%40mail.gmail.com > > The users can always specify the distinct clause, see [1]. I don't see any > problem > with the existing way. If required you can specify in the view > pg_publication_tables documentation that "This view returns multiple rows for > the same relation, if the relation is child to a parent partition table and > if both > the parent and child are specified in the publication" or something similart.
Thanks for the response. Yes, I agreed, as I said there's no harm for the current behavior. If most people don't think It's worth changing the behavior, I am fine with that. > If at all, the pg_get_publication_tables() is supposed to give unique outputs, > let's fix it and it is a good idea to specify that the view > pg_publication_tables > returns unique rows even though child and parent partition tables are > specified > in the publication. > > I have fdw comments on the patch: > Let's avoid adding anything to the while > loop and use list_sort+list_deduplicate_oid (qsort(O(nlogn)+O(n)) > Thanks for the comments, I will address this if we finally decide to fix it. > > 2) Are you sure you want GetPublicationRelations to be returning the unique > relations? I'm just thinking why can't you just do > list_sort+list_deduplicate_oid > in the caller? I think this makes the function more restrictive. If required, > you > can pass a boolean flag (bool give_unique and specify in the function > comments and if passed true do list_sort+list_deduplicate_oid at the end of > GetPublicationRelations ). I change the GetPublicationRelations because I found no caller need the duplicate oid, and de-duplicate oid in GetPublicationRelations could reduce some code change. I think if we need the duplicate oid in the future, we can add the flag as you suggested. Best regards, Hou zj