RE: pg_get_publication_tables() output duplicate relid

[email protected] Mon, 15 Nov 2021 17:51:09 -0800

On Mon, Nov 15, 2021 6:17 PM Bharath Rupireddy 
<[email protected]> wrote:
> On Mon, Nov 15, 2021 at 1:48 PM [email protected]
> <[email protected]> wrote:
> >
> > Hi hackers,
> >
> > In another thread[1], we found the pg_get_publication_tables function
> > will output duplicate partition relid when adding both child and
> > parent table to the publication(pubviaroot = false).
> >
> > Example:
> > create table tbl1 (a int) partition by range (a); create table
> > tbl1_part1 partition of tbl1 for values from (1) to (10); create table
> > tbl1_part2 partition of tbl1 for values from (10) to (20); create
> > publication pub for table tbl1, tbl1_part1 with
> > (publish_via_partition_root=false);
> >
> > select * from pg_get_publication_tables('pub');  relid
> > -------
> >  16387
> >  16390
> >  16387
> >
> > The reason of the duplicate output is that:
> >
> > pg_get_publication_tables invokes function GetPublicationRelations
> internally.
> > In GetPublicationRelations(), it will add the oid of partition 'tbl1_part1'
> > twice. First time from extracting partitions from the specified parent
> > table 'tbl1', second time from the explicitly specified partition 
> > 'tbl1_part1'.
> >
> >
> > I am not sure is this behavior expected as it seems natural for
> > pg_get_publication_tables to return duplicate-free relid list. OTOH,
> > there seems no harm for the current behavior(duplicate output), it
> > doesn't affect the initial sync and change replication when using logical
> replication.
> >
> > Personally, I think it might be better to make the output of
> > pg_get_publication_tables duplicate-free, because the change happened
> > on each output relid will only be replicated once. So, it seems more
> > consistent to output each relid only once.
> >
> > Thoughts ?
> >
> > (Attach a patch which make the output duplicate-free)
> >
> > [1]
> >
> https://www.postgresql.org/message-id/CAJcOf-eURu03QNmD%3D37Ptsxu
> NW4nB
> > GN3G_FdRMBx_tpkeyzDUw%40mail.gmail.com
> 
> The users can always specify the distinct clause, see [1]. I don't see any 
> problem
> with the existing way. If required you can specify in the view
> pg_publication_tables documentation that "This view returns multiple rows for
> the same relation, if the relation is child to a parent partition table and 
> if both
> the parent and child are specified in the publication" or something similart.


Thanks for the response.
Yes, I agreed, as I said there's no harm for the current behavior. If most 
people don't think
It's worth changing the behavior, I am fine with that.

> If at all, the pg_get_publication_tables() is supposed to give unique outputs,
> let's fix it and it is a good idea to specify that the view 
> pg_publication_tables
> returns unique rows even though child and parent partition tables are 
> specified
> in the publication.
> 
> I have fdw comments on the patch:
> Let's avoid adding anything to the while
> loop and use list_sort+list_deduplicate_oid (qsort(O(nlogn)+O(n))
> 

Thanks for the comments, I will address this if we finally decide to fix it.

> 
> 2) Are you sure you want GetPublicationRelations to be returning the unique
> relations? I'm just thinking why can't you just do 
> list_sort+list_deduplicate_oid
> in the caller? I think this makes the function more restrictive. If required, 
> you
> can pass a boolean flag (bool give_unique and specify in the function
> comments and if passed true do list_sort+list_deduplicate_oid at the end of
> GetPublicationRelations ).

I change the GetPublicationRelations because I found no caller need the
duplicate oid, and de-duplicate oid in GetPublicationRelations could reduce
some code change. I think if we need the duplicate oid in the future, we can
add the flag as you suggested.

Best regards,
Hou zj

RE: pg_get_publication_tables() output duplicate relid

Reply via email to