On Thu, 1 Sept 2022 at 09:19, Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Wed, Aug 31, 2022 at 11:35 AM Peter Smith <smithpb2...@gmail.com> wrote: > > > > Here are my review comments for v43-0001. > > > > ====== > > > > 1. Commit message > > > > The initial copy phase has no way to know the origin of the row data, > > so if 'copy_data = true' in the step 4 below, log a warning to notify user > > that potentially non-local data might have been copied. > > > > 1a. > > "in the step 4" -> "in step 4" > > > > ~ > > > > 1b. > > "notify user" -> "notify the user" > > > > ====== > > > > 2. doc/src/sgml/ref/create_subscription.sgml > > > > + <para> > > + If the subscription is created with <literal>origin = none</literal> and > > + <literal>copy_data = true</literal>, it will check if the publisher has > > + subscribed to the same table from other publishers and, if so, log a > > + warning to notify the user to check the publisher tables. Before > > continuing > > + with other operations the user should check that publisher tables did > > not > > + have data with different origins to prevent data inconsistency issues > > on the > > + subscriber. > > + </para> > > > > I am not sure about that wording saying "to prevent data inconsistency > > issues" because I thought maybe is already too late because any > > unwanted origin data is already copied during the initial sync. I > > think the user can do it try to clean up any problems caused before > > things become even worse (??) > > > > ~~~ > > > > You asked for my thoughts (see [1] 6b): > > > > > There is nothing much a user can do in this case. Only option would be > > > to take a backup before the operation and restore it and then recreate > > > the replication setup. I was not sure if we should document these > > > steps as similar inconsistent data could get created without the > > > origin option . Thoughts? > > > > I think those cases are a bit different: > > > > - For a normal scenario (i.e. origin=ANY) then inconsistent data just > > refers to problems like maybe PK violations so I think you just get > > what you get and have to deal with the errors... > > > > - But when the user said origin=NONE they are specifically saying they > > do NOT want any non-local data, yet they still might end up getting > > data contrary to their wishes. So I think maybe there ought to be > > something documented to warn about this potentially happening. My > > first impression is that all this seems like a kind of terrible > > problem because if it happens then cleanup could be difficult - if > > that subscriber in turn was also a publisher then this bad data might > > have propagated all over the place already! Anyway, I guess all this > > could be mentioned in some (new ?) section of the > > logical-replication.sgml file (i.e. patch 0002). > > > > IDEA: I was wondering if the documentation should recommend (or maybe > > we can even enforce this) that when using origin=NONE the user should > > always create the subscription with enabled=FALSE. That way if there > > are some warnings about potential bad origin data then there it might > > be possible to take some action or change their mind *before* any > > unwanted data pollutes everything. > > > > ====== > > > > 3. src/backend/commands/subscriptioncmds.c - check_pub_table_subscribed > > > > + if (count == 0) > > + tablenames = makeStringInfo(); > > + else > > + appendStringInfoString(tablenames, ", "); > > + > > + appendStringInfo(tablenames, "%s.%s", nspname, relname); > > + count++; > > > > I think the quotes are not correct - each separate table name should be > > quoted. > > > > ~~~ > > > > 4. > > > > + /* > > + * There might be many tables present in this case, we will display a > > + * maximum of 100 tables followed by "..." to indicate there might be > > + * more tables. > > + */ > > + if (count == 100) > > + { > > + appendStringInfoString(tablenames, ", ..."); > > + break; > > + } > > > > 4a. > > IMO this code should be part of the previous if/else > > > > e.g. > > if (count == 0) ... makeStringInfo() > > else if (count > 100) ... append ellipsis "..." and break out of the loop > > else ... append table name to the message > > > > Doing it in this way means "..." definitely means there are more than > > 100 bad tables. > > > > ~ > > > > 4b. > > Changing like above (#4a) simplifies the comment because it removes > > doubts like "might be…" since you already know you found the 101st > > table. > > > > SUGGESTION (comment) > > The message displays a maximum of 100 tables having potentially > > non-local data. Any more than that will be indicated by "...". > > > > ~ > > > > Are we using this style of an error message anywhere else in the code? > If not, then I think we should avoid it here as well as it appears a > bit adhoc to me in the sense that why restrict at 100 tables. Can we > instead think of displaying the publications list in the message? So > errdetail would be something like: Subscribed publications \"%s\" has > been subscribed from some other publisher. Then we can probably give a > SQL query for a user to find tables that may data from multiple > origins especially if that is not complicated. Also, then we can > change the function name to check_publications_origin().
Modified as suggested. > Few other minor comments on v43-0001* > 1. > + ereport(WARNING, > + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("subscription %s requested origin=NONE but might copy data > that had a different origin.", > + subname), > > In error message, we don't use full stop at the end. Modified > 2. > + ereport(WARNING, > + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("subscription %s requested origin=NONE but might copy data > that had a different origin.", > + subname), > + errdetail("Subscribed table(s) \"%s\" has been subscribed from some > other publisher.", > + tablenames->data), > > It is better to use errdetail_plural here as there could be one or > more objects in this message? Modified the new errdetail message to errdetail_plural. Thanks for the comments, the v44 version attached at [1] has the changes for the same. [1] - https://www.postgresql.org/message-id/CALDaNm0NRJ1O1cYcZD%3Df7NgynozFprb7zpJSayFN5rcaS44G6Q%40mail.gmail.com Regards, Vignesh