On 7/25/22 4:54 AM, vignesh C wrote:
On Sun, Jul 24, 2022 at 10:21 PM Jonathan S. Katz <jk...@postgresql.org> wrote:On 7/22/22 12:47 AM, Amit Kapila wrote:On Fri, Jul 22, 2022 at 1:39 AM Jonathan S. Katz <jk...@postgresql.org> wrote:
BTW, do you have any opinion on the idea of the first remaining patch where we accomplish two things: a) Checks and throws an error if 'copy_data = on' and 'origin = none' but the publication tables were also replicated from other publishers. b) Adds 'force' value for copy_data parameter to allow copying in such a case. The primary reason for this patch is to avoid loops or duplicate data in the initial phase. We can't skip copying based on origin as we can do while replicating changes from WAL. So, we detect that the publisher already has data from some other node and doesn't allow replication unless the user uses the 'force' option for copy_data.In general, I agree with the patch; but I'm not sure why we are calling "copy_data = force" in this case and how it varies from "on". I understand the goal is to prevent the infinite loop, but is there some technical restriction why we can't set "origin = none, copy_data = on" and have this work (and apologies if I missed that upthread)?Let's take a simple case to understand why copy_data = force is required to replicate between two primaries for table t1 which has data as given below:
step4 - Node-2: Create Subscription sub2 Connection '<node-1 details>' Publication pub1_2 with (origin = none, copy_data=on); If we had allowed the create subscription to be successful with copy_data = on. After this the data will be something like this: Node-1: 1, 2, 3, 4, 5, 6, 7, 8 Node-2: 1, 2, 3, 4, 5, 6, 7, 8, 5, 6, 7, 8 So, you can see that data on Node-2 (5, 6, 7, 8) is duplicated. In case, table t1 has a unique key, it will lead to a unique key violation and replication won't proceed. To avoid this we will throw an error: ERROR: could not replicate table "public.t1" DETAIL: CREATE/ALTER SUBSCRIPTION with origin = none and copy_data = on is not allowed when the publisher has subscribed same table. HINT: Use CREATE/ALTER SUBSCRIPTION with copy_data = off/force.
Thanks for the example. I agree that it is fairly simple to reproduce.I understand that "copy_data = force" is meant to protect a user from hurting themself. I'm not convinced that this is the best way to do so.
For example today I can subscribe to multiple publications that write to the same table. If I have a primary key on that table, and two of the subscriptions try to write an identical ID, we conflict. We don't have any special flags or modes to guard against that from happening, though we do have documentation on conflicts and managing them.
AFAICT the same issue with "copy_data" also exists in the above scenario too, even without the "origin" attribute. However, I think this case is more noticeable for "origin=none" because we currently default "copy_data" to "true" and in this case data can be copied in two directions.
That said, this introduces a new restriction for this particular scenario that doesn't exist on other scenarios. Instead, I would advocate we document how to correctly set up the two-way replication scenario (which we have a draft!), document the warnings around the conflicts, perhaps include Vignesh's instructions on how to remediate a conflict on initial sync, and consider throwing a WARNING as you suggested.
Thoughts? Thanks, Jonathan
OpenPGP_signature
Description: OpenPGP digital signature