On 7/25/22 4:54 AM, vignesh C wrote:
On Sun, Jul 24, 2022 at 10:21 PM Jonathan S. Katz <jk...@postgresql.org> wrote:

On 7/22/22 12:47 AM, Amit Kapila wrote:
On Fri, Jul 22, 2022 at 1:39 AM Jonathan S. Katz <jk...@postgresql.org> wrote:

BTW, do you have any opinion on the idea of the first remaining patch
where we accomplish two things: a) Checks and throws an error if
'copy_data = on' and 'origin = none' but the publication tables were
also replicated from other publishers. b) Adds 'force' value for
copy_data parameter to allow copying in such a case. The primary
reason for this patch is to avoid loops or duplicate data in the
initial phase. We can't skip copying based on origin as we can do
while replicating changes from WAL. So, we detect that the publisher
already has data from some other node and doesn't allow replication
unless the user uses the 'force' option for copy_data.

In general, I agree with the patch; but I'm not sure why we are calling
"copy_data = force" in this case and how it varies from "on". I
understand the goal is to prevent the infinite loop, but is there some
technical restriction why we can't set "origin = none, copy_data = on"
and have this work (and apologies if I missed that upthread)?

Let's take a simple case to understand why copy_data = force is
required to replicate between two primaries for table t1 which has
data as given below:

step4 - Node-2:
Create Subscription sub2 Connection '<node-1 details>' Publication
pub1_2 with (origin = none, copy_data=on);
If we had allowed the create subscription to be successful with
copy_data = on. After this the data will be something like this:
Node-1:
1, 2, 3, 4, 5, 6, 7, 8

Node-2:
1, 2, 3, 4, 5, 6, 7, 8, 5, 6, 7, 8

So, you can see that data on Node-2 (5, 6, 7, 8) is duplicated. In
case, table t1 has a unique key, it will lead to a unique key
violation and replication won't proceed.

To avoid this we will throw an error:
ERROR:  could not replicate table "public.t1"
DETAIL:  CREATE/ALTER SUBSCRIPTION with origin = none and copy_data =
on is not allowed when the publisher has subscribed same table.
HINT:  Use CREATE/ALTER SUBSCRIPTION with copy_data = off/force.

Thanks for the example. I agree that it is fairly simple to reproduce.

I understand that "copy_data = force" is meant to protect a user from hurting themself. I'm not convinced that this is the best way to do so.

For example today I can subscribe to multiple publications that write to the same table. If I have a primary key on that table, and two of the subscriptions try to write an identical ID, we conflict. We don't have any special flags or modes to guard against that from happening, though we do have documentation on conflicts and managing them.

AFAICT the same issue with "copy_data" also exists in the above scenario too, even without the "origin" attribute. However, I think this case is more noticeable for "origin=none" because we currently default "copy_data" to "true" and in this case data can be copied in two directions.

That said, this introduces a new restriction for this particular scenario that doesn't exist on other scenarios. Instead, I would advocate we document how to correctly set up the two-way replication scenario (which we have a draft!), document the warnings around the conflicts, perhaps include Vignesh's instructions on how to remediate a conflict on initial sync, and consider throwing a WARNING as you suggested.

Thoughts?

Thanks,

Jonathan

Attachment: OpenPGP_signature
Description: OpenPGP digital signature

Reply via email to