On Fri, Nov 28, 2025 at 5:50 PM Amit Kapila <[email protected]> wrote:
>
> On Tue, Nov 18, 2025 at 3:40 PM shveta malik <[email protected]> wrote:
> >
> > On Thu, Nov 13, 2025 at 9:17 PM Dilip Kumar <[email protected]> wrote:
> > >
> > > On Thu, Nov 13, 2025 at 2:39 PM shveta malik <[email protected]> wrote:
> > > >
> > > > 3)
> > > > We also need to think about how we are going to display the info in
> > > > case of multiple_unique_conflicts, as there could be multiple local
> > > > and remote tuples conflicting for one single operation. Example:
> > > >
> > > > create table conf_tab (a int primary key, b int unique, c int unique);
> > > >
> > > > sub: insert into conf_tab values (2,2,2), (3,3,3), (4,4,4);
> > > >
> > > > pub: insert into conf_tab values (2,3,4);
> > > >
> > > > ERROR: conflict detected on relation "public.conf_tab":
> > > > conflict=multiple_unique_conflicts
> > > > DETAIL: Key already exists in unique index "conf_tab_pkey", modified
> > > > locally in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > > > Key (a)=(2); existing local row (2, 2, 2); remote row (2, 3, 4).
> > > > Key already exists in unique index "conf_tab_b_key", modified locally
> > > > in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > > > Key (b)=(3); existing local row (3, 3, 3); remote row (2, 3, 4).
> > > > Key already exists in unique index "conf_tab_c_key", modified locally
> > > > in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > > > Key (c)=(4); existing local row (4, 4, 4); remote row (2, 3, 4).
> > > > CONTEXT: processing remote data for replication origin "pg_16392"
> > > > during message type "INSERT" for replication target relation
> > > > "public.conf_tab" in transaction 781, finished at 0/017FDDA0
> > > >
> > > > Currently in clt, we have singular terms such as 'key_tuple',
> > > > 'local_tuple', 'remote_tuple'. Shall we have multiple rows inserted?
> > > > It does not look reasonable to insert multiple rows for a single
> > > > raised conflict. I will think more about this.
> > >
> > > Currently I am inserting multiple records into the conflict history
> > > table, the same way each tuple is logged, but I couldn't find any
> > > better way to do this.
> >
> > The biggest drawback of this approach is data bloat. The incoming data
> > row will be stored multiple times.
> >
> > > Another option is to use an array of tuples instead of a single
> > > tuple, but I am not sure; that might make things more complicated
> > > for any external tool to process.
> >
> > It’s arguable and hard to say what the correct behaviour should be.
> > I’m slightly leaning toward having a single row per conflict.
>
> Yeah, it is better to either have a single row per conflict or have
> two tables, conflict_history and conflict_history_details, to avoid
> the data bloat pointed out above. For example, the two-table approach
> could be:
>
> 1. The Header Table (Incoming Data)
> This stores the data that tried to be applied.
>
> CREATE TABLE conflict_header (
>     conflict_id SERIAL PRIMARY KEY,
>     source_tx_id VARCHAR(100),  -- Transaction ID from source
>     table_name VARCHAR(100),
>     operation CHAR(1),          -- 'I' for Insert
>     incoming_data JSONB,        -- Store the incoming row as JSON
>     ...
> );
>
> 2. The Detail Table (Existing Conflicting Data)
> This stores the actual rows currently in the database that caused the
> violations.
> CREATE TABLE conflict_details (
>     detail_id SERIAL PRIMARY KEY,
>     conflict_id INT REFERENCES conflict_header(conflict_id),
>     constraint_name VARCHAR(100),  -- or a key_tuple column
>     conflicting_row_data JSONB     -- The existing row in the DB
>                                    -- that blocked the insert
> );
>
> Please don't consider these exact columns; you can use something along
> the lines of what is proposed in the patch. This is just to show how
> the conflict data can be rearranged. Now, one argument against this is
> that users need a JOIN to query the data, but that is still better
> than bloating the table.
>
> The idea to store everything in a single table could instead be
> changed to have columns like:
>
>     violated_constraints TEXT[],  -- e.g., ['uk_email', 'uk_phone']
>     error_details JSONB           -- e.g., [{"const": "uk_email",
>                                   --         "val": "[email protected]"}, ...]
>
> If we want to store multiple conflicting tuples in a single column, we
> need to ensure it is queryable, e.g. via a JSONB column. The point in
> favour of a single JSONB column that combines multiple conflicting
> tuples is that we need this combination only for one kind of conflict.
>
> Both approaches have their pros and cons. I feel we should dig a bit
> deeper into both by laying out the details of each method and see what
> others think.
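Before getting to the frequency question, here is a rough sketch of how
each layout might be queried, just to make the trade-off concrete. The
table and column names, including conflict_history for the single-table
variant, are placeholders taken from the examples above, not from the
patch:

-- Two-table layout: one JOIN rebuilds the full conflict report.
SELECT h.conflict_id, h.table_name, h.incoming_data,
       d.constraint_name, d.conflicting_row_data
FROM conflict_header h
JOIN conflict_details d USING (conflict_id)
WHERE h.table_name = 'conf_tab';

-- Single-table layout: the combined columns stay queryable, e.g. find
-- every conflict that involved a particular unique index.
SELECT conflict_id, error_details
FROM conflict_history
WHERE violated_constraints @> ARRAY['conf_tab_b_key']
   OR error_details @> '[{"const": "conf_tab_b_key"}]'::jsonb;

So both designs remain queryable; the cost is an extra JOIN in one and
JSONB operators in the other.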
The specific scenario we are discussing, where a single row from the
publisher causes a conflict on multiple unique keys, with each of those
unique key violations conflicting with a different local row on the
subscriber, is very rare. IMHO this low-frequency scenario does not
justify overcomplicating the design with an array field or a
multi-level table. Consider the infrequency of the root causes:

- How often does a table have more than 3 to 4 unique keys?
- How frequently would each of these keys conflict with a different
  local row on the subscriber side?

If recording this occasional, synthetic conflict requires inserting two
or three rows instead of a single one, that is an acceptable trade-off
considering how rarely it occurs. Anyway, this is my opinion and I am
open to opinions from others.

--
Regards,
Dilip Kumar
Google
