On Thu, May 14, 2026 at 8:35 AM Nisha Moond <[email protected]> wrote: > > Hi hackers, > > While reviewing the issue reported at [1] and the proposed solutions > at [2], I noticed a related problem: false negative conflict detection > when a 'ReplOriginId' gets reused. > > In logical replication, conflict detection relies on the tuple’s > replication origin ('roident'). The problem is that if a subscription > is dropped and a new subscription later reuses the same origin ID, the > apply worker may incorrectly treat incoming changes as “its own” > changes and skip conflict detection. > > A simple example: > 1. Create subscription sub1 with 'roident = 1' > 2. Replicate some rows into table 't1' > 3. Drop 'sub1' > 4. Create another subscription 'sub2' > 5. `sub2` reuses 'roident = 1' > 6. New updates arrive for rows previously written by 'sub1' > At this point, conflict detection sees: > tuple_origin == current_origin > > and incorrectly assumes the row was written by the current > subscription instance, so no 'update_origin_differ' conflict is > raised.
I agree with the problem sattement. I will prioritize the review soon. > This may look harmless in this simple setup, but it becomes > problematic if the new subscription is connected to a different > publisher, because real conflicts can then be silently missed. > > I explored two possible approaches to solve this: > > Approach 1. Zero out old origin IDs in commit_ts data when dropping a > subscription > ---------------------- > - When a subscription is dropped and its replication origin becomes > free, scan all 'commit_ts' SLRU entries and replace that old origin ID > with 'InvalidRepOriginId (0)'. > - So rows previously written by the old subscription would no longer > appear to belong to any active replication origin. > - A new subscription reusing the same 'roident' will always conflict > with origin '0'. > > Pros: > - Fixes the stale-origin problem completely and may also help solve > the tablesync-origin issue discussed in [1] > - No additional checks needed during conflict detection > > Cons: > - Requires scanning the entire 'commit_ts' SLRU during DROP > SUBSCRIPTION, so it can become very expensive on large systems > - Not crash-safe currently(patch): > - if the server crashes midway, some entries may still contain the > old origin ID > - after restart, reused origins can again lead to missed conflicts > - Making this fully crash-safe would likely require WAL logging or > recovery-time reprocessing. > > Approach 2. Store replication origin creation time > ---------------------- > - Add a creation timestamp for each replication origin > - During conflict check: > if tuple_origin != current_origin > -> existing behavior > if tuple_origin == current_origin > -> compare tuple commit timestamp with origin creation time > if tuple_commit_ts <= origin_creation_time > -> treat as an origin reuse case and raise conflict > > Pros: > ------- > - No additional processing during DROP SUBSCRIPTION > - Lightweight runtime check (just one timestamp comparison) > - Naturally crash-safe since origin creation is WAL-logged already > > Cons: > - Requires a catalog schema change > - The <= comparison can produce false-positive conflicts for rows > committed at the exact same microsecond as origin creation > - May require additional handling for upgraded origins > > IMO, the second approach currently looks more practical because it > avoids the heavy SLRU scan and crash-recovery complexity. > > Attached: > - Patch for approach 1 > - Patch for approach 2 > - A TAP test reproducing the issue > > Note: The patches are manually tested for the reported issue, but not > yet tested for performance or additional edge cases. > > Feedback and suggestions are welcome. > > [1] > https://www.postgresql.org/message-id/CALDaNm3Y6Y4Mub6QC8fZKnNy5jZspELQYCoQF_FL2Zwzweu%3Dog%40mail.gmail.com > [2] > https://www.postgresql.org/message-id/CAA4eK1LxGXR7jOAKh0B8N362S-Q3b6GhBxxcV_HxUaicEPq5Cg%40mail.gmail.com > > -- > Thanks, > Nisha
