On Thu, Jun 13, 2024 at 11:41 AM Masahiko Sawada <sawada.m...@gmail.com> wrote:
>
> On Wed, Jun 5, 2024 at 3:32 PM Zhijie Hou (Fujitsu)
> <houzj.f...@fujitsu.com> wrote:
> >
> > Hi,
> >
> > This time at PGconf.dev[1], we had some discussions regarding this
> > project. The proposed approach is to split the work into two main
> > components. The first part focuses on conflict detection, which aims to
> > identify and report conflicts in logical replication. This feature will
> > enable users to monitor the unexpected conflicts that may occur. The
> > second part involves the actual conflict resolution. Here, we will provide
> > built-in resolutions for each conflict and allow user to choose which
> > resolution will be used for which conflict (as described in the initial
> > email of this thread).
>
> I agree with this direction that we focus on conflict detection (and
> logging) first and then develop conflict resolution on top of that.
>
> >
> > Of course, we are open to alternative ideas and suggestions, and the
> > strategy above can be changed based on ongoing discussions and feedback
> > received.
> >
> > Here is the patch of the first part work, which adds a new parameter
> > detect_conflict for CREATE and ALTER subscription commands. This new
> > parameter will decide if subscription will go for conflict detection. By
> > default, conflict detection will be off for a subscription.
> >
> > When conflict detection is enabled, additional logging is triggered in the
> > following conflict scenarios:
> >
> > * updating a row that was previously modified by another origin.
> > * The tuple to be updated is not found.
> > * The tuple to be deleted is not found.
> >
> > While there exist other conflict types in logical replication, such as an
> > incoming insert conflicting with an existing row due to a primary key or
> > unique index, these cases already result in constraint violation errors.
>
> What does detect_conflict being true actually mean to users? I
> understand that detect_conflict being true could introduce some
> overhead to detect conflicts. But in terms of conflict detection, even
> if detect_conflict is false, we detect some conflicts such as
> concurrent inserts with the same key. Once we introduce the complete
> conflict detection feature, I'm not sure there is a case where a user
> wants to detect only some particular types of conflict.
>

You are right that users would wish to detect the conflicts, and
probably the extra effort would only be in the 'update_differ' case,
where we need to consult the committs module, and we will only do that
when 'track_commit_timestamp' is true. BTW, I think for inserts with a
primary/unique key violation, we should catch the ERROR and log it. If
we want to log the conflicts in a separate table, do we want to do that
in the catch block after getting the pk violation, or do an extra scan
before the 'INSERT' to find the conflict? I think logging would incur
extra cost, especially if we want to log it in some table as you are
suggesting below, so that may need some option.

> > Therefore, additional conflict detection for these cases is currently
> > omitted to minimize potential overhead. However, the pre-detection for
> > conflict in these error cases is still essential to support automatic
> > conflict resolution in the future.
>
> I feel that we should log all types of conflict in a uniform way. For
> example, with detect_conflict being true, the update_differ conflict
> is reported as "conflict %s detected on relation "%s"", whereas
> concurrent inserts with the same key is reported as "duplicate key
> value violates unique constraint "%s"", which could confuse users.
> Ideally, I think that we log such conflict detection details (table
> name, column name, conflict type, etc) to somewhere (e.g. a table or
> server logs) so that the users can resolve them manually.
>

It is good to think about whether there is value in providing a
pg_conflicts_history kind of table, which will have details of the
conflicts that occurred, and which we can then extend to have
resolutions. I feel we can anyway LOG the conflicts by default. Whether
updating a separate table with conflicts should be done by default or
with a knob is a point to consider.

-- 
With Regards,
Amit Kapila.