On 6/13/24 7:28 AM, Amit Kapila wrote:

You are right that users would wish to detect the conflicts and
probably the extra effort would only be in the 'update_differ' case
where we need to consult committs module and that we will only do when
'track_commit_timestamp' is true. BTW, I think for Inserts with
primary/unique key violation, we should catch the ERROR and log it. If
we want to log the conflicts in a separate table then do we want to do
that in the catch block after getting pk violation or do an extra scan
before 'INSERT' to find the conflict? I think logging would incur extra
cost, especially if we want to LOG it in some table as you are
suggesting below, so that may need some option.

Therefore, additional conflict detection for these cases is currently
omitted to minimize potential overhead. However, the pre-detection for
conflict in these error cases is still essential to support automatic
conflict resolution in the future.

I feel that we should log all types of conflict in a uniform way. For
example, with detect_conflict being true, the update_differ conflict
is reported as "conflict %s detected on relation "%s"", whereas
concurrent inserts with the same key are reported as "duplicate key
value violates unique constraint "%s"", which could confuse users.
Ideally, I think that we log such conflict detection details (table
name, column name, conflict type, etc.) to somewhere (e.g. a table or
server logs) so that the users can resolve them manually.


It is good to think about whether there is value in providing a
pg_conflicts_history kind of table which will have details of
conflicts that occurred, and then we can extend it to have resolutions.
I feel we can anyway LOG the conflicts by default. Whether updating a
separate table with conflicts should be done by default or with a knob
is a point to consider.

+1 for logging conflicts uniformly, but I would +100 to exposing the log in a way that's easy for the user to query (whether it's a system view or a stat table). Arguably, I'd say that would be the most important feature to come out of this effort.

Setting aside how conflicts are resolved, users want to know exactly which row had a conflict, and users coming from other database systems that have dealt with these issues will have tooling to review and analyze conflicts when they occur. This data is typically stored in a queryable table, with data retained for N days. When you add in automatic conflict resolution, users then want to have a record of how the conflict was resolved, in case they need to manually update it.
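
To make the shape of such a table concrete, here is a rough sketch in Python, using SQLite purely as a stand-in. The table name, the columns, the 'keep_local' resolution label, and both helper functions are hypothetical; none of them come from PostgreSQL or from the patch under discussion.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical schema: these names are illustrative only.
SCHEMA = """
CREATE TABLE conflict_history (
    detected_at   TEXT NOT NULL,  -- ISO-8601 UTC timestamp, second precision
    relation      TEXT NOT NULL,  -- table on which the conflict occurred
    conflict_type TEXT NOT NULL,  -- e.g. 'update_differ'
    key_value     TEXT NOT NULL,  -- identifies the conflicting row
    resolution    TEXT            -- NULL until a resolution is recorded
)
"""


def record_conflict(conn, detected_at, relation, conflict_type, key_value,
                    resolution=None):
    """Store one conflict; resolution stays NULL if nothing resolved it."""
    conn.execute(
        "INSERT INTO conflict_history VALUES (?, ?, ?, ?, ?)",
        (detected_at.isoformat(timespec="seconds"), relation, conflict_type,
         key_value, resolution),
    )


def purge_older_than(conn, now, days):
    """Delete rows older than `days` days and return how many were removed."""
    cutoff = (now - timedelta(days=days)).isoformat(timespec="seconds")
    cur = conn.execute(
        "DELETE FROM conflict_history WHERE detected_at < ?", (cutoff,)
    )
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    now = datetime.now(timezone.utc)
    record_conflict(conn, now, "orders", "update_differ", "id=42", "keep_local")
    print(conn.execute("SELECT * FROM conflict_history").fetchall())
```

The timestamps are stored as uniformly formatted UTC strings so that a plain string comparison is enough for the retention cutoff. A real implementation would presumably use a proper timestamp column.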

Having this data in a table also gives the user the opportunity to understand conflict stats (e.g. conflict rates) and potentially identify portions of the application and other parts of the system to optimize. It also makes it easier to import into downstream systems that may perform further analysis on conflict resolution, or raise an alarm if the conflict rate exceeds a certain threshold.
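
As an illustration of the kind of downstream check I have in mind, a minimal Python sketch follows. It assumes the monitoring side already has per-relation counts of conflicts and of applied changes; the function names and the input shape are hypothetical.

```python
def conflict_rates(conflicts_by_relation, applied_by_relation):
    """Return conflicts per applied change for each relation.

    Both arguments are hypothetical dicts mapping relation name to a count.
    Relations with no applied changes are skipped to avoid dividing by zero.
    """
    rates = {}
    for relation, applied in applied_by_relation.items():
        if applied > 0:
            rates[relation] = conflicts_by_relation.get(relation, 0) / applied
    return rates


def relations_over_threshold(rates, threshold):
    """Return, sorted by name, the relations whose rate exceeds the threshold."""
    return sorted(rel for rel, rate in rates.items() if rate > threshold)


if __name__ == "__main__":
    rates = conflict_rates({"orders": 2, "users": 1},
                           {"orders": 100, "users": 1000})
    for rel in relations_over_threshold(rates, 0.01):
        print(f"conflict rate alarm: {rel} at {rates[rel]:.2%}")
```

The point is only that once conflicts are queryable, a check like this is a few lines in whatever monitoring system the user already runs.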

Thanks,

Jonathan

