> On Jan 23, 2026, at 07:20, Heikki Linnakangas <[email protected]> wrote:
>
> I had a look at recovery conflict signaling and a few things caught my eye.
> No functional changes, but some cleanups and readability improvements:
>
> Patch 0001: Remove useless errdetail_abort()
> --------------------------------------------
>
> The function is supposed to add DETAIL to errors when you are in an aborted
> transaction, if the transaction was aborted by a recovery conflict, like this:
>
> ERROR: current transaction is aborted, commands ignored until end of
> transaction block"
> DETAIL: Abort reason: recovery conflict
>
> But I don't see how to reach that. If a transaction is aborted by recovery
> conflict, you get a different error like this:
>
> ERROR: canceling statement due to conflict with recovery
> DETAIL: User was holding a relation lock for too long.
>
> The transaction abort clears the 'recoveryConflictPending' flag, so even if
> that happens in a transaction block, you don't get that "DETAIL: Abort
> reason: recovery conflict" in the subsequent errors.
>
> errdetail_abort() was introduced in commit a8ce974cdd. I suppose it was
> needed back then, but the signal handling has changed a lot since. Looking at
> that commit now, though, I don't really understand how it was reachable even
> back then. (Except with a race with an unrelated transaction abort, see
> commit message)
>
> Has anyone seen the "DETAIL: Abort reason: recovery conflict" in recent
> years, or ever? If not, let's rip it out.
>
I did a Google search and couldn't find any post reporting the message, which
seems to prove that message can be removed.
>
> 0002: Don't hint that you can reconnect when the database is dropped
> --------------------------------------------------------------------
>
> If you're connected to a database is being dropped, during recovery, you get
> an error like this:
>
> FATAL: terminating connection due to conflict with recovery
> DETAIL: User was connected to a database that must be dropped.
> HINT: In a moment you should be able to reconnect to the database and repeat
> your command.
>
> The hint seems misleading. The database is being dropped, you most likely can
> *not* reconnect to it. Let's remove it.
>
I like this change. Not only removing the misleading error message, the code is
also clearer now.
>
> 0003-0004: Separate RecoveryConflictReasons from procsignals
> -------------------------------------------------------------
>
> We're currently using different PROCSIG_* flags to indicate different kinds
> of recovery conflicts. We're also abusing the same flags in functions like
> LogRecoveryConflict, which isn't related to inter-process signaling. It seems
> better to have a separate enum for the recovery conflict reasons. With this
> patch, there's just a single PROCSIG_RECOVERY_CONFLICT to wake up a process
> on a recovery conflict, and the reason is communicated by setting a flag in a
> bitmask in PGPROC.
>
> I was inspired to do this in preparation of my project to replaces latches
> with "interrupts". By having just a single PROCSIG flag, we reduce the need
> for "interrupt bits" with that project. But it seems nicer on its own merits
> too.
>
>
> 0005: Refactor ProcessRecoveryConflictInterrupt for readability
> ---------------------------------------------------------------
>
> The function had a switch-statement with fallthrough through all the cases.
> It took me a while to understand how it works. Once I finally understood it,
> I refactored it to not rely on the fallthrough. I hope this makes it easier
> for others too.
>
> - Heikki
> <0001-Remove-useless-errdetail_abort.patch><0002-Don-t-hint-that-you-can-reconnect-when-the-database-.patch><0003-Use-ProcNumber-rather-than-pid-in-ReplicationSlot.patch><0004-Separate-RecoveryConflictReasons-from-procsignals.patch><0005-Refactor-ProcessRecoveryConflictInterrupt-for-readab.patch>
A few comments on 003-0005:
1 - 0003
```
ReplicationSlotAcquire(const char *name, bool nowait, bool error_if_invalid)
{
ReplicationSlot *s;
- int active_pid;
+ ProcNumber active_proc;
+ pid_t active_pid;
```
Active_pid is only used inside the "if (active_proc != MyProcNumber)” clause,
so it can be only defined within the “if” clause.
2 - 0003
```
if (MyBackendType == B_STARTUP)
- (void) SendProcSignal(active_pid,
-
PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT,
-
INVALID_PROC_NUMBER);
+ SendProcSignal(active_pid,
+
PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT,
+ active_proc);
```
Here active_proc!=INVALID_PROC_NUMBER, so this changes the original logic
without an explanation. Is the change intentional?
3 - 0004
```
+ * This is a bitmask of RecoveryConflictReasons
+ */
+ pg_atomic_uint32 pendingRecoveryConflicts;
```
I just feel this comment is a little bit confusing because
RecoveryConflictReasons is an enum. Maybe we can say something like: This is a
bitmask; each bit corresponds to one RecoveryConflictReason enum value.
4 - 0004
```
- (void) SendProcSignal(pid, sigmode,
vxid.procNumber);
+ (void) SendProcSignal(pid,
PROCSIG_RECOVERY_CONFLICT, vxid.procNumber);
```
Nit: Here (void) is retained for SendProcSignal, but in the place of commit 2
for 0003, (void) is deleted when calling SendProcSignal, is there any reason
for retaining this one and deleting that one?
5 - 0004
```
+ * doesn't check for deadlock direcly, because we want to kill one of
the
```
Typo: direcly -> directly
6 - 0005
```
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index a2fa98ee971..bbf2254ca67 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -179,11 +179,15 @@ static bool IsTransactionExitStmt(Node *parsetree);
static bool IsTransactionExitStmtList(List *pstmts);
static bool IsTransactionStmtList(List *pstmts);
static void drop_unnamed_stmt(void);
+static void ProcessRecoveryConflictInterrupts(void);
+static void ProcessRecoveryConflictInterrupt(RecoveryConflictReason reason);
+static void report_recovery_conflict(RecoveryConflictReason reason);
static void log_disconnections(int code, Datum arg);
static void enable_statement_timeout(void);
static void disable_statement_timeout(void);
+
/* ----------------------------------------------------------------
```
Nit: No need to add this empty line.
7 - 0005
```
+static void
+report_recovery_conflict(RecoveryConflictReason reason)
+{
+ bool fatal;
+ if (RECOVERY_CONFLICT_DATABASE)
+ {
```
I believe this should be if (reason == RECOVERY_CONFLICT_DATABASE).
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/