> On Jan 23, 2026, at 07:20, Heikki Linnakangas <[email protected]> wrote:
> 
> I had a look at recovery conflict signaling and a few things caught my eye. 
> No functional changes, but some cleanups and readability improvements:
> 
> Patch 0001: Remove useless errdetail_abort()
> --------------------------------------------
> 
> The function is supposed to add DETAIL to errors when you are in an aborted 
> transaction, if the transaction was aborted by a recovery conflict, like this:
> 
> ERROR:  current transaction is aborted, commands ignored until end of 
> transaction block"
> DETAIL:  Abort reason: recovery conflict
> 
> But I don't see how to reach that. If a transaction is aborted by recovery 
> conflict, you get a different error like this:
> 
> ERROR:  canceling statement due to conflict with recovery
> DETAIL:  User was holding a relation lock for too long.
> 
> The transaction abort clears the 'recoveryConflictPending' flag, so even if 
> that happens in a transaction block, you don't get that "DETAIL: Abort 
> reason: recovery conflict" in the subsequent errors.
> 
> errdetail_abort() was introduced in commit a8ce974cdd. I suppose it was 
> needed back then, but the signal handling has changed a lot since. Looking at 
> that commit now, though, I don't really understand how it was reachable even 
> back then. (Except with a race with an unrelated transaction abort, see 
> commit message)
> 
> Has anyone seen the "DETAIL:  Abort reason: recovery conflict" in recent 
> years, or ever? If not, let's rip it out.
> 

I did a Google search and couldn't find any post reporting the message, which 
seems to prove that message can be removed.

> 
> 0002: Don't hint that you can reconnect when the database is dropped
> --------------------------------------------------------------------
> 
> If you're connected to a database is being dropped, during recovery, you get 
> an error like this:
> 
> FATAL:  terminating connection due to conflict with recovery
> DETAIL:  User was connected to a database that must be dropped.
> HINT:  In a moment you should be able to reconnect to the database and repeat 
> your command.
> 
> The hint seems misleading. The database is being dropped, you most likely can 
> *not* reconnect to it. Let's remove it.
> 

I like this change. Not only removing the misleading error message, the code is 
also clearer now.

> 
> 0003-0004:  Separate RecoveryConflictReasons from procsignals
> -------------------------------------------------------------
> 
> We're currently using different PROCSIG_* flags to indicate different kinds 
> of recovery conflicts. We're also abusing the same flags in functions like 
> LogRecoveryConflict, which isn't related to inter-process signaling. It seems 
> better to have a separate enum for the recovery conflict reasons. With this 
> patch, there's just a single PROCSIG_RECOVERY_CONFLICT to wake up a process 
> on a recovery conflict, and the reason is communicated by setting a flag in a 
> bitmask in PGPROC.
> 
> I was inspired to do this in preparation of my project to replaces latches 
> with "interrupts". By having just a single PROCSIG flag, we reduce the need 
> for "interrupt bits" with that project. But it seems nicer on its own merits 
> too.
> 
> 
> 0005: Refactor ProcessRecoveryConflictInterrupt for readability
> ---------------------------------------------------------------
> 
> The function had a switch-statement with fallthrough through all the cases. 
> It took me a while to understand how it works. Once I finally understood it, 
> I refactored it to not rely on the fallthrough. I hope this makes it easier 
> for others too.
> 
> - Heikki
> <0001-Remove-useless-errdetail_abort.patch><0002-Don-t-hint-that-you-can-reconnect-when-the-database-.patch><0003-Use-ProcNumber-rather-than-pid-in-ReplicationSlot.patch><0004-Separate-RecoveryConflictReasons-from-procsignals.patch><0005-Refactor-ProcessRecoveryConflictInterrupt-for-readab.patch>

A few comments on 003-0005:

1 - 0003
```
 ReplicationSlotAcquire(const char *name, bool nowait, bool error_if_invalid)
 {
        ReplicationSlot *s;
-       int                     active_pid;
+       ProcNumber      active_proc;
+       pid_t           active_pid;
```

Active_pid is only used inside the "if (active_proc != MyProcNumber)” clause, 
so it can be only defined within the “if” clause.


2 - 0003
```
                                if (MyBackendType == B_STARTUP)
-                                       (void) SendProcSignal(active_pid,
-                                                                               
  PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT,
-                                                                               
  INVALID_PROC_NUMBER);
+                                       SendProcSignal(active_pid,
+                                                                  
PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT,
+                                                                  active_proc);
```

Here active_proc!=INVALID_PROC_NUMBER, so this changes the original logic 
without an explanation. Is the change intentional?

3 - 0004
```
+        * This is a bitmask of RecoveryConflictReasons
+        */
+       pg_atomic_uint32 pendingRecoveryConflicts;
```

I just feel this comment is a little bit confusing because 
RecoveryConflictReasons is an enum. Maybe we can say something like: This is a 
bitmask; each bit corresponds to one RecoveryConflictReason enum value.

4 - 0004
```
-                               (void) SendProcSignal(pid, sigmode, 
vxid.procNumber);
+                               (void) SendProcSignal(pid, 
PROCSIG_RECOVERY_CONFLICT, vxid.procNumber);
```

Nit: Here (void) is retained for SendProcSignal, but in the place of commit 2 
for 0003, (void) is deleted when calling SendProcSignal, is there any reason 
for retaining this one and deleting that one?

5 - 0004
```
+        * doesn't check for deadlock direcly, because we want to kill one of 
the
```

Typo: direcly -> directly

6 - 0005
```
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index a2fa98ee971..bbf2254ca67 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -179,11 +179,15 @@ static bool IsTransactionExitStmt(Node *parsetree);
 static bool IsTransactionExitStmtList(List *pstmts);
 static bool IsTransactionStmtList(List *pstmts);
 static void drop_unnamed_stmt(void);
+static void ProcessRecoveryConflictInterrupts(void);
+static void ProcessRecoveryConflictInterrupt(RecoveryConflictReason reason);
+static void report_recovery_conflict(RecoveryConflictReason reason);
 static void log_disconnections(int code, Datum arg);
 static void enable_statement_timeout(void);
 static void disable_statement_timeout(void);
 
 
+
 /* ----------------------------------------------------------------
```

Nit: No need to add this empty line.

7 - 0005
```
+static void
+report_recovery_conflict(RecoveryConflictReason reason)
+{
+       bool            fatal;
 
+       if (RECOVERY_CONFLICT_DATABASE)
+       {
```

I believe this should be if (reason == RECOVERY_CONFLICT_DATABASE).

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/






Reply via email to