On Thu, Aug 29, 2024 at 09:52:06PM +0300, Heikki Linnakangas wrote:
> Currently, if you configure a hot standby server with a smaller
> max_connections setting than the primary, the server refuses to start up:
>
> LOG:  entering standby mode
> FATAL:  recovery aborted because of insufficient parameter settings
> DETAIL:  max_connections = 10 is a lower setting than on the primary server,
> where its value was 100.
> happen anyway:
>
> 2024-08-29 21:44:32.634 EEST [668327] FATAL:  out of shared memory
> 2024-08-29 21:44:32.634 EEST [668327] HINT:  You might need to increase
> "max_locks_per_transaction".
> 2024-08-29 21:44:32.634 EEST [668327] CONTEXT:  WAL redo at 2/FD40FCC8 for
> Standby/LOCK: xid 996 db 5 rel 154045
> 2024-08-29 21:44:32.634 EEST [668327] WARNING:  you don't own a lock of type
> AccessExclusiveLock
> 2024-08-29 21:44:32.634 EEST [668327] LOG:  RecoveryLockHash contains entry
> for lock no longer recorded by lock manager: xid 996 database 5 relation
> 154045
> TRAP: failed Assert("false"), File: "../src/backend/storage/ipc/standby.c",
> Granted, if you restart the server, it will probably succeed because
> restarting the server will kill all the other queries that were holding
> locks. But yuck.

Agreed.

> So how to improve this? I see a few options:
>
> a) Downgrade the error at startup to a warning, and allow starting the
> standby with smaller settings in standby. At least with a smaller
> max_locks_per_transactions. The other settings also affect the size of
> known-assigned XIDs array, but if the CSN snapshots get committed, that will
> get fixed. In most cases there is enough lock memory anyway, and it will be
> fine. Just fix the assertion failure so that the error message is a little
> nicer.
>
> b) If you run out of lock space, kill running queries, and prevent new ones
> from starting. Track the locks in startup process' private memory until
> there is enough space in the lock manager, and then re-open for queries. In
> essence, go from hot standby mode to warm standby, until it's possible to go
> back to hot standby mode again.

Either seems fine. Having never encountered actual lock exhaustion from
this, I'd lean toward (a) for simplicity.

> Thoughts, better ideas?

I worry about future code assuming a MaxBackends-sized array suffices for
something. That could work almost all the time, breaking only when a standby
replays WAL from a server having a larger array. What could we do now to
catch that future mistake promptly? As a start, 027_stream_regress.pl could
use low settings on its standby.
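For example, once the hard startup check is relaxed per (a), something along
these lines before the standby first starts (untested sketch; the node
variable name and the particular values are guesses, not what the test uses
today):

    # Deliberately-small settings on the standby, so that replaying the
    # primary's WAL exercises the case where the standby's shared memory
    # sizing is below the primary's.
    $node_standby_1->append_conf('postgresql.conf', qq(
    max_connections = 15
    max_locks_per_transaction = 16
    ));

Any future code that sizes a shared structure from MaxBackends and assumes
the primary's workload fits in it should then fail in that test rather than
in the field.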