Hi,

While reading the shutdown code to write the smart shutdown patch for sync rep, I found a corner case where shutdown can get stuck indefinitely. This happens when the postmaster reaches the PM_WAIT_BACKENDS state before a walsender has marked itself as a WAL sender process for streaming WAL (i.e., before the walsender calls MarkPostmasterChildWalSender).

In this case, CountChildren(NORMAL) in PostmasterStateMachine() returns non-zero because a normal backend (i.e., the would-be walsender) is still running, so the postmaster leaves PostmasterStateMachine() while still in the PM_WAIT_BACKENDS state. Then the backend receives the START_REPLICATION command and declares itself a walsender, after which CountChildren(NORMAL) returns zero.
The problem is that this declaration doesn't trigger PostmasterStateMachine() at all. So even though there are no normal backends left, the postmaster never calls PostmasterStateMachine() again and cannot move out of the PM_WAIT_BACKENDS state.

I think this problem is harmless in practice since it is unlikely to happen often. But it can happen...

The simple fix is to change ServerLoop() so that it periodically calls PostmasterStateMachine() while a shutdown is in progress. I also considered changing PostmasterStateMachine() itself, but that looked complicated.

Thoughts?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center