On Sun, Aug 20, 2017 at 10:57 PM, Craig Ringer <cr...@2ndquadrant.com> wrote:
> I've noticed a possible bug / design limitation where shm_mq_wait_internal
> sleep in a latch wait forever, and the postmaster gets stuck waiting for the
> bgworker the wait is running in to exit.
>
> This happens when the shm_mq does not have an associated bgworker handle
> registered because the other end is not known at mq creation time or is a
> normal backend not a bgworker. So a BGW handle cannot be passed.

Well, right.  The reason the API lets you pass a bgworker handle is to
avoid the problem that happens if you don't pass a bgworker handle.
:-)

> shm_mq_wait_internal() will CHECK_FOR_INTERRUPTS() when its latch wait is
> interrupted by a SIGTERM. But it doesn't actually respond to SIGTERM in any
> way; it just merrily resets its latch and keeps looping.
>
> It will bail out correctly on SIGQUIT.

There are a couple of ways to avoid this in the existing code
structure.  One is to not use blocking operations.  If you pass nowait
as true then you can put the latch wait loop in the caller and inject
whatever logic you want.  Another approach is to use die() or
something else that sets ProcDiePending as your SIGTERM handler.

> The only ways I can see to fix this are:
>
> * Generalize SIGTERM handling across postgres, so there's a global
> "got_SIGTERM" global that shm_mq_wait_internal can test to break out of its
> loop, and every backend's signal handler must set it. Lots of churn.

Not a bad idea otherwise, though.

> * In a proc's signal handler, use globals set before entry and after exit
> from shm_mq operations to detect if we're currently in shm_mq and promote
> SIGTERM to SIGQUIT by sending a new signal to ourselves. Or set up state so
> CHECK_FOR_INTERRUPTS() will notice when the handler returns.

Sounds ugly.

> * Allow passing of a *bool that tests for SIGTERM, or a function pointer
> called on each iteration to test whether looping should continue, to be
> passed to shm_mq_attach. So if you can't supply a bgw handle, you supply
> that instead. Provide a shm_mq_set_handle equivalent for it too.

While this would work, I don't really see the need for it given the
availability of nonblocking operations.  See mq_putmessage() for an
example.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to