[HACKERS] Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)

Noah Misch Sun, 23 Jun 2013 11:49:23 -0700

On Sun, Jun 23, 2013 at 01:55:19PM +0900, MauMau wrote:
> From: "Robert Haas" <robertmh...@gmail.com>
>> On Fri, Jun 21, 2013 at 10:02 PM, MauMau <maumau...@gmail.com> wrote:
>>> I'm comfortable with 5 seconds.  We are talking about the interval  
>>> between
>>> sending SIGQUIT to the children and then sending SIGKILL to them.  In 
>>> most
>>> situations, the backends should terminate immediately.  However, as I 
>>> said a
>>> few months ago, ereport() call in quickdie() can deadlock 
>>> indefinitely. This
>>> is a PostgreSQL's bug.
>>
>> So let's fix that bug.  Then we don't need this.


[quoted part filtered to undo caps lock]
> There are two ways to fix that bug.  You are suggesting 1 of the 
> following two methods, aren't you?

I suspect Robert meant to prevent the deadlock by limiting quickdie() to
calling only async-signal-safe functions.  Implementing that looked grotty
when we last discussed it, though, and it would not help against a library or
trusted PL function suspending SIGQUIT.

> 1. (Robert-san's idea)
> Upon receipt of immediate shutdown request, postmaster sends SIGKILL to 
> its children.
>
> 2. (Tom-san's idea)
> Upon receipt of immediate shutdown request, postmaster first sends 
> SIGQUIT to its children, wait a while for them to terminate, then sends 
> SIGKILL to them.
>
>
> At first I proposed 1.  Then Tom san suggested 2 so that the server is as 
> friendly to the clients as now by notifying them of the server shutdown.  
> I was convinced by that idea and chose 2.
>
> Basically, I think both ideas are right.  They can solve the original  
> problem.

Agreed.

> However, re-considering the meaning of "immediate" shutdown, I feel 1 is 
> a bit better, because trying to do something in quickdie() or somewhere 
> does not match the idea of "immediate". We may not have to be friendly to 

Perhaps so, but more important than the definition of the word "immediate" is
what an immediate shutdown has long meant in PostgreSQL.  All this time it has
made some effort to send a message to the client.  Note also that the patch
under discussion affects both immediate shutdowns and the automatic reset in
response to a backend crash.  I think the latter is the more-important use
case, and the message is nice to have there.

> the clients at the immediate shutdown.  The code gets much simpler.  In  
> addition, it may be better that we similarly send SIGKILL in backend 
> crash (FatalError) case, eliminate the use of SIGQUIT and remove 
> quickdie() and other SIGQUIT handlers.

My take is that the client message has some value, and we shouldn't just
discard it to simplify the code slightly.  Finishing the shutdown quickly has
value, of course.  The relative importance of those things should guide the
choice of a timeout under method #2, but I don't see a rigorous way to draw
the line.  I feel five seconds is, if anything, too high.

What about using deadlock_timeout?  It constitutes a rough barometer on the
system's tolerance for failure detection latency, and we already overload it
by having it guide log_lock_waits.  The default of 1s makes sense to me for
this purpose, too.  We can always add a separate immediate_shutdown_timeout if
there's demand, but I don't expect such demand.  (If we did add such a GUC,
setting it to zero could be the way to request method 1.  If some folks
strongly desire method 1, that's probably the way to offer it.)

Thanks,
nm

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)

Reply via email to