Re: [HACKERS] bgworker crashed or not?

Robert Haas Mon, 03 Feb 2014 07:16:46 -0800

On Fri, Jan 31, 2014 at 12:44 PM, Antonin Houska
<[email protected]> wrote:
> In 9.3 I noticed that postmaster considers bgworker crashed (and
> therefore tries to restart it) even if it has exited with zero status code.
>
> I first thought about a patch like the one below, but then noticed that
> postmaster.c:bgworker_quickdie() signal handler exits with 0 too (when
> there's no success). Do we need my patch, my patch + <something for the
> handler> or no patch at all?


I think the word "crashed" here doesn't actually mean "crashed".  If a
worker dies with an exit status other than 0 or 1, we call
HandleChildCrash(), which effects a system-wide restart.  But if it
exits with code 0, we don't do that.  We just set HaveCrashedWorker so
that it gets restarted immediately, and that's it.  In other words,
exit(0) in a bgworker causes an immediate relaunch, and exit(1) causes
the restart interval to be respected, and exit(other) causes a
system-wide crash-and-restart cycle.
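
To make the three cases concrete, here is a minimal sketch of that
mapping. It is not the postmaster's actual code: the enum, its values,
and classify_worker_exit() are hypothetical names made up for
illustration, and the sketch only looks at the exit code, ignoring
signals and everything else the postmaster tracks.

```c
#include <assert.h>

/* Hypothetical names, for illustration only; not taken from postmaster.c. */
typedef enum
{
    WORKER_RESTART_IMMEDIATELY,     /* exit(0): relaunch right away */
    WORKER_RESTART_AFTER_INTERVAL,  /* exit(1): honor the restart interval */
    WORKER_SYSTEM_WIDE_RESTART      /* anything else: crash-and-restart cycle */
} WorkerExitAction;

/*
 * Map a bgworker's exit code to the reaction described above.
 * Assumes exit_code is a plain process exit code in the range 0..255.
 */
static WorkerExitAction
classify_worker_exit(int exit_code)
{
    assert(exit_code >= 0 && exit_code <= 255);

    switch (exit_code)
    {
        case 0:
            return WORKER_RESTART_IMMEDIATELY;
        case 1:
            return WORKER_RESTART_AFTER_INTERVAL;
        default:
            return WORKER_SYSTEM_WIDE_RESTART;
    }
}
```

So classify_worker_exit(0) yields WORKER_RESTART_IMMEDIATELY, which is
the case that gets reported as "crashed" even though nothing crashed.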

This is admittedly a weird API, and we've had some discussion of
whether to change it, but I don't know that we've reached any final
conclusion.  I'm tempted to propose exactly inverting the current
meaning of exit(0).  That is, make it mean "don't restart me, ever,
even if I have a restart interval configured" rather than "restart me
right away, even if I have a restart interval configured".  That way,
a background process that wants to run until it accomplishes some task
could be written to exit(1) on error and exit(0) on success, which
seems quite natural.
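
A rough sketch of how the inverted meaning might look, again with
made-up names (nothing here exists in the tree, and this is only one
way the idea could be spelled out): exit(0) means "done, never
restart", exit(1) still honors the restart interval, and any other code
still triggers the system-wide restart.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names sketching the proposal; not an existing API. */
typedef enum
{
    PROPOSED_NEVER_RESTART,           /* exit(0): task finished, stay down */
    PROPOSED_RESTART_AFTER_INTERVAL,  /* exit(1): retry after the interval */
    PROPOSED_SYSTEM_WIDE_RESTART      /* anything else: unchanged */
} ProposedExitAction;

/* Proposed mapping: only the meaning of exit code 0 changes. */
static ProposedExitAction
proposed_classify_worker_exit(int exit_code)
{
    assert(exit_code >= 0 && exit_code <= 255);

    switch (exit_code)
    {
        case 0:
            return PROPOSED_NEVER_RESTART;
        case 1:
            return PROPOSED_RESTART_AFTER_INTERVAL;
        default:
            return PROPOSED_SYSTEM_WIDE_RESTART;
    }
}

/*
 * Exit code a run-until-done worker would choose under the proposal:
 * 0 when its task succeeded, 1 when it hit an error and wants a retry.
 */
static int
one_shot_worker_exit_code(bool task_succeeded)
{
    return task_succeeded ? 0 : 1;
}
```

With this mapping a worker that finishes its task simply goes away,
and one that fails is retried after its configured restart interval.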

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

