Hi all There's a bug in the dynamic bgworkers code that I think needs fixing before release. TL;DR: BGW_NO_RESTART workers are restarted after postmaster crash, attached patch changes that.
The case that's triggering the issue is where a static bgworker is registering a new dynamic bgworker to do some setup work each time it starts. The static worker is relaunched on postmaster restart and registers a new BGW_NO_RESTART dynamic bgworker. This dynamic bgworker immediately hits an Assert and dies. This *should* cause a postmaster restart loop; that's expected. What it shouldn't do, but is doing, is restart multiple copies of the bgworker - fail to purge the old BGW_NO_RESTART one and launch it as if it'd never crashed. The attached patch fixes the problem. Detail: If you set a worker as BGW_NO_RESTART it isn't restarted if it ERRORs out. That's fine. With Petr's applied patch it no longer restarts on exit 0 (normal exit) either. There's a third case, though: a bgworker crash causing postmaster restart. In this case the bgworker is still restarted, which makes no sense at all if it isn't for the other two cases. The existing code looks like it tries to protect against this - in maybe_start_bgworker, the invocation of do_start_bgworker is protected by a prior test for rw->rw_crashed_at that, if rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART, unregisters the worker and skips to the next one. However, in my testing a breakpoint inside the if (rw->rw_crashed_at != 0) test in maybe_start_bgworker is never hit, even for a bgworker that is known to have crashed. rw->rw_crashed_at is always zero. The culprit is ResetBackgroundWorkerCrashTimes, which unconditionally resets the crash time without considering that the worker might be BGW_NO_RESTART. The attached patch makes ResetBackgroundWorkerCrashTimes only reset the crashed time for workers with a restart time set. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
>From a48de97f593b17160487e67f953c8d2b25f9e98a Mon Sep 17 00:00:00 2001 From: Craig Ringer <cr...@2ndquadrant.com> Date: Fri, 16 May 2014 13:29:16 +0800 Subject: [PATCH] Don't restart BGW_NO_RESTART workers after postmaster crash ResetBackgroundWorkerCrashTimes was ignoring BGW_NO_RESTART and clearing the crash time of all bgworkers on postmaster restart so they'd be restarted as soon as the postmaster was up and ready. It should only do this for workers that are supposed to be restarted, as a BGW_NO_RESTART worker with no rw_crashed_at set is assumed to be newly registered. As a result, if a bgworker registered another BGW_NO_RESTART bgworker during postmaster restart we'd get an increasing number of duplicates of the BGW_NO_RESTART worker every time the postmaster restarted. --- src/backend/postmaster/bgworker.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c index 85a3b3a..4bbebb6 100644 --- a/src/backend/postmaster/bgworker.c +++ b/src/backend/postmaster/bgworker.c @@ -411,7 +411,8 @@ ResetBackgroundWorkerCrashTimes(void) RegisteredBgWorker *rw; rw = slist_container(RegisteredBgWorker, rw_lnode, iter.cur); - rw->rw_crashed_at = 0; + if (rw->rw_worker.bgw_restart_time != BGW_NEVER_RESTART) + rw->rw_crashed_at = 0; } } -- 1.9.0
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers