On Sun, Feb 19, 2017 at 5:54 PM, Robert Haas <robertmh...@gmail.com> wrote: > On Sun, Feb 19, 2017 at 2:17 PM, Robert Haas <robertmh...@gmail.com> wrote: >> Such a change can be made, but as I pointed out in the part you didn't >> quote, there are reasons to wonder whether that will be a constructive >> change in real life even if it's better for the regression tests. >> Optimizing PostgreSQL for the use case of running regression tests in >> the buildfarm at the expense of other use cases wouldn't be very >> smart. Maybe such a change is better in real-world applications too, >> but that deserves at least a little bit of thought and substantive >> discussion. > > Rewind. Wait a minute. Looking at this code again, it looks like > we're supposed to ALREADY BE DOING THIS. > > DestroyParallelContext() calls WaitForParallelWorkersToExit() which > calls WaitForBackgroundWorkerShutdown() for each worker. That > function returns only when the postmaster dies (which causes an error > with that specific complaint) or when GetBackgroundWorkerPid() sets > the status to BGWH_STOPPED. GetBackgroundWorkerPid() only returns > BGWH_STOPPED when either (a) handle->generation != slot->generation > (meaning that the slot got reused, and therefore must have been freed) > or when (b) slot->pid == 0. The pid only gets set to 0 in > BackgroundWorkerStateChange() when slot->terminate is set, or in > ReportBackgroundWorkerPID() when it's called from > CleanupBackgroundWorker. So this function should not be returning > until after all workers have actually exited. >
Yeah, I have also noticed this point and was thinking of the way to close this gap. > However, it looks like there's a race condition here, because the slot > doesn't get freed up at the same time that the PID gets set to 0. > That actually happens later, when the postmaster calls > maybe_start_bgworker() or DetermineSleepTime() and one of those > functions calls ForgetBackgroundWorker(). We could tighten this up by > changing CleanupBackgroundWorker() to also call > ForgetBackgroundWorker() immediately after calling > ReportBackgroundWorker() if rw->rw_terminate || > rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART. If we do that > BEFORE sending the notification to the starting process, that closes > this hole. Almost. > To close the remaining gap, don't you think we can check slot->in_use flag when generation number for handle and slot are same. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers