On Sun, Feb 19, 2017 at 5:54 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Sun, Feb 19, 2017 at 2:17 PM, Robert Haas <robertmh...@gmail.com> wrote:
>> Such a change can be made, but as I pointed out in the part you didn't
>> quote, there are reasons to wonder whether that will be a constructive
>> change in real life even if it's better for the regression tests.
>> Optimizing PostgreSQL for the use case of running regression tests in
>> the buildfarm at the expense of other use cases wouldn't be very
>> smart.  Maybe such a change is better in real-world applications too,
>> but that deserves at least a little bit of thought and substantive
>> discussion.
> Rewind.  Wait a minute.  Looking at this code again, it looks like
> we're supposed to ALREADY BE DOING THIS.
> DestroyParallelContext() calls WaitForParallelWorkersToExit() which
> calls WaitForBackgroundWorkerShutdown() for each worker.  That
> function returns only when the postmaster dies (which causes an error
> with that specific complaint) or when GetBackgroundWorkerPid() sets
> the status to BGWH_STOPPED. GetBackgroundWorkerPid() only returns
> BGWH_STOPPED when either (a) handle->generation != slot->generation
> (meaning that the slot got reused, and therefore must have been freed)
> or when (b) slot->pid == 0.  The pid only gets set to 0 in
> BackgroundWorkerStateChange() when slot->terminate is set, or in
> ReportBackgroundWorkerPID() when it's called from
> CleanupBackgroundWorker.  So this function should not be returning
> until after all workers have actually exited.

Yeah, I had also noticed this and was thinking about a way to close
this gap.

> However, it looks like there's a race condition here, because the slot
> doesn't get freed up at the same time that the PID gets set to 0.
> That actually happens later, when the postmaster calls
> maybe_start_bgworker() or DetermineSleepTime() and one of those
> functions calls ForgetBackgroundWorker(). We could tighten this up by
> changing CleanupBackgroundWorker() to also call
> ForgetBackgroundWorker() immediately after calling
> ReportBackgroundWorkerPID() if rw->rw_terminate ||
> rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART.  If we do that
> BEFORE sending the notification to the starting process, that closes
> this hole.  Almost.
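
A minimal sketch of the ordering suggested in the quoted paragraph, using
simplified stand-in types rather than the real postmaster structures.
Every name prefixed with Pm/pm_ is hypothetical; only the condition
(terminate requested, or the worker is never restarted) and the order of
the three steps are taken from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for BGW_NEVER_RESTART; the value here is an assumption. */
#define PM_NEVER_RESTART (-1)

/* Simplified model of a shared-memory worker slot (hypothetical). */
typedef struct PmSlot
{
    bool     in_use;      /* false once the slot has been freed */
    int      pid;         /* 0 means "worker has exited" */
    unsigned generation;
} PmSlot;

/* Simplified model of the postmaster's private worker entry (hypothetical). */
typedef struct PmWorker
{
    PmSlot *slot;
    bool    terminate;    /* models rw->rw_terminate */
    int     restart_time; /* models rw->rw_worker.bgw_restart_time */
    bool    notified;     /* models "starting process has been signalled" */
} PmWorker;

/* Models ReportBackgroundWorkerPID(): publish the new PID in the slot. */
static void
pm_report_pid(PmWorker *rw, int pid)
{
    rw->slot->pid = pid;
}

/* Models ForgetBackgroundWorker(): free the slot for reuse. */
static void
pm_forget(PmWorker *rw)
{
    rw->slot->in_use = false;
}

/*
 * Models the proposed CleanupBackgroundWorker() ordering:
 *   1. report pid = 0,
 *   2. free the slot right away if the worker will never run again,
 *   3. only then notify the starting process.
 */
static void
pm_cleanup(PmWorker *rw)
{
    assert(rw != NULL && rw->slot != NULL);

    pm_report_pid(rw, 0);

    if (rw->terminate || rw->restart_time == PM_NEVER_RESTART)
        pm_forget(rw);

    rw->notified = true;
}
```

In this model a worker with a restart interval keeps its slot after
cleanup, while a terminated or never-restart worker has its slot freed
before the notification is sent.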

To close the remaining gap, don't you think we could check the
slot->in_use flag when the generation numbers for the handle and the
slot are the same?
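
Roughly, the check could look like the following self-contained model.
This is only a sketch, not the actual GetBackgroundWorkerPid() code: the
Be/be_ names, the simplified structs, and the check_in_use switch (there
only to compare the behaviour with and without the extra test) are all
made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for "PID not yet assigned"; the value is an assumption. */
#define BE_INVALID_PID (-1)

/* Models the BGWH_* status results discussed in the thread. */
typedef enum
{
    BE_STARTED,
    BE_NOT_YET_STARTED,
    BE_STOPPED
} BeStatus;

/* Simplified model of a shared-memory worker slot (hypothetical). */
typedef struct BeSlot
{
    bool     in_use;
    int      pid;
    unsigned generation;
} BeSlot;

/* Simplified model of a worker handle held by the backend (hypothetical). */
typedef struct BeHandle
{
    unsigned generation;
} BeHandle;

/*
 * Decide the worker's status from the slot contents.
 *
 * (a) generation mismatch: the slot was reused, so it must have been freed;
 * (b) same generation but slot no longer in use: the proposed extra check;
 * otherwise fall back to the PID stored in the slot.
 */
static BeStatus
be_get_status(const BeHandle *handle, const BeSlot *slot, bool check_in_use)
{
    int pid;

    assert(handle != NULL && slot != NULL);

    if (handle->generation != slot->generation)
        pid = 0;
    else if (check_in_use && !slot->in_use)
        pid = 0;
    else
        pid = slot->pid;

    if (pid == 0)
        return BE_STOPPED;
    if (pid == BE_INVALID_PID)
        return BE_NOT_YET_STARTED;
    return BE_STARTED;
}
```

With the extra test, a slot that has been freed but still carries the
same generation and a nonzero PID is reported as stopped; without it,
the same slot would still look like a running worker in this model.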

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)