On Fri, Oct 23, 2015 at 10:33 AM, Robert Haas <[email protected]> wrote: > > + /* > + * We can't finish transaction commit or abort until all of the > + * workers are dead. This means, in particular, that > we can't respond > + * to interrupts at this stage. > + */ > + HOLD_INTERRUPTS(); > + status = > WaitForBackgroundWorkerShutdown(pcxt->worker[i].bgwhandle); > + RESUME_INTERRUPTS(); > > These comments are correct when this code is called from > DestroyParallelContext(), but they're flat wrong when called from > ReinitializeParallelDSM(). I suggest moving the comment back to > DestroyParallelContext and following it with this: > > HOLD_INTERRUPTS(); > WaitForParallelWorkersToDie(); > RESUME_INTERRUPTS(); > > Then ditch the HOLD/RESUME interrupts in WaitForParallelWorkersToDie() itself. >
Changed as per suggestion.
> This hunk is a problem:
>
> case 'X': /* Terminate,
> indicating clean exit */
> {
> - pfree(pcxt->worker[i].bgwhandle);
> pfree(pcxt->worker[i].error_mqh);
> - pcxt->worker[i].bgwhandle = NULL;
> pcxt->worker[i].error_mqh = NULL;
> break;
> }
>
> If you do that on receipt of the 'X' message, then
> DestroyParallelContext() might SIGTERM a worker that has supposedly
> exited cleanly. That seems bad. I think maybe the solution is to
> make DestroyParallelContext() terminate the worker only if
> pcxt->worker[i].error_mqh != NULL.
Changed as per suggestion.
> So make error_mqh == NULL mean a
> clean loss of a worker: either we couldn't register it, or it exited
> cleanly. And bgwhandle == NULL would mean it's actually gone.
>
I think even if error_mqh is NULL, it not guarnteed that the worker has
exited, it ensures that clean worker shutdown is either in-progress or
done.
> It makes sense to have ExecShutdownGather and
> ExecShutdownGatherWorkers, but couldn't the former call the latter
> instead of duplicating the code?
>
makes sense, so changed accordingly.
> I think ReInitialize should be capitalized as Reinitialize throughout.
>
Changed as per suggestion.
> ExecParallelReInitializeTupleQueues is almost a cut-and-paste
> duplicate of ExecParallelSetupTupleQueues. Please refactor this to
> avoid duplication - e.g. change
> ExecParallelSetupTupleQueues(ParallelContext *pcxt) to take a second
> argument bool reinit. ExecParallelReInitializeTupleQueues can just do
> ExecParallelSetupTupleQueues(pxct, true).
>
Changed as per suggestion.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
parallel_seqscan_partialseqscan_v23.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
