Robert Haas <> writes:
> On Fri, Feb 17, 2017 at 9:57 PM, Amit Kapila <> wrote:
>>> That seems like a seriously broken design to me, first because it can make
>>> for a significant delay in the slots becoming available (which is what's
>>> evidently causing these regression failures), and second because it's
>>> simply bad design to load extra responsibilities onto the postmaster.
>>> Especially ones that involve touching shared memory.

> It's a little surprising to me that the delay we're seeing here is
> significant, because the death of a child should cause an immediate
> SIGCHLD, resulting in a call to reaper(), resulting in a call to
> waitpid().  Why's that not working?

I can't say for sure about gharial, but gaur/pademelon is a single-CPU
machine, and I have reason to believe that its scheduler discriminates
pretty hard against processes that have been deemed to be background
processes.  The reason it's not working is simply that the postmaster
doesn't get a chance to run until the active backend has already decided
that there are no free slots for the next parallel query (indeed, the
next several parallel queries).  Yeah, it got the signal, and eventually
it gets some actual cycles, but not soon enough.  On a sufficiently loaded
machine this could be expected to happen, at least occasionally, even if
you had lots of CPUs.  It's merely more readily reproducible on this
particular configuration.

Also, you're assuming that the parallel worker process gets enough more
cycles to exit once it's woken the parent backend with an "I'm done"
signal.  I wouldn't care to bet on that happening promptly either.
I have definitely seen HPUX do a process context swap *immediately*
when it sees a lower-priority process signal a higher-priority one,
and not give the first process any more cycles for a good long time.

>> There seem to be many reasons why exit of background workers is done
>> by postmaster like when they have to restart after a crash or if
>> someone terminates them (TerminateBackgroundWorker()) or if the
>> notification has to be sent at exit (bgw_notify_pid).  Moreover, there
>> can be background workers without shared memory access as well which
>> can't clear state from shared memory.  Another point to think is that
>> background workers contain user-supplied code, so not sure how good it
>> is to give them control of tinkering with shared data structures of
>> the backend.  Now, one can argue that for some of the cases where it
>> is possible background worker should cleanup shared memory state at
>> the exit, but I think it is not directly related to the parallel
>> query.  As far as the parallel query is concerned, it just needs to
>> wait for workers exit to ensure that no more operations can be
>> performed in workers after that point so that it can accumulate stats
>> and retrieve some fixed parallel state information.

> That's a pretty good list of reasons with which I agree.

It seems largely bogus IMO, except possibly the "no access to shared
memory" reason, which is irrelevant for the problem at hand; I see no very
probable reason why backends would ever be interested in launching workers
that they couldn't communicate with.  In particular, I'll buy the
"user-supplied code" one only if we rip out every possibility of executing
user-supplied C or plperlu or plpythonu code in standard backends.  There
is *no* good reason for treating parallel workers as less reliable than
standard backends; if anything, given the constraints on what we let them
do, they should be more reliable.

Basically, I think we need to fix things so that 
WaitForParallelWorkersToFinish doesn't return until the slot is free
and there are no impediments to launching another worker.  Anything
less is going to result in race conditions and intermittent misbehavior
on heavily-loaded machines.  Personally I think that it'd be a good idea
to get the postmaster out of the loop while we're at it, but the minimum
change is to wait for the slot to be marked free by the postmaster.

>>>> I think what we need to do
>>>> here is to move the test that needs workers to execute before other
>>>> parallel query tests where there is no such requirement.

>>> That's not fixing the problem, it's merely averting your eyes from
>>> the symptom.

>> I am not sure why you think so.

In other words, you are willing to accept a system in which we can never
be sure that any query after the first one in select_parallel actually ran
in parallel?  That seems silly on its face, and an enormous restriction on
what we'll actually be able to test.

                        regards, tom lane

Sent via pgsql-hackers mailing list (
To make changes to your subscription:

Reply via email to