Robert Haas <robertmh...@gmail.com> writes:
> On Fri, Feb 17, 2017 at 9:57 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>>> That seems like a seriously broken design to me, first because it can
>>> make for a significant delay in the slots becoming available (which is
>>> what's evidently causing these regression failures), and second because
>>> it's simply bad design to load extra responsibilities onto the
>>> postmaster.  Especially ones that involve touching shared memory.
> It's a little surprising to me that the delay we're seeing here is
> significant, because the death of a child should cause an immediate
> SIGCHLD, resulting in a call to reaper(), resulting in a call to
> waitpid().  Why's that not working?

I can't say for sure about gharial, but gaur/pademelon is a single-CPU
machine, and I have reason to believe that its scheduler discriminates
pretty hard against processes that have been deemed to be background
processes.  The reason it's not working is simply that the postmaster
doesn't get a chance to run until the active backend has already decided
that there are no free slots for the next parallel query (indeed, the
next several parallel queries).  Yeah, it got the signal, and eventually
it gets some actual cycles, but not soon enough.

On a sufficiently loaded machine this could be expected to happen, at
least occasionally, even if you had lots of CPUs.  It's merely more
readily reproducible on this particular configuration.

Also, you're assuming that the parallel worker process gets enough more
cycles to exit once it's woken the parent backend with an "I'm done"
signal.  I wouldn't care to bet on that happening promptly either.  I
have definitely seen HPUX do a process context swap *immediately* when
it sees a lower-priority process signal a higher-priority one, and not
give the first process any more cycles for a good long time.

>> There seem to be many reasons why exit of background workers is done
>> by postmaster like when they have to restart after a crash or if
>> someone terminates them (TerminateBackgroundWorker()) or if the
>> notification has to be sent at exit (bgw_notify_pid).  Moreover, there
>> can be background workers without shared memory access as well which
>> can't clear state from shared memory.  Another point to think is that
>> background workers contain user-supplied code, so not sure how good it
>> is to give them control of tinkering with shared data structures of
>> the backend.
>> Now, one can argue that for some of the cases where it is possible
>> background worker should cleanup shared memory state at the exit, but
>> I think it is not directly related to the parallel query.  As far as
>> the parallel query is concerned, it just needs to wait for workers
>> exit to ensure that no more operations can be performed in workers
>> after that point so that it can accumulate stats and retrieve some
>> fixed parallel state information.

> That's a pretty good list of reasons with which I agree.

It seems largely bogus IMO, except possibly the "no access to shared
memory" reason, which is irrelevant for the problem at hand; I see no
very probable reason why backends would ever be interested in launching
workers that they couldn't communicate with.  In particular, I'll buy
the "user-supplied code" one only if we rip out every possibility of
executing user-supplied C or plperlu or plpythonu code in standard
backends.  There is *no* good reason for treating parallel workers as
less reliable than standard backends; if anything, given the constraints
on what we let them do, they should be more reliable.

Basically, I think we need to fix things so that
WaitForParallelWorkersToFinish doesn't return until the slot is free and
there are no impediments to launching another worker.  Anything less is
going to result in race conditions and intermittent misbehavior on
heavily-loaded machines.  Personally I think that it'd be a good idea to
get the postmaster out of the loop while we're at it, but the minimum
change is to wait for the slot to be marked free by the postmaster.

>>>> I think what we need to do here is to move the test that needs
>>>> workers to execute before other parallel query tests where there is
>>>> no such requirement.

>>> That's not fixing the problem, it's merely averting your eyes from
>>> the symptom.

>> I am not sure why you think so.
In other words, you are willing to accept a system in which we can never
be sure that any query after the first one in select_parallel actually
ran in parallel?  That seems silly on its face, and an enormous
restriction on what we'll actually be able to test.

			regards, tom lane


--
Sent via pgsql-hackers mailing list (email@example.com)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers