On Fri, Feb 17, 2017 at 9:57 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>> That seems like a seriously broken design to me, first because it can
>> make for a significant delay in the slots becoming available (which is
>> what's evidently causing these regression failures), and second
>> because it's simply bad design to load extra responsibilities onto the
>> postmaster.  Especially ones that involve touching shared memory.
It's a little surprising to me that the delay we're seeing here is
significant, because the death of a child should cause an immediate
SIGCHLD, resulting in a call to reaper(), resulting in a call to
waitpid().  Why's that not working?

>> I think this needs to be changed, and promptly.  Why in the world
>> don't you simply have the workers clearing their slots when they exit?
>
> There seem to be many reasons why exit of background workers is done
> by postmaster like when they have to restart after a crash or if
> someone terminates them (TerminateBackgroundWorker()) or if the
> notification has to be sent at exit (bgw_notify_pid).  Moreover, there
> can be background workers without shared memory access as well which
> can't clear state from shared memory.  Another point to think is that
> background workers contain user-supplied code, so not sure how good it
> is to give them control of tinkering with shared data structures of
> the backend.  Now, one can argue that for some of the cases where it
> is possible background worker should cleanup shared memory state at
> the exit, but I think it is not directly related to the parallel
> query.  As far as the parallel query is concerned, it just needs to
> wait for workers exit to ensure that no more operations can be
> performed in workers after that point so that it can accumulate stats
> and retrieve some fixed parallel state information.

That's a pretty good list of reasons with which I agree.  To underscore
one of Amit's points, bgw_notify_pid requires SIGUSR1 to be sent to the
process that registered the worker if that process is still running,
both when the process is started and when it terminates.  The workers
have no way of keeping track of whether the process that did the
registration is still running, and with bad luck might SIGUSR1 an
unrelated process (possibly killing it).
And those SIGUSR1s are absolutely necessary, because otherwise a
process that is waiting for a worker to start or stop would have to
poll, which would make the whole system more resource-intensive and a
whole lot less responsive.

There are a few more reasons Amit didn't mention:

1. Workers can fail during startup before they can possibly have done
enough work to be able to mark their slots as being free.  On Linux,
fork() can fail; on Windows, you can fail either to start a new process
or to have it reattach to the shared memory segment.  You could try to
have the postmaster catch just those cases where the worker doesn't get
far enough and have the worker do the others, but that's almost bound
to make things more complex and fragile.  I can't see us coming out
ahead there.

2. Worker slots don't get freed when the process terminates; they get
freed when the background worker is not running and will never again be
restarted.  So there's a subtle lifespan difference here: a background
worker slot is consumed before any process is started, and persists
across possibly many restarts of that process, and is finally freed
when the process is both not currently running and not ever going to be
restarted.  It would no doubt be a bit fragile if the worker tried to
guess whether or not the postmaster was going to restart it and free
the slot only if not -- what happens if the postmaster comes to a
different conclusion?

>>> I think what we need to do
>>> here is to move the test that needs workers to execute before other
>>> parallel query tests where there is no such requirement.
>>
>> That's not fixing the problem, it's merely averting your eyes from
>> the symptom.
>
> I am not sure why you think so.  Parallel query is capable of running
> without workers even if the number of planned workers are not
> available and there are many reasons for same.
> In general, I think
> the tests should not rely on the availability of background workers
> and if there is a test like that then it should be the responsibility
> of the test to ensure the same.

I agree with that, too.

One thing that I'm a little concerned about, though, is that users
could also see the kind of behavior Tom seems to be describing here and
might find it surprising.  For example, suppose your queries all use 4
workers and you have 4 workers configured.  If you're the only user on
the system and you run query after query after query, you expect that
all of those queries are going to get all 4 workers.  If the postmaster
is so unresponsive that you don't get all of them, or worse you don't
get any, you might be tempted to swear at the stupid PostgreSQL
developer who engineered this system.  (Hi, swearing people!)  I'd like
to understand whether this is something that happens once in a blue
moon on CLOBBER_CACHE_RECURSIVELY critters or whether it's a pretty
regular thing.  If it's a regular thing, that's kinda disappointing,
because it implies that the operating system feels free to take a
little holiday before delivering SIGCHLD, or something like that.

I guess one possible solution to this problem might be to change
ExecParallelFinish's call to WaitForParallelWorkersToFinish() to
instead call WaitForParallelWorkersToExit() -- and maybe just remove
the former, since at that point it would have no remaining callers and
no real use, if we believe that waiting for the slot to become free is
the right behavior.  The downside of this is that it increases the lag
to shut down a parallel query over what's basically a corner case.
Yeah, there might be some cases where not being able to reuse the
worker slots you just freed hurts, but is it worth making all of the
queries that don't care about that a little slower to cater to the
cases where you do care?  I'm not sure.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers