On Wed, Jan 24, 2018 at 5:31 PM, Thomas Munro <thomas.mu...@enterprisedb.com> wrote:
> Here's a version that works, and a minimal repro test module thing.
> Without 0003 applied, it hangs.

I can confirm that this version does in fact fix the problem with parallel CREATE INDEX hanging in the event of (simulated) worker fork() failure. And, it seems to have at least one tiny advantage over the other approaches I was talking about that you didn't mention, which is that we never have to wait until the leader stops participating as a worker before an error is raised. IOW, either the whole parallel CREATE INDEX operation throws an error at an early point in the CREATE INDEX, or the CREATE INDEX completely succeeds.

Obviously, the other, stated advantage is more relevant: *everyone* automatically doesn't have to worry about nworkers_launched being inaccurate this way, including code that gets away with this today only due to using a tuple queue, such as nodeGather.c, but may not always get away with it in the future.

I've run out of time to assess what you've done here in any real depth. For now, I will say that this approach seems interesting to me. I'll take a closer look tomorrow.

--
Peter Geoghegan