[HACKERS] parallel.c oblivion of worker-startup failures

Amit Kapila Sun, 17 Sep 2017 21:33:13 -0700

Sometime back Tom Lane has reported [1] about $Subject.  I have looked
into the issue and found that the problem is not only with parallel
workers but with general background worker machinery as well in
situations where fork or some such failure occurs. The first problem
is that after we register the dynamic worker, the way to know whether
the worker has started
(WaitForBackgroundWorkerStartup/GetBackgroundWorkerPid) won't give the
right answer if the fork failure happens.  Also, in cases where the
worker is marked not to start after the crash, postmaster doesn't
notify the backend if it is not able to start the worker which can
make the backend wait forever as it is oblivion of the fact that the
worker is not started.  Now, apart from these general problems of
background worker machinery, parallel.c assumes that after it has
registered the dynamic workers, they will start and perform their
work.  We need to ensure that in case, postmaster is not able to start
parallel workers due to fork failure or any similar situations,
backend doesn't keep on waiting forever.  To fix it, before waiting
for workers to finish, we can check whether the worker exists at all.
Attached patch fixes these problems.


Another way to fix the parallel query related problem is that after
registering the workers, the master backend should wait for workers to
start before setting up different queues for them to communicate.  I
think that can be quite expensive.

Thoughts?

[1] - https://www.postgresql.org/message-id/[email protected]

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

fix_worker_startup_failures_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] parallel.c oblivion of worker-startup failures

Reply via email to