I agree that blocking on shutdown by default isn't a good idea. Sooner or later a child will get stuck indefinitely on an unresponsive connection and hang the whole server. That behavior change is surprising; it should be reverted in master and definitely not backported.
As for block-timeout or block-timeout-kill: waiting more than zero seconds in server_close() should be optional, because you're right that the best timeout is circumstantial. Since ThreadingMixIn also leaks threads, server_close() could grow a timeout parameter (following the socket module's timeout convention) and perhaps a terminate boolean, and ThreadingMixIn could then be fixed too (a rough sketch of what that could look like is below the quoted message). I'm not sure how useful that would be, though, since I'd bet almost all users of socketserver exit the process shortly after server_close(), and it can't be backported to the branches in feature freeze anyway. This is getting complicated enough that putting the fix in test_socketserver.py is probably best. Another option is to add a private terminating-close flag to ForkingMixIn just for the tests. Is that good practice in the stdlib?

On Fri, Aug 11, 2017 at 6:46 AM Victor Stinner <victor.stin...@gmail.com> wrote:

> Hi,
>
> I'm working on reducing the failure rate of Python CIs (Travis CI,
> AppVeyor, buildbots). For that, I'm trying to reduce test side effects
> using "environment altered" warnings. This week, I worked on
> support.reap_children() which detects leaked child processes (usually
> created with os.fork()).
>
> I found a bug in the socketserver module: it waits for child process
> completion, but only in non-blocking mode. If a child process takes
> too long, the server will never read its exit status and so the
> server leaks "zombie processes". Leaking processes can increase the
> memory usage, spawning new processes can fail, etc.
>
> => http://bugs.python.org/issue31151
>
> I changed the code to call waitpid() in blocking mode on each child
> process in server_close(), to ensure that all children have completed
> when the server closes:
>
> https://github.com/python/cpython/commit/aa8ec34ad52bb3b274ce91169e1bc4a598655049
>
> After pushing my change, I'm not sure anymore if it's a good idea.
> There is a risk that server_close() blocks if a child is stuck on a
> socket recv() or send() for some reason.
>
> Should we relax the code by waiting a few seconds (problem: hardcoded
> timeouts are always a bad idea), or *terminate* processes (SIGKILL on
> UNIX) if they don't complete fast enough?
>
> I don't know which applications use socketserver. How can I test
> whether it breaks code in the wild?
>
> At least, I didn't notice any regression on Python CIs.
>
> Well, maybe the change is ok for the master branch. But I would like
> your opinion because now I would like to backport the fix to the 2.7
> and 3.6 branches. It might break some applications.
>
> If we *cannot* backport such a change to 2.7 and 3.6 because it
> changes the behaviour, I will fix the bug in test_socketserver.py
> instead.
>
> What do you think?
>
> Victor
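For concreteness, here is a rough sketch of what a timeout-plus-terminate server_close() could look like for ForkingMixIn. This is not actual socketserver code: the timeout and terminate parameters are hypothetical, and active_children just stands in for the set of child PIDs that ForkingMixIn already tracks.

import os
import signal
import time


class ForkingServerCloseSketch:
    # Stand-in for the set of forked child PIDs that ForkingMixIn keeps
    # in self.active_children; the rest of this class is hypothetical.
    active_children = None

    def server_close(self, timeout=None, terminate=False):
        # Follows the socket module timeout convention: None blocks until
        # every child has been reaped, 0 polls once, N waits up to N seconds.
        deadline = None if timeout is None else time.monotonic() + timeout
        while self.active_children:
            for pid in tuple(self.active_children):
                flags = 0 if deadline is None else os.WNOHANG
                reaped_pid, _status = os.waitpid(pid, flags)
                if reaped_pid == pid:
                    self.active_children.discard(pid)
            if not self.active_children:
                break
            if deadline is not None and time.monotonic() >= deadline:
                if terminate:
                    # Children that did not finish in time get SIGKILL and
                    # are reaped immediately, so no zombies are left behind.
                    for pid in tuple(self.active_children):
                        os.kill(pid, signal.SIGKILL)
                        os.waitpid(pid, 0)
                        self.active_children.discard(pid)
                break
            time.sleep(0.05)  # brief pause to avoid a busy polling loop

With timeout=None this matches the new blocking behaviour, timeout=0 is roughly the old poll-once behaviour, and something like timeout=5.0 with terminate=True is what the tests would probably want.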