[Python-Dev] socketserver ForkingMixin waiting for child processes
Hi, I'm working on reducing the failure rate of Python CIs (Travis CI, AppVeyor, buildbots). For that, I'm trying to reduce test side effects using "environment altered" warnings. This week, I worked on support.reap_children() which detects leaked child processes (usually created with os.fork()). I found a bug in the socketserver module: it waits for child processes completion, but only in non-blocking mode. If a child process takes too long, the server will never reads its exit status and so the server leaks "zombie processes". Leaking processes can increase the memory usage, spawning new processes can fail, etc. => http://bugs.python.org/issue31151 I changed the code to call waitpid() in blocking mode on each child process on server_close(), to ensure that all children completed when on server close: https://github.com/python/cpython/commit/aa8ec34ad52bb3b274ce91169e1bc4a598655049 After pushing my change, I'm not sure anymore if it's a good idea. There is a risk that server_close() blocks if a child is stuck on a socket recv() or send() for some reasons. Should we relax the code by waiting a few seconds (problem: hardcoded timeouts are always a bad idea), or *terminate* processes (SIGKILL on UNIX) if they don't complete fast enough? I don't know which applications use socketserver. How I can test if it breaks code in the wild? At least, I didn't notice any regression on Python CIs. Well, maybe the change is ok for the master branch. But I would like your opinion because now I would like to backport the fix to 2.7 and 3.6 branches. It might break some applications. If we *cannot* backport such change to 2.7 and 3.6 because it changes the behaviour, I will fix the bug in test_socketserver.py instead. What do you think? Victor ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] socketserver ForkingMixin waiting for child processes
Common pattern I've used is to wait a bit, then send a kill signal. M -- Matt Billenstein [email protected] Sent from my iPhone 6 (this put here so you know I have one) > On Aug 11, 2017, at 5:44 AM, Victor Stinner wrote: > > Hi, > > I'm working on reducing the failure rate of Python CIs (Travis CI, > AppVeyor, buildbots). For that, I'm trying to reduce test side effects > using "environment altered" warnings. This week, I worked on > support.reap_children() which detects leaked child processes (usually > created with os.fork()). > > I found a bug in the socketserver module: it waits for child processes > completion, but only in non-blocking mode. If a child process takes > too long, the server will never reads its exit status and so the > server leaks "zombie processes". Leaking processes can increase the > memory usage, spawning new processes can fail, etc. > > => http://bugs.python.org/issue31151 > > I changed the code to call waitpid() in blocking mode on each child > process on server_close(), to ensure that all children completed when > on server close: > > https://github.com/python/cpython/commit/aa8ec34ad52bb3b274ce91169e1bc4a598655049 > > After pushing my change, I'm not sure anymore if it's a good idea. > There is a risk that server_close() blocks if a child is stuck on a > socket recv() or send() for some reasons. > > Should we relax the code by waiting a few seconds (problem: hardcoded > timeouts are always a bad idea), or *terminate* processes (SIGKILL on > UNIX) if they don't complete fast enough? > > I don't know which applications use socketserver. How I can test if it > breaks code in the wild? > > At least, I didn't notice any regression on Python CIs. > > Well, maybe the change is ok for the master branch. But I would like > your opinion because now I would like to backport the fix to 2.7 and > 3.6 branches. It might break some applications. > > If we *cannot* backport such change to 2.7 and 3.6 because it changes > the behaviour, I will fix the bug in test_socketserver.py instead. > > What do you think? > > Victor > ___ > Python-Dev mailing list > [email protected] > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/matt%40vazor.com ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] Summary of Python tracker Issues
ACTIVITY SUMMARY (2017-08-04 - 2017-08-11) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open6116 (+16) closed 36820 (+46) total 42936 (+62) Open issues with patches: 2341 Issues opened (42) == #30855: [2.7] test_tk: test_use() of test_tkinter.test_widgets randoml http://bugs.python.org/issue30855 reopened by haypo #31122: SSLContext.wrap_socket() throws OSError with errno == 0 http://bugs.python.org/issue31122 opened by nikratio #31123: fnmatch does not follow Unix fnmatch functionality http://bugs.python.org/issue31123 opened by Varun Agrawal #31128: Allow pydoc to run with an arbitrary hostname http://bugs.python.org/issue31128 opened by feanil #31129: RawConfigParser.items() is unusual in taking arguments http://bugs.python.org/issue31129 opened by odd_bloke #31131: asyncio.wait_for() TimeoutError doesn't provide full traceback http://bugs.python.org/issue31131 opened by chris.jerdonek #31132: test_prlimit from test_resource fails when building python3 in http://bugs.python.org/issue31132 opened by cstratak #31137: Add a path attribute to NamedTemporaryFile http://bugs.python.org/issue31137 opened by wsanchez #31139: Improve tracemalloc demo code in docs http://bugs.python.org/issue31139 opened by louielu #31140: Insufficient error message with incorrect formated string lite http://bugs.python.org/issue31140 opened by chris.259263 #31141: Start should be a keyword argument of the built-in sum http://bugs.python.org/issue31141 opened by Mark.Bell #31142: python shell crashed while input quotes in print() http://bugs.python.org/issue31142 opened by py78py90py #31143: lib2to3 requires source files for fixes http://bugs.python.org/issue31143 opened by tschiller #31144: add initializer to concurrent.futures.ProcessPoolExecutor http://bugs.python.org/issue31144 opened by nvdv #31145: PriorityQueue.put() fails with TypeError if priority_number in http://bugs.python.org/issue31145 opened by MikoÅaj Babiak #31146: Docs: On non-public translations, language picker fallback to http://bugs.python.org/issue31146 opened by mdk #31147: a mostly useless check in list_extend() http://bugs.python.org/issue31147 opened by Oren Milman #31149: Add Japanese to the language switcher http://bugs.python.org/issue31149 opened by mdk #31151: test_socketserver: Warning -- reap_children() reaped child pro http://bugs.python.org/issue31151 opened by haypo #31153: Update docstrings of itertools functions http://bugs.python.org/issue31153 opened by serhiy.storchaka #31155: Encode set, frozenset, bytearray, and iterators as json arrays http://bugs.python.org/issue31155 opened by javenoneal #31158: test_pty: test_basic() fails randomly on Travis CI http://bugs.python.org/issue31158 opened by haypo #31159: Doc: Language switch can't switch on specific cases http://bugs.python.org/issue31159 opened by mdk #31161: Only check for print and exec parentheses cases for SyntaxErro http://bugs.python.org/issue31161 opened by mjpieters #31162: urllib.request.urlopen error http://bugs.python.org/issue31162 opened by [email protected] #31163: Return destination path in Path.rename and Path.replace http://bugs.python.org/issue31163 opened by albertogomcas #31164: test_functools: test_recursive_pickle() stack overflow on x86 http://bugs.python.org/issue31164 opened by haypo #31165: null pointer deref and segfault in list_slice (listobject.c:45 http://bugs.python.org/issue31165 opened by geeknik #31166: null pointer deref and segfault in _PyObject_Alloc (obmalloc.c http://bugs.python.org/issue31166 opened by geeknik #31169: Unknown-source assertion failure in multiprocessing/managers.p http://bugs.python.org/issue31169 opened by drallensmith #31171: multiprocessing.BoundedSemaphore of 32-bit python could not wo http://bugs.python.org/issue31171 opened by hongxu #31172: Py_Main() is totally broken on Visual Studio 2017 http://bugs.python.org/issue31172 opened by rutski #31174: test_tools leaks randomly references on x86 Gentoo Refleaks 3. http://bugs.python.org/issue31174 opened by haypo #31175: Exception while extracting file from ZIP with non-matching fil http://bugs.python.org/issue31175 opened by zyxtarmo #31176: Is a UDP transport also a ReadTransport/WriteTransport? http://bugs.python.org/issue31176 opened by twisteroid ambassador #31177: unittest mock's reset_mock throws an error when an attribute h http://bugs.python.org/issue31177 opened by hmvp #31178: [EASY] subprocess: TypeError: can't concat str to bytes, in _e http://bugs.python.org/issue31178 opened by haypo #31179: Speed-up dict.copy() up to 5.5 times. http://bugs.python.org/issue31179 opened by yselivanov #31180: test_multiprocessing_spawn hangs randomly on x86 Windows7 3.6 http://bugs.python.org/issue31180 opened by haypo #31181: Segfault
[Python-Dev] Pull requests now have labels specifying what stage they are at
I just pushed a change to Bedevere to help track what stage a pull request is in. The stages are: - awaiting review - awaiting core review - awaiting changes - awaiting change review - awaiting merge The "review" stage is for when a pull request has no reviews either approving or requesting changes (this only applies to PRs not by a core dev; all PRs created by a core dev are flagged as awaiting merge immediately). The "core review" stage is when a review has been done by one or more non-core folk, but no core devs. The idea with the distinction is to encourage people to help in doing initial reviewing of PRs who happen to not be core devs since triaging issues and reviewing pull requests is the biggest chunk of work in maintaining CPython. The "changes" stage occurs when a core dev requests changes to a PR (a non-core dev review will not trigger this). A comment will be left to this effect with instructions for the PR creator to leave a comment saying "I didn't expect the Spanish Inquisition" to signal when they believe they are done addressing the reviewer's comments (there is also appropriate comments to explain the quote along with an easter egg(s) :) . The "change review" label means that the PR creator left the Spanish Inquisition quote and so now the core dev(s) who requested changes should re-review the PR. There will be a comment mentioning the core devs to this effect so that they will get a proper GitHub notification and they can check the label to see if the PR creator believes they are ready for another round of review (compared to now where you have to hunt around in the PR to see if it's ready for another review). The "merge" label means the PR has received an approving review from a core dev and so it should be ready to go. Now when any state change happens to trigger these labels, any preexisting "awaiting" label will be cleared and the new one set. But removing or setting a label manually won't trigger an action, so in this instance you can override Bedevere manually until some later event occurs which triggers an automatic update. One thing I do want to call out is that it's now more important that core devs leave appropriate "approved" and "request changes" reviews through the GitHub UI instead of just leaving a comment when they approve or want changes to a PR, respectively. Not only is this necessary to automate this part of the stage labeling, but it also visibly exposes in the GitHub UI the review state of a PR. I'm not backfilling the open PRs as it's more work and I don't want to do said work. ;) But as PRs are opened, reviewed, etc., things should slowly trickle their way through the various PRs. I should mention this is the second-to-last "intrusive" workflow change I have planned to introduce some new aspect to the workflow (the last one being a status check for a news entry after every branch has been blurbified). After these two changes land, everything else I have planned is just touching things up (e.g. the CLA check becoming a status check instead of a label, touching up CONTRIBUTING.md, etc.). I think some other people have plans to improve things as well, but at least my TODO list of "radical" changes to our workflow is almost done! ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] socketserver ForkingMixin waiting for child processes
I agree that blocking shutdown by default isn't a good idea. A child will eventually get indefinitely stuck on a nonresponsive connection and hang the whole server. This behavior change is surprising and should be reverted in master, and definitely not backported. As for block-timeout or block-timeout-kill, waiting more than zero seconds in server_close() should be optional, because you're right that the best timeout is circumstantial. Since ThreadingMixIn also leaks threads, server_close() could grow a timeout flag (following the socket module timeout convention) and maybe a terminate boolean. ThreadingMixIn could then also be fixed. I'm not sure how useful that is though, since I'd bet almost all users of socketserver exit the process shortly after server_close(). Plus it can't be backported to the feature-freeze branches. It seems like this is getting complicated enough that putting the fix in test_socketserver.py is probably best. Another solution is to add a secret terminating-close flag to ForkingMixIn just for the tests. Is that good practice in the stdlib? On Fri, Aug 11, 2017 at 6:46 AM Victor Stinner wrote: > Hi, > > I'm working on reducing the failure rate of Python CIs (Travis CI, > AppVeyor, buildbots). For that, I'm trying to reduce test side effects > using "environment altered" warnings. This week, I worked on > support.reap_children() which detects leaked child processes (usually > created with os.fork()). > > I found a bug in the socketserver module: it waits for child processes > completion, but only in non-blocking mode. If a child process takes > too long, the server will never reads its exit status and so the > server leaks "zombie processes". Leaking processes can increase the > memory usage, spawning new processes can fail, etc. > > => http://bugs.python.org/issue31151 > > I changed the code to call waitpid() in blocking mode on each child > process on server_close(), to ensure that all children completed when > on server close: > > > https://github.com/python/cpython/commit/aa8ec34ad52bb3b274ce91169e1bc4a598655049 > > After pushing my change, I'm not sure anymore if it's a good idea. > There is a risk that server_close() blocks if a child is stuck on a > socket recv() or send() for some reasons. > > Should we relax the code by waiting a few seconds (problem: hardcoded > timeouts are always a bad idea), or *terminate* processes (SIGKILL on > UNIX) if they don't complete fast enough? > > I don't know which applications use socketserver. How I can test if it > breaks code in the wild? > > At least, I didn't notice any regression on Python CIs. > > Well, maybe the change is ok for the master branch. But I would like > your opinion because now I would like to backport the fix to 2.7 and > 3.6 branches. It might break some applications. > > If we *cannot* backport such change to 2.7 and 3.6 because it changes > the behaviour, I will fix the bug in test_socketserver.py instead. > > What do you think? > > Victor > ___ > Python-Dev mailing list > [email protected] > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/rmsr%40lab.net > ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
