[Python-Dev] socketserver ForkingMixin waiting for child processes

2017-08-11 Thread Victor Stinner
Hi,

I'm working on reducing the failure rate of Python CIs (Travis CI,
AppVeyor, buildbots). For that, I'm trying to reduce test side effects
using "environment altered" warnings. This week, I worked on
support.reap_children() which detects leaked child processes (usually
created with os.fork()).

I found a bug in the socketserver module: it waits for child processes
completion, but only in non-blocking mode. If a child process takes
too long, the server will never reads its exit status and so the
server leaks "zombie processes". Leaking processes can increase the
memory usage, spawning new processes can fail, etc.

=> http://bugs.python.org/issue31151

I changed the code to call waitpid() in blocking mode on each child
process on server_close(), to ensure that all children completed when
on server close:

https://github.com/python/cpython/commit/aa8ec34ad52bb3b274ce91169e1bc4a598655049

After pushing my change, I'm not sure anymore if it's a good idea.
There is a risk that server_close() blocks if a child is stuck on a
socket recv() or send() for some reasons.

Should we relax the code by waiting a few seconds (problem: hardcoded
timeouts are always a bad idea), or *terminate* processes (SIGKILL on
UNIX) if they don't complete fast enough?

I don't know which applications use socketserver. How I can test if it
breaks code in the wild?

At least, I didn't notice any regression on Python CIs.

Well, maybe the change is ok for the master branch. But I would like
your opinion because now I would like to backport the fix to 2.7 and
3.6 branches. It might break some applications.

If we *cannot* backport such change to 2.7 and 3.6 because it changes
the behaviour, I will fix the bug in test_socketserver.py instead.

What do you think?

Victor
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] socketserver ForkingMixin waiting for child processes

2017-08-11 Thread Matt Billenstein
Common pattern I've used is to wait a bit, then send a kill signal.

M

--
Matt Billenstein
[email protected]

Sent from my iPhone 6 (this put here so you know I have one)

> On Aug 11, 2017, at 5:44 AM, Victor Stinner  wrote:
> 
> Hi,
> 
> I'm working on reducing the failure rate of Python CIs (Travis CI,
> AppVeyor, buildbots). For that, I'm trying to reduce test side effects
> using "environment altered" warnings. This week, I worked on
> support.reap_children() which detects leaked child processes (usually
> created with os.fork()).
> 
> I found a bug in the socketserver module: it waits for child processes
> completion, but only in non-blocking mode. If a child process takes
> too long, the server will never reads its exit status and so the
> server leaks "zombie processes". Leaking processes can increase the
> memory usage, spawning new processes can fail, etc.
> 
> => http://bugs.python.org/issue31151
> 
> I changed the code to call waitpid() in blocking mode on each child
> process on server_close(), to ensure that all children completed when
> on server close:
> 
> https://github.com/python/cpython/commit/aa8ec34ad52bb3b274ce91169e1bc4a598655049
> 
> After pushing my change, I'm not sure anymore if it's a good idea.
> There is a risk that server_close() blocks if a child is stuck on a
> socket recv() or send() for some reasons.
> 
> Should we relax the code by waiting a few seconds (problem: hardcoded
> timeouts are always a bad idea), or *terminate* processes (SIGKILL on
> UNIX) if they don't complete fast enough?
> 
> I don't know which applications use socketserver. How I can test if it
> breaks code in the wild?
> 
> At least, I didn't notice any regression on Python CIs.
> 
> Well, maybe the change is ok for the master branch. But I would like
> your opinion because now I would like to backport the fix to 2.7 and
> 3.6 branches. It might break some applications.
> 
> If we *cannot* backport such change to 2.7 and 3.6 because it changes
> the behaviour, I will fix the bug in test_socketserver.py instead.
> 
> What do you think?
> 
> Victor
> ___
> Python-Dev mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: 
> https://mail.python.org/mailman/options/python-dev/matt%40vazor.com

___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Summary of Python tracker Issues

2017-08-11 Thread Python tracker

ACTIVITY SUMMARY (2017-08-04 - 2017-08-11)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open6116 (+16)
  closed 36820 (+46)
  total  42936 (+62)

Open issues with patches: 2341 


Issues opened (42)
==

#30855: [2.7] test_tk: test_use() of test_tkinter.test_widgets randoml
http://bugs.python.org/issue30855  reopened by haypo

#31122: SSLContext.wrap_socket() throws OSError with errno == 0
http://bugs.python.org/issue31122  opened by nikratio

#31123: fnmatch does not follow Unix fnmatch functionality
http://bugs.python.org/issue31123  opened by Varun Agrawal

#31128: Allow pydoc to run with an arbitrary hostname
http://bugs.python.org/issue31128  opened by feanil

#31129: RawConfigParser.items() is unusual in taking arguments
http://bugs.python.org/issue31129  opened by odd_bloke

#31131: asyncio.wait_for() TimeoutError doesn't provide full traceback
http://bugs.python.org/issue31131  opened by chris.jerdonek

#31132: test_prlimit from test_resource fails when building python3 in
http://bugs.python.org/issue31132  opened by cstratak

#31137: Add a path attribute to NamedTemporaryFile
http://bugs.python.org/issue31137  opened by wsanchez

#31139: Improve tracemalloc demo code in docs
http://bugs.python.org/issue31139  opened by louielu

#31140: Insufficient error message with incorrect formated string lite
http://bugs.python.org/issue31140  opened by chris.259263

#31141: Start should be a keyword argument of the built-in sum
http://bugs.python.org/issue31141  opened by Mark.Bell

#31142: python shell crashed while input quotes in print()
http://bugs.python.org/issue31142  opened by py78py90py

#31143: lib2to3 requires source files for fixes
http://bugs.python.org/issue31143  opened by tschiller

#31144: add initializer to concurrent.futures.ProcessPoolExecutor
http://bugs.python.org/issue31144  opened by nvdv

#31145: PriorityQueue.put() fails with TypeError if priority_number in
http://bugs.python.org/issue31145  opened by Mikołaj Babiak

#31146: Docs: On non-public translations, language picker fallback to 
http://bugs.python.org/issue31146  opened by mdk

#31147: a mostly useless check in list_extend()
http://bugs.python.org/issue31147  opened by Oren Milman

#31149: Add Japanese to the language switcher
http://bugs.python.org/issue31149  opened by mdk

#31151: test_socketserver: Warning -- reap_children() reaped child pro
http://bugs.python.org/issue31151  opened by haypo

#31153: Update docstrings of itertools functions
http://bugs.python.org/issue31153  opened by serhiy.storchaka

#31155: Encode set, frozenset, bytearray, and iterators as json arrays
http://bugs.python.org/issue31155  opened by javenoneal

#31158: test_pty: test_basic() fails randomly on Travis CI
http://bugs.python.org/issue31158  opened by haypo

#31159: Doc: Language switch can't switch on specific cases
http://bugs.python.org/issue31159  opened by mdk

#31161: Only check for print and exec parentheses cases for SyntaxErro
http://bugs.python.org/issue31161  opened by mjpieters

#31162: urllib.request.urlopen error
http://bugs.python.org/issue31162  opened by [email protected]

#31163: Return destination path in Path.rename and Path.replace
http://bugs.python.org/issue31163  opened by albertogomcas

#31164: test_functools: test_recursive_pickle() stack overflow on x86 
http://bugs.python.org/issue31164  opened by haypo

#31165: null pointer deref and segfault in list_slice (listobject.c:45
http://bugs.python.org/issue31165  opened by geeknik

#31166: null pointer deref and segfault in _PyObject_Alloc (obmalloc.c
http://bugs.python.org/issue31166  opened by geeknik

#31169: Unknown-source assertion failure in multiprocessing/managers.p
http://bugs.python.org/issue31169  opened by drallensmith

#31171: multiprocessing.BoundedSemaphore of 32-bit python could not wo
http://bugs.python.org/issue31171  opened by hongxu

#31172: Py_Main() is totally broken on Visual Studio 2017
http://bugs.python.org/issue31172  opened by rutski

#31174: test_tools leaks randomly references on x86 Gentoo Refleaks 3.
http://bugs.python.org/issue31174  opened by haypo

#31175: Exception while extracting file from ZIP with non-matching fil
http://bugs.python.org/issue31175  opened by zyxtarmo

#31176: Is a UDP transport also a ReadTransport/WriteTransport?
http://bugs.python.org/issue31176  opened by twisteroid ambassador

#31177: unittest mock's reset_mock throws an error when an attribute h
http://bugs.python.org/issue31177  opened by hmvp

#31178: [EASY] subprocess: TypeError: can't concat str to bytes, in _e
http://bugs.python.org/issue31178  opened by haypo

#31179: Speed-up dict.copy() up to 5.5 times.
http://bugs.python.org/issue31179  opened by yselivanov

#31180: test_multiprocessing_spawn hangs randomly on x86 Windows7 3.6
http://bugs.python.org/issue31180  opened by haypo

#31181: Segfault

[Python-Dev] Pull requests now have labels specifying what stage they are at

2017-08-11 Thread Brett Cannon
I just pushed a change to Bedevere to help track what stage a pull request
is in. The stages are:


   - awaiting review
   - awaiting core review
   - awaiting changes
   - awaiting change review
   - awaiting merge

The "review" stage is for when a pull request has no reviews either
approving or requesting changes (this only applies to PRs not by a core
dev; all PRs created by a core dev are flagged as awaiting merge
immediately).

The "core review" stage is when a review has been done by one or more
non-core folk, but no core devs. The idea with the distinction is to
encourage people to help in doing initial reviewing of PRs who happen to
not be core devs since triaging issues and reviewing pull requests is the
biggest chunk of work in maintaining CPython.

The "changes" stage occurs when a core dev requests changes to a PR (a
non-core dev review will not trigger this). A comment will be left to this
effect with instructions for the PR creator to leave a comment saying "I
didn't expect the Spanish Inquisition" to signal when they believe they are
done addressing the reviewer's comments (there is also appropriate comments
to explain the quote along with an easter egg(s) :) .

The "change review" label means that the PR creator left the Spanish
Inquisition quote and so now the core dev(s) who requested changes should
re-review the PR. There will be a comment mentioning the core devs to this
effect so that they will get a proper GitHub notification and they can
check the label to see if the PR  creator believes they are ready for
another round of review (compared to now where you have to hunt around in
the PR to see if it's ready for another review).

The "merge" label means the PR has received an approving review from a core
dev and so it should be ready to go.

Now when any state change happens to trigger these labels, any preexisting
"awaiting" label will be cleared and the new one set. But removing or
setting a label manually won't trigger an action, so in this instance you
can override Bedevere manually until some later event occurs which triggers
an automatic update.

One thing I do want to call out is that it's now more important that core
devs leave appropriate "approved" and "request changes" reviews through the
GitHub UI instead of just leaving a comment when they approve or want
changes to a PR, respectively. Not only is this necessary to automate this
part of the stage labeling, but it also visibly exposes in the GitHub UI
the review state of a PR.

I'm not backfilling the open PRs as it's more work and I don't want to do
said work. ;) But as PRs are opened, reviewed, etc., things should slowly
trickle their way through the various PRs.

I should mention this is the second-to-last "intrusive" workflow change I
have planned to introduce some new aspect to the workflow (the last one
being a status check for a news entry after every branch has been
blurbified). After these two changes land, everything else I have planned
is just touching things up (e.g. the CLA check becoming a status check
instead of a label, touching up CONTRIBUTING.md, etc.). I think some other
people have plans to improve things as well, but at least my TODO list of
"radical" changes to our workflow is almost done!
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] socketserver ForkingMixin waiting for child processes

2017-08-11 Thread Ryan Smith-Roberts
I agree that blocking shutdown by default isn't a good idea. A child will
eventually get indefinitely stuck on a nonresponsive connection and hang
the whole server. This behavior change is surprising and should be reverted
in master, and definitely not backported.

As for block-timeout or block-timeout-kill, waiting more than zero seconds
in server_close() should be optional, because you're right that the best
timeout is circumstantial. Since ThreadingMixIn also leaks threads,
server_close() could grow a timeout flag (following the socket module
timeout convention) and maybe a terminate boolean. ThreadingMixIn could
then also be fixed. I'm not sure how useful that is though, since I'd bet
almost all users of socketserver exit the process shortly after
server_close(). Plus it can't be backported to the feature-freeze branches.

It seems like this is getting complicated enough that putting the fix in
test_socketserver.py is probably best. Another solution is to add a secret
terminating-close flag to ForkingMixIn just for the tests. Is that good
practice in the stdlib?

On Fri, Aug 11, 2017 at 6:46 AM Victor Stinner 
wrote:

> Hi,
>
> I'm working on reducing the failure rate of Python CIs (Travis CI,
> AppVeyor, buildbots). For that, I'm trying to reduce test side effects
> using "environment altered" warnings. This week, I worked on
> support.reap_children() which detects leaked child processes (usually
> created with os.fork()).
>
> I found a bug in the socketserver module: it waits for child processes
> completion, but only in non-blocking mode. If a child process takes
> too long, the server will never reads its exit status and so the
> server leaks "zombie processes". Leaking processes can increase the
> memory usage, spawning new processes can fail, etc.
>
> => http://bugs.python.org/issue31151
>
> I changed the code to call waitpid() in blocking mode on each child
> process on server_close(), to ensure that all children completed when
> on server close:
>
>
> https://github.com/python/cpython/commit/aa8ec34ad52bb3b274ce91169e1bc4a598655049
>
> After pushing my change, I'm not sure anymore if it's a good idea.
> There is a risk that server_close() blocks if a child is stuck on a
> socket recv() or send() for some reasons.
>
> Should we relax the code by waiting a few seconds (problem: hardcoded
> timeouts are always a bad idea), or *terminate* processes (SIGKILL on
> UNIX) if they don't complete fast enough?
>
> I don't know which applications use socketserver. How I can test if it
> breaks code in the wild?
>
> At least, I didn't notice any regression on Python CIs.
>
> Well, maybe the change is ok for the master branch. But I would like
> your opinion because now I would like to backport the fix to 2.7 and
> 3.6 branches. It might break some applications.
>
> If we *cannot* backport such change to 2.7 and 3.6 because it changes
> the behaviour, I will fix the bug in test_socketserver.py instead.
>
> What do you think?
>
> Victor
> ___
> Python-Dev mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/rmsr%40lab.net
>
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com