Re: [Python-Dev] socketserver ForkingMixin waiting for child processes

2017-08-18 Thread Victor Stinner
2017-08-12 0:34 GMT+02:00 Ryan Smith-Roberts :
> Since ThreadingMixIn also leaks threads,
> server_close() could grow a timeout flag (following the socket module
> timeout convention) and maybe a terminate boolean. ThreadingMixIn could then
> also be fixed. I'm not sure how useful that is though, since I'd bet almost
> all users of socketserver exit the process shortly after server_close().
> Plus it can't be backported to the feature-freeze branches.

Oh.

It took me 2 months, but I finally identified why *sometimes*,
test_logging fails with warning about threads. It's exactly because of
the weak socketserver.ThreadingMixIn which leaves running threads in
the background, even after server_close(). I just opened a new issue:

"socketserver.ThreadingMixIn leaks running threads after server_close()"
https://bugs.python.org/issue31233

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] socketserver ForkingMixin waiting for child processes

2017-08-16 Thread Victor Stinner
Hi,

The first bug was that test_socketserver "leaked" child processes: it
means that socketserver API creates zombie processes depending how
long the child processes take to complete.

If you want to backport my change waiting until child processes
complete, you need to fix the bug differently, so test_socketserver
doesn't leak processes (ex: wait until child processes complete in
test_socketserver).


2017-08-12 0:34 GMT+02:00 Ryan Smith-Roberts :
> I agree that blocking shutdown by default isn't a good idea. A child will
> eventually get indefinitely stuck on a nonresponsive connection and hang the
> whole server. This behavior change is surprising and should be reverted in
> master, and definitely not backported.

Right, a child process can hang. The question is how to handle such
case. I see 3 choices:

* Do nothing: Python 3.6 behaviour
* Blocking wait until the child process completes
* Wait a few seconds and then kill the process after a dead line

In Python 3.6, the process is stuck and continue to run even after
shutdown() and server_close(). That's surprising and doesn't seem
right to me.


> As for block-timeout or block-timeout-kill, waiting more than zero seconds
> in server_close() should be optional, because you're right that the best
> timeout is circumstantial.

Maybe we need to add a new method to wait N seconds until children
completes, then send SIGTERM, and finally send SIGKILL if children
take longer than N seconds to complete.

So the developer becomes responsible of killing child processes.

In the Apache world, it's called the "graceful" stop:
https://httpd.apache.org/docs/2.4/stopping.html


> Since ThreadingMixIn also leaks threads,
> server_close() could grow a timeout flag (following the socket module
> timeout convention) and maybe a terminate boolean.

Oh, I didn't know that ThreadingMixIn also leaks threads.

That's a similar but different issue. It's not easily possible to
"kill" a thread.


> Plus it can't be backported to the feature-freeze branches.

IMHO leaking zombie processes is a bug, but I'm ok to keep bugs and
only fix tests.

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] socketserver ForkingMixin waiting for child processes

2017-08-12 Thread Martin Panter
> On Fri, Aug 11, 2017 at 6:46 AM Victor Stinner 
> wrote:
>> => http://bugs.python.org/issue31151
>>
>> I changed the code to call waitpid() in blocking mode on each child
>> process on server_close(), to ensure that all children completed when
>> on server close:
>>
>> https://github.com/python/cpython/commit/aa8ec34ad52bb3b274ce91169e1bc4a598655049
>>
>> After pushing my change, I'm not sure anymore if it's a good idea.
>> There is a risk that server_close() blocks if a child is stuck on a
>> socket recv() or send() for some reasons.

I agree that this could be an unwanted change in behaviour. For
example, a web browser could be holding a HTTP 1.1 persistent
connection open.

>> Should we relax the code by waiting a few seconds (problem: hardcoded
>> timeouts are always a bad idea), or *terminate* processes (SIGKILL on
>> UNIX) if they don't complete fast enough?

It does seem reasonable to have an option to clean up background
forks, whether by blocking, or terminating them immediately.

On 11 August 2017 at 22:34, Ryan Smith-Roberts  wrote:
> Since ThreadingMixIn also leaks threads,
> server_close() could grow a timeout flag (following the socket module
> timeout convention) and maybe a terminate boolean. ThreadingMixIn could then
> also be fixed.

You could do a blocking join on each background thread. But I suspect
there is no perfect way to terminate the threads without waiting.
Using ThreadingMixIn.daemon_threads and exiting the interpreter might
have to be good enough.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] socketserver ForkingMixin waiting for child processes

2017-08-11 Thread Ryan Smith-Roberts
I agree that blocking shutdown by default isn't a good idea. A child will
eventually get indefinitely stuck on a nonresponsive connection and hang
the whole server. This behavior change is surprising and should be reverted
in master, and definitely not backported.

As for block-timeout or block-timeout-kill, waiting more than zero seconds
in server_close() should be optional, because you're right that the best
timeout is circumstantial. Since ThreadingMixIn also leaks threads,
server_close() could grow a timeout flag (following the socket module
timeout convention) and maybe a terminate boolean. ThreadingMixIn could
then also be fixed. I'm not sure how useful that is though, since I'd bet
almost all users of socketserver exit the process shortly after
server_close(). Plus it can't be backported to the feature-freeze branches.

It seems like this is getting complicated enough that putting the fix in
test_socketserver.py is probably best. Another solution is to add a secret
terminating-close flag to ForkingMixIn just for the tests. Is that good
practice in the stdlib?

On Fri, Aug 11, 2017 at 6:46 AM Victor Stinner 
wrote:

> Hi,
>
> I'm working on reducing the failure rate of Python CIs (Travis CI,
> AppVeyor, buildbots). For that, I'm trying to reduce test side effects
> using "environment altered" warnings. This week, I worked on
> support.reap_children() which detects leaked child processes (usually
> created with os.fork()).
>
> I found a bug in the socketserver module: it waits for child processes
> completion, but only in non-blocking mode. If a child process takes
> too long, the server will never reads its exit status and so the
> server leaks "zombie processes". Leaking processes can increase the
> memory usage, spawning new processes can fail, etc.
>
> => http://bugs.python.org/issue31151
>
> I changed the code to call waitpid() in blocking mode on each child
> process on server_close(), to ensure that all children completed when
> on server close:
>
>
> https://github.com/python/cpython/commit/aa8ec34ad52bb3b274ce91169e1bc4a598655049
>
> After pushing my change, I'm not sure anymore if it's a good idea.
> There is a risk that server_close() blocks if a child is stuck on a
> socket recv() or send() for some reasons.
>
> Should we relax the code by waiting a few seconds (problem: hardcoded
> timeouts are always a bad idea), or *terminate* processes (SIGKILL on
> UNIX) if they don't complete fast enough?
>
> I don't know which applications use socketserver. How I can test if it
> breaks code in the wild?
>
> At least, I didn't notice any regression on Python CIs.
>
> Well, maybe the change is ok for the master branch. But I would like
> your opinion because now I would like to backport the fix to 2.7 and
> 3.6 branches. It might break some applications.
>
> If we *cannot* backport such change to 2.7 and 3.6 because it changes
> the behaviour, I will fix the bug in test_socketserver.py instead.
>
> What do you think?
>
> Victor
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/rmsr%40lab.net
>
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] socketserver ForkingMixin waiting for child processes

2017-08-11 Thread Matt Billenstein
Common pattern I've used is to wait a bit, then send a kill signal.

M

--
Matt Billenstein
m...@vazor.com

Sent from my iPhone 6 (this put here so you know I have one)

> On Aug 11, 2017, at 5:44 AM, Victor Stinner  wrote:
> 
> Hi,
> 
> I'm working on reducing the failure rate of Python CIs (Travis CI,
> AppVeyor, buildbots). For that, I'm trying to reduce test side effects
> using "environment altered" warnings. This week, I worked on
> support.reap_children() which detects leaked child processes (usually
> created with os.fork()).
> 
> I found a bug in the socketserver module: it waits for child processes
> completion, but only in non-blocking mode. If a child process takes
> too long, the server will never reads its exit status and so the
> server leaks "zombie processes". Leaking processes can increase the
> memory usage, spawning new processes can fail, etc.
> 
> => http://bugs.python.org/issue31151
> 
> I changed the code to call waitpid() in blocking mode on each child
> process on server_close(), to ensure that all children completed when
> on server close:
> 
> https://github.com/python/cpython/commit/aa8ec34ad52bb3b274ce91169e1bc4a598655049
> 
> After pushing my change, I'm not sure anymore if it's a good idea.
> There is a risk that server_close() blocks if a child is stuck on a
> socket recv() or send() for some reasons.
> 
> Should we relax the code by waiting a few seconds (problem: hardcoded
> timeouts are always a bad idea), or *terminate* processes (SIGKILL on
> UNIX) if they don't complete fast enough?
> 
> I don't know which applications use socketserver. How I can test if it
> breaks code in the wild?
> 
> At least, I didn't notice any regression on Python CIs.
> 
> Well, maybe the change is ok for the master branch. But I would like
> your opinion because now I would like to backport the fix to 2.7 and
> 3.6 branches. It might break some applications.
> 
> If we *cannot* backport such change to 2.7 and 3.6 because it changes
> the behaviour, I will fix the bug in test_socketserver.py instead.
> 
> What do you think?
> 
> Victor
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: 
> https://mail.python.org/mailman/options/python-dev/matt%40vazor.com

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] socketserver ForkingMixin waiting for child processes

2017-08-11 Thread Victor Stinner
Hi,

I'm working on reducing the failure rate of Python CIs (Travis CI,
AppVeyor, buildbots). For that, I'm trying to reduce test side effects
using "environment altered" warnings. This week, I worked on
support.reap_children() which detects leaked child processes (usually
created with os.fork()).

I found a bug in the socketserver module: it waits for child processes
completion, but only in non-blocking mode. If a child process takes
too long, the server will never reads its exit status and so the
server leaks "zombie processes". Leaking processes can increase the
memory usage, spawning new processes can fail, etc.

=> http://bugs.python.org/issue31151

I changed the code to call waitpid() in blocking mode on each child
process on server_close(), to ensure that all children completed when
on server close:

https://github.com/python/cpython/commit/aa8ec34ad52bb3b274ce91169e1bc4a598655049

After pushing my change, I'm not sure anymore if it's a good idea.
There is a risk that server_close() blocks if a child is stuck on a
socket recv() or send() for some reasons.

Should we relax the code by waiting a few seconds (problem: hardcoded
timeouts are always a bad idea), or *terminate* processes (SIGKILL on
UNIX) if they don't complete fast enough?

I don't know which applications use socketserver. How I can test if it
breaks code in the wild?

At least, I didn't notice any regression on Python CIs.

Well, maybe the change is ok for the master branch. But I would like
your opinion because now I would like to backport the fix to 2.7 and
3.6 branches. It might break some applications.

If we *cannot* backport such change to 2.7 and 3.6 because it changes
the behaviour, I will fix the bug in test_socketserver.py instead.

What do you think?

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com