[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2021-11-05 Thread William Fisher


William Fisher  added the comment:

asyncio.MultiLoopChildWatcher has two problems that create a race condition.

1. The SIGCHLD signal handler does not guard against interruption/re-entry.
2. The SIGCHLD signal handler can interrupt add_child_handler's 
`self._do_waitpid(pid)`.


Symptoms:

Log messages that look like this:

1634935451.761 WARNING Unknown child process pid 8747, will report returncode 
255
...
1634935451.762 WARNING Child watcher got an unexpected pid: 8747
Traceback (most recent call last):
  File 
"/Users/runner/hostedtoolcache/Python/3.9.7/x64/lib/python3.9/asyncio/unix_events.py",
 line 1306, in _do_waitpid
loop, callback, args = self._callbacks.pop(pid)
KeyError: 8747


Background:

I've been working on a library to make calling asyncio subprocesses more 
convenient. As part of my testing, I've been stress testing asyncio using 
different child watcher policies. My CI build runs more than 200 tests each on 
macOS, Linux and FreeBSD. I've noticed a small percentage of sporadic failures 
using MultiLoopChildWatcher.

My understanding of Python signal functions is that:

1. Upon receipt of a signal, the native "C" signal handler sets a flag that 
indicates the signal arrived.
2. The main thread checks the signal flags after each bytecode instruction. If 
a signal flag is set, Python saves the call stack, runs the signal handler on 
the main thread immediately, then pops the stack when it returns (assuming no 
exception raised by signal handler).
3. If you are in the middle of a signal handler running on the main thread and 
Python detects another signal flag, your signal handler can be interrupted.
4. Stacked signal handlers run in LIFO order. The last one that enters will run 
to completion without interruption.


Explanation:

I wrapped MultiLoopChildWatcher's sig_chld function in a decorator that logs 
when it is entered and exited. This clearly shows that _sig_chld is being 
re-entered. In the following log snippet, I'm running two commands in a 
pipeline "tr | cat".

1634935451.743 DEBUG process '/usr/bin/tr' created: pid 8747
...
1634935451.746 DEBUG process '/bin/cat' created: pid 8748
...
1634935451.761 DEBUG enter '_sig_chld' 20
1634935451.761 DEBUG enter '_sig_chld' 20
1634935451.761 WARNING Unknown child process pid 8747, will report returncode 
255
1634935451.762 DEBUG process 8748 exited with returncode 0
1634935451.762 DEBUG exit '_sig_chld' 20
1634935451.762 WARNING Child watcher got an unexpected pid: 8747
Traceback (most recent call last):
  File 
"/Users/runner/hostedtoolcache/Python/3.9.7/x64/lib/python3.9/asyncio/unix_events.py",
 line 1306, in _do_waitpid
loop, callback, args = self._callbacks.pop(pid)
KeyError: 8747
1634935451.763 WARNING Unknown child process pid 8748, will report returncode 
255
1634935451.763 WARNING Child watcher got an unexpected pid: 8748
Traceback (most recent call last):
  File 
"/Users/runner/hostedtoolcache/Python/3.9.7/x64/lib/python3.9/asyncio/unix_events.py",
 line 1306, in _do_waitpid
loop, callback, args = self._callbacks.pop(pid)
KeyError: 8748
1634935451.763 DEBUG exit '_sig_chld' 20


Here is the breakdown of what happens:

1. Pid 8747 exits and we enter _sig_chld #1.
2. sig_chld #1 calls os.waitpid which gives (pid, status) = (8747, 0).
3. Before sig_chld #1 has a chance to call `self._callbacks.pop(pid)`, it is 
interrupted.
4. sig_chld #2 calls os.waitpid for pid 8747. We get a ChildProcessError and 
then "Unknown child process pid 8747, will report returncode 255"
5. sig_chld #2 invokes the callback for pid 8747 saying the returncode=255.
6. sig_chld #2 continues to completion. It reaps pid 8748 normally.
7. sig_chld #1 picks up where it left off. We get an error when we try to pop 
the callback for 8747.
8. sig_chld #1 calls os.waitpid for pid 8748. This gives us failure messages 
because it was done by sig_chld #2.

The issue of interruption can also happen in the case of running a single 
process. If the _sig_chld interrupts the call to `self._do_waitpid(pid)` in 
add_child_watcher, a similar interleaving can occur.


Work-Around:

In my tests, I patched MultiLoopChildWatcher and so far, it appears to be more 
reliable. In add_child_handler, I call raise_signal(SIGCHLD) so that all the 
work is done in the signal handler.


class PatchedMultiLoopChildWatcher(asyncio.MultiLoopChildWatcher):
"Test race condition fixes in MultiLoopChildWatcher."

def add_child_handler(self, pid, callback, *args):
loop = asyncio.get_running_loop()
self._callbacks[pid] = (loop, callback, args)

# Prevent a race condition in case signal was delivered before
# callback added.
signal.raise_signal(signal.SIGCHLD)

@_serialize
def _sig_chld(self, signum, frame):
super()._sig_chld(signum, frame)


_serialize is a decorator that looks like this:


def _serialize(func):
"""Decorator to serialize a non-reentrant signal function.
If one client is already in the critical 

[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2021-06-28 Thread STINNER Victor


STINNER Victor  added the comment:

MultiLoopChildWatcher still has a race condition. I see different options:

* Deprecate/Remove MultiLoopChildWatcher
* Find a magic fix for MultiLoopChildWatcher. It sounds complicated since it's 
*designed* to not be related to any event loop. See my failed attempt: PR 26536.
* Document the limitation and adapt the unit test to work around the issue.

If an event loop frequently gets events, the worst case is that a child process 
complete will be notified later, but at least the event is handled. If the 
event loop is idle, the completion is never notified.

A workaround is to schedule an asynchronous task which runs frequently, like 
once per minute. It doesn't even have to do anything. It can be a no-op task.

I'm not really excited by documenting the limitation and suggest a hack to work 
around the limitation. MultiLoopChildWatcher could belong to a PyPI module.

But for the base asyncio from the stdlib, I would prefer to not ship any known 
race conditions :-( So I'm more in favor of deprecate/remove.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2021-06-22 Thread Ken Jin


Change by Ken Jin :


--
pull_requests:  -25430

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2021-06-22 Thread Ken Jin


Change by Ken Jin :


--
nosy: +kj
nosy_count: 10.0 -> 11.0
pull_requests: +25430
pull_request: https://github.com/python/cpython/pull/26848

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2021-06-14 Thread Jack DeVries


Change by Jack DeVries :


--
pull_requests:  -25270

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2021-06-14 Thread Jack DeVries


Change by Jack DeVries :


--
nosy:  -jack__d

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2021-06-12 Thread STINNER Victor


STINNER Victor  added the comment:

> I think this is a good idea. MultiLoopChildWatcher could use setwakeupfd with 
> some no-op callback just to wakeup the loop.

A no-op doesn't solve the issue. It doesn't wake up the event loop. There are 
only two options to wake up the event loop:

* Raise an exception in the Python signal handler
* Generate any event which stops the event loop polling. In practice, it means 
to write into the event loop self pipe.

This issue is similar to call_soon() which doesn't wake up the event loop if 
called from a different thread: you must call call_soon_threadsafe() which... 
writes into the self pipe. The self pipe again!

--

MultiLoopChildWatcher cannot use setwakeupfd if it's also used by an event loop 
to register a signal handler (like CTRL+C). Python currently supports a single 
signal wakeup FD.

The event loop *must* set the wakeup FD to its own self pipe. Otherwise, the 
asynchronous Python signal handler register by add_signal_handler() is not 
called. Extract of the Unix add_signal_handler():

# Register a dummy signal handler to ask Python to write the signal
# number in the wakeup file descriptor. _process_self_data() will
# read signal numbers from this file descriptor to handle signals.
signal.signal(sig, _sighandler_noop)

Currently, MultiLoopChildWatcher is not related to any event loop, it cannot 
access such "self pipe".

If MultiLoopChildWatcher is modified to be attached to an event loop... does it 
contract the whole purpose of MultiLoopChildWatcher?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2021-06-12 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:


New changeset af5fb6706219d7949c1db5c9f2b7da53198123f3 by Miss Islington (bot) 
in branch '3.8':
bpo-38323: Skip SubprocessMultiLoopWatcherTest as they can hang the test suite 
(GH-26542) (GH-26670)
https://github.com/python/cpython/commit/af5fb6706219d7949c1db5c9f2b7da53198123f3


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2021-06-11 Thread Jack DeVries


Change by Jack DeVries :


--
nosy: +jack__d
nosy_count: 10.0 -> 11.0
pull_requests: +25270
pull_request: https://github.com/python/cpython/pull/26643

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2021-06-11 Thread miss-islington


Change by miss-islington :


--
pull_requests: +25257
pull_request: https://github.com/python/cpython/pull/26670

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2021-06-10 Thread Pablo Galindo Salgado


Change by Pablo Galindo Salgado :


--
pull_requests: +25247
pull_request: https://github.com/python/cpython/pull/26632

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2021-06-08 Thread Yury Selivanov


Yury Selivanov  added the comment:

>  MultiLoopChildWatcher must ensures that the event loop is awaken when it 
> receives a signal by using signal.setwakeup(). This is done by 
> _UnixSelectorEventLoop.add_signal_handler(). Maybe MultiLoopChildWatcher 
> could reuse this function, rather than calling directly signal.signal().

I think this is a good idea. MultiLoopChildWatcher could use setwakeupfd with 
some no-op callback just to wakeup the loop.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2021-06-07 Thread Shreyan Avigyan


Change by Shreyan Avigyan :


--
pull_requests: +25162
pull_request: https://github.com/python/cpython/pull/26574

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2021-06-05 Thread Shreyan Avigyan


Change by Shreyan Avigyan :


--
nosy: +shreyanavigyan

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2021-06-04 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:


New changeset 0d441d2e70e365b5dc517d8ff24a20b97bc4536a by Miss Islington (bot) 
in branch '3.10':
bpo-38323: Skip SubprocessMultiLoopWatcherTest as they can hang the test suite 
(GH-26542) (GH-26544)
https://github.com/python/cpython/commit/0d441d2e70e365b5dc517d8ff24a20b97bc4536a


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2021-06-04 Thread miss-islington


miss-islington  added the comment:


New changeset b3c50b29e14d8f089aee3e0980298bfc74cb3ba7 by Miss Islington (bot) 
in branch '3.9':
bpo-38323: Skip SubprocessMultiLoopWatcherTest as they can hang the test suite 
(GH-26542)
https://github.com/python/cpython/commit/b3c50b29e14d8f089aee3e0980298bfc74cb3ba7


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2021-06-04 Thread miss-islington


Change by miss-islington :


--
pull_requests: +25134
pull_request: https://github.com/python/cpython/pull/26544

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2021-06-04 Thread miss-islington


Change by miss-islington :


--
pull_requests: +25133
pull_request: https://github.com/python/cpython/pull/26543

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2021-06-04 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:


New changeset f171877ebe276749f31386baed5841ce37cbee2e by Pablo Galindo in 
branch 'main':
bpo-38323: Skip SubprocessMultiLoopWatcherTest as they can hang the test suite 
(GH-26542)
https://github.com/python/cpython/commit/f171877ebe276749f31386baed5841ce37cbee2e


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2021-06-04 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:

I'm landing https://github.com/python/cpython/pull/26542 as a temporary 
solution meanwhile we decide what to do with PR 26536.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2021-06-04 Thread Pablo Galindo Salgado


Change by Pablo Galindo Salgado :


--
pull_requests: +25132
pull_request: https://github.com/python/cpython/pull/26542

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2021-06-04 Thread Pablo Galindo Salgado


Change by Pablo Galindo Salgado :


--
nosy: +pablogsal
nosy_count: 8.0 -> 9.0
pull_requests: +25131
pull_request: https://github.com/python/cpython/pull/26541

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2021-06-04 Thread STINNER Victor


Change by STINNER Victor :


--
pull_requests: +25127
pull_request: https://github.com/python/cpython/pull/26536

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2021-06-04 Thread Chris Jerdonek


Chris Jerdonek  added the comment:

> MultiLoopChildWatcher must ensures that the event loop is awaken when it 
> receives a signal by using signal.setwakeup(). This is done by 
> _UnixSelectorEventLoop.add_signal_handler(). Maybe MultiLoopChildWatcher 
> could reuse this function, 

This is the conclusion I came to, too. But changing MultiLoopChildWatcher to 
use loop.add_signal_handler() would contradict the class's documented purpose 
and make it the same as SafeChildWatcher. This is why I think a maintainer 
needs to weigh in. The class's purpose / design might fundamentally not be 
workable. If it can't be made to work, maybe it should be removed or be 
documented as susceptible to hangs.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2021-06-04 Thread STINNER Victor


STINNER Victor  added the comment:

When I reproduce test_cancel_make_subprocess_transport_exec() hang, the problem 
is that the C signal handler is called with SIGCHLD when the child process 
completes, but the Python signal handler is not called.

Python is "blocked" in a selector (maybe select.select(), it doesn't matter). I 
guess that the selector is interrupted by a signal (even if asyncio calls 
signal.setinterrupt(SIGCHLD, False)), but since the signal handler doesn't 
raise an exception, the syscall is restarted: see the PEP 475.

I understood that the asyncio event loop only gets the opportunity to call the 
Python signal handler if there is a pending asyncio event (call_soon, 
call_timer, event on a tracked FD, whatever). If the signal arrives when the 
event loop is idle, the Python signal handler will never be called since the 
selector is called with timeout=0 (blocking mode).

MultiLoopChildWatcher must ensures that the event loop is awaken when it 
receives a signal by using signal.setwakeup(). This is done by 
_UnixSelectorEventLoop.add_signal_handler(). Maybe MultiLoopChildWatcher could 
reuse this function, rather than calling directly signal.signal().

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2021-06-04 Thread Chris Jerdonek


Chris Jerdonek  added the comment:

> This issue is not solved.

Yes, nothing was changed. After diagnosing this issue and trying some things 
out in a draft PR, my conclusion is that an asyncio maintainer really needs to 
weigh in on what to do (especially Andrew who authored the class). The reason 
is that the hang is closely tied to MultiLoopChildWatcher's documented purpose. 
The only way I could see to fix MultiLoopChildWatcher would change its 
documented behavior and make it the same as SafeChildWatcher, which would 
defeat the purpose of having a separate class: 
https://github.com/python/cpython/pull/20142#issuecomment-712417912
Maybe there is a way to sidestep the hangs in the tests without fixing 
MultiLoopChildWatcher, but I didn't look into that.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2021-06-04 Thread STINNER Victor


STINNER Victor  added the comment:

This issue is not solved. I can still reproduce the hang using:

./python -m test test_asyncio -m SubprocessMultiLoopWatcherTests -v -F -j20 
--timeout=30.0

Example with test_cancel_make_subprocess_transport_exec:

$ ./python -m test test_asyncio -m 
test.test_asyncio.test_subprocess.SubprocessMultiLoopWatcherTests.test_cancel_make_subprocess_transport_exec
 -v -F -j20 --timeout=30.0
...
0:00:37 load avg: 10.97 [163] test_asyncio passed -- running: test_asyncio 
(30.9 sec)
test_cancel_make_subprocess_transport_exec 
(test.test_asyncio.test_subprocess.SubprocessMultiLoopWatcherTests) ... ok

--

Ran 1 test in 0.032s

OK
0:00:37 load avg: 10.97 [164] test_asyncio passed -- running: test_asyncio 
(31.0 sec)
test_cancel_make_subprocess_transport_exec 
(test.test_asyncio.test_subprocess.SubprocessMultiLoopWatcherTests) ... ok

--

Ran 1 test in 0.036s

OK
0:00:37 load avg: 10.97 [165/1] test_asyncio crashed (Exit code 1)
test_cancel_make_subprocess_transport_exec 
(test.test_asyncio.test_subprocess.SubprocessMultiLoopWatcherTests) ...
Timeout (0:00:30)!
Thread 0x7fb583cbd740 (most recent call first):
  File "/home/vstinner/python/main/Lib/selectors.py", line 469 in select
  File "/home/vstinner/python/main/Lib/asyncio/base_events.py", line 1845 in 
_run_once
  File "/home/vstinner/python/main/Lib/asyncio/base_events.py", line 595 in 
run_forever
  File "/home/vstinner/python/main/Lib/asyncio/base_events.py", line 628 in 
run_until_complete
  File "/home/vstinner/python/main/Lib/test/test_asyncio/test_subprocess.py", 
line 442 in test_cancel_make_subprocess_transport_exec
...


Test complete in less than 1 second, but sometimes it hangs for at least 30 
seconds.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-12-16 Thread miss-islington


miss-islington  added the comment:


New changeset dd262ef46d2e2fff4c413cfa5faa9e72cd36220e by Miss Islington (bot) 
in branch '3.8':
bpo-38323: Add guard clauses in MultiLoopChildWatcher. (GH-22756)
https://github.com/python/cpython/commit/dd262ef46d2e2fff4c413cfa5faa9e72cd36220e


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-12-16 Thread miss-islington


miss-islington  added the comment:


New changeset bf0eed3e60acc7e4969a4fae940bc615fa924c30 by Miss Islington (bot) 
in branch '3.9':
bpo-38323: Add guard clauses in MultiLoopChildWatcher. (GH-22756)
https://github.com/python/cpython/commit/bf0eed3e60acc7e4969a4fae940bc615fa924c30


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-12-16 Thread miss-islington


Change by miss-islington :


--
nosy: +miss-islington
nosy_count: 7.0 -> 8.0
pull_requests: +22666
pull_request: https://github.com/python/cpython/pull/23806

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-12-16 Thread Andrew Svetlov


Andrew Svetlov  added the comment:


New changeset 66d3b589c44fcbcf9afe1e442d9beac3bd8bcd34 by Chris Jerdonek in 
branch 'master':
bpo-38323: Add guard clauses in MultiLoopChildWatcher. (#22756)
https://github.com/python/cpython/commit/66d3b589c44fcbcf9afe1e442d9beac3bd8bcd34


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-12-16 Thread miss-islington


Change by miss-islington :


--
pull_requests: +22667
pull_request: https://github.com/python/cpython/pull/23807

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-11-05 Thread Jakub Stasiak


Jakub Stasiak  added the comment:

FYI your PR 20142 together with my PR 
https://github.com/python/cpython/pull/23154 allow me to run the whole 
test_asyncio test suite on OpenIndiana 5.11:

$ ./python -m unittest -v test.test_asyncio
(...)
Ran 2298 tests in 71.668s

OK (skipped=52)


without your PR at least one of the tests is hanging forever. So, this PR 
improves Solaris/SunOS/illumos/OpenIndiana side of things as well.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-11-05 Thread Jakub Stasiak


Change by Jakub Stasiak :


--
nosy: +jstasiak

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-10-18 Thread Chris Jerdonek


Change by Chris Jerdonek :


--
pull_requests: +21719
pull_request: https://github.com/python/cpython/pull/22756

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-05-16 Thread Chris Jerdonek


Change by Chris Jerdonek :


--
type:  -> behavior

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-05-16 Thread Chris Jerdonek


Chris Jerdonek  added the comment:

I posted a draft PR for this issue: https://github.com/python/cpython/pull/20142

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-05-16 Thread Chris Jerdonek


Change by Chris Jerdonek :


--
pull_requests: +19447
pull_request: https://github.com/python/cpython/pull/20142

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-05-12 Thread Chris Jerdonek


Chris Jerdonek  added the comment:

I think I have a possible explanation for this now.

In my reproducer script and in the original test, the wakeup file descriptor 
isn't set. I think this explains why the loop isn't waking up to call SIGCHLD's 
handler when the signal comes in. The reason the wakeup file descriptor isn't 
set in the original test is that MultiLoopChildWatcher registers the SIGCHLD 
handler using signal.signal():
https://github.com/python/cpython/blob/74ea6b5a7501fb393cd567fb21998d0bfeeb267c/Lib/asyncio/unix_events.py#L1261-L1267
rather than using loop.add_signal_handler(), which calls signal.set_wakeup_fd():
https://github.com/python/cpython/blob/74ea6b5a7501fb393cd567fb21998d0bfeeb267c/Lib/asyncio/unix_events.py#L95

Is there a reason MultiLoopChildWatcher.attach_loop() isn't calling 
loop.add_signal_handler()? Since attach_loop() has to be called from the main 
thread anyways, it seems like it could be okay.

I also wonder if the documentation could perhaps be more specific as to the 
difference between loop.add_signal_handler() and signal.signal(). Currently, it 
just says callbacks registered with add_signal_handler() are "allowed to 
interact with the event loop":
https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.add_signal_handler
But it doesn't give any warnings about using signal.signal(). For example, it 
might be worth saying something about the possibility of hangs if 
add_signal_handler() isn't used.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-05-11 Thread Chris Jerdonek


Chris Jerdonek  added the comment:

I was able to simplify the script a lot more and continue to reproduce the 
hang. It's attached as test-kill3.py (80 lines). It doesn't use 
subprocess_exec() or a watcher anymore -- just subprocess.Popen() followed by 
popen.kill(), and then awaiting on a future.

The right amount of time has to elapse between the popen.kill() and the await, 
so I introduced a random bit of variability in between. The right range / 
amount of time to put in between probably depends on the machine. (What you 
want is a narrow range right on the borderline, where sometimes the signal 
fires right before the await, and sometimes right after.) I also added a 
printf() statement at the beginning of signalmodule.c's trip_signal(), so I can 
see in the console whether the signal is firing before or after the await. In 
the timeout / hang case, the signal will be firing after. The hang is very 
infrequent with the script, though (less frequent than the original unittest). 
It can take multiple 1-minute runs.

It seems similar issues have happened a number of times in the past around the 
signal-handling code. See e.g. https://bugs.python.org/issue30038 and changes 
to the neighboring code since then.

--
Added file: https://bugs.python.org/file49150/test-kill3.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-05-09 Thread Chris Jerdonek


Chris Jerdonek  added the comment:

> 4. use call_later() to terminate the future after 5 seconds

You should read that as "3.5" (I inserted it later).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-05-09 Thread Chris Jerdonek


Chris Jerdonek  added the comment:

I came up with the script by (1) running the test locally and seeing the same 
hang, (2) moving the test function to its own script separate from the unit 
tests and seeing the same hang, and (3) successively stripping away code while 
continuing to check for the same hang. So it should be equivalent.

As for why it's related to signals, it's because of what the script does (it's 
not waiting on a subprocess). All it does is start an event loop and then do 
the following repeatedly:

1. starts a subprocess that sleeps indefinitely
2. create an empty future
3. set a SIGCHLD handler that calls set_result() on the future
4. use call_later() to terminate the future after 5 seconds
4. kill the subprocess
5. await on the future

Almost all of the time, (5) completes immediately (because the handler is 
called immediately). But sometimes, (5) takes 5 seconds (which means the 
timeout fired). And in the cases it takes 5 seconds, I'm able to observe both 
that (a) Python received the SIGCHLD right away, and (b) the signal handler 
only gets called when the loop is woken up by the call_later(). So during the 
await in (5), it seems like Python is holding onto the signal for 5 seconds 
without calling its signal handler.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-05-09 Thread Nathaniel Smith


Nathaniel Smith  added the comment:

How do you know that your reproducer is showing the same bug, or anything
related to signals? IIUC subprocess waiting by default doesn't involve a
signal handler.

On Sat, May 9, 2020, 14:40 Chris Jerdonek  wrote:

>
> Chris Jerdonek  added the comment:
>
> > this seems like an awful lot of energy to spend on some code
> that's not even used by default.
>
> True, but it seems likely to me that this signals :) a deeper, more
> general issue about CPython / signals (which is why I spent time on it).
> For example, my reproducer script doesn't use MultiLoopWatcher. I'd like to
> see if it can be reproduced with signals other than SIGCHLD, too.
>
> --
>
> ___
> Python tracker 
> 
> ___
>

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-05-09 Thread Chris Jerdonek


Chris Jerdonek  added the comment:

> this seems like an awful lot of energy to spend on some code
that's not even used by default.

True, but it seems likely to me that this signals :) a deeper, more general 
issue about CPython / signals (which is why I spent time on it). For example, 
my reproducer script doesn't use MultiLoopWatcher. I'd like to see if it can be 
reproduced with signals other than SIGCHLD, too.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-05-09 Thread Kyle Stanley


Kyle Stanley  added the comment:

> Would it make sense to deprecate it and stick with ThreadedChildWatcher?

I had proposed deprecating it in another bpo issue, but it was brought up that 
MutliLoopWatcher had only been added recently in 3.8. So, it might be a bit too 
soon to deprecate it already.

Although I suppose that if there are several complex and difficult to resolve 
issues with the implementation, it may very well be less damaging to deprecate 
it now rather than after it's gained a decent a decent amount of usage.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-05-09 Thread Nathaniel Smith


Nathaniel Smith  added the comment:

I don't have any particular insight into the bug, but I do have an
observation: this seems like an awful lot of energy to spend on some code
that's not even used by default. Would it make sense to deprecate it and
stick with ThreadedChildWatcher?

On Sat, May 9, 2020, 05:39 Chris Jerdonek  wrote:

>
> Chris Jerdonek  added the comment:
>
> I'm adding Nathaniel in case he can recognize this as one of the signal
> handler cases he's already come across.
>
> --
> nosy: +njs
>
> ___
> Python tracker 
> 
> ___
>

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-05-09 Thread Chris Jerdonek


Chris Jerdonek  added the comment:

I'm adding Nathaniel in case he can recognize this as one of the signal handler 
cases he's already come across.

--
nosy: +njs

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-05-09 Thread Chris Jerdonek


Chris Jerdonek  added the comment:

I'm attaching a slightly simpler version of the script.

--
Added file: https://bugs.python.org/file49145/test-kill2.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-05-09 Thread Chris Jerdonek


Chris Jerdonek  added the comment:

I'm attaching a stand-alone script that can reproduce the issue. It doesn't use 
unittest or even MultiLoopChildWatcher.

It starts an event loop and then repeatedly calls loop.subprocess_exec() with 
0.2 seconds in between until the "hang" happens (which shows up as a timeout). 
I recommend running the script for about 15 seconds, and if it doesn't happen, 
re-run it again. You might need to run it a half-dozen or dozen times to see 
the hang, but it can also happen right away.

I'm sure the script can be cleaned up and simplified a lot more. This is just a 
start. I wanted to see how much of the cruft I could strip out quickly.

This is what the output looks like after one of the hangs:

[81]: 16.77
/.../cpython/Lib/subprocess.py:1048: ResourceWarning: subprocess 3282 is still 
running
  _warn("subprocess %s is still running" % self.pid,
ResourceWarning: Enable tracemalloc to get the object allocation traceback
killing pid: 3283
BaseSubprocessTransport: awaiting in _wait
_sig_child: started
releasing waiter: okay
okay
[82]: 16.99
/.../cpython/Lib/subprocess.py:1048: ResourceWarning: subprocess 3283 is still 
running
  _warn("subprocess %s is still running" % self.pid,
ResourceWarning: Enable tracemalloc to get the object allocation traceback
killing pid: 3284
BaseSubprocessTransport: awaiting in _wait
_sig_child: started
releasing waiter: **TIMEOUT**
not okay: **TIMEOUT**

You can ignore the ResourceWarnings. You can also see at the end that the 
_sig_child() handler was called even in the timeout case (right before the 
loop.call_later(TIMEOUT, ...) callback began).

--
Added file: https://bugs.python.org/file49144/test-kill.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-05-08 Thread Chris Jerdonek


Chris Jerdonek  added the comment:

I looked into this a little after reproducing it locally.

What I found is that MultiLoopChildWatcher._sig_chld() *is* called. It's just 
that it's only called immediately after timeout=5 has elapsed. (The timeout=5 
was added by Andrew here: 
https://github.com/python/cpython/blob/7f7e706d78ab968a1221c6179dfdba714860bd12/Lib/test/test_asyncio/test_subprocess.py#L480
 )

Consider this line in asyncio.tasks.wait_for(), which is to trigger the timeout:
https://github.com/python/cpython/blob/7f7e706d78ab968a1221c6179dfdba714860bd12/Lib/asyncio/tasks.py#L476

  timeout_handle = loop.call_later(timeout, _release_waiter, waiter)

I put some debug statements inside _release_waiter, and I found that 
_sig_chld() is called after the timeout has elapsed and before _release_waiter 
starts. So basically, it looks like CPython is holding onto the signal, and 
waiting for the loop to do something more before running the handler and 
calling the _sig_chld().

The code base already has logic to skip running signal handlers in various 
cases, but I don't know whether it relates to this issue:
https://github.com/python/cpython/blob/7f7e706d78ab968a1221c6179dfdba714860bd12/Python/ceval.c#L1410-L1425

It seems like there are a number of issues on the tracker related to signals 
(some solved and others not, e.g. https://bugs.python.org/issue21895 ). So it 
looks to me like this could point to a deeper issue between asyncio and 
CPython's signal handling.

--
nosy: +chris.jerdonek

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-05-08 Thread Kyle Stanley


Change by Kyle Stanley :


--
pull_requests: +19324
pull_request: https://github.com/python/cpython/pull/20013

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-05-08 Thread Kyle Stanley


Kyle Stanley  added the comment:

> What other MultiLoopWatcher tests are currently having random hangs?

Oh, never mind. I see `test_stdin_stdout` is also hanging in 
https://bugs.python.org/issue38182. I would like to take a closer look at that 
one before skipping it as well.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-05-08 Thread Kyle Stanley


Kyle Stanley  added the comment:

> There are more MultiLoopWatcher tests which hang randomly, it's not only 
> test_close_kill_running().

What other MultiLoopWatcher tests are currently having random hangs? From 
searching "MultiLoopWatcher" on bpo, I'm only finding this issue. For now, I'll 
just work on a PR to temporarily skip `test_close_kill_running()` until we can 
find a fix.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-05-06 Thread STINNER Victor


STINNER Victor  added the comment:

> we may want to consider skipping `test_close_kill_running` for 
> `MultiLoopWatcher` until we can find one

There are more MultiLoopWatcher tests which hang randomly, it's not only 
test_close_kill_running().

I'm fine with skipping tests until someone can come up with a fix.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-05-05 Thread Kyle Stanley


Kyle Stanley  added the comment:

I have also explored a few different solutions and was unable to find a fix for 
this issue. If it's still causing intermittent hangs after Andrew's patch, we 
may want to consider skipping `test_close_kill_running` for `MultiLoopWatcher` 
until we can find one. This clearly seems to be a complex issue without an easy 
solution, and I don't think we want it to potentially mask other issues in the 
meantime.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-05-04 Thread STINNER Victor


STINNER Victor  added the comment:

Issue open since 2019-09-30 and tests still hang randomly. What's the progress 
on this issue?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-02-11 Thread Andrew Svetlov


Andrew Svetlov  added the comment:

Victor, sorry.
I spent about 4 days in total trying to figure out what is the reason for 
hanging.
I see that the signal handler is installed; but, in a rare situation, it is not 
called when the process is killed.

If I reduce an amount of code executed in a signal handler by spawning a 
auxiliary thread, setting threading. Event in sighandler, and doing all dirty 
work in this secondary thread -- the problem goes away.
I'm not happy with this "patch" because it looks like reducing a change of the 
race without the elimination of the root cause.

I've mastered a quick hack to skip the problematic test on a hang, it may 
unblock other developers.
I want to spend more time on the issue detection. Unfortunately, it is not easy 
because of non-determenistic problem source.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-02-11 Thread Andrew Svetlov


Change by Andrew Svetlov :


--
keywords: +patch
pull_requests: +17831
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/18457

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-02-10 Thread STINNER Victor


STINNER Victor  added the comment:

This issue is still failing frequently on buildbots. What can be done to stop 
getting test failures on buildbots?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-02-07 Thread STINNER Victor


STINNER Victor  added the comment:

I marked bpo-38182 as a duplicate of this issue: "test_asyncio: 
SubprocessMultiLoopWatcherTests.test_stdin_stdout() failed on AMD64 FreeBSD 
10-STABLE Non-Debug 3.x".

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-01-29 Thread STINNER Victor


STINNER Victor  added the comment:

> FYI, I'm able to reproduce the hang by running "python -m test -F 
> test_asyncio -m test_close_kill_running". Working on the fix.

Any update on this issue? It's still failing randomly on buildbots. Example:
https://buildbot.python.org/all/#/builders/158/builds/64

Another option is to skip the test_close_kill_running() until it's fixed which 
usually means forget about it until an user gets the bug for real in production.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-01-13 Thread Andrew Svetlov


Andrew Svetlov  added the comment:

FYI, I'm able to reproduce the hang by running "python -m test -F test_asyncio 
-m test_close_kill_running".

Working on the fix.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-01-13 Thread Andrew Svetlov


Andrew Svetlov  added the comment:

I'd like to spend time by reproducing the issue locally.
Please give me a few days.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2020-01-13 Thread STINNER Victor


STINNER Victor  added the comment:

The test still hangs randomly. Can it be disabled or fixed?

s390x Debian 3.x:
https://buildbot.python.org/all/#builders/105/builds/167

test_close_dont_kill_finished 
(test.test_asyncio.test_subprocess.SubprocessMultiLoopWatcherTests) ... ok
Timeout (0:15:00)!
Thread 0x03ffaa9f8700 (most recent call first):
  File 
"/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/selectors.py", 
line 468 in select
  File 
"/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/asyncio/base_events.py",
 line 1852 in _run_once
  File 
"/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/asyncio/base_events.py",
 line 596 in run_forever
  File 
"/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/asyncio/base_events.py",
 line 629 in run_until_complete
  File 
"/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/test/test_asyncio/test_subprocess.py",
 line 485 in test_close_kill_running
  File 
"/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/unittest/case.py", 
line 616 in _callTestMethod
  File 
"/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/unittest/case.py", 
line 659 in run
  File 
"/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/unittest/case.py", 
line 719 in __call__
  File 
"/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/unittest/suite.py",
 line 122 in run
  File 
"/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/unittest/suite.py",
 line 84 in __call__
  File 
"/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/unittest/suite.py",
 line 122 in run
  File 
"/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/unittest/suite.py",
 line 84 in __call__
  File 
"/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/unittest/suite.py",
 line 122 in run
  File 
"/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/unittest/suite.py",
 line 84 in __call__
  File 
"/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/unittest/suite.py",
 line 122 in run
  File 
"/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/unittest/suite.py",
 line 84 in __call__
  File 
"/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/unittest/suite.py",
 line 122 in run
  File 
"/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/unittest/suite.py",
 line 84 in __call__
  File 
"/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/unittest/runner.py",
 line 176 in run
  File 
"/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/test/support/__init__.py",
 line 2079 in _run_suite
  File 
"/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/test/support/__init__.py",
 line 2201 in run_unittest
  File 
"/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/test/libregrtest/runtest.py",
 line 209 in _test_module
  File 
"/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/test/libregrtest/runtest.py",
 line 234 in _runtest_inner2
  File 
"/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/test/libregrtest/runtest.py",
 line 270 in _runtest_inner
  File 
"/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/test/libregrtest/runtest.py",
 line 153 in _runtest
  File 
"/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/test/libregrtest/runtest.py",
 line 193 in runtest
  File 
"/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/test/libregrtest/main.py",
 line 318 in rerun_failed_tests
  File 
"/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/test/libregrtest/main.py",
 line 691 in _main
  File 
"/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/test/libregrtest/main.py",
 line 634 in main
  File 
"/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/test/libregrtest/main.py",
 line 712 in main
  File 
"/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/test/__main__.py", 
line 2 in 
  File "/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/runpy.py", 
line 86 in _run_code
  File "/home/dje/cpython-buildarea/3.x.edelsohn-debian-z/build/Lib/runpy.py", 
line 193 in _run_module_as_main
test_close_kill_running 
(test.test_asyncio.test_subprocess.SubprocessMultiLoopWatcherTests) ... 
Makefile:1164: recipe for target 'buildbottest' failed
make: *** [buildbottest] Error 1

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2019-10-25 Thread STINNER Victor


STINNER Victor  added the comment:

I did my tests on Fedora 30 on my laptop which has 4 cores (8 logical CPUs).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2019-10-25 Thread Kyle Stanley


Kyle Stanley  added the comment:

> I'm able to reproduce the issue locally using the command:
./python -m test test_asyncio 
--match=test.test_asyncio.test_subprocess.SubprocessMultiLoopWatcherTests.test_close_kill_running
 -v -F -j20 --timeout=30.0

I was unable to reproduce this locally on OS: Arch Linux 5.3.7 x86_64 and CPU: 
Intel i5-4460 from the latest commit (heads/master:96b06aefe2) within a 
reasonable number of tests (~1). I also tried differing number of jobs, up 
to 100. What OS were you able to reproduce it locally on using that 
configuration?

This is of course still an issue since it's still occurring on multiple 
buildbots intermittently and Victor was able to reproduce it locally, but I was 
not able to do so on my local setup. This will be significantly harder to debug 
without a reliable means of reproducing the failure.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2019-10-24 Thread Kyle Stanley


Change by Kyle Stanley :


--
nosy: +aeros

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2019-10-22 Thread STINNER Victor


STINNER Victor  added the comment:

It seems like MultiLoopChildWatcher doesn't respect the PEP 475: 
siginterrupt(False) must not be used. But the following change isn't enough to 
fix this issue (test_close_kill_running hang).

diff --git a/Lib/asyncio/unix_events.py b/Lib/asyncio/unix_events.py
index d8f653045a..887f837bad 100644
--- a/Lib/asyncio/unix_events.py
+++ b/Lib/asyncio/unix_events.py
@@ -1192,9 +1192,6 @@ class MultiLoopChildWatcher(AbstractChildWatcher):
"restore to default handler on watcher close.")
 self._saved_sighandler = signal.SIG_DFL
 
-# Set SA_RESTART to limit EINTR occurrences.
-signal.siginterrupt(signal.SIGCHLD, False)
-
 def _do_waitpid_all(self):
 for pid in list(self._callbacks):
 self._do_waitpid(pid)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

2019-10-22 Thread STINNER Victor


Change by STINNER Victor :


--
title: test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 
3.x -> asyncio: MultiLoopWatcher has a race condition (test_asyncio: 
test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com