[issue45723] Improve and simplify configure.ac checks
William Fisher added the comment:

In the conversion to PY_CHECK_FUNC, there's a mistake in HAVE_EPOLL. Python 3.10.1 defines HAVE_EPOLL by checking for the `epoll_create` function. Python 3.11.0a3 checks for the `epoll` function instead. There is no epoll() function, so this check always fails.

The effect is that `epoll` doesn't exist in the `select` module on Python 3.11.0a3. Most code that uses epoll falls back to another mechanism when it is not available, so this may not be failing any tests.

--
nosy: +byllyfish

___ Python tracker <https://bugs.python.org/issue45723> ___
___ Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)
William Fisher added the comment:

asyncio.MultiLoopChildWatcher has two problems that create a race condition.

1. The SIGCHLD signal handler does not guard against interruption/re-entry.
2. The SIGCHLD signal handler can interrupt add_child_handler's `self._do_waitpid(pid)`.

Symptoms: Log messages that look like this:

    1634935451.761 WARNING Unknown child process pid 8747, will report returncode 255
    ...
    1634935451.762 WARNING Child watcher got an unexpected pid: 8747
    Traceback (most recent call last):
      File "/Users/runner/hostedtoolcache/Python/3.9.7/x64/lib/python3.9/asyncio/unix_events.py", line 1306, in _do_waitpid
        loop, callback, args = self._callbacks.pop(pid)
    KeyError: 8747

Background: I've been working on a library to make calling asyncio subprocesses more convenient. As part of my testing, I've been stress testing asyncio using different child watcher policies. My CI build runs more than 200 tests each on macOS, Linux and FreeBSD. I've noticed a small percentage of sporadic failures using MultiLoopChildWatcher.

My understanding of Python signal functions is that:

1. Upon receipt of a signal, the native "C" signal handler sets a flag that indicates the signal arrived.
2. The main thread checks the signal flags after each bytecode instruction. If a signal flag is set, Python saves the call stack, runs the signal handler on the main thread immediately, then pops the stack when the handler returns (assuming the handler raises no exception).
3. If you are in the middle of a signal handler running on the main thread and Python detects another signal flag, your signal handler can be interrupted.
4. Stacked signal handlers run in LIFO order. The last one that enters will run to completion without interruption.

Explanation: I wrapped MultiLoopChildWatcher's _sig_chld function in a decorator that logs when it is entered and exited. This clearly shows that _sig_chld is being re-entered. In the following log snippet, I'm running two commands in a pipeline "tr | cat".
    1634935451.743 DEBUG process '/usr/bin/tr' created: pid 8747
    ...
    1634935451.746 DEBUG process '/bin/cat' created: pid 8748
    ...
    1634935451.761 DEBUG enter '_sig_chld' 20
    1634935451.761 DEBUG enter '_sig_chld' 20
    1634935451.761 WARNING Unknown child process pid 8747, will report returncode 255
    1634935451.762 DEBUG process 8748 exited with returncode 0
    1634935451.762 DEBUG exit '_sig_chld' 20
    1634935451.762 WARNING Child watcher got an unexpected pid: 8747
    Traceback (most recent call last):
      File "/Users/runner/hostedtoolcache/Python/3.9.7/x64/lib/python3.9/asyncio/unix_events.py", line 1306, in _do_waitpid
        loop, callback, args = self._callbacks.pop(pid)
    KeyError: 8747
    1634935451.763 WARNING Unknown child process pid 8748, will report returncode 255
    1634935451.763 WARNING Child watcher got an unexpected pid: 8748
    Traceback (most recent call last):
      File "/Users/runner/hostedtoolcache/Python/3.9.7/x64/lib/python3.9/asyncio/unix_events.py", line 1306, in _do_waitpid
        loop, callback, args = self._callbacks.pop(pid)
    KeyError: 8748
    1634935451.763 DEBUG exit '_sig_chld' 20

Here is the breakdown of what happens:

1. Pid 8747 exits and we enter _sig_chld #1.
2. _sig_chld #1 calls os.waitpid, which gives (pid, status) = (8747, 0).
3. Before _sig_chld #1 has a chance to call `self._callbacks.pop(pid)`, it is interrupted.
4. _sig_chld #2 calls os.waitpid for pid 8747. We get a ChildProcessError and then "Unknown child process pid 8747, will report returncode 255".
5. _sig_chld #2 invokes the callback for pid 8747, reporting returncode=255.
6. _sig_chld #2 continues to completion. It reaps pid 8748 normally.
7. _sig_chld #1 picks up where it left off. We get an error when we try to pop the callback for 8747.
8. _sig_chld #1 calls os.waitpid for pid 8748. This produces failure messages because the reaping was already done by _sig_chld #2.

The issue of interruption can also happen in the case of running a single process.
If _sig_chld interrupts the call to `self._do_waitpid(pid)` in add_child_handler, a similar interleaving can occur.

Work-Around: In my tests, I patched MultiLoopChildWatcher and so far, it appears to be more reliable. In add_child_handler, I call raise_signal(SIGCHLD) so that all the work is done in the signal handler.

    class PatchedMultiLoopChildWatcher(asyncio.MultiLoopChildWatcher):
        "Test race condition fixes in MultiLoopChildWatcher."

        def add_child_handler(self, pid, callback, *args):
            loop = asyncio.get_running_loop()
            self._callbacks[pid] = (loop, callback, args)
            # Prevent a race condition in case signal was delivered before
            # callback added.
            signal.raise_signal(signal.SIGCHLD)

        @_serialize
        def _sig_chld(self, signum, frame):
            super()._sig_chld(signum, frame)

_serialize is a decorator that looks like this:

    def _serialize(func):
        """Decorator to se
[issue45718] asyncio: MultiLoopWatcher has a race condition (Proposed work-around)
William Fisher added the comment:

Thanks, I will comment on bpo-38323 directly.

--
resolution: -> duplicate
stage: -> resolved
status: open -> closed

___ Python tracker <https://bugs.python.org/issue45718> ___
[issue45718] asyncio: MultiLoopWatcher has a race condition (Proposed work-around)
New submission from William Fisher:

Summary: asyncio.MultiLoopChildWatcher has two problems that create a race condition.

1. The SIGCHLD signal handler does not guard against interruption/re-entry.
2. The SIGCHLD signal handler can interrupt add_child_handler's `self._do_waitpid(pid)`.

This is a continuation of bpo-38323. That issue discussed two bugs. This issue proposes a work-around for one of them that may be useful in making build tests more reliable. I'm reserving discussion to the case of a single asyncio event loop on the main thread. (MultiLoopChildWatcher has a separate "signal-delivery-blocked" problem when used in an event loop that is not in the main thread, as mentioned in bpo-38323.)

Symptoms: Log messages that look like this:

    1634935451.761 WARNING Unknown child process pid 8747, will report returncode 255
    ...
    1634935451.762 WARNING Child watcher got an unexpected pid: 8747
    Traceback (most recent call last):
      File "/Users/runner/hostedtoolcache/Python/3.9.7/x64/lib/python3.9/asyncio/unix_events.py", line 1306, in _do_waitpid
        loop, callback, args = self._callbacks.pop(pid)
    KeyError: 8747

Background: I've been working on a library to make calling asyncio subprocesses more convenient. As part of my testing, I've been stress testing asyncio using different child watcher policies. My CI build runs more than 200 tests each on macOS, Linux and FreeBSD. I've noticed a small percentage of sporadic failures using MultiLoopChildWatcher.

My understanding of Python signal functions is that:

1. Upon receipt of a signal, the native "C" signal handler sets a flag that indicates the signal arrived.
2. The main thread checks the signal flags after each bytecode instruction. If a signal flag is set, Python saves the call stack, runs the signal handler on the main thread immediately, then pops the stack when the handler returns (assuming the handler raises no exception).
3. If you are in the middle of a signal handler running on the main thread and Python detects another signal flag, your signal handler can be interrupted.
4. Stacked signal handlers run in LIFO order. The last one that enters will run to completion without interruption.

Explanation: I wrapped MultiLoopChildWatcher's _sig_chld function in a decorator that logs when it is entered and exited. This clearly shows that _sig_chld is being re-entered. In the following log snippet, I'm running two commands in a pipeline "tr | cat".

    1634935451.743 DEBUG process '/usr/bin/tr' created: pid 8747
    ...
    1634935451.746 DEBUG process '/bin/cat' created: pid 8748
    ...
    1634935451.761 DEBUG enter '_sig_chld' 20
    1634935451.761 DEBUG enter '_sig_chld' 20
    1634935451.761 WARNING Unknown child process pid 8747, will report returncode 255
    1634935451.762 DEBUG process 8748 exited with returncode 0
    1634935451.762 DEBUG exit '_sig_chld' 20
    1634935451.762 WARNING Child watcher got an unexpected pid: 8747
    Traceback (most recent call last):
      File "/Users/runner/hostedtoolcache/Python/3.9.7/x64/lib/python3.9/asyncio/unix_events.py", line 1306, in _do_waitpid
        loop, callback, args = self._callbacks.pop(pid)
    KeyError: 8747
    1634935451.763 WARNING Unknown child process pid 8748, will report returncode 255
    1634935451.763 WARNING Child watcher got an unexpected pid: 8748
    Traceback (most recent call last):
      File "/Users/runner/hostedtoolcache/Python/3.9.7/x64/lib/python3.9/asyncio/unix_events.py", line 1306, in _do_waitpid
        loop, callback, args = self._callbacks.pop(pid)
    KeyError: 8748
    1634935451.763 DEBUG exit '_sig_chld' 20

Here is the breakdown of what happens:

1. Pid 8747 exits and we enter _sig_chld #1.
2. _sig_chld #1 calls os.waitpid, which gives (pid, status) = (8747, 0).
3. Before _sig_chld #1 has a chance to call `self._callbacks.pop(pid)`, it is interrupted.
4. _sig_chld #2 calls os.waitpid for pid 8747. We get a ChildProcessError and then "Unknown child process pid 8747, will report returncode 255".
5. _sig_chld #2 invokes the callback for pid 8747, reporting returncode=255.
6. _sig_chld #2 continues to completion. It reaps pid 8748 normally.
7. _sig_chld #1 picks up where it left off. We get an error when we try to pop the callback for 8747.
8. _sig_chld #1 calls os.waitpid for pid 8748. This produces failure messages because the reaping was already done by _sig_chld #2.

The issue of interruption can also happen in the case of running a single process. If _sig_chld interrupts the call to `self._do_waitpid(pid)` in add_child_handler, a similar interleaving can occur.

Work-Around: In my tests, I patched MultiLoopChildWatcher and so far, it appears to be more reliable. In add_child_handler, I call raise_signal(SIGCHLD) so that all the work is done in the signal handler.

    class PatchedMultiLoopChildWatcher(asyncio.MultiLoopChildWatcher):
        "Test race condition fixes in MultiLoopChildWatcher."

        def add_child_handler(self, pid, callback, *args)
[issue45074] asyncio hang in subprocess wait_closed() on Windows, BrokenPipeError
New submission from William Fisher:

I have a reproducible case where stdin.wait_closed() is hanging on Windows. This happens in response to a BrokenPipeError. The same code works fine on Linux and macOS. Please see the attached code for the demo.

I believe the hang is related to this debug message from the logs:

    DEBUG <_ProactorWritePipeTransport closing fd=632>: Fatal write error on pipe transport
    Traceback (most recent call last):
      File "C:\hostedtoolcache\windows\Python\3.9.6\x64\lib\asyncio\proactor_events.py", line 379, in _loop_writing
        f.result()
      File "C:\hostedtoolcache\windows\Python\3.9.6\x64\lib\asyncio\windows_events.py", line 812, in _poll
        value = callback(transferred, key, ov)
      File "C:\hostedtoolcache\windows\Python\3.9.6\x64\lib\asyncio\windows_events.py", line 538, in finish_send
        return ov.getresult()
    BrokenPipeError: [WinError 109] The pipe has been ended

It appears that the function that logs "Fatal write error on pipe transport" also calls _abort on the stream. If _abort is called before stdin.close(), everything is okay. If _abort is called after stdin.close(), stdin.wait_closed() will hang.

Please see issue #44428 for another instance of a similar hang in wait_closed().

--
components: asyncio
files: wait_closed.py
messages: 400810
nosy: asvetlov, byllyfish, yselivanov
priority: normal
severity: normal
status: open
title: asyncio hang in subprocess wait_closed() on Windows, BrokenPipeError
type: behavior
versions: Python 3.10, Python 3.9
Added file: https://bugs.python.org/file50250/wait_closed.py

___ Python tracker <https://bugs.python.org/issue45074> ___
[issue45008] asyncio.gather should not "dedup" awaitables
New submission from William Fisher:

asyncio.gather uses a dictionary to de-duplicate futures and coros. However, this can lead to problems when you pass an awaitable object (one that implements __await__ but isn't a future or coro).

1. Two or more awaitables may compare equal and hash alike, but still be expected to produce different results (see the RandBits class in gather_test.py).
2. If an awaitable doesn't support hashing, asyncio.gather doesn't work.

Would it be possible for non-future, non-coro awaitables to opt out of the dedup logic?

The attached file shows an awaitable RandBits class. Each time you await it, you should get a different result. Using gather, you will always get the same result.

--
components: asyncio
files: gather_test.py
messages: 400309
nosy: asvetlov, byllyfish, yselivanov
priority: normal
severity: normal
status: open
title: asyncio.gather should not "dedup" awaitables
type: behavior
versions: Python 3.9
Added file: https://bugs.python.org/file50236/gather_test.py

___ Python tracker <https://bugs.python.org/issue45008> ___