[issue45723] Improve and simplify configure.ac checks
William Fisher added the comment:

In the conversion to PY_CHECK_FUNC, there's a mistake in HAVE_EPOLL. Python 3.10.1 defines HAVE_EPOLL by checking for the `epoll_create` function. Python 3.11.0a3 checks for the `epoll` function instead. There is no epoll() function, so this check always fails.

The effect is that `epoll` doesn't exist in the `select` module on Python 3.11.0a3. Most code that uses epoll falls back to another mechanism when it is not available, so this may not be failing any tests.

--
nosy: +byllyfish

___ Python tracker <https://bugs.python.org/issue45723> ___
___ Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38323] asyncio: MultiLoopWatcher has a race condition (test_asyncio: test_close_kill_running() hangs on AMD64 RHEL7 Refleaks 3.x)
William Fisher added the comment:

asyncio.MultiLoopChildWatcher has two problems that create a race condition.

1. The SIGCHLD signal handler does not guard against interruption/re-entry.
2. The SIGCHLD signal handler can interrupt add_child_handler's `self._do_waitpid(pid)`.

Symptoms: Log messages that look like this:

    1634935451.761 WARNING Unknown child process pid 8747, will report returncode 255
    ...
    1634935451.762 WARNING Child watcher got an unexpected pid: 8747
    Traceback (most recent call last):
      File "/Users/runner/hostedtoolcache/Python/3.9.7/x64/lib/python3.9/asyncio/unix_events.py", line 1306, in _do_waitpid
        loop, callback, args = self._callbacks.pop(pid)
    KeyError: 8747

Background: I've been working on a library to make calling asyncio subprocesses more convenient. As part of my testing, I've been stress testing asyncio using different child watcher policies. My CI build runs more than 200 tests each on macOS, Linux and FreeBSD. I've noticed a small percentage of sporadic failures using MultiLoopChildWatcher.

My understanding of Python signal functions is that:

1. Upon receipt of a signal, the native "C" signal handler sets a flag that indicates the signal arrived.
2. The main thread checks the signal flags after each bytecode instruction. If a signal flag is set, Python saves the call stack, runs the signal handler on the main thread immediately, then pops the stack when the handler returns (assuming the handler raises no exception).
3. If you are in the middle of a signal handler running on the main thread and Python detects another signal flag, your signal handler can be interrupted.
4. Stacked signal handlers run in LIFO order. The last one that enters will run to completion without interruption.

Explanation: I wrapped MultiLoopChildWatcher's _sig_chld function in a decorator that logs when it is entered and exited. This clearly shows that _sig_chld is being re-entered. In the following log snippet, I'm running two commands in a pipeline "tr | cat".
    1634935451.743 DEBUG process '/usr/bin/tr' created: pid 8747
    ...
    1634935451.746 DEBUG process '/bin/cat' created: pid 8748
    ...
    1634935451.761 DEBUG enter '_sig_chld' 20
    1634935451.761 DEBUG enter '_sig_chld' 20
    1634935451.761 WARNING Unknown child process pid 8747, will report returncode 255
    1634935451.762 DEBUG process 8748 exited with returncode 0
    1634935451.762 DEBUG exit '_sig_chld' 20
    1634935451.762 WARNING Child watcher got an unexpected pid: 8747
    Traceback (most recent call last):
      File "/Users/runner/hostedtoolcache/Python/3.9.7/x64/lib/python3.9/asyncio/unix_events.py", line 1306, in _do_waitpid
        loop, callback, args = self._callbacks.pop(pid)
    KeyError: 8747
    1634935451.763 WARNING Unknown child process pid 8748, will report returncode 255
    1634935451.763 WARNING Child watcher got an unexpected pid: 8748
    Traceback (most recent call last):
      File "/Users/runner/hostedtoolcache/Python/3.9.7/x64/lib/python3.9/asyncio/unix_events.py", line 1306, in _do_waitpid
        loop, callback, args = self._callbacks.pop(pid)
    KeyError: 8748
    1634935451.763 DEBUG exit '_sig_chld' 20

Here is the breakdown of what happens:

1. Pid 8747 exits and we enter _sig_chld #1.
2. _sig_chld #1 calls os.waitpid, which gives (pid, status) = (8747, 0).
3. Before _sig_chld #1 has a chance to call `self._callbacks.pop(pid)`, it is interrupted.
4. _sig_chld #2 calls os.waitpid for pid 8747. We get a ChildProcessError and then "Unknown child process pid 8747, will report returncode 255".
5. _sig_chld #2 invokes the callback for pid 8747, reporting returncode=255.
6. _sig_chld #2 continues to completion. It reaps pid 8748 normally.
7. _sig_chld #1 picks up where it left off. We get an error when we try to pop the callback for 8747.
8. _sig_chld #1 calls os.waitpid for pid 8748. This produces failure messages because the reaping was already done by _sig_chld #2.

The issue of interruption can also happen in the case of running a single process.
If _sig_chld interrupts the call to `self._do_waitpid(pid)` in add_child_handler, a similar interleaving can occur.

Work-Around: In my tests, I patched MultiLoopChildWatcher and so far, it appears to be more reliable. In add_child_handler, I call raise_signal(SIGCHLD) so that all the work is done in the signal handler.

    class PatchedMultiLoopChildWatcher(asyncio.MultiLoopChildWatcher):
        "Test race condition fixes in MultiLoopChildWatcher."

        def add_child_handler(self, pid, callback, *args):
            loop = asyncio.get_running_loop()
            self._callbacks[pid] = (loop, callback, args)
            # Prevent a race condition in case signal was delivered before
            # callback added.
            signal.raise_signal(signal.SIGCHLD)

        @_serialize
        def _sig_chld(self, signum, frame):
            super()._sig_chld(signum, frame)

_serialize is a decorator that looks like this:

    def _serialize(func):
        """Decorator to se
[issue45718] asyncio: MultiLoopWatcher has a race condition (Proposed work-around)
William Fisher added the comment:

Thanks, I will comment on bpo-38323 directly.

--
resolution: -> duplicate
stage: -> resolved
status: open -> closed

___ Python tracker <https://bugs.python.org/issue45718> ___
[issue45718] asyncio: MultiLoopWatcher has a race condition (Proposed work-around)
New submission from William Fisher:

Summary: asyncio.MultiLoopChildWatcher has two problems that create a race condition.

1. The SIGCHLD signal handler does not guard against interruption/re-entry.
2. The SIGCHLD signal handler can interrupt add_child_handler's `self._do_waitpid(pid)`.

This is a continuation of bpo-38323. That issue discussed two bugs. This issue proposes a work-around for one of them that may be useful in making build tests more reliable. I'm reserving discussion to the case of a single asyncio event loop on the main thread. (MultiLoopChildWatcher has a separate "signal-delivery-blocked" problem when used in an event loop that is not in the main thread, as mentioned in bpo-38323.)

Symptoms: Log messages that look like this:

    1634935451.761 WARNING Unknown child process pid 8747, will report returncode 255
    ...
    1634935451.762 WARNING Child watcher got an unexpected pid: 8747
    Traceback (most recent call last):
      File "/Users/runner/hostedtoolcache/Python/3.9.7/x64/lib/python3.9/asyncio/unix_events.py", line 1306, in _do_waitpid
        loop, callback, args = self._callbacks.pop(pid)
    KeyError: 8747

Background: I've been working on a library to make calling asyncio subprocesses more convenient. As part of my testing, I've been stress testing asyncio using different child watcher policies. My CI build runs more than 200 tests each on macOS, Linux and FreeBSD. I've noticed a small percentage of sporadic failures using MultiLoopChildWatcher.

My understanding of Python signal functions is that:

1. Upon receipt of a signal, the native "C" signal handler sets a flag that indicates the signal arrived.
2. The main thread checks the signal flags after each bytecode instruction. If a signal flag is set, Python saves the call stack, runs the signal handler on the main thread immediately, then pops the stack when the handler returns (assuming the handler raises no exception).
3. If you are in the middle of a signal handler running on the main thread and Python detects another signal flag, your signal handler can be interrupted.
4. Stacked signal handlers run in LIFO order. The last one that enters will run to completion without interruption.

Explanation: I wrapped MultiLoopChildWatcher's _sig_chld function in a decorator that logs when it is entered and exited. This clearly shows that _sig_chld is being re-entered. In the following log snippet, I'm running two commands in a pipeline "tr | cat".

    1634935451.743 DEBUG process '/usr/bin/tr' created: pid 8747
    ...
    1634935451.746 DEBUG process '/bin/cat' created: pid 8748
    ...
    1634935451.761 DEBUG enter '_sig_chld' 20
    1634935451.761 DEBUG enter '_sig_chld' 20
    1634935451.761 WARNING Unknown child process pid 8747, will report returncode 255
    1634935451.762 DEBUG process 8748 exited with returncode 0
    1634935451.762 DEBUG exit '_sig_chld' 20
    1634935451.762 WARNING Child watcher got an unexpected pid: 8747
    Traceback (most recent call last):
      File "/Users/runner/hostedtoolcache/Python/3.9.7/x64/lib/python3.9/asyncio/unix_events.py", line 1306, in _do_waitpid
        loop, callback, args = self._callbacks.pop(pid)
    KeyError: 8747
    1634935451.763 WARNING Unknown child process pid 8748, will report returncode 255
    1634935451.763 WARNING Child watcher got an unexpected pid: 8748
    Traceback (most recent call last):
      File "/Users/runner/hostedtoolcache/Python/3.9.7/x64/lib/python3.9/asyncio/unix_events.py", line 1306, in _do_waitpid
        loop, callback, args = self._callbacks.pop(pid)
    KeyError: 8748
    1634935451.763 DEBUG exit '_sig_chld' 20

Here is the breakdown of what happens:

1. Pid 8747 exits and we enter _sig_chld #1.
2. _sig_chld #1 calls os.waitpid, which gives (pid, status) = (8747, 0).
3. Before _sig_chld #1 has a chance to call `self._callbacks.pop(pid)`, it is interrupted.
4. _sig_chld #2 calls os.waitpid for pid 8747. We get a ChildProcessError and then "Unknown child process pid 8747, will report returncode 255".
5. _sig_chld #2 invokes the callback for pid 8747, reporting returncode=255.
6. _sig_chld #2 continues to completion. It reaps pid 8748 normally.
7. _sig_chld #1 picks up where it left off. We get an error when we try to pop the callback for 8747.
8. _sig_chld #1 calls os.waitpid for pid 8748. This produces failure messages because the reaping was already done by _sig_chld #2.

The issue of interruption can also happen in the case of running a single process. If _sig_chld interrupts the call to `self._do_waitpid(pid)` in add_child_handler, a similar interleaving can occur.

Work-Around: In my tests, I patched MultiLoopChildWatcher and so far, it appears to be more reliable. In add_child_handler, I call raise_signal(SIGCHLD) so that all the work is done in the signal handler.

    class PatchedMultiLoopChildWatcher(asyncio.MultiLoopChildWatcher):
        "Test race condition fixes in MultiLoopChildWatcher."

        def add_child_handler(self, pid, callback, *args)
[issue45074] asyncio hang in subprocess wait_closed() on Windows, BrokenPipeError
New submission from William Fisher:

I have a reproducible case where stdin.wait_closed() is hanging on Windows. This happens in response to a BrokenPipeError. The same code works fine on Linux and macOS. Please see the attached code for the demo.

I believe the hang is related to this debug message from the logs:

    DEBUG <_ProactorWritePipeTransport closing fd=632>: Fatal write error on pipe transport
    Traceback (most recent call last):
      File "C:\hostedtoolcache\windows\Python\3.9.6\x64\lib\asyncio\proactor_events.py", line 379, in _loop_writing
        f.result()
      File "C:\hostedtoolcache\windows\Python\3.9.6\x64\lib\asyncio\windows_events.py", line 812, in _poll
        value = callback(transferred, key, ov)
      File "C:\hostedtoolcache\windows\Python\3.9.6\x64\lib\asyncio\windows_events.py", line 538, in finish_send
        return ov.getresult()
    BrokenPipeError: [WinError 109] The pipe has been ended

It appears that the function that logs "Fatal write error on pipe transport" also calls _abort on the stream. If _abort is called before stdin.close(), everything is okay. If _abort is called after stdin.close(), stdin.wait_closed() will hang.

Please see issue #44428 for another instance of a similar hang in wait_closed().

--
components: asyncio
files: wait_closed.py
messages: 400810
nosy: asvetlov, byllyfish, yselivanov
priority: normal
severity: normal
status: open
title: asyncio hang in subprocess wait_closed() on Windows, BrokenPipeError
type: behavior
versions: Python 3.10, Python 3.9
Added file: https://bugs.python.org/file50250/wait_closed.py

___ Python tracker <https://bugs.python.org/issue45074> ___
[issue45008] asyncio.gather should not "dedup" awaitables
New submission from William Fisher:

asyncio.gather uses a dictionary to de-duplicate futures and coros. However, this can lead to problems when you pass an awaitable object (one that implements __await__ but isn't a future or coro).

1. Two or more awaitables may compare equal and hash alike, but still be expected to produce different results (see the RandBits class in gather_test.py).
2. If an awaitable doesn't support hashing, asyncio.gather doesn't work.

Would it be possible for non-future, non-coro awaitables to opt out of the dedup logic?

The attached file shows an awaitable RandBits class. Each time you await it, you should get a different result. Using gather, you will always get the same result.

--
components: asyncio
files: gather_test.py
messages: 400309
nosy: asvetlov, byllyfish, yselivanov
priority: normal
severity: normal
status: open
title: asyncio.gather should not "dedup" awaitables
type: behavior
versions: Python 3.9
Added file: https://bugs.python.org/file50236/gather_test.py

___ Python tracker <https://bugs.python.org/issue45008> ___