[issue46716] regrtest didn't respect the timeout when running test_subprocess on AMD64 Windows11 3.x

2022-03-23 Thread Jeremy Kloth


Change by Jeremy Kloth :


--
pull_requests: +30168
pull_request: https://github.com/python/cpython/pull/32081

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46716] regrtest didn't respect the timeout when running test_subprocess on AMD64 Windows11 3.x

2022-03-23 Thread Jeremy Kloth


Change by Jeremy Kloth :


--
keywords: +patch
pull_requests: +30167
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/32079

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46716] regrtest didn't respect the timeout when running test_subprocess on AMD64 Windows11 3.x

2022-02-21 Thread STINNER Victor


STINNER Victor  added the comment:

> It does seem that only the Windows Popen._wait() cannot handle negative 
> timeout values, so the fix should be as simple as coercing the timeout values 
> to >= 0.

Oh. This function should maybe raise an exception if the timeout is negative, 
and ther caller must replace negative timeout with zero.

> A judicious use of prints in subprocess.py, reveals that the timeout passed 
> to wait() ends up being negative.  That value, once cast to a DWORD, 
> ultimately causes a very long wait (0xfff2, in my testing).

This sounds dangerous and must be fixed. Python must not convert negative 
values to very large positive values.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46716] regrtest didn't respect the timeout when running test_subprocess on AMD64 Windows11 3.x

2022-02-14 Thread Eryk Sun


Eryk Sun  added the comment:

> 4) use Job objects to group Windows processes for termination

I think a separate issue should be created for this enhancement.

_winapi wrappers would be needed for CreateJobObjectW(), 
SetInformationJobObject(), AssignProcessToJobObject(), TerminatejobObject(), 
and ResumeThread(), plus the constant CREATE_SUSPENDED. I'd also prefer to make 
related changes to send_signal(), which would require GetConsoleProcessList() 
and GenerateConsoleCtrlEvent().

A new Popen option would be needed to configure whether the job allows 
descendants to break away via the process creation flag 
CREATE_BREAKAWAY_FROM_JOB. This should be allowed by default.

---

send_signal(): SIGKILL, SIGTERM, SIBREAK, SIGINT

Support Popen.kill(group=False) and Popen.terminate(group=False) on all 
platforms as Popen.send_signal(signal.SIGKILL, group=group) and 
Popen.send_signal(signal.SIGTERM, group=group).

The Universal CRT (ucrt) in Windows doesn't define SIGKILL. Even when it's not 
defined by the platform, signal.SIGKILL should always be defined, preferably as 
9 (unused by ucrt), else as an unused value in the range up to NSIG, else as 
NSIG + 1.

The `group` keyword-only option broadens the scope to the process group or job. 
A process is a group leader if it was created with the flag 
CREATE_NEW_PROCESS_GROUP (save self._creationflags) and its process ID is in 
GetConsoleProcessList().

For SIGKILL, always use forced termination. For SIGTERM, use forced termination 
either if `group` is false or if `group` is true and the process is not a group 
leader. To force termination, call TerminateJobObject(self._job_handle, 1) if 
`group` is true, else TerminateProcess(self._handle, 1). 

For SIGTERM, SIGBREAK, and SIGINT, call GenerateConsoleCtrlEvent() if `group` 
is true and the process is a group leader. For SIGTERM and SIGBREAK, send 
CTRL_BREAK_EVENT. For SIGINT, send CTRL_C_EVENT. 

Behavior to deprecate:

When `group` is false and `sig` is CTRL_C_EVENT (0) or CTRL_BREAK_EVENT (1), 
send_signal() always calls os.kill(). This legacy behavior tries to handle a 
process as if it's a process group, and it combines POSIX kill() with Windows 
API constants. subprocess should distance itself from the fundamental problems 
with os.kill() in Windows.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46716] regrtest didn't respect the timeout when running test_subprocess on AMD64 Windows11 3.x

2022-02-14 Thread Eryk Sun


Eryk Sun  added the comment:

> I fear the potential 3rd-party breakage alone should bump
> this to its own issue.

The change to the DWORD converter in _winapi should only be in 3.11+. If this 
causes problems for other projects, they're probably depending on undefined 
behavior in the standard library. No third-party package or application should 
use _winapi directly.

> 1) modify Windows Popen._wait() to raise on out of bounds 
> values [< 0 or >= INFINITE]

I think raising ValueError would be best at this level. For example:

if timeout is None:
timeout_millis = _winapi.INFINITE
else:
timeout_millis = int(timeout * 1000)
if timeout_millis >= _winapi.INFINITE:
raise ValueError("'timeout' exceeds the platform limit")
if timeout_millis < 0:
raise ValueError("'timeout' must be non-negative")

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46716] regrtest didn't respect the timeout when running test_subprocess on AMD64 Windows11 3.x

2022-02-13 Thread Jeremy Kloth


Jeremy Kloth  added the comment:

> > the fix should be as simple as coercing the timeout values to >= 0.
>
> Popen._remaining_time() should return max(endtime - _time(), 0).

That was my first initial instinct as well, however, that change would
also affect more of the Popen behavior and need a much more thorough
investigation of the POSIX side of Popen.

> Popen._wait() should raise OverflowError if the timeout is too large for the 
> implementation. In Windows, the upper limit in milliseconds is 
> `_winapi.INFINITE - 1` (about 49.7 days). It's important to only allow the 
> timeout in milliseconds to be _winapi.INFINITE when `timeout is None`.

I agree.

> The DWORD converter in _winapi needs to subclass unsigned_long_converter. The 
> current implementation based on the legacy format unit "k" is too lenient. 
> Negative values and values that are too large should fail.

Whilst I agree this is a correct solution, I fear the potential
3rd-party breakage alone should bump this to its own issue.

I believe that this then leads to the following action items for this issue:
1) modify Windows Popen._wait() to raise on out of bounds values [< 0
or >= INFINITE]
2) cap Popen._remaining_time() return value to >= 0
3) change _winapi DWORD converter be unsigned long
4) use Job objects to group Windows processes for termination

Have I missed anything?  I should be able to knock out PRs for these today.
-- 
Jeremy Kloth

--
nosy: +jeremy.kloth

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46716] regrtest didn't respect the timeout when running test_subprocess on AMD64 Windows11 3.x

2022-02-13 Thread Eryk Sun


Eryk Sun  added the comment:

> the fix should be as simple as coercing the timeout values to >= 0.

Popen._remaining_time() should return max(endtime - _time(), 0).

Popen._wait() should raise OverflowError if the timeout is too large for the 
implementation. In Windows, the upper limit in milliseconds is 
`_winapi.INFINITE - 1` (about 49.7 days). It's important to only allow the 
timeout in milliseconds to be _winapi.INFINITE when `timeout is None`.

The DWORD converter in _winapi needs to subclass unsigned_long_converter. The 
current implementation based on the legacy format unit "k" is too lenient. 
Negative values and values that are too large should fail. I updated it to use 
the following definition:

class DWORD_converter(unsigned_long_converter):
type = 'DWORD'

This produces the following improved results:

>>> h = _winapi.GetCurrentProcess()
>>> _winapi.WaitForSingleObject(h, _winapi.INFINITE + 1)
Traceback (most recent call last):
  File "", line 1, in 
OverflowError: Python int too large to convert to C unsigned long

>>> _winapi.WaitForSingleObject(h, -1)
Traceback (most recent call last):
  File "", line 1, in 
ValueError: value must be positive

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46716] regrtest didn't respect the timeout when running test_subprocess on AMD64 Windows11 3.x

2022-02-12 Thread Jeremy Kloth


Jeremy Kloth  added the comment:

I've been able locally to reproduce the test_subprocess hang.  The responsible 
function is subprocess.run().  The test case, test_timeout(), uses a small 
timeout value (0.0001), which, when given enough load, can cause the run() call 
to hang.

A judicious use of prints in subprocess.py, reveals that the timeout passed to 
wait() ends up being negative.  That value, once cast to a DWORD, ultimately 
causes a very long wait (0xfff2, in my testing).

It does seem that only the Windows Popen._wait() cannot handle negative timeout 
values, so the fix should be as simple as coercing the timeout values to >= 0.

--
Added file: https://bugs.python.org/file50623/process.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46716] regrtest didn't respect the timeout when running test_subprocess on AMD64 Windows11 3.x

2022-02-11 Thread Eryk Sun


Eryk Sun  added the comment:

> test_call_timeout() or test_timeout() in test_subprocess.py.

These tests don't override the standard files, and they only spawn a single 
child with no descendants. I don't see why this would hang. It shouldn't be a 
problem with leaked pipe handles (see bpo-43346). It probably will need to be 
diagnosed by attaching a debugger, or offline with a dump file.

> process trees whereas terminating a parent automatically kills the children

One can use a job object to manage a child process and all of its descendants, 
including resource usage and termination. A process can belong to multiple job 
objects in Windows 8+, which is required by Python 3.9+.

For reliability, the child has to be created in a suspended state via 
CREATE_SUSPENDED. It can be resumed with ResumeThread() after adding it to the 
job with AssignProcessToJobObject().

You can try to terminate a job cleanly, which is similar in effect to sending 
SIGTERM to a process group in POSIX. In Windows, this has to be approached 
differently for console vs graphical processes.

To handle console apps, assuming the child inherits the current console, spawn 
it as a new process group via 
creationflags=subprocess.CREATE_NEW_PROCESS_GROUP. You can request an exit by 
sending a Ctrl+Break event to the group via os.kill(p.pid, 
signal.CTRL_BREAK_EVENT) [1]. The request might be ignored, but typically the 
default handler is called, which calls ExitProcess().

To handle GUI apps, assuming the child inherits the current desktop (usually 
"WinSta0\Default"), first enumerate the top-level and message-only windows on 
the current desktop via EnumWindows() and FindWindowExW(). Use 
GetWindowThreadProcessId() to filter the list to include only windows that 
belong to the job. Post WM_CLOSE to each window in the job. A process might 
ignore a request to close. It could keep the window open or continue running in 
the background.

After an internal timeout, you can call TerminateJobObject() to kill any 
process in the job that remains alive. This is a forced and abrupt termination, 
which is similar to sending SIGKILL to a process group in POSIX.

---

[1] This usage of os.kill() is what we're stuck with. Rightfully, we should be 
using os.killpg(p.pid, signal.SIGBREAK) or os.kill(-p.pid, signal.SIGBREAK) 
(note the negative pid value).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46716] regrtest didn't respect the timeout when running test_subprocess on AMD64 Windows11 3.x

2022-02-10 Thread Jeremy Kloth


Jeremy Kloth  added the comment:

The test only completed once I purposefully terminated the offending Python 
process.  The only identifying information I noticed was the command-line of 
`-c "while True: pass"`, indicating it was stuck in either
test_call_timeout() or test_timeout() in test_subprocess.py.

Something to note is that Windows does not, by default, have a concept of 
process trees whereas terminating a parent automatically kills the children.  
Eryk Sun may have additional ideas on how this desired behavior could be 
accomplished.

--
nosy: +eryksun, jkloth

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46716] regrtest didn't respect the timeout when running test_subprocess on AMD64 Windows11 3.x

2022-02-10 Thread STINNER Victor


Change by STINNER Victor :


--
title: regrtest didn't respect the timeout on AMD64 Windows11 3.x -> regrtest 
didn't respect the timeout when running test_subprocess on AMD64 Windows11 3.x

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com