[issue46726] Thread spuriously marked dead after interrupting a join call

2022-02-13 Thread Eryk Sun


Eryk Sun  added the comment:

> I didn't have that in mind at all.

I understood what you had in mind, and I don't disagree. I was just 
highlighting that the only somewhat important work done in _stop() is to clean 
up the _shutdown_locks set, to keep it from growing too large. But that task is 
already handled when creating a new thread, and it's arguably more important to 
handle it there.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46726] Thread spuriously marked dead after interrupting a join call

2022-02-13 Thread Tim Peters


Tim Peters  added the comment:

> It's nice that _maintain_shutdown_locks() gets
> called in _stop(), but the more important call site is in
> _set_tstate_lock().

I didn't have that in mind at all. What at the Python level cares whether the 
thread is alive? Well. is_alive() does, and it calls _wait_for_tstate_lock() 
__repr__() does, and it calls is_alive() (which I added years ago, else repr 
could keep claiming a thread was alive weeks ;-) after it actually ended). 
join() does, and it calls _wait_for_tstate_lock().

Anything else? Not off the top of my head. The point is that if 
_wait_for_tstate_lock() fails to call _stop() for some reason when it "should 
have" called it, it's not fatal. Anything that _cares_ will call it again. 
Indeed, it was all designed so that it could be called any number of times, 
redundantly or not, and without any explicit synchronization.

For the rest, I value clarity above all else here. This code has a long history 
of being the subject of bug reports. The cost of an "extra" call to .locked() 
is just trivial, especially given that calls to _wait_for_tstate_lock() are 
typically rare.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46726] Thread spuriously marked dead after interrupting a join call

2022-02-13 Thread Eryk Sun


Eryk Sun  added the comment:

> Anything at the Python level that cares whether the thread is 
> still alive will call _wait_for_tstate_lock() again

It's nice that _maintain_shutdown_locks() gets called in _stop(), but the more 
important call site is in _set_tstate_lock().

I typed up the example initially with try/finally. Then I changed it to avoid 
the extra lock.locked() call when there's no exception, but I forgot to add a 
`raise` statement:

try:
if lock.acquire_and_release(block, timeout):
self._stop
except:
if not lock.locked():
self._stop()
raise

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46726] Thread spuriously marked dead after interrupting a join call

2022-02-13 Thread Tim Peters


Tim Peters  added the comment:

>> is there a bulletproof way to guarantee that `self._stop()` gets 
>> called if the acquire_and_release() succeeds? 

> I don't think it's critical.

Agreed! Anything at the Python level that cares whether the thread is still 
alive will call _wait_for_tstate_lock() again, and get another chance each time 
to notice that the tstate lock has been freed.

Instead of:

try:
if lock.acquire_and_release(block, timeout):
self._stop
except:
if not lock.locked():
self._stop()

I'd prefer:

try:
lock.acquire_and_release(block, timeout)
finally:
if not lock.locked():
self._stop()

Because, at the Python level, whether acquire_and_release() succeeded at its 
task depends entirely on whether the lock is free afterward. That's what we're 
_really_ looking for, and is so on all possible platforms.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46726] Thread spuriously marked dead after interrupting a join call

2022-02-13 Thread Eryk Sun


Eryk Sun  added the comment:

> is there a bulletproof way to guarantee that `self._stop()` gets 
> called if the acquire_and_release() succeeds? 

I don't think it's critical. But we still should deal with the common case in 
Windows in which a KeyboardInterrupt is raised immediately after the method 
returns. For example:

try:
if lock.acquire_and_release(block, timeout):
self._stop
except:
if not lock.locked():
self._stop()

The proposed acquire_and_release() method can also be used in 
threading._shutdown(), when it iterates _shutdown_locks to join non-daemon 
threads.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46726] Thread spuriously marked dead after interrupting a join call

2022-02-12 Thread Tim Peters


Tim Peters  added the comment:

While bundling the lock.release() into C makes that bulletproof, is there a 
bulletproof way to guarantee that `self._stop()` gets called if the 
acquire_and_release() succeeds? Offhand, I don't see a reason for why that 
isn't just as vulnerable to getting skipped due to an unfortunate signal.

Indeed, huge mounds of threading.py can leave things in insane states in the 
presence of by-magic exceptions. Even if code is very careful to put crucial 
cleanup code in `finally` blocks so it gets executed "no matter what", there's 
nothing to stop code in `finally` blocks from getting skipped over due to 
by-magic exceptions too.

It's an eternal marvel that anything ever works at all ;-)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46726] Thread spuriously marked dead after interrupting a join call

2022-02-12 Thread Tim Peters


Tim Peters  added the comment:

> Maybe add an `acquire_and_release()` method

Bingo - that should do the trick, in an "obviously correct" way. Of course it's 
of limited applicability, but fine by me.

Will you open a PR with this code?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46726] Thread spuriously marked dead after interrupting a join call

2022-02-12 Thread Eryk Sun


Eryk Sun  added the comment:

> Wrap everything needed in a custom C function. 

Maybe add an `acquire_and_release()` method:

static PyObject *
lock_PyThread_acquire_and_release_lock(
lockobject *self, PyObject *args, PyObject *kwds)
{
_PyTime_t timeout;

if (lock_acquire_parse_args(args, kwds, ) < 0)
return NULL;

PyLockStatus r = acquire_timed(self->lock_lock, timeout);

if (r == PY_LOCK_INTR) {
return NULL;
}

if (r == PY_LOCK_ACQUIRED) {
PyThread_release_lock(self->lock_lock);
self->locked = 0;
}

return PyBool_FromLong(r == PY_LOCK_ACQUIRED);
}

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46726] Thread spuriously marked dead after interrupting a join call

2022-02-12 Thread Tim Peters


Tim Peters  added the comment:

Na, we've been doing excruciatingly clever stuff to deal with thread shutdown 
for decades, and it always proves to be wrong in some way. Even if code like

except:
if lock.locked():
lock.release()
self._stop()
raise

did work as hoped for, it would still be broken, but in a different way: 
suppose we really did acquire the tstate lock because the thread actually ended 
while we were in acquire(). But then a signal dumped us into the `except` block 
before doing the release. Oops! There's nothing I can see to stop _another_ 
(say) KeyboardInterrupt preventing us from doing the "failsafe" release too. So 
the lock remains locked forever after (we hold the lock now, and missed our 
chances to release it). And I think that's pretty likely: if I don't see an 
instant response to Ctrl-C, I'm likely to do another very soon after.

So I don't think catching exceptions can be made to work for this. Or `finally` 
blocks either. Indeed, it appears that any way whatsoever of spelling 
`lock.release()` in Python can be defeated by an unfortunately timed signal.

Which isn't unique to this code, of course. The failure modes of this code just 
happen to be unusually visible ;-)

Two other approaches come to mind:

- Wrap everything needed in a custom C function. CPython can't interfere with 
its control flow.

- Add new sys gimmicks to suppress and re-enable raising Python level 
exceptions for signals. Then, e.g., something here like:

with sys.delay_signal_exceptions():
# Dead simple code, like _before_ we "fixed" it ;-)
# In particular, while Ctrl-C may terminate the `acquire()` call,
# KeyboardInterrupt will not be raised until the `with` block
# exits.
# Possibly intractable: arranging then for the traceback to
# point at the code where the exception would have been raised
# had temporary suspension not been enabled. Then again, since
# it's not _actually_ raised there at the Python level, maybe
# it's a Good Thing to ignore.
if lock.acquire(block, timeout):
lock.release()
self._stop()

The second way is more general, but would probably require a PEP.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46726] Thread spuriously marked dead after interrupting a join call

2022-02-12 Thread Eryk Sun


Eryk Sun  added the comment:

> If the acquire() in fact times out, but the store to the `acquired` 
> variable is interrupted, `if _WINDOWS and acquired is None` will
> succeed, despite that the lock is still locked

Yeah, my proposed workaround is no good, so we can't resolve this without a 
more fundamental solution. Are you looking into a way to prevent the STORE_FAST 
instruction from getting interrupted by an asynchronous exception?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46726] Thread spuriously marked dead after interrupting a join call

2022-02-12 Thread Tim Peters


Tim Peters  added the comment:

Eryk, I don't think that workaround is solid on Windows in all cases. For 
example, if .join() is called with a timeout, the same timeout is passed to 
lock.acquire(block, timeout). If the acquire() in fact times out, but the store 
to the `acquired` variable is interrupted, `if _WINDOWS and acquired is None` 
will succeed, despite that the lock is still locked. Then we go on to - again - 
incorrectly release the lock and call _stop().

But please don't "repair" that: platform-specific tricks aren't on a path to an 
actual solution ;-) If the latter requires some new C code, fine.

--
nosy: +tim.peters

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46726] Thread spuriously marked dead after interrupting a join call

2022-02-12 Thread Eryk Sun


Eryk Sun  added the comment:

> The race on := is much smaller than the original race 
> and I suspect in practice will be very hard to hit.

In Windows, the acquire() method of a lock can't be interrupted. Thus, in the 
main thread, an exception from Ctrl+C gets raised as soon as acquire() returns. 
This exception definitely will interrupt the assignment. Here's a workaround:

global scope:

_WINDOWS = _sys.platform == 'win32'

in _wait_for_tstate_lock():

acquired = None

try:
if acquired := lock.acquire(block, timeout):
lock.release()
self._stop()
except:
if _WINDOWS and acquired is None:
acquired = True
if acquired:
lock.release()
self._stop()
raise

This doesn't help in POSIX if the STORE_FAST instruction that assigns 
`acquired` gets interrupted. This can't be distinguished from acquire() itself 
getting interrupted. But at least the window for this is as small as possible.

--
nosy: +eryksun

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46726] Thread spuriously marked dead after interrupting a join call

2022-02-12 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
nosy: +pitrou

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46726] Thread spuriously marked dead after interrupting a join call

2022-02-11 Thread Ben


Change by Ben :


--
nosy: +serhiy.storchaka

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46726] Thread spuriously marked dead after interrupting a join call

2022-02-11 Thread Ben


Ben  added the comment:

You are right,

What one really needs here is a way to know *who* owns the lock,
but threading.Lock does not provide that.

The race on := is much smaller than the original race and I suspect in practice 
will be very hard to hit.

As the original bpo notes, it may not be possible to write a complete fix for 
this in pure Python, after all there may always be another interrupt between 
the `except` and the second attempted `release`.

The issue with the solution was that it turned a relatively hard-to-hit race 
condition into a really easy-to-hit one,  but perhaps the outcome is slightly 
less worse?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46726] Thread spuriously marked dead after interrupting a join call

2022-02-11 Thread Kevin Shweh


Kevin Shweh  added the comment:

The PR you submitted doesn't work, unfortunately. It essentially reintroduces 
issue 45274. If this line:

if locked := lock.acquire(block, timeout):

gets interrupted between the acquire and the assignment, locked is still False. 
That's rare, but so is an interruption between the acquire and the release, 
which is the original form of issue 45274.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46726] Thread spuriously marked dead after interrupting a join call

2022-02-11 Thread Kevin Shweh


Kevin Shweh  added the comment:

Issue 45274 was a subtly different issue. That was a problem that happened if 
the thread got interrupted *between* the acquire and the release, causing it to 
*not* release the lock and *not* perform end-of-thread cleanup.

The fix for that issue caused this issue, which happens if the thread gets 
interrupted *during* the acquire, in which case it *does* release the lock 
(that someone else is holding) and *does* perform end-of-thread cleanup even 
though it's not supposed to do either of those things.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46726] Thread spuriously marked dead after interrupting a join call

2022-02-11 Thread James Gerity


Change by James Gerity :


--
nosy: +SnoopJeDi

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46726] Thread spuriously marked dead after interrupting a join call

2022-02-11 Thread Ben


Change by Ben :


--
nosy: +vstinner

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46726] Thread spuriously marked dead after interrupting a join call

2022-02-11 Thread Ben


Ben  added the comment:

This is a duplicate of https://bugs.python.org/issue45274
but the patch there did not fix it

I've just added a PR there (or should it go here?) that (i think) fixes this.

The issue is that the lock.locked() call just checks that *someone* has the 
lock, not that the previous acquire() is what got the lock.
If it's just that the tstate lock is held because the thread is still running,  
then it's premature to release() the lock.

--
nosy: +bjs

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46726] Thread spuriously marked dead after interrupting a join call

2022-02-11 Thread Kevin Shweh


New submission from Kevin Shweh :

This code in Thread._wait_for_tstate_lock:

try:
if lock.acquire(block, timeout):
lock.release()
self._stop()
except:
if lock.locked():
# bpo-45274: lock.acquire() acquired the lock, but the function
# was interrupted with an exception before reaching the
# lock.release(). It can happen if a signal handler raises an
# exception, like CTRL+C which raises KeyboardInterrupt.
lock.release()
self._stop()
raise

has a bug. The "if lock.locked()" check doesn't check whether this code managed 
to acquire the lock. It checks if *anyone at all* is holding the lock. The lock 
is almost always locked, so this code will perform a spurious call to 
self._stop() if it gets interrupted while trying to acquire the lock.

Thread.join uses this method to wait for a thread to finish, so a thread will 
spuriously be marked dead if you interrupt a join call with Ctrl-C while it's 
trying to acquire the lock. Here's a reproducer:


import time
import threading
 
event = threading.Event()
 
def target():
event.wait()
print('thread done')
 
t = threading.Thread(target=target)
t.start()
print('joining now')
try:
t.join()
except KeyboardInterrupt:
pass
print(t.is_alive())
event.set()


Interrupt this code with Ctrl-C during the join(), and print(t.is_alive()) will 
print False.

--
components: Library (Lib)
messages: 413106
nosy: Kevin Shweh
priority: normal
severity: normal
status: open
title: Thread spuriously marked dead after interrupting a join call
type: behavior
versions: Python 3.10, Python 3.11, Python 3.9

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com