[issue31783] Race condition in ThreadPoolExecutor when scheduling new jobs while the interpreter shuts down

2019-07-01 Thread Ned Deily


Ned Deily  added the comment:

Yes, you can still add the backport labels even after the original PR has been 
merged.  It's magic!

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31783] Race condition in ThreadPoolExecutor when scheduling new jobs while the interpreter shuts down

2019-07-01 Thread Brian Quinlan


Brian Quinlan  added the comment:

Can I add "needs backport to 3.8" and "needs backport to 3.7" labels now or
do I have to use cherry_picker at this point?


--



[issue31783] Race condition in ThreadPoolExecutor when scheduling new jobs while the interpreter shuts down

2019-07-01 Thread Ned Deily


Ned Deily  added the comment:

> I don't know what the backport policy is.

It does seem that the devguide does not give much guidance on this; I've opened 
an issue about it (https://github.com/python/devguide/issues/503).  But, in 
general, if the fix is potentially beneficial and does not add undue risk or an 
incompatibility, we would generally consider backporting it to the currently 
active maintenance branches; at the moment, that would be 3.8 (in beta phase) 
and 3.7 (maintenance mode).  We have a lot of buildbot tests that show 
non-deterministic failures, some possibly due to concurrent.futures.  If there 
is a chance that this fix might mitigate those, I'd like to consider it for 
backporting.

--



[issue31783] Race condition in ThreadPoolExecutor when scheduling new jobs while the interpreter shuts down

2019-07-01 Thread Brian Quinlan


Brian Quinlan  added the comment:

I don't know what the backport policy is. The bug is only theoretical AFAIK,
i.e. someone noticed it through code observation, but it has not appeared in
the wild.


--



[issue31783] Race condition in ThreadPoolExecutor when scheduling new jobs while the interpreter shuts down

2019-07-01 Thread Ned Deily


Ned Deily  added the comment:

Brian, should this fix be backported to 3.8 and 3.7?  As it stands now, it will 
only be showing up in Python 3.9 unless you add the backport labels to the 
original PR.  Unless it cna be shown to be a security issue, it would not be 
appropriate to backport to 3.6 at this stage.

--
nosy: +ned.deily



[issue31783] Race condition in ThreadPoolExecutor when scheduling new jobs while the interpreter shuts down

2019-06-28 Thread Brian Quinlan


Change by Brian Quinlan :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed



[issue31783] Race condition in ThreadPoolExecutor when scheduling new jobs while the interpreter shuts down

2019-06-28 Thread Brian Quinlan


Brian Quinlan  added the comment:


New changeset 242c26f53edb965e9808dd918089e664c0223407 by Brian Quinlan in 
branch 'master':
bpo-31783: Fix a race condition creating workers during shutdown (#13171)
https://github.com/python/cpython/commit/242c26f53edb965e9808dd918089e664c0223407


--



[issue31783] Race condition in ThreadPoolExecutor when scheduling new jobs while the interpreter shuts down

2019-05-07 Thread Brian Quinlan


Change by Brian Quinlan :


--
pull_requests: +13087
stage:  -> patch review



[issue31783] Race condition in ThreadPoolExecutor when scheduling new jobs while the interpreter shuts down

2019-05-07 Thread Brian Quinlan


Brian Quinlan  added the comment:

I think that ProcessPoolExecutor might have a similar race condition - but not 
in exactly this code path since it would only be with the queue management 
thread (which is only started once).

--



[issue31783] Race condition in ThreadPoolExecutor when scheduling new jobs while the interpreter shuts down

2019-05-07 Thread Brian Quinlan

Brian Quinlan  added the comment:

Great report Steven!

I was able to reproduce this with the attached patch (just adds some sleeps and 
prints) and this script:

from threading import current_thread
from concurrent.futures import ThreadPoolExecutor
from time import sleep

pool = ThreadPoolExecutor(100)

def f():
    print("I'm running in: ", current_thread().name)

def g():
    print("I'm running in: ", current_thread().name)
    for _ in range(100):
        pool.submit(f)
        sleep(0.1)

pool.submit(g)
sleep(1.5)

The output for me was:

 Creating new thread:  ThreadPoolExecutor-0_0
I'm running in:  ThreadPoolExecutor-0_0
Setting _shutdown
 Killing 1 workers 
 Creating new thread:  ThreadPoolExecutor-0_1
I'm running in:  ThreadPoolExecutor-0_1

So another thread was created *after* shutdown.

It seems like the most obvious way to fix this is by adding a lock for the 
global _shutdown variable.

--
keywords: +patch
Added file: https://bugs.python.org/file48311/find-race.diff



[issue31783] Race condition in ThreadPoolExecutor when scheduling new jobs while the interpreter shuts down

2019-05-07 Thread Brian Quinlan


Change by Brian Quinlan :


--
assignee:  -> bquinlan



[issue31783] Race condition in ThreadPoolExecutor when scheduling new jobs while the interpreter shuts down

2017-10-20 Thread Terry J. Reedy

Change by Terry J. Reedy :


--
nosy: +bquinlan



[issue31783] Race condition in ThreadPoolExecutor when scheduling new jobs while the interpreter shuts down

2017-10-13 Thread Steven Barker

New submission from Steven Barker :

While investigating a Stack Overflow question (here: 
https://stackoverflow.com/q/46529767/1405065), I found that there may be a race 
condition in the cleanup code for concurrent.futures.ThreadPoolExecutor. The 
behavior in normal situations is fairly benign (the executor may run a few more 
jobs than you'd expect, but exits cleanly), but in rare situations it might 
lose track of a running thread and allow the interpreter to shut down while the 
thread is still trying to do work.

Here's an example that concisely demonstrates the situation where the issue 
can come up (it doesn't actually cause the race to go the wrong way on my 
system, but sets up the possibility for it to occur):


from threading import current_thread
from concurrent.futures import ThreadPoolExecutor
from time import sleep

pool = ThreadPoolExecutor(4)

def f(_):
    print(current_thread().name)
    future = pool.submit(sleep, 0.1)
    future.add_done_callback(f)

f(None)


The callback from completion of one job schedules another job, indefinitely.

When run in an interactive session, this code will print thread names forever. 
You should get "MainThread" once, followed by a bunch of 
"ThreadPoolExecutor-X_Y" names (often the same name will be repeated most of 
the time, due to the GIL I think, but in theory the work could rotate between 
threads). The main thread will return to the interactive REPL right away, so 
you can type in other stuff while the executor's worker threads are printing 
stuff in the background (I suggest running pool.shutdown() to make them stop). 
This is fine.

But if the example code is run as a script, you'll usually get "MainThread", 
followed by exactly four repeats of "ThreadPoolExecutor-0_0" (or fewer in the 
unlikely case that the race condition strikes you). That's the number of 
threads the ThreadPoolExecutor was limited to, but note that the thread name 
that gets printed will usually end with 0 every time (you don't get one output 
from each worker thread, just the same number of outputs as there are threads, 
all from the first thread). Why you get that number of outputs (instead of zero 
or one or an infinite number) was one part of the Stack Overflow question.

The answer turned out to be that after the main thread has queued up the first 
job in the ThreadPoolExecutor, it runs off the end of the script's code, so it 
starts shutting down the interpreter. The cleanup function _python_exit (in 
Lib/concurrent/futures/thread.py) gets run since it is registered with atexit, 
and it tries to signal the worker threads to shut down cleanly. However, the 
shutdown logic interacts oddly with an executor that's still spinning up its 
threads. It will only signal and join the threads that existed when it started 
running, not any new threads.
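
The signaling mechanism described above can be sketched roughly like this 
(simplified, not the actual thread.py code): shutdown is requested by pushing a 
None sentinel onto the shared work queue, and a worker that pops the sentinel 
re-queues it before exiting so any sibling workers also see it.

```python
import queue
import threading

work_queue = queue.Queue()

def worker():
    while True:
        item = work_queue.get()
        if item is None:
            # Sentinel: put it back so any other worker also wakes and exits.
            work_queue.put(None)
            return
        item()  # run the work item

t = threading.Thread(target=worker)
t.start()

results = []
work_queue.put(lambda: results.append("ran"))
work_queue.put(None)  # roughly what _python_exit does: signal shutdown
t.join()
```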

As it turns out, usually the newly spawned threads will shut themselves down 
immediately after they spawn, but as a side effect, the first worker thread 
carries on longer than expected, doing one additional job for each new thread 
that gets spawned and exiting itself only when the executor has a full set. 
This is why there are four outputs from the worker thread instead of some other 
number. But the exact behavior is dependent on the thread scheduling order, so 
there is a race condition.

You can demonstrate a change in behavior from different timing by putting a 
call to time.sleep at the very top of the _worker() function (delaying how 
quickly the new threads can get to the work queue). You should see the program 
behavior change to only print "ThreadPoolExecutor-0_0" once before exiting.
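
The underlying hazard can be shown outside the executor entirely; a cleanup 
routine that joins a snapshot of the currently existing threads will miss any 
thread spawned after it runs (hypothetical names, not the executor's code):

```python
import threading
import time

threads = []

def cleanup():
    # Joins only the threads that exist right now; any thread spawned
    # afterwards is never waited on, just as _python_exit never joins
    # workers spawned after it starts.
    for t in list(threads):
        t.join()

def spawn():
    t = threading.Thread(target=time.sleep, args=(0.05,))
    t.start()
    threads.append(t)

spawn()
cleanup()   # waits for the first thread only
spawn()     # this thread is invisible to the cleanup() call above
```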

Let's go through the steps of the process:

1. The main thread runs f() and schedules a job (which adds a work item to the 
executor's work queue). The first worker thread is spawned by the executor to 
run the job, since it doesn't have any threads yet. The main thread also sets a 
callback on the future to run f() again.

2. The main thread exits f() and reaches the end of the script, so it begins 
the interpreter shutdown process, including calling atexit functions. One of 
those is _python_exit, which adds a reference to None to the executor's work 
queue. Note that the None is added *after* the job item from step 1 (since 
they're both done by the same main thread). It then calls join() on the worker 
thread spawned in step 1, waiting for it to exit. It won't try to join any 
other threads that spawn later, since they don't exist yet.

3. The first worker thread spawned by the executor in step 1 begins running and 
pops an item off the work queue. The first item is a real job, so it runs it. 
(The first parts of this step may be running in parallel with step 2, but 
completing job will take much longer than step 2, so the rest of this step runs 
by itself after step 2 has finished.) Eventually the job ends and the callback 
function on the Future is called, which schedules another job (putting a job 
item