Josh Rosenberg <[email protected]> added the comment:
In response to Max's comments:
>But consider the case where input is produced slower than it can be processed
>(`iterables` may fetch data from a database, but the callable `fn` may be a
>fast in-memory transformation). Now suppose the `Executor.map` is called when
>the pool is busy, so there'll be a delay before processing begins. In this
>case, the most efficient approach is to get as much input as possible while
>the pool is busy, since eventually (when the pool is freed up) it will become
>the bottleneck. This is exactly what the current implementation does.
I'm not sure the "slow input iterable, fast task, competing tasks from other
sources" case is all that interesting. Executor.map is usually used in the
first place as a replacement for complex task submission; perhaps my viewpoint
is blinkered, but I see Executors used for *either* explicit use of submit
*or* map, rather than mixing and matching (you might use one for both, but
rarely interleave the usages). Without a mix-and-match scenario (and
importantly, one where enough work is submitted before the map to occupy all
workers, and so little work is submitted after the map begins that additional
map input is requested while workers sit idle), the smallish default prefetch
is an improvement, simply by virtue of getting initial results back more
quickly.
The solution of making a dedicated input thread would introduce quite a lot of
additional complexity, well beyond what I think is justifiable for a relatively
niche use case, especially one with many available workarounds, e.g.
1. Raising the prefetch count explicitly (see the first sketch after this
list)
2. Having the caller listify the iterable (similar to passing an arbitrarily
huge prefetch value; a huge prefetch has the advantage of sending work to the
workers immediately, while listifying has the advantage of surfacing any input
exceptions up front rather than receiving them lazily during processing)
3. Using cheaper inputs (e.g. the query string, not the results of the DB
query) and performing the expensive work as part of the task (after all, the
whole point is to parallelize the most expensive work)
4. Using separate Executors so the manually submitted work doesn't interfere
with the mapped work, and vice versa
5. Making a separate ThreadPoolExecutor to generate the expensive input values
via its own map call (optionally with a larger prefetch count), e.g. instead
of:

    with SomeExecutor() as executor:
        for result in executor.map(func, (get_from_db(query) for query in queries)):
            ...

do:

    from concurrent.futures import ThreadPoolExecutor

    with SomeExecutor() as executor, ThreadPoolExecutor() as inputexec:
        inputs = inputexec.map(get_from_db, queries)
        for result in executor.map(func, inputs):
            ...
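
On workaround 1, a minimal sketch, assuming the patched signature in which
map() grows a prefetch keyword (it is not part of the current stdlib API);
get_from_db, func and queries are the same stand-ins as in the example above:

    from concurrent.futures import ThreadPoolExecutor

    def get_from_db(query):
        ...  # hypothetical: slow input generation

    def func(row):
        ...  # hypothetical: fast in-memory transformation

    queries = ["q1", "q2", "q3"]

    with ThreadPoolExecutor() as executor:
        # Under the proposed patch, prefetch bounds how many inputs map()
        # pulls ahead of the workers; raising it approximates the old eager
        # behavior without consuming the entire input up front.
        inputs = (get_from_db(q) for q in queries)
        for result in executor.map(func, inputs, prefetch=100):
            ...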
Point is, yes, there will still be niche cases where Executor.map isn't
perfect, but this patch is intentionally more minimal, to keep the Python code
base simple (no marshaling of exceptions across thread boundaries) and to
avoid extreme behavioral changes. It does entail some smaller changes, e.g. it
necessarily means input-iterator-triggered exceptions can now be raised after
some results have been successfully produced (a short example follows), but it
doesn't involve adding more implicit threading, marshaling exceptions across
threads, etc.
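
To illustrate that smaller change, a sketch, again assuming the patched,
lazier map(); square is a hypothetical stand-in task, and the exact point
where the exception surfaces depends on the prefetch count:

    from concurrent.futures import ThreadPoolExecutor

    def square(x):
        return x * x

    def bad_inputs():
        yield 1
        yield 2
        raise ValueError("input iterator failed")

    with ThreadPoolExecutor() as executor:
        results = executor.map(square, bad_inputs())
        print(next(results))  # 1 -- produced successfully
        print(next(results))  # 4 -- produced successfully
        next(results)         # only now does the ValueError surface

With the current eager implementation, by contrast, the ValueError would
propagate out of the map() call itself, before any result is yielded.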
Your proposed alternative, with a thread for prefetching inputs, a thread for
sending tasks, and a thread for returning results, creates a number of
problems:
1. As you mentioned, if no prefetch limit is imposed, memory usage remains
unbounded; if the input is cheap to generate and slow to process, memory
exhaustion is nearly guaranteed for infinite inputs, and increasingly likely
for "very large" inputs (a sketch of this failure mode follows the list). I'd
prefer the default arguments to be stable in (almost) all cases, rather than
trying to maximize performance for rare cases at the expense of stability in
many cases.
2. When input generation is CPU-bound, you've just introduced an additional
source of unavoidable GIL contention; granted, after the GIL fixes in 3.2,
contention tends to hurt less (before those fixes, I could easily occupy 1.9
cores doing 0.5 cores' worth of actual work with just two CPU-bound threads).
Particularly in the ProcessPoolExecutor case (where avoiding GIL contention is
the whole goal), it's a little weird if you can end up with unavoidable GIL
contention in the main process.
3. Exception handling from the input iterator becomes a nightmare. In a
"single thread performs input pulls and result yields" scenario, exceptions
from the input iterator naturally bubble up to the caller of Executor.map
(possibly after several results have been produced, but eventually). If a
separate thread is caching from the input iterator, we'd need to marshal the
exception from that thread back to the thread running Executor.map so it's
visible to the caller, and it's not obvious how to provide a traceback that is
both accurate and useful (see the second sketch after this list).
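
For the first point, a minimal sketch of the failure mode; note that the
current eager Executor.map already behaves this way, which is exactly what a
bounded prefetch avoids:

    import itertools
    import time
    from concurrent.futures import ThreadPoolExecutor

    def slow(x):
        time.sleep(1)  # cheap to generate, slow to process
        return x

    with ThreadPoolExecutor(max_workers=4) as executor:
        # With no prefetch bound, every input is pulled and a future is
        # created for it before the first result is yielded; for
        # itertools.count() that loop never ends, and memory grows
        # without bound.
        for result in executor.map(slow, itertools.count()):
            print(result)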
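
And for the third point, a minimal sketch of the marshaling machinery a
dedicated input thread would need; buffered is a hypothetical helper, not part
of any proposal here:

    import queue
    import threading

    _DONE = object()  # sentinel marking normal exhaustion of the input

    class _ExcWrapper:
        # Carries an exception raised by the input iterator across threads.
        def __init__(self, exc):
            self.exc = exc

    def buffered(iterable, maxsize=16):
        # Yield from `iterable`, prefetching up to `maxsize` items in a
        # background thread, re-raising any input exception in the caller.
        q = queue.Queue(maxsize)

        def feed():
            try:
                for item in iterable:
                    q.put(item)
            except BaseException as e:
                q.put(_ExcWrapper(e))  # marshal the exception to the consumer
            else:
                q.put(_DONE)

        threading.Thread(target=feed, daemon=True).start()
        while True:
            item = q.get()
            if item is _DONE:
                return
            if isinstance(item, _ExcWrapper):
                # Re-raised far from the next() call that "logically" failed;
                # the traceback mixes feeder frames with this raise site,
                # which is neither accurate nor particularly useful.
                raise item.exc
            yield item

Something like executor.map(func, buffered(get_inputs())) would then work, but
even this toy has sharp edges: a consumer that stops early leaves the feeder
thread blocked on q.put() forever (hence daemon=True), and the re-raised
traceback is misleading.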
----------
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue29842>
_______________________________________