[issue29842] Make Executor.map work with infinite/large inputs correctly

2021-05-04 Thread Leonard Lausen


Change by Leonard Lausen :


--
nosy: +leezu

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29842] Make Executor.map work with infinite/large inputs correctly

2021-01-14 Thread David Lukeš

David Lukeš  added the comment:

Any updates on this? Making Executor.map lazier would indeed be more consistent 
and very useful, it would be a shame if the PR went to waste :) It's a feature 
I keep wishing for in comparison with the older and process-only 
multiprocessing API. And eventually, yielding results in the order that tasks 
complete, like multiprocessing.Pool.imap_unordered, could be added on top of 
this, which would be really neat. (I know there's 
concurrent.futures.as_completed, but again, that one doesn't handle infinite 
iterables.)

--
nosy: +David Lukeš

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29842] Make Executor.map work with infinite/large inputs correctly

2020-02-20 Thread Thomas Grainger


Change by Thomas Grainger :


--
keywords: +patch
pull_requests: +17948
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/18566

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29842] Make Executor.map work with infinite/large inputs correctly

2019-05-06 Thread Josh Rosenberg


Josh Rosenberg  added the comment:

Noticed unresolved comments (largely on documentation) on the PR and since I'm 
sprinting this week I finally had the time to address them. I merged the latest 
master into the PR, hope that's considered the correct way to approach this.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29842] Make Executor.map work with infinite/large inputs correctly

2019-05-06 Thread Josh Rosenberg


Change by Josh Rosenberg :


--
versions: +Python 3.8 -Python 3.7

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29842] Make Executor.map work with infinite/large inputs correctly

2018-07-25 Thread Josh Rosenberg


Josh Rosenberg  added the comment:

In response to Max's comments:

>But consider the case where input is produced slower than it can be processed 
>(`iterables` may fetch data from a database, but the callable `fn` may be a 
>fast in-memory transformation). Now suppose the `Executor.map` is called when 
>the pool is busy, so there'll be a delay before processing begins. In this 
>case, the most efficient approach is to get as much input as possible while 
>the pool is busy, since eventually (when the pool is freed up) it will become 
>the bottleneck. This is exactly what the current implementation does.

I'm not sure the "slow input iterable, fast task, competing tasks from other 
sources" case is all that interesting. Uses of Executor.map in the first place 
are usually a replacement for complex task submission; perhaps my viewpoint is 
blinkered, but I see the Executors used for *either* explicit use of submit 
*or* map, rather than mixing and matching (you might use it for both, but 
rarely interleave usages). Without a mix and match scenario (and importantly, a 
mix and match scenario where enough work is submitted before the map to occupy 
all workers, and very little work is submitted after the map begins to space 
out map tasks such that additional map input is requested while workers are 
idle), the smallish default prefetch is an improvement, simply by virtue of 
getting initial results more quickly.

The solution of making a dedicated input thread would introduce quite a lot of 
additional complexity, well beyond what I think it justifiable for a relatively 
niche use case, especially one with many available workarounds, e.g.

1. Raising the prefetch count explicitly

2. Having the caller listify the iterable (similar to passing an arbitrarily 
huge prefetch value, with the large prefetch value having the advantage of 
sending work to the workers immediately, while listifying has the advantage of 
allowing you to handle any input exceptions up front rather than receiving them 
lazily during processing)

3. Use cheaper inputs (e.g. the query string, not the results of the DB query) 
and perform the expensive work as part of the task (after all, the whole point 
is to parallelize the most expensive work)

4. Using separate Executors so the manually submitted work doesn't interfere 
with the mapped work, and vice versa

5. Making a separate ThreadPoolExecutor to generate the expensive input values 
via its own map function (optionally with a larger prefetch count), e.g. 
instead of

with SomeExecutor() as executor:
for result in executor.map(func, (get_from_db(query) for query in queries)):

do:

with SomeExecutor() as executor, ThreadPoolExecutor() as inputexec:
inputs = inputexec.map(get_from_db, queries)
for result in executor.map(func, inputs):

Point is, yes, there will still be niche cases where Executor.map isn't 
perfect, but this patch is intentionally a bit more minimal to keep the Python 
code base simple (no marshaling exceptions across thread boundaries) and avoid 
extreme behavioral changes; it has some smaller changes, e.g. it necessarily 
means input-iterator-triggered exceptions can be raised after some results are 
successfully produced, but it doesn't involve adding more implicit threading, 
marshaling exceptions across threads, etc.

Your proposed alternative, with a thread for prefetching inputs, a thread for 
sending tasks, and a thread for returning results creates a number of problems:

1. As you mentioned, if no prefetch limit is imposed, memory usage remains 
unbounded; if the input is cheap to generate and slow to process, memory 
exhaustion is nearly guaranteed for infinite inputs, and more likely for "very 
large" inputs. I'd prefer the default arguments to be stable in (almost) all 
cases, rather than try to maximize performance for rare cases at the expense of 
stability in many cases.

2. When input generation is CPU bound, you've just introduced an additional 
source of unavoidable GIL contention; granted, after the GIL fixes in 3.2, GIL 
contention tends to hurt less (before those fixes, I could easily occupy 1.9 
cores doing 0.5 cores worth of actual work with just two CPU bound threads). 
Particularly in the ProcessPoolExecutor case (where avoiding GIL contention is 
the goal), it's a little weird if you can end up with unavoidable GIL 
contention in the main process.

3. Exception handling from the input iterator just became a nightmare; in a 
"single thread performs input pulls and result yield" scenario, the exceptions 
from the input thread naturally bubble to the caller of Executor.map (possibly 
after several results have been produced, but eventually). If a separate thread 
is caching from the input iterator, we'd need to marshal the exception from 
that thread back to the thread running Executor.map so it's visible to the 
caller, and providing a traceback that is both accurate and useful is not 
obvious.

--

___

[issue29842] Make Executor.map work with infinite/large inputs correctly

2018-07-25 Thread Josh Rosenberg


Josh Rosenberg  added the comment:

In any event, sorry to be a pain, but is there any way to get some movement on 
this issue? One person reviewed the code with no significant concerns to 
address. There have been a duplicate (#30323) and closely related (#34168) 
issues opened that this would address; I'd really like to see Executor.map made 
more bulletproof against cases that plain map handles with equanimity.

Even if it's not applied as is, something similar (with prefetch count defaults 
tweaked, or, at the expense of code complexity, a separate worker thread to 
perform the prefetch to address Max's concerns) would be a vast improvement 
over the status quo.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29842] Make Executor.map work with infinite/large inputs correctly

2017-05-16 Thread Klamann

Changes by Klamann :


--
nosy: +Klamann

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29842] Make Executor.map work with infinite/large inputs correctly

2017-05-15 Thread Max

Max added the comment:

Correction: this PR is useful for `ProcessPoolExecutor` as well. I thought 
`chunksize` parameter handles infinite generators already, but I was wrong. 
And, as long as the number of items prefetched is a multiple of `chunksize`, 
there are no issues with the chunksize optimization either.

And a minor correction: when listing the advantages of this PR, I should have 
said: "In addition, if the pool is not busy when `map` is called, your 
implementation will also be more responsive, since it will yield the first 
result earlier."

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29842] Make Executor.map work with infinite/large inputs correctly

2017-05-14 Thread Max

Max added the comment:

I'm also concerned about this (undocumented) inconsistency between map and 
Executor.map.

I think you would want to make your PR limited to `ThreadPoolExecutor`. The 
`ProcessPoolExecutor` already does everything you want with its `chunksize` 
paramater, and adding `prefetch` to it will jeopardize the optimization for 
which `chunksize` is intended.

Actually, I was even thinking whether it might be worth merging `chunksize` and 
`prefetch` arguments. The semantics of the two arguments is similar but not 
identical. Specifically, for `ProcessPoolExecutor`, there is pretty clear 
pressure to increase the value of `chunksize` to reduce amortized IPC costs; 
there is no IPC with threads, so the pressure to increase `prefetch` is much 
more situational (e.g., in the busy pool example I give below).

For `ThreadPoolExecutor`, I prefer your implementation over the current one, 
but I want to point out that it is not strictly better, in the sense that *with 
default arguments*, there are situations where the current implementation 
behaves better.

In many cases your implementation behaves much better. If the input is too 
large, it prevents out of memory condition. In addition, if the pool is not 
busy when `map` is called, your implementation will also be faster, since it 
will submit the first input for processing earlier.

But consider the case where input is produced slower than it can be processed 
(`iterables` may fetch data from a database, but the callable `fn` may be a 
fast in-memory transformation). Now suppose the `Executor.map` is called when 
the pool is busy, so there'll be a delay before processing begins. In this 
case, the most efficient approach is to get as much input as possible while the 
pool is busy, since eventually (when the pool is freed up) it will become the 
bottleneck. This is exactly what the current implementation does.

The implementation you propose will (by default) only prefetch a small number 
of input items. Then when the pool becomes available, it will quickly run out 
of prefetched input, and so it will be less efficient than the current 
implementation. This is especially unfortunate since the entire time the pool 
was busy, `Executor.map` is just blocking the main thread so it's literally 
doing nothing useful.

Of course, the client can tweak `prefetch` argument to achieve better 
performance. Still, I wanted to make sure this issue is considered before the 
new implementation is adopted.

>From the performance perspective, an even more efficient implementation would 
>be one that uses three background threads:

- one to prefetch items from the input
- one to sends items to the workers for processing
- one to yield results as they become available

It has a disadvantage of being slightly more complex, so I don't know if it 
really belongs in the standard library.

Its advantage is that it will waste less time: it fetches inputs without pause, 
it submits them for processing without pause, and it makes results available to 
the client as soon as they are processed. (I have implemented and tried this 
approach, but not in productioon.)

But even this implementation requires tuning. In the case with the busy pool 
that I described above, one would want to prefetch as much input as possible, 
but that may cause too much memory consumption and also possibly waste 
computation resources (if the most of input produced proves to be unneeded in 
the end).

--
nosy: +max

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29842] Make Executor.map work with infinite/large inputs correctly

2017-03-19 Thread Josh Rosenberg

Changes by Josh Rosenberg :


--
title: Executor.map should not submit all futures prior to yielding any results 
-> Make Executor.map work with infinite/large inputs correctly

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com