Kyle Stanley <[email protected]> added the comment:
> What "ignores the max_workers argument" means?
From my understanding, their argument was that the parameter name
"max_workers" and the documentation imply that it will spawn processes as needed
up to *max_workers* based on the number of jobs scheduled.
> And would you create a simple reproducible example?
I can't speak directly for the OP, but this simple example may demonstrate what
they're talking about:
Linux 5.4.8
Python 3.8.1
```
import concurrent.futures as cf
import os
import random


def get_rand_nums(ls, n):
    # `ls` is unused; only `n` determines the size of the result.
    return [random.randint(1, 100) for _ in range(n)]


def show_processes():
    print("All python processes:")
    # Lists every process whose command name is "python" (Linux procps `ps`).
    os.system("ps -C python")


def main():
    nums = []
    with cf.ProcessPoolExecutor(max_workers=6) as executor:
        futs = []
        show_processes()  # before any job is submitted
        for _ in range(3):
            fut = executor.submit(get_rand_nums, nums, 10_000_000)
            futs.append(fut)
        show_processes()  # right after submitting 3 jobs

        for fut in cf.as_completed(futs):
            nums.extend(fut.result())
        show_processes()  # after all 3 jobs have completed

    assert len(nums) == 30_000_000


if __name__ == '__main__':
    main()
```
Output:
```
[aeros:~/programming/python]$ python ppe_max_workers.py
All python processes: # Main python process
PID TTY TIME CMD
23683 pts/1 00:00:00 python
All python processes: # Main python process + 6 unused subprocesses
PID TTY TIME CMD
23683 pts/1 00:00:00 python
23685 pts/1 00:00:00 python
23686 pts/1 00:00:00 python
23687 pts/1 00:00:00 python
23688 pts/1 00:00:00 python
23689 pts/1 00:00:00 python
23690 pts/1 00:00:00 python
All python processes: # Main python process + 3 used subprocesses + 3 unused subprocesses
PID TTY TIME CMD
23683 pts/1 00:00:00 python
23685 pts/1 00:00:07 python
23686 pts/1 00:00:07 python
23687 pts/1 00:00:07 python
23688 pts/1 00:00:00 python
23689 pts/1 00:00:00 python
23690 pts/1 00:00:00 python
```
As seen above, all processes up to *max_workers* were spawned immediately after
the jobs were submitted to ProcessPoolExecutor, regardless of the actual number
of jobs (3). It is also apparent that only three of those spawned processes
were utilized by the CPU, as indicated by the values in the TIME field. The
other three processes were not used.
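
A more portable way to observe the same thing, without shelling out to ps,
might be to count live children with multiprocessing.active_children(). This is
only a sketch: count_workers is a hypothetical helper, time.sleep stands in for
real work, and the "fork" start method is assumed (Unix only) to match the
Linux setup above. The count it reports can differ between Python versions and
start methods.

```python
import concurrent.futures as cf
import multiprocessing as mp
import time


def count_workers(max_workers, n_jobs):
    """Return how many worker processes are alive right after submitting n_jobs."""
    # Assumption: "fork" start method (Unix only), as on the Linux box above.
    ctx = mp.get_context("fork")
    with cf.ProcessPoolExecutor(max_workers=max_workers, mp_context=ctx) as executor:
        # time.sleep is a stand-in job; any picklable callable would do.
        futs = [executor.submit(time.sleep, 0.2) for _ in range(n_jobs)]
        # active_children() lists live processes started via multiprocessing,
        # which includes the executor's workers.
        alive = len(mp.active_children())
        for fut in futs:
            fut.result()
    return alive


if __name__ == "__main__":
    print("workers alive after 3 submits:", count_workers(6, 3))
```

If the pool is filled eagerly, the printed number should equal max_workers (6)
even though only 3 jobs were submitted.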
If it weren't for this behavior, I think there would be a significant
performance loss, as the executor would have to continuously calculate how many
processes are needed and spawn them throughout its lifespan. AFAIK, it _seems_
more efficient to spawn *max_workers* processes when the jobs are scheduled
and then use them as needed, rather than spawning the processes as needed.
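
To put a rough number on the cost of spawning, something like the following
could be used. It is a sketch, not a proper benchmark: time_process_startup and
_noop are hypothetical names, the "fork" start method is assumed (Unix only),
and the result will vary by machine and start method. It only measures the raw
cost of starting and joining processes, not the executor's own bookkeeping.

```python
import multiprocessing as mp
import time


def _noop():
    """Do nothing, so the timing is dominated by process startup and teardown."""


def time_process_startup(n):
    """Return the wall-clock seconds needed to start and join n trivial processes."""
    # Assumption: "fork" start method (Unix only); "spawn" is typically slower.
    ctx = mp.get_context("fork")
    start = time.perf_counter()
    procs = [ctx.Process(target=_noop) for _ in range(n)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in (1, 3, 6):
        print(f"{n} process(es): {time_process_startup(n):.4f} s")
```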
As a result, I think the current behavior should remain the same; unless
someone can come up with a backwards-compatible alternative version and
demonstrate its advantages over the current one.
However, I do think the current documentation could do a better job of
explaining how max_workers actually behaves. See the current explanation:
https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ProcessPoolExecutor.
The current version does not address any of the above points. In fact, the
first line seems like it might imply the opposite of what it's actually doing
(at least based on my above example):
"An Executor subclass that executes calls asynchronously *using a pool of at
most max_workers processes*." (asterisks added for emphasis)
"using a pool of at most max_workers processes" could imply to users that
*max_workers* sets an upper bound on the number of processes in the pool,
and that *max_workers* is only reached if all of those processes are _needed_.
Unless I'm misunderstanding something, that's not the case.
I would suggest converting this into a documentation issue, assuming that the
experts for concurrent.futures confirm that the present behavior is
intentional and that I'm correctly understanding the OP.
----------
nosy: +aeros
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue39207>
_______________________________________