On Fri, Oct 19, 2018 at 9:09 AM Thomas Moreau <thomas.moreau.2...@gmail.com> wrote:
> Hello,
>
> I have been working on the concurrent.futures module lately and I think
> this optimization should be avoided in the context of Python Pools.
>
> This is an interesting idea, however its implementation will bring many
> complicated issues as it breaks the basic paradigm of a Pool: the tasks are
> independent and you don't know which worker is going to run which task.
>
> The function is serialized with each task because of this paradigm. This
> ensures that any worker picking the task will be able to perform it
> independently from the tasks it has run before, given that it has been
> initialized correctly at the beginning. This makes it simple to run each
> task.
>

I would not mind if there were a subtype of Pool to which you can only
apply one kind of task. This is a very common use mode.

Though the question there is 'should this live in Python itself'? I'd be
fine with a package on PyPI.

> As the Pool comes with no scheduler, with your idea, you would need a
> synchronization step to send the function to all workers before you can
> launch your task. But if there is already one worker performing a long
> running task, does the Pool wait for it to be done before it sends the
> function? If the Pool doesn't wait, how does it ensure that this worker
> will be able to get the definition of the function before running it?
> Also, the multiprocessing.Pool has some features where a worker can shut
> itself down after a given number of tasks or a timeout. How does it ensure
> that the new worker will have the definition of the function?
> It is unsafe to try such a feature (sending an object only once) anywhere
> else than in the initializer, which is guaranteed to be run once per worker.
>
> On the other hand, you mentioned an interesting point being that making
> globals available in the workers could be made simpler. A possible solution
> would be to add a "globals" argument in the Pool which would instantiate
> global variables in the workers. I have no specific idea on the
> implementation of such a feature, but it would be safer as it would be an
> initialization feature.
>

Would this also mean one could use a Pool in a context where threading is
used? Currently, using threading puts unpicklable objects into the globals
as a side effect.

Also being able to pass in globals=None would be optimal for a lot of use
cases.

--
Joni Orponen
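
For reference, the initializer approach discussed above can already be used today to make globals available in each worker. The following is only a minimal sketch, not the proposed "globals" argument: `initializer`, `initargs` and `maxtasksperchild` are existing multiprocessing.Pool parameters, while `init_worker`, `scale`, `run_demo` and `_worker_globals` are hypothetical names chosen for illustration.

```python
import multiprocessing
import os

# Hypothetical per-worker state: filled in by the initializer,
# not shipped along with every task.
_worker_globals = {}


def init_worker(shared):
    """Runs once in every worker process, including replacement workers."""
    _worker_globals.update(shared)
    _worker_globals["pid"] = os.getpid()


def scale(x):
    """Task that reads worker-level state instead of taking it as an argument."""
    return x * _worker_globals["factor"], _worker_globals["pid"]


def run_demo(ctx=None):
    """Map scale() over six inputs; return (values, set of worker pids)."""
    ctx = ctx or multiprocessing.get_context()
    with ctx.Pool(
        processes=2,
        initializer=init_worker,
        initargs=({"factor": 10},),
        maxtasksperchild=2,  # each worker exits after two tasks and is replaced
    ) as pool:
        # chunksize=1 so that every input counts as one task
        results = pool.map(scale, range(6), chunksize=1)
    values = [value for value, _ in results]
    pids = {pid for _, pid in results}
    return values, pids


if __name__ == "__main__":
    values, pids = run_demo()
    print(values)  # [0, 10, 20, 30, 40, 50]
    print(sorted(pids))  # pids of the workers that served the tasks
```

Because the Pool starts replacement workers with the same initializer and initargs, a worker created after another one hit `maxtasksperchild` still sees `factor`. This only covers state that is fixed at Pool creation time; it does not address sending a new function to already-running workers.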
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com