Would this change the behavior of the other Pool methods in some way if the user, for whatever reason, mixed techniques?
imap_unordered will only block when you call next() on the iterator it returns. If the user interleaves those next() calls with, say, apply_async, could the change you're proposing have some side effect?

On Tue, Oct 16, 2018, 5:09 AM Sean Harrington <seanhar...@gmail.com> wrote:

> @Nathaniel this is what I am suggesting as well. No caching - just storing
> the `fn` on each worker, rather than pickling it for each item in our
> iterable.
>
> As long as we store the `fn` post-fork on the worker process (perhaps as a
> global), subsequent calls to Pool.map shouldn't be affected (referencing
> Antoine's & Michael's points that "multiprocessing encapsulates each
> subprocess's globals in a separate namespace").
>
> @Antoine - I'm making an effort to take everything you've said into
> consideration here. My initial PR and talk
> <https://www.youtube.com/watch?v=DH0JVSXvxu0> were intended to shed light
> on a couple of pitfalls that I often see Python end-users encounter with
> Pool. Moving beyond my naive first attempt, and the onslaught of deserved
> criticism, it seems that we have an opportunity here: no changes to the
> interface, just an optimization to reduce the frequency of pickling.
>
> Raymond Hettinger may also be interested in this optimization, as he
> speaks (with great analogies) about different ways you can misuse
> concurrency in Python <https://www.youtube.com/watch?v=9zinZmE3Ogk>. This
> would address one of the pitfalls that he outlines: the "size of the
> serialized/deserialized data".
>
> Is this an optimization that either of you would be willing to review, and
> accept, if I find there is a *reasonable way* to implement it?
>
> On Fri, Oct 12, 2018 at 3:40 PM Nathaniel Smith <n...@pobox.com> wrote:
>
>> On Fri, Oct 12, 2018, 06:09 Antoine Pitrou <solip...@pitrou.net> wrote:
>>
>>> On Fri, 12 Oct 2018 08:33:32 -0400
>>> Sean Harrington <seanhar...@gmail.com> wrote:
>>> > Hi Nathaniel - if this solution can be made performant, then I would
>>> > be more than satisfied.
>>> >
>>> > I think this would require removing "func" from the "task tuple", and
>>> > storing the "func" "once per worker" somewhere globally (maybe a class
>>> > attribute set post-fork?).
>>> >
>>> > This also has the beneficial outcome of increasing general performance of
>>> > Pool.map and friends. I've seen MANY folks across the interwebs doing
>>> > things like passing instance methods to map, resulting in "big" tasks, and
>>> > slower-than-sequential parallelized code. Parallelizing "instance methods"
>>> > by passing them to map, w/o needing to wrangle with staticmethods and
>>> > globals, would be a GREAT feature! It'd just be as easy as:
>>> >
>>> >     Pool.map(self.func, ls)
>>> >
>>> > What do you think about this idea? This is something I'd be able to take
>>> > on, assuming I get a few core dev blessings...
>>>
>>> Well, I'm not sure how it would work, so it's difficult to give an
>>> opinion. How do you plan to avoid passing "self"? By caching (by
>>> equality? by identity?)? Something else? But what happens if "self"
>>> changed value (in the case of a mutable object) in the parent? Do you
>>> keep using the stale version in the child? That would break
>>> compatibility...
>>>
>>
>> I was just suggesting that within a single call to Pool.map, it would be a
>> reasonable optimization to only send the fn once to each worker. So e.g. if
>> you have 5 workers and 1000 items, you'd only pickle fn 5 times, rather
>> than 1000 times like we do now. I wouldn't want to get any fancier than
>> that with caching data between different map calls or anything.
>>
>> Of course even this may turn out to be too complicated to implement in a
>> reasonable way, since it would require managing some extra state on the
>> workers. But semantically it would be purely an optimization of current
>> semantics.
>>
>> -n
>>
>> _______________________________________________
>> Python-Dev mailing list
>> Python-Dev@python.org
>> https://mail.python.org/mailman/listinfo/python-dev
>> Unsubscribe:
>> https://mail.python.org/mailman/options/python-dev/seanharr11%40gmail.com
>>
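For readers who want roughly this effect today without any change to multiprocessing, a common user-level workaround is to hand the function to each worker once through the Pool `initializer`/`initargs` parameters and have the mapped task look it up from a per-process global. This is a minimal sketch, not the change proposed in this thread; `_init_worker`, `_call_stored`, `_worker_fn`, `Scaler` and `map_with_stored_fn` are hypothetical names. Depending on the start method, `fn` is either inherited through fork or pickled once per worker process, rather than once per task. Note that it keeps the function for the lifetime of the pool, which is exactly the staleness Antoine raises: if the object behind `fn` is mutated in the parent afterwards, the workers keep the old copy.

```python
import multiprocessing as mp

# Hypothetical per-process slot; each worker process has its own copy.
_worker_fn = None


def _init_worker(fn):
    """Runs once in each worker at startup and stores fn for later tasks."""
    global _worker_fn
    _worker_fn = fn


def _call_stored(item):
    """Per-item task: a module-level function, pickled by name, so only
    `item` carries real data to the worker."""
    return _worker_fn(item)


class Scaler:
    """Hypothetical class whose bound method we want to map."""

    def __init__(self, factor, payload_size=100_000):
        self.factor = factor
        # Large instance state that would otherwise travel with every task.
        self.payload = list(range(payload_size))

    def scale(self, x):
        return x * self.factor


def map_with_stored_fn(fn, items, processes=4):
    # fn (here a bound method, including its instance) reaches each worker
    # once at startup via initargs instead of once per task.
    with mp.Pool(processes, initializer=_init_worker, initargs=(fn,)) as pool:
        return pool.map(_call_stored, items)


if __name__ == "__main__":
    s = Scaler(3)
    print(map_with_stored_fn(s.scale, range(10)))
    # prints [0, 3, 6, 9, 12, 15, 18, 21, 24, 27]
```

Because the function is fixed when the pool is created, this pattern suits one pool per function; a Pool-internal version along the lines Nathaniel describes would instead have to scope the stored function to a single map call.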
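To make Sean's point about instance methods concrete: pickling a bound method pickles the instance it is bound to, so a method on an object with large state produces a large payload, while a module-level function is pickled as a short reference to its module and name. This is a minimal sketch using only the standard `pickle` module; `Model`, `predict`, `predict_fn` and `pickled_size` are hypothetical names, and exact byte counts vary with the pickle protocol and Python version.

```python
import pickle


class Model:
    """Hypothetical object carrying large state."""

    def __init__(self, n):
        self.weights = list(range(n))

    def predict(self, x):
        return x + len(self.weights)


def predict_fn(x):
    # Module-level function: pickled by reference (module + qualified name).
    return x + 1


def pickled_size(obj):
    """Number of bytes pickle produces for obj."""
    return len(pickle.dumps(obj))


if __name__ == "__main__":
    m = Model(100_000)
    # The bound method drags the whole instance (and its weights) along.
    print("bound method:", pickled_size(m.predict), "bytes")
    print("plain function:", pickled_size(predict_fn), "bytes")
```

With the current Pool, that bound-method payload is what travels with each task, which is the cost the proposal aims to pay once per worker instead.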