I would contend that this is much more granular than Dask. It is just an optimization of Pool.map() that avoids redundantly passing the same `func` to each worker once per task, with the primary goal of eliminating repeated serialization of callables that have a large memory footprint. That is a different use case from Dask: I don't intend to approach the shared-memory or distributed-computing realms.
And the second call to Pool.map would update the cached "self" as part of its initialization workflow, such that "the latest version of self when map() is called is taken into account". Do you see a difficulty in accomplishing the second behavior?

On Fri, Oct 12, 2018 at 9:25 AM Antoine Pitrou <anto...@python.org> wrote:

> On 12/10/2018 at 15:17, Sean Harrington wrote:
> > The implementation details need to be fleshed out, but agnostic of
> > these, do you believe this is a valid solution to the initial problem? Do
> > you also see it as a beneficial optimization to Pool, given that we
> > don't need to store funcs/bound-methods/partials on the tasks themselves?
>
> I'm not sure, TBH. I also think it may be better to leave this to
> higher levels (for example Dask will intelligently distribute data on
> workers and let you work with a kind of proxy object in the main
> process, transferring data only when necessary).
>
> > The latter concern about "what happens if `self` changed value in the
> > parent" is the same concern as "what happens if `func` changes in the
> > parent?" given the current implementation. This is an assumption that is
> > currently made with Pool.map_async(func, ls). If "func" changes in the
> > parent, there is no communication with the child. So one just needs to
> > be aware that calling "map_async(self.func, ls)" while the state of
> > "self" is changing, will not communicate changes to each worker. The
> > state is frozen when Pool.map is called, just as is the case now.
>
> If you cache "self" between pool.map calls, then the question is not
> "what happens if self changes *during* a map() call" but "what happens
> if self changes *between* two map() calls"? While the former is
> intuitively undefined, current users would expect the latter to have a
> clear answer, which is: the latest version of self when map() is called
> is taken into account.
>
> Regards
>
> Antoine.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com