You don't like using Pool.starmap and itertools.repeat or a comprehension that repeats an object?
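Something like the sketch below already works with the existing API (just a rough illustration reusing the names from your example, not a claim about relative performance):

    import itertools
    from multiprocessing import Pool

    def func(x, big_cache):
        # big_cache arrives as an ordinary second argument for every item
        return big_cache[str(x)]

    if __name__ == "__main__":
        big_cache = {str(k): k for k in range(10000)}
        ls = range(1000)

        with Pool() as pool:
            # starmap unpacks each (x, big_cache) pair into func(x, big_cache);
            # itertools.repeat supplies the same object for every item, or you
            # could build [(x, big_cache) for x in ls] with a comprehension.
            results = pool.starmap(func, zip(ls, itertools.repeat(big_cache)))

The obvious caveat is that big_cache still gets pickled and shipped along with the tasks rather than once up front.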
On Wed, Oct 3, 2018, 6:30 PM Sean Harrington <seanhar...@gmail.com> wrote:

> Hi guys -
>
> The solution to "lazily initialize" an expensive object in the worker
> process (i.e. via @lru_cache) is a great solution (that I must admit I
> did not think of). Additionally, in the second use case of "passing a
> large object to each worker process", I also agree with your suggestion
> to "shelter functions in a different module to avoid exposure to
> globals" as a good solution if one is wary of globals.
>
> That said, I still think "passing a large object from parent process to
> worker processes" should be easier when using Pool. Would either of you
> be open to something like the following?
>
>     def func(x, big_cache=None):
>         return big_cache[str(x)]
>
>     big_cache = {str(k): k for k in range(10000)}
>
>     ls = [i for i in range(1000)]
>
>     with Pool(func_kwargs={"big_cache": big_cache}) as pool:
>         pool.map(func, ls)
>
> It's a much cleaner interface (which presumably requires a more
> difficult implementation) than my initial proposal. This also reads a
> lot better than the "initializer + global" recipe (clear flow of data)
> and is less constraining than the "define globals in parent" recipe.
> Most importantly, when taking sequential code and parallelizing via
> Pool.map, this does not force the user to re-implement "func" such that
> it consumes a global (rather than a kwarg). It allows "func" to be used
> elsewhere (e.g. in the parent process, from a different module, testing
> without globals, etc.).
>
> This would essentially be an efficient implementation of Pool.starmap()
> where the kwargs are static and passed to each application of "func"
> over the iterable.
>
> Thoughts?
>
> On Sat, Sep 29, 2018 at 3:00 PM Michael Selik <m...@selik.org> wrote:
>
>> On Sat, Sep 29, 2018 at 5:24 AM Sean Harrington <seanhar...@gmail.com>
>> wrote:
>>
>> On Fri, Sep 28, 2018 at 4:39 PM Sean Harrington <seanhar...@gmail.com>
>> wrote:
>>
>> > My simple argument is that the developer should not be constrained
>> > to make the objects passed globally available in the process, as
>> > this MAY break encapsulation for large projects.
>>
>> >> I could imagine someone switching from Pool to ThreadPool and
>> >> getting into trouble, but in my mind using threads is caveat
>> >> emptor. Are you worried about breaking encapsulation in a different
>> >> scenario?
>>
>> Without a specific example on hand, you could imagine a tree of
>> function calls that occur in the worker process (even newly created
>> objects) that should not necessarily have access to objects passed
>> from parent -> worker. In every case, given the current
>> implementation, they will.
>>
>> Echoing Antoine: If you want some functions to not have access to a
>> module's globals, you can put those functions in a different module.
>> Note that multiprocessing already encapsulates each subprocess's
>> globals in essentially a separate namespace.
>>
>> Without a specific example, this discussion is going to go around in
>> circles. You have a clear aversion to globals. Antoine and I do not.
>> No one else seems to have found this conversation interesting enough
>> to participate, yet.
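For comparison, the "initializer + global" recipe discussed above is roughly the following (a sketch with illustrative names, not anyone's exact code):

    from multiprocessing import Pool

    _big_cache = None

    def _init_worker(cache):
        # Pool runs this once in each worker process; the object is stashed
        # in a module-level global so the mapped function can find it later.
        global _big_cache
        _big_cache = cache

    def func(x):
        # func has to consume the global instead of taking big_cache as a kwarg
        return _big_cache[str(x)]

    if __name__ == "__main__":
        big_cache = {str(k): k for k in range(10000)}

        with Pool(initializer=_init_worker, initargs=(big_cache,)) as pool:
            results = pool.map(func, range(1000))

Here big_cache is sent to each worker once (via initargs) rather than along with every task, at the cost of routing it through a global.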