Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-10-22 Thread Sean Harrington
On Mon, Oct 22, 2018 at 2:01 PM Michael Selik wrote: > This thread seems more appropriate for python-ideas than python-dev. > > > On Mon, Oct 22, 2018 at 5:28 AM Sean Harrington > wrote: > >> Michael - the initializer/globals pattern still might be necessary if you >> need to create an object

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-10-22 Thread Michael Selik
This thread seems more appropriate for python-ideas than python-dev. On Mon, Oct 22, 2018 at 5:28 AM Sean Harrington wrote: > Michael - the initializer/globals pattern still might be necessary if you > need to create an object AFTER a worker process has been instantiated (i.e. > a database

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-10-22 Thread Sean Harrington
Michael - the initializer/globals pattern still might be necessary if you need to create an object AFTER a worker process has been instantiated (i.e. a database connection). Further, the user may want to access all of the niceties of Pool, like imap, imap_unordered, etc. The goal (IMO) would be
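
For readers skimming the thread, the initializer/globals pattern mentioned here can be sketched roughly as follows. This is only an illustration, not code from the thread: the names `init_worker`, `query` and `_conn`, and the use of an in-memory SQLite database as the "connection", are assumptions made for the example.

```python
import sqlite3
from multiprocessing import Pool

# Per-process global; each worker fills it in after the process has started.
_conn = None


def init_worker(db_path):
    """Pool initializer: open one connection per worker process."""
    global _conn
    _conn = sqlite3.connect(db_path)


def query(x):
    """Task function: reuses the worker's connection through the global."""
    return _conn.execute("SELECT ? * 2", (x,)).fetchone()[0]


def main():
    # initializer/initargs are existing Pool parameters; imap still works as usual.
    with Pool(2, initializer=init_worker, initargs=(":memory:",)) as pool:
        return list(pool.imap(query, range(5)))


if __name__ == "__main__":
    print(main())
```

The connection object is never pickled; only `db_path` crosses the process boundary, which is why this pattern suits objects that must be created inside the worker.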

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-10-19 Thread Michael Selik
On Fri, Oct 19, 2018 at 5:01 AM Sean Harrington wrote: > I like the idea to extend the Pool class [to optimize the case when only > one function is passed to the workers]. > Why would this keep the same interface as the Pool class? If its workers are restricted to calling only one function,

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-10-19 Thread Sean Harrington
On Fri, Oct 19, 2018 at 7:32 AM Joni Orponen wrote: > On Fri, Oct 19, 2018 at 9:09 AM Thomas Moreau < > thomas.moreau.2...@gmail.com> wrote: > >> Hello, >> >> I have been working on the concurrent.futures module lately and I think >> this optimization should be avoided in the context of python

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-10-19 Thread Joni Orponen
On Fri, Oct 19, 2018 at 9:09 AM Thomas Moreau wrote: > Hello, > > I have been working on the concurrent.futures module lately and I think > this optimization should be avoided in the context of python Pools. > > This is an interesting idea, however its implementation will bring many > complicated

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-10-19 Thread Thomas Moreau
Hello, I have been working on the concurrent.futures module lately and I think this optimization should be avoided in the context of Python Pools. This is an interesting idea; however, its implementation will bring many complicated issues as it breaks the basic paradigm of a Pool: the tasks are

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-10-18 Thread Chris Jerdonek
On Thu, Oct 18, 2018 at 9:11 AM Michael Selik wrote: > On Thu, Oct 18, 2018 at 8:35 AM Sean Harrington wrote: >> Further, let me pivot on my idea of __qualname__...we can use the `id` of >> `func` as the cache key to address your concern, and store this `id` on the >> `task` tuple (i.e. an

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-10-18 Thread Michael Selik
One idea would be for the Pool method to generate a uuid and slap it on the function as an attribute. If a function being passed in doesn't have one, generate one. If it already has one, just pass that instead of pickling. The child process will keep a cache mapping uuids to functions. I'm still
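
A rough, single-process sketch of the uuid idea described above. Nothing here is Pool's actual implementation: the attribute name `_pool_func_uuid`, the helpers `tag_function` and `worker_lookup`, and the dict standing in for the child-process cache are all hypothetical.

```python
import uuid

_ATTR = "_pool_func_uuid"  # hypothetical attribute name


def tag_function(func):
    """Return func's uuid, generating and attaching one on first use."""
    key = getattr(func, _ATTR, None)
    if key is None:
        key = uuid.uuid4().hex
        setattr(func, _ATTR, key)
    return key


# Stand-in for the cache that would live in each child process.
_worker_cache = {}


def worker_lookup(key, func=None):
    """Resolve a uuid to a function, caching it the first time it is sent."""
    if func is not None:
        _worker_cache[key] = func
    return _worker_cache[key]


def square(x):
    return x * x


key = tag_function(square)
worker_lookup(key, square)       # first task: the function travels with the key
result = worker_lookup(key)(4)   # later tasks: the key alone is enough
```

One caveat with this sketch: attributes cannot be set on bound method objects, so the instance-method case raised elsewhere in the thread would need a different tagging scheme.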

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-10-18 Thread Michael Selik
On Thu, Oct 18, 2018 at 8:35 AM Sean Harrington wrote: > The most common use case comes up when passing instance methods (of really > big objects!) to Pool.map(). > This reminds me of that old joke: "A patient says to the doctor, 'Doctor, it hurts when I ...!' The doctor replies, 'Well, don't

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-10-18 Thread Sean Harrington
You have correctly identified the summary of my intentions, and I agree with your reasoning & concern - however there is a somewhat reasonable answer as to why this optimization has never been implemented: In Pool, the `task` tuple consists of (result_job, func, (x,), {}). This is the object

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-10-17 Thread Michael Selik
If imap_unordered is currently re-pickling and sending func each time it's called on the worker, I have to suspect there was some reason to do that and not cache it after the first call. Rather than assuming that's an opportunity for an optimization, I'd want to be certain it won't have edge case

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-10-16 Thread Sean Harrington
Is your concern something like the following?

    with Pool(8) as p:
        gen = p.imap_unordered(func, ls)
        first_elem = next(gen)
        p.apply_async(long_func, x)
        remaining_elems = [elem for elem in gen]

...here, if we store "func" on each worker Process as a global, and execute this pattern

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-10-16 Thread Michael Selik
Would this change the other pool method behavior in some way if the user, for whatever reason, mixed techniques? imap_unordered will only block when nexting the generator. If the user mingles nexting that generator with, say, apply_async, could the change you're proposing have some side-effect?

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-10-16 Thread Sean Harrington
@Nathaniel this is what I am suggesting as well. No caching - just storing the `fn` on each worker, rather than pickling it for each item in our iterable. As long as we store the `fn` post-fork on the worker process (perhaps as global), subsequent calls to Pool.map shouldn't be affected

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-10-12 Thread Nathaniel Smith
On Fri, Oct 12, 2018, 06:09 Antoine Pitrou wrote: > On Fri, 12 Oct 2018 08:33:32 -0400 > Sean Harrington wrote: > > Hi Nathaniel - if this solution can be made performant, then I would > > be more than satisfied. > > > > I think this would require removing "func" from the "task tuple", and

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-10-12 Thread Antoine Pitrou
Le 12/10/2018 à 16:49, Sean Harrington a écrit : > Yes - “func” (and “self” which func is bound to) would be copied to each > child worker process, where they are stored and applied to each element > of the iterable being mapped over. Only if it has changed, then, right? I suspect that would

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-10-12 Thread Sean Harrington
Yes - “func” (and “self” which func is bound to) would be copied to each child worker process, where they are stored and applied to each element of the iterable being mapped over. On Fri, Oct 12, 2018 at 10:41 AM Antoine Pitrou wrote: > On Fri, 12 Oct 2018 09:42:50 -0400 > Sean Harrington

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-10-12 Thread Antoine Pitrou
On Fri, 12 Oct 2018 09:42:50 -0400 Sean Harrington wrote: > I would contend that this is much more granular than Dask - this is just an > optimization of Pool.map() to avoid redundantly passing the same `func` > repeatedly, once per task, to each worker, with the primary goal of > eliminating

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-10-12 Thread Sean Harrington
I would contend that this is much more granular than Dask - this is just an optimization of Pool.map() to avoid redundantly passing the same `func` repeatedly, once per task, to each worker, with the primary goal of eliminating redundant serialization of large-memory-footprinted Callables. This is

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-10-12 Thread Antoine Pitrou
Le 12/10/2018 à 15:17, Sean Harrington a écrit : > The implementation details need to be fleshed out, but agnostic of > these, do you believe this is a valid solution to the initial problem? Do > you also see it as a beneficial optimization to Pool, given that we > don't need to store

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-10-12 Thread Sean Harrington
The implementation details need to be fleshed out, but agnostic of these, do you believe this is a valid solution to the initial problem? Do you also see it as a beneficial optimization to Pool, given that we don't need to store funcs/bound-methods/partials on the tasks themselves? The latter

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-10-12 Thread Antoine Pitrou
On Fri, 12 Oct 2018 08:33:32 -0400 Sean Harrington wrote: > Hi Nathaniel - if this solution can be made performant, then I would > be more than satisfied. > > I think this would require removing "func" from the "task tuple", and > storing the "func" "once per worker" somewhere globally

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-10-12 Thread Sean Harrington
Hi Nathaniel - if this solution can be made performant, then I would be more than satisfied. I think this would require removing "func" from the "task tuple", and storing the "func" "once per worker" somewhere globally (maybe a class attribute set post-fork?). This also has the beneficial

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-10-04 Thread Nathaniel Smith
On Wed, Oct 3, 2018 at 6:30 PM, Sean Harrington wrote:
> with Pool(func_kwargs={"big_cache": big_cache}) as pool:
>     pool.map(func, ls)

I feel like it would be nicer to spell this:

    with Pool() as pool:
        pool.map(functools.partial(func, big_cache=big_cache), ls)

And this might also solve
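
A self-contained version of the functools.partial spelling, for reference. The body of `func` and the contents of `big_cache` are made up for the example; only the `Pool.map` plus `functools.partial` combination comes from the message. Note that the partial (and therefore `big_cache`) is still pickled and sent along with the tasks, which is the cost the rest of the thread discusses.

```python
import functools
from multiprocessing import Pool


def func(x, big_cache):
    """Toy task: look x up in the shared cache (placeholder logic)."""
    return big_cache.get(x, 0)


def main():
    big_cache = {i: i * i for i in range(1000)}  # stand-in for a large object
    ls = range(5)
    with Pool(2) as pool:
        # The partial binds big_cache as a keyword argument for every call.
        return pool.map(functools.partial(func, big_cache=big_cache), ls)


if __name__ == "__main__":
    print(main())
```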

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-10-04 Thread Sean Harrington
Starmap will serialize/deserialize the “big object” once for each task created, so this is not performant. The goal is to pay the “one time cost” of serialization of the “big object”, and still pass this object to func at each iteration. On Thu, Oct 4, 2018 at 4:14 AM Michael Selik wrote: > You

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-10-04 Thread Michael Selik
You don't like using Pool.starmap and itertools.repeat or a comprehension that repeats an object? On Wed, Oct 3, 2018, 6:30 PM Sean Harrington wrote: > Hi guys - > > The solution to "lazily initialize" an expensive object in the worker > process (i.e. via @lru_cache) is a great solution (that
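
The starmap/repeat alternative asked about here might look like the sketch below. `func`, `big_cache` and `ls` are placeholder names; `Pool.starmap` and `itertools.repeat` are the only pieces taken from the message.

```python
import itertools
from multiprocessing import Pool


def func(x, big_cache):
    """Toy task: look x up in the object that accompanies every item."""
    return big_cache[x]


def main():
    big_cache = {i: i * 10 for i in range(100)}  # stand-in for a large object
    ls = range(5)
    with Pool(2) as pool:
        # zip pairs each item with the same big_cache; starmap unpacks the pair.
        return pool.starmap(func, zip(ls, itertools.repeat(big_cache)))


if __name__ == "__main__":
    print(main())
```

Because `big_cache` is part of every argument tuple, it is serialized with the arguments rather than once per worker.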

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-10-03 Thread Sean Harrington
Hi guys - The solution to "lazily initialize" an expensive object in the worker process (i.e. via @lru_cache) is a great solution (that I must admit I did not think of). Additionally, in the second use case of "*passing a large object to each worker process*", I also agree with your suggestion to
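
A minimal sketch of the lazy-initialization approach referred to above, assuming the expensive object can be built inside the worker. `get_resource` and `work` are hypothetical names, and the dictionary stands in for whatever is costly to construct.

```python
import functools
import os
from multiprocessing import Pool


@functools.lru_cache(maxsize=None)
def get_resource():
    """Build the expensive object on first use; cached per process afterwards."""
    return {"pid": os.getpid(), "table": {i: i * i for i in range(1000)}}


def work(x):
    """Task function: fetches the cached object instead of receiving it."""
    return get_resource()["table"][x]


def main():
    with Pool(2) as pool:
        return pool.map(work, range(5))


if __name__ == "__main__":
    print(main())
```

Each worker pays the construction cost at most once, on its first task, and nothing large is pickled.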

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-09-29 Thread Michael Selik
On Sat, Sep 29, 2018 at 5:24 AM Sean Harrington wrote: >> On Fri, Sep 28, 2018 at 4:39 PM Sean Harrington wrote: >> > My simple argument is that the developer should not be constrained to make >> > the objects passed globally available in the process, as this MAY break >> > encapsulation for

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-09-29 Thread Sean Harrington
On Sat, Sep 29, 2018 at 8:18 AM Antoine Pitrou wrote: > On Sat, 29 Sep 2018 08:13:19 -0400 > Sean Harrington wrote: > > > > > > Hmm... We might have a disagreement on the target audience of the > > > multiprocessing module. multiprocessing isn't very high-level, I would > > > expect it to be

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-09-29 Thread Sean Harrington
On Fri, Sep 28, 2018 at 9:27 PM Michael Selik wrote: > On Fri, Sep 28, 2018 at 2:11 PM Sean Harrington > wrote: > > kwarg on Pool.__init__ called `expect_initret`, that defaults to False. > When set to True: > > Capture the return value of the initializer kwarg of Pool > > Pass this value to

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-09-29 Thread Antoine Pitrou
On Sat, 29 Sep 2018 08:13:19 -0400 Sean Harrington wrote: > > > > Hmm... We might have a disagreement on the target audience of the > > multiprocessing module. multiprocessing isn't very high-level, I would > > expect it to be used by experienced programmers who know how to mutate > > a global

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-09-29 Thread Sean Harrington
On Sat, Sep 29, 2018 at 6:24 AM Antoine Pitrou wrote: > > Hi Sean, > > On Fri, 28 Sep 2018 19:23:06 -0400 > Sean Harrington wrote: > > My simple argument is that the > > developer should not be constrained to make the objects passed globally > > available in the process, as this MAY break

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-09-29 Thread Antoine Pitrou
Hi Sean, On Fri, 28 Sep 2018 19:23:06 -0400 Sean Harrington wrote: > My simple argument is that the > developer should not be constrained to make the objects passed globally > available in the process, as this MAY break encapsulation for large > projects. IMHO, global variables don't break

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-09-28 Thread Michael Selik
On Fri, Sep 28, 2018 at 2:11 PM Sean Harrington wrote: > kwarg on Pool.__init__ called `expect_initret`, that defaults to False. When > set to True: > Capture the return value of the initializer kwarg of Pool > Pass this value to the function being applied, as a kwarg. The parameter name you

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-09-28 Thread Sean Harrington
Hi Antoine - see inline below for my response...thanks for your time! On Fri, Sep 28, 2018 at 6:45 PM Antoine Pitrou wrote: > > Hi, > > On Fri, 28 Sep 2018 17:07:33 -0400 > Sean Harrington wrote: > > > > In *short*, the implementation of the feature works as follows: > > > >1. Exposes a

Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-09-28 Thread Antoine Pitrou
Hi, On Fri, 28 Sep 2018 17:07:33 -0400 Sean Harrington wrote: > > In *short*, the implementation of the feature works as follows: > >1. Exposes a kwarg on Pool.__init__ called `expect_initret`, that >defaults to False. When set to True: > 1. Capture the return value of the

[Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-09-28 Thread Sean Harrington
I am proposing an extension to the multiprocessing.Pool API that allows for an alternative way to pass data to Pool worker processes, *without* using globals. A PR has been opened; extensive test coverage is also included, with all tests & CI passing