If imap_unordered is currently re-pickling and sending func with every task it hands to a worker, I have to suspect there was some reason to do that rather than cache it after the first call. Rather than assuming that's an opportunity for an optimization, I'd want to be certain that caching won't have negative effects in edge cases.
On Tue, Oct 16, 2018 at 2:53 PM Sean Harrington <seanhar...@gmail.com> wrote:
> Is your concern something like the following?
>
> with Pool(8) as p:
>     gen = p.imap_unordered(func, ls)
>     first_elem = next(gen)
>     p.apply_async(long_func, x)
>     remaining_elems = [elem for elem in gen]
>

My concern was passing the same function (or a function with the same qualname). You're suggesting caching functions and identifying them by qualname to avoid re-pickling a large stateful object that's shoved into the function's defaults or closure. Is that a correct summary?

If so, how would the function cache distinguish between two functions with the same name? Would it need to examine the defaults and closure as well? If so, that means it's pickling the second one anyway, so there's no efficiency gain.

    In [1]: def foo(a):
       ...:     def bar():
       ...:         print(a)
       ...:     return bar

    In [2]: f = foo(1)

    In [3]: g = foo(2)

    In [4]: f
    Out[4]: <function __main__.foo.<locals>.bar()>

    In [5]: g
    Out[5]: <function __main__.foo.<locals>.bar()>

If we say pool.apply_async(f) and pool.apply_async(g), would you want the latter one to avoid serialization, letting the worker make a second call with the first function object?
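To make the worry concrete, here is a minimal sketch of the failure mode, assuming a worker-side cache keyed only by module and qualname. The names `cache` and `cached_call` are made up for illustration and are not part of multiprocessing; the sketch runs in a single process and only models the lookup, not the pickling. I've also changed `bar` to return `a` instead of printing it so the result can be inspected.

```python
def foo(a):
    def bar():
        return a
    return bar


f = foo(1)
g = foo(2)

# Hypothetical cache: maps (module, qualname) to the first function
# object seen under that name. Not a real multiprocessing feature.
cache = {}


def cached_call(func):
    """Call whichever function was first registered under func's name."""
    key = (func.__module__, func.__qualname__)
    cached = cache.setdefault(key, func)
    return cached()


# Both closures report the same qualified name...
same_name = f.__qualname__ == g.__qualname__

# ...but they close over different values.
closure_values = (f.__closure__[0].cell_contents,
                  g.__closure__[0].cell_contents)

first = cached_call(f)   # registers f and calls it
second = cached_call(g)  # finds f under the same key and calls f again
```

With a cache like this, `second` comes back as the result of `f`, not `g`. Telling the two apart would mean comparing `__closure__` and `__defaults__`, which is exactly the state the cache was meant to avoid shipping.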