Michael - the initializer/globals pattern still might be necessary if you need to create an object AFTER a worker process has been instantiated (i.e. a database connection). Further, the user may want to access all of the niceties of Pool, like imap, imap_unordered, etc. The goal (IMO) would be to preserve an interface which many Python users have grown accustomed to, and to allow them to access this optimization out-of-the-bag.
Having talked to folks at the Boston Python meetup, folks on my dev team, and perusing stack overflow, this "instance method parallelization" is a pretty common pattern that is often times a negative return on investment for the developer, due to the implicit implementation detail of pickling the function (and object) once per task. Is anyone open to reviewing a PR concerning this optimization of Pool, delivered as a subclass? This feature restricts the number of unique tasks being executed by workers at once to 1, while allowing aggressive subprocess-level function cacheing to prevent repeated serialization/deserialization of large functions/closures. The use case is s.t. the user only ever needs 1 call to Pool.map(func, ls) (or friends) executing at once, when `func` has a non-trivial memory footprint. On Fri, Oct 19, 2018 at 4:06 PM Michael Selik <m...@selik.org> wrote: > On Fri, Oct 19, 2018 at 5:01 AM Sean Harrington <seanhar...@gmail.com> > wrote: > >> I like the idea to extend the Pool class [to optimize the case when only >> one function is passed to the workers]. >> > > Why would this keep the same interface as the Pool class? If its workers > are restricted to calling only one function, that should be passed into the > Pool constructor. The map and apply methods would then only receive that > function's args and not the function itself. You're also trying to avoid > the initializer/globals pattern, so you could eliminate that parameter from > the Pool constructor. In fact, it sounds more like you'd want a function > than a class. You can call it "procmap" or similar. That's code I've > written more than once. > > results = poolmap(func, iterable, processes=cpu_count()) > > The nuance is that, since there's no explicit context manager, you'll want > to ensure the pool is shut down after all the tasks are finished, even if > the results generator hasn't been fully consumed. >
_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com