This looks like a very nice library to put on PyPI. But it's not an idea for change to the Python language itself, so probably this is the wrong forum. Python-list is closer.
... if it is a suggestion to change the standard library itself, I'm -1 on the idea. On Sat, Feb 8, 2020 at 6:11 PM Sean McIntyre <boxys...@gmail.com> wrote: > Hi folks, > > I'd like to get some feedback on a multi-threading interface I've been > thinking about and using for the past year or so. I won't bury the lede, see > my approach here > <https://gist.github.com/boxysean/3ed325ebb75db0303002f9484821e553#file-my_example-py> > . > > *Background / problem:* > > A couple of years ago, I inherited my company's codebase to get data into > our data warehouse using an ELT approach (extract-and-loads done in python, > transforms done in dbt/SQL). The codebase has dozens of python scripts to > integrate first-party and third-party data from databases, FTPs, and APIs, > which are run on a scheduler (typically daily or hourly). The scripts I > inherited were single-threaded procedural scripts, looking like glue code, > and spending most of their time in network I/O. (See example. > <https://gist.github.com/boxysean/3ed325ebb75db0303002f9484821e553#file-unthreaded_example-py>) > This got my company pretty far! > > As my team and I added more and more integrations with more and more data, > we wanted to have faster and faster scripts to reduce our dev cycles and > reduce our multi-hour nightly jobs to minutes. Because our scripts were > network-bound, multi-threading was a good way to accomplish this, and so I > looked into concurrent.futures (example > <https://gist.github.com/boxysean/3ed325ebb75db0303002f9484821e553#file-concurrent_futures_example-py>) > and asyncio (example > <https://gist.github.com/boxysean/3ed325ebb75db0303002f9484821e553#file-asyncio_example-py>), > but I decided against these options because: > > 1. It wasn't immediately apparently how to adapt my codebase to use these > libraries without either some fundamental changes to our execution platform > and/or reworking of our scripts from the ground up and/or adding > significant lines of multi-threading code to each script. > > 2. I couldn't wrap my head around the async/await and future constructs > particularly quickly, and I was concerned that my team would also struggle > with this change. > > 3. I believe the procedural style glue code we have is quite easy to > comprehend, which I think has a positive impact on scale. > > *Solution:* > > And so, as mentioned at the top, I designed a different interface to > concurrent.futures.ThreadPoolExecutor that we are successfully using for > our extract-and-load pattern, see a basic example here > <https://gist.github.com/boxysean/3ed325ebb75db0303002f9484821e553#file-my_example-py>. > The design considerations of this interface include: > > - The usage is minimally-invasive to the original unthreaded approach of > the codebase. (And so, teaching the library to team members has been fairly > straightforward despite the multi-threaded paradigm shift.) > > - The @parallel.task decorator should be used to encapsulate a homogeneous > method accepting different parameters. The contents of the method should be > primarily I/O to achieve the concurrency gains of python multi-threading. > > - If no parallel.threads context manager has been entered, the > @parallel.task decorator acts as a no-op (and the code runs serially). > > - If an environment variable is set to disable the context manager, the > @parallel.task decorator acts as a no-op (and the code runs serially). > > - There is also an environment variable to change the number of workers > provided by parallel.threads (if not hard-coded). > > While it's possible to return a value from a @parallel.task method, I > encourage my team to use the decorator to start-and-complete work; think of > writing "embarrassingly parallel" methods that can be "mapped". > > A couple of other things we've implemented include a "thread barrier" in > the case where we want a set tasks to complete before a set of other tasks, > and a decorator for factory methods to produce cached thread-local objects > (helpful for ensuring thread-safe access to network clients that are not > thread-safe). > > *Your feedback:* > > - I'd love to hear your thoughts on my problem and solution. > > - I've done a bit of research of existing libraries in PyPI and PEPs but I > don't see any similar libraries; are you aware of anything? > > - What do you suggest I do next? I'm considering publishing it, but could > use some tips on what to here! > > Thanks! > > Sean McIntyre > _______________________________________________ > Python-ideas mailing list -- python-ideas@python.org > To unsubscribe send an email to python-ideas-le...@python.org > https://mail.python.org/mailman3/lists/python-ideas.python.org/ > Message archived at > https://mail.python.org/archives/list/python-ideas@python.org/message/KGSMCQT4JIVFEPXULKIYMQOIZLQZUWW5/ > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
_______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/76SNV2JMKQKIO4XGRHOMFZ2MT6637LLB/ Code of Conduct: http://python.org/psf/codeofconduct/