Re: [Python-Dev] Investigating time for `import requests`
On Mon, Oct 2, 2017 at 6:42 PM, Raymond Hettinger <raymond.hettin...@gmail.com> wrote:
>
> > On Oct 2, 2017, at 12:39 AM, Nick Coghlan wrote:
> >
> > "What requests uses" can identify a useful set of avoidable imports. A
> > Flask "Hello world" app could likely provide another such sample, as
> > could some example data analysis notebooks.
>
> Right. It is probably worthwhile to identify which parts of the library
> are typically imported but are not ever used. And likewise, identify a
> core set of commonly used tools that are going to be almost unavoidable
> in sufficiently interesting applications (like using requests to access
> a REST API, running a micro-webframework, or invoking mercurial).
>
> Presumably, if any of this is going to make a difference to end users,
> we need to see if there is any avoidable work that takes a significant
> fraction of the total time from invocation through the point where the
> user first sees meaningful output. That would include loading from
> nonvolatile storage, executing the various imports, and doing the
> actual application.
>
> I don't expect to find anything that would help users of Django, Flask,
> and Bottle, since those are typically long-running apps where we value
> response time more than startup time.
>
> For scripts using the requests module, there will be some fruit because
> not everything that is imported is used. However, that may not be
> significant because scripts using requests tend to be I/O bound. In the
> timings below, 6% of the running time is used to load and run
> python.exe, another 16% is used to import requests, and the remaining
> 78% is devoted to the actual task of running a simple REST API query.
> It would be interesting to see how much of the 16% could be avoided
> without major alterations to requests, to urllib3, and to the standard
> library.

It is certainly true that for a CLI tool that actually does any network
I/O, especially SSL, import times will quickly become negligible. It
becomes tricky for complex tools because of error management. For
example, a common pattern I have used in the past is to have a
high-level "catch all exceptions" function that dispatches the CLI
command:

    try:
        main_function(...)
    except ErrorKind1:
        ...
    except requests.exceptions.SSLError:
        # gives a complete message about options when receiving SSL
        # errors, e.g. an invalid certificate
        ...

This pattern requires importing requests every time the command is run,
even if no network I/O is actually done. For complex CLI tools, most
commands may not use network I/O at all (the tool in question was a
complete package manager), but you pay the ~100 ms of the requests
import on every command. It is particularly visible because command
latency starts to be felt around 100-150 ms, and while you can do a lot
in Python in 100-150 ms, you can't do much in 0-50 ms.

David

> For mercurial, "hg log" or "hg commit" will likely be instructive about
> what portion of the imports actually get used. A push or pull will
> likely be I/O bound, so those commands are less informative.
>
> Raymond
>
> --- Quick timing for a minimal script using the requests module ---
>
> $ cat > demo_github_rest_api.py
> import requests
> info = requests.get('https://api.github.com/users/raymondh').json()
> print('%(name)s works at %(company)s. Contact at %(email)s' % info)
>
> $ time python3.6 demo_github_rest_api.py
> Raymond Hettinger works at SauceLabs. Contact at None
>
> real    0m0.561s
> user    0m0.134s
> sys     0m0.018s
>
> $ time python3.6 -c "import requests"
>
> real    0m0.125s
> user    0m0.104s
> sys     0m0.014s
>
> $ time python3.6 -c ""
>
> real    0m0.036s
> user    0m0.024s
> sys     0m0.005s
Re: [Python-Dev] Investigating time for `import requests`
On Sun, Oct 8, 2017 at 7:02 PM, David Cournapeau wrote:
> It is certainly true that for a CLI tool that actually does any network
> I/O, especially SSL, import times will quickly become negligible. It
> becomes tricky for complex tools because of error management. For
> example, a common pattern I have used in the past is to have a
> high-level "catch all exceptions" function that dispatches the CLI
> command:
>
>     try:
>         main_function(...)
>     except ErrorKind1:
>         ...
>     except requests.exceptions.SSLError:
>         # gives a complete message about options when receiving SSL
>         # errors, e.g. an invalid certificate
>         ...
>
> This pattern requires importing requests every time the command is run,
> even if no network I/O is actually done. For complex CLI tools, most
> commands may not use network I/O at all (the tool in question was a
> complete package manager), but you pay the ~100 ms of the requests
> import on every command. It is particularly visible because command
> latency starts to be felt around 100-150 ms, and while you can do a lot
> in Python in 100-150 ms, you can't do much in 0-50 ms.

This would be a perfect use-case for lazy importing, then. You'd pay
the price of the import only if you get an error that isn't caught by
one of the preceding except blocks.

ChrisA
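A minimal sketch of the lazy-import approach Chris describes, using the
dispatcher shape from David's example (`dispatch` and `main_function`
are illustrative placeholders, not names from the thread):

    def dispatch(main_function):
        try:
            main_function()
        except Exception as ex:
            # Pay for the requests import only on the error path. If
            # main_function never imported requests, the exception
            # cannot be a requests SSLError, so we just re-raise.
            import requests
            if isinstance(ex, requests.exceptions.SSLError):
                print("SSL error, e.g. an invalid certificate")
            else:
                raise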
Re: [Python-Dev] Investigating time for `import requests`
On Sun, Oct 8, 2017 at 11:02 AM, David Cournapeau wrote:
>
> On Mon, Oct 2, 2017 at 6:42 PM, Raymond Hettinger <
> raymond.hettin...@gmail.com> wrote:
>
>> > On Oct 2, 2017, at 12:39 AM, Nick Coghlan wrote:
>> >
>> > "What requests uses" can identify a useful set of avoidable
>> > imports. A Flask "Hello world" app could likely provide another
>> > such sample, as could some example data analysis notebooks.
>>
>> Right. It is probably worthwhile to identify which parts of the
>> library are typically imported but are not ever used. And likewise,
>> identify a core set of commonly used tools that are going to be
>> almost unavoidable in sufficiently interesting applications (like
>> using requests to access a REST API, running a micro-webframework,
>> or invoking mercurial).
>>
>> Presumably, if any of this is going to make a difference to end
>> users, we need to see if there is any avoidable work that takes a
>> significant fraction of the total time from invocation through the
>> point where the user first sees meaningful output. That would include
>> loading from nonvolatile storage, executing the various imports, and
>> doing the actual application.
>>
>> I don't expect to find anything that would help users of Django,
>> Flask, and Bottle, since those are typically long-running apps where
>> we value response time more than startup time.
>>
>> For scripts using the requests module, there will be some fruit
>> because not everything that is imported is used. However, that may
>> not be significant because scripts using requests tend to be I/O
>> bound. In the timings below, 6% of the running time is used to load
>> and run python.exe, another 16% is used to import requests, and the
>> remaining 78% is devoted to the actual task of running a simple REST
>> API query. It would be interesting to see how much of the 16% could
>> be avoided without major alterations to requests, to urllib3, and to
>> the standard library.
>
> It is certainly true that for a CLI tool that actually does any
> network I/O, especially SSL, import times will quickly become
> negligible. It becomes tricky for complex tools because of error
> management. For example, a common pattern I have used in the past is
> to have a high-level "catch all exceptions" function that dispatches
> the CLI command:
>
>     try:
>         main_function(...)
>     except ErrorKind1:
>         ...
>     except requests.exceptions.SSLError:
>         # gives a complete message about options when receiving SSL
>         # errors, e.g. an invalid certificate
>         ...
>
> This pattern requires importing requests every time the command is
> run, even if no network I/O is actually done. For complex CLI tools,
> most commands may not use network I/O at all (the tool in question was
> a complete package manager), but you pay the ~100 ms of the requests
> import on every command. It is particularly visible because command
> latency starts to be felt around 100-150 ms, and while you can do a
> lot in Python in 100-150 ms, you can't do much in 0-50 ms.

Yes. OTOH, it can also happen that the *imports* are in fact what use
the network I/O. At the office, I usually import from a network drive.
For instance, `import requests` takes a little less than a second, and
`import IPython` usually takes more than a second, with some variation.

––Koos

-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +
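For reference, per-environment import cost like this can be measured
with a trivial timing sketch (run it in a fresh interpreter each time,
since repeated imports are served from the module cache):

    import time

    start = time.perf_counter()
    import requests  # the import being measured
    elapsed = time.perf_counter() - start
    print('import requests took %.3f seconds' % elapsed)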
Re: [Python-Dev] Investigating time for `import requests`
On Sun, Oct 8, 2017 at 2:44 PM, Chris Angelico wrote:
> On Sun, Oct 8, 2017 at 7:02 PM, David Cournapeau wrote:
> > It is certainly true that for a CLI tool that actually does any
> > network I/O, especially SSL, import times will quickly become
> > negligible. It becomes tricky for complex tools because of error
> > management. For example, a common pattern I have used in the past is
> > to have a high-level "catch all exceptions" function that dispatches
> > the CLI command:
> >
> >     try:
> >         main_function(...)
> >     except ErrorKind1:
> >         ...
> >     except requests.exceptions.SSLError:
> >         # gives a complete message about options when receiving SSL
> >         # errors, e.g. an invalid certificate
> >         ...
> >
> > This pattern requires importing requests every time the command is
> > run, even if no network I/O is actually done. For complex CLI tools,
> > most commands may not use network I/O at all (the tool in question
> > was a complete package manager), but you pay the ~100 ms of the
> > requests import on every command. It is particularly visible because
> > command latency starts to be felt around 100-150 ms, and while you
> > can do a lot in Python in 100-150 ms, you can't do much in 0-50 ms.
>
> This would be a perfect use-case for lazy importing, then. You'd pay
> the price of the import only if you get an error that isn't caught by
> one of the preceding except blocks.

I suppose it might be convenient to be able to do something like:

    with autoimport:
        try:
            main_function(...)
        except ErrorKind1:
            ...
        except requests.exceptions.SSLError:
            ...

The easiest workaround at the moment is still pretty clumsy:

    def import_SSLError():
        from requests.exceptions import SSLError
        return SSLError

    ...
        except import_SSLError():
            ...

But what happens if that gives you an ImportError?

––Koos

-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +
Re: [Python-Dev] Investigating time for `import requests`
> The easiest workaround at the moment is still pretty clumsy:
>
>     def import_SSLError():
>         from requests.exceptions import SSLError
>         return SSLError
>
>     ...
>         except import_SSLError():
>             ...
>
> But what happens if that gives you an ImportError?

You can't catch a requests exception unless requests has already been
imported, so you could do something like:

    except Exception as ex:
        if 'requests' in sys.modules:
            import requests  # this is basically free at this point
            # RequestException is the base class of requests' exceptions
            if isinstance(ex, requests.exceptions.RequestException):
                ...

Eric.
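Folded back into the dispatcher shape David described, a self-contained
version of Eric's guard might look like the following sketch (`run_cli`
and the error handling are illustrative, not from the thread):

    import sys

    def run_cli(main_function):
        try:
            main_function()
        except Exception as ex:
            # A requests exception can only have been raised if requests
            # was imported while main_function ran, so sys.modules tells
            # us whether the (then essentially free) import and the
            # isinstance check are worth doing; ImportError can't occur.
            if 'requests' in sys.modules:
                import requests
                if isinstance(ex, requests.exceptions.SSLError):
                    print('SSL error, e.g. an invalid certificate',
                          file=sys.stderr)
                    raise SystemExit(1)
            raise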
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On 7 October 2017 at 02:29, Koos Zevenhoven wrote:
> While I'm actually trying not to say much here so that I can avoid
> this discussion now, here's just a couple of ideas and thoughts from
> me at this point:
>
> (A)
> Instead of sending bytes and receiving memoryviews, one could consider
> sending *and* receiving memoryviews for now. That could then be
> extended to more types of objects in the future without changing the
> basic concept of the channel. Probably, the memoryview would need to
> be copied (but not the data, of course). But I'm guessing copying a
> memoryview would be quite fast.

The proposal is to allow sending any buffer-exporting object, so
sending a memoryview would be supported.

> This would hopefully require fewer API changes or additions in the
> future. OTOH, giving it a different name like MemChannel or making it
> 3rd party will buy some more time to figure out the right API. But
> maybe that's not needed.

I think having both a memory-centric data channel and an object-centric
data channel would be useful long term, so I don't see a lot of
downsides to starting with the easier-to-implement MemChannel, and then
looking at how to define a plain Channel later.

For example, it occurs to me that the closest current equivalent we
have to an object-level counterpart to the memory buffer protocol would
be the weak reference protocol, wherein a multi-interpreter-aware proxy
object could actually take care of switching interpreters as needed
when manipulating reference counts.

While weakrefs themselves wouldn't be usable in the general case (many
builtin types don't support weak references, and we'd want to support
strong cross-interpreter references anyway), a wrapt-style object proxy
would provide us with a way to maintain a single strong reference to
the original object in its originating interpreter (implicitly
switching to that interpreter as needed), while also maintaining a
regular local reference count on the proxy object in the receiving
interpreter.

And here's the neat thing: since subinterpreters share an address
space, it would be possible to experiment with an object-proxy based
channel by passing object pointers over a memoryview based channel.

> (B)
> We would probably then like to pretend that the object coming out the
> other end of a Channel *is* the original object. As long as these
> channels are the only way to directly pass objects between
> interpreters, there are essentially only two ways to tell the
> difference (AFAICT):
>
> 1. Calling id(...) and sending it over to the other interpreter and
> checking if it's the same.
>
> 2. When the same object is sent twice to the same interpreter. Then
> one can compare the two with id(...) or using the `is` operator.
>
> There are solutions to the problems too:
>
> 1. Send the id() from the sending interpreter along with the sent
> object so that the receiving interpreter can somehow attach it to the
> object and then return it from id(...).
>
> 2. When an object is received, make a lookup in an interpreter-wide
> cache to see if an object by this id has already been received. If
> yes, take that one.
>
> Now it should essentially look like the received object is really
> "the same one" as in the sending interpreter. This should also work
> with multiple interpreters and multiple channels, as long as the id
> is always preserved.

I don't personally think we want to expend much (if any) effort on
presenting the illusion that the objects on either end of the channel
are the "same" object, but postponing the question entirely is also one
of the benefits I see to starting with MemChannel, and leaving the
object-centric Channel until later.

> (C)
> One further complication regarding memoryview in general is that
> .release() should probably be propagated to the sending interpreter
> somehow.

Yep, switching interpreters when releasing the buffer is the main
reason you couldn't use a regular memoryview for this purpose - you
need a variant that holds a strong reference to the sending
interpreter, and switches back to it for the buffer release operation.

> (D)
> I think someone already mentioned this one, but would it not be better
> to start a new interpreter in the background in a new thread by
> default? I think this would make things simpler and leave more freedom
> regarding the implementation in the future. If you need to run an
> interpreter within the current thread, you could perhaps optionally do
> that too.

Not really, as that approach doesn't compose as well with existing
thread management primitives like
concurrent.futures.ThreadPoolExecutor. It also doesn't match the way
the existing subinterpreter machinery works, where threads can change
their active interpreter.

Cheers,
Nick.

-- 
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
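To make the wrapt-style proxy idea slightly more concrete, here is a
rough pure-Python sketch of just the attribute-forwarding part; the
interpreter switching around reference-count operations would have to
live in C and appears only as comments, and all names are illustrative
rather than part of any proposed API:

    class CrossInterpreterProxy:
        # Holds the only local strong reference to the target object,
        # which conceptually lives in the sending interpreter. A real
        # implementation would switch to that interpreter whenever the
        # target's reference count is manipulated.
        def __init__(self, target):
            object.__setattr__(self, '_target', target)

        def __getattr__(self, name):
            # Forward attribute reads to the original object.
            return getattr(object.__getattribute__(self, '_target'), name)

        def __setattr__(self, name, value):
            # Forward attribute writes to the original object.
            setattr(object.__getattribute__(self, '_target'), name, value)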
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On 14 September 2017 at 11:44, Eric Snow wrote:
> Examples
> ========
>
> Run isolated code
> -----------------
>
> ::
>
>    interp = interpreters.create()
>    print('before')
>    interp.run('print("during")')
>    print('after')

A few more suggestions for examples:

Running a module:

    main_module = mod_name
    interp.run(f"import runpy; runpy.run_module({main_module!r})")

Running as a script (including zip archives & directories):

    main_script = path_name
    interp.run(f"import runpy; runpy.run_path({main_script!r})")

Running in a thread pool executor:

    interps = [interpreters.create() for i in range(5)]
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(interps)) as pool:
        print('before')
        for interp in interps:
            pool.submit(interp.run, 'print("starting"); print("stopping")')
        print('after')

That last one is prompted by the questions about the benefits of
keeping the notion of an interpreter state distinct from the notion of
a main thread (it allows a single "MainThread" object to be mapped to
different OS-level threads at different points in time, which means
it's easier to combine with existing constructs for managing OS-level
thread pools).

Cheers,
Nick.

-- 
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia