Re: [Python-Dev] Investigating time for `import requests`

2017-10-08 Thread David Cournapeau
On Mon, Oct 2, 2017 at 6:42 PM, Raymond Hettinger <
raymond.hettin...@gmail.com> wrote:

>
> > On Oct 2, 2017, at 12:39 AM, Nick Coghlan  wrote:
> >
> >  "What requests uses" can identify a useful set of
> > avoidable imports. A Flask "Hello world" app could likely provide
> > another such sample, as could some example data analysis notebooks).
>
> Right.  It is probably worthwhile to identify which parts of the library
> are typically imported but are not ever used.  And likewise, identify a
> core set of commonly used tools that are going to be almost unavoidable in
> sufficiently interesting applications (like using requests to access a REST
> API, running a micro-webframework, or invoking mercurial).
>
> Presumably, if any of this is going to make a difference to end users, we
> need to see if there is any avoidable work that takes a significant
> fraction of the total time from invocation through the point where the user
> first sees meaningful output.  That would include loading from nonvolatile
> storage, executing the various imports, and doing the actual application.
>
> I don't expect to find anything that would help users of Django, Flask,
> and Bottle since those are typically long-running apps where we value
> response time more than startup time.
>
> For scripts using the requests module, there will be some fruit because
> not everything that is imported is used.  However, that may not be
> significant because scripts using requests tend to be I/O bound.  In the
> timings below, 6% of the running time is used to load and run python.exe,
> another 16% is used to import requests, and the remaining 78% is devoted to
> the actual task of running a simple REST API query. It would be interesting
> to see how much of the 16% could be avoided without major alterations to
> requests, to urllib3, and to the standard library.
>

It is certainly true that for a CLI tool that actually does any network
I/O, especially over SSL, import times quickly become negligible. It gets
tricky for complex tools, because of error management. For example, a
common pattern I have used in the past is a high-level "catch all
exceptions" function that dispatches the CLI command:

try:
    main_function(...)
except ErrorKind1:
    ...
except requests.exceptions.SSLError:
    # give a complete message about options when receiving SSL errors,
    # e.g. an invalid certificate
    ...

This pattern requires importing requests every time the command is run,
even if no network I/O is actually done. For complex CLI tools, most
commands may not use network I/O at all (the tool in question was a
complete package manager), yet every command pays the ~100 ms cost of
importing requests. It is particularly noticeable because command latency
starts to be felt around 100-150 ms, and while you can do a lot in Python
in 100-150 ms, you can't do much in 0-50 ms.
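For reference, that per-command cost can be measured directly. A rough
sketch (not from the thread; `json` stands in for a heavier dependency such
as requests so the snippet runs anywhere):

```python
import subprocess
import sys
import time

def startup_ms(statement):
    """Wall-clock time, in ms, to start a fresh interpreter and run `statement`."""
    t0 = time.perf_counter()
    subprocess.run([sys.executable, "-c", statement], check=True)
    return (time.perf_counter() - t0) * 1000

bare = startup_ms("")               # interpreter startup alone
loaded = startup_ms("import json")  # swap in "import requests" to measure it
print(f"bare: {bare:.0f} ms, with import: {loaded:.0f} ms")
```

The difference between the two numbers is the price every command pays for
the import, whether or not it does any network I/O.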

David


> For mercurial, "hg log" or "hg commit" will likely be instructive about
> what portion of the imports actually get used.  A push or pull will likely
> be I/O bound so those commands are less informative.
>
>
> Raymond
>
>
> - Quick timing for a minimal script using the requests module
> ---
>
> $ cat > demo_github_rest_api.py
> import requests
> info = requests.get('https://api.github.com/users/raymondh').json()
> print('%(name)s works at %(company)s. Contact at %(email)s' % info)
>
> $ time python3.6 demo_github_rest_api.py
> Raymond Hettinger works at SauceLabs. Contact at None
>
> real    0m0.561s
> user    0m0.134s
> sys     0m0.018s
>
> $ time python3.6 -c "import requests"
>
> real    0m0.125s
> user    0m0.104s
> sys     0m0.014s
>
> $ time python3.6 -c ""
>
> real    0m0.036s
> user    0m0.024s
> sys     0m0.005s
>
>
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/
> cournape%40gmail.com
>
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Investigating time for `import requests`

2017-10-08 Thread Chris Angelico
On Sun, Oct 8, 2017 at 7:02 PM, David Cournapeau  wrote:
> It is certainly true that for a CLI tool that actually does any network
> I/O, especially over SSL, import times quickly become negligible. It gets
> tricky for complex tools, because of error management. For example, a
> common pattern I have used in the past is a high-level "catch all
> exceptions" function that dispatches the CLI command:
>
> try:
>     main_function(...)
> except ErrorKind1:
>     ...
> except requests.exceptions.SSLError:
>     # give a complete message about options when receiving SSL errors,
>     # e.g. an invalid certificate
>     ...
>
> This pattern requires importing requests every time the command is run,
> even if no network I/O is actually done. For complex CLI tools, most
> commands may not use network I/O at all (the tool in question was a
> complete package manager), yet every command pays the ~100 ms cost of
> importing requests. It is particularly noticeable because command latency
> starts to be felt around 100-150 ms, and while you can do a lot in Python
> in 100-150 ms, you can't do much in 0-50 ms.

This would be a perfect use-case for lazy importing, then. You'd pay
the price of the import only if you get an error that isn't caught by
one of the preceding except blocks.
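One concrete way to get that behaviour today is the `importlib.util.LazyLoader`
recipe from the stdlib docs, sketched here with `json` standing in for
requests: the module body is not executed until the first attribute access.

```python
import importlib.util
import sys

def lazy_import(name):
    """Import `name` lazily: the module body runs on first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module

json = lazy_import("json")         # cheap: nothing has been executed yet
print(json.dumps({"lazy": True}))  # first use triggers the real import
```

With this, the catch-all dispatcher can keep referring to the module at the
top level while only paying the import cost on the error path that touches it.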

ChrisA


Re: [Python-Dev] Investigating time for `import requests`

2017-10-08 Thread Koos Zevenhoven
On Sun, Oct 8, 2017 at 11:02 AM, David Cournapeau 
wrote:

>
> On Mon, Oct 2, 2017 at 6:42 PM, Raymond Hettinger <
> raymond.hettin...@gmail.com> wrote:
>
>> [...]
>
> It is certainly true that for a CLI tool that actually does any network
> I/O, especially over SSL, import times quickly become negligible. It gets
> tricky for complex tools, because of error management. For example, a
> common pattern I have used in the past is a high-level "catch all
> exceptions" function that dispatches the CLI command:
>
> try:
>     main_function(...)
> except ErrorKind1:
>     ...
> except requests.exceptions.SSLError:
>     # give a complete message about options when receiving SSL errors,
>     # e.g. an invalid certificate
>     ...
>
> This pattern requires importing requests every time the command is run,
> even if no network I/O is actually done. For complex CLI tools, most
> commands may not use network I/O at all (the tool in question was a
> complete package manager), yet every command pays the ~100 ms cost of
> importing requests. It is particularly noticeable because command latency
> starts to be felt around 100-150 ms, and while you can do a lot in Python
> in 100-150 ms, you can't do much in 0-50 ms.
>
>
Yes. OTOH, it can also happen that the *imports* themselves are what use
the network I/O. At the office, I usually import from a network drive. For
instance, `import requests` takes a little less than a second, and `import
IPython` usually takes more than a second, with some variation.

––Koos


-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +


Re: [Python-Dev] Investigating time for `import requests`

2017-10-08 Thread Koos Zevenhoven
On Sun, Oct 8, 2017 at 2:44 PM, Chris Angelico  wrote:

> On Sun, Oct 8, 2017 at 7:02 PM, David Cournapeau 
> wrote:
> > [...]
>
> This would be a perfect use-case for lazy importing, then. You'd pay
> the price of the import only if you get an error that isn't caught by
> one of the preceding except blocks.
>


I suppose it might be convenient to be able to do something like:

with autoimport:
    try:
        main_function(...)
    except ErrorKind1:
        ...
    except requests.exceptions.SSLError:
        ...


The easiest workaround at the moment is still pretty clumsy:

def import_SSLError():
    from requests.exceptions import SSLError
    return SSLError

...

    except import_SSLError():


But what happens if that gives you an ImportError?
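One answer (a sketch, not from the thread): make the helper swallow the
ImportError and return a private placeholder exception type that is never
raised, so the except clause simply doesn't fire when requests is absent:

```python
def import_SSLError():
    """Return requests' SSLError, or a never-raised placeholder if requests is missing."""
    try:
        from requests.exceptions import SSLError
        return SSLError
    except ImportError:
        class _NeverRaised(Exception):
            pass
        return _NeverRaised  # matches no real exception, so the clause is skipped

try:
    raise ValueError("unrelated error")
except import_SSLError():
    handled = "ssl"
except ValueError:
    handled = "value"
print(handled)  # -> value
```

Either way the except clause stays valid Python, and the cost of importing
requests is only paid on the error path.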

––Koos


-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +


Re: [Python-Dev] Investigating time for `import requests`

2017-10-08 Thread Eric V. Smith

> The easiest workaround at the moment is still pretty clumsy:
>
> def import_SSLError():
>     from requests.exceptions import SSLError
>     return SSLError
>
> ...
>
>     except import_SSLError():
>
> But what happens if that gives you an ImportError?


You can't catch a requests exception unless requests has already been
imported, so you could do something like:

except Exception as ex:
    if 'requests' in sys.modules:
        import requests  # this is basically free at this point
        # requests.exceptions is a module, so test against its
        # RequestException base class rather than the module itself
        if isinstance(ex, requests.exceptions.RequestException):
            ...

Eric.



Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-08 Thread Nick Coghlan
On 7 October 2017 at 02:29, Koos Zevenhoven  wrote:

> While I'm actually trying not to say much here so that I can avoid this
> discussion now, here's just a couple of ideas and thoughts from me at this
> point:
>
> (A)
> Instead of sending bytes and receiving memoryviews, one could consider
> sending *and* receiving memoryviews for now. That could then be extended
> into more types of objects in the future without changing the basic concept
> of the channel. Probably, the memoryview would need to be copied (but not
> the data of course). But I'm guessing copying a memoryview would be quite
> fast.
>

The proposal is to allow sending any buffer-exporting object, so sending a
memoryview would be supported.
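For intuition on why that is cheap: a memoryview wraps an existing buffer
without copying the underlying data, so constructing or slicing one is
metadata-only. A small stdlib illustration (not PEP 554 code):

```python
import array

buf = array.array('d', [1.0, 2.0, 3.0])
view = memoryview(buf)   # wraps buf's buffer; no bytes are copied
buf[0] = 42.0
print(view[0])           # -> 42.0: the view sees the same memory
part = view[1:]          # slicing is also zero-copy
buf[2] = 7.0
print(part[1])           # -> 7.0
```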


> This would hopefully require less API changes or additions in the future.
> OTOH, giving it a different name like MemChannel or making it 3rd party
> will buy some more time to figure out the right API. But maybe that's not
> needed.
>

I think having both a memory-centric data channel and an object-centric
data channel would be useful long term, so I don't see a lot of downsides
to starting with the easier-to-implement MemChannel, and then looking at
how to define a plain Channel later.

For example, it occurs to me that the closest current equivalent we have
to an object-level counterpart to the memory buffer protocol would be the
weak reference protocol, wherein a multi-interpreter-aware proxy object
could actually take care of switching interpreters as needed when
manipulating reference counts.

While weakrefs themselves wouldn't be usable in the general case (many
builtin types don't support weak references, and we'd want to support
strong cross-interpreter references anyway), a wrapt-style object proxy
would provide us with a way to maintain a single strong reference to the
original object in its originating interpreter (implicitly switching to
that interpreter as needed), while also maintaining a regular local
reference count on the proxy object in the receiving interpreter.

And here's the neat thing: since subinterpreters share an address space, it
would be possible to experiment with an object-proxy based channel by
passing object pointers over a memoryview based channel.


> (B)
> We would probably then like to pretend that the object coming out the
> other end of a Channel *is* the original object. As long as these channels
> are the only way to directly pass objects between interpreters, there are
> essentially only two ways to tell the difference (AFAICT):
>
> 1. Calling id(...) and sending it over to the other interpreter and
> checking if it's the same.
>
> 2. When the same object is sent twice to the same interpreter. Then one
> can compare the two with id(...) or using the `is` operator.
>
> There are solutions to the problems too:
>
> 1. Send the id() from the sending interpreter along with the sent object
> so that the receiving interpreter can somehow attach it to the object and
> then return it from id(...).
>
> 2. When an object is received, make a lookup in an interpreter-wide cache
> to see if an object by this id has already been received. If yes, take that
> one.
>
> Now it should essentially look like the received object is really "the
> same one" as in the sending interpreter. This should also work with
> multiple interpreters and multiple channels, as long as the id is always
> preserved.
>

I don't personally think we want to expend much (if any) effort on
presenting the illusion that the objects on either end of the channel are
the "same" object, but postponing the question entirely is also one of the
benefits I see to starting with MemChannel, and leaving the object-centric
Channel until later.


> (C)
> One further complication regarding memoryview in general is that
> .release() should probably be propagated to the sending interpreter somehow.
>

Yep, switching interpreters when releasing the buffer is the main reason
you couldn't use a regular memoryview for this purpose - you need a variant
that holds a strong reference to the sending interpreter, and switches back
to it for the buffer release operation.


> (D)
> I think someone already mentioned this one, but would it not be better to
> start a new interpreter in the background in a new thread by default? I
> think this would make things simpler and leave more freedom regarding the
> implementation in the future. If you need to run an interpreter within the
> current thread, you could perhaps optionally do that too.
>

Not really, as that approach doesn't compose as well with existing thread
management primitives like concurrent.futures.ThreadPoolExecutor. It also
doesn't match the way the existing subinterpreter machinery works, where
threads can change their active interpreter.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-08 Thread Nick Coghlan
On 14 September 2017 at 11:44, Eric Snow 
wrote:

> Examples
> 
>
> Run isolated code
> -
>
> ::
>
>interp = interpreters.create()
>print('before')
>interp.run('print("during")')
>print('after')
>

A few more suggestions for examples:

Running a module:

main_module = mod_name
interp.run(f"import runpy; runpy.run_module({main_module!r})")

Running as script (including zip archives & directories):

main_script = path_name
interp.run(f"import runpy; runpy.run_path({main_script!r})")

Running in a thread pool executor:

interps = [interpreters.create() for i in range(5)]
with concurrent.futures.ThreadPoolExecutor(max_workers=len(interps)) as pool:
    print('before')
    for interp in interps:
        pool.submit(interp.run, 'print("starting"); print("stopping")')
    print('after')

That last one is prompted by the questions about the benefits of keeping
the notion of an interpreter state distinct from the notion of a main
thread (it allows a single "MainThread" object to be mapped to different OS
level threads at different points in time, which means it's easier to
combine with existing constructs for managing OS level thread pools).
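The `interpreters` module in that example is still hypothetical (PEP 554),
but the same shape runs today with plain threads, using `exec` on a code
string as a stand-in for `interp.run`. A sketch for illustration only; it
provides none of the isolation subinterpreters would:

```python
import concurrent.futures

def run_code(source):
    """Stand-in for interp.run: execute a code string in a fresh namespace."""
    namespace = {}
    exec(source, namespace)
    return namespace.get("result")

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as pool:
    futures = [pool.submit(run_code, f"result = {i} * {i}") for i in range(5)]
    print([f.result() for f in futures])  # -> [0, 1, 4, 9, 16]
```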

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia