Re: [Python-Dev] Investigating time for `import requests`

2017-10-01 Thread Raymond Hettinger

> On Oct 1, 2017, at 7:34 PM, Nathaniel Smith  wrote:
> 
> In principle re.compile() itself could be made lazy -- return a
> regular expression object that just holds the string, and then compiles
> and caches it the first time it's used. Might be tricky to do in a
> backwards-compatible way if it moves detection of invalid regexes
> from compile time to use time, but it could be an opt-in flag.
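
For concreteness, the quoted idea could be sketched like this (the
`LazyPattern` name and class are hypothetical, not part of re):

```python
import re

class LazyPattern:
    """Hypothetical sketch: hold the pattern string and defer
    re.compile() to the first use.  Note the compatibility wrinkle
    mentioned above: an invalid pattern now raises at first match,
    not at construction."""

    def __init__(self, pattern, flags=0):
        self._pattern = pattern
        self._flags = flags
        self._compiled = None

    def _compile(self):
        if self._compiled is None:
            self._compiled = re.compile(self._pattern, self._flags)
        return self._compiled

    def search(self, string):
        return self._compile().search(string)

    def match(self, string):
        return self._compile().match(string)

p = LazyPattern(r'\d+')   # no compilation happens here
m = p.search('abc 123')   # compiled (and cached) on first use
```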

ISTM that someone writing ``re.compile(pattern)`` is explicitly saying they
want the regex to be pre-compiled.  For cache-on-first-use, we already have a
way to do that with ``re.search(pattern, some_string)``, which compiles and
then caches.
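
A small illustration of that existing behavior (the internal cache name
`re._cache` is a CPython implementation detail, mentioned only in a comment):

```python
import re

# The module-level helpers compile the pattern on first use and keep
# the result in re's internal cache (re._cache in CPython), so later
# calls with the same pattern and flags skip recompilation.
m1 = re.search(r'(\w+)@(\w+)', 'user@host')
m2 = re.search(r'(\w+)@(\w+)', 'admin@box')  # same pattern: cached, no recompile
print(m1.group(1), m2.group(2))
```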

What would be more interesting would be to have a way to save the compiled 
regex in a pyc file so that it can be restored on load rather than recomputed.

Also, we should remind ourselves that making more and more things lazy is a 
false optimization unless those things never get used.  Otherwise, all we're 
doing is ending the timing before all the relevant work is done. If the lazy 
object does get used, we've made the actual total execution time worse (because 
of the overhead of the lazy evaluation logic).


Raymond
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 553

2017-10-01 Thread Guido van Rossum
Hope you don't mind me CC'ing python-dev.

On Sun, Oct 1, 2017 at 9:38 AM, Barry Warsaw  wrote:

> You seem to be in PEP review mode :)
>
> What do you think about 553?  Still want to wait, or do you think it’s
> missing anything?  So far, all the feedback has been positive, and I think
> we can basically just close the open issues and both the PEP and
> implementation are ready to go.
>
> Happy to do more work on it if you feel it’s necessary.
> -B
>

I'm basically in agreement. Some quick notes:

- There's a comma instead of a period at the end of the 4th bullet in the
Rationale: "Breaking the idiom up into two lines further complicates the
use of the debugger,". Also I don't understand how this complicates use
(even though I also always type it as one line). And the lint warning is
actually useful when you accidentally leave this in code you send to CI
(assuming that runs a linter, as it should). TBH the biggest argument (to
me) is that I simply don't know *how* I would enter some IDE's debugger
programmatically. I think it should also be pointed out that even if an IDE
has a way to specify conditional breakpoints, the UI may be such that it's
easier to just add the check to the code -- and then the breakpoint()
option is much more attractive than having to look up how it's done in your
particular IDE (especially since this is not all that common).

- There's no rationale for the *args, **kwds part of the breakpoint()
signature. (I vaguely recall someone on the mailing list asking for it but
it seemed far-fetched at best.)

- The explanation of the relationship between sys.breakpoint() and
sys.__breakpointhook__ was unclear to me -- I had to go to the docs for
__displayhook__ (
https://docs.python.org/3/library/sys.html#sys.__displayhook__) to
understand that sys.__breakpointhook__ is simply initialized to the same
function as sys.breakpointhook, and the idea is that if you screw up you
can restore sys.breakpointhook from the other rather than having to restart
your process. Is that in fact it? The text "``sys.__breakpointhook__`` then
stashes the default value of ``sys.breakpointhook()`` to make it easy to
reset" seems to imply some action that would happen... when? how?

- Some pseudo-code would be nice. It seems that it's like this:

# in builtins
def breakpoint(*args, **kwds):
    import sys
    return sys.breakpointhook(*args, **kwds)

# in sys
def breakpointhook(*args, **kwds):
    import os
    hook = os.getenv('PYTHONBREAKPOINT')
    if hook == '0':
        return None
    if not hook:
        import pdb
        return pdb.set_trace(*args, **kwds)

    if '.' not in hook:
        import builtins
        mod = builtins
        funcname = hook
    else:
        modname, funcname = hook.rsplit('.', 1)
        __import__(modname)
        import sys
        mod = sys.modules[modname]
    func = getattr(mod, funcname)
    return func(*args, **kwds)

__breakpointhook__ = breakpointhook

Except that the error handling should be a bit better. (In particular the
PEP specifies a try/except around most of the code in sys.breakpointhook()
that issues a RuntimeWarning and returns None.)
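
A sketch of what that hardened hook might look like (this is an
illustration of the PEP's described error handling, not the actual
CPython implementation):

```python
import importlib
import os
import warnings

def breakpointhook(*args, **kwds):
    """Sketch of sys.breakpointhook() with the error handling the PEP
    specifies: any failure to resolve $PYTHONBREAKPOINT issues a
    RuntimeWarning and returns None instead of raising."""
    hook = os.getenv('PYTHONBREAKPOINT')
    if hook == '0':          # explicitly disabled
        return None
    if not hook:             # unset or empty: fall back to pdb
        import pdb
        return pdb.set_trace(*args, **kwds)
    try:
        # A bare name like "len" resolves against builtins.
        modname, _, funcname = hook.rpartition('.')
        mod = importlib.import_module(modname or 'builtins')
        func = getattr(mod, funcname)
    except Exception:
        warnings.warn('Ignoring unimportable $PYTHONBREAKPOINT: %r' % hook,
                      RuntimeWarning)
        return None
    return func(*args, **kwds)
```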

- Not sure what the PEP's language around evaluation of PYTHONBREAKPOINT
means for the above pseudo code. I *think* the PEP author's opinion is that
the above pseudo-code is fine. Since programs can mutate their own
environment, I think something like `os.environ['PYTHONBREAKPOINT'] =
'foo.bar.baz'; breakpoint()` should result in foo.bar.baz() being imported
and called, right?
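
A minimal illustration of that expectation, runnable on an interpreter
that implements the PEP (3.7+); `builtins.len` merely stands in for a
real debugger entry point like foo.bar.baz:

```python
import os

# Because the hook consults os.environ on every call, a program can
# redirect breakpoint() at runtime by mutating its own environment.
os.environ['PYTHONBREAKPOINT'] = 'builtins.len'
result = breakpoint('abc')   # resolves the hook and calls len('abc')
print(result)
```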

- I'm not quite sure what sort of fast-tracking for PYTHONBREAKPOINT=0 you
had in mind beyond putting it first in the code above.

- Did you get confirmation from other debuggers? E.g. does it work for
IDLE, Wing IDE, PyCharm, and VS 2015?

- I'm not sure what the point would be of making a call to breakpoint() a
special opcode (it seems a lot of work for something of this nature). ISTM
that if some IDE modifies bytecode it can do whatever it well please
without a PEP.

- I don't see the point of calling `pdb.pm()` at breakpoint time. But it
could be done using the PEP with `import pdb; sys.breakpointhook = pdb.pm`
right? So this hardly deserves an open issue.

- I haven't read the actual implementation in the PR. A PEP should not
depend on the actual proposed implementation for disambiguation of its
specification (hence my proposal to add pseudo-code to the PEP).

That's what I have!

-- 
--Guido van Rossum (python.org/~guido )


Re: [Python-Dev] Intention to accept PEP 552 soon (deterministic pyc files)

2017-10-01 Thread Guido van Rossum
One more thing. I would really appreciate it if you properly wrapped lines
in your PEP around column 72 instead of using a single line per paragraph.
This is the standard convention, see the template in PEP 12.

On Sun, Oct 1, 2017 at 8:42 PM, Guido van Rossum  wrote:

> On Sun, Oct 1, 2017 at 1:52 PM, Koos Zevenhoven  wrote:
>
>> On Oct 1, 2017 19:26, "Guido van Rossum"  wrote:
>>
>> Your PEP is currently incomplete. If you don't finish it, it is not even
>> a contender. But TBH it's not my favorite anyway, so you could also just
>> withdraw it.
>>
>>
>> I can withdraw it if you ask me to, but I don't want to withdraw it
>> without any reason. I haven't changed my mind about the big picture. OTOH,
>> PEP 521 is elegant and could be used to implement PEP 555, but 521 is
>> almost certainly less performant and has some problems regarding context
>> manager wrappers that use composition instead of inheritance.
>>
>
> It is my understanding that PEP 521 (which proposes to add optional
> __suspend__ and __resume__ methods to the context manager protocol, to be
> called whenever a frame is suspended or resumed inside a `with` block) is
> no longer a contender because it would be way too slow. I haven't read it
> recently or thought about it, so I don't know what the second issue you
> mention is about (though it's presumably about the `yield` in a context
> manager implemented using a generator decorated with
> `@contextlib.contextmanager`).
>
> So it's really between PEP 550 and PEP 555. And there are currently too
> many parts of PEP 555 that are left to the imagination of the reader. So,
> again, I ask you to put up or shut up. It's your choice. If you don't want
> to do the work completing the PEP you might as well withdraw (once I am
> satisfied with Yury's PEP I will just accept it when there's no contender).
> If you do complete it I will probably still choose PEP 550 -- but at the
> moment the choice would be between something I understand completely and
> something that's too poorly specified to be able to reason about it.
>
> --Guido
>
>
>> -- Koos
>>
>>
>>
>> On Oct 1, 2017 9:13 AM, "Koos Zevenhoven"  wrote:
>>
>>> On Sep 29, 2017 18:21, "Guido van Rossum"  wrote:
>>>
>>>
>>> PS. PEP 550 is still unaccepted, awaiting a new revision from Yury and
>>> Elvis.
>>>
>>>
>>> This is getting really off-topic, but I do have updates to add to PEP
>>> 555 if there is interest in that. IMO, 555 is better and most likely faster
>>> than 550, but on the other hand, the issues with PEP 550 are most likely
>>> not going to be a problem for me personally.
>>>
>>> -- Koos
>>>
>>
>>
>
>
> --
> --Guido van Rossum (python.org/~guido)
>



-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Intention to accept PEP 552 soon (deterministic pyc files)

2017-10-01 Thread Guido van Rossum
On Sun, Oct 1, 2017 at 1:52 PM, Koos Zevenhoven  wrote:

> On Oct 1, 2017 19:26, "Guido van Rossum"  wrote:
>
> Your PEP is currently incomplete. If you don't finish it, it is not even a
> contender. But TBH it's not my favorite anyway, so you could also just
> withdraw it.
>
>
> I can withdraw it if you ask me to, but I don't want to withdraw it
> without any reason. I haven't changed my mind about the big picture. OTOH,
> PEP 521 is elegant and could be used to implement PEP 555, but 521 is
> almost certainly less performant and has some problems regarding context
> manager wrappers that use composition instead of inheritance.
>

It is my understanding that PEP 521 (which proposes to add optional
__suspend__ and __resume__ methods to the context manager protocol, to be
called whenever a frame is suspended or resumed inside a `with` block) is
no longer a contender because it would be way too slow. I haven't read it
recently or thought about it, so I don't know what the second issue you
mention is about (though it's presumably about the `yield` in a context
manager implemented using a generator decorated with
`@contextlib.contextmanager`).

So it's really between PEP 550 and PEP 555. And there are currently too
many parts of PEP 555 that are left to the imagination of the reader. So,
again, I ask you to put up or shut up. It's your choice. If you don't want
to do the work completing the PEP you might as well withdraw (once I am
satisfied with Yury's PEP I will just accept it when there's no contender).
If you do complete it I will probably still choose PEP 550 -- but at the
moment the choice would be between something I understand completely and
something that's too poorly specified to be able to reason about it.

--Guido


> -- Koos
>
>
>
> On Oct 1, 2017 9:13 AM, "Koos Zevenhoven"  wrote:
>
>> On Sep 29, 2017 18:21, "Guido van Rossum"  wrote:
>>
>>
>> PS. PEP 550 is still unaccepted, awaiting a new revision from Yury and
>> Elvis.
>>
>>
>> This is getting really off-topic, but I do have updates to add to PEP 555
>> if there is interest in that. IMO, 555 is better and most likely faster
>> than 550, but on the other hand, the issues with PEP 550 are most likely
>> not going to be a problem for me personally.
>>
>> -- Koos
>>
>
>


-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Investigating time for `import requests`

2017-10-01 Thread Glenn Linderman

On 10/1/2017 7:34 PM, Nathaniel Smith wrote:
>> Another major slowness comes from compiling regular expressions.
>> I think we can increase the cache size of `re.compile` and use
>> on-demand cached compiling (e.g. `re.match()`),
>> instead of "compile at import time" in many modules.
>
> In principle re.compile() itself could be made lazy -- return a
> regular expression object that just holds the string, and then compiles
> and caches it the first time it's used. Might be tricky to do in a
> backwards-compatible way if it moves detection of invalid regexes
> from compile time to use time, but it could be an opt-in flag.

Would be interesting to know how many of the in-module, compile-time
re.compile calls use dynamic values versus string constants. Seems like
string-constant parameters to re.compile calls could be moved to
on-first-use compiling without significant backwards-incompatibility
impact if there is an adequate test suite... and if there isn't an
adequate test suite, should we care about the deferred detection?


Re: [Python-Dev] Investigating time for `import requests`

2017-10-01 Thread Nathaniel Smith
On Sun, Oct 1, 2017 at 7:04 PM, INADA Naoki  wrote:
> 4. http.client
>
> import time:  1376 |   2448 |   email.header
> ...
> import time:  1469 |   7791 |   email.utils
> import time:   408 |  10646 | email._policybase
> import time:   939 |  12210 |   email.feedparser
> import time:   322 |  12720 | email.parser
> ...
> import time:   599 |   1361 | email.message
> import time:  1162 |  16694 |   http.client
>
> email.parser has a very large import tree.
> But I don't know how to break it up.

There is some work to get urllib3/requests to stop using http.client,
though it's not clear if/when it will actually happen:
https://github.com/shazow/urllib3/pull/1068

> Another major slowness comes from compiling regular expressions.
> I think we can increase the cache size of `re.compile` and use
> on-demand cached compiling (e.g. `re.match()`),
> instead of "compile at import time" in many modules.

In principle re.compile() itself could be made lazy -- return a
regular expression object that just holds the string, and then compiles
and caches it the first time it's used. Might be tricky to do in a
backwards-compatible way if it moves detection of invalid regexes
from compile time to use time, but it could be an opt-in flag.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


[Python-Dev] Investigating time for `import requests`

2017-10-01 Thread INADA Naoki
See also https://github.com/requests/requests/issues/4315

I tried the new `-X importtime` option on `import requests`.
Full output is here:
https://gist.github.com/methane/96d58a29e57e5be97769897462ee1c7e

Currently, it takes about 110 ms, and the major parts are from the Python
stdlib.  The following are the roots of the slow stdlib subtrees.

import time: self [us] | cumulative | imported package
import time:  1374 |  14038 |   logging
import time:  2636 |   4255 |   socket
import time:  2902 |  11004 |   ssl
import time:  1162 |  16694 |   http.client
import time:   656 |   5331 | cgi
import time:  7338 |   7867 | http.cookiejar
import time:  2930 |   2930 | http.cookies


*1. logging*

logging is slow because it is imported at an early stage.
It imports many common, relatively slow packages (collections, functools,
enum, re).

Especially, the traceback module is slow because of linecache.

import time:  1419 |   5016 | tokenize
import time:   200 |   5910 |   linecache
import time:   347 |   8869 | traceback

I think it's worthwhile to import linecache lazily.
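
A sketch of that "import in function" change (the `format_line` helper
is illustrative, not actual traceback code):

```python
# Deferring the linecache import: traceback-style code pays for
# linecache on the first call that actually needs it, not at import
# time of the enclosing module.
def format_line(filename, lineno):
    import linecache  # deferred; cached in sys.modules after first call
    return linecache.getline(filename, lineno).rstrip()
```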

*2. socket*

import time:   807 |   1221 | selectors
import time:  2636 |   4255 |   socket

socket imports selectors for socket.sendfile(), and the selectors module
uses ABCs.  That's why selectors is a bit slow.

And the socket module creates four enums.  That's why importing socket
took more than 2.5 ms excluding subimports.

*3. ssl*

import time:  2007 |   2007 | ipaddress
import time:  2386 |   2386 | textwrap
import time:  2723 |   2723 | _ssl
...
import time:   306 |988 | base64
import time:  2902 |  11004 |   ssl

I already created a pull request removing the textwrap dependency from ssl:
https://github.com/python/cpython/pull/3849

The ipaddress and _ssl modules are a bit slow too, but I don't know
whether we can improve them.

ssl itself took 2.9 ms.  That's because ssl has six enums.


*4. http.client*

import time:  1376 |   2448 |   email.header
...
import time:  1469 |   7791 |   email.utils
import time:   408 |  10646 | email._policybase
import time:   939 |  12210 |   email.feedparser
import time:   322 |  12720 | email.parser
...
import time:   599 |   1361 | email.message
import time:  1162 |  16694 |   http.client

email.parser has a very large import tree.
But I don't know how to break it up.

*5. cgi*

import time:  1083 |   1083 | html.entities
import time:   560 |   1643 |   html
...
import time:   656 |   2609 | shutil
import time:   424 |   3033 |   tempfile
import time:   656 |   5331 | cgi

The cgi module uses tempfile to save uploaded files.
But requests imports cgi just for `cgi.parse_header()`;
tempfile is not used.  Maybe it's worthwhile to import it lazily.

FYI, cgi depends on the very slow email.parser too.
This tree doesn't show it because http.client is imported before cgi.
Even though it's not a problem for requests, it may affect real CGI
applications.
Of course, startup time is very important for CGI applications too.


*6. http.cookiejar and http.cookies*

They're slow because they have many `re.compile()` calls.


*Ideas*

There are some places to break up large import trees with the "import in
function" hack.

ABCs are slow, and they're used widely with almost no real need.  (Who
needs selectors to be ABCs?)
We can't remove the ABC dependency because of backward compatibility.
But I hope ABCs can be implemented in C by Python 3.7.

Enum is slow, maybe slower than most people think.
I don't know why exactly, but I suspect it's because its namespace dict is
implemented in Python.

Anyway, I think we can have C implementations of IntEnum and IntFlag, like
namedtuple vs. PyStructSequence.
They don't need to be 100% compatible with the current enum.  Especially,
there's no need for a metaclass.

Another major slowness comes from compiling regular expressions.
I think we can increase the cache size of `re.compile` and use on-demand
cached compiling (e.g. `re.match()`),
instead of "compile at import time" in many modules.

PEP 562 -- Module __getattr__ helps a lot too.
It makes it possible to split up the collections module and the string
module.
(The string module is often used for constants like string.ascii_letters,
but string.Template causes an import-time re.compile().)


Regards,
-- 
Inada Naoki 


Re: [Python-Dev] Intention to accept PEP 552 soon (deterministic pyc files)

2017-10-01 Thread Koos Zevenhoven
On Oct 1, 2017 19:26, "Guido van Rossum"  wrote:

Your PEP is currently incomplete. If you don't finish it, it is not even a
contender. But TBH it's not my favorite anyway, so you could also just
withdraw it.


I can withdraw it if you ask me to, but I don't want to withdraw it without
any reason. I haven't changed my mind about the big picture. OTOH, PEP 521
is elegant and could be used to implement PEP 555, but 521 is almost
certainly less performant and has some problems regarding context manager
wrappers that use composition instead of inheritance.

-- Koos



On Oct 1, 2017 9:13 AM, "Koos Zevenhoven"  wrote:

> On Sep 29, 2017 18:21, "Guido van Rossum"  wrote:
>
>
> PS. PEP 550 is still unaccepted, awaiting a new revision from Yury and
> Elvis.
>
>
> This is getting really off-topic, but I do have updates to add to PEP 555
> if there is interest in that. IMO, 555 is better and most likely faster
> than 550, but on the other hand, the issues with PEP 550 are most likely
> not going to be a problem for me personally.
>
> -- Koos
>


Re: [Python-Dev] Intention to accept PEP 552 soon (deterministic pyc files)

2017-10-01 Thread Guido van Rossum
Your PEP is currently incomplete. If you don't finish it, it is not even a
contender. But TBH it's not my favorite anyway, so you could also just
withdraw it.

On Oct 1, 2017 9:13 AM, "Koos Zevenhoven"  wrote:

> On Sep 29, 2017 18:21, "Guido van Rossum"  wrote:
>
>
> PS. PEP 550 is still unaccepted, awaiting a new revision from Yury and
> Elvis.
>
>
> This is getting really off-topic, but I do have updates to add to PEP 555
> if there is interest in that. IMO, 555 is better and most likely faster
> than 550, but on the other hand, the issues with PEP 550 are most likely
> not going to be a problem for me personally.
>
> -- Koos
>


Re: [Python-Dev] Intention to accept PEP 552 soon (deterministic pyc files)

2017-10-01 Thread Koos Zevenhoven
On Sep 29, 2017 18:21, "Guido van Rossum"  wrote:


PS. PEP 550 is still unaccepted, awaiting a new revision from Yury and
Elvis.


This is getting really off-topic, but I do have updates to add to PEP 555
if there is interest in that. IMO, 555 is better and most likely faster
than 550, but on the other hand, the issues with PEP 550 are most likely
not going to be a problem for me personally.

-- Koos


Re: [Python-Dev] bpo-30806 netrc.__repr__() is broken for writing to file (GH-2491)

2017-10-01 Thread Serhiy Storchaka

On 30.09.17 10:10, INADA Naoki wrote:

https://github.com/python/cpython/commit/b24cd055ecb3eea9a15405a6ca72dafc739e6531
commit: b24cd055ecb3eea9a15405a6ca72dafc739e6531
branch: master
author: James Sexton 
committer: INADA Naoki 
date: 2017-09-30T16:10:31+09:00
summary:

bpo-30806 netrc.__repr__() is broken for writing to file (GH-2491)

netrc file format doesn't support quotes and escapes.

See https://linux.die.net/man/5/netrc


The commit message looks confusing to me. Is netrc.__repr__() broken
now? Or does this change make the netrc file format support quotes and
escapes now?


Please read the following thread: 
https://mail.python.org/pipermail/python-dev/2011-May/111303.html.

