Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-23 Thread Brett Cannon
I just tried this and I get a str/bytes issue. I also think your setup3k.py
command is missing ``build`` and your build/scripts-3.2 is missing ``/hg``.

On Wed, Feb 22, 2012 at 19:26, Éric Araujo mer...@netwok.org wrote:

 Hi Brett,

 I think this message went unanswered, so here’s a late reply:

 On 07/02/2012 23:21, Brett Cannon wrote:
  On Tue, Feb 7, 2012 at 15:28, Dirkjan Ochtman dirk...@ochtman.nl
 wrote:
  [...]
  Anyway, I think there was enough of a python3 port for Mercurial (from
  various GSoC students) that you can probably run some of the very
  simple commands (like hg parents or hg id), which should be enough for
  your purposes, right?
 
  Possibly. Where is the code?

 # get Mercurial from a repo or tarball
 hg clone http://selenic.com/repo/hg/
 cd hg

 # convert files in place (don’t commit after this :)
 python3.2 contrib/setup3k.py

 # the makefile is not py3k-aware, need to run manually
 # the current stable head fails with a TypeError for me
 PYTHONPATH=. python3.2 build/scripts-3.2

 Cheers

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-22 Thread Éric Araujo
Hi Brett,

I think this message went unanswered, so here’s a late reply:

On 07/02/2012 23:21, Brett Cannon wrote:
 On Tue, Feb 7, 2012 at 15:28, Dirkjan Ochtman dirk...@ochtman.nl wrote:
 [...]
 Anyway, I think there was enough of a python3 port for Mercurial (from
 various GSoC students) that you can probably run some of the very
 simple commands (like hg parents or hg id), which should be enough for
 your purposes, right?
 
 Possibly. Where is the code?

# get Mercurial from a repo or tarball
hg clone http://selenic.com/repo/hg/
cd hg

# convert files in place (don’t commit after this :)
python3.2 contrib/setup3k.py

# the makefile is not py3k-aware, need to run manually
# the current stable head fails with a TypeError for me
PYTHONPATH=. python3.2 build/scripts-3.2

Cheers


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-11 Thread Éric Araujo
On 07/02/2012 23:21, Brett Cannon wrote:
 On Tue, Feb 7, 2012 at 15:28, Dirkjan Ochtman dirk...@ochtman.nl wrote:
 Yeah, startup performance getting worse kinda sucks for command-line
 apps. And IIRC it's been getting worse over the past few releases...

 Anyway, I think there was enough of a python3 port for Mercurial (from
 various GSoC students) that you can probably run some of the very
 simple commands (like hg parents or hg id), which should be enough for
 your purposes, right?
 Possibly. Where is the code?

hg clone http://selenic.com/repo/hg/
cd hg
python3 contrib/setup3k.py build


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-10 Thread Brett Cannon
On Thu, Feb 9, 2012 at 17:00, PJ Eby p...@telecommunity.com wrote:

 On Thu, Feb 9, 2012 at 2:53 PM, Mike Meyer m...@mired.org wrote:

 For those of you not watching -ideas, or ignoring the Python TIOBE
 -3% discussion, this would seem to be relevant to any discussion of
 reworking the import mechanism:

 http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059801.html

 Interesting.  This gives me an idea for a way to cut stat calls per
 sys.path entry per import by roughly 4x, at the cost of a one-time
 directory read per sys.path entry.

 That is, an importer created for a particular directory could, upon first
 use, cache a frozenset(listdir()), and the stat().st_mtime of the
 directory.  All the filename checks could then be performed against the
 frozenset, and the st_mtime of the directory only checked once per import,
 to verify whether the frozenset() needed refreshing.
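
A minimal runnable sketch of that caching idea (the class name and shape are my own invention for illustration, not code from any actual patch):

```python
import os
import tempfile

class DirectoryCache:
    """Cache one frozenset(listdir()) per directory; refresh it only
    when the directory's own mtime changes."""

    def __init__(self, path):
        self.path = path
        self._mtime = -1.0
        self._entries = frozenset()

    def entries(self):
        mtime = os.stat(self.path).st_mtime
        if mtime != self._mtime:  # directory changed: re-read it once
            self._entries = frozenset(os.listdir(self.path))
            self._mtime = mtime
        return self._entries

    def __contains__(self, filename):
        # One set-membership test replaces a stat() per candidate name.
        return filename in self.entries()

# Demo: one listdir() amortized over many filename checks.
d = tempfile.mkdtemp()
open(os.path.join(d, 'spam.py'), 'w').close()
cache = DirectoryCache(d)
assert 'spam.py' in cache
assert 'missing.py' not in cache
```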


I actually contemplated this back in 2006 when I first began importlib for
use at Google to get around NFS's crappy stat performance. Never got around
to it as compatibility with import.c turned out to be a little tricky. =)
Your solution below, PJE, is more-or-less what I was considering (although
I also considered variants that didn't stat the directory when you knew
your code wasn't changing stuff behind your back).



 Since a failed module lookup takes at least 5 stat checks (pyc, pyo, py,
 directory, and compiled extension (pyd/so)), this cuts it down to only 1,
 at the price of a listdir().  The big question is how long does a listdir()
 take, compared to a stat() or failed open()?   That would tell us whether
 the tradeoff is worth making.
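
The question can be poked at directly with timeit; the directory size (200 entries) and the suffix list below are my own arbitrary stand-ins for a stdlib-sized directory, not measurements from the original tests:

```python
import os
import tempfile
import timeit

d = tempfile.mkdtemp()
for i in range(200):  # roughly stdlib-sized directory (arbitrary choice)
    open(os.path.join(d, 'mod%d.py' % i), 'w').close()

def failed_lookup():
    # What a failed import attempt costs: one stat() per candidate suffix.
    for suffix in ('.pyc', '.pyo', '.py', '', '.so'):
        try:
            os.stat(os.path.join(d, 'missing' + suffix))
        except OSError:
            pass

snapshot = timeit.timeit(lambda: frozenset(os.listdir(d)), number=1000)
misses = timeit.timeit(failed_lookup, number=1000)
print('frozenset(listdir()): %.4fs   failed lookup: %.4fs   ratio: %.1f'
      % (snapshot, misses, snapshot / misses))
```

The ratio tells you how many failed lookups one directory snapshot has to save before caching pays off on your particular filesystem.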


Actually it's pyc OR pyo, py, directory (which can lead to another set for
__init__.py and __pycache__), .so, module.so (or whatever your platform
uses for extensions).



 I did some crude timeit tests on frozenset(listdir()) and trapping failed
 stat calls.  It looks like, for a Windows directory the size of the 2.7
 stdlib, you need about four *failed* import attempts to overcome the
 initial caching cost, or about 8 successful bytecode imports.  (For Linux,
 you might need to double these numbers; my tests showed a different ratio
there, perhaps due to the Linux stdlib I tested having nearly twice as many
 directory entries as the directory I tested on Windows!)

 However, the numbers are much better for application directories than for
 the stdlib, since they are located earlier on sys.path.  Every successful
 stdlib import in an application is equal to one failed import attempt for
 every preceding directory on sys.path, so as long as the average directory
 on sys.path isn't vastly larger than the stdlib, and the average
 application imports at least four modules from the stdlib (on Windows, or 8
 on Linux), there would be a net performance gain for the application as a
 whole.  (That is, there'd be an improved per-sys.path entry import time for
 stdlib modules, even if not for any application modules.)


Does this comment take into account the number of modules required to load
the interpreter to begin with? That's already like 48 modules loaded by
Python 3.2 as it is.



 For smaller directories, the tradeoff actually gets better.  A directory
 one seventh the size of the 2.7 Windows stdlib has a listdir() that's
 proportionately faster, but failed stats() in that directory are *not*
 proportionately faster; they're only somewhat faster.  This means that it
 takes fewer failed module lookups to make caching a win - about 2 in this
 case, vs. 4 for the stdlib.

 Now, these numbers are with actual disk or network access abstracted away,
 because the data's in the operating system cache when I run the tests.
  It's possible that this strategy could backfire if you used, say, an NFS
 directory with ten thousand files in it as your first sys.path entry.
  Without knowing the timings for listdir/stat/failed stat in that setup,
 it's hard to say how many stdlib imports you need before you come out
 ahead.  When I tried a directory about 7 times larger than the stdlib,
 creating the frozenset took 10 times as long, but the cost of a failed stat
 didn't go up by very much.

 This suggests that there's probably an optimal directory size cutoff for
 this trick; if only there were some way to check the size of a directory
 without reading it, we could turn off the caching for oversize directories,
 and get a major speed boost for everything else.  On most platforms, the
 stat().st_size of the directory itself will give you some idea, but on
 Windows that's always zero.  On Windows, we could work around that by using
 a lower-level API than listdir() and simply stop reading the directory if
 we hit the maximum number of entries we're willing to build a cache for,
 and then call it off.

 (Another possibility would be to explicitly enable caching by putting a
 flag file in the directory, or perhaps by putting a special prefix on the
 sys.path entry, setting the cutoff in an 

Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-10 Thread PJ Eby
On Fri, Feb 10, 2012 at 1:05 PM, Brett Cannon br...@python.org wrote:



 On Thu, Feb 9, 2012 at 17:00, PJ Eby p...@telecommunity.com wrote:

 I did some crude timeit tests on frozenset(listdir()) and trapping failed
 stat calls.  It looks like, for a Windows directory the size of the 2.7
 stdlib, you need about four *failed* import attempts to overcome the
 initial caching cost, or about 8 successful bytecode imports.  (For Linux,
 you might need to double these numbers; my tests showed a different ratio
 there, perhaps due to the Linux stdlib I tested having nearly twice as many
 directory entries as the directory I tested on Windows!)


 However, the numbers are much better for application directories than for
 the stdlib, since they are located earlier on sys.path.  Every successful
 stdlib import in an application is equal to one failed import attempt for
 every preceding directory on sys.path, so as long as the average directory
 on sys.path isn't vastly larger than the stdlib, and the average
 application imports at least four modules from the stdlib (on Windows, or 8
 on Linux), there would be a net performance gain for the application as a
 whole.  (That is, there'd be an improved per-sys.path entry import time for
 stdlib modules, even if not for any application modules.)


 Does this comment take into account the number of modules required to load
 the interpreter to begin with? That's already like 48 modules loaded by
 Python 3.2 as it is.


I didn't count those, no.  So, if they're loaded from disk *after*
importlib is initialized, then they should pay off the cost of caching even
fairly large directories that appear earlier on sys.path than the stdlib.
 We still need to know about NFS and other ratios, though...  I still worry
that people with more extreme directory sizes or slow-access situations
will run into even worse trouble than they have now.



 First is that if this were used on Windows or OS X (i.e. the OSs we
 support that typically have case-insensitive filesystems), then this
 approach would be a massive gain as we already call os.listdir() when
 PYTHONCASEOK isn't defined to check case-sensitivity; take your 5 stat
 calls and add in 5 listdir() calls and that's what you get on Windows and
 OS X right now. Linux doesn't have this check so you would still be
 potentially paying a penalty there.


Wow.  That means it'd always be a win for pre-stdlib sys.path entries,
because any successful stdlib import equals a failed pre-stdlib lookup.
 (Of course, that's just saving some of the overhead that's been *added* by
importlib, not a new gain, but still...)


Second is variance in filesystems. Are we guaranteed that the stat of a
 directory is updated before a file change is made?


Not quite sure what you mean here.  The directory stat is used to ensure
that new files haven't been added, old ones removed, or existing ones
renamed.  Changes to the files themselves shouldn't factor in, should they?



 Else there is a small race condition there which would suck. We also have
 the issue of granularity; Antoine has already had to add the source file
 size to .pyc files in Python 3.3 to combat crappy mtime granularity when
 generating bytecode. If we get file mod -> import -> file mod -> import,
 are we guaranteed that the second import will know there was a modification
 if the first three steps occur fast enough to fit within the granularity of
 an mtime value?


Again, I'm not sure how this relates.  Automatic code reloaders monitor
individual files that have been previously imported, so the directory
timestamps aren't relevant.

Of course, I could be confused here.  Are you saying that if somebody makes
a new .py file and saves it, that it'll be possible to import it before
it's finished being written?  If so, that could happen already, and again
caching the directory doesn't make any difference.

Alternately, you could have a situation where the file is deleted after we
load the listdir(), but in that case the open will fail and we can fall
back...  heck, we can even force resetting the cache in that event.


I was going to say something about __pycache__, but it actually doesn't
 affect this. Since you would have to stat the directory anyway, you might
 as well just stat the directory for the file you want, to keep it simple. Only
 if you consider __pycache__ to be immutable except for what the interpreter
 puts in that directory during execution could you optimize that step (in
 which case you can stat the directory once and never care again as the set
 would be just updated by import whenever a new .pyc file was written).

 Having said all of this, implementing this idea would be trivial using
 importlib if you don't try to optimize the __pycache__ case. It's just a
 question of whether people are comfortable with the semantic change to
 import. This could also be made into something that was in importlib for
 people to use when desired if we are too worried about semantic changes.


Yep.  

Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-10 Thread Brett Cannon
On Fri, Feb 10, 2012 at 15:07, PJ Eby p...@telecommunity.com wrote:

 On Fri, Feb 10, 2012 at 1:05 PM, Brett Cannon br...@python.org wrote:



 On Thu, Feb 9, 2012 at 17:00, PJ Eby p...@telecommunity.com wrote:

 I did some crude timeit tests on frozenset(listdir()) and trapping
 failed stat calls.  It looks like, for a Windows directory the size of the
 2.7 stdlib, you need about four *failed* import attempts to overcome the
 initial caching cost, or about 8 successful bytecode imports.  (For Linux,
 you might need to double these numbers; my tests showed a different ratio
 there, perhaps due to the Linux stdlib I tested having nearly twice as many
 directory entries as the directory I tested on Windows!)


 However, the numbers are much better for application directories than
 for the stdlib, since they are located earlier on sys.path.  Every
 successful stdlib import in an application is equal to one failed import
 attempt for every preceding directory on sys.path, so as long as the
 average directory on sys.path isn't vastly larger than the stdlib, and the
 average application imports at least four modules from the stdlib (on
 Windows, or 8 on Linux), there would be a net performance gain for the
 application as a whole.  (That is, there'd be an improved per-sys.path
 entry import time for stdlib modules, even if not for any application
 modules.)


 Does this comment take into account the number of modules required to
 load the interpreter to begin with? That's already like 48 modules loaded
 by Python 3.2 as it is.


 I didn't count those, no.  So, if they're loaded from disk *after*
 importlib is initialized, then they should pay off the cost of caching even
 fairly large directories that appear earlier on sys.path than the stdlib.
  We still need to know about NFS and other ratios, though...  I still worry
 that people with more extreme directory sizes or slow-access situations
 will run into even worse trouble than they have now.


It's possible. No way to make it work for everyone. This is why I didn't
worry about some crazy perf optimization.





 First is that if this were used on Windows or OS X (i.e. the OSs we
 support that typically have case-insensitive filesystems), then this
 approach would be a massive gain as we already call os.listdir() when
 PYTHONCASEOK isn't defined to check case-sensitivity; take your 5 stat
 calls and add in 5 listdir() calls and that's what you get on Windows and
 OS X right now. Linux doesn't have this check so you would still be
 potentially paying a penalty there.


 Wow.  That means it'd always be a win for pre-stdlib sys.path entries,
 because any successful stdlib import equals a failed pre-stdlib lookup.
  (Of course, that's just saving some of the overhead that's been *added* by
 importlib, not a new gain, but still...)


How so? import.c does a listdir() as well (this is not special to
importlib).




 Second is variance in filesystems. Are we guaranteed that the stat of a
 directory is updated before a file change is made?


 Not quite sure what you mean here.  The directory stat is used to ensure
 that new files haven't been added, old ones removed, or existing ones
 renamed.  Changes to the files themselves shouldn't factor in, should they?


Changes in any fashion to the directory. Do filesystems atomically update
the mtime of a directory when they commit a change? Otherwise we have a
potential race condition.





 Else there is a small race condition there which would suck. We also have
 the issue of granularity; Antoine has already had to add the source file
 size to .pyc files in Python 3.3 to combat crappy mtime granularity when
 generating bytecode. If we get file mod -> import -> file mod -> import,
 are we guaranteed that the second import will know there was a modification
 if the first three steps occur fast enough to fit within the granularity of
 an mtime value?


 Again, I'm not sure how this relates.  Automatic code reloaders monitor
 individual files that have been previously imported, so the directory
 timestamps aren't relevant.


Don't care about automatic reloaders. I'm just asking about the case where
the mtime granularity is coarse enough to allow for a directory change, an
import to execute, and then another directory change to occur all within a
single mtime increment. That would leave the cached set out of date.


 Of course, I could be confused here.  Are you saying that if somebody
 makes a new .py file and saves it, that it'll be possible to import it
 before it's finished being written?  If so, that could happen already, and
 again caching the directory doesn't make any difference.

 Alternately, you could have a situation where the file is deleted after we
 load the listdir(), but in that case the open will fail and we can fall
 back...  heck, we can even force resetting the cache in that event.


 I was going to say something about __pycache__, but it actually doesn't
 affect this. Since you would have to stat the 

Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-10 Thread Tres Seaver

On 02/10/2012 03:38 PM, Brett Cannon wrote:
 Changes in any fashion to the directory. Do filesystems atomically 
 update the mtime of a directory when they commit a change? Otherwise 
 we have a potential race condition.

Hmm, maybe I misunderstand you.  In POSIX land, the only thing which
changes the mtime of a directory is linking / unlinking / renaming a
file:  changes to individual files aren't detectable by examining their
containing directory's stat().
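
This is easy to check on a POSIX filesystem with sub-second timestamps (the assumption here): creating, unlinking, or renaming an entry bumps the directory's mtime, while rewriting an existing file's contents does not.

```python
import os
import pathlib
import tempfile
import time

d = tempfile.mkdtemp()
target = pathlib.Path(d, 'mod.py')

before = os.stat(d).st_mtime_ns
time.sleep(0.1)                      # outlast filesystem timestamp granularity
target.write_text('x = 1')           # links a new name into the directory
after_create = os.stat(d).st_mtime_ns

time.sleep(0.1)
target.write_text('x = 2')           # rewrites contents only
after_rewrite = os.stat(d).st_mtime_ns

assert after_create > before         # create/unlink/rename touch dir mtime
assert after_rewrite == after_create # content edits do not
```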


Tres.
- -- 
===
Tres Seaver  +1 540-429-0999  tsea...@palladion.com
Palladion Software   Excellence by Designhttp://palladion.com



Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-10 Thread Brett Cannon
On Fri, Feb 10, 2012 at 16:29, Tres Seaver tsea...@palladion.com wrote:


 On 02/10/2012 03:38 PM, Brett Cannon wrote:
  Changes in any fashion to the directory. Do filesystems atomically
  update the mtime of a directory when they commit a change? Otherwise
  we have a potential race condition.

 Hmm, maybe I misunderstand you.  In POSIX land, the only thing which
 changes the mtime of a directory is linking / unlinking / renaming a
 file:  changes to individual files aren't detectable by examining their
 containing directory's stat().


Individual file changes are not important; either the module is already in
sys.modules, so no attempt is made to detect a change, or it hasn't been
loaded and so it will have to be read regardless. All I'm asking is whether
filesystems typically commit a change such as a file deletion atomically
with the update to the containing directory's mtime.


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-10 Thread Tres Seaver

On 02/10/2012 04:42 PM, Brett Cannon wrote:
 On Fri, Feb 10, 2012 at 16:29, Tres Seaver tsea...@palladion.com
 wrote:
 
 On 02/10/2012 03:38 PM, Brett Cannon wrote:
 Changes in any fashion to the directory. Do filesystems
 atomically update the mtime of a directory when they commit a
 change? Otherwise we have a potential race condition.
 
 Hmm, maybe I misunderstand you.  In POSIX land, the only thing which
 changes the mtime of a directory is linking / unlinking / renaming
 a file:  changes to individual files aren't detectable by examining
 their containing directory's stat().
 
 
 Individual file changes are not important; either the module is
 already in sys.modules so no attempt is made to detect a change or it
 hasn't been loaded and so it will have to be read regardless. All I'm
 asking is whether filesystems typically commit a change such as a file
 deletion atomically with the update to the mtime of the containing
 directory or not.

In POSIX land, most certainly.


Tres.
- -- 
===
Tres Seaver  +1 540-429-0999  tsea...@palladion.com
Palladion Software   Excellence by Designhttp://palladion.com


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-10 Thread PJ Eby
On Feb 10, 2012 3:38 PM, Brett Cannon br...@python.org wrote:
 On Fri, Feb 10, 2012 at 15:07, PJ Eby p...@telecommunity.com wrote:
 On Fri, Feb 10, 2012 at 1:05 PM, Brett Cannon br...@python.org wrote:
 First is that if this were used on Windows or OS X (i.e. the OSs we
support that typically have case-insensitive filesystems), then this
approach would be a massive gain as we already call os.listdir() when
PYTHONCASEOK isn't defined to check case-sensitivity; take your 5 stat
calls and add in 5 listdir() calls and that's what you get on Windows and
OS X right now. Linux doesn't have this check so you would still be
potentially paying a penalty there.


 Wow.  That means it'd always be a win for pre-stdlib sys.path entries,
because any successful stdlib import equals a failed pre-stdlib lookup.
 (Of course, that's just saving some of the overhead that's been *added* by
importlib, not a new gain, but still...)


 How so? import.c does a listdir() as well (this is not special to
importlib).

IIRC, it does a FindFirstFile on Windows, which is not the same thing.
That's one system call into a preallocated buffer, not a series of system
calls and creation of Python string objects.
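
For what it's worth, a later stdlib addition, os.scandir() in Python 3.5, exposes exactly that one-pass directory read: on Windows it wraps FindFirstFile/FindNextFile, so entry names and types arrive without a per-entry stat call. A quick sketch:

```python
import os
import tempfile

d = tempfile.mkdtemp()
open(os.path.join(d, 'mod.py'), 'w').close()
os.mkdir(os.path.join(d, 'pkg'))

# One directory read; is_dir()/is_file() usually answer from data
# returned by the read itself, with no extra stat() per entry.
entries = {entry.name: entry.is_dir() for entry in os.scandir(d)}
assert entries == {'mod.py': False, 'pkg': True}
```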

 Don't care about automatic reloaders. I'm just asking about the case
where the mtime granularity is coarse enough to allow for a directory
change, an import to execute, and then another directory change to occur
all within a single mtime increment. That would leave the cached set out of
date.

Ah.  Good point.  Well, if there's any way to know what the mtime
granularity is, we can avoid the race condition by never performing the
listdir when the current clock time is too close to the stat().  In effect,
we can bypass the optimization if the directory was just modified.

Something like:

mtime = stat(dir).st_mtime
if abs(time.time() - mtime) > unsafe_window:
    old_mtime, files = cache.get(dir, (-1, ()))
    if mtime != old_mtime:
        files = frozenset(listdir(dir))
        cache[dir] = mtime, files
    # code to check for possibility of importing
    # and shortcut if found, or
    # exit with failure if no matching files

# fallthrough to direct filesystem checking

The unsafe window is presumably filesystem and platform dependent, but
ISTR that even FAT filesystems have 2-second accuracy.  The other catch is
the relationship between st_mtime and time.time(); I assume they'd be the
same in any sane system, but what if you're working across a network and
there's clock skew?  Ugh.

Worst case example would be say, accessing a FAT device that's been shared
over a Windows network from a machine whose clock is several hours off.  So
it always looks safe to read, even if it's just been changed.

What's the downside in that case?  You're trying to import something that
just changed in the last fraction of a second...  why?

I mean, sure, the directory listing will be wrong, no question.  But it
only matters that it was wrong if you added, removed, or renamed importable
files.  Why are you trying to import one of them?

Ah, here's a use case: you're starting up IDLE, and while it's loading, you
save some .py files you plan to import later.  Your editor saves them all
at once, but IDLE does the listdir() midway through.  You then do an import
from the IDLE prompt, and it fails because the listdir() didn't catch
everything.

Okay, now I know how to fix this.  The problem isn't that there's a race
condition per se, the problem is that the race results in a broken cache
later.  After all, it could just as easily have been the case that the
import failed due to timing.  The problem is that all *future* imports
would fail in this circumstance.

So the fix is a time-to-live recheck: if TTL seconds have passed since the
last use of the cached frozenset, reload it, and reset the TTL to infinity.

In other words:

mtime = stat(dir).st_mtime
now = time.time()
if abs(now - mtime) > unsafe_window:
    old_mtime, then, files = cache.get(dir, (-1, now, ()))
    if mtime != old_mtime or (then is not None and now - then > TTL):
        files = frozenset(listdir(dir))
        cache[dir] = mtime, (now if mtime != old_mtime else None), files
    # code to check for possibility of importing
    # and shortcut if found, or
    # exit with failure if no matching files

# fallthrough to direct filesystem checking

What this does (or should do) is handle clock-skew race condition stale
caches by reloading the listdir even if mtime hasn't changed, as soon as
TTL seconds have passed since the last snapshot was taken.  However, if the
mtime stays the same, no subsequent listdirs will occur.  As long as the
TTL is set high enough that a full startup of Python can occur, but low
enough that it resets by the time a human can notice something's wrong, it
should be golden.  ;-)

The TTL approach could be used in place of the unsafe_window, actually;
there's probably not much need for 

Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-10 Thread Chris Angelico
On Sat, Feb 11, 2012 at 11:23 AM, PJ Eby p...@telecommunity.com wrote:
 What's the downside in that case?  You're trying to import something that
 just changed in the last fraction of a second...  why?

I don't know if it's normal in the Python world, but these sorts of
race conditions occur most annoyingly when a single process changes a
file, then attempts to import it. If you open a file, write to it,
explicitly close it, and then load it, you would expect to read back
what you wrote, not the version that was there previously.
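
This write-then-import pattern is exactly the case for which importlib later grew an explicit escape hatch, importlib.invalidate_caches(), added in Python 3.3. A sketch of the pattern using that later API:

```python
import importlib
import os
import sys
import tempfile

d = tempfile.mkdtemp()
sys.path.insert(0, d)

# Write a brand-new module, then import it from the same process.
with open(os.path.join(d, 'freshmod.py'), 'w') as f:
    f.write('VALUE = 42\n')

# If a path finder snapshotted the directory before the write, the new
# file may be invisible; flushing finder caches avoids the stale read.
importlib.invalidate_caches()
import freshmod
assert freshmod.VALUE == 42
```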

Chris Angelico


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-09 Thread Brett Cannon
On Wed, Feb 8, 2012 at 20:28, PJ Eby p...@telecommunity.com wrote:



 On Wed, Feb 8, 2012 at 4:08 PM, Brett Cannon br...@python.org wrote:


 On Wed, Feb 8, 2012 at 15:31, Terry Reedy tjre...@udel.edu wrote:

 For top-level imports, unless *all* are made lazy, then there *must* be
 some indication in the code of whether to make it lazy or not.


 Not true; importlib would make it dead-simple to whitelist what modules
 to make lazy (e.g. your app code lazy but all stdlib stuff not, etc.).


 There's actually only a few things stopping all imports from being lazy.
  from x import y immediately de-lazies them, after all.  ;-)

 The main two reasons you wouldn't want imports to *always* be lazy are:

 1. Changing sys.path or other parameters between the import statement and
 the actual import
 2. ImportErrors are likewise deferred until point-of-use, so conditional
 importing with try/except would break.


This actually depends on the type of ImportError. My current solution
would trigger an ImportError at the import statement if no finder
could locate the module. But if some ImportError was raised because of some
other issue during load then that would come up at first use.
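
The semantics Brett describes here (fail at the import statement if no finder matches, defer everything else to first use) are roughly what importlib.util.LazyLoader, added later in Python 3.5, provides. A sketch using that later API:

```python
import importlib.util
import sys

def lazy_import(name):
    """Return a module whose execution is deferred to first attribute use."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        # The *finder* still runs eagerly, so a missing module fails
        # here, at the import statement -- not at first use.
        raise ImportError('No module named %r' % name)
    spec.loader = importlib.util.LazyLoader(spec.loader)
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    spec.loader.exec_module(module)   # execution is deferred, not run now
    return module

json = lazy_import('json')            # nothing executed yet
assert json.dumps([1, 2]) == '[1, 2]' # first use triggers the real load
```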


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-09 Thread Brett Cannon
On Wed, Feb 8, 2012 at 20:26, Nick Coghlan ncogh...@gmail.com wrote:

 On Thu, Feb 9, 2012 at 2:09 AM, Antoine Pitrou solip...@pitrou.net
 wrote:
  I guess my point was: why is there a function call in that case? The
  import statement could look up sys.modules directly.
  Or the built-in __import__ could still be written in C, and only defer
  to importlib when the module isn't found in sys.modules.
  Practicality beats purity.

 I quite like the idea of having builtin __import__ be a *very* thin
 veneer around importlib that just does the is this in sys.modules
 already so we can just return it from there? checks and delegates
 other more complex cases to Python code in importlib.

 Poking around in importlib.__import__ [1] (as well as
 importlib._gcd_import), I'm thinking what we may want to do is break
 up the logic a bit so that there are multiple helper functions that a
 C version can call back into so that we can optimise certain simple
 code paths to not call back into Python at all, and others to only do
 so selectively.

 Step 1: separate out the fromlist processing from __import__ into a
 separate helper function

def _process_fromlist(module, fromlist):
# Perform any required imports as per existing code:
#
 http://hg.python.org/cpython/file/aba513307f78/Lib/importlib/_bootstrap.py#l987


Fine by me.



 Step 2: separate out the relative import resolution from _gcd_import
 into a separate helper function.

def _resolve_relative_name(name, package, level):
assert hasattr(name, 'rpartition')
assert hasattr(package, 'rpartition')
    assert level > 0
name = # Recalculate as per the existing code:
#
 http://hg.python.org/cpython/file/aba513307f78/Lib/importlib/_bootstrap.py#l889
return name
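
Filling in the elided recalculation for illustration (this mirrors what importlib's private _resolve_name helper does, later exposed as importlib.util.resolve_name; the function name below is my own):

```python
def resolve_relative_name(name, package, level):
    # Strip level-1 trailing components from the package, then append
    # the relative name (if any): level 1 = current package, 2 = parent.
    assert level > 0
    bits = package.rsplit('.', level - 1)
    if len(bits) < level:
        raise ValueError('attempted relative import beyond top-level package')
    base = bits[0]
    return '{}.{}'.format(base, name) if name else base

assert resolve_relative_name('mod', 'pkg.sub', 1) == 'pkg.sub.mod'  # from . import mod
assert resolve_relative_name('mod', 'pkg.sub', 2) == 'pkg.mod'      # from .. import mod
```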


I was actually already thinking of exposing this as
importlib.resolve_name() so breaking it out makes sense.

I also think it might be possible to expose a sort of
importlib.find_module() that does nothing more than find the loader for a
module  (if available).
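That idea did eventually land, though under a later API than the find_module() discussed here: since Python 3.4, importlib.util.find_spec() asks the machinery where a module lives without importing it.

```python
import importlib.util

# Ask the import machinery which loader would handle a module,
# without actually importing it.
spec = importlib.util.find_spec('json')
print(spec.name)    # 'json'
print(spec.loader)  # the loader that would be used

# Returns None when no finder can locate the module.
print(importlib.util.find_spec('no_such_module_xyz'))  # None
```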



 Step 3: Implement builtin __import__ in C (pseudo-code below):

def __import__(name, globals={}, locals={}, fromlist=[], level=0):
    if level > 0:
        name = importlib._resolve_relative_import(name)
    try:
        module = sys.modules[name]
    except KeyError:
        # Not cached yet, need to invoke the full import machinery
        # We already resolved any relative imports though, so
        # treat it as an absolute import
        return importlib.__import__(name, globals, locals, fromlist, 0)
    # Got a hit in the cache, see if there's any more work to do
    if not fromlist:
        # Duplicate relevant importlib.__import__ logic as C code
        # to find the right module to return from sys.modules
        pass
    elif hasattr(module, '__path__'):
        importlib._process_fromlist(module, fromlist)
    return module

 This would then be similar to the way main.c already works when it
 interacts with runpy - simple cases are handled directly in C, more
 complex cases get handed over to the Python module.


I suspect that if people want the case where you load from bytecode is fast
then this will have to expand beyond this to include C functions and/or
classes which can be used as accelerators; while this accelerates the
common case of sys.modules, this (probably) won't make Antoine happy enough
for importing a small module from bytecode (importing large modules like
decimal are already fast enough).
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-09 Thread PJ Eby
On Feb 9, 2012 9:58 AM, Brett Cannon br...@python.org wrote:
 This actually depends on the type of ImportError. My current solution
actually would trigger an ImportError at the import statement if no finder
could locate the module. But if some ImportError was raised because of some
other issue during load then that would come up at first use.

That's not really a lazy import then, or at least not as lazy as what
Mercurial or PEAK use for general lazy importing.  If you have a lot of
them, that module-finding time really adds up.

Again, the goal is fast startup of command-line tools that only use a small
subset of the overall framework; doing disk access for lazy imports goes
against that goal.


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-09 Thread Brett Cannon
On Thu, Feb 9, 2012 at 13:43, PJ Eby p...@telecommunity.com wrote:


 On Feb 9, 2012 9:58 AM, Brett Cannon br...@python.org wrote:
  This actually depends on the type of ImportError. My current solution
 actually would trigger an ImportError at the import statement if no finder
 could locate the module. But if some ImportError was raised because of some
 other issue during load then that would come up at first use.

 That's not really a lazy import then, or at least not as lazy as what
 Mercurial or PEAK use for general lazy importing.  If you have a lot of
 them, that module-finding time really adds up.

 Again, the goal is fast startup of command-line tools that only use a
 small subset of the overall framework; doing disk access for lazy imports
 goes against that goal.

Depends if you consider stat calls the overhead vs. the actual disk
read/write to load the data. Anyway, this is going to lead down to a
discussion/argument over design parameters which I'm not up to having since
I'm not actively working on a lazy loader for the stdlib right now.


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-09 Thread Mike Meyer
On Thu, 9 Feb 2012 14:19:59 -0500
Brett Cannon br...@python.org wrote:
 On Thu, Feb 9, 2012 at 13:43, PJ Eby p...@telecommunity.com wrote:
  Again, the goal is fast startup of command-line tools that only use a
  small subset of the overall framework; doing disk access for lazy imports
  goes against that goal.
 
 Depends if you consider stat calls the overhead vs. the actual disk
 read/write to load the data. Anyway, this is going to lead down to a
 discussion/argument over design parameters which I'm not up to having since
 I'm not actively working on a lazy loader for the stdlib right now.

For those of you not watching -ideas, or ignoring the Python TIOBE
-3% discussion, this would seem to be relevant to any discussion of
reworking the import mechanism:

http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059801.html

mike
-- 
Mike Meyer m...@mired.org http://www.mired.org/
Independent Software developer/SCM consultant, email for more information.

O ascii ribbon campaign - stop html mail - www.asciiribbon.org


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-09 Thread Glenn Linderman

On 2/9/2012 11:53 AM, Mike Meyer wrote:

On Thu, 9 Feb 2012 14:19:59 -0500
Brett Cannonbr...@python.org  wrote:

On Thu, Feb 9, 2012 at 13:43, PJ Ebyp...@telecommunity.com  wrote:

Again, the goal is fast startup of command-line tools that only use a
small subset of the overall framework; doing disk access for lazy imports
goes against that goal.


Depends if you consider stat calls the overhead vs. the actual disk
read/write to load the data. Anyway, this is going to lead down to a
discussion/argument over design parameters which I'm not up to having since
I'm not actively working on a lazy loader for the stdlib right now.

For those of you not watching -ideas, or ignoring the Python TIOBE
-3% discussion, this would seem to be relevant to any discussion of
reworking the import mechanism:

http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059801.html

 mike


So what is the implication here?  That building a cache of module 
locations (cleared when a new module is installed) would be more 
effective than optimizing the search for modules on every invocation of 
Python?


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-09 Thread Terry Reedy

On 2/9/2012 3:27 PM, Glenn Linderman wrote:

On 2/9/2012 11:53 AM, Mike Meyer wrote:

On Thu, 9 Feb 2012 14:19:59 -0500
Brett Cannonbr...@python.org  wrote:

On Thu, Feb 9, 2012 at 13:43, PJ Ebyp...@telecommunity.com  wrote:

Again, the goal is fast startup of command-line tools that only use a
small subset of the overall framework; doing disk access for lazy imports
goes against that goal.


Depends if you consider stat calls the overhead vs. the actual disk
read/write to load the data. Anyway, this is going to lead down to a
discussion/argument over design parameters which I'm not up to having since
I'm not actively working on a lazy loader for the stdlib right now.

For those of you not watching -ideas, or ignoring the Python TIOBE
-3% discussion, this would seem to be relevant to any discussion of
reworking the import mechanism:

http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059801.html


For 32k processes on BlueGene/P, importing
100 trivial C-extension modules takes 5.5 hours, compared to 35
minutes for all other interpreter loading and initialization.
We developed a simple pure-Python module (based on knee.py, a
hierarchical import example) that cuts the import time from 5.5 hours
to 6 minutes.


So what is the implication here?  That building a cache of module
locations (cleared when a new module is installed) would be more
effective than optimizing the search for modules on every invocation of
Python?



--
Terry Jan Reedy



Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-09 Thread PJ Eby
On Thu, Feb 9, 2012 at 2:53 PM, Mike Meyer m...@mired.org wrote:

 For those of you not watching -ideas, or ignoring the Python TIOBE
 -3% discussion, this would seem to be relevant to any discussion of
 reworking the import mechanism:

 http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059801.html

 Interesting.  This gives me an idea for a way to cut stat calls per
sys.path entry per import by roughly 4x, at the cost of a one-time
directory read per sys.path entry.

That is, an importer created for a particular directory could, upon first
use, cache a frozenset(listdir()), and the stat().st_mtime of the
directory.  All the filename checks could then be performed against the
frozenset, and the st_mtime of the directory only checked once per import,
to verify whether the frozenset() needed refreshing.

Since a failed module lookup takes at least 5 stat checks (pyc, pyo, py,
directory, and compiled extension (pyd/so)), this cuts it down to only 1,
at the price of a listdir().  The big question is how long does a listdir()
take, compared to a stat() or failed open()?   That would tell us whether
the tradeoff is worth making.
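A minimal sketch of the caching importer described above (the class name, structure, and suffix list are mine, not PJ Eby's actual code):

```python
import os

class CachedDirFinder:
    """Cache a directory listing; refresh it when the dir's mtime changes.

    Replaces the ~5 stat()/open() probes per failed module lookup with
    a single stat() of the directory itself.
    """

    SUFFIXES = ('.py', '.pyc', '.pyo', '.pyd', '.so')

    def __init__(self, path):
        self.path = path
        self._mtime = -1
        self._entries = frozenset()

    def _refresh(self):
        mtime = os.stat(self.path).st_mtime
        if mtime != self._mtime:  # one stat per import, not five
            self._entries = frozenset(os.listdir(self.path))
            self._mtime = mtime

    def contains_module(self, name):
        self._refresh()
        if name in self._entries:  # a package directory of that name
            return True
        return any(name + suffix in self._entries
                   for suffix in self.SUFFIXES)
```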

I did some crude timeit tests on frozenset(listdir()) and trapping failed
stat calls.  It looks like, for a Windows directory the size of the 2.7
stdlib, you need about four *failed* import attempts to overcome the
initial caching cost, or about 8 successful bytecode imports.  (For Linux,
you might need to double these numbers; my tests showed a different ratio
there, perhaps due to the Linux stdib I tested having nearly twice as many
directory entries as the directory I tested on Windows!)
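The crude comparison can be reproduced with something like the following (the numbers depend entirely on platform, filesystem, and OS cache state, so treat the ratio, not the absolute times, as the interesting part):

```python
import os
import timeit

# Compare the one-time cost of caching a directory listing against
# repeated failed stat() calls, using the stdlib directory as a sample.
target = os.path.dirname(os.__file__)

def failed_stat():
    try:
        os.stat(os.path.join(target, 'no_such_module.py'))
    except OSError:
        pass

listdir_cost = timeit.timeit(lambda: frozenset(os.listdir(target)),
                             number=100)
stat_cost = timeit.timeit(failed_stat, number=100)
print('100 x frozenset(listdir()):', listdir_cost)
print('100 x failed stat():       ', stat_cost)
```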

However, the numbers are much better for application directories than for
the stdlib, since they are located earlier on sys.path.  Every successful
stdlib import in an application is equal to one failed import attempt for
every preceding directory on sys.path, so as long as the average directory
on sys.path isn't vastly larger than the stdlib, and the average
application imports at least four modules from the stdlib (on Windows, or 8
on Linux), there would be a net performance gain for the application as a
whole.  (That is, there'd be an improved per-sys.path entry import time for
stdlib modules, even if not for any application modules.)

For smaller directories, the tradeoff actually gets better.  A directory
one seventh the size of the 2.7 Windows stdlib has a listdir() that's
proportionately faster, but failed stats() in that directory are *not*
proportionately faster; they're only somewhat faster.  This means that it
takes fewer failed module lookups to make caching a win - about 2 in this
case, vs. 4 for the stdlib.

Now, these numbers are with actual disk or network access abstracted away,
because the data's in the operating system cache when I run the tests.
 It's possible that this strategy could backfire if you used, say, an NFS
directory with ten thousand files in it as your first sys.path entry.
 Without knowing the timings for listdir/stat/failed stat in that setup,
it's hard to say how many stdlib imports you need before you come out
ahead.  When I tried a directory about 7 times larger than the stdlib,
creating the frozenset took 10 times as long, but the cost of a failed stat
didn't go up by very much.

This suggests that there's probably an optimal directory size cutoff for
this trick; if only there were some way to check the size of a directory
without reading it, we could turn off the caching for oversize directories,
and get a major speed boost for everything else.  On most platforms, the
stat().st_size of the directory itself will give you some idea, but on
Windows that's always zero.  On Windows, we could work around that by using
a lower-level API than listdir() and simply stop reading the directory if
we hit the maximum number of entries we're willing to build a cache for,
and then call it off.

(Another possibility would be to explicitly enable caching by putting a
flag file in the directory, or perhaps by putting a special prefix on the
sys.path entry, setting the cutoff in an environment variable, etc.)

In any case, this seems really worth a closer look: in non-pathological
cases, it could make directory-based importing as fast as zip imports are.
 I'd be especially interested in knowing how the listdir/stat/failed stat
ratios work on NFS - ISTM that they might be even *more* conducive to this
approach, if setup latency dominates the cost of individual system calls.

If this works out, it'd be a good example of why importlib is a good idea;
i.e., allowing us to play with ideas like this.  Brett, wouldn't you love
to be able to say "importlib is *faster* than the old C-based importing"?
;-)


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-09 Thread Antoine Pitrou
On Thu, 9 Feb 2012 17:00:04 -0500
PJ Eby p...@telecommunity.com wrote:
 On Thu, Feb 9, 2012 at 2:53 PM, Mike Meyer m...@mired.org wrote:
 
  For those of you not watching -ideas, or ignoring the Python TIOBE
  -3% discussion, this would seem to be relevant to any discussion of
  reworking the import mechanism:
 
  http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059801.html
 
  Interesting.  This gives me an idea for a way to cut stat calls per
 sys.path entry per import by roughly 4x, at the cost of a one-time
 directory read per sys.path entry.

Why do you even think this is a problem with stat calls?





Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-09 Thread Robert Kern

On 2/9/12 10:15 PM, Antoine Pitrou wrote:

On Thu, 9 Feb 2012 17:00:04 -0500
PJ Ebyp...@telecommunity.com  wrote:

On Thu, Feb 9, 2012 at 2:53 PM, Mike Meyerm...@mired.org  wrote:


For those of you not watching -ideas, or ignoring the Python TIOBE
-3% discussion, this would seem to be relevant to any discussion of
reworking the import mechanism:

http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059801.html

Interesting.  This gives me an idea for a way to cut stat calls per

sys.path entry per import by roughly 4x, at the cost of a one-time
directory read per sys.path entry.


Why do you even think this is a problem with stat calls?


All he said is that reading about that problem and its solution gave him an idea 
about dealing with stat call overhead. The cost of stat calls has demonstrated 
itself to be a significant problem in other, more typical contexts.


--
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth.
  -- Umberto Eco



Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-09 Thread PJ Eby
On Thu, Feb 9, 2012 at 5:34 PM, Robert Kern robert.k...@gmail.com wrote:

 On 2/9/12 10:15 PM, Antoine Pitrou wrote:

 On Thu, 9 Feb 2012 17:00:04 -0500
 PJ Ebyp...@telecommunity.com  wrote:

 On Thu, Feb 9, 2012 at 2:53 PM, Mike Meyerm...@mired.org  wrote:

  For those of you not watching -ideas, or ignoring the Python TIOBE
 -3% discussion, this would seem to be relevant to any discussion of
 reworking the import mechanism:

 http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059801.html

 Interesting.  This gives me an idea for a way to cut stat calls per

 sys.path entry per import by roughly 4x, at the cost of a one-time
 directory read per sys.path entry.


 Why do you even think this is a problem with stat calls?


 All he said is that reading about that problem and its solution gave him
 an idea about dealing with stat call overhead. The cost of stat calls has
 demonstrated itself to be a significant problem in other, more typical
 contexts.


Right.  It was the part of the post that mentioned that all they sped up
was knowing which directory the files were in, not the actual loading of
bytecode.  The thought then occurred to me that this could perhaps be
applied to normal importing, as a zipimport-style speedup.  (The zipimport
module caches each zipfile directory it finds on sys.path, so failed import
lookups are extremely fast.)

It occurs to me, too, that applying the caching trick to *only* the stdlib
directories would still be a win as soon as you have between four and eight
site-packages (or user specific site-packages) imports in an application,
so it might be worth applying unconditionally to system-defined stdlib
(non-site) directories.


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-09 Thread Nick Coghlan
On Fri, Feb 10, 2012 at 1:05 AM, Brett Cannon br...@python.org wrote:
 This would then be similar to the way main.c already works when it
 interacts with runpy - simple cases are handled directly in C, more
 complex cases get handed over to the Python module.

 I suspect that if people want the case where you load from bytecode is fast
 then this will have to expand beyond this to include C functions and/or
 classes which can be used as accelerators; while this accelerates the common
 case of sys.modules, this (probably) won't make Antoine happy enough for
 importing a small module from bytecode (importing large modules like decimal
 are already fast enough).

No, my suggestion of keeping a de minimis C implementation for the
builtin __import__ is purely about ensuring the case of repeated
imports (especially those nested inside functions) remains as fast as
it is today.

To speed up *first time* imports (regardless of their origin), I think
it makes a lot more sense to use better algorithms at the importlib
level, and that's much easier in Python than it is in C. It's not like
we've ever been philosophically *opposed* to smarter approaches, it's
just that import.c was already hairy enough and we had grave doubts
about messing with it too much (I still have immense respect for the
effort that Victor put in to sorting out most of its problems with
Unicode handling). Not having that millstone hanging around our necks
should open up *lots* of avenues for improvement without breaking
backwards compatibility (since we can really do what we like, so long
as the PEP 302 APIs are still invoked in the right order and the
various public APIs remain backwards compatible).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-09 Thread Terry Reedy

On 2/9/2012 7:19 PM, PJ Eby wrote:


Right.  It was the part of the post that mentioned that all they sped up
was knowing which directory the files were in, not the actual loading of
bytecode.  The thought then occurred to me that this could perhaps be
applied to normal importing, as a zipimport-style speedup.  (The
zipimport module caches each zipfile directory it finds on sys.path, so
failed import lookups are extremely fast.)

It occurs to me, too, that applying the caching trick to *only* the
stdlib directories would still be a win as soon as you have between four
and eight site-packages (or user specific site-packages) imports in an
application, so it might be worth applying unconditionally to
system-defined stdlib (non-site) directories.


It might be worthwhile to store a single file in the directory that 
contains /Lib with the info import needs to get files in /Lib and its 
subdirs, and check that it is not outdated relative to /Lib. Since in 
Python 3, .pyc files go in __pycache__, if /Lib included an empty 
__pycache__ on installation, /Lib would never be touched on most 
installations. Ditto for the non-__pycache__ subdirs.


--
Terry Jan Reedy



Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-08 Thread Brett Cannon
On Tue, Feb 7, 2012 at 17:42, Antoine Pitrou solip...@pitrou.net wrote:

 On Tue, 7 Feb 2012 17:24:21 -0500
 Brett Cannon br...@python.org wrote:
 
   IOW you want the sys.modules case fast, which I will never be able
   to match compared to C code since that is pure execution with no I/O.

 Why wouldn't you continue using C code for that? It's trivial (just a
 dict lookup).


 Sure, but it's all the code between the function call and hitting
sys.modules which would also need to get shoved into the C code. As I said,
I have not tried to optimize anything yet (and unfortunately a lot of the
upfront costs are over stupid things like checking if __import__ is being
called with a string for the module name).


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-08 Thread Brett Cannon
On Tue, Feb 7, 2012 at 18:08, Antoine Pitrou solip...@pitrou.net wrote:

 On Tue, 7 Feb 2012 17:16:18 -0500
 Brett Cannon br...@python.org wrote:
 
 IOW I really do not look forward to someone saying importlib is so
 much
slower at importing a module containing ``pass`` when (a) that never
happens, and (b) most programs do not spend their time importing but
instead doing interesting work.
  
   Well, import time is so important that the Mercurial developers have
   written an on-demand import mechanism, to reduce the latency of
   command-line operations.
  
 
  Sure, but they are a somewhat extreme case.

 I don't think Mercurial is extreme. Any command-line tool written in
 Python applies. For example, yum (Fedora's apt-get) is written in
 Python. And I'm sure many people do small administration scripts in
 Python. These tools may then be run in a loop by whatever other script.

   But it's not only important for Mercurial and the like. Even if you're
   developing a Web app, making imports slower will make restarts slower,
   and development more tedious in the first place.
  
  
  Fine, startup cost from a hard crash I can buy when you are getting 1000
  QPS, but development more tedious?

 Well, waiting several seconds when reloading a development server is
 tedious. Anyway, my point was that other cases (than command-line
 tools) can be negatively impacted by import time.

    So, if there is going to be some baseline performance target I need
    to hit to make people happy I would prefer to know what that
    (real-world) benchmark is and what the performance target is going
    to be on a non-debug build.
  
   - No significant slowdown in startup time.
  
 
   What's significant and measuring what exactly? I mean startup already
   has a ton of imports as it is, so this would wash out the point of
   measuring practically anything else for anything small.

 I don't understand your sentence. Yes, startup has a ton of imports and
 that's why I'm fearing it may be negatively impacted :)

  ("a ton" being a bit less than 50 currently)


So you want less than a 50% startup cost on the standard startup benchmarks?



  This is why I said I want a
  benchmark to target which does actual work since flat-out startup time
  measures nothing meaningful but busy work.

 Actual work can be very small in some cases. For example, if you run
  ``hg branch`` I'm quite sure it doesn't do a lot of work except importing
 many modules and then reading a single file in .hg (the one named
 .hg/branch probably, but I'm not a Mercurial dev).

 In the absence of more real world benchmarks, I think the startup
 benchmarks in the benchmarks repo are a good baseline.

 That said you could also install my 3.x port of Twisted here:
 https://bitbucket.org/pitrou/t3k/

  and then run e.g. ``python3 bin/trial -h``.

  I would get more out of code
  that just stat'ed every file in Lib since at least that did some work.

 stat()ing files is not really representative of import work. There are
 many indirections in the import machinery.
 (actually, even import.c appears quite slower than a bunch of stat()
 calls would imply)

    - Within 25% of current performance when importing, say, the struct
      module (Lib/struct.py) from bytecode.

   Why struct? It's such a small module that it isn't really a typical
   module.

 Precisely to measure the overhead. Typical module size will vary
 depending on development style. Some people may prefer writing many
 small modules. Or they may be using many small libraries, or using
 libraries that have adoptes such a development style.

 Measuring the overhead on small modules will make sure we aren't overly
 confident.

   The median file size of Lib is 11K (e.g. tabnanny.py), not 238 bytes
   (which is barely past Hello World). And is this just importing struct
   or is this from startup, e.g. ``python -c "import struct"``?

 Just importing struct, as with the timeit snippets in the other thread.
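The kind of snippet being referred to presumably resembles the following (my reconstruction, not the actual snippets from the other thread; evicting the module from sys.modules forces the finders and loaders to run again while the .pyc stays cached on disk):

```python
import sys
import timeit

# Warm case: struct is already in sys.modules, so the import statement
# is just a cache hit in __import__.
warm = timeit.timeit('import struct', number=10000)

# Colder case: evict struct first so the full machinery runs again.
# This measures finding/loading from bytecode, not compilation.
def reimport():
    sys.modules.pop('struct', None)
    import struct

cold = timeit.timeit(reimport, number=1000)
print('cached import :', warm)
print('full machinery:', cold)
```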


 OK, so less than 25% slowdown when importing a module with pre-existing
bytecode that is very small.

And here I was worrying you were going to suggest easy goals to reach for.
;)


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-08 Thread Brett Cannon
On Tue, Feb 7, 2012 at 21:27, PJ Eby p...@telecommunity.com wrote:



 On Tue, Feb 7, 2012 at 5:24 PM, Brett Cannon br...@python.org wrote:


 On Tue, Feb 7, 2012 at 16:51, PJ Eby p...@telecommunity.com wrote:

 On Tue, Feb 7, 2012 at 3:07 PM, Brett Cannon br...@python.org wrote:

 So, if there is going to be some baseline performance target I need to
 hit to make people happy I would prefer to know what that (real-world)
 benchmark is and what the performance target is going to be on a non-debug
 build. And if people are not worried about the performance then I'm happy
 with that as well. =)


 One thing I'm a bit worried about is repeated imports, especially ones
 that are inside frequently-called functions.  In today's versions of
 Python, this is a performance win for command-line tool platform systems
 like Mercurial and PEAK, where you want to delay importing as long as
 possible, in case the code that needs the import is never called at all...
  but, if it *is* used, you may still need to use it a lot of times.

 When writing that kind of code, I usually just unconditionally import
 inside the function, because the C code check for an already-imported
 module is faster than the Python if statement I'd have to clutter up my
 otherwise-clean function with.

 So, in addition to the things other people have mentioned as performance
 targets, I'd like to keep the slowdown factor low for this type of scenario
 as well.  Specifically, the slowdown shouldn't be so much as to motivate
 lazy importers like Mercurial and PEAK to need to rewrite in-function
 imports to do the already-imported check ourselves.  ;-)

 (Disclaimer: I haven't actually seen Mercurial's delayed/dynamic import
 code, so I can't say for 100% sure if they'd be affected the same way.)


 IOW you want the sys.modules case fast, which I will never be able to
 match compared to C code since that is pure execution with no I/O.


 Couldn't you just prefix the __import__ function with something like this:

  ...
  try:
      module = sys.modules[name]
  except KeyError:
      ...  # slow code path

 (Admittedly, the import lock is still a problem; initially I thought you
 could just skip it for this case, but the problem is that another thread
 could be in the middle of executing the module.)


I practically do already. As of right now there are some 'if' checks that
come ahead of it that I could shift around to fast-path this even more
(since who cares about types and such if the module name happens to be in
sys.modules), but it isn't that far off as-is.


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-08 Thread Antoine Pitrou
Le mercredi 08 février 2012 à 11:01 -0500, Brett Cannon a écrit :
 
 
 On Tue, Feb 7, 2012 at 17:42, Antoine Pitrou solip...@pitrou.net
 wrote:
 On Tue, 7 Feb 2012 17:24:21 -0500
 Brett Cannon br...@python.org wrote:
 
   IOW you want the sys.modules case fast, which I will never be able
   to match compared to C code since that is pure execution with no I/O.
 
  Why wouldn't you continue using C code for that? It's trivial (just a
  dict lookup).
 
 
  Sure, but it's all the code between the function call and hitting
 sys.modules which would also need to get shoved into the C code. As I
 said, I have not tried to optimize anything yet (and unfortunately a
 lot of the upfront costs are over stupid things like checking if
 __import__ is being called with a string for the module name).

I guess my point was: why is there a function call in that case? The
import statement could look up sys.modules directly.
Or the built-in __import__ could still be written in C, and only defer
to importlib when the module isn't found in sys.modules.
Practicality beats purity.

Regards

Antoine.




Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-08 Thread Brett Cannon
On Tue, Feb 7, 2012 at 22:47, Nick Coghlan ncogh...@gmail.com wrote:

 On Wed, Feb 8, 2012 at 12:54 PM, Terry Reedy tjre...@udel.edu wrote:
  On 2/7/2012 9:35 PM, PJ Eby wrote:
   It's just that not everything I write can depend on Importing.
  Throw an equivalent into the stdlib, though, and I guess I wouldn't have
  to worry about dependencies...
 
  And that is what I think (agree?) should be done to counteract the likely
  slowdown from using importlib.

 Yeah, this is one frequently reinvented wheel that could definitely do
 with a standard implementation. Christian Heimes made an initial
 attempt at such a thing years ago with PEP 369, but an importlib based
 __import__ would let the implementation largely be pure Python (with
 all the increase in power and flexibility that implies).


I'll see if I can come up with a pure Python way to handle setting
attributes on the module since that is the one case that my importers
project code can't handle.


 I'm not sure such an addition would help much with the base
 interpreter start up time though - most of the modules we bring in are
 because we're actually using them for some reason.


It wouldn't. This would be for third-parties only.



 The other thing that shouldn't be underrated here is the value in
 making the builtin import system PEP 302 compliant from a
 *documentation* perspective. I've made occasional attempts at fully
 documenting the import system over the years, and I always end up
 giving up because the combination of the pre-PEP 302 builtin
 mechanisms in import.c and the PEP 302 compliant mechanisms for things
 like zipimport just degenerate into a mess of special cases that are
 impossible to justify beyond "nobody got around to fixing this yet".
 The fact that we have an undocumented PEP 302 based reimplementation
 of imports squirrelled away in pkgutil to make pkgutil and runpy work
 is sheer insanity (replacing *that* with importlib might actually be a
 good first step towards full integration).


I actually have never bothered to explain import as it is currently
implemented in any of my PyCon import talks precisely because it is such a
mess. It's just easier to explain from a PEP 302 perspective since you can
actually comprehend that.


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-08 Thread Brett Cannon
On Tue, Feb 7, 2012 at 22:47, Nick Coghlan ncogh...@gmail.com wrote

[SNIP]


 The fact that we have an undocumented PEP 302 based reimplementation
 of imports squirrelled away in pkgutil to make pkgutil and runpy work
 is sheer insanity (replacing *that* with importlib might actually be a
 good first step towards full integration).


It easily goes beyond runpy. You could ditch much of imp's C code (e.g.
load_module()), you could write py_compile and compileall using importlib,
you could rewrite zipimport, etc. Anything that touches import could be
refactored to (a) use just Python code, and (b) reshare code so as to not
re-invent the wheel constantly.


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-08 Thread Brett Cannon
On Tue, Feb 7, 2012 at 18:26, Alex Gaynor alex.gay...@gmail.com wrote:

 Brett Cannon brett at python.org writes:


  IOW you want the sys.modules case fast, which I will never be able to
 match
 compared to C code since that is pure execution with no I/O.
 


 Sure you can: have a really fast Python VM.

 Constructive: if you can run this code under PyPy it'd be easy to just:

 $ pypy -mtimeit "import struct"
 $ pypy -mtimeit -s "import importlib" "importlib.import_module('struct')"

 Or whatever the right API is.


I'm not worried about PyPy. =) I assume you will just flat-out use
importlib regardless of what happens with CPython since it is/will be fully
compatible and is already written for you.


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-08 Thread Brett Cannon
On Wed, Feb 8, 2012 at 11:09, Antoine Pitrou solip...@pitrou.net wrote:

 Le mercredi 08 février 2012 à 11:01 -0500, Brett Cannon a écrit :
 
 
  On Tue, Feb 7, 2012 at 17:42, Antoine Pitrou solip...@pitrou.net
  wrote:
  On Tue, 7 Feb 2012 17:24:21 -0500
  Brett Cannon br...@python.org wrote:
  
   IOW you want the sys.modules case fast, which I will never
  be able to match
   compared to C code since that is pure execution with no I/O.
 
 
  Why wouldn't you continue using C code for that? It's trivial
  (just a dict
  lookup).
 
 
   Sure, but it's all the code between the function call and hitting
  sys.modules which would also need to get shoved into the C code. As I
  said, I have not tried to optimize anything yet (and unfortunately a
  lot of the upfront costs are over stupid things like checking if
  __import__ is being called with a string for the module name).

 I guess my point was: why is there a function call in that case? The
 import statement could look up sys.modules directly.


Because people like to do wacky stuff with their imports and so fully
bypassing __import__ would be bad.


 Or the built-in __import__ could still be written in C, and only defer
 to importlib when the module isn't found in sys.modules.
 Practicality beats purity.


 It's a possibility, although that would require every function call to
fetch the PyInterpreterState to get at the cached __import__ (so the proper
sys and imp modules are used) and I don't know how expensive that would be
(probably as not as expensive as calling out to Python code but I'm
thinking out loud).


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-08 Thread Brett Cannon
On Wed, Feb 8, 2012 at 11:15, Brett Cannon br...@python.org wrote:



 On Tue, Feb 7, 2012 at 22:47, Nick Coghlan ncogh...@gmail.com wrote

 [SNIP]


 The fact that we have an undocumented PEP 302 based reimplementation
 of imports squirrelled away in pkgutil to make pkgutil and runpy work
 is sheer insanity (replacing *that* with importlib might actually be a
 good first step towards full integration).


 It easily goes beyond runpy. You could ditch much of imp's C code (e.g.
 load_module()), you could write py_compile and compileall using importlib,
 you could rewrite zipimport, etc. Anything that touches import could be
 refactored to (a) use just Python code, and (b) reshare code so as to not
 re-invent the wheel constantly.


And taking it even farther, all of the blackbox aspects of import go away.
For instance, the implicit, hidden importers for built-in modules, frozen
modules, extensions, and source could actually be set on sys.path_hooks.
The Meta path importer that handles sys.path could actually exist on
sys.meta_path.
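
[This is essentially what later CPython releases went on to do: on a 3.3+
interpreter the once-hidden importers are ordinary, inspectable objects.
A quick look, for illustration:]

```python
import sys

# On modern CPython the formerly implicit importers are plain objects:
# BuiltinImporter, FrozenImporter, and PathFinder sit on sys.meta_path,
# and the FileFinder hook lives on sys.path_hooks.
for finder in sys.meta_path:
    print(finder)

for hook in sys.path_hooks:
    print(hook)
```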


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-08 Thread Terry Reedy

On 2/8/2012 11:13 AM, Brett Cannon wrote:

On Tue, Feb 7, 2012 at 22:47, Nick Coghlan ncogh...@gmail.com



I'm not sure such an addition would help much with the base
interpreter start up time though - most of the modules we bring in are
because we're actually using them for some reason.



It wouldn't. This would be for third-parties only.


such as hg. That is what I had in mind.

Would the following work? Treat a function as a 'loop' in that it may be 
executed repeatedly. Treat 'import x' in a function as what it is, an 
__import__ call plus a local assignment. Apply a version of the usual 
optimization: put a sys.modules-based lazy import outside of the 
function (at the top of the module?) and leave the local assignment x = 
sys.modules['x'] in the function. Change sys.modules.__delitem__ to 
replace a module with a dummy, so the function will still work after a 
deletion, as it does now.


--
Terry Jan Reedy



Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-08 Thread Antoine Pitrou
On Wed, 8 Feb 2012 11:07:10 -0500
Brett Cannon br...@python.org wrote:
 
  So, if there is going to be some baseline performance target I need to
  hit to make people happy I would prefer to know what that (real-world)
  benchmark is and what the performance target is going to be on a
  non-debug build.
   
- No significant slowdown in startup time.
   
  
   What's significant and measuring what exactly? I mean startup already
   has a ton of imports as it is, so this would wash out the point of
   measuring practically anything else for anything small.
 
  I don't understand your sentence. Yes, startup has a ton of imports and
  that's why I'm fearing it may be negatively impacted :)
 
  (a ton being a bit less than 50 currently)
 
 
 So you want less than a 50% startup cost on the standard startup benchmarks?

No, ~50 is the number of imports at startup.
I think startup time should grow by less than 10%.
(even better if it shrinks of course :))

 And here I was worrying you were going to suggest easy goals to reach for.
 ;)

Heh. Well, if importlib enabled user-level functionality, I guess it
could be attractive to trade a slice of performance against it. But
from a user's point of view, bootstrapping importlib is mostly an
implementation detail without much of a positive impact.

Regards

Antoine.


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-08 Thread Brett Cannon
On Wed, Feb 8, 2012 at 14:57, Terry Reedy tjre...@udel.edu wrote:

 On 2/8/2012 11:13 AM, Brett Cannon wrote:

 On Tue, Feb 7, 2012 at 22:47, Nick Coghlan ncogh...@gmail.com


 I'm not sure such an addition would help much with the base
interpreter start up time though - most of the modules we bring in are
because we're actually using them for some reason.


  It wouldn't. This would be for third-parties only.


 such as hg. That is what I had in mind.

 Would the following work? Treat a function as a 'loop' in that it may be
 executed repeatedly. Treat 'import x' in a function as what it is, an
 __import__ call plus a local assignment. Apply a version of the usual
 optimization: put a sys.modules-based lazy import outside of the function
 (at the top of the module?) and leave the local assignment x =
  sys.modules['x'] in the function. Change sys.modules.__delitem__ to
 replace a module with a dummy, so the function will still work after a
 deletion, as it does now.


Probably, but I would hate to force people to code in a specific way for it
to work.


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-08 Thread Terry Reedy

On 2/8/2012 3:16 PM, Brett Cannon wrote:

On Wed, Feb 8, 2012 at 14:57, Terry Reedy tjre...@udel.edu
Would the following work? Treat a function as a 'loop' in that it
may be executed repeatedly. Treat 'import x' in a function as what
it is, an __import__ call plus a local assignment. Apply a version
of the usual optimization: put a sys.modules-based lazy import
outside of the function (at the top of the module?) and leave the
local assignment x = sys.modules['x'] in the function. Change
sys.modules.__delitem__ to replace a module with a dummy, so the
function will still work after a deletion, as it does now.

Probably, but I would hate to force people to code in a specific way for
it to work.


The intent of what I proposed is to be transparent for imports within 
functions. It would be a minor optimization if anything, but it would 
mean that there is a lazy mechanism in place.


For top-level imports, unless *all* are made lazy, then there *must* be 
some indication in the code of whether to make it lazy or not.


--
Terry Jan Reedy



Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-08 Thread Brett Cannon
On Wed, Feb 8, 2012 at 15:31, Terry Reedy tjre...@udel.edu wrote:

 On 2/8/2012 3:16 PM, Brett Cannon wrote:

 On Wed, Feb 8, 2012 at 14:57, Terry Reedy tjre...@udel.edu
Would the following work? Treat a function as a 'loop' in that it
may be executed repeatedly. Treat 'import x' in a function as what
it is, an __import__ call plus a local assignment. Apply a version
of the usual optimization: put a sys.modules-based lazy import
outside of the function (at the top of the module?) and leave the
local assignment x = sys.modules['x'] in the function. Change
sys.modules.__delattr__ to replace a module with a dummy, so the
function will still work after a deletion, as it does now.

 Probably, but I would hate to force people to code in a specific way for
 it to work.


 The intent of what I proposed is to be transparent for imports within
 functions. It would be a minor optimization if anything, but it would mean
 that there is a lazy mechanism in place.

 For top-level imports, unless *all* are made lazy, then there *must* be
 some indication in the code of whether to make it lazy or not.


Not true; importlib would make it dead-simple to whitelist what modules to
make lazy (e.g. your app code lazy but all stdlib stuff not, etc.).
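
[A rough, self-contained sketch of what such a whitelist could look like —
this is illustrative code, not Brett's actual importers project: modules on
the whitelist get a placeholder whose first attribute access performs the
real import; everything else imports eagerly.]

```python
import importlib
import sys
import types

LAZY_MODULES = {"decimal"}  # hypothetical whitelist

class _LazyModule(types.ModuleType):
    def __getattr__(self, attr):
        real = sys.modules.get(self.__name__)
        if real is self or real is None:
            # First real use: swap the placeholder for the real module.
            sys.modules.pop(self.__name__, None)
            real = importlib.import_module(self.__name__)
        return getattr(real, attr)

def lazy_import(name):
    if name in sys.modules:
        return sys.modules[name]
    if name in LAZY_MODULES:
        module = _LazyModule(name)
        sys.modules[name] = module
        return module
    return importlib.import_module(name)

dec = lazy_import("decimal")    # no real import work happens yet
print(dec.Decimal("1.5") * 2)   # first attribute access imports it; prints 3.0
```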


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-08 Thread PJ Eby
On Wed, Feb 8, 2012 at 4:08 PM, Brett Cannon br...@python.org wrote:


 On Wed, Feb 8, 2012 at 15:31, Terry Reedy tjre...@udel.edu wrote:

 For top-level imports, unless *all* are made lazy, then there *must* be
 some indication in the code of whether to make it lazy or not.


 Not true; importlib would make it dead-simple to whitelist what modules to
 make lazy (e.g. your app code lazy but all stdlib stuff not, etc.).


There are actually only a few things stopping all imports from being lazy.
``from x import y`` immediately de-lazies them, after all.  ;-)

The main two reasons you wouldn't want imports to *always* be lazy are:

1. Changing sys.path or other parameters between the import statement and
the actual import
2. ImportErrors are likewise deferred until point-of-use, so conditional
importing with try/except would break.
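
[A small demonstration of pitfall #2 — the module name below is made up so
the import fails: with eager imports the failure is caught right at the
try/except, whereas a fully lazy scheme would defer the ImportError to the
first *use*, after the fallback branch had already been skipped.]

```python
# Eager semantics: the ImportError surfaces here, so the fallback is
# selected immediately. Under fully lazy imports the "import" line would
# appear to succeed and the error would only show up at first use.
try:
    import no_such_accelerator_module as impl  # hypothetical module name
except ImportError:
    impl = None  # choose the pure-Python fallback right away

assert impl is None
print("fallback selected")
```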


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-08 Thread Nick Coghlan
On Thu, Feb 9, 2012 at 2:09 AM, Antoine Pitrou solip...@pitrou.net wrote:
 I guess my point was: why is there a function call in that case? The
 import statement could look up sys.modules directly.
 Or the built-in __import__ could still be written in C, and only defer
 to importlib when the module isn't found in sys.modules.
 Practicality beats purity.

I quite like the idea of having the builtin __import__ be a *very* thin
veneer around importlib that just does the "is this in sys.modules
already so we can just return it from there?" check and delegates
other more complex cases to Python code in importlib.

Poking around in importlib.__import__ [1] (as well as
importlib._gcd_import), I'm thinking what we may want to do is break
up the logic a bit so that there are multiple helper functions that a
C version can call back into so that we can optimise certain simple
code paths to not call back into Python at all, and others to only do
so selectively.

Step 1: separate out the fromlist processing from __import__ into a
separate helper function

def _process_fromlist(module, fromlist):
    # Perform any required imports as per the existing code:
    # http://hg.python.org/cpython/file/aba513307f78/Lib/importlib/_bootstrap.py#l987


Step 2: separate out the relative import resolution from _gcd_import
into a separate helper function.

def _resolve_relative_name(name, package, level):
    assert hasattr(name, 'rpartition')
    assert hasattr(package, 'rpartition')
    assert level > 0
    name = ...  # Recalculate as per the existing code:
    # http://hg.python.org/cpython/file/aba513307f78/Lib/importlib/_bootstrap.py#l889
    return name

Step 3: Implement builtin __import__ in C (pseudo-code below):

def __import__(name, globals={}, locals={}, fromlist=[], level=0):
    if level > 0:
        name = importlib._resolve_relative_import(name)
    try:
        module = sys.modules[name]
    except KeyError:
        # Not cached yet, need to invoke the full import machinery.
        # We already resolved any relative imports though, so
        # treat it as an absolute import.
        return importlib.__import__(name, globals, locals, fromlist, 0)
    # Got a hit in the cache, see if there's any more work to do
    if not fromlist:
        # Duplicate relevant importlib.__import__ logic as C code
        # to find the right module to return from sys.modules
        pass
    elif hasattr(module, '__path__'):
        importlib._process_fromlist(module, fromlist)
    return module

This would then be similar to the way main.c already works when it
interacts with runpy - simple cases are handled directly in C, more
complex cases get handed over to the Python module.

Cheers,
Nick.

[1] http://hg.python.org/cpython/file/default/Lib/importlib/_bootstrap.py#l950

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-08 Thread Nick Coghlan
On Thu, Feb 9, 2012 at 11:28 AM, PJ Eby p...@telecommunity.com wrote:
 The main two reasons you wouldn't want imports to *always* be lazy are:

 1. Changing sys.path or other parameters between the import statement and
 the actual import
 2. ImportErrors are likewise deferred until point-of-use, so conditional
 importing with try/except would break.

3. Module level code may have non-local side effects (e.g. installing
codecs, pickle handlers, atexit handlers)

A white-listing based approach to lazy imports would let you manage
all those issues without having to change all the code that actually
*does* the imports.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


[Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Brett Cannon
I'm going to start this off with the caveat that
hg.python.org/sandbox/bcannon#bootstrap_importlib is not completely at
feature parity, but getting there shouldn't be hard. There is a FAILING
file that has a list of the tests that are not passing because of importlib
bootstrapping, with a comment as to why (I think) they are failing. But no
switch would ever happen until the test suite passes.

Anyway, to start this conversation I'm going to open with why I think
removing most of the C code in Python/import.c and replacing it with
importlib/_bootstrap.py is a positive thing.

One is maintainability. Antoine mentioned how if change occurs everyone is
going to have to be able to fix code in importlib, and that's the point! I
don't know about the rest of you but I find Python code easier to work with
than C code (and if you don't you might be subscribed to the wrong mailing
list =). I would assume the ability to make changes or to fix bugs will be
a lot easier with importlib than import.c. So maintainability should be
easier when it comes to imports.

Two is APIs. PEP 302 introduced this idea of an API for objects that can
perform imports so that people can control it, enhance it, introspect it,
etc. But as it stands right now, import.c implements none of PEP 302 for
any built-in import mechanism. This mostly stems from positive thing #1 I
just mentioned. But since I was able to write this code from scratch I was
able to design for (and extend) PEP 302 compliance in order to make sure
the entire import system was exposed cleanly. This means it is much easier
now to write a custom importer for quirky syntax, a different storage
mechanism, etc.

Third is multi-VM support. IronPython, Jython, and PyPy have all said they
would love importlib to become the default import implementation so that
all VMs have the same implementation. Some people have even said they will
use importlib regardless of what CPython does simply to ease their coding
burden, but obviously that still leads to the possibility of subtle
semantic differences that would go away if all VMs used the same
implementation. So switching would lead to one less possible semantic
difference between the various VMs.

So, that is the positives. What are the negatives? Performance, of course.

Now I'm going to be upfront and say I really did not want to have this
performance conversation now as I have done *NO* profiling or analysis of
the algorithms used in importlib in order to tune performance (e.g. the
function that handles case-sensitivity, which is on the critical path for
importing source code, has a platform check which could go away if I
instead had platform-specific versions of the function that were assigned
to a global variable at startup). I also know that people have a bad habit
of latching on to micro-benchmark numbers, especially for something like
import which involves startup or can easily be measured. I mean I wrote
importlib.test.benchmark to help measure performance changes in any
algorithmic changes I might make, but it isn't a real-world benchmark like
what Unladen Swallow gave us (e.g. the two start-up benchmarks that use
real-world apps -- hg and bzr -- aren't available on Python 3 so only
normal_startup and nosite_startup can be used ATM).

IOW I really do not look forward to someone saying importlib is so much
slower at importing a module containing ``pass`` when (a) that never
happens, and (b) most programs do not spend their time importing but
instead doing interesting work.

For instance, right now importlib does ``python -c "import decimal"``
(which, BTW, is the largest module in the stdlib) 25% slower on my machine
with a pydebug build (a non-debug build would probably be in my favor as I
have more Python objects being used in importlib and thus more sanity
checks). But if you do something (very) slightly more interesting like
``python -m calendar``, where there is a slight amount of work, then
importlib is currently only 16% slower. So it all depends on how we
measure (as usual).
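
[A rough way to reproduce this kind of comparison — an illustrative
best-of-N wall-clock measurement of bare startup versus startup plus one
large import; the numbers are of course machine- and build-dependent.]

```python
import subprocess
import sys
import time

# Best-of-N timing of a full interpreter launch, to keep OS noise down.
def best_time(args, repeat=5):
    best = float("inf")
    for _ in range(repeat):
        start = time.time()
        subprocess.check_call(args)
        best = min(best, time.time() - start)
    return best

base = best_time([sys.executable, "-c", "pass"])
loaded = best_time([sys.executable, "-c", "import decimal"])
print("import decimal overhead: %.1f%%" % (100.0 * (loaded - base) / base))
```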

So, if there is going to be some baseline performance target I need to hit
to make people happy I would prefer to know what that (real-world)
benchmark is and what the performance target is going to be on a non-debug
build. And if people are not worried about the performance then I'm happy
with that as well. =)


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Barry Warsaw
Brett, thanks for persevering on importlib!  Given how complicated imports are
in Python, I really appreciate you pushing this forward.  I've been knee deep
in both import.c and importlib at various times. ;)

On Feb 07, 2012, at 03:07 PM, Brett Cannon wrote:

One is maintainability. Antoine mentioned how if change occurs everyone is
going to have to be able to fix code in importlib, and that's the point! I
don't know about the rest of you but I find Python code easier to work with
than C code (and if you don't you might be subscribed to the wrong mailing
list =). I would assume the ability to make changes or to fix bugs will be
a lot easier with importlib than import.c. So maintainability should be
easier when it comes to imports.

I think it's *really* critical that importlib be well-documented.  Not just
its API, but also design documents (what classes are there, and why it's
decomposed that way), descriptions of how to extend and subclass, maybe even
examples for doing some typical hooks.  Maybe even a guided tour or tutorial
for people digging into importlib for the first time.

So, that is the positives. What are the negatives? Performance, of course.

That's okay.  Get it complete, right, and usable first and then unleash the
Pythonic hoards to bang on performance.

IOW I really do not look forward to someone saying importlib is so much
slower at importing a module containing ``pass`` when (a) that never
happens, and (b) most programs do not spend their time importing but
instead doing interesting work.

Identifying the use cases are important here.  For example, even if it were a
lot slower, Mailman wouldn't care (*I* might care because it takes longer to
run my test, but my users wouldn't).  But Bazaar or Mercurial users would care
a lot.

-Barry


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Dirkjan Ochtman
On Tue, Feb 7, 2012 at 21:24, Barry Warsaw ba...@python.org wrote:
 Identifying the use cases are important here.  For example, even if it were a
 lot slower, Mailman wouldn't care (*I* might care because it takes longer to
 run my test, but my users wouldn't).  But Bazaar or Mercurial users would care
 a lot.

Yeah, startup performance getting worse kinda sucks for command-line
apps. And IIRC it's been getting worse over the past few releases...

Anyway, I think there was enough of a python3 port for Mercurial (from
various GSoC students) that you can probably run some of the very
simple commands (like hg parents or hg id), which should be enough for
your purposes, right?

Cheers,

Dirkjan


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Antoine Pitrou
On Tue, 7 Feb 2012 15:07:24 -0500
Brett Cannon br...@python.org wrote:
 
 Now I'm going to be upfront and say I really did not want to have this
 performance conversation now as I have done *NO* profiling or analysis of
 the algorithms used in importlib in order to tune performance (e.g. the
 function that handles case-sensitivity, which is on the critical path for
 importing source code, has a platform check which could go away if I
 instead had platform-specific versions of the function that were assigned
 to a global variable at startup).

From a cursory look, I think you're gonna have to break (special-case)
some abstractions and have some inner loop coded in C for the common
cases.

That said, I think profiling and solving performance issues is critical
*before* integrating this work. It doesn't need to be done by you, but
the python-dev community shouldn't feel strong-armed to solve the issue.

 IOW I really do not look forward to someone saying importlib is so much
 slower at importing a module containing ``pass`` when (a) that never
 happens, and (b) most programs do not spend their time importing but
 instead doing interesting work.

Well, import time is so important that the Mercurial developers have
written an on-demand import mechanism, to reduce the latency of
command-line operations.

But it's not only important for Mercurial and the like. Even if you're
developing a Web app, making imports slower will make restarts slower,
and development more tedious in the first place.

 So, if there is going to be some baseline performance target I need to hit
 to make people happy I would prefer to know what that (real-world)
 benchmark is and what the performance target is going to be on a non-debug
 build.

- No significant slowdown in startup time.

- Within 25% of current performance when importing, say, the struct
  module (Lib/struct.py) from bytecode.

Regards

Antoine.




Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Paul Moore
On 7 February 2012 20:49, Antoine Pitrou solip...@pitrou.net wrote:
 Well, import time is so important that the Mercurial developers have
 written an on-demand import mechanism, to reduce the latency of
 command-line operations.

One question here, I guess - does the importlib integration do
anything to make writing on-demand import mechanisms easier (I'd
suspect not, but you never know...) If it did, then performance issues
might be somewhat less of a sticking point, as usual depending on use
cases.

Paul.


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread PJ Eby
On Tue, Feb 7, 2012 at 3:07 PM, Brett Cannon br...@python.org wrote:

 So, if there is going to be some baseline performance target I need to hit
 to make people happy I would prefer to know what that (real-world)
 benchmark is and what the performance target is going to be on a non-debug
 build. And if people are not worried about the performance then I'm happy
 with that as well. =)


One thing I'm a bit worried about is repeated imports, especially ones that
are inside frequently-called functions.  In today's versions of Python,
this is a performance win for command-line tool platform systems like
Mercurial and PEAK, where you want to delay importing as long as possible,
in case the code that needs the import is never called at all...  but, if
it *is* used, you may still need to use it a lot of times.

When writing that kind of code, I usually just unconditionally import
inside the function, because the C code check for an already-imported
module is faster than the Python if statement I'd have to clutter up my
otherwise-clean function with.
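
[The two styles being contrasted can be sketched and timed directly — an
illustrative comparison, with the caveat that the absolute numbers depend
entirely on the interpreter build:]

```python
import timeit

def plain_import():
    import struct            # repeat calls hit the sys.modules fast path
    return struct.calcsize("i")

_struct = None
def manual_check():
    global _struct
    if _struct is None:      # the "clutter" the bare import avoids
        import struct
        _struct = struct
    return _struct.calcsize("i")

print("plain import :", timeit.timeit(plain_import, number=100000))
print("manual check :", timeit.timeit(manual_check, number=100000))
```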

So, in addition to the things other people have mentioned as performance
targets, I'd like to keep the slowdown factor low for this type of scenario
as well.  Specifically, the slowdown shouldn't be so much as to motivate
lazy importers like Mercurial and PEAK to need to rewrite in-function
imports to do the already-imported check ourselves.  ;-)

(Disclaimer: I haven't actually seen Mercurial's delayed/dynamic import
code, so I can't say for 100% sure if they'd be affected the same way.)


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Brett Cannon
On Tue, Feb 7, 2012 at 15:49, Antoine Pitrou solip...@pitrou.net wrote:

 On Tue, 7 Feb 2012 15:07:24 -0500
 Brett Cannon br...@python.org wrote:
 
  Now I'm going to be upfront and say I really did not want to have this
  performance conversation now as I have done *NO* profiling or analysis of
  the algorithms used in importlib in order to tune performance (e.g. the
  function that handles case-sensitivity, which is on the critical path for
  importing source code, has a platform check which could go away if I
  instead had platform-specific versions of the function that were assigned
  to a global variable at startup).

 From a cursory look, I think you're gonna have to break (special-case)
 some abstractions and have some inner loop coded in C for the common
 cases.


Wouldn't shock me if it came to that, but obviously I would like to try to
avoid it.



 That said, I think profiling and solving performance issues is critical
 *before* integrating this work. It doesn't need to be done by you, but
 the python-dev community shouldn't feel strong-armed to solve the issue.


That part of the discussion I'm staying out of since I want to see this in
so I'm biased.


   IOW I really do not look forward to someone saying importlib is so much
  slower at importing a module containing ``pass`` when (a) that never
  happens, and (b) most programs do not spend their time importing but
  instead doing interesting work.

 Well, import time is so important that the Mercurial developers have
 written an on-demand import mechanism, to reduce the latency of
 command-line operations.


Sure, but they are a somewhat extreme case.



 But it's not only important for Mercurial and the like. Even if you're
 developing a Web app, making imports slower will make restarts slower,
 and development more tedious in the first place.


Fine, startup cost from a hard crash I can buy when you are getting 1000
QPS, but development more tedious?


   So, if there is going to be some baseline performance target I need to
 hit
  to make people happy I would prefer to know what that (real-world)
  benchmark is and what the performance target is going to be on a
 non-debug
  build.

 - No significant slowdown in startup time.


What's significant and measuring what exactly? I mean startup already has a
ton of imports as it is, so this would wash out the point of measuring
practically anything else for anything small. This is why I said I want a
benchmark to target which does actual work since flat-out startup time
measures nothing meaningful but busy work. I would get more out of code
that just stat'ed every file in Lib since at least that did some work.



 - Within 25% of current performance when importing, say, the struct
  module (Lib/struct.py) from bytecode.


Why struct? It's such a small module that it isn't really a typical module.
The median file size of Lib is 11K (e.g. tabnanny.py), not 238 bytes (which
is barely past Hello World). And is this just importing struct or is this
from startup, e.g. ``python -c import struct``?


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Brett Cannon
On Tue, Feb 7, 2012 at 15:24, Barry Warsaw ba...@python.org wrote:

 Brett, thanks for persevering on importlib!  Given how complicated imports
 are
 in Python, I really appreciate you pushing this forward.  I've been knee
 deep
 in both import.c and importlib at various times. ;)

 On Feb 07, 2012, at 03:07 PM, Brett Cannon wrote:

 One is maintainability. Antoine mentioned how if change occurs everyone is
 going to have to be able to fix code  in importlib, and that's the point!
 I
 don't know about the rest of you but I find Python code easier to work
 with
 than C code (and if you don't you might be subscribed to the wrong mailing
 list =). I would assume the ability to make changes or to fix bugs will be
 a lot easier with importlib than import.c. So maintainability should be
 easier when it comes to imports.

 I think it's *really* critical that importlib be well-documented.  Not just
 its API, but also design documents (what classes are there, and why it's
 decomposed that way), descriptions of how to extend and subclass, maybe
 even
 examples for doing some typical hooks.  Maybe even a guided tour or
 tutorial
 for people digging into importlib for the first time.


That's fine and not difficult to do.



 So, that is the positives. What are the negatives? Performance, of course.

 That's okay.  Get it complete, right, and usable first and then unleash the
 Pythonic hordes to bang on performance.

 IOW I really do not look forward to someone saying importlib is so much
 slower at importing a module containing ``pass`` when (a) that never
 happens, and (b) most programs do not spend their time importing but
 instead doing interesting work.

 Identifying the use cases is important here.  For example, even if it
 were a
 lot slower, Mailman wouldn't care (*I* might care because it takes longer
 to
 run my test, but my users wouldn't).  But Bazaar or Mercurial users would
 care
 a lot.


Right, which is why I'm looking for some agreed upon, concrete benchmark I
can use which isn't fluff.

-Brett



 -Barry


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Brett Cannon
On Tue, Feb 7, 2012 at 16:19, Paul Moore p.f.mo...@gmail.com wrote:

 On 7 February 2012 20:49, Antoine Pitrou solip...@pitrou.net wrote:
  Well, import time is so important that the Mercurial developers have
  written an on-demand import mechanism, to reduce the latency of
  command-line operations.

 One question here, I guess - does the importlib integration do
 anything to make writing on-demand import mechanisms easier (I'd
 suspect not, but you never know...) If it did, then performance issues
 might be somewhat less of a sticking point, as usual depending on use
 cases.


Depends on what your feature set is. I have a fully working mixin you can
add to any loader which makes it lazy if you trigger the import on reading
an attribute from the module:
http://code.google.com/p/importers/source/browse/importers/lazy.py . But if
you want to trigger the import on *writing* an attribute then I have yet to
make that work in Python source (maybe people have an idea on how to make
that work since __setattr__ doesn't mix well with __getattribute__).


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Brett Cannon
On Tue, Feb 7, 2012 at 15:28, Dirkjan Ochtman dirk...@ochtman.nl wrote:

 On Tue, Feb 7, 2012 at 21:24, Barry Warsaw ba...@python.org wrote:
  Identifying the use cases is important here.  For example, even if it
 were a
  lot slower, Mailman wouldn't care (*I* might care because it takes
 longer to
  run my test, but my users wouldn't).  But Bazaar or Mercurial users
 would care
  a lot.

 Yeah, startup performance getting worse kinda sucks for command-line
 apps. And IIRC it's been getting worse over the past few releases...

 Anyway, I think there was enough of a python3 port for Mercurial (from
 various GSoC students) that you can probably run some of the very
 simple commands (like hg parents or hg id), which should be enough for
 your purposes, right?


Possibly. Where is the code?


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Brett Cannon
On Tue, Feb 7, 2012 at 16:51, PJ Eby p...@telecommunity.com wrote:

 On Tue, Feb 7, 2012 at 3:07 PM, Brett Cannon br...@python.org wrote:

 So, if there is going to be some baseline performance target I need to
 hit to make people happy I would prefer to know what that (real-world)
 benchmark is and what the performance target is going to be on a non-debug
 build. And if people are not worried about the performance then I'm happy
 with that as well. =)


 One thing I'm a bit worried about is repeated imports, especially ones
 that are inside frequently-called functions.  In today's versions of
 Python, this is a performance win for command-line tool platform systems
 like Mercurial and PEAK, where you want to delay importing as long as
 possible, in case the code that needs the import is never called at all...
  but, if it *is* used, you may still need to use it a lot of times.

 When writing that kind of code, I usually just unconditionally import
 inside the function, because the C code check for an already-imported
 module is faster than the Python if statement I'd have to clutter up my
 otherwise-clean function with.

 So, in addition to the things other people have mentioned as performance
 targets, I'd like to keep the slowdown factor low for this type of scenario
 as well.  Specifically, the slowdown shouldn't be so much as to motivate
 lazy importers like Mercurial and PEAK to need to rewrite in-function
 imports to do the already-imported check ourselves.  ;-)

 (Disclaimer: I haven't actually seen Mercurial's delayed/dynamic import
 code, so I can't say for 100% sure if they'd be affected the same way.)


IOW you want the sys.modules case fast, which I will never be able to match
compared to C code since that is pure execution with no I/O.


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Antoine Pitrou
On Tue, 7 Feb 2012 17:24:21 -0500
Brett Cannon br...@python.org wrote:
 
 IOW you want the sys.modules case fast, which I will never be able to match
 compared to C code since that is pure execution with no I/O.

Why wouldn't you continue using C code for that? It's trivial (just a dict
lookup).

Regards

Antoine.




Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Barry Warsaw
On Feb 07, 2012, at 09:19 PM, Paul Moore wrote:

One question here, I guess - does the importlib integration do
anything to make writing on-demand import mechanisms easier (I'd
suspect not, but you never know...) If it did, then performance issues
might be somewhat less of a sticking point, as usual depending on use
cases.

It might even be a feature-win if a standard on-demand import mechanism could
be added on top of importlib so all these projects wouldn't have to roll their
own.

-Barry


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Antoine Pitrou
On Tue, 7 Feb 2012 17:16:18 -0500
Brett Cannon br...@python.org wrote:
 
IOW I really do not look forward to someone saying importlib is so much
   slower at importing a module containing ``pass`` when (a) that never
   happens, and (b) most programs do not spend their time importing but
   instead doing interesting work.
 
  Well, import time is so important that the Mercurial developers have
  written an on-demand import mechanism, to reduce the latency of
  command-line operations.
 
 
 Sure, but they are a somewhat extreme case.

I don't think Mercurial is extreme. Any command-line tool written in
Python applies. For example, yum (Fedora's apt-get) is written in
Python. And I'm sure many people do small administration scripts in
Python. These tools may then be run in a loop by whatever other script.

  But it's not only important for Mercurial and the like. Even if you're
  developing a Web app, making imports slower will make restarts slower,
  and development more tedious in the first place.
 
 
 Fine, startup cost from a hard crash I can buy when you are getting 1000
 QPS, but development more tedious?

Well, waiting several seconds when reloading a development server is
tedious. Anyway, my point was that other cases (than command-line
tools) can be negatively impacted by import time.

So, if there is going to be some baseline performance target I need to
  hit
   to make people happy I would prefer to know what that (real-world)
   benchmark is and what the performance target is going to be on a
  non-debug
   build.
 
  - No significant slowdown in startup time.
 
 
 What's significant and measuring what exactly? I mean startup already has a
 ton of imports as it is, so this would wash out the point of measuring
 practically anything else for anything small.

I don't understand your sentence. Yes, startup has a ton of imports and
that's why I'm fearing it may be negatively impacted :)

(a ton being a bit less than 50 currently)

 This is why I said I want a
 benchmark to target which does actual work since flat-out startup time
 measures nothing meaningful but busy work.

Actual work can be very small in some cases. For example, if you run
"hg branch" I'm quite sure it doesn't do a lot of work except importing
many modules and then reading a single file in .hg (the one named
.hg/branch probably, but I'm not a Mercurial dev).

In the absence of more real world benchmarks, I think the startup
benchmarks in the benchmarks repo are a good baseline. 

That said you could also install my 3.x port of Twisted here:
https://bitbucket.org/pitrou/t3k/

and then run e.g. python3 bin/trial -h.

 I would get more out of code
 that just stat'ed every file in Lib since at least that did some work.

stat()ing files is not really representative of import work. There are
many indirections in the import machinery.
(actually, even import.c appears quite slower than a bunch of stat()
calls would imply)

  - Within 25% of current performance when importing, say, the struct
   module (Lib/struct.py) from bytecode.
 
 
 Why struct? It's such a small module that it isn't really a typical module.

Precisely to measure the overhead. Typical module size will vary
depending on development style. Some people may prefer writing many
small modules. Or they may be using many small libraries, or using
libraries that have adopted such a development style.

Measuring the overhead on small modules will make sure we aren't overly
confident.

 The median file size of Lib is 11K (e.g. tabnanny.py), not 238 bytes (which
 is barely past Hello World). And is this just importing struct or is this
 from startup, e.g. ``python -c import struct``?

Just importing struct, as with the timeit snippets in the other thread.

Regards

Antoine.
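One way such a timeit-style measurement could look on a modern Python (a sketch, not the exact snippets from the other thread; evicting the module from sys.modules each round is what makes the import "cold" rather than a cache hit):

```python
import importlib
import sys
import timeit

def cold_import():
    # Evict struct (and its C accelerator module) so the full import
    # machinery runs, not just the sys.modules fast path.
    sys.modules.pop('struct', None)
    sys.modules.pop('_struct', None)
    return importlib.import_module('struct')

def cached_import():
    # With the module already in sys.modules this is near-pure dict lookup.
    return importlib.import_module('struct')

cold = timeit.timeit(cold_import, number=1000)
cached = timeit.timeit(cached_import, number=1000)
print(f'cold: {cold:.4f}s  cached: {cached:.4f}s')
```

The gap between the two numbers is the per-import machinery overhead being discussed.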


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Alex Gaynor
Brett Cannon brett at python.org writes:


 IOW you want the sys.modules case fast, which I will never be able to match 
compared to C code since that is pure execution with no I/O.
 


Sure you can: have a really fast Python VM.

Constructive: if you can run this code under PyPy it'd be easy to just:

$ pypy -mtimeit import struct
$ pypy -mtimeit -s import importlib importlib.import_module('struct')

Or whatever the right API is.

Alex



Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Terry Reedy

On 2/7/2012 4:51 PM, PJ Eby wrote:


One thing I'm a bit worried about is repeated imports, especially ones
that are inside frequently-called functions.  In today's versions of
Python, this is a performance win for command-line tool platform
systems like Mercurial and PEAK, where you want to delay importing as
long as possible, in case the code that needs the import is never called
at all...  but, if it *is* used, you may still need to use it a lot of
times.

When writing that kind of code, I usually just unconditionally import
inside the function, because the C code check for an already-imported
module is faster than the Python if statement I'd have to clutter up
my otherwise-clean function with.


importlib could provide a parameterized decorator for functions that are 
the only consumers of an import. It could operate much like this:


def imps(mod):
    def makewrap(f):
        def wrapped(*args, **kwds):
            print('first/only call to wrapper')
            g = globals()
            g[mod] = __import__(mod)
            g[f.__name__] = f
            f(*args, **kwds)
        wrapped.__name__ = f.__name__
        return wrapped
    return makewrap

@imps('itertools')
def ic():
    print(itertools.count)

ic()
ic()
#
first/only call to wrapper
<class 'itertools.count'>
<class 'itertools.count'>

--
Terry Jan Reedy



Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread PJ Eby
On Tue, Feb 7, 2012 at 5:24 PM, Brett Cannon br...@python.org wrote:


 On Tue, Feb 7, 2012 at 16:51, PJ Eby p...@telecommunity.com wrote:

 On Tue, Feb 7, 2012 at 3:07 PM, Brett Cannon br...@python.org wrote:

 So, if there is going to be some baseline performance target I need to
 hit to make people happy I would prefer to know what that (real-world)
 benchmark is and what the performance target is going to be on a non-debug
 build. And if people are not worried about the performance then I'm happy
 with that as well. =)


 One thing I'm a bit worried about is repeated imports, especially ones
 that are inside frequently-called functions.  In today's versions of
 Python, this is a performance win for command-line tool platform systems
 like Mercurial and PEAK, where you want to delay importing as long as
 possible, in case the code that needs the import is never called at all...
  but, if it *is* used, you may still need to use it a lot of times.

 When writing that kind of code, I usually just unconditionally import
 inside the function, because the C code check for an already-imported
 module is faster than the Python if statement I'd have to clutter up my
 otherwise-clean function with.

 So, in addition to the things other people have mentioned as performance
 targets, I'd like to keep the slowdown factor low for this type of scenario
 as well.  Specifically, the slowdown shouldn't be so much as to motivate
 lazy importers like Mercurial and PEAK to need to rewrite in-function
 imports to do the already-imported check ourselves.  ;-)

 (Disclaimer: I haven't actually seen Mercurial's delayed/dynamic import
 code, so I can't say for 100% sure if they'd be affected the same way.)


 IOW you want the sys.modules case fast, which I will never be able to
 match compared to C code since that is pure execution with no I/O.


Couldn't you just prefix the __import__ function with something like this:

    ...
    try:
        module = sys.modules[name]
    except KeyError:
        ...  # slow code path

(Admittedly, the import lock is still a problem; initially I thought you
could just skip it for this case, but the problem is that another thread
could be in the middle of executing the module.)
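A fuller version of that fast path might look like this (a sketch under the assumption that the importlib-based slow path is reachable as a separate function; here `_slow_import` is a stand-in bound to the current `__import__`, and the fast path is restricted to top-level absolute imports so the "return the top-level package" rule for dotted names is left to the full machinery):

```python
import builtins
import sys

_slow_import = builtins.__import__  # stand-in for the importlib-based path

def fast_import(name, globals=None, locals=None, fromlist=(), level=0):
    # Fast path: absolute import of an already-loaded top-level module,
    # with no fromlist to resolve.
    if level == 0 and not fromlist and '.' not in name:
        module = sys.modules.get(name)
        if module is not None:
            return module
    # Everything else (dotted names, fromlist, relative imports, cache
    # misses) goes through the full machinery.
    return _slow_import(name, globals, locals, fromlist, level)
```

As noted above, the import lock and partially-initialized modules are what make a real version of this trickier than the sketch.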


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread PJ Eby
On Tue, Feb 7, 2012 at 6:40 PM, Terry Reedy tjre...@udel.edu wrote:

 importlib could provide a parameterized decorator for functions that are
 the only consumers of an import. It could operate much like this:

 def imps(mod):
     def makewrap(f):
         def wrapped(*args, **kwds):
             print('first/only call to wrapper')
             g = globals()
             g[mod] = __import__(mod)
             g[f.__name__] = f
             f(*args, **kwds)
         wrapped.__name__ = f.__name__
         return wrapped
     return makewrap

 @imps('itertools')
 def ic():
     print(itertools.count)

 ic()
 ic()
 #
 first/only call to wrapper
 <class 'itertools.count'>
 <class 'itertools.count'>


If I were going to rewrite code, I'd just use lazy imports (see
http://pypi.python.org/pypi/Importing ).  They're even faster than this
approach (or using plain import statements), as they have zero per-call
function call overhead.  It's just that not everything I write can depend
on Importing.

Throw an equivalent into the stdlib, though, and I guess I wouldn't have to
worry about dependencies...

(To be clear: I'm talking about the
http://peak.telecommunity.com/DevCenter/Importing#lazy-imports feature,
which sticks a dummy module subclass instance into sys.modules, whose
__getattribute__ does a reload() of the module, forcing the normal import
process to run, after first changing the dummy object's type to something
that doesn't have the __getattribute__ any more.  This ensures that all
accesses after the first one are at normal module attribute access speed.
 That, and the whenImported decorator from Importing would probably be of
general stdlib usefulness too.)


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Terry Reedy

On 2/7/2012 9:35 PM, PJ Eby wrote:

On Tue, Feb 7, 2012 at 6:40 PM, Terry Reedy tjre...@udel.edu
mailto:tjre...@udel.edu wrote:

importlib could provide a parameterized decorator for functions that
are the only consumers of an import. It could operate much like this:

def imps(mod):
    def makewrap(f):
        def wrapped(*args, **kwds):
            print('first/only call to wrapper')
            g = globals()
            g[mod] = __import__(mod)
            g[f.__name__] = f
            f(*args, **kwds)
        wrapped.__name__ = f.__name__
        return wrapped
    return makewrap

@imps('itertools')
def ic():
    print(itertools.count)

ic()
ic()
#
first/only call to wrapper
<class 'itertools.count'>
<class 'itertools.count'>


If I were going to rewrite code, I'd just use lazy imports (see
http://pypi.python.org/pypi/Importing ).  They're even faster than this
approach (or using plain import statements), as they have zero per-call
function call overhead.


My code above and Importing, as I understand it, both delay imports 
until needed by using a dummy object that gets replaced at first access. 
(Now that I am reminded, sys.modules is the better place for the dummy 
objects. I just wanted to show that there is a simple solution (though 
more specialized) even for existing code.) The cost of delay, which 
might mean never, is a bit of one-time extra overhead. Both have no 
extra overhead after the first call. Unless delayed importing is made 
standard, both require a bit of extra code somewhere.



 It's just that not everything I write can depend on Importing.
Throw an equivalent into the stdlib, though, and I guess I wouldn't have
to worry about dependencies...


And that is what I think (agree?) should be done to counteract the 
likely slowdown from using importlib.



(To be clearer; I'm talking about the
http://peak.telecommunity.com/DevCenter/Importing#lazy-imports feature,
which sticks a dummy module subclass instance into sys.modules, whose
__getattribute__ does a reload() of the module, forcing the normal
import process to run, after first changing the dummy object's type to
something that doesn't have the __getattribute__ any more.  This ensures
that all accesses after the first one are at normal module attribute
access speed.  That, and the whenImported decorator from Importing
would probably be of general stdlib usefulness too.)


--
Terry Jan Reedy



Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Nick Coghlan
On Wed, Feb 8, 2012 at 12:54 PM, Terry Reedy tjre...@udel.edu wrote:
 On 2/7/2012 9:35 PM, PJ Eby wrote:
  It's just that not everything I write can depend on Importing.
 Throw an equivalent into the stdlib, though, and I guess I wouldn't have
 to worry about dependencies...

 And that is what I think (agree?) should be done to counteract the likely
 slowdown from using importlib.

Yeah, this is one frequently reinvented wheel that could definitely do
with a standard implementation. Christian Heimes made an initial
attempt at such a thing years ago with PEP 369, but an importlib based
__import__ would let the implementation largely be pure Python (with
all the increase in power and flexibility that implies).

I'm not sure such an addition would help much with the base
interpreter start up time though - most of the modules we bring in are
because we're actually using them for some reason.

The other thing that shouldn't be underrated here is the value in
making the builtin import system PEP 302 compliant from a
*documentation* perspective. I've made occasional attempts at fully
documenting the import system over the years, and I always end up
giving up because the combination of the pre-PEP 302 builtin
mechanisms in import.c and the PEP 302 compliant mechanisms for things
like zipimport just degenerate into a mess of special cases that are
impossible to justify beyond "nobody got around to fixing this yet".
The fact that we have an undocumented PEP 302 based reimplementation
of imports squirrelled away in pkgutil to make pkgutil and runpy work
is sheer insanity (replacing *that* with importlib might actually be a
good first step towards full integration).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Eric Snow
On Tue, Feb 7, 2012 at 8:47 PM, Nick Coghlan ncogh...@gmail.com wrote:
 On Wed, Feb 8, 2012 at 12:54 PM, Terry Reedy tjre...@udel.edu wrote:
 On 2/7/2012 9:35 PM, PJ Eby wrote:
  It's just that not everything I write can depend on Importing.
 Throw an equivalent into the stdlib, though, and I guess I wouldn't have
 to worry about dependencies...

 And that is what I think (agree?) should be done to counteract the likely
 slowdown from using importlib.

 Yeah, this is one frequently reinvented wheel that could definitely do
 with a standard implementation. Christian Heimes made an initial
 attempt at such a thing years ago with PEP 369, but an importlib based
 __import__ would let the implementation largely be pure Python (with
 all the increase in power and flexibility that implies).

 I'm not sure such an addition would help much with the base
 interpreter start up time though - most of the modules we bring in are
 because we're actually using them for some reason.

 The other thing that shouldn't be underrated here is the value in
 making the builtin import system PEP 302 compliant from a
 *documentation* perspective. I've made occasional attempts at fully
 documenting the import system over the years, and I always end up
 giving up because the combination of the pre-PEP 302 builtin
 mechanisms in import.c and the PEP 302 compliant mechanisms for things
 like zipimport just degenerate into a mess of special cases that are
 impossible to justify beyond "nobody got around to fixing this yet".
 The fact that we have an undocumented PEP 302 based reimplementation
 of imports squirrelled away in pkgutil to make pkgutil and runpy work
 is sheer insanity (replacing *that* with importlib might actually be a
 good first step towards full integration).

+1 on all counts

-eric