Re: [Python-Dev] requirements for moving __import__ over to importlib?

I just tried this and I get a str/bytes issue. I also think your setup3k.py
command is missing ``build``, and your ``build/scripts-3.2`` path is missing
``/hg``.

On Wed, Feb 22, 2012 at 19:26, Éric Araujo <mer...@netwok.org> wrote:

> Hi Brett, I think this message went unanswered, so here’s a late reply:
>
> On 07/02/2012 at 23:21, Brett Cannon wrote:
>> On Tue, Feb 7, 2012 at 15:28, Dirkjan Ochtman <dirk...@ochtman.nl> wrote:
>>> [...] Anyway, I think there was enough of a python3 port for Mercurial
>>> (from various GSoC students) that you can probably run some of the very
>>> simple commands (like hg parents or hg id), which should be enough for
>>> your purposes, right?
>>
>> Possibly. Where is the code?
>
>     # get Mercurial from a repo or tarball
>     hg clone http://selenic.com/repo/hg/
>     cd hg
>
>     # convert files in place (don’t commit after this :)
>     python3.2 contrib/setup3k.py
>
>     # the makefile is not py3k-aware, need to run manually
>     # the current stable head fails with a TypeError for me
>     PYTHONPATH=. python3.2 build/scripts-3.2
>
> Cheers

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] requirements for moving __import__ over to importlib?

Hi Brett, I think this message went unanswered, so here’s a late reply:

On 07/02/2012 at 23:21, Brett Cannon wrote:
> On Tue, Feb 7, 2012 at 15:28, Dirkjan Ochtman <dirk...@ochtman.nl> wrote:
>> [...] Anyway, I think there was enough of a python3 port for Mercurial
>> (from various GSoC students) that you can probably run some of the very
>> simple commands (like hg parents or hg id), which should be enough for
>> your purposes, right?
>
> Possibly. Where is the code?

    # get Mercurial from a repo or tarball
    hg clone http://selenic.com/repo/hg/
    cd hg

    # convert files in place (don’t commit after this :)
    python3.2 contrib/setup3k.py

    # the makefile is not py3k-aware, need to run manually
    # the current stable head fails with a TypeError for me
    PYTHONPATH=. python3.2 build/scripts-3.2

Cheers
Re: [Python-Dev] requirements for moving __import__ over to importlib?

On 07/02/2012 at 23:21, Brett Cannon wrote:
> On Tue, Feb 7, 2012 at 15:28, Dirkjan Ochtman <dirk...@ochtman.nl> wrote:
>> Yeah, startup performance getting worse kinda sucks for command-line
>> apps. And IIRC it's been getting worse over the past few releases...
>> Anyway, I think there was enough of a python3 port for Mercurial (from
>> various GSoC students) that you can probably run some of the very simple
>> commands (like hg parents or hg id), which should be enough for your
>> purposes, right?
>
> Possibly. Where is the code?

    hg clone http://selenic.com/repo/hg/
    cd hg
    python3 contrib/setup3k.py build
Re: [Python-Dev] requirements for moving __import__ over to importlib?

On Thu, Feb 9, 2012 at 17:00, PJ Eby <p...@telecommunity.com> wrote:

> On Thu, Feb 9, 2012 at 2:53 PM, Mike Meyer <m...@mired.org> wrote:
>> For those of you not watching -ideas, or ignoring the Python TIOBE -3%
>> discussion, this would seem to be relevant to any discussion of
>> reworking the import mechanism:
>> http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059801.html
>
> Interesting. This gives me an idea for a way to cut stat calls per
> sys.path entry per import by roughly 4x, at the cost of a one-time
> directory read per sys.path entry. That is, an importer created for a
> particular directory could, upon first use, cache a frozenset(listdir())
> and the stat().st_mtime of the directory. All the filename checks could
> then be performed against the frozenset, and the st_mtime of the
> directory only checked once per import, to verify whether the
> frozenset() needed refreshing.

I actually contemplated this back in 2006 when I first began importlib for
use at Google, to get around NFS's crappy stat performance. I never got
around to it, as compatibility with import.c turned out to be a little
tricky. =) Your solution below, PJE, is more or less what I was
considering (although I also considered variants that didn't stat the
directory when you knew your code wasn't changing stuff behind your back).

> Since a failed module lookup takes at least 5 stat checks (pyc, pyo, py,
> directory, and compiled extension (pyd/so)), this cuts it down to only
> 1, at the price of a listdir(). The big question is how long a listdir()
> takes, compared to a stat() or failed open(). That would tell us whether
> the tradeoff is worth making.

Actually it's pyc OR pyo, py, directory (which can lead to another set of
checks for __init__.py and __pycache__), .so, and module.so (or whatever
your platform uses for extensions).

> I did some crude timeit tests on frozenset(listdir()) and trapping
> failed stat calls. It looks like, for a Windows directory the size of
> the 2.7 stdlib, you need about four *failed* import attempts to overcome
> the initial caching cost, or about 8 successful bytecode imports. (For
> Linux, you might need to double these numbers; my tests showed a
> different ratio there, perhaps due to the Linux stdlib I tested having
> nearly twice as many directory entries as the directory I tested on
> Windows!)
>
> However, the numbers are much better for application directories than
> for the stdlib, since they are located earlier on sys.path. Every
> successful stdlib import in an application is equal to one failed import
> attempt for every preceding directory on sys.path, so as long as the
> average directory on sys.path isn't vastly larger than the stdlib, and
> the average application imports at least four modules from the stdlib
> (on Windows, or 8 on Linux), there would be a net performance gain for
> the application as a whole. (That is, there'd be an improved
> per-sys.path-entry import time for stdlib modules, even if not for any
> application modules.)

Does this comment take into account the number of modules required to
load the interpreter to begin with? That's already something like 48
modules loaded by Python 3.2 as it is.

> For smaller directories, the tradeoff actually gets better. A directory
> one seventh the size of the 2.7 Windows stdlib has a listdir() that's
> proportionately faster, but failed stat()s in that directory are *not*
> proportionately faster; they're only somewhat faster. This means that it
> takes fewer failed module lookups to make caching a win - about 2 in
> this case, vs. 4 for the stdlib.
>
> Now, these numbers are with actual disk or network access abstracted
> away, because the data's in the operating system cache when I run the
> tests. It's possible that this strategy could backfire if you used, say,
> an NFS directory with ten thousand files in it as your first sys.path
> entry. Without knowing the timings for listdir/stat/failed stat in that
> setup, it's hard to say how many stdlib imports you need before you come
> out ahead. When I tried a directory about 7 times larger than the
> stdlib, creating the frozenset took 10 times as long, but the cost of a
> failed stat didn't go up by very much. This suggests that there's
> probably an optimal directory size cutoff for this trick; if only there
> were some way to check the size of a directory without reading it, we
> could turn off the caching for oversize directories, and get a major
> speed boost for everything else. On most platforms, the stat().st_size
> of the directory itself will give you some idea, but on Windows that's
> always zero. On Windows, we could work around that by using a
> lower-level API than listdir() and simply stop reading the directory if
> we hit the maximum number of entries we're willing to build a cache for,
> and then call it off. (Another possibility would be to explicitly enable
> caching by putting a flag file in the directory, or perhaps by putting a
> special prefix on the sys.path entry, setting the cutoff in an
Re: [Python-Dev] requirements for moving __import__ over to importlib?

On Fri, Feb 10, 2012 at 1:05 PM, Brett Cannon <br...@python.org> wrote:

> On Thu, Feb 9, 2012 at 17:00, PJ Eby <p...@telecommunity.com> wrote:
>> I did some crude timeit tests on frozenset(listdir()) and trapping
>> failed stat calls. It looks like, for a Windows directory the size of
>> the 2.7 stdlib, you need about four *failed* import attempts to
>> overcome the initial caching cost, or about 8 successful bytecode
>> imports. [...]
>
> Does this comment take into account the number of modules required to
> load the interpreter to begin with? That's already something like 48
> modules loaded by Python 3.2 as it is.

I didn't count those, no. So, if they're loaded from disk *after*
importlib is initialized, then they should pay off the cost of caching
even fairly large directories that appear earlier on sys.path than the
stdlib. We still need to know about NFS and other ratios, though... I
still worry that people with more extreme directory sizes or slow-access
situations will run into even worse trouble than they have now.

> First is that if this were used on Windows or OS X (i.e. the OSs we
> support that typically have case-insensitive filesystems), then this
> approach would be a massive gain, as we already call os.listdir() when
> PYTHONCASEOK isn't defined to check case-sensitivity; take your 5 stat
> calls and add in 5 listdir() calls and that's what you get on Windows
> and OS X right now. Linux doesn't have this check, so you would still be
> potentially paying a penalty there.

Wow. That means it'd always be a win for pre-stdlib sys.path entries,
because any successful stdlib import equals a failed pre-stdlib lookup.
(Of course, that's just saving some of the overhead that's been *added*
by importlib, not a new gain, but still...)

> Second is variance in filesystems. Are we guaranteed that the stat of a
> directory is updated before a file change is made?

Not quite sure what you mean here. The directory stat is used to ensure
that new files haven't been added, old ones removed, or existing ones
renamed. Changes to the files themselves shouldn't factor in, should
they?

> Else there is a small race condition there which would suck. We also
> have the issue of granularity; Antoine has already had to add the source
> file size to .pyc files in Python 3.3 to combat crappy mtime granularity
> when generating bytecode. If we get file mod - import - file mod -
> import, are we guaranteed that the second import will know there was a
> modification if the first three steps occur fast enough to fit within
> the granularity of an mtime value?

Again, I'm not sure how this relates. Automatic code reloaders monitor
individual files that have been previously imported, so the directory
timestamps aren't relevant. Of course, I could be confused here. Are you
saying that if somebody makes a new .py file and saves it, that it'll be
possible to import it before it's finished being written? If so, that
could happen already, and again caching the directory doesn't make any
difference. Alternately, you could have a situation where the file is
deleted after we load the listdir(), but in that case the open will fail
and we can fall back... heck, we can even force resetting the cache in
that event.

> I was going to say something about __pycache__, but it actually doesn't
> affect this. Since you would have to stat the directory anyway, you
> might as well just stat the directory for the file you want, to keep it
> simple. Only if you consider __pycache__ to be immutable except for what
> the interpreter puts in that directory during execution could you
> optimize that step (in which case you can stat the directory once and
> never care again, as the set would just be updated by import whenever a
> new .pyc file was written). Having said all of this, implementing this
> idea would be trivial using importlib if you don't try to optimize the
> __pycache__ case. It's just a question of whether people are comfortable
> with the semantic change to import. This could also be made into
> something that was in importlib for people to use when desired if we are
> too worried about semantic changes.

Yep.
Re: [Python-Dev] requirements for moving __import__ over to importlib?

On Fri, Feb 10, 2012 at 15:07, PJ Eby <p...@telecommunity.com> wrote:

> On Fri, Feb 10, 2012 at 1:05 PM, Brett Cannon <br...@python.org> wrote:
>> On Thu, Feb 9, 2012 at 17:00, PJ Eby <p...@telecommunity.com> wrote:
>>> I did some crude timeit tests on frozenset(listdir()) and trapping
>>> failed stat calls. [...]
>>
>> Does this comment take into account the number of modules required to
>> load the interpreter to begin with? That's already something like 48
>> modules loaded by Python 3.2 as it is.
>
> I didn't count those, no. So, if they're loaded from disk *after*
> importlib is initialized, then they should pay off the cost of caching
> even fairly large directories that appear earlier on sys.path than the
> stdlib. We still need to know about NFS and other ratios, though... I
> still worry that people with more extreme directory sizes or slow-access
> situations will run into even worse trouble than they have now.

It's possible. No way to make it work for everyone. This is why I didn't
worry about some crazy perf optimization.

>> First is that if this were used on Windows or OS X (i.e. the OSs we
>> support that typically have case-insensitive filesystems), then this
>> approach would be a massive gain, as we already call os.listdir() when
>> PYTHONCASEOK isn't defined to check case-sensitivity; take your 5 stat
>> calls and add in 5 listdir() calls and that's what you get on Windows
>> and OS X right now. Linux doesn't have this check, so you would still
>> be potentially paying a penalty there.
>
> Wow. That means it'd always be a win for pre-stdlib sys.path entries,
> because any successful stdlib import equals a failed pre-stdlib lookup.
> (Of course, that's just saving some of the overhead that's been *added*
> by importlib, not a new gain, but still...)

How so? import.c does a listdir() as well (this is not special to
importlib).

>> Second is variance in filesystems. Are we guaranteed that the stat of a
>> directory is updated before a file change is made?
>
> Not quite sure what you mean here. The directory stat is used to ensure
> that new files haven't been added, old ones removed, or existing ones
> renamed. Changes to the files themselves shouldn't factor in, should
> they?

Changes in any fashion to the directory. Do filesystems atomically update
the mtime of a directory when they commit a change? Otherwise we have a
potential race condition.

>> Else there is a small race condition there which would suck. We also
>> have the issue of granularity; Antoine has already had to add the
>> source file size to .pyc files in Python 3.3 to combat crappy mtime
>> granularity when generating bytecode. If we get file mod - import -
>> file mod - import, are we guaranteed that the second import will know
>> there was a modification if the first three steps occur fast enough to
>> fit within the granularity of an mtime value?
>
> Again, I'm not sure how this relates. Automatic code reloaders monitor
> individual files that have been previously imported, so the directory
> timestamps aren't relevant.

Don't care about automatic reloaders. I'm just asking about the case
where the mtime granularity is coarse enough to allow for a directory
change, an import to execute, and then another directory change to occur
all within a single mtime increment. That would lead to the set cache
being out of date.

> Of course, I could be confused here. Are you saying that if somebody
> makes a new .py file and saves it, that it'll be possible to import it
> before it's finished being written? If so, that could happen already,
> and again caching the directory doesn't make any difference.
> Alternately, you could have a situation where the file is deleted after
> we load the listdir(), but in that case the open will fail and we can
> fall back... heck, we can even force resetting the cache in that event.
>
>> I was going to say something about __pycache__, but it actually doesn't
>> affect this. Since you would have to stat the
Re: [Python-Dev] requirements for moving __import__ over to importlib?

On 02/10/2012 03:38 PM, Brett Cannon wrote:
> Changes in any fashion to the directory. Do filesystems atomically
> update the mtime of a directory when they commit a change? Otherwise we
> have a potential race condition.

Hmm, maybe I misunderstand you. In POSIX land, the only thing which
changes the mtime of a directory is linking / unlinking / renaming a
file: changes to individual files aren't detectable by examining their
containing directory's stat().

Tres.
--
Tres Seaver  +1 540-429-0999  tsea...@palladion.com
Palladion Software  "Excellence by Design"  http://palladion.com
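Tres's point can be demonstrated with a small script, under the assumption of a POSIX filesystem: rewriting a file's *contents* leaves the parent directory's mtime untouched (only creating, removing, or renaming an entry updates it), which is exactly why the directory stat can only detect entry changes, not content changes:

```python
import os
import tempfile

def dir_mtime(path):
    # st_mtime_ns gives nanosecond resolution where the OS supports it
    return os.stat(path).st_mtime_ns

d = tempfile.mkdtemp()
module = os.path.join(d, "example.py")
open(module, "w").close()       # creating the entry updates d's mtime

before = dir_mtime(d)
with open(module, "w") as f:    # rewriting contents touches only the
    f.write("X = 1\n")          # file's own mtime, not the directory's
after = dir_mtime(d)
```

This is consistent with the caching scheme under discussion: the directory mtime guards the *set of names*, while per-file mtimes (plus the source size stored in .pyc files) guard the contents.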
Re: [Python-Dev] requirements for moving __import__ over to importlib?

On Fri, Feb 10, 2012 at 16:29, Tres Seaver <tsea...@palladion.com> wrote:
> On 02/10/2012 03:38 PM, Brett Cannon wrote:
>> Changes in any fashion to the directory. Do filesystems atomically
>> update the mtime of a directory when they commit a change? Otherwise we
>> have a potential race condition.
>
> Hmm, maybe I misunderstand you. In POSIX land, the only thing which
> changes the mtime of a directory is linking / unlinking / renaming a
> file: changes to individual files aren't detectable by examining their
> containing directory's stat().

Individual file changes are not important; either the module is already in
sys.modules, in which case no attempt is made to detect a change, or it
hasn't been loaded and so it will have to be read regardless. All I'm
asking is whether filesystems typically commit e.g. a file deletion
atomically with the update to the containing directory's mtime or not.
Re: [Python-Dev] requirements for moving __import__ over to importlib?

On 02/10/2012 04:42 PM, Brett Cannon wrote:
> Individual file changes are not important; either the module is already
> in sys.modules, in which case no attempt is made to detect a change, or
> it hasn't been loaded and so it will have to be read regardless. All I'm
> asking is whether filesystems typically commit e.g. a file deletion
> atomically with the update to the containing directory's mtime or not.

In POSIX land, most certainly.

Tres.
--
Tres Seaver  +1 540-429-0999  tsea...@palladion.com
Palladion Software  "Excellence by Design"  http://palladion.com
Re: [Python-Dev] requirements for moving __import__ over to importlib?

On Feb 10, 2012 3:38 PM, Brett Cannon <br...@python.org> wrote:

> On Fri, Feb 10, 2012 at 15:07, PJ Eby <p...@telecommunity.com> wrote:
>> On Fri, Feb 10, 2012 at 1:05 PM, Brett Cannon <br...@python.org> wrote:
>>> First is that if this were used on Windows or OS X [...] Linux doesn't
>>> have this check, so you would still be potentially paying a penalty
>>> there.
>>
>> Wow. That means it'd always be a win for pre-stdlib sys.path entries,
>> because any successful stdlib import equals a failed pre-stdlib lookup.
>> (Of course, that's just saving some of the overhead that's been *added*
>> by importlib, not a new gain, but still...)
>
> How so? import.c does a listdir() as well (this is not special to
> importlib).

IIRC, it does a FindFirstFile on Windows, which is not the same thing.
That's one system call into a preallocated buffer, not a series of system
calls and creation of Python string objects.

> Don't care about automatic reloaders. I'm just asking about the case
> where the mtime granularity is coarse enough to allow for a directory
> change, an import to execute, and then another directory change to occur
> all within a single mtime increment. That would lead to the set cache
> being out of date.

Ah. Good point. Well, if there's any way to know what the mtime
granularity is, we can avoid the race condition by never performing the
listdir when the current clock time is too close to the stat(). In
effect, we can bypass the optimization if the directory was just
modified. Something like:

    mtime = stat(dir).st_mtime
    if abs(time.time() - mtime) > unsafe_window:
        old_mtime, files = cache.get(dir, (-1, ()))
        if mtime != old_mtime:
            files = frozenset(listdir(dir))
            cache[dir] = mtime, files
        # code to check for possibility of importing
        # and shortcut if found, or
        # exit with failure if no matching files
    # fallthrough to direct filesystem checking

The unsafe window is presumably filesystem- and platform-dependent, but
ISTR that even FAT filesystems have 2-second accuracy. The other catch is
the relationship between st_mtime and time.time(); I assume they'd be the
same in any sane system, but what if you're working across a network and
there's clock skew? Ugh. Worst-case example would be, say, accessing a
FAT device that's been shared over a Windows network from a machine whose
clock is several hours off. So it always looks safe to read, even if it's
just been changed.

What's the downside in that case? You're trying to import something that
just changed in the last fraction of a second... why? I mean, sure, the
directory listing will be wrong, no question. But it only matters that it
was wrong if you added, removed, or renamed importable files. Why are you
trying to import one of them?

Ah, here's a use case: you're starting up IDLE, and while it's loading,
you save some .py files you plan to import later. Your editor saves them
all at once, but IDLE does the listdir() midway through. You then do an
import from the IDLE prompt, and it fails because the listdir() didn't
catch everything.

Okay, now I know how to fix this. The problem isn't that there's a race
condition per se; the problem is that the race results in a broken cache
later. After all, it could just as easily have been the case that the
import failed due to timing. The problem is that all *future* imports
would fail in this circumstance. So the fix is a time-to-live recheck: if
TTL seconds have passed since the last use of the cached frozenset,
reload it, and reset the TTL to infinity. In other words:

    mtime = stat(dir).st_mtime
    now = time.time()
    if abs(now - mtime) > unsafe_window:
        old_mtime, then, files = cache.get(dir, (-1, now, ()))
        if mtime != old_mtime or (then is not None and now - then > TTL):
            files = frozenset(listdir(dir))
            cache[dir] = mtime, (now if mtime != old_mtime else None), files
        # code to check for possibility of importing
        # and shortcut if found, or
        # exit with failure if no matching files
    # fallthrough to direct filesystem checking

What this does (or should do) is handle clock-skew race-condition stale
caches by reloading the listdir even if the mtime hasn't changed, as soon
as TTL seconds have passed since the last snapshot was taken. However, if
the mtime stays the same, no subsequent listdirs will occur. As long as
the TTL is set high enough that a full startup of Python can occur, but
low enough that it resets by the time a human can notice something's
wrong, it should be golden. ;-) The TTL approach could be used in place
of the unsafe_window, actually; there's probably not much need for
Re: [Python-Dev] requirements for moving __import__ over to importlib?

On Sat, Feb 11, 2012 at 11:23 AM, PJ Eby <p...@telecommunity.com> wrote:
> What's the downside in that case? You're trying to import something that
> just changed in the last fraction of a second... why?

I don't know if it's normal in the Python world, but these sorts of race
conditions occur most annoyingly when a single process changes a file,
then attempts to import it. If you open a file, write to it, explicitly
close it, and then load it, you would expect to read back what you wrote,
not the version that was there previously.

Chris Angelico
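The write-then-import scenario Chris describes can be sketched concretely. This is an illustration, not part of the thread's proposal: the module name `freshmod` is hypothetical, and `importlib.invalidate_caches()` (added to importlib after the Python 3.2 timeframe of this thread) is the documented way to make a just-created file visible past any cached directory listings:

```python
import importlib
import os
import sys
import tempfile

# A single process writes a module, then imports it: exactly the race
# Chris describes if a stale directory cache is consulted in between.
d = tempfile.mkdtemp()
sys.path.insert(0, d)

with open(os.path.join(d, "freshmod.py"), "w") as f:
    f.write("ANSWER = 42\n")

importlib.invalidate_caches()  # drop any stale directory listings
import freshmod
```

Without the invalidation step, a finder holding an old listing could conclude the file does not exist, even though the open() would have succeeded.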
Re: [Python-Dev] requirements for moving __import__ over to importlib?

On Wed, Feb 8, 2012 at 20:28, PJ Eby <p...@telecommunity.com> wrote:

> On Wed, Feb 8, 2012 at 4:08 PM, Brett Cannon <br...@python.org> wrote:
>> On Wed, Feb 8, 2012 at 15:31, Terry Reedy <tjre...@udel.edu> wrote:
>>> For top-level imports, unless *all* are made lazy, then there *must*
>>> be some indication in the code of whether to make it lazy or not.
>>
>> Not true; importlib would make it dead-simple to whitelist which
>> modules to make lazy (e.g. your app code lazy but all stdlib stuff not,
>> etc.).
>
> There's actually only a few things stopping all imports from being lazy.
> "from x import y" immediately de-lazies them, after all. ;-)
>
> The two main reasons you wouldn't want imports to *always* be lazy are:
>
> 1. Changing sys.path or other parameters between the import statement
>    and the actual import
> 2. ImportErrors are likewise deferred until point-of-use, so conditional
>    importing with try/except would break.

This actually depends on the type of ImportError. My current solution
would trigger an ImportError at the import statement if no finder could
locate the module. But if some ImportError was raised because of some
other issue during load, then that would come up at first use.
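The semantics Brett describes can be sketched with `importlib.util.LazyLoader`, which was added to importlib well after this thread but implements roughly this split (the helper function below is mine): the *finder* runs eagerly, so a missing module raises ImportError at the import site, while actually executing the module is deferred until an attribute is first used:

```python
import importlib.util
import sys

def lazy_import(name):
    """Sketch of find-eagerly/load-lazily semantics: fail fast when no
    finder can locate the module, but defer module execution until the
    first attribute access."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        # No finder could locate the module: error at the import site
        raise ImportError("No module named " + repr(name))
    spec.loader = importlib.util.LazyLoader(spec.loader)
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    spec.loader.exec_module(module)  # execution is deferred, not run here
    return module
```

Errors raised while executing the module body, by contrast, would only surface at first use, matching the distinction drawn above.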
Re: [Python-Dev] requirements for moving __import__ over to importlib?

On Wed, Feb 8, 2012 at 20:26, Nick Coghlan <ncogh...@gmail.com> wrote:

> On Thu, Feb 9, 2012 at 2:09 AM, Antoine Pitrou <solip...@pitrou.net>
> wrote:
>> I guess my point was: why is there a function call in that case? The
>> import statement could look up sys.modules directly. Or the built-in
>> __import__ could still be written in C, and only defer to importlib
>> when the module isn't found in sys.modules. Practicality beats purity.
>
> I quite like the idea of having builtin __import__ be a *very* thin
> veneer around importlib that just does the "is this in sys.modules
> already so we can just return it from there?" check and delegates other
> more complex cases to Python code in importlib.
>
> Poking around in importlib.__import__ [1] (as well as
> importlib._gcd_import), I'm thinking what we may want to do is break up
> the logic a bit so that there are multiple helper functions that a C
> version can call back into, so that we can optimise certain simple code
> paths to not call back into Python at all, and others to only do so
> selectively.
>
> Step 1: separate out the fromlist processing from __import__ into a
> separate helper function
>
>     def _process_fromlist(module, fromlist):
>         # Perform any required imports as per existing code:
>         # http://hg.python.org/cpython/file/aba513307f78/Lib/importlib/_bootstrap.py#l987

Fine by me.

> Step 2: separate out the relative import resolution from _gcd_import
> into a separate helper function.
>
>     def _resolve_relative_name(name, package, level):
>         assert hasattr(name, 'rpartition')
>         assert hasattr(package, 'rpartition')
>         assert level > 0
>         name = ...  # Recalculate as per the existing code:
>         # http://hg.python.org/cpython/file/aba513307f78/Lib/importlib/_bootstrap.py#l889
>         return name

I was actually already thinking of exposing this as
importlib.resolve_name(), so breaking it out makes sense. I also think it
might be possible to expose a sort of importlib.find_module() that does
nothing more than find the loader for a module (if available).

> Step 3: Implement builtin __import__ in C (pseudo-code below):
>
>     def __import__(name, globals={}, locals={}, fromlist=[], level=0):
>         if level > 0:
>             name = importlib._resolve_relative_import(name)
>         try:
>             module = sys.modules[name]
>         except KeyError:
>             # Not cached yet, need to invoke the full import machinery.
>             # We already resolved any relative imports though, so
>             # treat it as an absolute import.
>             return importlib.__import__(name, globals, locals, fromlist, 0)
>         # Got a hit in the cache, see if there's any more work to do
>         if not fromlist:
>             ...  # Duplicate relevant importlib.__import__ logic as C
>                  # code to find the right module to return from
>                  # sys.modules
>         elif hasattr(module, '__path__'):
>             importlib._process_fromlist(module, fromlist)
>         return module
>
> This would then be similar to the way main.c already works when it
> interacts with runpy - simple cases are handled directly in C, more
> complex cases get handed over to the Python module.

I suspect that if people want the case where you load from bytecode to be
fast then this will have to expand beyond this to include C functions
and/or classes which can be used as accelerators; while this accelerates
the common case of sys.modules, this (probably) won't make Antoine happy
enough for importing a small module from bytecode (importing large
modules like decimal is already fast enough).
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Feb 9, 2012 9:58 AM, Brett Cannon br...@python.org wrote: This actually depends on the type of ImportError. My current solution actually would trigger an ImportError at the import statement if no finder could locate the module. But if some ImportError was raised because of some other issue during load then that would come up at first use. That's not really a lazy import then, or at least not as lazy as what Mercurial or PEAK use for general lazy importing. If you have a lot of them, that module-finding time really adds up. Again, the goal is fast startup of command-line tools that only use a small subset of the overall framework; doing disk access for lazy imports goes against that goal.
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Thu, Feb 9, 2012 at 13:43, PJ Eby p...@telecommunity.com wrote: On Feb 9, 2012 9:58 AM, Brett Cannon br...@python.org wrote: This actually depends on the type of ImportError. My current solution actually would trigger an ImportError at the import statement if no finder could locate the module. But if some ImportError was raised because of some other issue during load then that would come up at first use. That's not really a lazy import then, or at least not as lazy as what Mercurial or PEAK use for general lazy importing. If you have a lot of them, that module-finding time really adds up. Again, the goal is fast startup of command-line tools that only use a small subset of the overall framework; doing disk access for lazy imports goes against that goal. Depends if you consider stat calls the overhead vs. the actual disk read/write to load the data. Anyway, this is going to lead down to a discussion/argument over design parameters which I'm not up to having since I'm not actively working on a lazy loader for the stdlib right now.
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Thu, 9 Feb 2012 14:19:59 -0500 Brett Cannon br...@python.org wrote: On Thu, Feb 9, 2012 at 13:43, PJ Eby p...@telecommunity.com wrote: Again, the goal is fast startup of command-line tools that only use a small subset of the overall framework; doing disk access for lazy imports goes against that goal. Depends if you consider stat calls the overhead vs. the actual disk read/write to load the data. Anyway, this is going to lead down to a discussion/argument over design parameters which I'm not up to having since I'm not actively working on a lazy loader for the stdlib right now. For those of you not watching -ideas, or ignoring the Python TIOBE -3% discussion, this would seem to be relevant to any discussion of reworking the import mechanism: http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059801.html mike -- Mike Meyer m...@mired.org http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O ascii ribbon campaign - stop html mail - www.asciiribbon.org
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On 2/9/2012 11:53 AM, Mike Meyer wrote: On Thu, 9 Feb 2012 14:19:59 -0500 Brett Cannon br...@python.org wrote: On Thu, Feb 9, 2012 at 13:43, PJ Eby p...@telecommunity.com wrote: Again, the goal is fast startup of command-line tools that only use a small subset of the overall framework; doing disk access for lazy imports goes against that goal. Depends if you consider stat calls the overhead vs. the actual disk read/write to load the data. Anyway, this is going to lead down to a discussion/argument over design parameters which I'm not up to having since I'm not actively working on a lazy loader for the stdlib right now. For those of you not watching -ideas, or ignoring the Python TIOBE -3% discussion, this would seem to be relevant to any discussion of reworking the import mechanism: http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059801.html mike So what is the implication here? That building a cache of module locations (cleared when a new module is installed) would be more effective than optimizing the search for modules on every invocation of Python?
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On 2/9/2012 3:27 PM, Glenn Linderman wrote: On 2/9/2012 11:53 AM, Mike Meyer wrote: On Thu, 9 Feb 2012 14:19:59 -0500 Brett Cannon br...@python.org wrote: On Thu, Feb 9, 2012 at 13:43, PJ Eby p...@telecommunity.com wrote: Again, the goal is fast startup of command-line tools that only use a small subset of the overall framework; doing disk access for lazy imports goes against that goal. Depends if you consider stat calls the overhead vs. the actual disk read/write to load the data. Anyway, this is going to lead down to a discussion/argument over design parameters which I'm not up to having since I'm not actively working on a lazy loader for the stdlib right now. For those of you not watching -ideas, or ignoring the Python TIOBE -3% discussion, this would seem to be relevant to any discussion of reworking the import mechanism: http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059801.html "For 32k processes on BlueGene/P, importing 100 trivial C-extension modules takes 5.5 hours, compared to 35 minutes for all other interpreter loading and initialization. We developed a simple pure-Python module (based on knee.py, a hierarchical import example) that cuts the import time from 5.5 hours to 6 minutes." So what is the implication here? That building a cache of module locations (cleared when a new module is installed) would be more effective than optimizing the search for modules on every invocation of Python? -- Terry Jan Reedy
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Thu, Feb 9, 2012 at 2:53 PM, Mike Meyer m...@mired.org wrote: For those of you not watching -ideas, or ignoring the Python TIOBE -3% discussion, this would seem to be relevant to any discussion of reworking the import mechanism: http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059801.html Interesting. This gives me an idea for a way to cut stat calls per sys.path entry per import by roughly 4x, at the cost of a one-time directory read per sys.path entry. That is, an importer created for a particular directory could, upon first use, cache a frozenset(listdir()), and the stat().st_mtime of the directory. All the filename checks could then be performed against the frozenset, and the st_mtime of the directory only checked once per import, to verify whether the frozenset() needed refreshing. Since a failed module lookup takes at least 5 stat checks (pyc, pyo, py, directory, and compiled extension (pyd/so)), this cuts it down to only 1, at the price of a listdir(). The big question is how long does a listdir() take, compared to a stat() or failed open()? That would tell us whether the tradeoff is worth making. I did some crude timeit tests on frozenset(listdir()) and trapping failed stat calls. It looks like, for a Windows directory the size of the 2.7 stdlib, you need about four *failed* import attempts to overcome the initial caching cost, or about 8 successful bytecode imports. (For Linux, you might need to double these numbers; my tests showed a different ratio there, perhaps due to the Linux stdlib I tested having nearly twice as many directory entries as the directory I tested on Windows!) However, the numbers are much better for application directories than for the stdlib, since they are located earlier on sys.path.
Every successful stdlib import in an application is equal to one failed import attempt for every preceding directory on sys.path, so as long as the average directory on sys.path isn't vastly larger than the stdlib, and the average application imports at least four modules from the stdlib (on Windows, or 8 on Linux), there would be a net performance gain for the application as a whole. (That is, there'd be an improved per-sys.path entry import time for stdlib modules, even if not for any application modules.) For smaller directories, the tradeoff actually gets better. A directory one seventh the size of the 2.7 Windows stdlib has a listdir() that's proportionately faster, but failed stats() in that directory are *not* proportionately faster; they're only somewhat faster. This means that it takes fewer failed module lookups to make caching a win - about 2 in this case, vs. 4 for the stdlib. Now, these numbers are with actual disk or network access abstracted away, because the data's in the operating system cache when I run the tests. It's possible that this strategy could backfire if you used, say, an NFS directory with ten thousand files in it as your first sys.path entry. Without knowing the timings for listdir/stat/failed stat in that setup, it's hard to say how many stdlib imports you need before you come out ahead. When I tried a directory about 7 times larger than the stdlib, creating the frozenset took 10 times as long, but the cost of a failed stat didn't go up by very much. This suggests that there's probably an optimal directory size cutoff for this trick; if only there were some way to check the size of a directory without reading it, we could turn off the caching for oversize directories, and get a major speed boost for everything else. On most platforms, the stat().st_size of the directory itself will give you some idea, but on Windows that's always zero. 
On Windows, we could work around that by using a lower-level API than listdir() and simply stop reading the directory if we hit the maximum number of entries we're willing to build a cache for, and then call it off. (Another possibility would be to explicitly enable caching by putting a flag file in the directory, or perhaps by putting a special prefix on the sys.path entry, setting the cutoff in an environment variable, etc.) In any case, this seems really worth a closer look: in non-pathological cases, it could make directory-based importing as fast as zip imports are. I'd be especially interested in knowing how the listdir/stat/failed stat ratios work on NFS - ISTM that they might be even *more* conducive to this approach, if setup latency dominates the cost of individual system calls. If this works out, it'd be a good example of why importlib is a good idea; i.e., allowing us to play with ideas like this. Brett, wouldn't you love to be able to say importlib is *faster* than the old C-based importing? ;-)
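[Editor's note: the caching strategy PJ describes - trade one listdir() plus a single directory stat() per import for the four-or-five per-module stat calls - can be sketched as a toy finder. This is an illustration only, not importlib code; the class and method names are invented.]

```python
import os

class CachedDirFinder:
    """Cache a directory listing so failed lookups cost no extra stat calls."""

    SUFFIXES = ('.py', '.pyc', '.pyo', '.pyd', '.so')

    def __init__(self, path):
        self.path = path
        self._mtime = None
        self._entries = frozenset()

    def _refresh(self):
        # One stat of the directory per lookup; listdir only when it changed.
        mtime = os.stat(self.path).st_mtime
        if mtime != self._mtime:
            self._entries = frozenset(os.listdir(self.path))
            self._mtime = mtime

    def find_candidate(self, modname):
        """Return the path of a matching entry, or None on a cheap miss."""
        self._refresh()
        if modname in self._entries:  # a package directory
            return os.path.join(self.path, modname)
        for suffix in self.SUFFIXES:
            filename = modname + suffix
            if filename in self._entries:
                return os.path.join(self.path, filename)
        return None
```

A failed lookup here is a pure frozenset membership test, which is the point of the tradeoff being debated.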
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Thu, 9 Feb 2012 17:00:04 -0500 PJ Eby p...@telecommunity.com wrote: On Thu, Feb 9, 2012 at 2:53 PM, Mike Meyer m...@mired.org wrote: For those of you not watching -ideas, or ignoring the Python TIOBE -3% discussion, this would seem to be relevant to any discussion of reworking the import mechanism: http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059801.html Interesting. This gives me an idea for a way to cut stat calls per sys.path entry per import by roughly 4x, at the cost of a one-time directory read per sys.path entry. Why do you even think this is a problem with stat calls?
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On 2/9/12 10:15 PM, Antoine Pitrou wrote: On Thu, 9 Feb 2012 17:00:04 -0500 PJ Eby p...@telecommunity.com wrote: On Thu, Feb 9, 2012 at 2:53 PM, Mike Meyer m...@mired.org wrote: For those of you not watching -ideas, or ignoring the Python TIOBE -3% discussion, this would seem to be relevant to any discussion of reworking the import mechanism: http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059801.html Interesting. This gives me an idea for a way to cut stat calls per sys.path entry per import by roughly 4x, at the cost of a one-time directory read per sys.path entry. Why do you even think this is a problem with stat calls? All he said is that reading about that problem and its solution gave him an idea about dealing with stat call overhead. The cost of stat calls has demonstrated itself to be a significant problem in other, more typical contexts. -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Thu, Feb 9, 2012 at 5:34 PM, Robert Kern robert.k...@gmail.com wrote: On 2/9/12 10:15 PM, Antoine Pitrou wrote: On Thu, 9 Feb 2012 17:00:04 -0500 PJ Eby p...@telecommunity.com wrote: On Thu, Feb 9, 2012 at 2:53 PM, Mike Meyer m...@mired.org wrote: For those of you not watching -ideas, or ignoring the Python TIOBE -3% discussion, this would seem to be relevant to any discussion of reworking the import mechanism: http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059801.html Interesting. This gives me an idea for a way to cut stat calls per sys.path entry per import by roughly 4x, at the cost of a one-time directory read per sys.path entry. Why do you even think this is a problem with stat calls? All he said is that reading about that problem and its solution gave him an idea about dealing with stat call overhead. The cost of stat calls has demonstrated itself to be a significant problem in other, more typical contexts. Right. It was the part of the post that mentioned that all they sped up was knowing which directory the files were in, not the actual loading of bytecode. The thought then occurred to me that this could perhaps be applied to normal importing, as a zipimport-style speedup. (The zipimport module caches each zipfile directory it finds on sys.path, so failed import lookups are extremely fast.) It occurs to me, too, that applying the caching trick to *only* the stdlib directories would still be a win as soon as you have between four and eight site-packages (or user specific site-packages) imports in an application, so it might be worth applying unconditionally to system-defined stdlib (non-site) directories.
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Fri, Feb 10, 2012 at 1:05 AM, Brett Cannon br...@python.org wrote: This would then be similar to the way main.c already works when it interacts with runpy - simple cases are handled directly in C, more complex cases get handed over to the Python module. I suspect that if people want the case where you load from bytecode is fast then this will have to expand beyond this to include C functions and/or classes which can be used as accelerators; while this accelerates the common case of sys.modules, this (probably) won't make Antoine happy enough for importing a small module from bytecode (importing large modules like decimal are already fast enough). No, my suggestion of keeping a de minimis C implementation for the builtin __import__ is purely about ensuring the case of repeated imports (especially those nested inside functions) remains as fast as it is today. To speed up *first time* imports (regardless of their origin), I think it makes a lot more sense to use better algorithms at the importlib level, and that's much easier in Python than it is in C. It's not like we've ever been philosophically *opposed* to smarter approaches, it's just that import.c was already hairy enough and we had grave doubts about messing with it too much (I still have immense respect for the effort that Victor put in to sorting out most of its problems with Unicode handling). Not having that millstone hanging around our necks should open up *lots* of avenues for improvement without breaking backwards compatibility (since we can really do what we like, so long as the PEP 302 APIs are still invoked in the right order and the various public APIs remain backwards compatible). Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On 2/9/2012 7:19 PM, PJ Eby wrote: Right. It was the part of the post that mentioned that all they sped up was knowing which directory the files were in, not the actual loading of bytecode. The thought then occurred to me that this could perhaps be applied to normal importing, as a zipimport-style speedup. (The zipimport module caches each zipfile directory it finds on sys.path, so failed import lookups are extremely fast.) It occurs to me, too, that applying the caching trick to *only* the stdlib directories would still be a win as soon as you have between four and eight site-packages (or user specific site-packages) imports in an application, so it might be worth applying unconditionally to system-defined stdlib (non-site) directories. It might be worthwhile to store a single file in the directory that contains /Lib with the info import needs to get files in /Lib and its subdirs, and check that it is not outdated relative to /Lib. Since in Python 3, .pyc files go in __pycache__, if /Lib included an empty __pycache__ on installation, /Lib would never be touched on most installations. Ditto for the non-__pycache__ subdirs. -- Terry Jan Reedy
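[Editor's note: Terry's single-cache-file idea might look roughly like this. The cache filename and JSON format are invented for illustration; a real implementation would also need to cover subdirectories and concurrent writers.]

```python
import json
import os

def load_listing(libdir, cache_path):
    """Return the set of entries in libdir, using a cache file when fresh."""
    lib_mtime = os.stat(libdir).st_mtime
    try:
        with open(cache_path) as f:
            data = json.load(f)
        if data['mtime'] == lib_mtime:  # cache not outdated relative to libdir
            return set(data['entries'])
    except (OSError, ValueError, KeyError):
        pass  # missing or corrupt cache: fall through and rebuild
    entries = os.listdir(libdir)
    with open(cache_path, 'w') as f:
        json.dump({'mtime': lib_mtime, 'entries': entries}, f)
    return set(entries)
```

After the first call, import lookups would only stat the directory and read the one cache file, never scanning /Lib itself.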
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Tue, Feb 7, 2012 at 17:42, Antoine Pitrou solip...@pitrou.net wrote: On Tue, 7 Feb 2012 17:24:21 -0500 Brett Cannon br...@python.org wrote: IOW you want the sys.modules case fast, which I will never be able to match compared to C code since that is pure execution with no I/O. Why wouldn't continue using C code for that? It's trivial (just a dict lookup). Sure, but it's all the code between the function call and hitting sys.modules which would also need to get shoved into the C code. As I said, I have not tried to optimize anything yet (and unfortunately a lot of the upfront costs are over stupid things like checking if __import__ is being called with a string for the module name).
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Tue, Feb 7, 2012 at 18:08, Antoine Pitrou solip...@pitrou.net wrote: On Tue, 7 Feb 2012 17:16:18 -0500 Brett Cannon br...@python.org wrote: IOW I really do not look forward to someone saying importlib is so much slower at importing a module containing ``pass`` when (a) that never happens, and (b) most programs do not spend their time importing but instead doing interesting work. Well, import time is so important that the Mercurial developers have written an on-demand import mechanism, to reduce the latency of command-line operations. Sure, but they are a somewhat extreme case. I don't think Mercurial is extreme. Any command-line tool written in Python applies. For example, yum (Fedora's apt-get) is written in Python. And I'm sure many people do small administration scripts in Python. These tools may then be run in a loop by whatever other script. But it's not only important for Mercurial and the like. Even if you're developing a Web app, making imports slower will make restarts slower, and development more tedious in the first place. Fine, startup cost from a hard crash I can buy when you are getting 1000 QPS, but development more tedious? Well, waiting several seconds when reloading a development server is tedious. Anyway, my point was that other cases (than command-line tools) can be negatively impacted by import time. So, if there is going to be some baseline performance target I need to hit to make people happy I would prefer to know what that (real-world) benchmark is and what the performance target is going to be on a non-debug build. - No significant slowdown in startup time. What's significant and measuring what exactly? I mean startup already has a ton of imports as it is, so this would wash out the point of measuring practically anything else for anything small. I don't understand your sentence. 
Yes, startup has a ton of imports and that's why I'm fearing it may be negatively impacted :) (a ton being a bit less than 50 currently) So you want less than a 50% startup cost on the standard startup benchmarks? This is why I said I want a benchmark to target which does actual work since flat-out startup time measures nothing meaningful but busy work. Actual work can be very small in some cases. For example, if you run "hg branch" I'm quite sure it doesn't do a lot of work except importing many modules and then reading a single file in .hg (the one named .hg/branch probably, but I'm not a Mercurial dev). In the absence of more real world benchmarks, I think the startup benchmarks in the benchmarks repo are a good baseline. That said you could also install my 3.x port of Twisted here: https://bitbucket.org/pitrou/t3k/ and then run e.g. python3 bin/trial -h. I would get more out of code that just stat'ed every file in Lib since at least that did some work. stat()ing files is not really representative of import work. There are many indirections in the import machinery. (actually, even import.c appears quite slower than a bunch of stat() calls would imply) - Within 25% of current performance when importing, say, the struct module (Lib/struct.py) from bytecode. Why struct? It's such a small module that it isn't really a typical module. Precisely to measure the overhead. Typical module size will vary depending on development style. Some people may prefer writing many small modules. Or they may be using many small libraries, or using libraries that have adopted such a development style. Measuring the overhead on small modules will make sure we aren't overly confident. The median file size of Lib is 11K (e.g. tabnanny.py), not 238 bytes (which is barely past Hello World). And is this just importing struct or is this from startup, e.g. ``python -c "import struct"``? Just importing struct, as with the timeit snippets in the other thread.
OK, so less than 25% slowdown when importing a module with pre-existing bytecode that is very small. And here I was worrying you were going to suggest easy goals to reach for. ;)
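[Editor's note: the timeit measurements referred to in this exchange can be reproduced with the stdlib timeit module; a sketch follows. Timings vary by machine, so no numbers are shown, and note that after the first iteration both statements are served from the sys.modules cache.]

```python
import timeit

# Cost of a repeated import statement: essentially a sys.modules cache hit.
cached = timeit.timeit("import struct", number=100000)

# Cost of going through importlib's Python-level entry point instead.
machinery = timeit.timeit(
    "importlib.import_module('struct')",
    setup="import importlib",
    number=100000,
)

print('import statement: %.3fs  importlib.import_module: %.3fs'
      % (cached, machinery))
```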
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Tue, Feb 7, 2012 at 21:27, PJ Eby p...@telecommunity.com wrote: On Tue, Feb 7, 2012 at 5:24 PM, Brett Cannon br...@python.org wrote: On Tue, Feb 7, 2012 at 16:51, PJ Eby p...@telecommunity.com wrote: On Tue, Feb 7, 2012 at 3:07 PM, Brett Cannon br...@python.org wrote: So, if there is going to be some baseline performance target I need to hit to make people happy I would prefer to know what that (real-world) benchmark is and what the performance target is going to be on a non-debug build. And if people are not worried about the performance then I'm happy with that as well. =) One thing I'm a bit worried about is repeated imports, especially ones that are inside frequently-called functions. In today's versions of Python, this is a performance win for command-line tool platform systems like Mercurial and PEAK, where you want to delay importing as long as possible, in case the code that needs the import is never called at all... but, if it *is* used, you may still need to use it a lot of times. When writing that kind of code, I usually just unconditionally import inside the function, because the C code check for an already-imported module is faster than the Python if statement I'd have to clutter up my otherwise-clean function with. So, in addition to the things other people have mentioned as performance targets, I'd like to keep the slowdown factor low for this type of scenario as well. Specifically, the slowdown shouldn't be so much as to motivate lazy importers like Mercurial and PEAK to need to rewrite in-function imports to do the already-imported check ourselves. ;-) (Disclaimer: I haven't actually seen Mercurial's delayed/dynamic import code, so I can't say for 100% sure if they'd be affected the same way.) IOW you want the sys.modules case fast, which I will never be able to match compared to C code since that is pure execution with no I/O. Couldn't you just prefix the __import__ function with something like this: ... 
    try:
        module = sys.modules[name]
    except KeyError:
        # slow code path

(Admittedly, the import lock is still a problem; initially I thought you could just skip it for this case, but the problem is that another thread could be in the middle of executing the module.) I practically do already. As of right now there are some 'if' checks that come ahead of it that I could shift around to fast path this even more (since who cares about types and such if the module name happens to be in sys.modules), but it isn't that far off as-is.
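[Editor's note: PJ's suggested prefix, written out as a small runnable sketch. This is deliberately simplified - a real __import__ must also handle fromlist, level, and the rule that `import a.b` binds the top-level package - and the function name is invented.]

```python
import sys

def fast_import(name, import_=__import__):
    """Fast path: answer repeated absolute imports straight from sys.modules."""
    try:
        return sys.modules[name]
    except KeyError:
        # Slow code path: fall back to the full import machinery.
        return import_(name)
```

Repeated calls for an already-loaded module never leave the dict lookup, which is the case the in-function importers in Mercurial and PEAK care about.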
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Wednesday, February 8, 2012, at 11:01 -0500, Brett Cannon wrote: On Tue, Feb 7, 2012 at 17:42, Antoine Pitrou solip...@pitrou.net wrote: On Tue, 7 Feb 2012 17:24:21 -0500 Brett Cannon br...@python.org wrote: IOW you want the sys.modules case fast, which I will never be able to match compared to C code since that is pure execution with no I/O. Why wouldn't continue using C code for that? It's trivial (just a dict lookup). Sure, but it's all the code between the function call and hitting sys.modules which would also need to get shoved into the C code. As I said, I have not tried to optimize anything yet (and unfortunately a lot of the upfront costs are over stupid things like checking if __import__ is being called with a string for the module name). I guess my point was: why is there a function call in that case? The import statement could look up sys.modules directly. Or the built-in __import__ could still be written in C, and only defer to importlib when the module isn't found in sys.modules. Practicality beats purity. Regards Antoine.
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Tue, Feb 7, 2012 at 22:47, Nick Coghlan ncogh...@gmail.com wrote: On Wed, Feb 8, 2012 at 12:54 PM, Terry Reedy tjre...@udel.edu wrote: On 2/7/2012 9:35 PM, PJ Eby wrote: It's just that not everything I write can depend on Importing. Throw an equivalent into the stdlib, though, and I guess I wouldn't have to worry about dependencies... And that is what I think (agree?) should be done to counteract the likely slowdown from using importlib. Yeah, this is one frequently reinvented wheel that could definitely do with a standard implementation. Christian Heimes made an initial attempt at such a thing years ago with PEP 369, but an importlib based __import__ would let the implementation largely be pure Python (with all the increase in power and flexibility that implies). I'll see if I can come up with a pure Python way to handle setting attributes on the module since that is the one case that my importers project code can't handle. I'm not sure such an addition would help much with the base interpreter start up time though - most of the modules we bring in are because we're actually using them for some reason. It wouldn't. This would be for third-parties only. The other thing that shouldn't be underrated here is the value in making the builtin import system PEP 302 compliant from a *documentation* perspective. I've made occasional attempts at fully documenting the import system over the years, and I always end up giving up because the combination of the pre-PEP 302 builtin mechanisms in import.c and the PEP 302 compliant mechanisms for things like zipimport just degenerate into a mess of special cases that are impossible to justify beyond nobody got around to fixing this yet. The fact that we have an undocumented PEP 302 based reimplementation of imports squirrelled away in pkgutil to make pkgutil and runpy work is sheer insanity (replacing *that* with importlib might actually be a good first step towards full integration). 
I actually have never bothered to explain import as it is currently implemented in any of my PyCon import talks precisely because it is such a mess. It's just easier to explain from a PEP 302 perspective since you can actually comprehend that.
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Tue, Feb 7, 2012 at 22:47, Nick Coghlan ncogh...@gmail.com wrote [SNIP] The fact that we have an undocumented PEP 302 based reimplementation of imports squirrelled away in pkgutil to make pkgutil and runpy work is sheer insanity (replacing *that* with importlib might actually be a good first step towards full integration). It easily goes beyond runpy. You could ditch much of imp's C code (e.g. load_module()), you could write py_compile and compileall using importlib, you could rewrite zipimport, etc. Anything that touches import could be refactored to (a) use just Python code, and (b) reshare code so as to not re-invent the wheel constantly.
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Tue, Feb 7, 2012 at 18:26, Alex Gaynor alex.gay...@gmail.com wrote: Brett Cannon brett at python.org writes: IOW you want the sys.modules case fast, which I will never be able to match compared to C code since that is pure execution with no I/O. Sure you can: have a really fast Python VM. Constructive: if you can run this code under PyPy it'd be easy to just: $ pypy -mtimeit "import struct" $ pypy -mtimeit -s "import importlib" "importlib.import_module('struct')" Or whatever the right API is. I'm not worried about PyPy. =) I assume you will just flat-out use importlib regardless of what happens with CPython since it is/will be fully compatible and is already written for you.
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Wed, Feb 8, 2012 at 11:09, Antoine Pitrou solip...@pitrou.net wrote: On Wednesday, 08 February 2012 at 11:01 -0500, Brett Cannon wrote: On Tue, Feb 7, 2012 at 17:42, Antoine Pitrou solip...@pitrou.net wrote: On Tue, 7 Feb 2012 17:24:21 -0500 Brett Cannon br...@python.org wrote: IOW you want the sys.modules case fast, which I will never be able to match compared to C code since that is pure execution with no I/O. Why wouldn't we continue using C code for that? It's trivial (just a dict lookup). Sure, but it's all the code between the function call and hitting sys.modules which would also need to get shoved into the C code. As I said, I have not tried to optimize anything yet (and unfortunately a lot of the upfront costs are over stupid things like checking if __import__ is being called with a string for the module name). I guess my point was: why is there a function call in that case? The import statement could look up sys.modules directly. Because people like to do wacky stuff with their imports and so fully bypassing __import__ would be bad. Or the built-in __import__ could still be written in C, and only defer to importlib when the module isn't found in sys.modules. Practicality beats purity. It's a possibility, although that would require every function call to fetch the PyInterpreterState to get at the cached __import__ (so the proper sys and imp modules are used) and I don't know how expensive that would be (probably not as expensive as calling out to Python code but I'm thinking out loud).
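[Editor's note: the shape of Antoine's fast-path suggestion is easy to sketch in pure Python (a hypothetical wrapper, not CPython's actual code): check sys.modules first, fall back to the full machinery only on a miss:]

```python
import sys
import importlib

def fast_import(name):
    """Hypothetical fast-path __import__: a plain dict lookup for the
    common cached case, deferring to importlib only on a cache miss."""
    try:
        return sys.modules[name]  # the cheap, overwhelmingly common case
    except KeyError:
        return importlib.import_module(name)  # invoke the full machinery

mod = fast_import('struct')
assert mod is fast_import('struct')  # the second call is a pure cache hit
```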
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Wed, Feb 8, 2012 at 11:15, Brett Cannon br...@python.org wrote: On Tue, Feb 7, 2012 at 22:47, Nick Coghlan ncogh...@gmail.com wrote [SNIP] The fact that we have an undocumented PEP 302 based reimplementation of imports squirrelled away in pkgutil to make pkgutil and runpy work is sheer insanity (replacing *that* with importlib might actually be a good first step towards full integration). It easily goes beyond runpy. You could ditch much of imp's C code (e.g. load_module()), you could write py_compile and compileall using importlib, you could rewrite zipimport, etc. Anything that touches import could be refactored to (a) use just Python code, and (b) reshare code so as to not re-invent the wheel constantly. And taking it even farther, all of the blackbox aspects of import go away. For instance, the implicit, hidden importers for built-in modules, frozen modules, extensions, and source could actually be set on sys.path_hooks. The Meta path importer that handles sys.path could actually exist on sys.meta_path.
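[Editor's note: this is essentially what later shipped; on a current CPython the once-hidden importers are plainly visible as objects:]

```python
import sys

# On a modern CPython this lists BuiltinImporter, FrozenImporter and
# PathFinder -- the formerly implicit machinery, now exposed on sys.meta_path.
for finder in sys.meta_path:
    print(finder)
```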
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On 2/8/2012 11:13 AM, Brett Cannon wrote: On Tue, Feb 7, 2012 at 22:47, Nick Coghlan ncogh...@gmail.com I'm not sure such an addition would help much with the base interpreter start up time though - most of the modules we bring in are because we're actually using them for some reason. It wouldn't. This would be for third-parties only. such as hg. That is what I had in mind. Would the following work? Treat a function as a 'loop' in that it may be executed repeatedly. Treat 'import x' in a function as what it is, an __import__ call plus a local assignment. Apply a version of the usual optimization: put a sys.modules-based lazy import outside of the function (at the top of the module?) and leave the local assignment x = sys.modules['x'] in the function. Change sys.modules.__delattr__ to replace a module with a dummy, so the function will still work after a deletion, as it does now. -- Terry Jan Reedy
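[Editor's note: the equivalence Terry describes can be seen in plain code (a sketch; the real proposal would have the compiler perform this rewrite):]

```python
import sys

def f():
    import struct  # today: an __import__ call plus a local assignment
    return struct.calcsize('i')

def f_rewritten():
    # The proposed optimization for the already-imported case: replace
    # the __import__ call with a plain sys.modules lookup.
    struct = sys.modules['struct']
    return struct.calcsize('i')

# f() populates the cache, so the rewritten form sees the same module.
assert f() == f_rewritten()
```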
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Wed, 8 Feb 2012 11:07:10 -0500 Brett Cannon br...@python.org wrote: So, if there is going to be some baseline performance target I need to hit to make people happy I would prefer to know what that (real-world) benchmark is and what the performance target is going to be on a non-debug build. - No significant slowdown in startup time. What's significant and measuring what exactly? I mean startup already has a ton of imports as it is, so this would wash out the point of measuring practically anything else for anything small. I don't understand your sentence. Yes, startup has a ton of imports and that's why I'm fearing it may be negatively impacted :) (a ton being a bit less than 50 currently) So you want less than a 50% startup cost on the standard startup benchmarks? No, ~50 is the number of imports at startup. I think startup time should grow by less than 10%. (even better if it shrinks of course :)) And here I was worrying you were going to suggest easy goals to reach for. ;) Heh. Well, if importlib enabled user-level functionality, I guess it could be attractive to trade a slice of performance against it. But from a user's point of view, bootstrapping importlib is mostly an implementation detail with not much of a positive impact. Regards Antoine.
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Wed, Feb 8, 2012 at 14:57, Terry Reedy tjre...@udel.edu wrote: On 2/8/2012 11:13 AM, Brett Cannon wrote: On Tue, Feb 7, 2012 at 22:47, Nick Coghlan ncogh...@gmail.com I'm not sure such an addition would help much with the base interpreter start up time though - most of the modules we bring in are because we're actually using them for some reason. It wouldn't. This would be for third-parties only. such as hg. That is what I had in mind. Would the following work? Treat a function as a 'loop' in that it may be executed repeatedly. Treat 'import x' in a function as what it is, an __import__ call plus a local assignment. Apply a version of the usual optimization: put a sys.modules-based lazy import outside of the function (at the top of the module?) and leave the local assignment x = sys.modules['x'] in the function. Change sys.modules.__delattr__ to replace a module with a dummy, so the function will still work after a deletion, as it does now. Probably, but I would hate to force people to code in a specific way for it to work.
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On 2/8/2012 3:16 PM, Brett Cannon wrote: On Wed, Feb 8, 2012 at 14:57, Terry Reedy tjre...@udel.edu Would the following work? Treat a function as a 'loop' in that it may be executed repeatedly. Treat 'import x' in a function as what it is, an __import__ call plus a local assignment. Apply a version of the usual optimization: put a sys.modules-based lazy import outside of the function (at the top of the module?) and leave the local assignment x = sys.modules['x'] in the function. Change sys.modules.__delattr__ to replace a module with a dummy, so the function will still work after a deletion, as it does now. Probably, but I would hate to force people to code in a specific way for it to work. The intent of what I proposed is to be transparent for imports within functions. It would be a minor optimization if anything, but it would mean that there is a lazy mechanism in place. For top-level imports, unless *all* are made lazy, then there *must* be some indication in the code of whether to make it lazy or not. -- Terry Jan Reedy
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Wed, Feb 8, 2012 at 15:31, Terry Reedy tjre...@udel.edu wrote: On 2/8/2012 3:16 PM, Brett Cannon wrote: On Wed, Feb 8, 2012 at 14:57, Terry Reedy tjre...@udel.edu Would the following work? Treat a function as a 'loop' in that it may be executed repeatedly. Treat 'import x' in a function as what it is, an __import__ call plus a local assignment. Apply a version of the usual optimization: put a sys.modules-based lazy import outside of the function (at the top of the module?) and leave the local assignment x = sys.modules['x'] in the function. Change sys.modules.__delattr__ to replace a module with a dummy, so the function will still work after a deletion, as it does now. Probably, but I would hate to force people to code in a specific way for it to work. The intent of what I proposed is to be transparent for imports within functions. It would be a minor optimization if anything, but it would mean that there is a lazy mechanism in place. For top-level imports, unless *all* are made lazy, then there *must* be some indication in the code of whether to make it lazy or not. Not true; importlib would make it dead-simple to whitelist what modules to make lazy (e.g. your app code lazy but all stdlib stuff not, etc.).
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Wed, Feb 8, 2012 at 4:08 PM, Brett Cannon br...@python.org wrote: On Wed, Feb 8, 2012 at 15:31, Terry Reedy tjre...@udel.edu wrote: For top-level imports, unless *all* are made lazy, then there *must* be some indication in the code of whether to make it lazy or not. Not true; importlib would make it dead-simple to whitelist what modules to make lazy (e.g. your app code lazy but all stdlib stuff not, etc.). There's actually only a few things stopping all imports from being lazy. from x import y immediately de-lazies them, after all. ;-) The main two reasons you wouldn't want imports to *always* be lazy are: 1. Changing sys.path or other parameters between the import statement and the actual import 2. ImportErrors are likewise deferred until point-of-use, so conditional importing with try/except would break.
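[Editor's note: the second point is easy to demonstrate with a toy stub (hypothetical; real lazy importers are more involved): the ImportError for a missing module only surfaces at first use, far from any try/except around the import statement:]

```python
import importlib

class LazyStub:
    """Toy lazy module stand-in: defers the real import to first use."""
    def __init__(self, name):
        self._name = name
    def __getattr__(self, attr):
        # The real import (and any ImportError) happens here, at use time.
        return getattr(importlib.import_module(self._name), attr)

stub = LazyStub('no_such_module_xyz')  # "importing" succeeds silently...
try:
    stub.anything
except ImportError:
    print('ImportError deferred to point of use')
```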
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Thu, Feb 9, 2012 at 2:09 AM, Antoine Pitrou solip...@pitrou.net wrote: I guess my point was: why is there a function call in that case? The import statement could look up sys.modules directly. Or the built-in __import__ could still be written in C, and only defer to importlib when the module isn't found in sys.modules. Practicality beats purity. I quite like the idea of having builtin __import__ be a *very* thin veneer around importlib that just does the "is this in sys.modules already so we can just return it from there?" checks and delegates other more complex cases to Python code in importlib. Poking around in importlib.__import__ [1] (as well as importlib._gcd_import), I'm thinking what we may want to do is break up the logic a bit so that there are multiple helper functions that a C version can call back into so that we can optimise certain simple code paths to not call back into Python at all, and others to only do so selectively. Step 1: separate out the fromlist processing from __import__ into a separate helper function:

def _process_fromlist(module, fromlist):
    # Perform any required imports as per existing code:
    # http://hg.python.org/cpython/file/aba513307f78/Lib/importlib/_bootstrap.py#l987

Step 2: separate out the relative import resolution from _gcd_import into a separate helper function.
def _resolve_relative_name(name, package, level):
    assert hasattr(name, 'rpartition')
    assert hasattr(package, 'rpartition')
    assert level > 0
    name = ...  # Recalculate as per the existing code:
    # http://hg.python.org/cpython/file/aba513307f78/Lib/importlib/_bootstrap.py#l889
    return name

Step 3: Implement builtin __import__ in C (pseudo-code below):

def __import__(name, globals={}, locals={}, fromlist=[], level=0):
    if level > 0:
        name = importlib._resolve_relative_import(name)
    try:
        module = sys.modules[name]
    except KeyError:
        # Not cached yet, need to invoke the full import machinery
        # We already resolved any relative imports though, so
        # treat it as an absolute import
        return importlib.__import__(name, globals, locals, fromlist, 0)
    # Got a hit in the cache, see if there's any more work to do
    if not fromlist:
        # Duplicate relevant importlib.__import__ logic as C code
        # to find the right module to return from sys.modules
        pass
    elif hasattr(module, '__path__'):
        importlib._process_fromlist(module, fromlist)
    return module

This would then be similar to the way main.c already works when it interacts with runpy - simple cases are handled directly in C, more complex cases get handed over to the Python module. Cheers, Nick. [1] http://hg.python.org/cpython/file/default/Lib/importlib/_bootstrap.py#l950 -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Thu, Feb 9, 2012 at 11:28 AM, PJ Eby p...@telecommunity.com wrote: The main two reasons you wouldn't want imports to *always* be lazy are: 1. Changing sys.path or other parameters between the import statement and the actual import 2. ImportErrors are likewise deferred until point-of-use, so conditional importing with try/except would break. 3. Module level code may have non-local side effects (e.g. installing codecs, pickle handlers, atexit handlers) A white-listing based approach to lazy imports would let you manage all those issues without having to change all the code that actually *does* the imports. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
[Python-Dev] requirements for moving __import__ over to importlib?
I'm going to start this off with the caveat that hg.python.org/sandbox/bcannon#bootstrap_importlib is not completely at feature parity, but getting there shouldn't be hard. There is a FAILING file that has a list of the tests that are not passing because of importlib bootstrapping and a comment as to why (I think) they are failing. But no switch would ever happen until the test suite passes. Anyway, to start this conversation I'm going to open with why I think removing most of the C code in Python/import.c and replacing it with importlib/_bootstrap.py is a positive thing. One is maintainability. Antoine mentioned how if change occurs everyone is going to have to be able to fix code in importlib, and that's the point! I don't know about the rest of you but I find Python code easier to work with than C code (and if you don't you might be subscribed to the wrong mailing list =). I would assume the ability to make changes or to fix bugs will be a lot easier with importlib than import.c. So maintainability should be easier when it comes to imports. Two is APIs. PEP 302 introduced this idea of an API for objects that can perform imports so that people can control it, enhance it, introspect it, etc. But as it stands right now, import.c implements none of PEP 302 for any built-in import mechanism. This mostly stems from positive thing #1 I just mentioned. But since I was able to do this code from scratch I was able to design for (and extend) PEP 302 compliance in order to make sure the entire import system was exposed cleanly. This means it is much easier now to write a custom importer for quirky syntax, a different storage mechanism, etc. Third is multi-VM support. IronPython, Jython, and PyPy have all said they would love importlib to become the default import implementation so that all VMs have the same implementation. 
Some people have even said they will use importlib regardless of what CPython does simply to ease their coding burden, but obviously that still leads to the possibility of subtle semantic differences that would go away if all VMs used the same implementation. So switching would lead to one less possible semantic difference between the various VMs. So, that is the positives. What are the negatives? Performance, of course. Now I'm going to be upfront and say I really did not want to have this performance conversation now as I have done *NO* profiling or analysis of the algorithms used in importlib in order to tune performance (e.g. the function that handles case-sensitivity, which is on the critical path for importing source code, has a platform check which could go away if I instead had platform-specific versions of the function that were assigned to a global variable at startup). I also know that people have a bad habit of latching on to micro-benchmark numbers, especially for something like import which involves startup or can easily be measured. I mean I wrote importlib.test.benchmark to help measure performance changes in any algorithmic changes I might make, but it isn't a real-world benchmark like what Unladen Swallow gave us (e.g. the two start-up benchmarks that use real-world apps -- hg and bzr -- aren't available on Python 3 so only normal_startup and nosite_startup can be used ATM). IOW I really do not look forward to someone saying importlib is so much slower at importing a module containing ``pass`` when (a) that never happens, and (b) most programs do not spend their time importing but instead doing interesting work. For instance, right now importlib does ``python -c import decimal`` (which, BTW, is the largest module in the stdlib) 25% slower on my machine with a pydebug build (a non-debug build would probably be in my favor as I have more Python objects being used in importlib and thus more sanity checks). 
But if you do something (very) slightly more interesting like ``python -m calendar`` where there is a slight amount of work then importlib is currently only 16% slower. So it all depends on how we measure (as usual). So, if there is going to be some baseline performance target I need to hit to make people happy I would prefer to know what that (real-world) benchmark is and what the performance target is going to be on a non-debug build. And if people are not worried about the performance then I'm happy with that as well. =)
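[Editor's note: the "different storage mechanism" point above is concrete: with the import system exposed as objects, a finder/loader that serves modules from an in-memory dict of source strings takes a dozen lines. A sketch using the modern importlib spelling of the PEP 302 protocol; all names here are made up:]

```python
import sys
import types
import importlib.abc
import importlib.util

class StringImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve modules from an in-memory dict mapping names to source strings."""
    def __init__(self, sources):
        self.sources = sources

    def find_spec(self, fullname, path=None, target=None):
        if fullname in self.sources:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # not ours; let the next finder try

    def create_module(self, spec):
        return None  # use the default module creation semantics

    def exec_module(self, module):
        exec(self.sources[module.__name__], module.__dict__)

sys.meta_path.append(StringImporter({'demo_mod': 'answer = 42'}))
import demo_mod
print(demo_mod.answer)  # → 42
```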
Re: [Python-Dev] requirements for moving __import__ over to importlib?
Brett, thanks for persevering on importlib! Given how complicated imports are in Python, I really appreciate you pushing this forward. I've been knee deep in both import.c and importlib at various times. ;) On Feb 07, 2012, at 03:07 PM, Brett Cannon wrote: One is maintainability. Antoine mentioned how if change occurs everyone is going to have to be able to fix code in importlib, and that's the point! I don't know about the rest of you but I find Python code easier to work with than C code (and if you don't you might be subscribed to the wrong mailing list =). I would assume the ability to make changes or to fix bugs will be a lot easier with importlib than import.c. So maintainability should be easier when it comes to imports. I think it's *really* critical that importlib be well-documented. Not just its API, but also design documents (what classes are there, and why it's decomposed that way), descriptions of how to extend and subclass, maybe even examples for doing some typical hooks. Maybe even a guided tour or tutorial for people digging into importlib for the first time. So, that is the positives. What are the negatives? Performance, of course. That's okay. Get it complete, right, and usable first and then unleash the Pythonic hordes to bang on performance. IOW I really do not look forward to someone saying importlib is so much slower at importing a module containing ``pass`` when (a) that never happens, and (b) most programs do not spend their time importing but instead doing interesting work. Identifying the use cases is important here. For example, even if it were a lot slower, Mailman wouldn't care (*I* might care because it takes longer to run my test, but my users wouldn't). But Bazaar or Mercurial users would care a lot. -Barry
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Tue, Feb 7, 2012 at 21:24, Barry Warsaw ba...@python.org wrote: Identifying the use cases is important here. For example, even if it were a lot slower, Mailman wouldn't care (*I* might care because it takes longer to run my test, but my users wouldn't). But Bazaar or Mercurial users would care a lot. Yeah, startup performance getting worse kinda sucks for command-line apps. And IIRC it's been getting worse over the past few releases... Anyway, I think there was enough of a python3 port for Mercurial (from various GSoC students) that you can probably run some of the very simple commands (like hg parents or hg id), which should be enough for your purposes, right? Cheers, Dirkjan
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Tue, 7 Feb 2012 15:07:24 -0500 Brett Cannon br...@python.org wrote: Now I'm going to be upfront and say I really did not want to have this performance conversation now as I have done *NO* profiling or analysis of the algorithms used in importlib in order to tune performance (e.g. the function that handles case-sensitivity, which is on the critical path for importing source code, has a platform check which could go away if I instead had platform-specific versions of the function that were assigned to a global variable at startup). From a cursory look, I think you're gonna have to break (special-case) some abstractions and have some inner loop coded in C for the common cases. That said, I think profiling and solving performance issues is critical *before* integrating this work. It doesn't need to be done by you, but the python-dev community shouldn't feel strong-armed to solve the issue. IOW I really do not look forward to someone saying importlib is so much slower at importing a module containing ``pass`` when (a) that never happens, and (b) most programs do not spend their time importing but instead doing interesting work. Well, import time is so important that the Mercurial developers have written an on-demand import mechanism, to reduce the latency of command-line operations. But it's not only important for Mercurial and the like. Even if you're developing a Web app, making imports slower will make restarts slower, and development more tedious in the first place. So, if there is going to be some baseline performance target I need to hit to make people happy I would prefer to know what that (real-world) benchmark is and what the performance target is going to be on a non-debug build. - No significant slowdown in startup time. - Within 25% of current performance when importing, say, the struct module (Lib/struct.py) from bytecode. Regards Antoine. 
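[Editor's note: Antoine's second criterion can be measured by evicting the module between runs so every iteration pays the full import cost rather than a cache hit. A rough sketch, not an agreed benchmark:]

```python
import importlib
import sys
import timeit

def import_fresh(name):
    # Drop the cached module so each call exercises the full import
    # machinery (finder, loader, bytecode load), not just sys.modules.
    sys.modules.pop(name, None)
    return importlib.import_module(name)

elapsed = timeit.timeit("import_fresh('struct')", globals=globals(), number=1000)
print(elapsed)
```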
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On 7 February 2012 20:49, Antoine Pitrou solip...@pitrou.net wrote: Well, import time is so important that the Mercurial developers have written an on-demand import mechanism, to reduce the latency of command-line operations. One question here, I guess - does the importlib integration do anything to make writing on-demand import mechanisms easier (I'd suspect not, but you never know...) If it did, then performance issues might be somewhat less of a sticking point, as usual depending on use cases. Paul.
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Tue, Feb 7, 2012 at 3:07 PM, Brett Cannon br...@python.org wrote: So, if there is going to be some baseline performance target I need to hit to make people happy I would prefer to know what that (real-world) benchmark is and what the performance target is going to be on a non-debug build. And if people are not worried about the performance then I'm happy with that as well. =) One thing I'm a bit worried about is repeated imports, especially ones that are inside frequently-called functions. In today's versions of Python, this is a performance win for command-line tool platform systems like Mercurial and PEAK, where you want to delay importing as long as possible, in case the code that needs the import is never called at all... but, if it *is* used, you may still need to use it a lot of times. When writing that kind of code, I usually just unconditionally import inside the function, because the C code check for an already-imported module is faster than the Python if statement I'd have to clutter up my otherwise-clean function with. So, in addition to the things other people have mentioned as performance targets, I'd like to keep the slowdown factor low for this type of scenario as well. Specifically, the slowdown shouldn't be so much as to motivate lazy importers like Mercurial and PEAK to need to rewrite in-function imports to do the already-imported check ourselves. ;-) (Disclaimer: I haven't actually seen Mercurial's delayed/dynamic import code, so I can't say for 100% sure if they'd be affected the same way.)
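[Editor's note: the scenario PJ describes is easy to measure with timeit (illustrative numbers only): an in-function import repeated on every call versus a module-global reference:]

```python
import timeit

# Import inside the function: every call re-runs the already-imported
# check in __import__ plus the import-statement bytecode.
t_local = timeit.timeit(
    "f()",
    setup="def f():\n    import struct\n    return struct.calcsize('i')",
    number=100_000,
)
# Module-level import: calls only pay for a global name lookup.
t_global = timeit.timeit(
    "f()",
    setup="import struct\ndef f():\n    return struct.calcsize('i')",
    number=100_000,
)
print(t_local, t_global)
```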
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Tue, Feb 7, 2012 at 15:49, Antoine Pitrou solip...@pitrou.net wrote: On Tue, 7 Feb 2012 15:07:24 -0500 Brett Cannon br...@python.org wrote: Now I'm going to be upfront and say I really did not want to have this performance conversation now as I have done *NO* profiling or analysis of the algorithms used in importlib in order to tune performance (e.g. the function that handles case-sensitivity, which is on the critical path for importing source code, has a platform check which could go away if I instead had platform-specific versions of the function that were assigned to a global variable at startup). From a cursory look, I think you're gonna have to break (special-case) some abstractions and have some inner loop coded in C for the common cases. Wouldn't shock me if it came to that, but obviously I would like to try to avoid it. That said, I think profiling and solving performance issues is critical *before* integrating this work. It doesn't need to be done by you, but the python-dev community shouldn't feel strong-armed to solve the issue. That part of the discussion I'm staying out of since I want to see this in so I'm biased. IOW I really do not look forward to someone saying importlib is so much slower at importing a module containing ``pass`` when (a) that never happens, and (b) most programs do not spend their time importing but instead doing interesting work. Well, import time is so important that the Mercurial developers have written an on-demand import mechanism, to reduce the latency of command-line operations. Sure, but they are a somewhat extreme case. But it's not only important for Mercurial and the like. Even if you're developing a Web app, making imports slower will make restarts slower, and development more tedious in the first place. Fine, startup cost from a hard crash I can buy when you are getting 1000 QPS, but development more tedious? 
So, if there is going to be some baseline performance target I need to hit to make people happy I would prefer to know what that (real-world) benchmark is and what the performance target is going to be on a non-debug build. - No significant slowdown in startup time. What's significant and measuring what exactly? I mean startup already has a ton of imports as it is, so this would wash out the point of measuring practically anything else for anything small. This is why I said I want a benchmark to target which does actual work since flat-out startup time measures nothing meaningful but busy work. I would get more out of code that just stat'ed every file in Lib since at least that did some work. - Within 25% of current performance when importing, say, the struct module (Lib/struct.py) from bytecode. Why struct? It's such a small module that it isn't really a typical module. The median file size of Lib is 11K (e.g. tabnanny.py), not 238 bytes (which is barely past Hello World). And is this just importing struct or is this from startup, e.g. ``python -c import struct``?
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Tue, Feb 7, 2012 at 15:24, Barry Warsaw ba...@python.org wrote: Brett, thanks for persevering on importlib! Given how complicated imports are in Python, I really appreciate you pushing this forward. I've been knee deep in both import.c and importlib at various times. ;) On Feb 07, 2012, at 03:07 PM, Brett Cannon wrote: One is maintainability. Antoine mentioned how if change occurs everyone is going to have to be able to fix code in importlib, and that's the point! I don't know about the rest of you but I find Python code easier to work with than C code (and if you don't you might be subscribed to the wrong mailing list =). I would assume the ability to make changes or to fix bugs will be a lot easier with importlib than import.c. So maintainability should be easier when it comes to imports. I think it's *really* critical that importlib be well-documented. Not just its API, but also design documents (what classes are there, and why it's decomposed that way), descriptions of how to extend and subclass, maybe even examples for doing some typical hooks. Maybe even a guided tour or tutorial for people digging into importlib for the first time. That's fine and not difficult to do. So, that is the positives. What are the negatives? Performance, of course. That's okay. Get it complete, right, and usable first and then unleash the Pythonic hordes to bang on performance. IOW I really do not look forward to someone saying importlib is so much slower at importing a module containing ``pass`` when (a) that never happens, and (b) most programs do not spend their time importing but instead doing interesting work. Identifying the use cases is important here. For example, even if it were a lot slower, Mailman wouldn't care (*I* might care because it takes longer to run my test, but my users wouldn't). But Bazaar or Mercurial users would care a lot. Right, which is why I'm looking for some agreed upon, concrete benchmark I can use which isn't fluff. 
-Brett

-Barry
___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/brett%40python.org
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Tue, Feb 7, 2012 at 16:19, Paul Moore p.f.mo...@gmail.com wrote: On 7 February 2012 20:49, Antoine Pitrou solip...@pitrou.net wrote: Well, import time is so important that the Mercurial developers have written an on-demand import mechanism, to reduce the latency of command-line operations. One question here, I guess - does the importlib integration do anything to make writing on-demand import mechanisms easier (I'd suspect not, but you never know...) If it did, then performance issues might be somewhat less of a sticking point, as usual depending on use cases.

Depends on what your feature set is. I have a fully working mixin you can add to any loader which makes it lazy if you trigger the import on reading an attribute from the module: http://code.google.com/p/importers/source/browse/importers/lazy.py . But if you want to trigger the import on *writing* an attribute then I have yet to make that work in Python source (maybe people have an idea on how to make that work since __setattr__ doesn't mix well with __getattribute__).
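Brett's mixin works at the loader level; the same read-triggered behavior can also be sketched with a placeholder module object dropped into sys.modules. The names below are hypothetical and this is only a sketch of the idea, not Brett's actual code — in particular, swapping `__class__` back to a plain module is the trick that keeps later attribute accesses at normal speed:

```python
import importlib
import sys
import types


class _LazyModule(types.ModuleType):
    """Placeholder module: the real import runs on first attribute read."""

    def __getattribute__(self, attr):
        name = object.__getattribute__(self, "__name__")
        del sys.modules[name]                   # let the real import proceed
        module = importlib.import_module(name)  # the actual work happens here
        # Make old references to the placeholder behave like the real module,
        # and stop intercepting attribute access from now on.
        object.__getattribute__(self, "__dict__").update(module.__dict__)
        object.__setattr__(self, "__class__", types.ModuleType)
        return getattr(module, attr)


def lazy_import(name):
    """Install a placeholder so ``name`` is only imported when first used."""
    if name not in sys.modules:
        sys.modules[name] = _LazyModule(name)
    return sys.modules[name]
```

Triggering the import on *writing* an attribute is the hard part Brett mentions: a `__setattr__` on the placeholder would fire for the machinery's own attribute writes too.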
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Tue, Feb 7, 2012 at 15:28, Dirkjan Ochtman dirk...@ochtman.nl wrote: On Tue, Feb 7, 2012 at 21:24, Barry Warsaw ba...@python.org wrote: Identifying the use cases are important here. For example, even if it were a lot slower, Mailman wouldn't care (*I* might care because it takes longer to run my test, but my users wouldn't). But Bazaar or Mercurial users would care a lot.

Yeah, startup performance getting worse kinda sucks for command-line apps. And IIRC it's been getting worse over the past few releases... Anyway, I think there was enough of a python3 port for Mercurial (from various GSoC students) that you can probably run some of the very simple commands (like hg parents or hg id), which should be enough for your purposes, right?

Possibly. Where is the code?
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Tue, Feb 7, 2012 at 16:51, PJ Eby p...@telecommunity.com wrote: On Tue, Feb 7, 2012 at 3:07 PM, Brett Cannon br...@python.org wrote: So, if there is going to be some baseline performance target I need to hit to make people happy I would prefer to know what that (real-world) benchmark is and what the performance target is going to be on a non-debug build. And if people are not worried about the performance then I'm happy with that as well. =)

One thing I'm a bit worried about is repeated imports, especially ones that are inside frequently-called functions. In today's versions of Python, this is a performance win for command-line tool platform systems like Mercurial and PEAK, where you want to delay importing as long as possible, in case the code that needs the import is never called at all... but, if it *is* used, you may still need to use it a lot of times. When writing that kind of code, I usually just unconditionally import inside the function, because the C code check for an already-imported module is faster than the Python if statement I'd have to clutter up my otherwise-clean function with. So, in addition to the things other people have mentioned as performance targets, I'd like to keep the slowdown factor low for this type of scenario as well. Specifically, the slowdown shouldn't be so much as to motivate lazy importers like Mercurial and PEAK to rewrite in-function imports to do the already-imported check themselves. ;-) (Disclaimer: I haven't actually seen Mercurial's delayed/dynamic import code, so I can't say for 100% sure if they'd be affected the same way.)

IOW you want the sys.modules case fast, which I will never be able to match compared to C code since that is pure execution with no I/O.
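The trade-off PJ Eby describes — an unconditional in-function import versus the "cluttered" hand-written cache check — might look like this (illustrative function names, not code from any of the projects mentioned):

```python
import sys


def render_report(data):
    # Imported on every call; after the first call this is only a cached
    # sys.modules hit inside the import machinery, done in C today.
    import json
    return json.dumps(data, indent=2)


def render_report_guarded(data):
    # The clutter PJ Eby avoids: spelling out the already-imported check
    # in Python instead of trusting the import statement's fast path.
    mod = sys.modules.get("json")
    if mod is None:
        import json as mod
    return mod.dumps(data, indent=2)
```

His worry is that if a Python-based `__import__` makes the first form slow, everyone ends up writing the second form by hand.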
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Tue, 7 Feb 2012 17:24:21 -0500 Brett Cannon br...@python.org wrote: IOW you want the sys.modules case fast, which I will never be able to match compared to C code since that is pure execution with no I/O.

Why wouldn't you continue using C code for that? It's trivial (just a dict lookup).

Regards

Antoine.
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Feb 07, 2012, at 09:19 PM, Paul Moore wrote: One question here, I guess - does the importlib integration do anything to make writing on-demand import mechanisms easier (I'd suspect not, but you never know...) If it did, then performance issues might be somewhat less of a sticking point, as usual depending on use cases.

It might even be a feature-win if a standard on-demand import mechanism could be added on top of importlib so all these projects wouldn't have to roll their own.

-Barry
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Tue, 7 Feb 2012 17:16:18 -0500 Brett Cannon br...@python.org wrote: IOW I really do not look forward to someone saying importlib is so much slower at importing a module containing ``pass`` when (a) that never happens, and (b) most programs do not spend their time importing but instead doing interesting work.

Well, import time is so important that the Mercurial developers have written an on-demand import mechanism, to reduce the latency of command-line operations.

Sure, but they are a somewhat extreme case.

I don't think Mercurial is extreme. Any command-line tool written in Python applies. For example, yum (Fedora's apt-get) is written in Python. And I'm sure many people do small administration scripts in Python. These tools may then be run in a loop by whatever other script.

But it's not only important for Mercurial and the like. Even if you're developing a Web app, making imports slower will make restarts slower, and development more tedious in the first place.

Fine, startup cost from a hard crash I can buy when you are getting 1000 QPS, but development more tedious?

Well, waiting several seconds when reloading a development server is tedious. Anyway, my point was that other cases (than command-line tools) can be negatively impacted by import time.

So, if there is going to be some baseline performance target I need to hit to make people happy I would prefer to know what that (real-world) benchmark is and what the performance target is going to be on a non-debug build.

- No significant slowdown in startup time.

What's significant and measuring what exactly? I mean startup already has a ton of imports as it is, so this would wash out the point of measuring practically anything else for anything small.

I don't understand your sentence.
Yes, startup has a ton of imports and that's why I'm fearing it may be negatively impacted :) (a ton being a bit less than 50 currently)

This is why I said I want a benchmark to target which does actual work since flat-out startup time measures nothing meaningful but busy work.

Actual work can be very small in some cases. For example, if you run hg branch I'm quite sure it doesn't do a lot of work except importing many modules and then reading a single file in .hg (the one named .hg/branch probably, but I'm not a Mercurial dev). In the absence of more real world benchmarks, I think the startup benchmarks in the benchmarks repo are a good baseline. That said you could also install my 3.x port of Twisted here: https://bitbucket.org/pitrou/t3k/ and then run e.g. python3 bin/trial -h.

I would get more out of code that just stat'ed every file in Lib since at least that did some work.

stat()ing files is not really representative of import work. There are many indirections in the import machinery. (actually, even import.c appears quite slower than a bunch of stat() calls would imply)

- Within 25% of current performance when importing, say, the struct module (Lib/struct.py) from bytecode.

Why struct? It's such a small module that it isn't really a typical module.

Precisely to measure the overhead. Typical module size will vary depending on development style. Some people may prefer writing many small modules. Or they may be using many small libraries, or using libraries that have adopted such a development style. Measuring the overhead on small modules will make sure we aren't overly confident. The median file size of Lib is 11K (e.g. tabnanny.py), not 238 bytes (which is barely past Hello World).

And is this just importing struct or is this from startup, e.g. ``python -c import struct``?

Just importing struct, as with the timeit snippets in the other thread.

Regards

Antoine.
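The "just importing struct" measurement Antoine refers to can be reproduced with timeit, comparing the cached path (a sys.modules hit) against a forced re-import that exercises the full machinery. A sketch only — absolute numbers vary by machine, the ratio is the interesting part:

```python
import timeit

# Warm path: struct stays cached in sys.modules after the first iteration,
# so each "import struct" is essentially a dict lookup done by the machinery.
warm = timeit.timeit("import struct", number=1000)

# Cold path: evict the cached module on every iteration so the finders and
# loaders run and struct.py is re-executed each time.
cold = timeit.timeit(
    "del sys.modules['struct']; import struct",
    setup="import sys, struct",
    number=1000,
)

print("warm: %.4fs  cold: %.4fs  ratio: %.1fx" % (warm, cold, cold / warm))
```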
Re: [Python-Dev] requirements for moving __import__ over to importlib?
Brett Cannon brett at python.org writes: IOW you want the sys.modules case fast, which I will never be able to match compared to C code since that is pure execution with no I/O.

Sure you can: have a really fast Python VM. Constructive: if you can run this code under PyPy it'd be easy to just:

$ pypy -mtimeit import struct
$ pypy -mtimeit -s import importlib importlib.import_module('struct')

Or whatever the right API is.

Alex
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On 2/7/2012 4:51 PM, PJ Eby wrote: One thing I'm a bit worried about is repeated imports, especially ones that are inside frequently-called functions. In today's versions of Python, this is a performance win for command-line tool platform systems like Mercurial and PEAK, where you want to delay importing as long as possible, in case the code that needs the import is never called at all... but, if it *is* used, you may still need to use it a lot of times. When writing that kind of code, I usually just unconditionally import inside the function, because the C code check for an already-imported module is faster than the Python if statement I'd have to clutter up my otherwise-clean function with.

importlib could provide a parameterized decorator for functions that are the only consumers of an import. It could operate much like this:

def imps(mod):
    def makewrap(f):
        def wrapped(*args, **kwds):
            print('first/only call to wrapper')
            g = globals()
            g[mod] = __import__(mod)
            g[f.__name__] = f
            f(*args, **kwds)
        wrapped.__name__ = f.__name__
        return wrapped
    return makewrap

@imps('itertools')
def ic():
    print(itertools.count)

ic()
ic()

# first/only call to wrapper
# <class 'itertools.count'>
# <class 'itertools.count'>

-- Terry Jan Reedy
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Tue, Feb 7, 2012 at 5:24 PM, Brett Cannon br...@python.org wrote: On Tue, Feb 7, 2012 at 16:51, PJ Eby p...@telecommunity.com wrote: On Tue, Feb 7, 2012 at 3:07 PM, Brett Cannon br...@python.org wrote: So, if there is going to be some baseline performance target I need to hit to make people happy I would prefer to know what that (real-world) benchmark is and what the performance target is going to be on a non-debug build. And if people are not worried about the performance then I'm happy with that as well. =) One thing I'm a bit worried about is repeated imports, especially ones that are inside frequently-called functions. In today's versions of Python, this is a performance win for command-line tool platform systems like Mercurial and PEAK, where you want to delay importing as long as possible, in case the code that needs the import is never called at all... but, if it *is* used, you may still need to use it a lot of times. When writing that kind of code, I usually just unconditionally import inside the function, because the C code check for an already-imported module is faster than the Python if statement I'd have to clutter up my otherwise-clean function with. So, in addition to the things other people have mentioned as performance targets, I'd like to keep the slowdown factor low for this type of scenario as well. Specifically, the slowdown shouldn't be so much as to motivate lazy importers like Mercurial and PEAK to need to rewrite in-function imports to do the already-imported check ourselves. ;-) (Disclaimer: I haven't actually seen Mercurial's delayed/dynamic import code, so I can't say for 100% sure if they'd be affected the same way.) IOW you want the sys.modules case fast, which I will never be able to match compared to C code since that is pure execution with no I/O. Couldn't you just prefix the __import__ function with something like this: ... 
try:
    module = sys.modules[name]
except KeyError:
    ...  # slow code path

(Admittedly, the import lock is still a problem; initially I thought you could just skip it for this case, but the problem is that another thread could be in the middle of executing the module.)
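That prefix idea can be sketched as a thin wrapper in front of a Python-based `__import__`. The function name is hypothetical, and a real integration would also have to honor the import lock, relative imports, and submodule/fromlist semantics — this only shows the shape of the fast path:

```python
import builtins
import sys

# Stand-in for an importlib-based __import__; here we just wrap the builtin.
_full_import = builtins.__import__


def fast_import(name, globals=None, locals=None, fromlist=(), level=0):
    # Fast path: an absolute import of a top-level module that is already
    # loaded needs nothing but a sys.modules lookup.
    if level == 0 and not fromlist and "." not in name:
        try:
            return sys.modules[name]
        except KeyError:
            pass
    # Everything else (dotted names, fromlist, relative imports, first-time
    # imports) goes through the full machinery.
    return _full_import(name, globals, locals, fromlist, level)
```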
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Tue, Feb 7, 2012 at 6:40 PM, Terry Reedy tjre...@udel.edu wrote: importlib could provide a parameterized decorator for functions that are the only consumers of an import. It could operate much like this:

def imps(mod):
    def makewrap(f):
        def wrapped(*args, **kwds):
            print('first/only call to wrapper')
            g = globals()
            g[mod] = __import__(mod)
            g[f.__name__] = f
            f(*args, **kwds)
        wrapped.__name__ = f.__name__
        return wrapped
    return makewrap

@imps('itertools')
def ic():
    print(itertools.count)

ic()
ic()

# first/only call to wrapper
# <class 'itertools.count'>
# <class 'itertools.count'>

If I were going to rewrite code, I'd just use lazy imports (see http://pypi.python.org/pypi/Importing ). They're even faster than this approach (or using plain import statements), as they have zero per-call function call overhead. It's just that not everything I write can depend on Importing. Throw an equivalent into the stdlib, though, and I guess I wouldn't have to worry about dependencies...

(To be clearer; I'm talking about the http://peak.telecommunity.com/DevCenter/Importing#lazy-imports feature, which sticks a dummy module subclass instance into sys.modules, whose __getattribute__ does a reload() of the module, forcing the normal import process to run, after first changing the dummy object's type to something that doesn't have the __getattribute__ any more. This ensures that all accesses after the first one are at normal module attribute access speed. That, and the whenImported decorator from Importing would probably be of general stdlib usefulness too.)
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On 2/7/2012 9:35 PM, PJ Eby wrote: On Tue, Feb 7, 2012 at 6:40 PM, Terry Reedy tjre...@udel.edu wrote: importlib could provide a parameterized decorator for functions that are the only consumers of an import. It could operate much like this:

def imps(mod):
    def makewrap(f):
        def wrapped(*args, **kwds):
            print('first/only call to wrapper')
            g = globals()
            g[mod] = __import__(mod)
            g[f.__name__] = f
            f(*args, **kwds)
        wrapped.__name__ = f.__name__
        return wrapped
    return makewrap

@imps('itertools')
def ic():
    print(itertools.count)

ic()
ic()

# first/only call to wrapper
# <class 'itertools.count'>
# <class 'itertools.count'>

If I were going to rewrite code, I'd just use lazy imports (see http://pypi.python.org/pypi/Importing ). They're even faster than this approach (or using plain import statements), as they have zero per-call function call overhead.

My code above and Importing, as I understand it, both delay imports until needed by using a dummy object that gets replaced at first access. (Now that I am reminded, sys.modules is the better place for the dummy objects. I just wanted to show that there is a simple solution (though more specialized) even for existing code.) The cost of delay, which might mean never, is a bit of one-time extra overhead. Both have no extra overhead after the first call. Unless delayed importing is made standard, both require a bit of extra code somewhere.

It's just that not everything I write can depend on Importing. Throw an equivalent into the stdlib, though, and I guess I wouldn't have to worry about dependencies...

And that is what I think (agree?) should be done to counteract the likely slowdown from using importlib.
(To be clearer; I'm talking about the http://peak.telecommunity.com/DevCenter/Importing#lazy-imports feature, which sticks a dummy module subclass instance into sys.modules, whose __getattribute__ does a reload() of the module, forcing the normal import process to run, after first changing the dummy object's type to something that doesn't have the __getattribute__ any more. This ensures that all accesses after the first one are at normal module attribute access speed. That, and the whenImported decorator from Importing would probably be of general stdlib usefulness too.)

-- Terry Jan Reedy
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Wed, Feb 8, 2012 at 12:54 PM, Terry Reedy tjre...@udel.edu wrote: On 2/7/2012 9:35 PM, PJ Eby wrote: It's just that not everything I write can depend on Importing. Throw an equivalent into the stdlib, though, and I guess I wouldn't have to worry about dependencies... And that is what I think (agree?) should be done to counteract the likely slowdown from using importlib.

Yeah, this is one frequently reinvented wheel that could definitely do with a standard implementation. Christian Heimes made an initial attempt at such a thing years ago with PEP 369, but an importlib based __import__ would let the implementation largely be pure Python (with all the increase in power and flexibility that implies). I'm not sure such an addition would help much with the base interpreter start up time though - most of the modules we bring in are because we're actually using them for some reason.

The other thing that shouldn't be underrated here is the value in making the builtin import system PEP 302 compliant from a *documentation* perspective. I've made occasional attempts at fully documenting the import system over the years, and I always end up giving up because the combination of the pre-PEP 302 builtin mechanisms in import.c and the PEP 302 compliant mechanisms for things like zipimport just degenerate into a mess of special cases that are impossible to justify beyond "nobody got around to fixing this yet". The fact that we have an undocumented PEP 302 based reimplementation of imports squirrelled away in pkgutil to make pkgutil and runpy work is sheer insanity (replacing *that* with importlib might actually be a good first step towards full integration).

Cheers, Nick.

-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] requirements for moving __import__ over to importlib?
On Tue, Feb 7, 2012 at 8:47 PM, Nick Coghlan ncogh...@gmail.com wrote: On Wed, Feb 8, 2012 at 12:54 PM, Terry Reedy tjre...@udel.edu wrote: On 2/7/2012 9:35 PM, PJ Eby wrote: It's just that not everything I write can depend on Importing. Throw an equivalent into the stdlib, though, and I guess I wouldn't have to worry about dependencies... And that is what I think (agree?) should be done to counteract the likely slowdown from using importlib. Yeah, this is one frequently reinvented wheel that could definitely do with a standard implementation. Christian Heimes made an initial attempt at such a thing years ago with PEP 369, but an importlib based __import__ would let the implementation largely be pure Python (with all the increase in power and flexibility that implies). I'm not sure such an addition would help much with the base interpreter start up time though - most of the modules we bring in are because we're actually using them for some reason. The other thing that shouldn't be underrated here is the value in making the builtin import system PEP 302 compliant from a *documentation* perspective. I've made occasional attempts at fully documenting the import system over the years, and I always end up giving up because the combination of the pre-PEP 302 builtin mechanisms in import.c and the PEP 302 compliant mechanisms for things like zipimport just degenerate into a mess of special cases that are impossible to justify beyond nobody got around to fixing this yet. The fact that we have an undocumented PEP 302 based reimplementation of imports squirrelled away in pkgutil to make pkgutil and runpy work is sheer insanity (replacing *that* with importlib might actually be a good first step towards full integration).

+1 on all counts

-eric