On 6/11/07, Phillip J. Eby <[EMAIL PROTECTED]> wrote:
At 12:46 AM 6/12/2007 +0200, Giovanni Bajo wrote:
>Hi Phillip,
>
>I'm going to submit a PEP for Python 3000 (and possibly backported
>as an option off by default in Python 2). It's related to imports
>and how to make them faster. Given your expertise on the subject,
>I'd appreciate it if you could review my ideas. I briefly spoke of it
>with Alex Martelli a few days ago at PyCon Italia and he was not
>negative about it.
>
>Problems:
>
>- A single import causes many syscalls (.pyo, .pyc, .py, in both
>directory and .zip file).
>- The situation is getting worse and worse with the advent of
>easy_install, which produces many .pth files (longer sys.path).
>- Python startup time is slow, and a noticeable fraction of it is
>dominated by site.py-related stuff (a simple hello world run takes
>0.012s if run without -S, and 0.008s if run with -S).
>- Many people might not be interested in this, but others are really
>concerned. E.g.: again at PyCon Italia, I spoke with one of the
>leading Sugar programmers (OLPC) who told me that one of the biggest
>blockers right now is the Python startup time (applications on the
>latest OLPC prototype take 3-4 seconds to start up). He suggested
>that this was related to the large number of syscalls made for imports.
>
>
>Proposed solution:
>
>- A site cache is introduced. It's a dictionary mapping module names
>to absolute file paths.
>- When an import occurs, for each directory/zipfile we walk in
>sys.path, we read all directory entries, and update the site cache
>with all the Python modules found in that directory/zipfile.
>- If the filepath for a certain module is found in the site cache,
>the module is directly accessed. Otherwise, sys.path is walked.
>- The site cache can be cleared with sys.clear_site_cache(). This
>must be used after manual editing of sys.path (or could be done
>automatically by making sys.path a list subclass which notices each
>modification).
>- The site cache must be manually cleared if a Python file is added
>to a directory in sys.path after the application has started. This
>is a rare enough scenario that requiring an additional explicit call
>is acceptable.
>- If for whatever reason a filepath found in the site cache cannot
>be accessed (unmounted device, whatever) ImportError is raised.
>Again, this is something which is very rare and does not require
>much attention.

Here's a simpler solution, one that's easily testable using existing
Python versions. Create a subclass of pkgutil.ImpImporter (Python
>=2.5) that caches a listdir of its contents, and uses it to
immediately reject any find_module() requests for which matching data
is not in its cached listdir. Add this class to sys.path_hooks, and
see if it speeds things up.
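
For comparison, the site cache from the quoted proposal (a dictionary mapping module names to absolute file paths, filled from one directory listing per sys.path entry) might look roughly like the sketch below. The names build_site_cache and MODULE_SUFFIXES are hypothetical and not part of any proposal text; zip files, extension modules, and cache clearing are left out, so this illustrates only the lookup table.

```python
import os

# Suffixes treated as importable in this sketch (an assumption; a real
# implementation would consult the interpreter's own suffix lists).
MODULE_SUFFIXES = (".py", ".pyc")


def build_site_cache(paths):
    """Map module names to absolute file paths, using one listdir per directory."""
    cache = {}
    for entry in paths:
        base = os.path.abspath(entry or os.curdir)
        try:
            names = os.listdir(base)
        except OSError:
            # Missing directories and zip files are simply skipped here.
            continue
        # Sorting makes a package directory "mod" win over "mod.py",
        # and "mod.py" win over "mod.pyc", within one directory.
        for name in sorted(names):
            full = os.path.join(base, name)
            stem, ext = os.path.splitext(name)
            init = os.path.join(full, "__init__.py")
            if ext in MODULE_SUFFIXES and os.path.isfile(full):
                # setdefault keeps the first hit, so earlier path entries win.
                cache.setdefault(stem, full)
            elif os.path.isfile(init):
                cache.setdefault(name, init)
    return cache
```

Calling build_site_cache(sys.path) once would give the name-to-path table; a name missing from the table would fall back to the normal sys.path walk, as the proposal describes.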
I thought about this use case when writing importlib, as a way of
lowering the penalty of importing over NFS, and this is exactly how I
would do it as well (except I would use the code from importlib
instead of pkgutil =).

-Brett
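
A minimal sketch of the cached-listdir idea using importlib, as Brett suggests, could look like the following. It assumes a current Python 3, where path hooks return finders with find_spec() rather than find_module(), and where pkgutil.ImpImporter is no longer available. The class name ListdirCachingFinder and the install() helper are hypothetical. The stock importlib.machinery.FileFinder already keeps its own cache of directory contents, so this wrapper only illustrates the early-rejection technique and is not expected to be faster than the default.

```python
import importlib.machinery
import os
import sys


class ListdirCachingFinder:
    """Path entry finder that rejects names absent from a cached listdir."""

    def __init__(self, path):
        self.path = path or os.curdir
        if not os.path.isdir(self.path):
            # Raising ImportError tells the import system to try the next hook
            # (for example the zip importer).
            raise ImportError("only directories are handled", path=path)
        self._names = None  # filled lazily on the first lookup
        m = importlib.machinery
        self._finder = m.FileFinder(
            self.path,
            (m.SourceFileLoader, m.SOURCE_SUFFIXES),
            (m.SourcelessFileLoader, m.BYTECODE_SUFFIXES),
            (m.ExtensionFileLoader, m.EXTENSION_SUFFIXES),
        )

    def _listing(self):
        if self._names is None:
            # Keep only the part before the first dot: "foo.py" -> "foo".
            self._names = {n.split(".", 1)[0] for n in os.listdir(self.path)}
        return self._names

    def find_spec(self, fullname, target=None):
        tail = fullname.rpartition(".")[2]
        if tail not in self._listing():
            return None  # immediate rejection, no further filesystem access
        return self._finder.find_spec(fullname, target)

    def invalidate_caches(self):
        # Called via importlib.invalidate_caches() after files are added.
        self._names = None
        self._finder.invalidate_caches()


def install():
    """Register the hook ahead of the defaults and drop stale finders."""
    sys.path_hooks.insert(0, ListdirCachingFinder)
    sys.path_importer_cache.clear()
```

As with the site cache in the original proposal, a file added to a directory after the listing was cached is not seen until the cache is cleared, here through importlib.invalidate_caches().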
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000