On 6/11/07, Phillip J. Eby <[EMAIL PROTECTED]> wrote:

At 12:46 AM 6/12/2007 +0200, Giovanni Bajo wrote:
>Hi Phillip,
>
>I'm going to submit a PEP for Python 3000 (and possibly backported
>as an option off by default in Python 2). It's related to imports
>and how to make them faster. Given your expertise on the subject,
>I'd appreciate it if you could review my ideas. I spoke briefly about it
>with Alex Martelli a few days ago at PyCon Italia and he was not
>negative about it.
>
>Problems:
>
>- A single import causes many syscalls (.pyo, .pyc, .py, in both
>directory and .zip file).
>- Situation is getting worse and worse with the advent of
>easy_install which produces many .pth files (longer sys.path).
>- Python startup time is slow, and a noticeable fraction of it is
>spent on site.py-related work (a simple hello world takes
>0.012s when run without -S, and 0.008s when run with -S).
>- Many people might not be interested in this, but others are really
>concerned. E.g., again at PyCon Italia, I spoke with one of the
>leading Sugar programmers (OLPC), who told me that one of the biggest
>blockers right now is Python startup time (applications on the latest
>OLPC prototype take 3-4 seconds to start up). He suggested that this
>was related to the large number of syscalls made for imports.
>
>
>Proposed solution:
>
>- A site cache is introduced. It's a dictionary mapping module names
>to absolute file paths.
>- When an import occurs, for each directory/zipfile we walk in
>sys.path, we read all directory entries and update the site cache
>with all the Python modules found in it.
>- If the filepath for a certain module is found in the site cache,
>the module is directly accessed. Otherwise, sys.path is walked.
>- The site cache can be cleared with sys.clear_site_cache(). This
>must be used after manual editing of sys.path (or could be done
>automatically by making sys.path a list subclass which notices each
>modification).
>- The site cache must be manually cleared if a Python file is added
>to a directory in sys.path after the application has started. This
>is a rare-enough scenario to require an additional explicit call.
>- If for whatever reason a filepath found in the site cache cannot
>be accessed (unmounted device, etc.), ImportError is raised.
>Again, this is something which is very rare and does not require
>much attention.
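
The site cache described above could be sketched roughly as follows. This is
only an illustration of the idea, not code from the proposal: the function
name build_site_cache is hypothetical, zip files are skipped, and only .py
modules and packages with an __init__.py are recorded.

```python
import os


def build_site_cache(search_path):
    """Map module names to absolute file paths (hypothetical helper).

    Earlier entries in search_path win, mirroring sys.path order.
    Zip files and unreadable directories are skipped in this sketch.
    """
    cache = {}
    for directory in search_path:
        try:
            # Sorted so that a package "foo" is seen before a module "foo.py".
            entries = sorted(os.listdir(directory or os.curdir))
        except OSError:
            continue  # not a readable directory (e.g. a zip file)
        for entry in entries:
            full = os.path.abspath(os.path.join(directory, entry))
            name, ext = os.path.splitext(entry)
            init = os.path.join(full, "__init__.py")
            if ext == ".py" and name.isidentifier():
                cache.setdefault(name, full)
            elif os.path.isfile(init):
                cache.setdefault(entry, init)
    return cache
```

A lookup is then a single dictionary access, and clearing the cache amounts
to rebuilding it after sys.path or the directories change.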

Here's a simpler solution, one that's easily testable using existing
Python versions.  Create a subclass of pkgutil.ImpImporter
(Python >=2.5) that caches a listdir of its contents, and uses it to
immediately reject any find_module() requests for which matching data
is not in its cached listdir.  Add this class to sys.path_hooks, and
see if it speeds things up.
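
A rough sketch of that experiment, written so that it runs on current
Python 3: it wraps importlib.machinery.FileFinder rather than subclassing
pkgutil.ImpImporter, and the class name ListdirCachingFinder and the
install() helper are made up for illustration. Note that FileFinder already
keeps a similar directory cache of its own, so this mainly shows the shape
of the idea.

```python
import importlib.machinery
import os
import sys


class ListdirCachingFinder:
    """Path entry finder that rejects names missing from a cached listdir."""

    def __init__(self, path):
        if not os.path.isdir(path):
            # Let other hooks (e.g. zipimport) handle non-directories.
            raise ImportError("not a directory", path=path)
        self.path = path
        self._names = None  # filled lazily on the first lookup
        m = importlib.machinery
        self._delegate = m.FileFinder(
            path,
            (m.ExtensionFileLoader, m.EXTENSION_SUFFIXES),
            (m.SourceFileLoader, m.SOURCE_SUFFIXES),
            (m.SourcelessFileLoader, m.BYTECODE_SUFFIXES),
        )

    def _candidates(self):
        if self._names is None:
            # Keep the part before the first dot: "foo.py" -> "foo".
            self._names = {e.split(".", 1)[0] for e in os.listdir(self.path)}
        return self._names

    def find_spec(self, fullname, target=None):
        tail = fullname.rpartition(".")[2]
        if tail not in self._candidates():
            return None  # fast rejection, no further filesystem access
        return self._delegate.find_spec(fullname, target)

    def invalidate_caches(self):
        self._names = None
        self._delegate.invalidate_caches()


def install():
    """Put the hook first and drop finders cached for existing entries."""
    sys.path_hooks.insert(0, ListdirCachingFinder)
    sys.path_importer_cache.clear()
```

As with the site cache, a file added to a directory after the listing was
cached is not seen until invalidate_caches() is called (for example via
importlib.invalidate_caches() once the hook is installed).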



I thought about this use case when writing importlib, as a way to lower the
penalty of importing over NFS, and this is exactly how I would do it as well
(except I would use the code from importlib instead of pkgutil  =).

-Brett
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000