On 6/12/07, Giovanni Bajo <[EMAIL PROTECTED]> wrote:
On 6/12/2007 6:30 PM, Phillip J. Eby wrote:
>> import imp, os, sys
>> from pkgutil import ImpImporter
>>
>> suffixes = set(ext for ext,mode,typ in imp.get_suffixes())
>>
>> class CachedImporter(ImpImporter):
>>     def __init__(self, path):
>>         if not os.path.isdir(path):
>>             raise ImportError("Not an existing directory")
>>         super(CachedImporter, self).__init__(path)
>>         self.refresh()
>>
>>     def refresh(self):
>>         self.cache = set()
>>         for fname in os.listdir(self.path):
>>             base, ext = os.path.splitext(fname)
>>             if ext in suffixes and '.' not in base:
>>                 self.cache.add(base)
>>
>>     def find_module(self, fullname, path=None):
>>         if fullname.split(".")[-1] not in self.cache:
>>             return None  # no need to check further
>>         return super(CachedImporter, self).find_module(fullname, path)
>>
>> sys.path_hooks.append(CachedImporter)
>
> After a bit of reflection, it seems the refresh() method needs to be a
> bit different:
>
>     def refresh(self):
>         cache = set()
>         for fname in os.listdir(self.path):
>             base, ext = os.path.splitext(fname)
>             if not ext or (ext in suffixes and '.' not in base):
>                 cache.add(base)
>         self.cache = cache
>
> This version fixes two problems: first, a race condition could occur if
> you called refresh() while an import was taking place in another
> thread. This version fixes that by only updating self.cache after the
> new cache is completely built.
>
> Second, the old version didn't handle packages at all. This version
> handles them by treating extension-less filenames as possible package
> directories. I originally thought this should check for a subdirectory
> and __init__, but this could get very expensive if a sys.path directory
> has a lot of subdirectories (whether or not they're packages). Having
> false positives in the cache (i.e. names that can't actually be
> imported) could slow things down a bit, but *only* if those names match
> something you're trying to import. Thus, it seems like a reasonable
> trade-off versus needing to scan every subdirectory at startup or even
> to check whether all those names *are* subdirectories.

There are a couple of other things I'll fix as soon as I try it. The first is that I'd call refresh() lazily on the first find_module(), because I don't want to listdir() directories on sys.path that will never be accessed.

The idea of using sys.path_hooks is very clever (I hadn't thought of it... because I didn't know about path_hooks in the first place! It appears to be undocumented and sparsely indexed by Google as well), and it will probably help me a lot in my task of fixing this problem in the 2.x series.
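The lazy refresh() described above could look roughly like the sketch below. It is not code from this thread: LazyDirCache, might_contain() and demo() are hypothetical names, and the class is deliberately kept separate from ImpImporter so that it runs on its own. It also assumes importlib.machinery.all_suffixes() as a stand-in for imp.get_suffixes(), since the imp module is not available in newer Python versions.

```python
import os
import tempfile
from importlib.machinery import all_suffixes


class LazyDirCache:
    """Hypothetical helper: remembers which names a directory might provide."""

    def __init__(self, path, suffixes=None):
        if not os.path.isdir(path):
            raise ImportError("Not an existing directory")
        self.path = path
        # Assumption: all_suffixes() plays the role of imp.get_suffixes().
        self.suffixes = set(all_suffixes() if suffixes is None else suffixes)
        self.cache = None  # nothing is listed until the first lookup

    def refresh(self):
        # Build the new set locally and publish it in a single assignment,
        # so a concurrent lookup never sees a half-built cache.
        cache = set()
        for fname in os.listdir(self.path):
            base, ext = os.path.splitext(fname)
            # Extension-less names are kept as possible package directories.
            if not ext or (ext in self.suffixes and "." not in base):
                cache.add(base)
        self.cache = cache

    def might_contain(self, fullname):
        # Lazy part: the directory is only listed when first asked about.
        if self.cache is None:
            self.refresh()
        return fullname.split(".")[-1] in self.cache


def demo():
    """Create a throwaway directory and query the cache."""
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "spam.py"), "w").close()
        os.mkdir(os.path.join(d, "eggs"))
        c = LazyDirCache(d)
        built_early = c.cache is not None
        return (built_early, c.might_contain("spam"),
                c.might_contain("pkg.eggs"), c.might_contain("ham"))


if __name__ == "__main__":
    print(demo())
```

A find_module() override would return None when might_contain() is false and defer to the parent importer otherwise. If two threads hit the first lookup at the same time, both may list the directory; that is duplicated work rather than a correctness problem, because each assigns a complete set.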
PEP 302 documents all of this, but unfortunately it never made it into the official docs. I also have some pseudocode of how import (roughly) works at sandbox/trunk/import_in_py/pseudocode.py .

-Brett
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000