Eric Snow added the comment:

For interpreter startup, stats are not involved for builtin and frozen 
modules[1].  They are tied to imports that involve traversing sys.path (a.k.a. 
PathFinder).  Most stats happen in FileFinder.find_loader.  The remainder are 
for source (.py) files (a.k.a. SourceFileLoader).
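
One way to check which category a given module falls into is to look at the origin recorded on its import spec. This is a minimal sketch, assuming a Python 3 that provides importlib.util.find_spec; module_kind is a hypothetical helper name, not part of importlib:

```python
import importlib.util


def module_kind(name):
    """Return 'built-in', 'frozen', or 'path-based' for an importable module."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ImportError("no module named {!r}".format(name))
    # Builtin and frozen modules are labeled by their importers; anything
    # else was located by walking a path, which is where the stats come from.
    if spec.origin in ("built-in", "frozen"):
        return spec.origin
    return "path-based"


for name in ("sys", "json"):
    print(name, module_kind(name))
```

Which modules report as frozen depends on the Python version and build, so the output for anything other than true builtins may vary.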

Here's a rough sketch of what typically happens currently during the import of 
a path-based module[2], as related to stats (and other FS access):

(lines with FS access start with *)

def load_module(fullname):
    suffixes = ['.cpython-34m.so', '.abi3.so', '.so', '.py', '.pyc']
    tailname = fullname.rpartition('.')[2]
    for entry in sys.path:
*       mtime = os.stat(entry).st_mtime
        if mtime != cached_mtime:
*           cached_listdir = os.listdir(entry)
        if tailname in cached_listdir:
            basename = entry/tailname
*           if os.stat(basename).st_mode implies directory:  # superfluous?
                # package?
                for suffix in suffixes:
                    full_path = basename/__init__ + suffix
*                   if os.stat(full_path).st_mode implies file:
                        if is_extension:
*                           <dlopen>(full_path)
                        elif is_sourceless:
*                           open(full_path).read()
                        else:
                            load_from_source(full_path)
                        return
        # ...non-package module?
        for suffix in suffixes:
            full_path = entry/tailname + suffix
            if tailname + suffix in cached_listdir:
*               if os.stat(full_path).st_mode implies file:  # superfluous?
                    if is_extension:
*                       <dlopen>(full_path)
                    elif is_sourceless:
*                       open(full_path).read()
                    else:
                        load_from_source(full_path)

def load_from_source(sourcepath):
*   st = os.stat(sourcepath)
    if st:
*       open(bytecodepath).read()
    else:
*       open(sourcepath).read()
*       os.stat(sourcepath).st_mode
        for parent in ancestor_dirs(sourcepath):
*           os.stat(parent).st_mode  ->  missing_parents
        for parent in missing_parents:
*           os.mkdir(parent)
*       open(tempname).write()
*       os.replace(tempname, bytecodepath)


Obviously there are some unix-isms in there.  Windows ends up not that 
different though.
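
The directory check at the top of the loop in load_module (one stat per path entry, plus a listdir only when the mtime has changed) can be tried in isolation. What follows is a simplified, runnable sketch of that idea and not the actual FileFinder code; DirCache and demo are hypothetical names introduced only for illustration:

```python
import os
import tempfile


class DirCache:
    """Hypothetical stand-in for the per-directory listing cache."""

    def __init__(self, path):
        self.path = path
        self.mtime = None
        self.entries = frozenset()
        self.stats = 0      # number of os.stat calls made
        self.listdirs = 0   # number of os.listdir calls made

    def contains(self, filename):
        # Always one stat on the directory itself.
        self.stats += 1
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self.mtime:
            # Cache is stale (or cold): pay for one listdir.
            self.listdirs += 1
            self.entries = frozenset(os.listdir(self.path))
            self.mtime = mtime
        return filename in self.entries


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "spam.py"), "w").close()
        cache = DirCache(tmp)
        found = cache.contains("spam.py")    # cold cache: stat + listdir
        missing = cache.contains("eggs.py")  # warm cache: stat only
        return found, missing, cache.stats, cache.listdirs


print(demo())
```

The second lookup costs a single stat because the directory's mtime has not changed, which matches the "not found: 1 stat" case in the counts.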


stat/FS count
-------------

load_module (*per path entry*):
    (add 1 listdir to each if the cache is stale)
    not found: 1 stat
    non-package dir: 7 (num_suffixes + 2 stats)

    package (best): 4/5-9+ (3 stats, 1 read or load_from_source)
    package (worst): 8/9-13+ (num_suffixes + 2 stats, 1 read or
        load_from_source)
    non-package module (best): 3/4-8+ (2 stats, 1 read or load_from_source)
    non-package module (worst): 7/8-12+ (num_suffixes + 1 stats, 1 read or
        load_from_source)
    non-package module + dir (best): 10/11-15+ (num_suffixes + 4 stats, 1
        read or load_from_source)
    non-package module + dir (worst): 14/15-19+ (num_suffixes * 2 + 3 stats,
        1 read or load_from_source)

load_from_source:
    cached: 2 (1 stat, 1 read)
    uncached, no parents: 4 (2 stats, 1 write, 1 replace)
    uncached, no missing parents: 5+ (num_parents + 2 stats, 1 write, 1 replace)
    uncached, missing parents: 6+ (num_parents + 2 stats, num_missing mkdirs, 1 
write, 1 replace)
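
The load_module figures follow from the number of suffixes. As a cross-check, here is a small sketch that recomputes the stat portion of each case, assuming the five suffixes from the sketch and reading the last "non-package module + dir" row as the worst case; load_module_stats is a hypothetical helper, and the final read or load_from_source is left out:

```python
def load_module_stats(num_suffixes=5):
    """Stats per path entry, excluding the final read / load_from_source."""
    return {
        "not found": 1,
        "non-package dir": num_suffixes + 2,
        "package (best)": 3,
        "package (worst)": num_suffixes + 2,
        "non-package module (best)": 2,
        "non-package module (worst)": num_suffixes + 1,
        "non-package module + dir (best)": num_suffixes + 4,
        "non-package module + dir (worst)": num_suffixes * 2 + 3,
    }


for case, stats in load_module_stats().items():
    # Adding 1 for the sourceless read gives the first number in each row.
    print("{:35} {:2} stats, {:2} with 1 read".format(case, stats, stats + 1))
```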


Highlights:

* the common case is not fast (for the sake of the slight possibility that 
files may change between imports)--not as much an issue during interpreter 
startup.
* up to 5 different suffixes with a separate stat for each (with extension 
module suffixes tried first).
* the size and ordering of sys.path has a decided impact on the number of stats.
* if a module is cached, a lot less FS access happens.
* the more nested a module, the more FS access happens.
* namespace packages don't have much impact on performance.
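
To put a rough number on the sys.path point: every entry searched before the hit costs one stat ("not found"), so the position of the matching entry adds linearly to the total. A back-of-the-envelope sketch, assuming fresh directory caches and the best-case non-package module hit (2 stats) from the counts; total_stats is a hypothetical helper:

```python
def total_stats(position, hit_stats=2, miss_stats=1):
    """Stats spent locating a module that lives in sys.path[position]."""
    # One "not found" stat per earlier entry, then the cost of the hit.
    return position * miss_stats + hit_stats


for position in (0, 3, 10):
    print(position, total_stats(position))
```

This ignores listdir calls for stale caches and the cost of actually loading the module, so it is a lower bound and not a measurement.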

Possible improvements:

* provide an internal mechanism to turn on/off caching all stats (don't worry 
about staleness) and maybe expose it via a context manager/API. (not unlike 
what Christian put in his patch.)
* at least do some temporally local caching where the risk of staleness is 
particularly small.
* Move .py ahead of extension modules (or just behind .cpython-34m.so)?
* non-packages are more common than packages (?) so look for those first (hard 
to make effective without breaking key import semantics).
* remove 2 possibly superfluous stats?
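
For the first item, the context-manager idea might look roughly like the following. This is only an illustration of the shape of such an API, not Christian's patch and not anything importlib provides; cached_stats and demo are hypothetical names, and the cache deliberately ignores staleness for the duration of the block:

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def cached_stats():
    """Yield a stat function that remembers results until the block exits."""
    cache = {}
    real_calls = []  # paths that actually reached os.stat

    def stat(path):
        if path not in cache:
            real_calls.append(path)
            try:
                cache[path] = os.stat(path)
            except OSError:
                cache[path] = None  # remember misses too
        return cache[path]

    stat.real_calls = real_calls
    try:
        yield stat
    finally:
        cache.clear()  # nothing survives the block, so nothing goes stale later


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        with cached_stats() as stat:
            for _ in range(3):
                stat(tmp)
                stat(os.path.join(tmp, "missing.py"))
            return len(stat.real_calls)


print(demo())
```

Six lookups turn into two real stat calls here. The price is that a file created or removed inside the block goes unnoticed until the block exits.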


[1] Maybe we should freeze the stdlib. <0.5 wink>
[2] Importing a module usually involves importing the module's parent and its 
parent and so forth.  Each of those incurs the same stat hits all over again 
(though usually packages have only 1 path entry to traverse).  The stdlib is 
pretty flat (particularly among modules involved during startup) so this is 
less of an issue for this ticket.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue19216>
_______________________________________