[issue19216] stat cache for import bootstrap

2019-05-14 Thread STINNER Victor


STINNER Victor added the comment:

The benefit of avoiding stat() calls does not seem to be obvious to everybody. 
Moreover, importlib now implements a "path cache", so I am closing the issue.

The most efficient solution is to pack all your modules and the Python stdlib 
into a ZIP file: everything is done in memory, no more filesystem access.
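
A minimal sketch of the ZIP approach, assuming a throwaway module named hello_zip and an archive named bundle.zip (both hypothetical). It relies only on the standard zipimport support, which serves modules from any archive listed on sys.path; how much filesystem access this saves in practice would still need measuring.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build a small archive containing one module (hypothetical name: hello_zip).
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "bundle.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("hello_zip.py", "GREETING = 'hello from the zip'\n")

# Putting the archive itself on sys.path lets zipimport serve the module.
sys.path.insert(0, archive)
importlib.invalidate_caches()
module = importlib.import_module("hello_zip")

print(module.GREETING)
print(module.__file__)  # a path that points inside bundle.zip
```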

--
resolution:  -> rejected
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19216] stat cache for import bootstrap

2013-12-22 Thread Antoine Pitrou

Changes by Antoine Pitrou:


--
versions: +Python 3.5 -Python 3.4




[issue19216] stat cache for import bootstrap

2013-10-19 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> > Also, have you read what I've just posted?
> 
> About the fuzziness of when startup is finished?  As implied above,
> I'd say at the end of Py_Initialize().

You only have imported a handful of modules by then. Real-world
applications will import many more afterwards.
Here's a little experiment (done with a system install of Python 2.7):

$ python -v -c pass 2>&1 | grep "^import" | wc -l
33
$ python -v `which hg` 2>&1 | grep "^import" | wc -l
117

Note that Mercurial has a lazy importer in order to improve startup
time, otherwise the number would be higher yet.
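
A rough Python equivalent of the experiment above, as a sketch only: it counts the entries of sys.modules in a fresh interpreter instead of counting "import" lines in -v output, so the numbers are not directly comparable with the ones quoted. The helper name count_loaded_modules is made up for this example.

```python
import subprocess
import sys


def count_loaded_modules(code):
    """Run `code` in a fresh interpreter and report len(sys.modules) afterwards."""
    probe = code + "\nimport sys\nprint(len(sys.modules))"
    result = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip().splitlines()[-1])


bare = count_loaded_modules("pass")
with_json = count_loaded_modules("import json")
print("bare interpreter: ", bare)
print("after import json:", with_json)
```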




[issue19216] stat cache for import bootstrap

2013-10-19 Thread Eric Snow

Eric Snow added the comment:

> I don't really understand the algorithm you're proposing.

In importlib._bootstrap:

We have some global like "_CHECK_STAT=True".  FileFinder would use it to decide 
on using stat checks or not.

In Python/pythonrun.c:

At the end of import_init(), we set importlib._bootstrap._CHECK_STAT to False.  
Then at the end of _Py_InitializeEx_Private() we set it back to True.

(As an alternative, we could always skip stat checking for just the standard 
library.)

> Also, have you read what I've just posted?

About the fuzziness of when startup is finished?  As implied above, I'd say at 
the end of Py_Initialize().




[issue19216] stat cache for import bootstrap

2013-10-19 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> Would it be feasible to have an explicit (but private?) flag in
> importlib that controls whether stat checking (or even all FS checking)
> is enabled, defaulting to True?  runpy could set it to False after
> initializing importlib and then back to True when startup is done.

I don't really understand the algorithm you're proposing. Also, have you
read what I've just posted?




[issue19216] stat cache for import bootstrap

2013-10-19 Thread Eric Snow

Eric Snow added the comment:

Would it be feasible to have an explicit (but private?) flag in importlib 
that controls whether stat checking (or even all FS checking) is enabled, 
defaulting to True?  runpy could set it to False after initializing importlib 
and then back to True when startup is done.

If that was useful for more than just startup, we could also add a 
contextmanager for it in importlib.util.
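
A sketch of what such a context manager might look like. ImportFlags and stat_checks_disabled are invented names for illustration; nothing like them exists in importlib.util. The point of the try/finally is that a failed import inside the block cannot leave checking switched off.

```python
import contextlib


class ImportFlags:
    """Hypothetical holder for the private flag; not a real importlib object."""
    check_stat = True


@contextlib.contextmanager
def stat_checks_disabled(flags=ImportFlags):
    previous = flags.check_stat
    flags.check_stat = False
    try:
        yield flags
    finally:
        # Restore the earlier value even if the body raised.
        flags.check_stat = previous


with stat_checks_disabled():
    inside = ImportFlags.check_stat
after = ImportFlags.check_stat

try:
    with stat_checks_disabled():
        raise ImportError("simulated failure during startup")
except ImportError:
    after_error = ImportFlags.check_stat

print(inside, after, after_error)
```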




[issue19216] stat cache for import bootstrap

2013-10-19 Thread Arfrever Frehtes Taifersar Arahesis

Changes by Arfrever Frehtes Taifersar Arahesis:


--
nosy: +Arfrever




[issue19216] stat cache for import bootstrap

2013-10-19 Thread Antoine Pitrou

Antoine Pitrou added the comment:

The real problem here is that the definition of "bootstrap" or "startup" is 
fuzzy. How do you decide when you stop caching?
The only workable approach IMO is to adopt a time-based heuristic, which I did 
in issue14067.
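
To illustrate what a time-based heuristic could look like in general, here is a small sketch. It is not the issue14067 patch: TimedStatCache, its ttl parameter and the injectable clock are all assumptions made for this example.

```python
import os
import time


class TimedStatCache:
    """Reuse a stat result for `ttl` seconds; hypothetical helper."""

    def __init__(self, ttl=1.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._cache = {}      # path -> (timestamp, os.stat_result)
        self.real_stats = 0   # how many times os.stat() was really called

    def stat(self, path):
        now = self.clock()
        hit = self._cache.get(path)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]     # still fresh enough, skip the system call
        result = os.stat(path)
        self.real_stats += 1
        self._cache[path] = (now, result)
        return result


# A fake clock keeps the demonstration deterministic.
fake_now = [0.0]
cache = TimedStatCache(ttl=1.0, clock=lambda: fake_now[0])
cache.stat(os.curdir)
cache.stat(os.curdir)         # served from the cache
fake_now[0] = 5.0
cache.stat(os.curdir)         # entry expired, stat again
print(cache.real_stats)
```

The trade-off is the one discussed in this thread: within the ttl window a changed file goes unnoticed, so the window has to stay short.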




[issue19216] stat cache for import bootstrap

2013-10-18 Thread Brett Cannon

Changes by Brett Cannon:


--
assignee: brett.cannon -> 




[issue19216] stat cache for import bootstrap

2013-10-10 Thread Eric Snow

Eric Snow added the comment:

I forgot to mention that optimizing the default composition of sys.path (from 
site) could help speed things up, though it might already be optimized in that 
regard.

I also forgot to mention the idea of zipping up the stdlib.

Sorry for the sidetrack.  Now, back to the stat discussion...




[issue19216] stat cache for import bootstrap

2013-10-10 Thread Eric Snow

Eric Snow added the comment:

I realized those two stats are not superfluous in the case that a directory 
name has a .py suffix or a file doesn't have any suffix.  However, I expect 
that's pretty uncommon.

Worst case, these cases cost 2 stats per path entry.  In practice they cost 
nothing due to the dir caching we already do.




[issue19216] stat cache for import bootstrap

2013-10-10 Thread Brett Cannon

Brett Cannon added the comment:

So the 2 stat calls in the general case are superfluous; it's just a question 
of whether they make any performance difference. It turns out that, at least on 
my MacBook, there is no performance difference, so removing them is not worth 
the cost of breaking semantics: http://bugs.python.org/issue18810 .

As for completely turning off stat calls during interpreter startup, that would 
definitely buy us something, but the question is how much and how do we make it 
work reliably?




[issue19216] stat cache for import bootstrap

2013-10-10 Thread Eric Snow

Eric Snow added the comment:

For interpreter startup, stats are not involved for builtin and frozen 
modules[1].  They are tied to imports that involve traversing sys.path (a.k.a. 
PathFinder).  Most stats happen in FileFinder.find_loader.  The remainder are 
for source (.py) files (a.k.a. SourceFileLoader).

Here's a rough sketch of what typically happens currently during the import of 
a path-based module[2], as related to stats (and other FS access):

(lines with FS access start with *)

def load_module(fullname):
    suffixes = ['.cpython-34m.so', '.abi3.so', '.so', '.py', '.pyc']
    tailname = fullname.rpartition('.')[2]
    for entry in sys.path:
*       mtime = os.stat(entry).st_mtime
        if mtime != cached_mtime:
*           cached_listdir = os.listdir(entry)
        if tailname in cached_listdir:
            basename = entry/tailname
*           if os.stat(basename).st_mode implies directory:  # superfluous?
                # package?
                for suffix in suffixes:
                    full_path = basename + suffix
*                   if os.stat(full_path).st_mode implies file:
                        if is_extension:
*                           (full_path)
                        elif is_sourceless:
*                           open(full_path).read()
                        else:
                            load_from_source(full_path)
                        return
        # ...non-package module?
        for suffix in suffixes:
            full_path = entry/tailname + suffix
            if tailname + suffix in cached_listdir:
*               if os.stat(full_path).st_mode implies file:  # superfluous?
                    if is_extension:
*                       (full_path)
                    elif is_sourceless:
*                       open(full_path).read()
                    else:
                        load_from_source(full_path)

def load_from_source(sourcepath):
*   st = os.stat(sourcepath)
    if st:
*       open(bytecodepath).read()
    else:
*       open(sourcepath).read()
*       os.stat(sourcepath).st_mode
        for parent in ancestor_dirs(sourcepath):
*           os.stat(parent).st_mode  ->  missing_parents
        for parent in missing_parents:
*           os.mkdir(parent)
*       open(tempname).write()
*       os.replace(tempname, bytecodepath)


Obviously there are some unix-isms in there.  Windows ends up not that 
different though.
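
The per-entry directory caching in the sketch above (one stat of the path entry per lookup, one listdir only when the mtime changed) can be made runnable as a toy. CountingDirCache is a made-up class for illustration and is not how importlib structures its finder; it only counts the filesystem operations.

```python
import os
import tempfile


class CountingDirCache:
    """Toy version of the per-entry mtime/listdir caching sketched above."""

    def __init__(self, entry):
        self.entry = entry
        self.cached_mtime = None
        self.cached_listdir = set()
        self.stats = 0
        self.listdirs = 0

    def contains(self, filename):
        # Every lookup pays for one stat of the path entry.
        mtime = os.stat(self.entry).st_mtime_ns
        self.stats += 1
        if mtime != self.cached_mtime:
            # The listing is refreshed only when the directory changed.
            self.cached_listdir = set(os.listdir(self.entry))
            self.listdirs += 1
            self.cached_mtime = mtime
        return filename in self.cached_listdir


entry = tempfile.mkdtemp()
with open(os.path.join(entry, "spam.py"), "w") as f:
    f.write("x = 1\n")

cache = CountingDirCache(entry)
found_first = cache.contains("spam.py")
found_second = cache.contains("spam.py")
missing = cache.contains("eggs.py")
print(cache.stats, cache.listdirs)
```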


stat/FS count
-------------

load_module (*per path entry*):
  (add 1 listdir to each if the cache is stale)
  not found: 1 stat
  non-package dir: 7 (num_suffixes + 2 stats)

  package (best): 4/5-9+ (3 stats, 1 read or load_from_source)
  package (worst): 8/9-13+ (num_suffixes + 2 stats, 1 read or load_from_source)
  non-package module (best): 3/4-8+ (2 stats, 1 read or load_from_source)
  non-package module (worst): 7/8-12+ (num_suffixes + 1 stats, 1 read or load_from_source)
  non-package module + dir (best): 10/11-15+ (num_suffixes + 4 stats, 1 read or load_from_source)
  non-package module + dir (worst): 14/15-19+ (num_suffixes * 2 + 3 stats, 1 read or load_from_source)

load_from_source:
  cached: 2 (1 stat, 1 read)
  uncached, no parents: 4 (2 stats, 1 write, 1 replace)
  uncached, no missing parents: 5+ (num_parents + 2 stats, 1 write, 1 replace)
  uncached, missing parents: 6+ (num_parents + 2 stats, num_missing mkdirs, 1 write, 1 replace)
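
The counts above can be re-derived with a few lines of arithmetic. This assumes num_suffixes = 5 (the suffix list in the sketch) and reads "X/Y-Z+" as: X with a plain read, Y with the cheapest load_from_source (2), Z with the most expensive one listed (6+). That reading is an interpretation, not something stated in the message.

```python
num_suffixes = 5  # '.cpython-34m.so', '.abi3.so', '.so', '.py', '.pyc'

# load_from_source costs between 2 (cached) and 6+ (uncached, missing parents).
LFS_MIN, LFS_MAX = 2, 6


def cost(stats):
    """Return (stats + 1 read, stats + cheapest, stats + priciest load_from_source)."""
    return stats + 1, stats + LFS_MIN, stats + LFS_MAX


table = {
    "non-package dir": num_suffixes + 2,
    "package (best)": cost(3),
    "package (worst)": cost(num_suffixes + 2),
    "non-package module (best)": cost(2),
    "non-package module (worst)": cost(num_suffixes + 1),
    "non-package module + dir (best)": cost(num_suffixes + 4),
    "non-package module + dir (worst)": cost(num_suffixes * 2 + 3),
}
for case, value in table.items():
    print(f"{case}: {value}")
```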


Highlights:

* the common case is not fast (for the sake of the slight possibility that 
  files may change between imports), and that possibility is less of an issue 
  during interpreter startup.
* up to 5 different suffixes with a separate stat for each (with extension 
  module suffixes tried first).
* the size and ordering of sys.path has a decided impact on the number of stats.
* if a module is cached, a lot less FS access happens.
* the more nested a module, the more FS accesses happen.
* namespace packages don't have much impact on performance.
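
The suffixes a given interpreter recognizes can be inspected through importlib.machinery. Note that all_suffixes() simply concatenates the lists; it is not claimed here to reflect the order in which the finder probes them, and the exact extension suffixes vary by platform and build.

```python
import importlib.machinery as machinery

print("extension:", machinery.EXTENSION_SUFFIXES)
print("source:   ", machinery.SOURCE_SUFFIXES)
print("bytecode: ", machinery.BYTECODE_SUFFIXES)
print("all:      ", machinery.all_suffixes())
```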

Possible improvements:

* provide an internal mechanism to turn on/off caching all stats (don't worry 
about staleness) and maybe expose it via a context manager/API. (not unlike 
what Christian put in his patch.)
* at least do some temporally local caching where the risk of staleness is 
particularly small.
* Move .py ahead of extension modules (or just behind .cpython-34m.so)?
* non-packages are more common than packages (?) so look for those first (hard 
to make effective without breaking key import semantics).
* remove 2 possibly superfluous stats?


[1] Maybe we should freeze the stdlib. <0.5 wink>
[2] importing a module usually involves importing the module's parent and its 
parent and so forth.  Each of those incurs the same stat hits all over again 
(though usually packages have only 1 path entry to traverse).  The stdlib is 
pretty flat (particularly among modules involved during startup) so this is 
less of an issue for this ticket.



[issue19216] stat cache for import bootstrap

2013-10-10 Thread Eric Snow

Eric Snow added the comment:

With ModuleSpec (PEP 451), the finder creates the spec object (where it stores 
the loader).  At that point the finder is free to store any stat object you 
like in spec.loader_state.  The spec is made available to the loader during 
exec (if the loader supports it, which the importlib loaders will).  So there 
is no need to add anything to any loader __init__.

The only catch is the slim possibility that the stat object will be stale by 
the time it gets used.  I seem to remember a case where something like this 
happened (related to distros building their system Python or something).
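
A sketch of the loader_state hand-off using the public ModuleSpec API. StatCarryingFinder and StatCarryingLoader are hypothetical classes written for this example, not the importlib finders and loaders under discussion, and the module name statdemo is made up.

```python
import importlib
import importlib.abc
import importlib.machinery
import os
import sys
import tempfile


class StatCarryingLoader(importlib.abc.Loader):
    """Hypothetical loader that reuses the stat result stored by the finder."""

    def __init__(self, path):
        self.path = path

    def create_module(self, spec):
        return None  # fall back to default module creation

    def exec_module(self, module):
        st = module.__spec__.loader_state   # filled in by the finder below
        with open(self.path, "rb") as f:
            source = f.read(st.st_size)     # size taken from the earlier stat
        exec(compile(source, self.path, "exec"), module.__dict__)


class StatCarryingFinder(importlib.abc.MetaPathFinder):
    def __init__(self, directory):
        self.directory = directory

    def find_spec(self, fullname, path=None, target=None):
        candidate = os.path.join(self.directory, fullname + ".py")
        try:
            st = os.stat(candidate)         # the only stat for this module
        except OSError:
            return None
        return importlib.machinery.ModuleSpec(
            fullname, StatCarryingLoader(candidate),
            origin=candidate, loader_state=st,
        )


directory = tempfile.mkdtemp()
with open(os.path.join(directory, "statdemo.py"), "w") as f:
    f.write("VALUE = 42\n")

finder = StatCarryingFinder(directory)
sys.meta_path.insert(0, finder)
try:
    statdemo = importlib.import_module("statdemo")
finally:
    sys.meta_path.remove(finder)

print(statdemo.VALUE)
```

Reading exactly st.st_size bytes also shows the staleness risk mentioned above: if the file grew between find and exec, the loader would compile a truncated source.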

--
nosy: +eric.snow




[issue19216] stat cache for import bootstrap

2013-10-10 Thread Brett Cannon

Brett Cannon added the comment:

importlib/_bootstrap.py is importlib, period, so there is no separation of what 
is used to start Python and what is used after interpreter startup is completed.

As for adding a 'stat' argument to the loaders, it's possible, but as always it 
comes down to whether it will break someone or not. Since loaders do not 
necessarily execute immediately, you are running the risk of a very stale 
cached stat object. Plus Eric Snow has his PEP, which touches the API in terms 
of the loader __init__ signature, so you would want to look into that.




[issue19216] stat cache for import bootstrap

2013-10-10 Thread Christian Heimes

Christian Heimes added the comment:

Is the content of the bootstrap module used after the interpreter is 
bootstrapped? I see ... that's a problem. It's a proof of concept anyway and 
the speedup is minimal. On my computer with an SSD the speedup is barely 
measurable. I'd like to see if it makes a difference on a Raspberry Pi or an 
NFS share.

I have another idea, too. Could we add an optional 'stat' argument to 
__init__() of FileLoader and ExtensionFileLoader so we can pass the stat object 
around and reuse it for loading?




[issue19216] stat cache for import bootstrap

2013-10-10 Thread Brett Cannon

Brett Cannon added the comment:

A cursory look at the patch suggests that the cache use is permanent and so any 
dynamic changes to a file or directory after an initial caching will not be 
picked up. Did you run the test suite with this patch? It should have failed.
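
The staleness problem is easy to reproduce with any never-invalidated cache. The sketch below uses functools.lru_cache around os.stat purely as an illustration; permanent_stat is a made-up helper and is not the code from import_stat_cache.patch.

```python
import functools
import os
import tempfile


@functools.lru_cache(maxsize=None)
def permanent_stat(path):
    """Hypothetical never-invalidated cache around os.stat()."""
    return os.stat(path)


fd, path = tempfile.mkstemp(suffix=".py")
os.close(fd)

size_before = permanent_stat(path).st_size   # file is still empty here
with open(path, "w") as f:
    f.write("x = 1\n")

stale_size = permanent_stat(path).st_size    # cached result, write not seen
fresh_size = os.stat(path).st_size           # a real stat sees the write
print(size_before, stale_size, fresh_size)
os.remove(path)
```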




[issue19216] stat cache for import bootstrap

2013-10-10 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Benchmarks?

--
nosy: +pitrou




[issue19216] stat cache for import bootstrap

2013-10-10 Thread STINNER Victor

STINNER Victor added the comment:

See also #14604.

--
nosy: +haypo




[issue19216] stat cache for import bootstrap

2013-10-10 Thread Barry A. Warsaw

Changes by Barry A. Warsaw:


--
nosy: +barry




[issue19216] stat cache for import bootstrap

2013-10-10 Thread R. David Murray

Changes by R. David Murray:


--
nosy: +r.david.murray




[issue19216] stat cache for import bootstrap

2013-10-10 Thread Christian Heimes

New submission from Christian Heimes:

The import library makes an excessive number of stat() calls. I've implemented 
a simple cache for the bootstrap module that reduces the number of stat() calls 
by almost 1/3 (236 -> 159 on Linux).

--
assignee: brett.cannon
files: import_stat_cache.patch
keywords: patch
messages: 199378
nosy: brett.cannon, christian.heimes
priority: normal
severity: normal
stage: patch review
status: open
title: stat cache for import bootstrap
type: performance
versions: Python 3.4
Added file: http://bugs.python.org/file32032/import_stat_cache.patch
