On Wed, Apr 11, 2018 at 10:08:58AM +1000, Chris Angelico wrote:

> File system limits aren't usually an issue; as you say, even FAT32 can
> store a metric ton of files in a single directory. I'm more interested
> in how long it takes to open a file, and whether doubling that time
> will have a measurable impact on Python startup time. Part of that
> cost can be reduced by using openat(), on platforms that support it,
> but even with a directory handle, there's still a definite non-zero
> cost to opening and reading an additional file.

Yes, it will double the number of files. Actually quadruple it, if the
annotations and line numbers are in separate files too. But if most of
those extra files never need to be opened, then there's no cost to them.
And whatever extra cost there is, is amortized over the lifetime of the
interpreter.

The expectation here is that this could reduce startup time: the files
which are read are smaller, so less data needs to be read and sent across
the network up front, and the rest can be deferred until it is actually
needed.

Serhiy is experienced enough that I think we should assume he's not
going to push this optimization into production unless it actually does
reduce startup time. He has proven himself enough that we should assume
competence rather than incompetence :-)

Here is the proposal as I understand it:

- by default, change .pyc files to store annotations, docstrings and
  line numbers as references to external files which will be lazily
  loaded when needed;

- single-file .pyc files must still be supported, but this won't be the
  default and could rely on an external "merge" tool;

- objects that rely on docstrings or annotations, such as dataclass, may
  experience a (hopefully very small) increase in import time, since
  they may not be able to defer loading the extra files;

- but in general, most modules should (we expect) see a decrease in
  load time;

- which will (we hope) reduce startup time;

- libraries which make eager use of docstrings and annotations might
  even ship with the single-file .pyc instead (the library installer can
  look after that aspect), and so avoid any extra cost.

Naturally, pushing this into production will require benchmarks that
prove this actually does improve startup time. I believe that Serhiy's
reason for asking is to determine whether it is worth his while to
experiment on this. There's no point in implementing these changes and
benchmarking them if there's no chance of them being accepted.
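
To make the lazy-loading part concrete, here is a rough sketch of the
general idea. It is not Serhiy's design and says nothing about the real
.pyc format: the LazyDocstrings class, the tab-separated "spam.doc"
sidecar file and the demo() helper are all made up for illustration. It
only shows that the extra file costs nothing until somebody asks for a
docstring, and is read at most once after that.

```python
import os
import tempfile


class LazyDocstrings:
    """Hypothetical holder that reads a sidecar file on first access only."""

    def __init__(self, path):
        self._path = path
        self._cache = None
        self.reads = 0  # number of times the sidecar file was actually read

    def get(self, name):
        if self._cache is None:
            # First request: pay the open/read cost now, not at import time.
            self.reads += 1
            with open(self._path, encoding="utf-8") as f:
                self._cache = dict(
                    line.rstrip("\n").split("\t", 1) for line in f if "\t" in line
                )
        return self._cache.get(name)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Made-up sidecar format: one "name<TAB>docstring" pair per line.
        sidecar = os.path.join(tmp, "spam.doc")
        with open(sidecar, "w", encoding="utf-8") as f:
            f.write("spam\tReturn a can of spam.\n")
            f.write("eggs\tReturn a dozen eggs.\n")

        docs = LazyDocstrings(sidecar)
        before = docs.reads        # nothing has been read at "import" time
        first = docs.get("spam")   # triggers the one and only read
        second = docs.get("eggs")  # served from the in-memory cache
        return before, first, second, docs.reads
```

A module that never has its docstrings inspected would never touch the
sidecar file at all, which is where the hoped-for startup saving would
come from. Whether that saving outweighs the extra open() for modules
that do need them is exactly what the benchmarks would have to show.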

So on the assumptions that:

- benchmarking does demonstrate a non-trivial speedup of interpreter
  startup;

- single-file .pyc files are still supported, for the use of byte-code
  only libraries;

- and modules which are particularly badly impacted by this change are
  able to opt-out and use a single .pyc file;

I see no reason not to support this idea if Serhiy (or someone else) is
willing to put in the work.


--
Steve