On Sat, Sep 15, 2018 at 2:53 AM Paul Moore <p.f.mo...@gmail.com> wrote:
> On Fri, 14 Sep 2018 at 23:28, Neil Schemenauer <nas-pyt...@arctrix.com> wrote:
> >
> > On 2018-09-14, Larry Hastings wrote:
> > > [..] adding the stat calls back in costs you half the startup.  So
> > > any mechanism where we're talking to the disk _at all_ simply
> > > isn't going to be as fast.
> >
> > Okay, so if we use hundreds of small .pyc files scattered all over
> > the disk, that's bad?  Who would have thunk it. ;-P
> >
> > We could have a new format, .pya (compiled python archive) that has
> > data for many .pyc files in it.  In normal runs you would have one
> > or just a handful of these things (e.g. one for the stdlib, one for
> > your app and all the packages it uses).  Then you mmap these just
> > once and rely on OS page faults to bring in the data as you need it.
> > The .pya would have a hash table at the start or end that tells you
> > the offset for each module.
>
> Isn't that essentially what putting the stdlib in a zipfile does? (See
> the windows embedded distribution for an example). It probably uses
> normal IO rather than mmap, but maybe adding a "use mmap" flag to the
> zipfile module would be a more general enhancement that zipimport
> could use for free.
>
> Paul

To share a lesson learned: putting the stdlib in a zip file is doable, but comes with a caveat that would likely make OS distros want to undo the change if it were done to CPython today.

We did that for one of the pre-built Python 2.7 distributions used internally at Google in the 2012-2014 timeframe, thinking at the time "yay, fewer inodes, less disk space, and fewer stat calls by the interpreter on all machines."

The caveat we didn't anticipate was that, unfortunately, zipimport.c cannot handle the zip file changing out from underneath a running process. Ever. It does not hold an open file handle to the zip file (which on POSIX systems would ameliorate the problem) but instead regularly reopens it by name while using a zip file index cached at startup time. So when you deploy a change to your Python interpreter (as any OS distro package update, security update, upgrade, etc. does), existing running processes that go on to import a stdlib module that hadn't already been imported (statistically likely to be a codec-related module, as those are often imported on first use rather than at startup time, the way people tend to structure their code) read a different zip file using the cached index from the previous one and... boom. A strange rolling error in production that is not pretty to debug. Fixing zipimport.c to deal with this properly was tried, but it still ran into issues and was ultimately deemed infeasible. There are a BPO issue or three filed about this if you go hunting.

By contrast, having compiled-in constants in the executable is fine and will never suffer from this problem. Those are mapped as read-only data by the dynamic loader and demand paged. No complicated code is required in CPython to manage them aside from the stdlib startup import-intercepting logic (which should be reasonably small, even without having looked at the patch in the PR yet).

There's ongoing work to rewrite zipimport.c in Python using zipfile itself; if that is used for the stdlib, it will require everything zipfile needs to be frozen into C data, similar to the existing bootstrap import logic, and as a different implementation of zip file reading code it might avoid the same caveat. But storing the data on the C side still sounds like a much simpler code path to me.
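To make the zipimport failure mode above concrete, here is a minimal sketch (module and file names are made up for illustration; the exact error varies with the Python version and how the archive layout shifts) of swapping the zip file underneath a running process:

    import os, sys, zipfile

    # Build version 1 of an archive holding two modules and put it on sys.path.
    with zipfile.ZipFile("lib.zip", "w") as zf:
        zf.writestr("early_mod.py", "VERSION = 1\n")
        zf.writestr("late_mod.py", "VERSION = 1\n")
    sys.path.insert(0, "lib.zip")

    import early_mod  # zipimport opens lib.zip here and caches its directory index

    # Simulate a package upgrade replacing the archive while the process runs.
    with zipfile.ZipFile("lib_new.zip", "w") as zf:
        zf.writestr("late_mod.py", "VERSION = 2\n")  # entry offsets no longer match
    os.replace("lib_new.zip", "lib.zip")

    import late_mod  # a deferred import, e.g. a codec pulled in on first use
    # zipimport reopens lib.zip by name but trusts the stale cached index, so
    # this typically dies with a ZipImportError ("bad local file header" or
    # similar) or reads garbage -- the rolling production failure described above.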
The maintenance concern is mostly about testing and building: making sure we include everything the interpreter needs and keeping it up to date.

I'd like a configure flag controlling whether the feature is on by default, with it off by default and enabled by an interpreter command line flag otherwise. Consider adding that configure flag to the set of things that --with-optimizations turns on for people.

Don't be surprised if Facebook reports a startup time speedup greater than what you ever measure yourself. Their applications are different, and if they're using their XAR thing that mounts applications as a FUSE filesystem, stat() overhead is even higher than usual thanks to the additional kernel round trips, so they'll benefit from this design even more.

Any savings in startup time from not doing a crazy number of sequential, high-latency, blocking system calls is a good thing regardless, and not just for command line tools. Serving applications that are starting up are effectively spinning, consuming CPUs to compute the same result everywhere, for every application, every time, before performing useful work... You can measure such an optimization in a worthwhile amount of $ or carbon footprint saved around the world. Heat death of the universe by a billion cuts.

Thanks for working on this!

-G