Hi Nick,

Sorry for the late reply, and thanks for the feedback!
We’ve been working on publishing the package, and the first version is
available at https://github.com/alibaba/code-data-share-for-python/,
with a user guide and some statistics (TL;DR: ~15% speedup in startup).
We welcome code reviews, comments, or questions.

> I assume the files wouldn't be portable across architectures

That’s true; the file is basically a snapshot of part of the CPython
heap that can be shared between processes.

> so does the cache file naming scheme take that into account?

Currently, no. The file is intended to be generated on demand (rather
than generating one huge archive from all the third-party packages
installed), so the file itself and its name should be managed by the
user.

> (The idea is interesting regardless of whether it produces
> arch-specific files - kind of a middle ground between portable
> serialisation based pycs and fully frozen modules)

I think our package could serve as a substitute for the frozen-module
mechanism for third-party packages: while builtin modules can be
compiled to C code, code-data-share can automatically create a similar
file that requires no compilation or deserialization. Actually, we do
have a POC integrated with CPython that can speed up importing builtin
modules, but after making it a third-party package there’s not much we
can do about the builtins, so freeze and deep-freeze are quite exciting
to us.

Best,
Yichen

> On Mar 20, 2022, at 23:26, Nick Coghlan <ncogh...@gmail.com> wrote:
>
> (belated follow-up as I noticed there hadn't been a reply on list yet,
> just the previous feedback on the faster-cpython ticket)
>
> On Mon, 21 Feb 2022, 6:53 pm Yichen Yan via Python-Dev,
> <python-dev@python.org> wrote:
>>
>> Hi folks, as illustrated in faster-cpython#150 [1], we have
>> implemented a mechanism that supports data persistence of a subset of
>> python data types with mmap, and therefore can reduce package import
>> time by caching code objects.
>>
>> This could be seen as a more eager pyc format, as they serve the same
>> purpose, but our approach tries to avoid [de]serialization.
>> Therefore, we get a speedup in overall python startup of ~15%.
>
> This certainly sounds interesting!
>
>> Currently, we’ve made it a third-party library and have been working
>> on open-sourcing it.
>>
>> Our implementation (whose non-official name is “pycds”) mainly
>> contains two parts:
>>
>> 1. importlib hooks: this implements the mechanism to dump code
>>    objects to an archive, and a `Finder` that supports loading code
>>    objects from mapped memory.
>> 2. Dumping and loading a (subset of) python types with mmap. In this
>>    part, we deal with 1) ASLR by patching `ob_type` fields; 2) hash
>>    seed randomization by supporting only basic types that don’t have
>>    a hash-based layout (i.e. dict is not supported); 3) interned
>>    strings by re-interning strings while loading the mmap archive;
>>    and so on.
>
> I assume the files wouldn't be portable across architectures, so does
> the cache file naming scheme take that into account?
>
> (The idea is interesting regardless of whether it produces
> arch-specific files - kind of a middle ground between portable
> serialisation based pycs and fully frozen modules)
>
> Cheers,
> Nick.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/OPJV5HF4MUB2YHGZZQZXMTBNF6ZAJML5/
Code of Conduct: http://python.org/psf/codeofconduct/