[Python-Dev] Re: A better way to freeze modules
On Sat, Sep 11, 2021 at 7:08 PM Gregory Szorc wrote:
> Thanks for all the replies, everyone! I'll reply to a few comments individually. But I first wanted to address the common theme around zipimport.
>
> First, one idea that nobody mentioned (and came to me after reading the replies) was to possibly leverage zipimport for freezing the standard library instead of extending the frozen importer. I strongly feel this option is preferable to extending the frozen importer with additional functionality. I suspect the Python core developers would prefer to close importer feature gaps / bugs with zipimport over the frozen importer. And since zipimport is actually usable without having to build your own binaries, improvements to zipimport would more significantly benefit the larger Python ecosystem. If zipimporter gained the ability to open a zip archive residing in a memory address or a PyObject implementing the buffer protocol, [parts of] the standard library could be persisted as a zip file in libpython and the frozen importer would be limited to bootstrapping the import subsystem, just like it is today. This avoids adding additional complexity (like supporting __file__ and __cached__) to the frozen importer. And it keeps the standard library using a commonly-used importer in the standard library, just like it is today with PathFinder.
>
> Onto the bigger question that can be summarized as "why not use zipimport: why do we need something different?" I sympathize with this reasoning. zipimport exists, it is functional, and it is widely used and has demonstrated value.
>
> Performance is a major reason to build something better than zipimport.
>
> In response to your replies, I implemented a handful of benchmarks for oxidized_importer and also implemented a pure Rust implementation of a zip file importer to collect some hard data.
> You can reproduce my results by cloning https://github.com/indygreg/PyOxidizer.git and running `cargo bench -p pyembed-bench`. At time of writing, the benchmarks materialize the full standard library on the filesystem, in a zip archive (with no compression), and in the "Python packed resources" format. It then fires up a Python interpreter and imports ~450 modules comprising the standard library. I encourage you to obtain your own numbers and look at the benchmark code to better understand testing methodology. But here are some results from Linux on my Ryzen 5950x.
>
> * zipimporter is slower than PathFinder to import the entirety of the standard library when the disk cache is hot. 201.81ms for zipimporter vs 174.06ms for PathFinder.
> * My pure Rust zip importer is faster than zipimporter and PathFinder for the same operation. 161.67ms when reading zip data from memory; 164.45ms when using buffered filesystem I/O (8 KB read operations).
> * OxidizedFinder + Python packed resources are the fastest of all. 121.07ms loading from memory.
> * Parsing/indexing the container formats is fast in Rust. Python packed resources are parsed in 107.69us and indexed in 200.52us (0.2ms). A zip archive table of contents is parsed in 809.61us and indexed in 1.205ms. If that same zip archive is read/seeked using filesystem I/O, the numbers go up to 4.6768ms and 5.1591ms.
> * Starting and finalizing a Python interpreter takes 11.930ms with PathFinder and 4.8887ms with OxidizedFinder.

Now this is for importing 450 modules, correct? My suspicion is that that much import load for something that's start-up sensitive is not common. While there's an obvious performance difference between e.g. 202ms and 174ms, that may be at the extreme end. If you assume an average import time per module and about 100 modules imported, then the difference is 45ms versus 39ms, which is negligible.
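That per-module estimate is easy to sanity-check with a quick measurement like the sketch below. The module names and the 100-module figure are arbitrary illustrations, not taken from the benchmarks, and timings will vary by machine:

```python
import importlib
import sys
import time

# Sketch of the back-of-the-envelope estimate: time a batch import and
# scale the average per-module cost to a hypothetical 100-module app.
names = ["json", "decimal", "difflib", "fractions", "ipaddress"]
for name in names:
    sys.modules.pop(name, None)  # crude cache eviction; submodules may linger

start = time.perf_counter()
for name in names:
    importlib.import_module(name)
elapsed_ms = (time.perf_counter() - start) * 1000

per_module_ms = elapsed_ms / len(names)
print(f"~{per_module_ms:.2f} ms/module, "
      f"~{per_module_ms * 100:.0f} ms estimated for 100 modules")
```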
So while I appreciate the collection of these numbers and seeing there's room for improvement, I also don't think it's best for us to focus on the worst-case scenario.

> I won't post the full set of numbers for Windows, but they are generally higher, especially if filesystem I/O is involved. PathFinder is still faster than zipimporter, however. And zipimporter's relative slowness compared to OxidizedFinder is more pronounced.
>
> There are many interesting takeaways from these numbers. But here are what I think are the most important:
>
> * The Rust implementation of a zip importer trouncing the performance of zipimporter probably means zipimporter could be made a lot faster. (I didn't profile to measure why zipimporter is so slow, but I suspect its performance is hindered by being implemented in Python.)

Being implemented in Python is very much on purpose for zipimporter, as it used to be in C and no one wanted to work on it then. Having it in Python has made tweaking how it functions much easier.

> * OxidizedFinder + Python packed resources are still significantly faster than the next fastest solution (Rust-implemented zip importer).

I
[Python-Dev] Re: A better way to freeze modules
Thanks for all the replies, everyone! I'll reply to a few comments individually. But I first wanted to address the common theme around zipimport.

First, one idea that nobody mentioned (and came to me after reading the replies) was to possibly leverage zipimport for freezing the standard library instead of extending the frozen importer. I strongly feel this option is preferable to extending the frozen importer with additional functionality. I suspect the Python core developers would prefer to close importer feature gaps / bugs with zipimport over the frozen importer. And since zipimport is actually usable without having to build your own binaries, improvements to zipimport would more significantly benefit the larger Python ecosystem. If zipimporter gained the ability to open a zip archive residing in a memory address or a PyObject implementing the buffer protocol, [parts of] the standard library could be persisted as a zip file in libpython and the frozen importer would be limited to bootstrapping the import subsystem, just like it is today. This avoids adding additional complexity (like supporting __file__ and __cached__) to the frozen importer. And it keeps the standard library using a commonly-used importer in the standard library, just like it is today with PathFinder.

Onto the bigger question that can be summarized as "why not use zipimport: why do we need something different?" I sympathize with this reasoning. zipimport exists, it is functional, and it is widely used and has demonstrated value.

Performance is a major reason to build something better than zipimport.

In response to your replies, I implemented a handful of benchmarks for oxidized_importer and also implemented a pure Rust implementation of a zip file importer to collect some hard data. You can reproduce my results by cloning https://github.com/indygreg/PyOxidizer.git and running `cargo bench -p pyembed-bench`.
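For context, importing from a zip archive today goes through zipimport's path hook, which only accepts a filesystem path; the memory-address/buffer-protocol support described above does not exist yet. A minimal sketch (the module name "demo_zipped" is invented for the example):

```python
import os
import sys
import tempfile
import zipfile

# Build a tiny uncompressed archive (zipfile defaults to ZIP_STORED),
# mirroring the "no compression" setup used in the benchmarks.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "lib.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("demo_zipped.py", "ANSWER = 42\n")

# Putting the archive on sys.path activates zipimport's path hook.
# Note that it requires a path on disk; there is no in-memory variant.
sys.path.insert(0, archive)
import demo_zipped
print(demo_zipped.ANSWER)
```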
At time of writing, the benchmarks materialize the full standard library on the filesystem, in a zip archive (with no compression), and in the "Python packed resources" format. It then fires up a Python interpreter and imports ~450 modules comprising the standard library. I encourage you to obtain your own numbers and look at the benchmark code to better understand testing methodology. But here are some results from Linux on my Ryzen 5950x.

* zipimporter is slower than PathFinder to import the entirety of the standard library when the disk cache is hot. 201.81ms for zipimporter vs 174.06ms for PathFinder.
* My pure Rust zip importer is faster than zipimporter and PathFinder for the same operation. 161.67ms when reading zip data from memory; 164.45ms when using buffered filesystem I/O (8 KB read operations).
* OxidizedFinder + Python packed resources are the fastest of all. 121.07ms loading from memory.
* Parsing/indexing the container formats is fast in Rust. Python packed resources are parsed in 107.69us and indexed in 200.52us (0.2ms). A zip archive table of contents is parsed in 809.61us and indexed in 1.205ms. If that same zip archive is read/seeked using filesystem I/O, the numbers go up to 4.6768ms and 5.1591ms.
* Starting and finalizing a Python interpreter takes 11.930ms with PathFinder and 4.8887ms with OxidizedFinder.

I won't post the full set of numbers for Windows, but they are generally higher, especially if filesystem I/O is involved. PathFinder is still faster than zipimporter, however. And zipimporter's relative slowness compared to OxidizedFinder is more pronounced.

There are many interesting takeaways from these numbers. But here are what I think are the most important:

* The Rust implementation of a zip importer trouncing the performance of zipimporter probably means zipimporter could be made a lot faster. (I didn't profile to measure why zipimporter is so slow, but I suspect its performance is hindered by being implemented in Python.)
* OxidizedFinder + Python packed resources are still significantly faster than the next fastest solution (Rust-implemented zip importer).
* The overhead of reading and parsing the container format can matter. PyOxidizer-built binaries can start and finalize a Python interpreter in <5ms (this ignores new process overhead). ~1.2ms for the Rust zip importer to index the zip file is a significant percentage!

Succinctly, today zipimporter is somewhat slow when you aren't I/O constrained. The existence proof of a faster Rust implementation implies it could be made significantly faster. Is that "good enough" to forgo standard library inclusion of a yet more efficient solution? That's a healthy debate to have. You know which side I'm on :) But it would probably be prudent to optimize zipimporter before investing in something more esoteric.

Onto the individual replies.

On Fri, Sep 3, 2021 at 12:42 AM Paul Moore wrote:
> My quick reaction was somewhat different - it would be a great idea, but it’s entirely possible to implement this outside the stdlib as a 3rd party module. So the
[Python-Dev] Re: A better way to freeze modules
On Fri, Sep 3, 2021 at 5:32 AM Paul Moore wrote: > On Fri, 3 Sept 2021 at 10:29, Simon Cross > wrote: > > I think adding a meta path importer that reads from a standard > > optimized format could be a great addition. > > I think the biggest open question would be "what benefits does this > have over the existing zipimport?" +1 > > As you mentioned in your email, this is a big detour from the current > > start-up performance work, so I think practically the people working > > on performance are unlikely to take a detour from their detour right > > now. > > Agreed, it would probably have to be an independent development > initially. If it delivers better performance, then switching the > startup work to use it would give a second set of performance > improvements, which no-one is going to object to. Similarly, if it's > simpler to manage, then the maintainability benefits could justify > switching over. +1 > > * Write the meta path importer in a separate package (it sounds like > > you've already done a lot of the work and gained a lot of > > understanding of the issues while writing PyOxidizer!) > > This is the key thing, though. The import machinery allows new > importers to be written as standalone modules, so I'd strongly > recommend that the proposed format/importer gets developed as a PyPI > module initially, with the PEP then being simply a proposal that the > module gets added to the stdlib and/or built into the interpreter. FWIW, I'm a big fan of folks taking advantage of the flexibility of the import machinery and writing importers like this (especially ones that folks must explicitly enable). As noted elsewhere, it would need to prove its worth before we consider putting it into importlib. > The key argument would be bootstrapping, IMO. I would definitely expect > interest in something like this to be lower if it's an external module > (needing a dependency to load your other dependencies is suboptimal). 
> Conversely, though, if no-one shows any interest in a PyPI version of > this idea, that would strongly imply that it's not as useful in > practice as you'd hoped. Excellent point! > In particular, I'd involve the maintainers of pyinstaller in the > design. If a new "frozen module importer" mechanism isn't of interest > to them, it's probably not going to get the necessary support to be > worth adding to the stdlib. +1 > On a personal note, I love the flexibility of Python's import system, > and I've always wanted to write importers for additional storage > formats (import from a sqlite database, for instance). But I've never > actually done so, because a zipfile is basically always sufficient for > any practical use case I've had. One day I hope to find a real use > case, though :-) Cool! I'd love to see what you make. -eric ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/YV2K3BPVDZRZTGLM4HWQEJWMVPI6BGHD/ Code of Conduct: http://python.org/psf/codeofconduct/
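As a concrete illustration of the kind of standalone meta path importer discussed in this exchange, here is a toy finder/loader that serves module source from an in-memory dict. All names are invented for the sketch; a real design (like oxidized_importer) would parse an efficient container format rather than hold plain source strings:

```python
import importlib.abc
import importlib.util
import sys

class InMemoryFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Toy meta path importer: serves module source from a dict."""

    def __init__(self, sources):
        self._sources = sources

    def find_spec(self, fullname, path=None, target=None):
        if fullname in self._sources:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # defer to the next finder on sys.meta_path

    def exec_module(self, module):
        code = compile(self._sources[module.__name__],
                       f"<in-memory {module.__name__}>", "exec")
        exec(code, module.__dict__)

# "packed_demo" is an invented module name for this example.
sys.meta_path.insert(0, InMemoryFinder({"packed_demo": "GREETING = 'hi'\n"}))
import packed_demo
print(packed_demo.GREETING)
```

Because `InMemoryFinder` inherits from `importlib.abc.Loader`, the default `create_module` (returning None) is used and only `exec_module` needs writing.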
[Python-Dev] Re: A better way to freeze modules
On Thu, Sep 2, 2021 at 10:46 PM Gregory Szorc wrote:
> Over in https://bugs.python.org/issue45020 there is some exciting work around expanding the use of the frozen importer to speed up Python interpreter startup. I wholeheartedly support the effort and don't want to discourage progress in this area.
>
> Simultaneously, I've been down this path before with PyOxidizer and feel like I have some insight to share.

Thanks for the support and for taking the time to share your insight! Your work on PyOxidizer is really neat.

Before I dive into replying, I want to be clear about what we are discussing here. There are two related topics: the impact of freezing stdlib modules and usability problems with frozen modules in general (stdlib or not). https://bugs.python.org/issue45020 is concerned with the former but prompted some good discussion about the latter. From what I understand, this python-dev thread is more about the latter (and then some). That's totally worth discussing! I just don't want the two topics to be unnecessarily conflated.

FYI, frozen modules (effectively the .pyc data) are compiled into the Python binary and then loaded from there during import rather than from the filesystem. This allows us to avoid disk access, giving us a performance benefit, but we still have to unmarshal and execute the module code. It also allows us to have the import machinery written in pure Python (importlib._bootstrap and importlib._bootstrap_external). (Thanks Brett!)

While frozen modules are derived from .py files, they currently have some differences from the corresponding source modules: the loader (which has less capability), the repr, frozen packages have __path__ set to [], and frozen modules don't have __file__, __cached__, etc. set. This has been the case for a long time. MAL worked on addressing __file__ but the effort stalled out. (See https://bugs.python.org/issue45020#msg400769 and especially https://bugs.python.org/issue21736.)
The challenge with solving this for non-stdlib modules is that the frozen importer would need help to know where to find corresponding .py files.

bpo-45020 is about freezing a small subset of the stdlib as a performance improvement. It's the 11 stdlib modules (plus encodings) that get imported every time during "./python -c pass". Freezing them provides a roughly 15% startup time improvement. (The 11 modules are: abc, codecs, encodings, io, _collections_abc, _sitebuiltins, os, os.path, genericpath, site, and stat. Maybe there are a few other modules it would make sense to freeze but we're starting with those 11.) This work is probably somewhat affected by the differences between frozen and source modules, and we may need to set an appropriate __file__ on frozen stdlib modules to avoid impacting folks that expect any of those stdlib modules to have it set. Otherwise, for bpo-45020 there likely isn't much more we need to do about frozen stdlib modules shipping with CPython by default. Regardless, bpo-45020 doesn't introduce any new problems; rather it slightly exposes the existing ones.

In contrast to the use of frozen modules in default Python builds, there are a number of tools in the community for freezing modules (both stdlib and not) into custom Python binaries, like PyOxidizer and MAL's PyRun. Such tools would benefit from broader compatibility between frozen modules and the corresponding source modules. Consequently the tool maintainers would be the most likely drivers of any effort to improve frozen modules (which the discussion with MAL and Gregory bears out). The tools would especially benefit if those improvements could apply to non-stdlib modules, which requires a more complex solution than is needed for stdlib modules.

At the (relative) extreme is to throw out the existing frozen module approach (or even the "unmarshal + exec" approach of source-based modules) and replace it with something more efficient and/or more compatible (and cross-platform).
From what I understood, this is the main focus of this thread. It's interesting stuff and I hope the discussion renders a productive result.

FTR, in bpo-45020 Gregory helpfully linked to some insightful material related to PyOxidizer and frozen modules:

* https://github.com/indygreg/PyOxidizer/issues/69
* https://pyoxidizer.readthedocs.io/en/stable/oxidized_importer_behavior_and_compliance.html?highlight=__file__#file-and-cached-module-attributes
* https://pypi.org/project/oxidized-importer/ and https://pyoxidizer.readthedocs.io/en/stable/oxidized_importer.html

With that said, on to replying. :)

> I don't think I'll be offending anyone by saying the existing CPython frozen importer is quite primitive in terms of functionality: it does the minimum it needs to do to support importing module bytecode embedded in the interpreter binary [for purposes of bootstrapping the Python-based importlib modules]. The C struct representing frozen modules is literally just the
[Python-Dev] Re: A better way to freeze modules
On Fri, 3 Sept 2021 at 10:29, Simon Cross wrote: > > Hi Gregory, > > I think adding a meta path importer that reads from a standard > optimized format could be a great addition. I think the biggest open question would be "what benefits does this have over the existing zipimport?" Maybe it could be a little faster? But would the downside of it not being possible to manage the format with existing standard tools outweigh that? A clear description of how to decide which is the most appropriate to use in a given situation between the new format and a zipfile would be a benefit here. > As you mentioned in your email, this is a big detour from the current > start-up performance work, so I think practically the people working > on performance are unlikely to take a detour from their detour right > now. Agreed, it would probably have to be an independent development initially. If it delivers better performance, then switching the startup work to use it would give a second set of performance improvements, which no-one is going to object to. Similarly, if it's simpler to manage, then the maintainability benefits could justify switching over. > * Ask if there are any Python core developers who would be willing to > look at the early stages of the code and/or PEP that you might produce > in the next couple of steps. Perhaps also ask on one of the packaging > mailing lists. If you get others involved as reviewers or contributors > from the start, convincing them later that it is a good idea will be > much easier. :) I'd be willing to look. I'm more interested in the design at this stage than in looking at code, as it's awfully easy to develop something that ends up being a "solution looking for a problem", so a solid case for having a general solution would be important for me. > * Write the meta path importer in a separate package (it sounds like > you've already done a lot of the work and gained a lot of > understanding of the issues while writing PyOxidizer!) 
This is the key thing, though. The import machinery allows new importers to be written as standalone modules, so I'd strongly recommend that the proposed format/importer gets developed as a PyPI module initially, with the PEP then being simply a proposal that the module gets added to the stdlib and/or built into the interpreter. The key argument would be bootstrapping, IMO. I would definitely expect interest in something like this to be lower if it's an external module (needing a dependency to load your other dependencies is suboptimal). Conversely, though, if no-one shows any interest in a PyPI version of this idea, that would strongly imply that it's not as useful in practice as you'd hoped. In particular, I'd involve the maintainers of pyinstaller in the design. If a new "frozen module importer" mechanism isn't of interest to them, it's probably not going to get the necessary support to be worth adding to the stdlib. > * Write a PEP. > > It seems to me that PEPs that come with an implementation and the > support of a few existing core developers have a much less painful PEP > review process. Agreed. In particular, existing code with a clearly demonstrated user base only has to persuade people that being in the core is important. Most proposals that I've seen which could be developed as a PyPI module never get anywhere because it turns out no-one is willing to do the work. You don't need to be a core developer to write a PyPI module, so if no-one has done that, it's likely to be either because the implementation needs tight integration into the core, or because nobody is actually as interested in the issue as you thought... On a personal note, I love the flexibility of Python's import system, and I've always wanted to write importers for additional storage formats (import from a sqlite database, for instance). But I've never actually done so, because a zipfile is basically always sufficient for any practical use case I've had. 
One day I hope to find a real use case, though :-)

> Thank you for writing PyOxidizer and offering some of your time to help make Python itself better.

+1

Paul.

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/2THOX327G2SBUBNC4CEFK2JOH7VICFHC/
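The sqlite-backed importer Paul wishes for is a small exercise with today's import machinery. A toy version, with an invented schema and invented names throughout, might look like:

```python
import importlib.abc
import importlib.util
import sqlite3
import sys

# Invented one-table schema: module name -> module source text.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE modules (name TEXT PRIMARY KEY, source TEXT)")
db.execute("INSERT INTO modules VALUES (?, ?)",
           ("sqlite_demo", "WHO = 'sqlite'\n"))

class SqliteFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Toy meta path importer serving module source from a database."""

    def __init__(self, conn):
        self._conn = conn

    def _fetch(self, fullname):
        row = self._conn.execute(
            "SELECT source FROM modules WHERE name = ?", (fullname,)
        ).fetchone()
        return row[0] if row else None

    def find_spec(self, fullname, path=None, target=None):
        if self._fetch(fullname) is not None:
            return importlib.util.spec_from_loader(fullname, self)
        return None

    def exec_module(self, module):
        source = self._fetch(module.__name__)
        exec(compile(source, f"<sqlite {module.__name__}>", "exec"),
             module.__dict__)

sys.meta_path.insert(0, SqliteFinder(db))
import sqlite_demo
print(sqlite_demo.WHO)
```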
[Python-Dev] Re: A better way to freeze modules
Hi Gregory,

I think adding a meta path importer that reads from a standard optimized format could be a great addition. As you mentioned in your email, this is a big detour from the current start-up performance work, so I think practically the people working on performance are unlikely to take a detour from their detour right now.

If you would like to see your suggested feature in Python, I *think* the following might be a reasonable approach:

* Email python-dev about your idea (done already! :)
* Ask if there are any Python core developers who would be willing to look at the early stages of the code and/or PEP that you might produce in the next couple of steps. Perhaps also ask on one of the packaging mailing lists. If you get others involved as reviewers or contributors from the start, convincing them later that it is a good idea will be much easier. :)
* Write the meta path importer in a separate package (it sounds like you've already done a lot of the work and gained a lot of understanding of the issues while writing PyOxidizer!)
* Write a PEP.

It seems to me that PEPs that come with an implementation and the support of a few existing core developers have a much less painful PEP review process.

Thank you for writing PyOxidizer and offering some of your time to help make Python itself better.

Yours sincerely,
Simon Cross

P.S. I am not a core developer, and I haven't even written any PEPs. :)

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/LMHQ2S2YDJHQNQG3U65GIUPU6IB5QDXY/
[Python-Dev] Re: A better way to freeze modules
My quick reaction was somewhat different - it would be a great idea, but it’s entirely possible to implement this outside the stdlib as a 3rd party module. So the fact that no-one has yet done so means there’s less general interest than the OP is suggesting. And from my experience, the reason for that is that zipimport is almost always sufficient. That’s what tools like pyinstaller use, for example. Paul On Fri, 3 Sep 2021 at 06:25, Guido van Rossum wrote: > Quick reaction: This feels like a bait and switch to me. Also, there are > many advantages to using a standard format like zip (many formats are > really zip with some conventions). Finally, the bytecode format you are > using is “marshal”, and is fully portable — as is zip. > > On Thu, Sep 2, 2021 at 21:44 Gregory Szorc > wrote: > >> Over in https://bugs.python.org/issue45020 there is some exciting work >> around expanding the use of the frozen importer to speed up Python >> interpreter startup. I wholeheartedly support the effort and don't want to >> discourage progress in this area. >> >> Simultaneously, I've been down this path before with PyOxidizer and feel >> like I have some insight to share. >> >> I don't think I'll be offending anyone by saying the existing CPython >> frozen importer is quite primitive in terms of functionality: it does the >> minimum it needs to do to support importing module bytecode embedded in the >> interpreter binary [for purposes of bootstrapping the Python-based >> importlib modules]. The C struct representing frozen modules is literally >> just the module name and a pointer to a sized buffer containing bytecode. >> >> In issue45020 there is talk of enhancing the functionality of the frozen >> importer to support its potential broader use. For example, setting >> __file__ or exposing .__loader__.get_source(). I support the overall >> initiative. 
>>
>> However, introducing enhanced functionality of the frozen importer will at the C level require either:
>>
>> a) backwards incompatible changes to the C API to support additional metadata on frozen modules (or at the very least a supplementary API that fragments what a "frozen" module is).
>> b) CPython only hacks to support additional functionality for "freezing" the standard library for purposes of speeding up startup.
>>
>> I'm not a CPython core developer, but neither "a" nor "b" seem ideal to me. "a" is backwards incompatible. "b" seems like a stop-gap solution until a more generic version is available outside the CPython standard library.
>>
>> From my experience with PyOxidizer and software in general, here is what I think is going to happen:
>>
>> 1. CPython enhances the frozen importer to be usable in more situations.
>> 2. Python programmers realize this solution has performance and ease-of-distribution wins and want to use it more.
>> 3. Limitations in the frozen importer are found. Bugs are reported. Feature requests are made.
>> 4. The frozen importer keeps getting incrementally extended or Python developers grow frustrated that its enhancements are only available to the standard library. You end up slowly reimplementing the importing mechanism in C (remember Python 2?) or disappoint users.
>>
>> Rather than extending the frozen importer, I would suggest considering an alternative solution that is far more useful to the long-term success of Python: I would consider building a fully-featured, generic importer that is capable of importing modules and resource data from a well-defined and portable serialization format / data structure that isn't defined by C structs and APIs.
>> >> Instead of defining module bytecode (and possible additional minimal >> metadata) in C structs in a frozen modules array (or an equivalent C API), >> what if we instead defined a serialization format for representing the >> contents of loadable Python data (module source, module bytecode, resource >> files, extension module library data, etc)? We could then point the Python >> interpreter at instances of this data structure (in memory or in files) so >> it could import/load the resources within using a meta path importer. >> >> What if this serialization format were designed so that it was extremely >> efficient to parse and imports could be serviced with the same trivially >> minimal overhead that the frozen importer currently has? We could embed >> these data structures in produced binaries and achieve the same desirable >> results we'll be getting in issue45020 all while delivering a more generic >> solution. >> >> What if this serialization format were portable across machines? The >> entire Python ecosystem could leverage it as a container format for >> distributing Python resources. Rather than splatting dozens or hundreds of >> files on the filesystem, you could write a single file with all of a >> package's resources. Bugs around filesystem implementation details such as >> case (in)sensitivity and Unicode normalization
[Python-Dev] Re: A better way to freeze modules
Quick reaction: This feels like a bait and switch to me. Also, there are many advantages to using a standard format like zip (many formats are really zip with some conventions). Finally, the bytecode format you are using is “marshal”, and is fully portable — as is zip.

On Thu, Sep 2, 2021 at 21:44 Gregory Szorc wrote:
> Over in https://bugs.python.org/issue45020 there is some exciting work around expanding the use of the frozen importer to speed up Python interpreter startup. I wholeheartedly support the effort and don't want to discourage progress in this area.
>
> Simultaneously, I've been down this path before with PyOxidizer and feel like I have some insight to share.
>
> I don't think I'll be offending anyone by saying the existing CPython frozen importer is quite primitive in terms of functionality: it does the minimum it needs to do to support importing module bytecode embedded in the interpreter binary [for purposes of bootstrapping the Python-based importlib modules]. The C struct representing frozen modules is literally just the module name and a pointer to a sized buffer containing bytecode.
>
> In issue45020 there is talk of enhancing the functionality of the frozen importer to support its potential broader use. For example, setting __file__ or exposing .__loader__.get_source(). I support the overall initiative.
>
> However, introducing enhanced functionality of the frozen importer will at the C level require either:
>
> a) backwards incompatible changes to the C API to support additional metadata on frozen modules (or at the very least a supplementary API that fragments what a "frozen" module is).
> b) CPython only hacks to support additional functionality for "freezing" the standard library for purposes of speeding up startup.
>
> I'm not a CPython core developer, but neither "a" nor "b" seem ideal to me. "a" is backwards incompatible.
"b" seems like a stop-gap solution until > a more generic version is available outside the CPython standard library. > > From my experience with PyOxidizer and software in general, here is what I > think is going to happen: > > 1. CPython enhances the frozen importer to be usable in more situations. > 2. Python programmers realize this solution has performance and > ease-of-distribution wins and want to use it more. > 3. Limitations in the frozen importer are found. Bugs are reported. > Feature requests are made. > 4. The frozen importer keeps getting incrementally extended or Python > developers grow frustrated that its enhancements are only available to the > standard library. You end up slowly reimplementing the importing mechanism > in C (remember Python 2?) or disappoint users. > > Rather than extending the frozen importer, I would suggest considering an > alternative solution that is far more useful to the long-term success of > Python: I would consider building a fully-featured, generic importer that > is capable of importing modules and resource data from a well-defined and > portable serialization format / data structure that isn't defined by C > structs and APIs. > > Instead of defining module bytecode (and possible additional minimal > metadata) in C structs in a frozen modules array (or an equivalent C API), > what if we instead defined a serialization format for representing the > contents of loadable Python data (module source, module bytecode, resource > files, extension module library data, etc)? We could then point the Python > interpreter at instances of this data structure (in memory or in files) so > it could import/load the resources within using a meta path importer. > > What if this serialization format were designed so that it was extremely > efficient to parse and imports could be serviced with the same trivially > minimal overhead that the frozen importer currently has? 
We could embed > these data structures in produced binaries and achieve the same desirable > results we'll be getting in issue45020 all while delivering a more generic > solution. > > What if this serialization format were portable across machines? The > entire Python ecosystem could leverage it as a container format for > distributing Python resources. Rather than splatting dozens or hundreds of > files on the filesystem, you could write a single file with all of a > package's resources. Bugs around filesystem implementation details such as > case (in)sensitivity and Unicode normalization go away. Package installs > are quicker. Run-time performance is better due to faster imports. > > (OK, maybe that last point brings back bad memories of eggs and you > instinctively reject the idea. Or you have concerns about development > ergonomics when module source code isn't in standalone editable files. > These are fair points!) > > What if the Python interpreter gains an "app mode" where it is capable of > being paired with a single "resources file" and running the application > within? Think running zip applications today, but a bit
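To make the "unmarshal + exec" pipeline underlying this proposal concrete, here is a deliberately miniature, invented container: an index of marshalled code objects that can be parsed once and then serviced with only an unmarshal plus exec per import. This illustrates the idea only; it is not the actual "Python packed resources" format:

```python
import marshal

def pack(modules):
    """Serialize {name: source} into a single self-describing blob."""
    entries = {}
    for name, source in modules.items():
        code = compile(source, f"<packed {name}>", "exec")
        entries[name] = marshal.dumps(code)  # per-module bytecode blob
    return marshal.dumps(entries)  # the whole container is one blob

def load(blob, name):
    """Service one import from the container: unmarshal + exec."""
    entries = marshal.loads(blob)        # parse/index the container
    code = marshal.loads(entries[name])  # per-module unmarshal
    namespace = {"__name__": name}
    exec(code, namespace)                # execute the module body
    return namespace

blob = pack({"mod_a": "X = 1 + 1\n"})
ns = load(blob, "mod_a")
print(ns["X"])
```

A real format would index byte offsets rather than re-parsing the container per import, which is exactly the parse-once overhead the benchmark numbers above are measuring.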