This post details the design that John Snow and I are planning for QEMU 8.1. The purpose is to detect possible inconsistencies in the build environment, that could happen on enterprise distros once Python 3.6 support is dropped.
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com> --- _posts/2023-03-22-python.md | 213 ++++++++++++++++++++++++++++++++++++ 1 file changed, 213 insertions(+) create mode 100644 _posts/2023-03-22-python.md diff --git a/_posts/2023-03-22-python.md b/_posts/2023-03-22-python.md new file mode 100644 index 0000000..85e1d30 --- /dev/null +++ b/_posts/2023-03-22-python.md @@ -0,0 +1,213 @@ +--- +layout: post +title: "Preparing a consistent Python environment" +date: 2023-03-22 13:30:00 +0000 +categories: [build, python, developers] +--- +Building QEMU is a complex task, divided between several programs. +configure finds the host and cross compilers that are needed to build +emulators and firmware; Meson prepares the build environment for the +emulators; finally, Make and ninja actually run the build steps and +possibly the testing steps as well. + +In addition to compiling C code, some build steps are coded as external +tools which are mostly written in the Python language. These include +processing the emulator configuration, code generators for tracepoints +and QAPI, extensions for the Sphinx documentation tool, and the Avocado +testing framework. Another important tool written in Python is the +Meson build system itself. + +Some of these tools are run through the `python3` executable, while others +are invoked directly as `sphinx-build` or `meson`, and this can create +inconsistencies. For example, QEMU's `configure` script checks for a +minimum version of Python and rejects too-old interpreters. What would +happen if code run via Sphinx or Meson used a different version? + +This situation has been largely hypothetical until recently; QEMU's +Python code is already tested with a wide range of versions of the +interpreter, and it would not be a huge issue if Sphinx used a different +version of Python as long as both of them were supported. This will +change in version 8.1 of QEMU, which will bump the minimum supported +version of Python from 3.6 to 3.8. While all the distros that QEMU +supports have a recent-enough interpreter, the default on RHEL8 and +SLES15 is still version 3.6, and that is what all binaries in `/usr/bin` +use unconditionally. + +As a result, even if `configure` is told to use `/usr/bin/python3.8` +for the build, QEMU's custom Sphinx extensions would still run under +Python 3.6. This post will explain how to avoid this inconsistency +and ensure that a single interpreter is used for the whole of the +build process. Getting there will require some familiarity with +Python packaging, so let's start with virtual environments + +## Virtual environments + +It is surprisingly hard to find what Python interpreter a given script +will use. You can try to parse the first line of the script, which will +be something like `#! /usr/bin/python3`, but that is no guarantee of +success. For example, on some version of Homebrew `/usr/bin/meson` +will be a wrapper script like: + +```bash +#!/bin/bash +PYTHONPATH="/usr/local/Cellar/meson/0.55.0/lib/python3.8/site-packages" exec "/usr/local/Cellar/meson/0.55.0/libexec/bin/meson" "$@" +``` + +and the actual Python shebang line will be in `/usr/local/Cellar`. +Therefore, performing some kind of consistency check on the scripts +is ruled out. QEMU needs to set up a consistent environment on its own. + +For users who are building QEMU, the simplest way to do so would be +to use Python virtual environments. A virtual environment takes an +existing Python installation but gives it a local set of Python packages. +It also has its own `bin` directory; place it at the beginning of your +`PATH` and you will be able to control the Python interpreter for scripts +that begin with `#! /usr/bin/env python3`. + +Furthermore, when packages are installed into the virtual environment +with `pip`, they always refer to the Python interpreter that was used to +create the environment. Virtual environments mostly solve the consistency +problem at the cost of an extra `pip install` step to put QEMU's build +dependencies into the environment. + +Unfortunately, this extra step has a substantial downside. Even though +the virtual environment can optionally refer to the base installation's +installed packages, `pip` will always install packages from scratch +into the virtual environment. For all Linux distributions except RHEL8 +and SLES15 this is unnecessary, and users would be happy to build QEMU +using the versions of Meson and Sphinx included in the distribution. + +Even worse, `pip install` will access the Python package index (PyPI) +over the Internet, which is often impossible on build machines that +are sealed from the outside world. Automated installation of PyPI +dependencies may actually be a welcome feature, but it must also remain +a strictly optional feature. + +In other words, the ideal solution would use a non-isolated virtual +environment to be able to use system packages for the sake of Linux +distributions; but it would also ensure that scripts (`sphinx-build`, +`meson`, `avocado`) are placed into `bin` just like `pip install` does. + +## Distribution packages + +When it comes to packages, Python surely makes an effort to be confusing. +The fundamental unit for *importing* code into a Python program is called +a package; for example `os` and `sys` are two examples of a package. +However, a program or library that is distributed on PyPI consists +of many such "import packages": that's because while `pip` is usually +said to be a "package installer" for Python, more precisely it installs +"distribution packages". + +To add to the confusion, the term "distribution package" is often +shortened to *either* "package" or "distribution". And finally, +the metadata of the distribution package remains available even after +installation, so you may refer to "distributions" even for things that +are already installed and are not being distributed anywhere. + +All this matters because distribution metadata will be the key to +building the perfect virtual environment. If you look at the content +of `bin/meson` in a virtual environment, after installing the package +with `pip`, you will see this: + +```python +# -*- coding: utf-8 -*- +import re +import sys +from mesonbuild.mesonmain import main +if __name__ == '__main__': + sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0]) + sys.exit(main()) +``` + +This looks a lot like automatically generated code, and in fact it +is. The only part that varies is the `from mesonbuild.mesonmain +import main` line. `pip` knew what to put in there because +Meson's `setup.cfg` contains the following stanza: + +``` +[options.entry_points] +console_scripts = + meson = mesonbuild.mesonmain:main +``` + +Similar declarations exists in Sphinx, Avocado and so on, and accessing their +content is easy via `importlib.metadata` (available in Python 3.8+): + +``` +$ python3 +>>> from importlib.metadata import distribution +>>> distribution('meson').entry_points +[EntryPoint(name='meson', value='mesonbuild.mesonmain:main', group='console_scripts')] +``` + +`importlib` looks up the metadata in the running Python interpreter's +search path; if Meson is installed under another interpreter's `site-packages` +directory, it will not be found: + +``` +$ python3.8 +>>> from importlib.metadata import distribution +>>> distribution('meson').entry_points +Traceback (most recent call last): +... +importlib.metadata.PackageNotFoundError: meson +``` + +So finally we have a plan! `configure` can build a non-isolated virtual +environment, use `importlib` to check that the required packages exist +in the base installation, and create scripts in `bin` that point to the +right Python interpreter. Then, it can optionally use `pip install` to +install the missing packages. + +Python provides a customizable [`venv` +module](https://docs.python.org/3/library/venv.html) to create virtual +environments. While this process includes a certain amount of specialized +logic, it is easy to embed it in a subclass of `venv.EnvBuilder`. + +This will provide the same experience as QEMU 8.0, except that there will +be no need for the `--meson` and `--sphinx-build`` options to the +`configure` script. The path to the Python interpreter is enough to +set up all Python programs used during the build. + +There is only one thing left to fix... + +## Nesting virtual environments + +Remember how we started from a user that creates her own virtual +environment before building QEMU? _This_ usecase would not work +anymore, because virtual environments cannot be nested. As soon +as `configure` creates its own virtual environment, the packages +installed by the user are not available anymore. + +Fortunately, the "appearance" of a nested virtual environment is easy +to emulate. Detecting whether `python3` runs in a virtual environment +is as easy as checking `sys.prefix != sys.base_prefix`; if it is, +we need to retrieve the parent virtual environments `site-packages` +directory: + +``` +>>> import sysconfig +>>> sysconfig.get_path('purelib') +'/home/pbonzini/my-venv/lib/python3.11/site-packages' +``` + +and write it to a `.pth` file in the `lib` directory of the new virtual +environment. The following demo shows how a distribution package in the +parent virtual environment is available in the child as well: + +<script async id="asciicast-31xjLsR4KjsU9HuhOUpU08tvb" src="https://asciinema.org/a/31xjLsR4KjsU9HuhOUpU08tvb.js"></script> + +A small detail is that `configure`'s new virtual environment should +copy the isolation setting of the parent. An isolated venv can be +detected because `sys.base_prefix in site.PREFIXES` is false. + +## Conclusion + +Right now, QEMU only provides a minimal attempt at ensuring consistency +of the Python environment; Meson is always run using the interpreter +that was passed to the configure script with `--python` or `$PYTHON`, +but that's it. Once the above technique will be implemented in QEMU 8.1, +there will be no difference in the build experience but a wider set +of invalid build environments will be detected. We will merge these +checks before dropping support for Python 3.6, so that users on older +enterprise distributions will have a smooth transition. -- 2.39.2