Re: RFC: Python minimization in Fedora

2020-01-15 Thread Zbigniew Jędrzejewski-Szmek
On Wed, Jan 15, 2020 at 06:05:42PM +0100, Miro Hrončok wrote:
> ### File types (and bytecode caches)
> 
> The orthogonal dimension is the file type. Python standard library
> contains directories with both "extension modules" (written in C
> (usually) and compiled to `*.cpython-38-x86_64-linux-gnu.so` shared
> object file) and "pure Python" modules (written in Python and saved
> as `*.py` source file).
> 
> Each pure Python module comes in 4 files:
> 
> - `module.py` -- the source
> - `__pycache__/module.cpython-38.pyc` -- regular (not optimized) bytecode 
> cache
> - `__pycache__/module.cpython-38.opt-1.pyc` -- optimized bytecode cache 
> (level 1)
> - `__pycache__/module.cpython-38.opt-2.pyc` -- optimized bytecode cache 
> (level 2)

I suspect that the difference in speed between loading various .pyc
files is negligible. Do you have actual benchmarks for this?
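
For reference, the three cache file names from the list above can be derived with `importlib.util.cache_from_source()`. This is only a sketch to make the naming concrete, not a benchmark: the `module.py` path is a placeholder, and the tag (`cpython-38` in the quoted text) depends on the interpreter that runs it.

```python
import importlib.util
import sys

source = "module.py"  # placeholder path, not a real file

# optimization="" means no optimization; 1 and 2 correspond to -O and -OO.
cache_paths = {
    level: importlib.util.cache_from_source(source, optimization=level)
    for level in ("", 1, 2)
}

for level, path in cache_paths.items():
    print(repr(level), "->", path)

# The tag embedded in each file name comes from the running interpreter.
print("cache tag:", sys.implementation.cache_tag)
```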

> ### Solution 5: Stop shipping mandatory bytecode cache
> 
> This solution sounds simple: we no longer ship the bytecode cache
> mandatorily. Technically, we move the `.pyc` files to a subpackage
> of `python3-libs` (or three different subpackages, that is not
> important here). And we only *Recommend* them from `python3-libs` --
> by default, the users get them, but for space-critical Fedora
> flavors (such as container images) the maintainers can opt out, and
> so can the power users.
> 
> This would **save 18.6 MiB / 50%** -- quite a lot.
> 
> However, as said earlier, if the bytecode cache files are not there,
> Python attempts to create them upon first import. That can result in
> several problems; here we try to propose how to work around
> them.

Below, a flag file in each __pycache__ directory is suggested.
What about a different route: a single flag file that covers all
descendants of a directory?

For example, /usr/lib/python3.8/.dont_write_bytecode
would cover all modules under /usr/lib/python3.8/.
If a .pyc file is present, Python could still make use of it.

This would be a nicer solution because it wouldn't require modifying
individual packages, but would still avoid the SELinux issues and
slowdowns from failed attempts to write the optimized files.
The __pycache__ files wouldn't need to exist at all.
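
To make the idea concrete, here is a rough sketch of the lookup in Python. It is purely illustrative: CPython does not implement anything like this today (the existing `sys.dont_write_bytecode` and `PYTHONDONTWRITEBYTECODE` switches apply to the whole process, not to a directory), and the flag name and the `writes_disabled()` helper are hypothetical.

```python
import tempfile
from pathlib import Path

FLAG_NAME = ".dont_write_bytecode"  # hypothetical flag file name from this proposal


def writes_disabled(source_file, flag_name=FLAG_NAME):
    """Return True if any ancestor directory of source_file holds the flag file."""
    source = Path(source_file).resolve()
    return any((parent / flag_name).exists() for parent in source.parents)


# Small demo in a throwaway directory standing in for /usr/lib/python3.8/.
with tempfile.TemporaryDirectory() as tmp:
    stdlib = Path(tmp) / "python3.8"
    module = stdlib / "json" / "decoder.py"
    module.parent.mkdir(parents=True)
    module.touch()

    before = writes_disabled(module)  # no flag file above the module yet
    (stdlib / FLAG_NAME).touch()
    after = writes_disabled(module)   # one flag now covers all descendants

print(before, after)
```

A real implementation would presumably cache the answer per directory, so that the walk up the tree is not repeated for every import.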

Zbyszek
___
python-devel mailing list -- python-devel@lists.fedoraproject.org
To unsubscribe send an email to python-devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/python-devel@lists.fedoraproject.org


Re: RFC: Python minimization in Fedora

2020-01-15 Thread Victor Stinner
> Solution 4: ZIP the entire standard library
> (...)
> Nevertheless, this might (in theory) **save 17.8 MiB / 47 %**.

It's my favorite option. Almost 50% smaller is quite good! That would
be a substantial disk space gain!

Using a ZIP file for the stdlib is a commonly suggested solution when
the slow Python startup time is discussed. Python does tons of system
calls to load stdlib modules at startup: many stat() and open() calls.
Having a single large ZIP file allows more of that work to be done in
pure userland.

This solution is well supported by unmodified Python: it's part of the
default sys.path search path:

$ python3
Python 3.7.6 (default, Dec 19 2019, 22:52:49)
>>> import sys; sys.path
['', '/usr/lib64/python37.zip', ...]

It's the second item of sys.path ;-)
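
As a small illustration of that mechanism (a sketch only; the archive and module names below are made up for the demo), any ZIP file placed on sys.path is handled by the built-in zipimport machinery:

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for /usr/lib64/python37.zip.
    archive = Path(tmp) / "demo_stdlib.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("zipped_demo.py", "ANSWER = 42\n")

    sys.path.insert(0, str(archive))
    try:
        importlib.invalidate_caches()
        zipped_demo = importlib.import_module("zipped_demo")
        # Record how the module was found while the archive still exists.
        loader_name = type(zipped_demo.__loader__).__name__
        origin = zipped_demo.__file__
    finally:
        sys.path.remove(str(archive))

print(zipped_demo.ANSWER, loader_name, origin)
```

The module's `__file__` points inside the archive, which is why no unpacked `.py` file needs to exist on disk.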

I'm OK with discouraging users from overriding *system files* by
modifying them as root. It's too easy to mess up your system this way.

It is easy to extract the ZIP file in your home directory, hack some
files and use the PYTHONPATH environment variable to force Python to
load your modified stdlib.
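
A rough sketch of that workflow, assuming a throwaway directory in place of the extracted ZIP and using `colorsys` as an arbitrary stdlib module to shadow:

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a stdlib module extracted from the ZIP and then hacked.
    (Path(tmp) / "colorsys.py").write_text("PATCHED = True\n")

    # PYTHONPATH entries are searched before the stdlib locations.
    env = dict(os.environ, PYTHONPATH=tmp)
    result = subprocess.run(
        [sys.executable, "-c", "import colorsys; print(colorsys.__file__)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    loaded_from = Path(result.stdout.strip()).resolve()
    patched_dir = Path(tmp).resolve()

print(loaded_from)
```

The child interpreter picks up the modified copy and the system files stay untouched. This only works for modules that are not built into the interpreter itself.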

* faster startup
* less disk space
* harder to mess up your system

Where are the drawbacks, by the way? ;-)

Victor


RFC: Python minimization in Fedora

2020-01-15 Thread Miro Hrončok

Hello Fedora!

In Python Maint, we sat down and came up with several ideas on how to minimize 
the filesystem footprint of Python. Unfortunately, the result is horribly long; 
sorry about that.


Please, share your feedback, additional solutions, comments etc.

Version with formatting and pictures is available at:

https://github.com/hroncok/python-minimization/blob/master/document.md


Enclosing here for better in-line responses:



# Python minimization in Fedora

> While Fedora is well suited for traditional physical/virtual workstations and 
> servers, it is often overlooked for use cases beyond traditional installs.
>
> Some modern types of deployments — such as IoT and containers — are quite 
> sensitive to size. For IoT that's usually slow data connections (for 
> updates/management) and for cloud and containers it’s the massive scale.


-- the preamble of the [Fedora Minimization 
Objective](https://docs.fedoraproject.org/en-US/minimization/)


One of the biggest things in Fedora is Python. Because [Fedora loves 
Python](https://fedoralovespython.org/) and because the package manager for 
Fedora packages -- dnf -- happens to be written in Python, the Python 
interpreter and its standard library come pre-installed on many (if not all) 
Fedora systems, and it is often not possible to remove them without destroying 
the system completely or making it unmanageable.


Python comes with [Batteries 
Included](https://en.wikipedia.org/wiki/Batteries_Included) -- the standard 
library is quite big. While pleasant for the programmers, this comes with a 
large filesystem footprint not entirely desired in Fedora. In this document, we 
will analyze the footprint and offer several minimization solutions/ideas with 
their challenges, pros (MiB saved) and cons. It is a list of ideas; **we're not 
promising to do any of this**.



**Goal:**

 1. Significantly lower the filesystem footprint of the mandatory Python 
installation in Fedora.


**Non-goals:**

 1. We don't aim to lower the filesystem footprint of all Python installations 
in Fedora -- the default may remain big, if there is an opt-out mechanism.
 2. We don't aim to lower the filesystem footprint of all Fedora Python RPM 
packages, just the `python3` package and its subpackages -- the interpreter and 
the standard library.


However, if any non-goal becomes a side effect of the solution of our goal, 
good.

**Constraints:**

 1. Do not break Python users' expectations. As an example, we don't strip 
the Python standard library to the bare minimum and still call it Python.
 2. Do not break Fedora users' expectations. As an example, we don't break the 
ability to hot patch Python files on a live system by default.
 3. Do not break Fedora packagers' expectations. As an example, we don't 
[require "system tools" to use a custom Python 
entrypoint](https://fedoraproject.org/wiki/Changes/System_Python), such as 
`/usr/libexec/platform-python` or `/usr/libexec/system-python`.
 4. Do not significantly increase the filesystem footprint of the default 
Python installation. As an example, we don't package [two separate versions (and 
stacks) of Python](https://fedoraproject.org/wiki/Changes/Platform_Python_Stack) 
-- one minimal for dnf (or Ansible) and another "normal" for the users.
 5. Do not diverge from upstream significantly (but we can drive upstream 
change). As an example, we don't reinvent the import machinery of Python 
downstream only, but we might do it in upstream and even [use Fedora to pioneer 
the change](https://fedoraproject.org/wiki/Changes/python3_c.utf-8_locale).


The listed constraints are not absolute. For each solution, we will mention 
whether we feel that some constraints are violated, but that doesn't mean we 
shall outright discard the solution.



## How large is Python, actually

tl;dr Python 3.8.1 in Fedora takes up 111 MiB (approximately 77 3.5" floppy disks), 
but we only **install 37.5 MiB by default** (26 floppy disks).


![77 3.5" floppy 
disks](https://github.com/hroncok/python-minimization/raw/master/77-floppy-disks-gray.jpg)

*77 3.5" floppy disks, courtesy of Dana Walker. Imagine one of them is faulty.*

(All numbers are real installed disk sizes based on the `python38` package 
installed on Fedora 31, x86_64. The split into subpackages is based on the 
`python3` package from Fedora 32. Slight differences between Fedora 31 and 32 or 
between various architectures are irrelevant here; we aim for long-term 
minimization. See the [source of the numbers][source].)


In Fedora we split the Python interpreter into various RPM subpackages, some of 
which are optional. This is what you get all the time:


 - `python3` contains `/usr/bin/python3` and friends; has 21 KiB.
 - `python3-libs` contains `/usr/lib64/libpython3.8.so.1.0` and the majority of 
the standard library, is required by `python3`; has 37.5 MiB.


And this is what you get optionally:

 - `python3-devel` contains the "development files" and makes it possible to 
compile extension