Re: RFC: Python minimization in Fedora

2020-03-28 Thread Miro Hrončok

On 15. 01. 20 23:59, Zbigniew Jędrzejewski-Szmek wrote:

> > ### Solution 5: Stop shipping mandatory bytecode cache
> >
> > This solution sounds simple: We no longer ship the bytecode cache
> > mandatorily. Technically, we move the `.pyc` files to a subpackage
> > of `python3-libs` (or three different subpackages, that is not
> > important here). And we only *Recommend* them from `python3-libs` --
> > by default, the users get them, but for space critical Fedora
> > flavors (such as container images) the maintainers can opt out and
> > so can the power users.
> >
> > This would **save 18.6 MiB / 50%** -- quite a lot.
> >
> > However, as said earlier, if the bytecode cache files are not there,
> > Python attempts to create them upon first import. That can result in
> > several problems; here we will try to propose how to work around
> > them.
>
> Below using a flag file in each __pycache__ directory is suggested.
> What about a different route: having a flag file for all descendants
> of a directory?
>
> For example, /usr/lib/python3.8/.dont_write_bytecode
> would cover all modules under /usr/lib/python3.8/.
> If a .pyc file is present, python could still make use of it.
>
> This would be a nicer solution because it wouldn't require modifying
> individual packages, but would still avoid the selinux issues and
> slowdowns from failed attempts to write the optimized files.
> The __pycache__ files wouldn't need to exist at all.


To follow up on this, I got an idea recently.

If we add a reason to this marker file, Python can warn properly, without 
distro-specific patches.


Something like:

echo "Install python3-libs-bytecode-opt-0 or python3-libs-bytecode-opt-1 to get the cache." > /usr/lib64/python3.8/.dont_write_bytecode


python -O ...
Warning: Bytecode cache for the selected optimization level was not found and 
the /usr/lib64/python3.8/.dont_write_bytecode file prevents it from being created.

Python startup and imports may be slower.
Install python3-libs-bytecode-opt-0 or python3-libs-bytecode-opt-1 to get the 
cache.
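
To make the idea concrete, here is a minimal sketch of how such a lookup could behave. It is not CPython behaviour and not a proposed patch: the helper names `find_marker` and `missing_cache_warning` are hypothetical, and only the marker file name and the warning wording come from this thread.

```python
import os

# File name taken from this thread; CPython itself does not look for it.
MARKER = ".dont_write_bytecode"


def find_marker(directory):
    """Return (marker_path, reason) for the nearest marker at or above
    `directory`, or None when no ancestor directory carries one."""
    current = os.path.abspath(directory)
    while True:
        candidate = os.path.join(current, MARKER)
        if os.path.isfile(candidate):
            with open(candidate, encoding="utf-8") as f:
                return candidate, f.read().strip()
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return None
        current = parent


def missing_cache_warning(source_path):
    """Build the warning text for a module whose .pyc is missing, or return
    None when nothing forbids writing the cache."""
    found = find_marker(os.path.dirname(source_path))
    if found is None:
        return None
    marker, reason = found
    message = (
        "Bytecode cache for the selected optimization level was not found "
        f"and the {marker} file prevents it from being created. "
        "Python startup and imports may be slower."
    )
    # The marker's content is the distro-supplied hint, appended verbatim.
    return f"{message} {reason}" if reason else message
```

The point of the sketch is that the distro-specific part (which package to install) lives in the file content, so the interpreter side stays generic.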

--
Miro Hrončok
--
Phone: +420777974800
IRC: mhroncok
___
python-devel mailing list -- python-devel@lists.fedoraproject.org
To unsubscribe send an email to python-devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/python-devel@lists.fedoraproject.org


Re: RFC: Python minimization in Fedora

2020-01-19 Thread Zbigniew Jędrzejewski-Szmek
On Sat, Jan 18, 2020 at 03:35:29PM -0500, James Cassell wrote:
> 
> On Thu, Jan 16, 2020, at 5:16 AM, Zbigniew Jędrzejewski-Szmek wrote:
> > 
> > A quick benchmark:
> > $ time python3 -c 'import importlib as i, pydoc_data.topics as t; [i.reload(t) for _ in range(10000)]'
> > python3 -c   4.16s user 0.45s system 99% cpu 4.646 total
> [...]
> > sudo rm /usr/lib64/python3.7/pydoc_data/__pycache__/topics.cpython-37.*
> > 
> > $ time python3 -c 'import importlib as i, pydoc_data.topics as t; 
> > [i.reload(t) for _ in range(1000)]'
> > python3 -c   13.73s user 0.46s system 96% cpu 14.728 total
> [...]
> > But the effect of having *some* .pyc file is not. For this file (which
> > is 600+kb), the difference is 147.28/4.646 ≈ 30 times. So we clearly
> > need to keep the possibility of installing .pyc files, at least optionally.
> > 
> 
> Thanks for doing these benchmarks! I think you misplaced a decimal in the 
> analysis, though; it's closer to 3 times performance difference, not 30 
> times.  (Unless I missed something.)

The number of loops is different (10k vs 1k), so the ratio I posted is
correct.
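
Spelled out, the arithmetic normalises both runs to a per-reload cost before comparing them. The sketch below uses only the two wall-clock totals and the loop counts (10k and 1k) quoted in this thread; the function name is just for illustration.

```python
def per_reload_ms(total_seconds, loops):
    """Average wall-clock cost of one reload, in milliseconds."""
    return total_seconds / loops * 1000


with_pyc = per_reload_ms(4.646, 10_000)     # .pyc present, 10k reloads
without_pyc = per_reload_ms(14.728, 1_000)  # .pyc removed, 1k reloads
ratio = without_pyc / with_pyc

print(f"with .pyc:    {with_pyc:.4f} ms per reload")
print(f"without .pyc: {without_pyc:.4f} ms per reload")
print(f"slowdown:     {ratio:.1f}x")
```

Scaling the second run to 10k loops gives the 147.28 s used in the original ratio, which is the same comparison.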

Zbyszek
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: RFC: Python minimization in Fedora

2020-01-18 Thread James Cassell

On Thu, Jan 16, 2020, at 5:16 AM, Zbigniew Jędrzejewski-Szmek wrote:
> 
> A quick benchmark:
> $ time python3 -c 'import importlib as i, pydoc_data.topics as t; [i.reload(t) for _ in range(10000)]'
> python3 -c   4.16s user 0.45s system 99% cpu 4.646 total
[...]
> sudo rm /usr/lib64/python3.7/pydoc_data/__pycache__/topics.cpython-37.*
> 
> $ time python3 -c 'import importlib as i, pydoc_data.topics as t; 
> [i.reload(t) for _ in range(1000)]'
> python3 -c   13.73s user 0.46s system 96% cpu 14.728 total
[...]
> But the effect of having *some* .pyc file is not. For this file (which
> is 600+kb), the difference is 147.28/4.646 ≈ 30 times. So we clearly
> need to keep the possibility of installing .pyc files, at least optionally.
> 

Thanks for doing these benchmarks! I think you misplaced a decimal in the 
analysis, though; it's closer to 3 times performance difference, not 30 times.  
(Unless I missed something.)

V/r,
James Cassell


Re: RFC: Python minimization in Fedora

2020-01-18 Thread Barry Scott


> On 15 Jan 2020, at 17:05, Miro Hrončok  wrote:
> 
> In Python Maint, we sat down and we came up with several ideas how to 
> minimize the filesystem footprint of Python. Unfortunately, the result is 
> horribly long, sorry about that.

Did you calculate file sizes including rounding up by the
"filesystem block size" (statvfs f_bsize)?

What was the f_bsize of the file system you collected stats on?
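
For reference, a rough way to compare the two numbers on a given tree. This is only a sketch: `round_up_to_block` and `tree_usage` are illustrative names, `os.statvfs` is available on Unix only, and rounding by `f_bsize` is the approximation asked about here rather than an exact account of what the filesystem allocates.

```python
import os


def round_up_to_block(size, block_size):
    """Round a byte count up to a whole number of blocks."""
    return -(-size // block_size) * block_size


def tree_usage(root):
    """Return (apparent_bytes, block_rounded_bytes) for regular files under root."""
    block_size = os.statvfs(root).f_bsize
    apparent = rounded = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # symlinks hold no file data of their own
            size = os.path.getsize(path)
            apparent += size
            rounded += round_up_to_block(size, block_size)
    return apparent, rounded
```

Calling `tree_usage("/usr/lib64/python3.8")` on an installed system would show how much of the footprint is per-file rounding rather than content, which matters for a tree made of many small files.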

The work to stop needing libpython is going to drop the number of files by 1
for the min install.

Can you link some of the .so's from stdlib into the main python image?

If all stdlib .so are linked into the main python and you have .zip for the
.py/.pyc files you can get python down to a handful of files.

Python app making software often exploits a trick that you can concatenate
the .zip on the end of the python image. I'm guessing that would break too many
of the constraints.

I'm not sure how you would do it but what if you created a SquashFS image
for python to lose the f_bsize overhead and use its compression?

Today python has 2 optimised file types. But the python devs have been talking
about ways to implement more optimisations and cache those results as well.
I failed to track down the discussion on python-dev. I recall them wanting to
reduce the file size by storing line number data for traceback outside of
the .pyc.

Barry



Re: RFC: Python minimization in Fedora

2020-01-16 Thread Miro Hrončok

On 16. 01. 20 21:55, David Malcolm wrote:
> If a traceback for an exception includes files from the .zip, can the
> traceback-printing machinery still print the pertinent lines of source?

Apparently not:

$ echo 0/0 > t.py
$ zip t.zip t.py
  adding: t.py (stored 0%)
$ python -c 'import t'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/churchyard/tmp/test/t.py", line 1, in <module>
    0/0
ZeroDivisionError: division by zero
$ rm t.py
$ python -c 'import sys; sys.path.insert(0, "t.zip"); import t'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 668, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 638, in _load_backward_compatible
  File "t.zip/t.py", line 1, in <module>
ZeroDivisionError: division by zero

That's bad UX. But maybe something that can be fixed in Python?
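
As a hint that the information is recoverable, the pure-Python `traceback` module goes through `linecache`, which can ask the module's loader for the source. A small sketch, assuming a throwaway archive and a made-up module name, that reproduces the experiment above but formats the traceback with `traceback.format_exc()`:

```python
import importlib
import os
import sys
import tempfile
import traceback
import zipfile


def traceback_from_zip(module_name="zipped_failure_demo"):
    """Import a failing module from a ZIP and return the formatted traceback."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "demo.zip")
        with zipfile.ZipFile(archive, "w") as zf:
            zf.writestr(module_name + ".py", "0/0\n")
        sys.path.insert(0, archive)
        try:
            importlib.import_module(module_name)
        except ZeroDivisionError:
            # Formatted here, while the archive still exists, so that
            # linecache can fetch the line through the zip loader.
            return traceback.format_exc()
        finally:
            sys.path.remove(archive)
            sys.modules.pop(module_name, None)
    return ""


if __name__ == "__main__":
    print(traceback_from_zip())
```

The difference from the transcript above is only in who prints the traceback: this sketch formats it in Python code instead of relying on the interpreter's default printing at exit.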

--
Miro Hrončok
--
Phone: +420777974800
IRC: mhroncok


Re: RFC: Python minimization in Fedora

2020-01-16 Thread Nicolas Mailhot via devel
On Thursday, 16 January 2020 at 22:00 +0100, Felix Schwarz wrote:
> 
> If I understood Nicolas correctly this was about installing multiple
> versions of the same *library* in the global Python site-packages
> directory?

Whatever you wish to call it :) The non-stdlib parts that projects do not
seem to agree on, forcing venv use

Regards,

-- 
Nicolas Mailhot


Re: RFC: Python minimization in Fedora

2020-01-16 Thread Felix Schwarz

On 16.01.20 at 21:15, Zbigniew Jędrzejewski-Szmek wrote:
>> Accommodating component versioning would mean deploying
>>
>> /usr/lib/pythonxx/site-packages/something-semver.zip
> 
> This path includes xx, which contains the major and minor numbers. So
> adding "semver" would only allow accommodating different patch levels.
> Would that be useful? Different patch levels are supposed to be about
> bug fix only changes, so there's usually very little reason to carry
> anything except the latest one for any specific major.minor combination.

If I understood Nicolas correctly this was about installing multiple versions
of the same *library* in the global Python site-packages directory?

Felix



Re: RFC: Python minimization in Fedora

2020-01-16 Thread David Malcolm
On Thu, 2020-01-16 at 10:27 +0100, Miro Hrončok wrote:
> On 15. 01. 20 23:11, Victor Stinner wrote:
> > > Solution 4: ZIP the entire standard library
> > > (...)
> > > Nevertheless, this might (in theory) **save 17.8 MiB / 47 %**.
> > 
> > It's my favorite option. Almost 50% smaller is quite good! It would be
> > very efficient to have such disk space gain!
> > 
> > Using a ZIP file for the stdlib is a commonly suggested solution when
> > the slow Python startup time is discussed. Python does tons of system
> > calls to load stdlib modules at startup: many stat() and open() calls.
> > Having a single large ZIP file allows doing more work in pure userland.
> > 
> > This solution is well supported by unmodified Python: it's part of the
> > default sys.path search path:
> > 
> > $ python3
> > Python 3.7.6 (default, Dec 19 2019, 22:52:49)
> > >>> import sys; sys.path
> > ['', '/usr/lib64/python37.zip', ...]
> > 
> > It's the second item of sys.path ;-)
> 
> It is, yet modules in the standard library still do read files next to
> __file__ and will blow up when zipped. That makes me believe we can put
> some modules into /usr/lib64/python38.zip, but not the entire unmodified
> stdlib at this moment. We can certainly work towards this goal if we get
> somebody to drive it.
> 
> > I'm ok to discourage users from overriding *system files* by modifying
> > them as root. It's too easy to mess up your system this way.
> 
> Discouraging users is hard. We discourage users from using sudo pip and
> yet **you** still do it Victor :D
> 
> > It is easy to extract the ZIP file in your home directory, hack some
> > files and use the PYTHONPATH environment variable to force loading your
> > modified stdlib.
> > 
> > * faster startup
> > * less disk space
> > * harder to mess up your system
> > 
> > Where are the drawbacks by the way? ;-)
> 
> Behind the corner.

If a traceback for an exception includes files from the .zip, can the
traceback-printing machinery still print the pertinent lines of source?


Dave


Re: RFC: Python minimization in Fedora

2020-01-16 Thread Zbigniew Jędrzejewski-Szmek
On Thu, Jan 16, 2020 at 03:36:11PM +0100, Nicolas Mailhot via devel wrote:
> On 2020-01-16 15:10, Felix Schwarz wrote:
> >On 16.01.20 at 13:37, Nicolas Mailhot via devel wrote:
> >>If we start messing with the Python tree it would be nice to put each
> >>shared python component in a separate zip/xz/whatever, and allow
> >>versioning those archives
> >>
> >>(i.e. use the highest semver zip present unless the code explicitly
> >>requests another version, and this version is available on the
> >>filesystem)
> >>
> >>That would heal the breach between venv users and Fedora/rpm. We’re
> >>alienating a lot of users, because un-versioned python components do
> >>not permit the version divergence some third party software requires
> >
> >Could you give a specific example? Even though my $DAYJOB is mostly
> >about working with Python I don't have a clue which "un-versioned
> >python components" you are referring to.
> 
> Right now we (in Fedora) deploy things like
> 
> /usr/lib/pythonxx/site-packages/something
> 
> That means only one something may exist on-disk at a given time.
> Python users work around this with venvs and blame rpm and Fedora for
> making a single something possible.
> 
> Accommodating component versioning would mean deploying
> 
> /usr/lib/pythonxx/site-packages/something-semver.zip

This path includes xx, which contains the major and minor numbers. So
adding "semver" would only allow accommodating different patch levels.
Would that be useful? Different patch levels are supposed to be about
bug fix only changes, so there's usually very little reason to carry
anything except the latest one for any specific major.minor combination.
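
To illustrate what the "xx" part of the path encodes, here is a tiny sketch. The function name is hypothetical; the only claim is that the directory name is derived from the interpreter's major and minor version, so the patch level never appears in it.

```python
import sys


def versioned_dirname(version_info=sys.version_info):
    """Directory component such as 'python3.8': major and minor only."""
    return f"python{version_info[0]}.{version_info[1]}"


# Two patch releases of the same minor version map to the same directory.
print(versioned_dirname((3, 8, 1)))
print(versioned_dirname((3, 8, 5)))
```

Note that this concerns the interpreter version only; a version embedded in the archive name of a library would be independent of it.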

Zbyszek


Re: RFC: Python minimization in Fedora

2020-01-16 Thread Neal Gompa
On Thu, Jan 16, 2020 at 2:13 PM Christian Glombek  wrote:
>
> On a side note (and without reading all of the above in detail), I'd like to 
> note that Fedora CoreOS (aka FCOS) is completely Python free by now - 
> probably not achievable for Desktop, but it may be for IoT.
>

Unfortunately neither FCOS nor Silverblue are terribly useful options
right now. Most people aren't doing work in containers because using
containers that way is too hard or too brittle.

It's also important to note that the major trade-off those systems
make is not one everyone is willing to accept: a massive expansion of
storage usage.


-- 
真実はいつも一つ!/ Always, there's only one truth!


Re: RFC: Python minimization in Fedora

2020-01-16 Thread Christian Glombek
On a side note (and without reading all of the above in detail), I'd like
to note that Fedora CoreOS (aka FCOS) is completely Python free by now -
probably not achievable for Desktop, but it may be for IoT.


Miroslav Suchý wrote on Thu, 16 Jan 2020, 16:35:

> On 15. 01. 20 at 18:05, Miro Hrončok wrote:
> > ### Solution 2: Move developer oriented modules to python3-devel (or
> > split the stdlib into pieces)
>
> +1
>
> > ### Solution 5: Stop shipping mandatory bytecode cache
>
> +1
>
> >  Problem 5.1: Slower starts without bytecode cache
>
> Especially in container's world, the applications are Flask or Django
> applications and the slower startup is not IMHO
> important. We are speaking about fractions of seconds. Run-time speed will
> not be affected.
>
> Desktop application does not need to minimize storage requirements and get
> the recommended pyc files and will not be
> affected at all.
>
> --
> Miroslav Suchy, RHCA
> Red Hat, Associate Manager ABRT/Copr, #brno, #fedora-buildsys


Re: RFC: Python minimization in Fedora

2020-01-16 Thread Miroslav Suchý
On 15. 01. 20 at 18:05, Miro Hrončok wrote:
> ### Solution 2: Move developer oriented modules to python3-devel (or split 
> the stdlib into pieces)

+1

> ### Solution 5: Stop shipping mandatory bytecode cache

+1

>  Problem 5.1: Slower starts without bytecode cache

Especially in the container world, the applications are Flask or Django 
applications and the slower startup is not IMHO
important. We are speaking about fractions of seconds. Run-time speed will not 
be affected.
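
One way to get a feeling for the size of that startup cost is to time compilation against loading already-compiled code. The sketch below is only an approximation under stated assumptions: it uses synthetic source text rather than a real stdlib module, and `marshal` round-tripping stands in for reading the code object stored in a `.pyc` file. It prints no expected numbers because they depend on the machine.

```python
import marshal
import time

# Synthetic module body; a stand-in for a real source file.
SOURCE = "\n".join(f"value_{i} = {i} * 2" for i in range(2000))


def compile_vs_load(source=SOURCE):
    """Return (compile_seconds, load_seconds, loaded_code_object)."""
    start = time.perf_counter()
    code = compile(source, "<demo>", "exec")
    compile_seconds = time.perf_counter() - start

    # Serialized code object, similar to the payload a .pyc carries.
    blob = marshal.dumps(code)

    start = time.perf_counter()
    loaded = marshal.loads(blob)
    load_seconds = time.perf_counter() - start
    return compile_seconds, load_seconds, loaded


if __name__ == "__main__":
    compile_s, load_s, _ = compile_vs_load()
    print(f"compile: {compile_s * 1000:.2f} ms, load: {load_s * 1000:.2f} ms")
```

Either way the cost is paid once per import, which is why it shows up at startup and not in steady-state run time.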

Desktop applications do not need to minimize storage requirements; they get the 
recommended .pyc files and will not be
affected at all.

-- 
Miroslav Suchy, RHCA
Red Hat, Associate Manager ABRT/Copr, #brno, #fedora-buildsys


Re: RFC: Python minimization in Fedora

2020-01-16 Thread Nicolas Mailhot via devel

On 2020-01-16 15:10, Felix Schwarz wrote:
> On 16.01.20 at 13:37, Nicolas Mailhot via devel wrote:
>> If we start messing with the Python tree it would be nice to put each shared
>> python component in a separate zip/xz/whatever, and allow versioning those
>> archives
>>
>> (i.e. use the highest semver zip present unless the code explicitly requests
>> another version, and this version is available on the filesystem)
>>
>> That would heal the breach between venv users and Fedora/rpm. We’re alienating
>> a lot of users, because un-versioned python components do not permit the
>> version divergence some third party software requires
>
> Could you give a specific example? Even though my $DAYJOB is mostly about
> working with Python I don't have a clue which "un-versioned python components"
> you are referring to.

Right now we (in Fedora) deploy things like

/usr/lib/pythonxx/site-packages/something

That means only one something may exist on-disk at a given time. Python 
users work around this with venvs and blame rpm and Fedora for making a 
single something possible.

Accommodating component versioning would mean deploying

/usr/lib/pythonxx/site-packages/something-semver.zip

Regards,

--
Nicolas Mailhot


Re: RFC: Python minimization in Fedora

2020-01-16 Thread Felix Schwarz

On 16.01.20 at 13:37, Nicolas Mailhot via devel wrote:
> If we start messing with the Python tree it would be nice to put each shared
> python component in a separate zip/xz/whatever, and allow versioning those
> archives
> 
> (i.e. use the highest semver zip present unless the code explicitly requests
> another version, and this version is available on the filesystem)
> 
> That would heal the breach between venv users and Fedora/rpm. We’re alienating
> a lot of users, because un-versioned python components do not permit the
> version divergence some third party software requires

Could you give a specific example? Even though my $DAYJOB is mostly about
working with Python I don't have a clue which "un-versioned python components"
you are referring to.

Felix


Re: RFC: Python minimization in Fedora

2020-01-16 Thread Nicolas Mailhot via devel

On 2020-01-16 11:18, Felix Schwarz wrote:
> On 15.01.20 at 23:11, Victor Stinner wrote:
>> This solution is well supported by unmodified Python: it's part of the
>> default sys.path search path:
>>
>> $ python3
>> Python 3.7.6 (default, Dec 19 2019, 22:52:49)
>> >>> import sys; sys.path
>> ['', '/usr/lib64/python37.zip', ...]
>>
>> It's the second item of sys.path ;-)
>
> Also CPython provides an "embedded" variant in the downloads (IIRC for
> Windows, everything in a zip file without installer) which provides its
> standard library in a zip file by default.
>
> The standard library in these zip files is only a subset of the regular stdlib
> so that might be a good starting point to see which modules could be zipped
> without modification.

If we start messing with the Python tree it would be nice to put each 
shared python component in a separate zip/xz/whatever, and allow 
versioning those archives

(i.e. use the highest semver zip present unless the code explicitly 
requests another version, and this version is available on the 
filesystem)

That would heal the breach between venv users and Fedora/rpm. We’re 
alienating a lot of users, because un-versioned python components do 
not permit the version divergence some third party software requires


Regards,

--
Nicolas Mailhot


Re: RFC: Python minimization in Fedora

2020-01-16 Thread Miro Hrončok

On 16. 01. 20 11:16, Zbigniew Jędrzejewski-Szmek wrote:

> So we clearly
> need to keep the possibility of installing .pyc files, at least optionally.

To clarify: We would keep the .pyc files by default. We would only provide an 
opt-out.

> No, it will not (TTBMK). The file (or the lack of it) will be cached
> in the dentry cache, so the kernel will give an answer extremely
> quickly. And the python process can easily store the directories
> it checked in a lru_cache or something like that, to avoid the round
> trip to the kernel.


Good. I'll update the document to mention the possibility.
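
For the record, a minimal sketch of the in-process caching mentioned in the quote, assuming the `.dont_write_bytecode` marker name used elsewhere in this thread. `writes_disabled` is a hypothetical helper, not something CPython provides, and a real implementation would have to decide when cached answers go stale.

```python
import functools
import os

# Marker name from this thread; treated here as an assumption.
MARKER = ".dont_write_bytecode"


@functools.lru_cache(maxsize=None)
def writes_disabled(directory):
    """True if `directory` or any ancestor contains the marker file.

    Every directory on the way up gets its own cache entry, so modules
    sharing a parent directory trigger at most one filesystem check each.
    """
    if os.path.exists(os.path.join(directory, MARKER)):
        return True
    parent = os.path.dirname(directory)
    return parent != directory and writes_disabled(parent)
```

With this shape, importing many modules from the same package costs one lookup per directory in the process, and the kernel's dentry cache covers repeated lookups across processes.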

--
Miro Hrončok
--
Phone: +420777974800
IRC: mhroncok


Re: RFC: Python minimization in Fedora

2020-01-16 Thread Felix Schwarz

On 15.01.20 at 23:11, Victor Stinner wrote:
> This solution is well supported by unmodified Python: it's part of the
> default sys.path search path:
> 
> $ python3
> Python 3.7.6 (default, Dec 19 2019, 22:52:49)
> >>> import sys; sys.path
> ['', '/usr/lib64/python37.zip', ...]
> 
> It's the second item of sys.path ;-)

Also CPython provides an "embedded" variant in the downloads (IIRC for
Windows, everything in a zip file without installer) which provides its
standard library in a zip file by default.

The standard library in these zip files is only a subset of the regular stdlib
so that might be a good starting point to see which modules could be zipped
without modification.
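
Whether a module survives zipping largely comes down to how it reads its data files. The following sketch, using a throwaway package name and archive of my own invention, contrasts opening a file next to `__file__` with asking the loader through `pkgutil.get_data`:

```python
import importlib
import os
import pkgutil
import sys
import tempfile
import zipfile


def read_both_ways(package="zipdata_demo"):
    """Return (data_via_loader, data_via_open) for a package inside a ZIP."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "lib.zip")
        with zipfile.ZipFile(archive, "w") as zf:
            zf.writestr(package + "/__init__.py", "")
            zf.writestr(package + "/data.txt", "hello")
        sys.path.insert(0, archive)
        try:
            module = importlib.import_module(package)
            # Loader-based access works for directories and ZIP archives alike.
            via_loader = pkgutil.get_data(package, "data.txt")
            # Path arithmetic on __file__ points inside the archive, which the
            # operating system cannot open as a regular file.
            path = os.path.join(os.path.dirname(module.__file__), "data.txt")
            try:
                with open(path, "rb") as f:
                    via_open = f.read()
            except OSError:
                via_open = None
            return via_loader, via_open
        finally:
            sys.path.remove(archive)
            sys.modules.pop(package, None)


if __name__ == "__main__":
    print(read_both_ways())
```

Modules written in the second style are the ones that would need changes before they could move into a zipped stdlib.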

Felix


Re: RFC: Python minimization in Fedora

2020-01-16 Thread Zbigniew Jędrzejewski-Szmek
On Thu, Jan 16, 2020 at 10:33:32AM +0100, Miro Hrončok wrote:
> On 15. 01. 20 23:59, Zbigniew Jędrzejewski-Szmek wrote:
> >On Wed, Jan 15, 2020 at 06:05:42PM +0100, Miro Hrončok wrote:
> >>### File types (and bytecode caches)
> >>
> >>The orthogonal dimension is the file type. Python standard library
> >>contains directories with both "extension modules" (written in C
> >>(usually) and compiled to `*.cpython-38-x86_64-linux-gnu.so` shared
> >>object file) and "pure Python" modules (written in Python and saved
> >>as `*.py` source file).
> >>
> >>Each pure Python module comes in 4 files:
> >>
> >>- `module.py` -- the source
> >>- `__pycache__/module.cpython-38.pyc` -- regular (not optimized) bytecode 
> >>cache
> >>- `__pycache__/module.cpython-38.opt-1.pyc` -- optimized bytecode cache 
> >>(level 1)
> >>- `__pycache__/module.cpython-38.opt-2.pyc` -- optimized bytecode cache 
> >>(level 2)
> >
> >I suspect that the difference in speed between loading various .pyc
> >files is negligible. Do you have actual benchmarks for this?
> 
> Loading time is theoretically faster for smaller files. Generally,
> the opt-2 files in the stdlib are a bit smaller, but the opt-1 are
> not. Technically, I agree that the loading time difference is
> negligible.

A quick benchmark:
$ time python3 -c 'import importlib as i, pydoc_data.topics as t; [i.reload(t) for _ in range(1)]'
python3 -c   4.16s user 0.45s system 99% cpu 4.646 total
$ time python3 -O -c 'import importlib as i, pydoc_data.topics as t; [i.reload(t) for _ in range(1)]'
python3 -O -c   4.01s user 0.45s system 99% cpu 4.492 total
$ time python3 -OO -c 'import importlib as i, pydoc_data.topics as t; [i.reload(t) for _ in range(1)]'
python3 -OO -c   3.97s user 0.42s system 98% cpu 4.467 total

$ sudo rm /usr/lib64/python3.7/pydoc_data/__pycache__/topics.cpython-37.*

$ time python3 -c 'import importlib as i, pydoc_data.topics as t; [i.reload(t) for _ in range(1000)]'
python3 -c   13.73s user 0.46s system 96% cpu 14.728 total
$ time python3 -O -c 'import importlib as i, pydoc_data.topics as t; [i.reload(t) for _ in range(1000)]'
python3 -O -c   13.01s user 0.33s system 98% cpu 13.480 total
$ time python3 -OO -c 'import importlib as i, pydoc_data.topics as t; [i.reload(t) for _ in range(1000)]'
python3 -OO -c   11.95s user 0.15s system 99% cpu 12.156 total

So... the benefit from -O and -OO is negligible for most scenarios.
(Note that here the page cache is hot, so what is being measured is the
time Python spends doing CPU crunching. Normally, the latency to load
the file from disk would be in play, and this latency would be the same
for files of similar size. The difference of a few percent between opt
levels would become negligible. And note that this is a particularly big
file, so the time required for parsing would be even less important for
small files, which are much more common.)

But the effect of having *some* .pyc file is not. For this file (which
is 600+kb), the difference is 147.28/4.646 ≈ 30 times. So we clearly
need to keep the possibility of installing .pyc files, at least optionally.
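
A rough way to see the same effect without deleting system files is to time `compile()` on a source string against `marshal.loads()` on the matching bytecode, which approximates importing without and with a `.pyc`. This is only a sketch: the synthetic source is an assumed stand-in for `pydoc_data.topics`, `best_of` is a hypothetical helper, and the absolute numbers will not match the figures above.

```python
import marshal
import time

# Synthetic stand-in for a large, data-heavy module (an assumption, not the real file).
source = "topics = {\n" + "".join(
    f"    'key{i}': 'some documentation text {i}',\n" for i in range(5000)
) + "}\n"


def best_of(func, repeat=5):
    """Return the fastest wall-clock time of `repeat` calls to `func`."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


code = compile(source, "<synthetic>", "exec")
cached = marshal.dumps(code)  # the payload a .pyc file stores after its header

compile_time = best_of(lambda: compile(source, "<synthetic>", "exec"))
load_time = best_of(lambda: marshal.loads(cached))

print(f"compile from source: {compile_time * 1000:.2f} ms")
print(f"load cached bytecode: {load_time * 1000:.2f} ms")
```

The ratio between the two lines is the interesting part; it depends heavily on the size and shape of the module.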

> But no, we didn't do any benchmarking (yet anyway) at the scale of
> the current document, that would take a lot of time and energy. The
> plan is to only do them for solutions we actually decide to go for
> (but only if we anticipate a change -- for example not with the
> hardlink-based deduplication, but yes with the zipped stdlib).
> 
> >>### Solution 5: Stop shipping mandatory bytecode cache
> >>
> >>This solution sounds simple: We do no longer ship the bytecode cache
> >>mandatorily. Technically, we move the `.pyc` files to a subpackage
> >>of `python3-libs` (or three different subpackages, that is not
> >>important here). And we only *Recommend* them from `python3-libs` --
> >>by default, the users get them, but for space critical Fedora
> >>flavors (such as container images) the maintainers can opt-out and
> >>so can the powerusers.
> >>
> >>This would **save 18.6 MiB / 50%** -- quite a lot.
> >>
> >>However, as said earlier, if the bytecode cache files are not there,
> >>Python attempts to create them upon first import. That can result in
> >>several problems, here we will try to propose how to workaround
> >>them.
> >
> >Below using a flag file in each __pycache__ directory is suggested.
> >What about a different route: having a flag file for all descendants
> >of a directory?
> 
> The idea was to avoid traversing up, as that can potentially slow
> down Python invocation from a deep PATH. But yes, that is possible
> as well.

No, it will not (TTBMK). The file (or the lack of it) will be cached
in the dentry cache, so the kernel will give an answer extremely
quickly. And the python process can easily store the directories
it has checked in a lru_cache or something like that, to avoid the round
trip to the kernel.
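
To illustrate the caching argument, here is a minimal sketch of such a lookup. It assumes the `.dont_write_bytecode` flag file name proposed in this thread; CPython does not implement anything like this, and `writes_disabled` and `should_write_bytecode` are hypothetical names.

```python
import functools
import os

FLAG_NAME = ".dont_write_bytecode"  # name proposed in the thread, not honoured by CPython


@functools.lru_cache(maxsize=None)
def writes_disabled(directory):
    """Return True if `directory` or any of its ancestors contains the flag file."""
    if os.path.exists(os.path.join(directory, FLAG_NAME)):
        return True
    parent = os.path.dirname(directory)
    if parent == directory:  # reached the filesystem root
        return False
    # The recursive call is cached too, so shared ancestors are checked only once.
    return writes_disabled(parent)


def should_write_bytecode(source_path):
    """Decide whether a .pyc may be written for the given source file."""
    directory = os.path.dirname(os.path.abspath(source_path))
    return not writes_disabled(directory)
```

Because every ancestor ends up in the cache, importing many modules from the same tree costs one filesystem check per distinct directory rather than one walk per module. The cache would go stale if a flag file appeared while the process is running; `writes_disabled.cache_clear()` resets it.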

> >For example, /usr/lib/python3.8/.dont_write_bytecode
> >would cover all modules under /usr/lib/python3.8/.
> >If a .pyc file is present, 

Re: RFC: Python minimization in Fedora

2020-01-16 Thread Miro Hrončok

On 15. 01. 20 23:59, Zbigniew Jędrzejewski-Szmek wrote:

> On Wed, Jan 15, 2020 at 06:05:42PM +0100, Miro Hrončok wrote:
> > ### File types (and bytecode caches)
> >
> > The orthogonal dimension is the file type. Python standard library
> > contains directories with both "extension modules" (written in C
> > (usually) and compiled to `*.cpython-38-x86_64-linux-gnu.so` shared
> > object file) and "pure Python" modules (written in Python and saved
> > as `*.py` source file).
> >
> > Each pure Python module comes in 4 files:
> >
> > - `module.py` -- the source
> > - `__pycache__/module.cpython-38.pyc` -- regular (not optimized) bytecode cache
> > - `__pycache__/module.cpython-38.opt-1.pyc` -- optimized bytecode cache (level 1)
> > - `__pycache__/module.cpython-38.opt-2.pyc` -- optimized bytecode cache (level 2)
>
> I suspect that the difference in speed between loading various .pyc
> files is negligible. Do you have actual benchmarks for this?


Loading time is theoretically faster for smaller files. Generally, the opt-2
files in the stdlib are a bit smaller, but the opt-1 are not. Technically, I
agree that the loading time difference is negligible.


But no, we didn't do any benchmarking (yet anyway) at the scale of the current
document, that would take a lot of time and energy. The plan is to only do them
for solutions we actually decide to go for (but only if we anticipate a change
-- for example not with the hardlink-based deduplication, but yes with the
zipped stdlib).



> > ### Solution 5: Stop shipping mandatory bytecode cache
> >
> > This solution sounds simple: We do no longer ship the bytecode cache
> > mandatorily. Technically, we move the `.pyc` files to a subpackage
> > of `python3-libs` (or three different subpackages, that is not
> > important here). And we only *Recommend* them from `python3-libs` --
> > by default, the users get them, but for space critical Fedora
> > flavors (such as container images) the maintainers can opt-out and
> > so can the powerusers.
> >
> > This would **save 18.6 MiB / 50%** -- quite a lot.
> >
> > However, as said earlier, if the bytecode cache files are not there,
> > Python attempts to create them upon first import. That can result in
> > several problems, here we will try to propose how to workaround
> > them.
>
> Below using a flag file in each __pycache__ directory is suggested.
> What about a different route: having a flag file for all descendants
> of a directory?


The idea was to avoid traversing up, as that can potentially slow down Python 
invocation from a deep PATH. But yes, that is possible as well.



> For example, /usr/lib/python3.8/.dont_write_bytecode
> would cover all modules under /usr/lib/python3.8/.
> If a .pyc file is present, python could still make use of it.
>
> This would be a nicer solution because it wouldn't require modifying
> individual packages, but would still avoid the selinux issues and
> slowdowns from failed attempts to write the optimized files.
> The __pycache__ files wouldn't need to exist at all.


Correct.

--
Miro Hrončok
--
Phone: +420777974800
IRC: mhroncok


Re: RFC: Python minimization in Fedora

2020-01-16 Thread Miro Hrončok

On 15. 01. 20 23:11, Victor Stinner wrote:

> > Solution 4: ZIP the entire standard library
> > (...)
> > Nevertheless, this might (in theory) **save 17.8 MiB / 47 %**.
>
> It's my favorite option. Almost 50% smaller is quite good! It would be
> very efficient to have such disk space gain!
>
> Using a ZIP file for the stdlib is a commonly suggested solution when
> the slow Python startup time is discussed. Python does tons of system
> calls to load stdlib modules at startup: many stat() and open() calls.
> Having a single large ZIP file allows more work to be done in pure
> userland.
>
> This solution is well supported by unmodified Python: it's part of the
> default sys.path search path:
>
> $ python3
> Python 3.7.6 (default, Dec 19 2019, 22:52:49)
> >>> import sys; sys.path
> ['', '/usr/lib64/python37.zip', ...]
>
> It's the second item of sys.path ;-)


It is, yet modules in the standard library still do read files next to __file__ 
and will blow up when zipped. That makes me believe we can put some modules into 
/usr/lib64/python38.zip, but not the entire unmodified stdlib at this moment. We 
can certainly work towards this goal if we get somebody to drive it.
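
A minimal sketch of that failure mode, using a throwaway package rather than a real stdlib module (`zipped_pkg`, `data.txt` and both helper functions are made up for illustration). Opening a file relative to `__file__` fails once the package is imported from a zip, while asking the loader through `pkgutil.get_data()` keeps working:

```python
import importlib
import os
import sys
import tempfile
import zipfile

PACKAGE_INIT = '''
import os
import pkgutil

def read_next_to_file():
    # Pattern that assumes the module lives in a real directory.
    with open(os.path.join(os.path.dirname(__file__), "data.txt")) as f:
        return f.read()

def read_via_loader():
    # Asks the module's loader for the data, so it also works from a zip.
    return pkgutil.get_data(__name__, "data.txt").decode()
'''

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("zipped_pkg/__init__.py", PACKAGE_INIT)
        zf.writestr("zipped_pkg/data.txt", "payload")

    sys.path.insert(0, archive)
    importlib.invalidate_caches()
    try:
        import zipped_pkg

        try:
            zipped_pkg.read_next_to_file()
            direct_read_failed = False
        except OSError:  # the "directory" is really a path inside the archive
            direct_read_failed = True
        loader_result = zipped_pkg.read_via_loader()
    finally:
        sys.path.remove(archive)
        sys.modules.pop("zipped_pkg", None)

print("open() next to __file__ failed:", direct_read_failed)
print("pkgutil.get_data() returned:", loader_result)
```

Modules written in the first style are the ones that would need changes before they could move into the zip.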



> I'm ok to discourage users from overriding *system files* by modifying
> them as root. It's too easy to mess up your system this way.


Discouraging users is hard. We discourage users from using sudo pip, and yet
**you** still do it, Victor :D



> It is easy to extract the ZIP file in your home directory, hack some
> files and use PYTHONPATH environment variable to force loading your
> modified stdlib.
>
> * faster startup
> * less disk space
> * harder to mess up your system
>
> Where are the drawbacks by the way? ;-)


Around the corner.

--
Miro Hrončok
--
Phone: +420777974800
IRC: mhroncok


Re: RFC: Python minimization in Fedora

2020-01-16 Thread Miro Hrončok

On 15. 01. 20 18:56, Chris wrote:

> That's an amazing amount of work!


Thanks.


> My only criticism would be:
> - the quest for reducing disk space is getting a bit over the top.  I mean to
> make comparisons to 3.5" floppy disks which haven't been around for 20 years?


That is obviously only used to lighten the text up and make it easier to read.
We don't actually use floppy disk counts to justify the need.


> Why is ~100MB so much? If you scale up from floppy disks and even reference a
> 8GB USB stick (which you can barely find any more), you'll fit just fine.  Most
> Raspberry Pi's (out of the box solutions) even ship with Python, so the size has
> never been their concern either (where otherwise space would be).


Mostly for container images. Fedora is huge there.

--
Miro Hrončok
--
Phone: +420777974800
IRC: mhroncok


Re: RFC: Python minimization in Fedora

2020-01-15 Thread Zbigniew Jędrzejewski-Szmek
On Wed, Jan 15, 2020 at 06:05:42PM +0100, Miro Hrončok wrote:
> ### File types (and bytecode caches)
> 
> The orthogonal dimension is the file type. Python standard library
> contains directories with both "extension modules" (written in C
> (usually) and compiled to `*.cpython-38-x86_64-linux-gnu.so` shared
> object file) and "pure Python" modules (written in Python and saved
> as `*.py` source file).
> 
> Each pure Python module comes in 4 files:
> 
> - `module.py` -- the source
> - `__pycache__/module.cpython-38.pyc` -- regular (not optimized) bytecode 
> cache
> - `__pycache__/module.cpython-38.opt-1.pyc` -- optimized bytecode cache 
> (level 1)
> - `__pycache__/module.cpython-38.opt-2.pyc` -- optimized bytecode cache 
> (level 2)

I suspect that the difference in speed between loading various .pyc
files is negligible. Do you have actual benchmarks for this?
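
For reference, the three cache file names in the list above can be derived with `importlib.util.cache_from_source()`. A small sketch; `cache_names` and the `pkg/module.py` path are hypothetical, and the tag in the result follows the running interpreter (for example `cpython-38` only on Python 3.8):

```python
import importlib.util
import os


def cache_names(source_path):
    """Return the .pyc file names for optimization levels 0, 1 and 2."""
    names = []
    # An empty string means "no optimization suffix"; 1 and 2 map to -O and -OO.
    for level in ("", 1, 2):
        cache_path = importlib.util.cache_from_source(source_path, optimization=level)
        names.append(os.path.basename(cache_path))
    return names


names = cache_names(os.path.join("pkg", "module.py"))
for name in names:
    print(name)
```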

> ### Solution 5: Stop shipping mandatory bytecode cache
> 
> This solution sounds simple: We do no longer ship the bytecode cache
> mandatorily. Technically, we move the `.pyc` files to a subpackage
> of `python3-libs` (or three different subpackages, that is not
> important here). And we only *Recommend* them from `python3-libs` --
> by default, the users get them, but for space critical Fedora
> flavors (such as container images) the maintainers can opt-out and
> so can the powerusers.
> 
> This would **save 18.6 MiB / 50%** -- quite a lot.
> 
> However, as said earlier, if the bytecode cache files are not there,
> Python attempts to create them upon first import. That can result in
> several problems, here we will try to propose how to workaround
> them.

Below using a flag file in each __pycache__ directory is suggested.
What about a different route: having a flag file for all descendants
of a directory?

For example, /usr/lib/python3.8/.dont_write_bytecode
would cover all modules under /usr/lib/python3.8/.
If a .pyc file is present, python could still make use of it.

This would be a nicer solution because it wouldn't require modifying
individual packages, but would still avoid the selinux issues and
slowdowns from failed attempts to write the optimized files.
The __pycache__ files wouldn't need to exist at all.
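
For comparison, unmodified Python already has a per-process switch, `sys.dont_write_bytecode` (also set by `-B` or `PYTHONDONTWRITEBYTECODE`), which skips writing caches while still using any `.pyc` files that exist. The flag file proposed above would make the same decision per directory tree instead. A small sketch of the existing switch; `import_and_check_cache` and the module names are hypothetical:

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def import_and_check_cache(name, dont_write):
    """Import a throwaway module and report whether a .pyc was written for it."""
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, name + ".py")
        with open(source, "w") as f:
            f.write("VALUE = 42\n")

        saved = sys.dont_write_bytecode
        sys.dont_write_bytecode = dont_write
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            module = importlib.import_module(name)
            assert module.VALUE == 42
            # cache_from_source() gives the location Python would use for the cache.
            return os.path.exists(importlib.util.cache_from_source(source))
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(name, None)
            sys.dont_write_bytecode = saved


print("cache written with the switch on: ", import_and_check_cache("demo_nocache", True))
print("cache written with the switch off:", import_and_check_cache("demo_cache", False))
```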

Zbyszek


Re: RFC: Python minimization in Fedora

2020-01-15 Thread Victor Stinner
> Solution 4: ZIP the entire standard library
> (...)
> Nevertheless, this might (in theory) **save 17.8 MiB / 47 %**.

It's my favorite option. Almost 50% smaller is quite good! It would be
very efficient to have such disk space gain!

Using a ZIP file for the stdlib is a commonly suggested solution when
the slow Python startup time is discussed. Python does tons of system
calls to load stdlib modules at startup: many stat() and open() calls.
Having a single large ZIP file allows more work to be done in pure
userland.

This solution is well supported by unmodified Python: it's part of the
default sys.path search path:

$ python3
Python 3.7.6 (default, Dec 19 2019, 22:52:49)
>>> import sys; sys.path
['', '/usr/lib64/python37.zip', ...]

It's the second item of sys.path ;-)
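
A minimal sketch of the underlying mechanism: any zip archive placed on `sys.path` is importable through the built-in zip importer. The archive and module names (`demo_stdlib.zip`, `zipped_demo`) are made up for illustration and have nothing to do with the real `python37.zip`.

```python
import importlib
import os
import sys
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "demo_stdlib.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("zipped_demo.py", "def greet():\n    return 'hello from a zip'\n")

    # A zip file on sys.path is searched like a directory.
    sys.path.insert(0, archive)
    importlib.invalidate_caches()
    try:
        import zipped_demo

        message = zipped_demo.greet()
        origin = zipped_demo.__file__
    finally:
        sys.path.remove(archive)
        sys.modules.pop("zipped_demo", None)

print(message)
print(origin)  # a path that points inside the archive
```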

I'm ok to discourage users from overriding *system files* by modifying
them as root. It's too easy to mess up your system this way.

It is easy to extract the ZIP file in your home directory, hack some
files and use PYTHONPATH environment variable to force loading your
modified stdlib.

* faster startup
* less disk space
* harder to mess up your system

Where are the drawbacks by the way? ;-)

Victor
___
python-devel mailing list -- python-devel@lists.fedoraproject.org
To unsubscribe send an email to python-devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/python-devel@lists.fedoraproject.org


Re: RFC: Python minimization in Fedora

2020-01-15 Thread Łukasz Posadowski
Wed, 15 Jan 2020 18:05:42 +0100
Miro Hrončok :

> Hello Fedora!
> 
> In Python Maint, we sat down and we came up with several ideas how to
> minimize the filesystem footprint of Python. Unfortunately, the
> result is horribly long, sorry about that.

It was delightful to read. I now have a better understanding of
the Python setup. Maybe even Python core may adopt some ideas in
the future. Is that not what Fedora does? :)

I used MicroPython on micro:bit boards and that Python *is* tiny.
Standard Fedora Python is not really that big, but more code on disk
(even unused) is always a potential security problem. I sometimes build
my own Python without extras like tkinter, curses, or xml. It's super
easy and I always get the version I want.

I was surprised to see a proposal to remove pyc files. I know they're
big and, unless something is constantly using a particular module, mostly
useless. Python creates those files during "make install", and every
other module does the same during its own install. Python is faster with them.

Compressing data in modules is also nice. While zip is not the best,
it's what we have in Python. I'm surprised that this is the largest part of
the Python installation.

> Optimization level 2 is already broken.

That is a good point. Almost no one uses pure Python. 

> ### Solution 10: Stop shipping mandatory Python, rewrite dnf to Rust

No, it just started to work really well. I didn't know about
libdnf, sounds interesting.

-- 
Łukasz Posadowski


Re: RFC: Python minimization in Fedora

2020-01-15 Thread Przemek Klosowski via devel

On 1/15/20 12:56 PM, Chris wrote:

> That's an amazing amount of work! My only criticism would be:
> - the quest for reducing disk space is getting a bit over the top.  I
> mean to make comparisons to 3.5" floppy disks which haven't been
> around for 20 years? Why is ~100MB so much? If you scale up from
> floppy disks and even reference a 8GB USB stick (which you can barely
> find any more), you'll fit just fine.  Most Raspberry Pi's (out of the
> box solutions) even ship with Python, so the size has never been their
> concern either (where otherwise space would be).
>
> I am biased, because I absolutely adore Python and its added bloat to
> basically be the swiss army knife that can solve any problem isn't
> worth the few MB you're trying to cut out of it.


Me too---but it's useful to have Python in super-small environments. For
comparison, people squeezed Python onto ARM Arduino Nano-class
platforms, using Cortex M0 chips on a 1"x2" board costing $5. The total
memory is on the order of 256kB; of course it doesn't run Linux, but you
do get a Python REPL over a serial/USB link.


https://makezine.com/2017/08/11/circuitpython-snakes-way-adafruit-hardware/


Re: RFC: Python minimization in Fedora

2020-01-15 Thread Chris
That's an amazing amount of work! My only criticism would be:
- the quest for reducing disk space is getting a bit over the top.  I mean
to make comparisons to 3.5" floppy disks which haven't been around for 20
years? Why is ~100MB so much? If you scale up from floppy disks and even
reference a 8GB USB stick (which you can barely find any more), you'll fit
just fine.  Most Raspberry Pi's (out of the box solutions) even ship with
Python, so the size has never been their concern either (where otherwise
space would be).

I am biased, because I absolutely adore Python and its added bloat to
basically be the swiss army knife that can solve any problem isn't worth
the few MB you're trying to cut out of it.

That all said: as a dev, I've got no problems with the solution that just
involves removing dev-related packages from the main core build of Python
unless you pull in python-devel. Solution 5 also seems good (stop
shipping .pyc files)... Just pick one (.pyc, or .pyo) file to ship with the
distribution; I'm not sure if both are really required. Just my two cents;
I don't comment too much here, I enjoy seeing you all debate though! :)

Chris

RFC: Python minimization in Fedora

2020-01-15 Thread Miro Hrončok

Hello Fedora!

In Python Maint, we sat down and we came up with several ideas how to minimize 
the filesystem footprint of Python. Unfortunately, the result is horribly long, 
sorry about that.


Please, share your feedback, additional solutions, comments etc.

Version with formatting and pictures is available at:

https://github.com/hroncok/python-minimization/blob/master/document.md


Enclosing here for better in-line responses:



# Python minimization in Fedora

> While Fedora is well suited for traditional physical/virtual workstations and 
servers, it is often overlooked for use cases beyond traditional installs.

>
> Some modern types of deployments — such as IoT and containers — are quite 
sensitive to size. For IoT that's usually slow data connections (for 
updates/management) and for cloud and containers it’s the massive scale.


-- the preamble of the [Fedora Minimization 
Objective](https://docs.fedoraproject.org/en-US/minimization/)


One of the biggest things in Fedora is Python. Because [Fedora loves
Python](https://fedoralovespython.org/) and because the package manager for
Fedora packages -- dnf -- happens to be written in Python, the Python
interpreter and its standard library come pre-installed on many (if not all)
Fedora systems, and it is often not possible to remove them without destroying
the system completely or making it unmanageable.


Python comes with [Batteries 
Included](https://en.wikipedia.org/wiki/Batteries_Included) -- the standard 
library is quite big. While pleasant for the programmers, this comes with a 
large filesystem footprint not entirely desired in Fedora. In this document, we 
will analyze the footprint and offer several minimization solutions/ideas with 
their challenges, pros (MiB saved) and cons. It is a list of ideas; **we're not 
promising to do any of this**.



**Goal:**

 1. Significantly lower the filesystem footprint of the mandatory Python 
installation in Fedora.


**Non-goals:**

 1. We don't aim to lower the filesystem footprint of all Python installations 
in Fedora -- the default may remain big, if there is an opt-out mechanism.
 2. We don't aim to lower the filesystem footprint of all Fedora Python RPM 
packages, just the `python3` package and its subpackages -- the interpreter and 
the standard library.


However, if any non-goal becomes a side effect of the solution of our goal, 
good.

**Constraints:**

 1. Do not break Python users' expectations. As an example, we don't strip 
Python standard library to the bare minimum and still call it Python.
 2. Do not break Fedora users' expectations. As an example, we don't break the 
ability to hot patch Python files on a live system by default.
 3. Do not break Fedora packagers' expectations. As an example, we don't 
[require "system tools" to use a custom Python 
entrypoint](https://fedoraproject.org/wiki/Changes/System_Python), such as 
`/usr/libexec/platform-python` or `/usr/libexec/system-python`.
 4. Do not significantly increase the filesystem footprint of the default 
Python installation. As an example, we don't package [two separate versions (and 
stacks) of Python](https://fedoraproject.org/wiki/Changes/Platform_Python_Stack) 
-- one minimal for dnf (or Ansible) and another "normal" for the users.
 5. Do not diverge from upstream significantly (but we can drive upstream 
change). As an example, we don't reinvent the import machinery of Python 
downstream only, but we might do it in upstream and even [use Fedora to pioneer 
the change](https://fedoraproject.org/wiki/Changes/python3_c.utf-8_locale).


The listed constraints are not absolute. For each solution, we will mention 
whether we feel that some constraints are violated, but that doesn't mean we 
shall discard the solution outright.



## How large is Python, actually

tl;dr Python 3.8.1 in Fedora takes up 111 MiB (approximately 77 3.5" floppy 
disks), but we only **install 37.5 MiB by default** (26 floppy disks).


![77 3.5" floppy 
disks](https://github.com/hroncok/python-minimization/raw/master/77-floppy-disks-gray.jpg)

*77 3.5" floppy disks, courtesy of Dana Walker. Imagine one of them is faulty.*
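
The floppy counts above can be reproduced with a quick calculation. This is a minimal sketch: it assumes the counts were obtained by dividing the size in MiB by the nominal "1.44" figure and rounding, which the document does not state, and the function names are made up for illustration. For comparison, it also computes how many disks would be needed at the true formatted capacity of 1440 KiB.

```python
import math

FLOPPY_NOMINAL = 1.44        # the marketing "1.44 MB" figure (assumed divisor)
FLOPPY_MIB = 1440 / 1024     # true formatted capacity: 1440 KiB = 1.40625 MiB


def floppies_nominal(size_mib: float) -> int:
    """Disk count when dividing by the nominal 1.44 and rounding to nearest."""
    return round(size_mib / FLOPPY_NOMINAL)


def floppies_needed(size_mib: float) -> int:
    """Whole disks needed at the true capacity; a partial disk still counts."""
    return math.ceil(size_mib / FLOPPY_MIB)


for label, size_mib in [("everything", 111), ("default install", 37.5)]:
    print(label, floppies_nominal(size_mib), floppies_needed(size_mib))
```

With the nominal divisor this gives 77 and 26 disks, matching the text. With the true capacity the totals come out slightly higher (79 and 27), so the counts are best read as approximations.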

(All numbers are real installed disk sizes based on the `python38` package 
installed on Fedora 31, x86_64. The split into subpackages is based on the 
`python3` package from Fedora 32. Slight differences between Fedora 31 and 32 or 
between various architectures are irrelevant here because we aim for long-term 
minimization. See the [source of the numbers][source].)


In Fedora we split the Python interpreter into various RPM subpackages, some of 
which are optional. This is what you always get:


 - `python3` contains `/usr/bin/python3` and friends; has 21 KiB.
 - `python3-libs` contains `/usr/lib64/libpython3.8.so.1.0` and the majority of 
the standard library, is required by `python3`; has 37.5 MiB.


And this is what you get optionally:

 - `python3-devel` contains the "development files" and makes it possible to 
compile extension 

RFC: Python minimization in Fedora

2020-01-15 Thread Miro Hrončok

Hello Fedora!

In Python Maint, we sat down and came up with several ideas on how to minimize 
the filesystem footprint of Python. Unfortunately, the result is horribly long, 
sorry about that.


Please, share your feedback, additional solutions, comments etc.

A version with formatting and pictures is available at:

https://github.com/hroncok/python-minimization/blob/master/document.md


Enclosing here for better in-line responses:



# Python minimization in Fedora

> While Fedora is well suited for traditional physical/virtual workstations and 
servers, it is often overlooked for use cases beyond traditional installs.
>
> Some modern types of deployments — such as IoT and containers — are quite 
sensitive to size. For IoT that's usually slow data connections (for 
updates/management) and for cloud and containers it’s the massive scale.


-- the preamble of the [Fedora Minimization 
Objective](https://docs.fedoraproject.org/en-US/minimization/)


One of the biggest things in Fedora is Python. Because [Fedora loves 
Python](https://fedoralovespython.org/) and because the package manager for 
Fedora packages -- dnf -- happens to be written in Python, the Python 
interpreter and its standard library come pre-installed on many (if not all) 
Fedora systems, and it is often not possible to remove them without destroying 
the system completely or making it unmanageable.


Python comes with [Batteries 
Included](https://en.wikipedia.org/wiki/Batteries_Included) -- the standard 
library is quite big. While pleasant for the programmers, this comes with a 
large filesystem footprint not entirely desired in Fedora. In this document, we 
will analyze the footprint and offer several minimization solutions/ideas with 
their challenges, pros (MiB saved) and cons. It is a list of ideas; **we're not 
promising to do any of this**.



**Goal:**

 1. Significantly lower the filesystem footprint of the mandatory Python 
installation in Fedora.


**Non-goals:**

 1. We don't aim to lower the filesystem footprint of all Python installations 
in Fedora -- the default may remain big, if there is an opt-out mechanism.
 2. We don't aim to lower the filesystem footprint of all Fedora Python RPM 
packages, just the `python3` package and its subpackages -- the interpreter and 
the standard library.


However, if achieving our goal also delivers any of the non-goals as a side 
effect, good.

**Constraints:**

 1. Do not break Python users' expectations. As an example, we don't strip 
the Python standard library to the bare minimum and still call it Python.
 2. Do not break Fedora users' expectations. As an example, we don't break the 
ability to hot patch Python files on a live system by default.
 3. Do not break Fedora packagers' expectations. As an example, we don't 
[require "system tools" to use a custom Python 
entrypoint](https://fedoraproject.org/wiki/Changes/System_Python), such as 
`/usr/libexec/platform-python` or `/usr/libexec/system-python`.
 4. Do not significantly increase the filesystem footprint of the default 
Python installation. As an example, we don't package [two separate versions (and 
stacks) of Python](https://fedoraproject.org/wiki/Changes/Platform_Python_Stack) 
-- one minimal for dnf (or Ansible) and another "normal" for the users.
 5. Do not diverge from upstream significantly (but we can drive upstream 
change). As an example, we don't reinvent the import machinery of Python 
downstream only, but we might do it in upstream and even [use Fedora to pioneer 
the change](https://fedoraproject.org/wiki/Changes/python3_c.utf-8_locale).


The listed constraints are not absolute. For each solution, we will mention 
whether we feel that some constraints are violated, but that doesn't mean we 
shall discard the solution outright.



## How large is Python, actually

tl;dr Python 3.8.1 in Fedora takes up 111 MiB (approximately 77 3.5" floppy 
disks), but we only **install 37.5 MiB by default** (26 floppy disks).


![77 3.5" floppy 
disks](https://github.com/hroncok/python-minimization/raw/master/77-floppy-disks-gray.jpg)

*77 3.5" floppy disks, courtesy of Dana Walker. Imagine one of them is faulty.*

(All numbers are real installed disk sizes based on the `python38` package 
installed on Fedora 31, x86_64. The split into subpackages is based on the 
`python3` package from Fedora 32. Slight differences between Fedora 31 and 32 or 
between various architectures are irrelevant here because we aim for long-term 
minimization. See the [source of the numbers][source].)
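
To get a feel for where installed size goes on your own system, a directory tree can be summed up per file suffix. This is only a rough sketch, not the method behind the linked numbers: the helper name `size_by_suffix` is hypothetical, and it adds up apparent file sizes (`st_size`), which can differ from the space the filesystem actually allocates.

```python
import os
from collections import Counter
from pathlib import Path

MIB = 1024 * 1024


def size_by_suffix(root):
    """Return a Counter mapping file suffix -> total apparent size in bytes."""
    totals = Counter()
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                continue  # skip symlinks so a file is not counted via its alias
            totals[path.suffix] += path.stat().st_size
    return totals


def report(root):
    """Print the suffixes under root, largest first, in MiB."""
    for suffix, size in size_by_suffix(root).most_common():
        print(f"{suffix or '(no suffix)':>12}  {size / MIB:8.2f} MiB")
```

For example, `report("/usr/lib64/python3.8")` would list the standard library directory on a system that has it; the exact path depends on the Python version and architecture installed.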


In Fedora we split the Python interpreter into various RPM subpackages, some of 
which are optional. This is what you always get:


 - `python3` contains `/usr/bin/python3` and friends; has 21 KiB.
 - `python3-libs` contains `/usr/lib64/libpython3.8.so.1.0` and the majority of 
the standard library, is required by `python3`; has 37.5 MiB.


And this is what you get optionally:

 - `python3-devel` contains the "development files" and makes it possible to 
compile extension