On Fri, May 7, 2021 at 7:31 PM Pablo Galindo Salgado <pablog...@gmail.com>
wrote:

> Although we were originally not sympathetic with it, we may need to offer
> an opt-out mechanism for those users that care about the impact of the
> overhead of the new data in pyc files
> and in in-memory code objects, as was suggested by some folks (Thomas, Yury,
> and others). For this, we could propose that the functionality will be
> deactivated along with the extra
> information when Python is executed in optimized mode (``python -O``) and
> therefore pyo files will not have the overhead associated with the extra
> required data.
>

Just to be clear, .pyo files have not existed for a while:
https://www.python.org/dev/peps/pep-0488/.
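
For anyone who hasn't looked at PEP 488 recently: the optimization level now
lives in the .pyc file name itself. Illustrative snippet only; the exact tag
depends on the interpreter version:

    import importlib.util

    # The bytecode cache file name encodes the optimization level (PEP 488).
    print(importlib.util.cache_from_source("spam.py"))
    # e.g. __pycache__/spam.cpython-310.pyc
    print(importlib.util.cache_from_source("spam.py", optimization=2))
    # e.g. __pycache__/spam.cpython-310.opt-2.pyc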


> Notice that Python
> already strips docstrings in this mode so it would be "aligned" with the
> current mechanism of optimized mode.
>

This only kicks in at the -OO level.
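
For anyone following along, a quick way to see the difference (illustrative
script, not part of the proposal):

    import sys

    def f():
        """docstring"""

    # Docstrings survive plain -O (optimize level 1) and are only dropped
    # at -OO (level 2):
    #   python     script.py  ->  0 docstring
    #   python -O  script.py  ->  1 docstring
    #   python -OO script.py  ->  2 None
    print(sys.flags.optimize, f.__doc__)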


>
> Although this complicates the implementation, it certainly is still much
> easier than dealing with compression (and more useful for those that don't
> want the feature). Notice that we also
> expect pessimistic results from compression as offsets would be quite
> random (although predominantly in the range 10 - 120).
>

I personally prefer the idea of dropping the data with -OO since if you're
stripping out docstrings you're already hurting introspection capabilities
in the name of memory. Or one could go as far as to introduce -Os to do -OO
plus dropping this extra data.

As for .pyc file size, I personally wouldn't worry about it. If someone is
that space-constrained they either aren't using .pyc files or are only
shipping a single set of .pyc files under -OO and skipping source code. And
.pyc files are an implementation detail of CPython so there shouldn't be
too much of a concern for other interpreters.

-Brett


>
> On Sat, 8 May 2021 at 01:56, Pablo Galindo Salgado <pablog...@gmail.com>
> wrote:
>
>> One last note for clarity: that's the increase in size of the stdlib as a
>> whole; the increase in size for the pyc files themselves goes from
>> 28.471296 MB to 34.750464 MB, which is an increase of 22%.
>>
>> On Sat, 8 May 2021 at 01:43, Pablo Galindo Salgado <pablog...@gmail.com>
>> wrote:
>>
>>> Some update on the numbers. We have made a draft implementation to
>>> corroborate the numbers with some more realistic tests, and it seems that
>>> our original calculations were wrong. The actual increase in size is quite
>>> a bit bigger than previously advertised:
>>>
>>> Using a bytes object to encode the final object and marshalling that to
>>> disk (so using uint8_t as the underlying type):
>>>
>>> BEFORE:
>>>
>>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>>> ❯ du -h Lib -c --max-depth=0
>>> 70M     Lib
>>> 70M     total
>>>
>>> AFTER:
>>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>>> ❯ du -h Lib -c --max-depth=0
>>> 76M     Lib
>>> 76M     total
>>>
>>> So that's an increase of 8.56% over the original value. This is storing
>>> the start offset and end offset with no compression
>>> whatsoever.
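
For readers skimming the thread, here is a minimal sketch of the kind of
encoding being measured above, assuming two unsigned-char column offsets per
instruction. The function name is made up and this is not the draft
implementation:

    def pack_column_offsets(pairs):
        # pairs: iterable of (start_col, end_col), one pair per bytecode
        # instruction. Each value is stored as a uint8_t, so the table costs
        # 2 bytes per instruction and can be marshalled into the .pyc as a
        # plain bytes object alongside the rest of the code object.
        out = bytearray()
        for start, end in pairs:
            if not (0 <= start <= 255 and 0 <= end <= 255):
                raise ValueError("column offset does not fit in an unsigned char")
            out += bytes((start, end))
        return bytes(out)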
>>>
>>> On Fri, 7 May 2021 at 22:45, Pablo Galindo Salgado <pablog...@gmail.com>
>>> wrote:
>>>
>>>> Hi there,
>>>>
>>>> We are preparing a PEP and we would like to start some early discussion
>>>> about one of the main aspects of the PEP.
>>>>
>>>> The work we are preparing is to allow the interpreter to produce more
>>>> fine-grained error messages, pointing to the source associated with the
>>>> instructions that are failing. For example:
>>>>
>>>> Traceback (most recent call last):
>>>>   File "test.py", line 14, in <module>
>>>>     lel3(x)
>>>>     ^^^^^^^
>>>>   File "test.py", line 12, in lel3
>>>>     return lel2(x) / 23
>>>>            ^^^^^^^
>>>>   File "test.py", line 9, in lel2
>>>>     return 25 + lel(x) + lel(x)
>>>>                 ^^^^^^
>>>>   File "test.py", line 6, in lel
>>>>     return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>>>>                          ^^^^^^^^^^^^^^^^^^^^^
>>>> TypeError: 'NoneType' object is not subscriptable
>>>>
>>>> The cost of this is having the start column number and end column number
>>>> information for every bytecode instruction, and this is what we want to
>>>> discuss (there is also some stack cost to re-raise exceptions, but that's
>>>> not a big problem in any case). Given that column numbers are not very big
>>>> compared with line numbers, we plan to store these as unsigned chars or
>>>> unsigned shorts. We ran some experiments over the standard library and we
>>>> found that the overhead for all pyc files is:
>>>>
>>>> * If we use shorts, the total overhead is ~3% (total size 28 MB and the
>>>> extra size is 0.88 MB).
>>>> * If we use chars, the total overhead is ~1.5% (total size 28 MB and the
>>>> extra size is 0.44 MB).
>>>>
>>>> One of the disadvantages of using chars is that we can only report
>>>> columns from 1 to 255, so if an error happens in a column bigger than that
>>>> we would have to exclude it (and not show the highlighting) for that
>>>> frame. Unsigned shorts would allow the values to go from 0 to 65535.
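
One way the unsigned-char variant could degrade gracefully for columns beyond
255 is a sentinel value meaning "no column information"; a hypothetical sketch,
not something the proposal commits to:

    NO_COLUMN = 0  # assumed sentinel: no highlighting for this instruction

    def encode_column(col):
        # Columns are reported starting at 1, so 0 is free to act as the
        # sentinel; anything that does not fit in an unsigned char is dropped.
        if 1 <= col <= 255:
            return col
        return NO_COLUMN

    # Per instruction this is 2 bytes with unsigned chars (start + end) versus
    # 4 bytes with unsigned shorts, consistent with the ~0.44 MB vs ~0.88 MB
    # overheads quoted above.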
>>>>
>>>> Unfortunately these numbers are not easily compressible, as every
>>>> instruction would have very different offsets.
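
A quick, non-authoritative way to sanity-check that claim is to compress a
synthetic offset table with zlib and see how much a general-purpose compressor
actually gains; the distribution below is made up from the 10-120 range
mentioned earlier in the thread:

    import random, zlib

    # Synthetic table: 100,000 instructions with start/end columns drawn from
    # roughly the 10-120 range. Real tables will of course look different.
    random.seed(0)
    table = bytearray()
    for _ in range(100_000):
        start = random.randint(10, 120)
        table += bytes((start, min(start + random.randint(1, 40), 255)))

    compressed = zlib.compress(bytes(table), 9)
    print(len(table), len(compressed))  # raw size vs compressed size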
>>>>
>>>> There is also the possibility of not doing this based on some build flag
>>>> or when using -O, to allow users to opt out. However, these numbers can be
>>>> quite useful to other tools like coverage tools, tracers and profilers,
>>>> and adding conditional logic in many places would complicate the
>>>> implementation considerably and would potentially reduce the usability of
>>>> those tools, so we prefer not to have the conditional logic. We believe
>>>> this extra cost is very much worth the better error reporting, but we
>>>> understand and respect other points of view.
>>>>
>>>> Does anyone see a better way to encode this information **without
>>>> complicating the implementation a lot**? What are people's thoughts on
>>>> the feature?
>>>>
>>>> Thanks in advance,
>>>>
>>>> Regards from cloudy London,
>>>> Pablo Galindo Salgado
>>>>
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/2NKLQFEVHQ53QSTV4ZKQ3EYPCLTZXMFF/
Code of Conduct: http://python.org/psf/codeofconduct/
