One last note for clarity: that was the size increase for the stdlib as a
whole. For the pyc files alone, the size goes from 28.471296 MB to
34.750464 MB, which is an increase of 22%.
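
For anyone who wants to check the arithmetic, here is a quick sketch in
Python that uses only the figures quoted in this thread. The helper name
`percent_increase` is made up for the example and is not part of the draft
implementation. Note that the `du -h` figures are rounded to whole
megabytes, so the second result can only approximate the percentage
reported for the whole `Lib` directory.

```python
def percent_increase(before: float, after: float) -> float:
    """Relative growth from `before` to `after`, expressed in percent."""
    return (after - before) / before * 100.0


# pyc files only, sizes in MB as quoted in this message.
pyc_growth = percent_increase(28.471296, 34.750464)

# Whole Lib directory, using the rounded `du -h` output (70M -> 76M).
lib_growth = percent_increase(70, 76)

print(f"pyc files: {pyc_growth:.1f}%")
print(f"Lib (rounded du figures): {lib_growth:.1f}%")
```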

On Sat, 8 May 2021 at 01:43, Pablo Galindo Salgado <pablog...@gmail.com>
wrote:

> An update on the numbers. We have made a draft implementation to
> corroborate the numbers with some more realistic tests, and it seems
> that our original calculations were wrong. The actual increase in size
> is considerably bigger than previously advertised.
>
> Using a bytes object to encode the final object and marshalling that to
> disk (so using uint8_t as the underlying type):
>
> BEFORE:
>
> ❯ ./python -m compileall -r 1000 Lib > /dev/null
> ❯ du -h Lib -c --max-depth=0
> 70M     Lib
> 70M     total
>
> AFTER:
> ❯ ./python -m compileall -r 1000 Lib > /dev/null
> ❯ du -h Lib -c --max-depth=0
> 76M     Lib
> 76M     total
>
> So that's an increase of 8.56% over the original value. This is with the
> start offset and end offset stored with no compression whatsoever.
>
> On Fri, 7 May 2021 at 22:45, Pablo Galindo Salgado <pablog...@gmail.com>
> wrote:
>
>> Hi there,
>>
>> We are preparing a PEP and we would like to start some early discussion
>> about one of the main aspects of the PEP.
>>
>> The work we are preparing will allow the interpreter to produce more
>> fine-grained error messages, pointing to the source associated with the
>> instructions that are failing. For example:
>>
>> Traceback (most recent call last):
>>   File "test.py", line 14, in <module>
>>     lel3(x)
>>     ^^^^^^^
>>   File "test.py", line 12, in lel3
>>     return lel2(x) / 23
>>            ^^^^^^^
>>   File "test.py", line 9, in lel2
>>     return 25 + lel(x) + lel(x)
>>                 ^^^^^^
>>   File "test.py", line 6, in lel
>>     return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>>                          ^^^^^^^^^^^^^^^^^^^^^
>> TypeError: 'NoneType' object is not subscriptable
>>
>> The cost of this is storing the start column number and end column
>> number for every bytecode instruction, and this is what we want to
>> discuss (there is also some stack cost to re-raise exceptions, but
>> that's not a big problem in any case). Given that column numbers are
>> not very big compared with line numbers, we plan to store these as
>> unsigned chars or unsigned shorts. We ran some experiments over the
>> standard library and found that the overhead over all pyc files is:
>>
>> * If we use shorts, the total overhead is ~3% (total size 28 MB and the
>> extra size is 0.88 MB).
>> * If we use chars, the total overhead is ~1.5% (total size 28 MB and the
>> extra size is 0.44 MB).
>>
>> One of the disadvantages of using chars is that we can only report
>> columns from 1 to 255, so if an error happens in a column bigger than
>> that we would have to exclude it (and not show the highlighting) for
>> that frame. Unsigned shorts will allow the values to go from 0 to 65535.
>>
>> Unfortunately these numbers are not easily compressible, as every
>> instruction would have very different offsets.
>>
>> There is also the possibility of not doing this based on some build
>> flag or when using -O, to allow users to opt out. But these numbers can
>> be quite useful to other tools such as coverage measuring tools,
>> tracers, profilers and the like, and adding conditional logic in many
>> places would complicate the implementation considerably and potentially
>> reduce the usability of those tools, so we prefer not to have the
>> conditional logic. We believe this extra cost is very much worth the
>> better error reporting, but we understand and respect other points of
>> view.
>>
>> Does anyone see a better way to encode this information **without
>> complicating the implementation a lot**? What are people's thoughts on
>> the feature?
>>
>> Thanks in advance,
>>
>> Regards from cloudy London,
>> Pablo Galindo Salgado
>>
>>
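
The char-versus-short trade-off described in the original message can be
illustrated with a minimal sketch that packs (start, end) column pairs
into a bytes object using one or two bytes per column. This is not the
draft implementation: the function names, the use of 0 as a "no column
information" marker, and the sample columns are all assumptions made for
the example.

```python
import struct

NO_COLUMN = 0  # assumed sentinel meaning "no column information"


def pack_columns(pairs, fmt="B"):
    """Pack (start, end) column pairs using the struct format `fmt`.

    "B" is an unsigned char (0..255), "H" an unsigned short (0..65535).
    A pair that does not fit is stored as (NO_COLUMN, NO_COLUMN), so the
    highlighting for that instruction would simply be skipped.
    """
    limit = (1 << (8 * struct.calcsize(fmt))) - 1
    out = bytearray()
    for start, end in pairs:
        if not (0 < start <= limit and 0 < end <= limit):
            start = end = NO_COLUMN
        out += struct.pack(f"<{fmt}{fmt}", start, end)
    return bytes(out)


def unpack_columns(data, fmt="B"):
    """Inverse of pack_columns; yields None for pairs without columns."""
    result = []
    for start, end in struct.iter_unpack(f"<{fmt}{fmt}", data):
        result.append(None if start == NO_COLUMN else (start, end))
    return result


# Made-up columns; the last pair does not fit in an unsigned char.
pairs = [(5, 12), (26, 47), (300, 310)]
as_chars = pack_columns(pairs, "B")   # 2 bytes per instruction
as_shorts = pack_columns(pairs, "H")  # 4 bytes per instruction

print(len(as_chars), unpack_columns(as_chars, "B"))
print(len(as_shorts), unpack_columns(as_shorts, "H"))
```

In this sketch the short encoding is exactly twice the size of the char
encoding, which matches the ratio between the two overhead estimates in
the original message, and only the char encoding loses the out-of-range
pair.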
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/RICGTXCABZPK7RLDB7SISR4E64S6FEKR/
Code of Conduct: http://python.org/psf/codeofconduct/
