On Fri, May 7, 2021 at 3:01 PM Larry Hastings <la...@hastings.org> wrote:

> On 5/7/21 2:45 PM, Pablo Galindo Salgado wrote:
>
> Given that column numbers are not very big compared with line numbers, we
> plan to store these as unsigned chars
> or unsigned shorts. We ran some experiments over the standard library and
> we found that the overhead of all pyc files is:
>
> * If we use shorts, the total overhead is ~3% (total size 28 MB and the
> extra size is 0.88 MB).
> * If we use chars, the total overhead is ~1.5% (total size 28 MB and the
> extra size is 0.44 MB).
>
> One of the disadvantages of using chars is that we can only report columns
> from 1 to 255 so if an error happens in a column
> bigger than that then we would have to exclude it (and not show the
> highlighting) for that frame. Unsigned short will allow
> the values to go from 0 to 65535.
>
> Are lnotab entries required to be a fixed size?  If not:
>
> if column < 255:
>     lnotab.write_one_byte(column)
> else:
>     lnotab.write_one_byte(255)
>     lnotab.write_two_bytes(column)
>
If non-fixed size is acceptable, use UTF-8 to encode the column number as
a single code point and you don't even need to write your own
encode/decode logic for a varint.
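
A minimal sketch of that idea, assuming column numbers fit in 0..65535 as
in the unsigned short proposal above. The helper names encode_column and
decode_columns are hypothetical and not part of any existing lnotab code.
One wrinkle: values 0xD800..0xDFFF are surrogates, which the strict UTF-8
codec rejects, so the sketch uses the "surrogatepass" error handler to let
them through.

```python
def encode_column(column: int) -> bytes:
    """Encode one column number as the UTF-8 bytes of a single code point."""
    if not 0 <= column <= 0xFFFF:
        raise ValueError(f"column out of range: {column}")
    # Columns below 128 take 1 byte, below 2048 take 2 bytes, the rest 3.
    # "surrogatepass" is needed because 0xD800..0xDFFF are not valid in
    # strict UTF-8 (an assumption of this sketch, not of the proposal).
    return chr(column).encode("utf-8", "surrogatepass")


def decode_columns(data: bytes) -> list:
    """Decode a concatenated run of encoded columns back into integers."""
    # UTF-8 is self-delimiting, so no explicit length prefix is required.
    return [ord(ch) for ch in data.decode("utf-8", "surrogatepass")]


columns = [0, 79, 200, 5000, 0xD800, 65535]
table = b"".join(encode_column(c) for c in columns)
assert decode_columns(table) == columns
```

Most source lines are shorter than 128 columns, so the common case would
stay at one byte per entry, the same as the unsigned char option.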

-gps
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QNWOZWTNFAVPD77KNG4LRYWCEDY3F6HX/