On Fri, May 7, 2021 at 3:01 PM Larry Hastings <la...@hastings.org> wrote:
> On 5/7/21 2:45 PM, Pablo Galindo Salgado wrote:
>
> Given that column numbers are not very big compared with line numbers, we
> plan to store these as unsigned chars
> or unsigned shorts. We ran some experiments over the standard library and
> we found that the overhead of all pyc files is:
>
> * If we use shorts, the total overhead is ~3% (total size 28 MB and the
> extra size is 0.88 MB).
> * If we use chars, the total overhead is ~1.5% (total size 28 MB and the
> extra size is 0.44 MB).
>
> One of the disadvantages of using chars is that we can only report columns
> from 1 to 255 so if an error happens in a column
> bigger than that then we would have to exclude it (and not show the
> highlighting) for that frame. Unsigned short will allow
> the values to go from 0 to 65535.
>
> Are lnotab entries required to be a fixed size? If not:
>
> if column < 255:
>     lnotab.write_one_byte(column)
> else:
>     lnotab.write_one_byte(255)
>     lnotab.write_two_bytes(column)
>

If non-fixed size is acceptable, use utf-8 to encode the column number as a
single codepoint number into bytes and you don't even need to write your
own encode/decode logic for a varint.

-gps
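A rough sketch of the UTF-8 idea in Python, for illustration only. The names
encode_column and decode_column are hypothetical and are not part of CPython
or of the lnotab format. The sketch assumes the "surrogatepass" error handler
is acceptable, since plain UTF-8 refuses code points 0xD800 to 0xDFFF, and it
only works for columns up to 0x10FFFF.

```python
def encode_column(column):
    """Encode a column number as the UTF-8 bytes of a single code point.

    Hypothetical helper, not an existing CPython API.
    """
    # chr() raises ValueError for negative values or values above 0x10FFFF.
    # "surrogatepass" lets 0xD800-0xDFFF through, which strict UTF-8 rejects.
    return chr(column).encode("utf-8", "surrogatepass")


def decode_column(data, offset=0):
    """Return (column, next_offset) for the entry starting at data[offset]."""
    first = data[offset]
    # The lead byte of a UTF-8 sequence tells us how many bytes it spans.
    if first < 0x80:
        length = 1
    elif first >> 5 == 0b110:
        length = 2
    elif first >> 4 == 0b1110:
        length = 3
    elif first >> 3 == 0b11110:
        length = 4
    else:
        raise ValueError("not a UTF-8 lead byte")
    chunk = data[offset:offset + length]
    return ord(chunk.decode("utf-8", "surrogatepass")), offset + length


# Round-trip a few columns through one contiguous byte stream.
columns = [0, 80, 127, 128, 255, 2047, 2048, 65535]
stream = b"".join(encode_column(c) for c in columns)

decoded = []
pos = 0
while pos < len(stream):
    value, pos = decode_column(stream, pos)
    decoded.append(value)
```

With this encoding, columns below 128 take one byte, columns from 128 to 2047
take two, and columns from 2048 to 65535 take three, so typical source lines
would cost one byte per entry.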
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/QNWOZWTNFAVPD77KNG4LRYWCEDY3F6HX/
Code of Conduct: http://python.org/psf/codeofconduct/