You can certainly get fancy and apply delta encoding + entropy compression, as is done in Parquet, a high-performance data storage format:
https://github.com/apache/parquet-format/blob/master/Encodings.md#delta-encoding-delta_binary_packed--5
(the linked paper from Lemire and Boytsov gives a lot of ideas)

But it would be weird to apply such a level of engineering when we never
bothered compressing docstrings.

Regards

Antoine.


On Fri, 7 May 2021 23:30:46 +0100
Pablo Galindo Salgado <[email protected]> wrote:
> This is actually a very good point. The only disadvantage is that it
> complicates the parsing a bit and we lose the possibility of indexing
> the table by instruction offset.
>
> On Fri, 7 May 2021 at 23:01, Larry Hastings <[email protected]> wrote:
>
> > On 5/7/21 2:45 PM, Pablo Galindo Salgado wrote:
> >
> > Given that column numbers are not very big compared with line numbers, we
> > plan to store these as unsigned chars
> > or unsigned shorts. We ran some experiments over the standard library and
> > we found that the overhead of all pyc files is:
> >
> > * If we use shorts, the total overhead is ~3% (total size 28 MB and the
> > extra size is 0.88 MB).
> > * If we use chars, the total overhead is ~1.5% (total size 28 MB and the
> > extra size is 0.44 MB).
> >
> > One of the disadvantages of using chars is that we can only report columns
> > from 1 to 255 so if an error happens in a column
> > bigger than that then we would have to exclude it (and not show the
> > highlighting) for that frame. Unsigned short will allow
> > the values to go from 0 to 65535.
> >
> > Are lnotab entries required to be a fixed size? If not:
> >
> >     if column < 255:
> >         lnotab.write_one_byte(column)
> >     else:
> >         lnotab.write_one_byte(255)
> >         lnotab.write_two_bytes(column)
> >
> >
> > I might even write four bytes instead of two in the latter case,
> >
> >
> > */arry*
>
_______________________________________________
Python-Dev mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/[email protected]/message/UOCHN5ZY3ERPNWOCO2SJRTCDTEYMYVD7/
Code of Conduct: http://python.org/psf/codeofconduct/
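
For illustration, the escape-byte idea from Larry's quoted pseudocode could be sketched in Python roughly as follows. This is only a minimal sketch, not the actual lnotab format: the names `ESCAPE`, `encode_columns` and `decode_columns` are hypothetical, and the little-endian byte order for the two-byte case is an assumption. Columns below 255 take one byte; anything else is written as the marker byte 255 followed by an unsigned short.

```python
import struct

ESCAPE = 0xFF  # hypothetical marker byte: "a two-byte column follows"


def encode_columns(columns):
    """Pack columns as one byte each, or ESCAPE plus a 2-byte value."""
    out = bytearray()
    for column in columns:
        if not 0 <= column <= 0xFFFF:
            raise ValueError(f"column {column} does not fit in two bytes")
        if column < ESCAPE:
            out.append(column)
        else:
            out.append(ESCAPE)
            # Assumption: little-endian unsigned short.
            out += struct.pack("<H", column)
    return bytes(out)


def decode_columns(data):
    """Inverse of encode_columns; must scan from the start of the buffer."""
    columns = []
    i = 0
    while i < len(data):
        if data[i] < ESCAPE:
            columns.append(data[i])
            i += 1
        else:
            (column,) = struct.unpack_from("<H", data, i + 1)
            columns.append(column)
            i += 3
    return columns


cols = [0, 4, 254, 255, 300]
packed = encode_columns(cols)
assert decode_columns(packed) == cols
```

Note that the decoder has to walk the buffer sequentially, which is the cost Pablo mentions: with variable-size entries the table can no longer be indexed directly by instruction offset.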
