Kristján Valur Jónsson added the comment:
Basically, reuse of strings (and preservation of their interned status) fell
by the wayside somewhere in the 3.x transition. In 2.x, strings have been
reused, and interned strings re-interned, since protocol version 1. This patch
adds that feature back, and uses the same mechanism to reuse not only strings,
but also any other multiply-referenced object.
It is not desirable to simply intern all strings that are read from marshaled
data. Only selected strings are interned by Python during compilation and we
want to keep it that way. Also, 2.x reuses not only interned strings but other
strings as well.
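To make the interning point concrete, here is a small sketch of the behaviour I would expect from version 3 on a CPython build that includes it (3.4 or later). The string contents and variable names are made up for the illustration; only marshal.dumps, marshal.loads and sys.intern are real calls.

```python
import marshal
import sys

# An explicitly interned string; it stays alive in the intern table
# for as long as `interned` references it.
interned = sys.intern("".join(["marshal_", "demo_", "name"]))

# A string built at runtime that nobody interned.
plain = "".join(["not ", "interned ", "text"])

loaded_interned = marshal.loads(marshal.dumps(interned, 3))
loaded_plain = marshal.loads(marshal.dumps(plain, 3))

# The interned string is expected to be re-interned on load, so it comes
# back as the very same object.
print(loaded_interned is interned)

# The plain string comes back equal in value but as a fresh object;
# loading it did not force it into the intern table.
print(loaded_plain == plain, loaded_plain is plain)
```

The identity check only works because the original interned string is still alive when the data is loaded; across processes the loaded string would simply be interned afresh.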
Generalizing reuse of strings to other objects is trivial, and a logical step
forward. This allows optimizations to be made on code objects where common
data are identified and shared as single instances, and those code objects to
be saved and reloaded with that sharing intact.
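As a rough illustration of what reuse of multiply-referenced objects means in practice, the sketch below marshals a tuple that holds the same list twice. It assumes a CPython that ships marshal version 3 (3.4 or later); the list contents are arbitrary.

```python
import marshal

# One list object, referenced twice from the same container.
shared = ["spam", "eggs"]
payload = (shared, shared)

# Version 3 records the second occurrence as a back-reference.
with_refs = marshal.loads(marshal.dumps(payload, 3))

# Version 2 writes the list out twice, so two separate lists come back.
without_refs = marshal.loads(marshal.dumps(payload, 2))

print(with_refs[0] is with_refs[1])        # True: sharing preserved
print(without_refs[0] is without_refs[1])  # False: sharing lost
```

Both results compare equal to the original payload; only the object identity inside the loaded structure differs between the two versions.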
But even without such code-object optimization, the changes are significant:
the size of the marshaled code object of lib/test/test_marshal drops from
24093 bytes in version 2 to 17841 bytes with version 3, without any additional
massaging of the module code object.
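A similar comparison can be run on any code object. The sketch below uses a small made-up module source rather than test_marshal, so the absolute numbers will differ from the ones quoted above and depend on the interpreter version; it assumes a CPython with marshal version 3 available.

```python
import marshal

# Hypothetical module source; the repeated names and the shared filename
# give the version 3 format something to de-duplicate.
SOURCE = '''
def area(width, height):
    return width * height

def perimeter(width, height):
    return 2 * (width + height)

def describe(width, height):
    return "area=%d perimeter=%d" % (area(width, height),
                                     perimeter(width, height))
'''

code = compile(SOURCE, "<demo>", "exec")

size_v2 = len(marshal.dumps(code, 2))
size_v3 = len(marshal.dumps(code, 3))
print("version 2:", size_v2, "bytes")
print("version 3:", size_v3, "bytes")

# The version 3 data still loads back into a working code object.
namespace = {}
exec(marshal.loads(marshal.dumps(code, 3)), namespace)
print(namespace["describe"](2, 3))
```

The saving comes from objects such as the filename and the shared identifier strings being written once and then referenced, instead of being repeated for every nested code object.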
Python tracker <rep...@bugs.python.org>
Python-bugs-list mailing list