Kristján Valur Jónsson added the comment:

Ok, I did some tests with my recode module.  The following are the sizes of the 
marshal data:

test2To3 ... 24748 24748 212430 212430
test3To3 ... 18420 17848 178969 174806
test4To3 ... 18425 18411 178969 178550

The columns:
a) test_marshal.py without transform
b) test_marshal.py with recode.intern() (folding common objects)
c) and d) the same two measurements for the decimal.py module (the largest
one in lib)

The lines:
1) Version 2 of the protocol.
2) Version 3 of the protocol (object instancing and the works)
3) Version 4, a dummy version that only instances strings

As expected, there is practically no difference between version 3 and 4 unless 
I employ the recode module to fold common subobjects.  Folding brings an 
additional saving of some 3%, bringing the total reduction relative to 
version 2 up to 28% for test_marshal.py and 18% for decimal.py.
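
For reference, the percentages can be recomputed from the table.  This is only 
a small sketch; the dictionary keys and the helper name reduction() are made 
up for illustration, the byte counts are the ones listed above.

```python
# Sizes in bytes, copied from the table above.
sizes = {
    "test_marshal.py": {"v2": 24748, "v3": 18420, "v3_folded": 17848},
    "decimal.py": {"v2": 212430, "v3": 178969, "v3_folded": 174806},
}


def reduction(before, after):
    """Percentage saved when going from `before` bytes to `after` bytes."""
    return 100.0 * (before - after) / before


for name, s in sizes.items():
    extra = reduction(s["v3"], s["v3_folded"])   # gain from folding alone
    total = reduction(s["v2"], s["v3_folded"])   # gain relative to version 2
    print(f"{name}: folding saves {extra:.1f}% more, total {total:.1f}%")
```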

Note that the transform is a simple recursive folding of objects.  Common 
argument lists, such as (self), are subject to this.  No renaming of local 
variables or other stripping is performed.
So, although the "recode" module is work in progress, and not the subject of 
this "defect", its use shows how important it is to support proper 
instancing in serialization protocols.
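
To give an idea of what such folding might look like, here is a minimal 
sketch.  It is not the recode module: the function name fold(), the pool 
dictionary and the choice of which types to fold are assumptions made for 
this example only.

```python
def fold(obj, pool=None):
    """Return an equal object in which equal immutable subobjects are shared.

    Hypothetical helper: only tuples and a few scalar types are handled.
    """
    if pool is None:
        pool = {}
    if isinstance(obj, tuple):
        # Fold the children first, then key the tuple on their identities.
        obj = tuple(fold(item, pool) for item in obj)
        key = (tuple, tuple(id(item) for item in obj))
    elif isinstance(obj, (int, float, str, bytes)):
        # Key on type and repr so that 1, 1.0 and True stay distinct,
        # as do 0.0 and -0.0.
        key = (type(obj), repr(obj))
    else:
        return obj  # anything else is left untouched
    return pool.setdefault(key, obj)


# Two equal argument lists built at run time are distinct objects ...
args_a = tuple(["self", "other"])
args_b = tuple(["self", "other"])
assert args_a is not args_b

# ... but come out of the fold as a single shared tuple.
folded = fold((args_a, args_b))
assert folded == (args_a, args_b)
assert folded[0] is folded[1]
```

Once equal subobjects are the same object, a protocol that supports instancing 
can write each of them once and refer back to it afterwards.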

Implementation note:  The trick of using a bit flag on the type to indicate a 
slot reservation in the instance list is one that has been in use in CCP's own 
"Marshal" format, a proprietary serialization format based on marshal back in 
2002 (adding many more special opcodes and other stuff).
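
The flag-bit idea can be illustrated with a toy format.  The sketch below is 
neither CCP's format nor the actual marshal byte layout; the type codes, the 
fixed-width fields and the names dump() and load() are assumptions chosen to 
keep the example short.  For simplicity this toy writer flags every object, 
while the reader only reserves a slot when the flag is present.

```python
import struct

FLAG_REF = 0x80  # high bit of the type byte: "reserve a slot for this object"
T_INT, T_STR, T_TUPLE, T_REF = ord("i"), ord("s"), ord("("), ord("r")


def dump(obj, out, seen):
    """Append obj to the bytearray `out`; `seen` maps id(obj) to slot index."""
    if id(obj) in seen:  # already written: emit a back-reference
        out += struct.pack("<BI", T_REF, seen[id(obj)])
        return
    seen[id(obj)] = len(seen)  # slots are numbered in order of first appearance
    if isinstance(obj, int):
        out += struct.pack("<Bq", T_INT | FLAG_REF, obj)
    elif isinstance(obj, str):
        data = obj.encode("utf-8")
        out += struct.pack("<BI", T_STR | FLAG_REF, len(data)) + data
    elif isinstance(obj, tuple):
        out += struct.pack("<BI", T_TUPLE | FLAG_REF, len(obj))
        for item in obj:
            dump(item, out, seen)
    else:
        raise TypeError(f"unsupported type: {type(obj).__name__}")


def load(buf, pos, refs):
    """Read one object from `buf` at `pos`; return (object, new position)."""
    code = buf[pos]
    pos += 1
    kind = code & 0x7F
    if kind == T_REF:
        (index,) = struct.unpack_from("<I", buf, pos)
        return refs[index], pos + 4
    slot = None
    if code & FLAG_REF:
        slot = len(refs)
        refs.append(None)  # reserve the slot before reading any children
    if kind == T_INT:
        (value,) = struct.unpack_from("<q", buf, pos)
        pos += 8
    elif kind == T_STR:
        (n,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        value = buf[pos:pos + n].decode("utf-8")
        pos += n
    elif kind == T_TUPLE:
        (n,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        items = []
        for _ in range(n):
            item, pos = load(buf, pos, refs)
            items.append(item)
        value = tuple(items)
    else:
        raise ValueError(f"unknown type code: {code:#x}")
    if slot is not None:
        refs[slot] = value  # fill in the reserved slot
    return value, pos


args = ("".join(["se", "lf"]), 123)
data = bytearray()
dump((args, args), data, {})
value, end = load(bytes(data), 0, [])
assert value == (args, args) and value[0] is value[1]
```

Reserving the slot before the children are read keeps the reader's slot 
numbers in step with the writer's, since the writer numbers a container 
before its contents.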

Serhiy: There is no reason _not_ to reuse INT objects if we are doing it for 
other immutables too.  As you note, the size of the data is the same. This will 
ensure that integers that are not cached can be folded into the same object, 
e.g. the value 123, if used in two functions, can be the same int object.

I should also point out that the marshal protocol takes care to be able to 
serialize lists, sets and frozensets correctly, the latter being added in 
version 2.4.  This despite the fact that code objects don't make use of these.
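
A quick round trip shows this with the standard marshal module.  The sample 
value is arbitrary and only serves as an illustration.

```python
import marshal

# A list holding a set and a frozenset, none of which marshal needs
# for the purpose described above.
value = [1, {2, 3}, frozenset({"a", "b"})]
restored = marshal.loads(marshal.dumps(value))

assert restored == value
assert type(restored[1]) is set
assert type(restored[2]) is frozenset
```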

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue16475>
_______________________________________