[issue16475] Support object instancing and recursion in marshal

2014-04-04 Thread Kristján Valur Jónsson

Changes by Kristján Valur Jónsson krist...@ccpgames.com:


--
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2014-04-04 Thread Kristján Valur Jónsson

Changes by Kristján Valur Jónsson krist...@ccpgames.com:


--
resolution:  - fixed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2013-03-26 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 84e73ace3d7e by Kristjan Valur Jonsson in branch 'default':
Issue #16475: Add a whatsnew entry for 3.4
http://hg.python.org/cpython/rev/84e73ace3d7e

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2013-03-25 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

Thanks, Martin, for clarifying this.
Glad you are seeing improvements Amaury.  In fact, large modules can get 
additional improvements by identifying identical generated constructs from the 
compiler, like tuples and strings.  The compiler isn't very good at this to 
begin with.  I should probably work some more on my recode module and make it 
public.

I am unsure about how whatsnew is handled these days.  Is it incrementally 
updated or managed by someone?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2013-03-25 Thread Ezio Melotti

Ezio Melotti added the comment:

 I am unsure about how whatsnew is handled these days.
 Is it incrementally updated or managed by someone?

It's better to add at least a stub to the whatsnew, even if someone will 
eventually go through it before the release.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2013-03-24 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc added the comment:

Sorry, what does instancing mean?
And does this change bring interesting features?
And is there an impact on regular .pyc files?

--
nosy: +amaury.forgeotdarc

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2013-03-24 Thread Martin v . Löwis

Martin v. Löwis added the comment:

 Sorry, what does instancing mean?

He means keeping track of instance identities, so that objects 
that were shared before marshal continue to be shared after loading.

 And does this change bring interesting features?

interesting to whom?

 And is there an impact on regular .pyc files?

Definitely. It may now contain 't' (interned) codes again, and it may contain 
'r' (reference) codes. The size of the pyc files may decrease.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2013-03-24 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc added the comment:

 The size of the pyc files may decrease

This is very good news! Indeed, I noticed decimal.cpython-34.pyc going from 
212k to 178k.  17% less!
This is worth an entry in whatsnew/3.4.rst IMO.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2013-03-21 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

Yes, they were fixed with #42bf74b90626 which also added unittests in 
test_marshal.py to make sure invalid EOFs are always caught.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2013-03-20 Thread Ezio Melotti

Ezio Melotti added the comment:

I'm getting two failures after this:
==
ERROR: testRaising (test.test_exceptions.ExceptionTests)
--
Traceback (most recent call last):
  File /home/wolf/dev/py/py3k/Lib/test/test_exceptions.py, line 51, in 
testRaising
marshal.loads(b'')
ValueError: bad marshal data (unknown type code)

--

and

==
ERROR: test_no_marshal 
(test.test_importlib.source.test_file_loader.SourceLoaderBadBytecodeTest)
--
Traceback (most recent call last):
  File /home/wolf/dev/py/py3k/Lib/test/test_importlib/source/util.py, line 
23, in wrapper
to_return = fxn(*args, **kwargs)
  File 
/home/wolf/dev/py/py3k/Lib/test/test_importlib/source/test_file_loader.py, 
line 364, in test_no_marshal
self._test_no_marshal()
  File 
/home/wolf/dev/py/py3k/Lib/test/test_importlib/source/test_file_loader.py, 
line 265, in _test_no_marshal
self.import_(file_path, '_temp')
  File 
/home/wolf/dev/py/py3k/Lib/test/test_importlib/source/test_file_loader.py, 
line 194, in import_
module = loader.load_module(module_name)
  File frozen importlib._bootstrap, line 572, in _check_name_wrapper
  File frozen importlib._bootstrap, line 1032, in load_module
  File frozen importlib._bootstrap, line 1013, in load_module
  File frozen importlib._bootstrap, line 548, in module_for_loader_wrapper
  File frozen importlib._bootstrap, line 869, in _load_module
  File frozen importlib._bootstrap, line 990, in get_code
  File frozen importlib._bootstrap, line 668, in _compile_bytecode
ValueError: bad marshal data (unknown type code)

==
ERROR: test_no_marshal 
(test.test_importlib.source.test_file_loader.SourcelessLoaderBadBytecodeTest)
--
Traceback (most recent call last):
  File 
/home/wolf/dev/py/py3k/Lib/test/test_importlib/source/test_file_loader.py, 
line 470, in test_no_marshal
self._test_no_marshal(del_source=True)
  File 
/home/wolf/dev/py/py3k/Lib/test/test_importlib/source/test_file_loader.py, 
line 265, in _test_no_marshal
self.import_(file_path, '_temp')
  File 
/home/wolf/dev/py/py3k/Lib/test/test_importlib/source/test_file_loader.py, 
line 194, in import_
module = loader.load_module(module_name)
  File frozen importlib._bootstrap, line 1099, in load_module
  File frozen importlib._bootstrap, line 548, in module_for_loader_wrapper
  File frozen importlib._bootstrap, line 869, in _load_module
  File frozen importlib._bootstrap, line 1105, in get_code
  File frozen importlib._bootstrap, line 668, in _compile_bytecode
ValueError: bad marshal data (unknown type code)

--

--
nosy: +ezio.melotti

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2013-03-20 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Indeed, would be nice to fix the test failures.

Besides, it would be extra nice if you could run the test suite *before* 
pushing your changes. Otherwise you're wasting everyone else's time.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2013-03-20 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

This should not have happened and it was indeed all tested. I'll investigate 
why these errors are happening.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2013-03-20 Thread Roundup Robot

Roundup Robot added the comment:

New changeset f4c21179690b by Kristján Valur Jónsson in branch 'default':
Issue #16475: Simplify the interface to r_ref_allocate and improve comments.
http://hg.python.org/cpython/rev/f4c21179690b

New changeset 42bf74b90626 by Kristján Valur Jónsson in branch 'default':
Issue #16475 : Correctly handle the EOF when reading marshal streams.
http://hg.python.org/cpython/rev/42bf74b90626

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2013-03-19 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 01372117a5b4 by Kristján Valur Jónsson in branch 'default':
Issue #16475: Support object instancing, recursion and interned strings
http://hg.python.org/cpython/rev/01372117a5b4

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2013-03-19 Thread Benjamin Peterson

Benjamin Peterson added the comment:

I don't understand some of this code. Why does r_ref_reserve take a first 
parameter which it just returns on success without using?

--
nosy: +benjamin.peterson

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2013-03-19 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

I thought I had explained this already, but can't find it, so here is the 
explanation:
It matches the pattern of r_ref(), which semantically is a combination of 
r_ref_register() and r_ref_insert().

It is a convenence calling pattern because these functions will either:
a) pass a NULL operand through,
b) succeed and pass the non-NULL operatn through
c) fail, and release the operand and return NULL.
In all cases, it is sufficient to look at the value of the operand after this 
transformation to know if everything was allright.  This simplifies all the 
calling sites where these things are used, e.g. by the R_REF() macro.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2013-03-19 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

I should add comments explaining this to the file.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-21 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

Code objects can indeed be shared.
One thing that the recode module does, or allows you to do, is to strip file 
and line number information from code objects.  This will theoretically allow 
them to be collapsed.

Martin, I agree the .pyc size matters.  You are right, priorities vary.  I am 
mainly focused on memory use, while others may be looking at disk use.  Disk 
use can of course be reduced by using tools like zip.  And code objects can be 
re-optimized at load time too using special importers.  But it is nice to be 
able to achieve both objectives by enabling the marshal format to preserve 
those optimizations that are performed on it prior to saving it.

I'm currently working on the recode module.  When its done, I'll report back 
and share it with you so that you can toy around with it.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-20 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

Antoine, I understand that _you_ may not see any need for object references in 
marshal streams.  Also, I am not going to try to convince you it is a good 
idea, since I have long figured out that you are against any contributions from 
me on some sort of principle.

However, even if you cannot agree that it is a good idea, can you explain to me 
how it is a BAD idea?  How can expanding object references to strings to all 
objects, using the same mechanism, be bad?  How can it be bad to make the 
marshal format more complete with minimal effort?  Keep in mind that this 
change removes a number of warnings and caveats present both in the 
documentation and the in-line comments.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-20 Thread Mark Dickinson

Mark Dickinson added the comment:

 I have long figured out that you are against any contributions from
 me on some sort of principle.

I suspect that Antoine's principles have very little to do with *who* the 
contributions originate from, and much more to do with the content of those 
contributions.

We've all got the same overarching goal of improving and maintaining the 
quality of Python.  Please can we not make this personal?

--
nosy: +mark.dickinson

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-20 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

Serhiy, to answer your questions:

 Can you please measure the time of unmarshalling?
Sure. Normally, I wouldn't consider time to be important here because for code 
objects, unmarshaling is done only once per run  However, testing it is simple:
I added timeit(number=100) for unmarshaling of the different version code.
0.09614 0.09605 0.582672 0.553599
0.05590 0.05628 0.293370 0.285517
0.05703 0.05754 0.294601 0.292070
The lines are version 2, 3, and 4(*) respectively.  The columns are 
test_marshal.py, recode.intern(test_marshal.py), decimal.py and 
recode.intern(decimal.py)

As you see, loading time is almost halfed with version 3 and 4 compared to 2.  
Version 3 is also slightly faster than 4
(*)verion 4 is a special version that only instances strings.


 If you can count the statistics about marshalled types
 (what percent of shared and non shared integers, strings, etc),
 it would also be very interesting.
That's more tricky, at least on a type=by-type basis, but I could do a global 
object count.  Later.

 There is at least one reason. This increases size of the refs table.
I checked this, by printing out the size of the instance list when loads() was 
done:
457 1571
297 1163
429 1539
The columns are test_marshal.py and decimal.py
the lines are version 3, version 4( only strings ) and special version5 which 
is like 3 but omits ints.
As you see, the ints correspond to roughly 6% and 2% of the instances 
respectively.  The bulk of the list is taken up by strings (65% and 74%)

This shows that adding instancing of all other types on top of the strings does 
not typically expand the instance list more than 50%

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-20 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

New patch, incorporating suggested fixes from review.

--
Added file: http://bugs.python.org/file28047/marshalinstance.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-20 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Thank you, Kristján, for the statistics. It makes your proposition more 
attractive.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-20 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

New patch with changes.

--
Added file: http://bugs.python.org/file28048/marshalinstance.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-20 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Personally I don't like the use of macros inn this code. I think that without 
them the code would be clearer.

If anyone is interested, here are the statistics for all the standard modules 
(Lib/__pycache__/*.pyc).

UNICODE 105248  61%
TUPLE33193  19%
STRING   13527  7.8%
INT   7373  4.3%
CODE  6438  3.7%
NONE  5291  3.1%
TRUE   474  0.27%
FALSE  401  0.23%
BINARY_FLOAT   340  0.2%
LONG60  0.035%
FROZENSET   20  0.012%

Strings (unicode and bytes), tuples, short ints, code objects and None in sum 
are 99% of all objects. Mutable collections, complex numbers, Ellipsis and 
StopIteration are not used at all.

If size of compiled modules is a problem, we can get about 10% by using more 
compact representation for sizes (1- or 2-bytes). This requires additional 
codes for strings and collections (at least for unicode strings and tuples).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-20 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

Changes as suggested by Serhiy

--
Added file: http://bugs.python.org/file28052/marshalinstance.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-20 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

The size of the .pyc files is secondary.  The size that is important is the 
memory footprint of loaded code objects, which can be done by stripping and 
folding code objects.
This works springs out of work for embedding python on the PS3 console where 
every byte counts.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-20 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 However, even if you cannot agree that it is a good idea, can you
 explain to me how it is a BAD idea?  How can expanding object
 references to strings to all objects, using the same mechanism, be
 bad?

It is a bad idea because features have to be supported in the long-term,
which means more maintenance effort. So, basically, this is the same
reason we don't accept every feature request + patch that gets posted
to the tracker.

And, again, I think the string interning part is a good thing.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-20 Thread Antoine Pitrou

Antoine Pitrou added the comment:

By the way, please follow PEP 8 in test_marshal.py.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-20 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Here is the statistics for all pyc-files (not only in Lib/__pycache__). This 
includes encoding tables and tests. I count also memory usage for some types 
(for tuples shared size is estimated upper limit).

type count %  size   shared %

UNICODE 622812 58%26105085 14885090 57%
TUPLE   224214 21% 8184848  3498300 43%
STRING   90992 8.4%6931342   832224 12%
INT  52087 4.8% 71540058666 8.2%
CODE 42147 3.9%28659960 0%
NONE 39777 3.7%  
BINARY_FLOAT  3120 0.29% 
TRUE  2363 0.22% 
FALSE 1976 0.18% 
LONG  1012 0.094%
ELLIPSIS   528 0.049%
BINARY_COMPLEX 465 0.043%
FROZENSET   24 0.0022%

Total  1081517 100%   44802671 19274280 ~43%


Yet some statistics. If we import all standard modules, we can save using 
object sharing 2.3 MB of total 4.6 MB (50%) memory size for unicode objects, up 
to 0.47 MB of 1.25 MB for tuples (approximate upper limit), 84 KB of 117 KB 
(7%) for bytes objects, and only 2 KB of 101 KB for integers (most integers 
already interned). None of code objects can be shared (because code object 
contains almost unique first line number).

Therefore there is a sense to share unicode objects, tuples, and may be bytes 
objects. Most integers (in range -5..257) already interned. None of code 
objects can be shared (because code object contains almost unique first line 
number). Floats, complexes and frozensets unlikely save much of memory.

--
Added file: http://bugs.python.org/file28055/marshalstat.py

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-20 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Here is the statistics for all pyc-files (not only in Lib/__pycache__). This 
includes encoding tables and tests. I count also memory usage for some types 
(for tuples shared size is estimated upper limit).

type count %  size   shared %

UNICODE 622812 58%26105085 14885090 57%
TUPLE   224214 21% 8184848  3498300 43%
STRING   90992 8.4%6931342   832224 12%
INT  52087 4.8% 71540058666 8.2%
CODE 42147 3.9%28659960 0%
NONE 39777 3.7%  
BINARY_FLOAT  3120 0.29% 
TRUE  2363 0.22% 
FALSE 1976 0.18% 
LONG  1012 0.094%
ELLIPSIS   528 0.049%
BINARY_COMPLEX 465 0.043%
FROZENSET   24 0.0022%

Total  1081517 100%   44802671 19274280 ~43%


Therefore there is a sense to share unicode objects, tuples, and may be bytes 
objects. Most integers (in range -5..257) already interned. None of code 
objects can be shared (because code object contains almost unique first line 
number). Floats, complexes and frozensets unlikely save much of memory.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-20 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
Removed message: http://bugs.python.org/msg176012

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-20 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 Here is the statistics for all pyc-files (not only in
 Lib/__pycache__). This includes encoding tables and tests. I count
 also memory usage for some types (for tuples shared size is estimated
 upper limit).

Did you examine the sharing per file or among all files?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-20 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Per file. With total sharing:

UNICODE 622812 58%26105085 18262484 70%
TUPLE   224214 21% 8184848  4007404 49%
STRING   90992 8.4%6931342  1361618 20%
INT  52087 4.8% 71540095666 13%
CODE 42147 3.9%28659960 0%

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-20 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Total size of all *.pyc files is 22 MB.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-20 Thread Martin v . Löwis

Martin v. Löwis added the comment:

Am 20.11.12 17:32, schrieb Kristján Valur Jónsson:
 The size of the .pyc files is secondary.

This really depends on whom you ask. When I did the string interning
support, the primary concern *was* for the size of the pyc files, and
there *was* a real project where it mattered (namely, code size in
a frozen application). It may be secondary to *you* and *now*.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-20 Thread Martin v . Löwis

Martin v. Löwis added the comment:

Am 20.11.12 18:02, schrieb Antoine Pitrou:
 It is a bad idea because features have to be supported in the long-term,
 which means more maintenance effort. So, basically, this is the same
 reason we don't accept every feature request + patch that gets posted
 to the tracker.

For marshal, this actually is of less concern - we are free to change it
whenever we please (and people actually did change it when they
pleased).

Of course, there still must be a demonstrated gain, and that must be
significant enough to justify the size of the change (in diffstat).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-19 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

If you have string sharing, adding support for general sharing falls 
automatically out without any effort.  There is no reason _not_ to support it, 
in other words.
Marshal may be primarily used for .pyc files but it is not the only usage.  It 
is a very fast and powerful serializer for data that is not subject to the 
overhead or safety concerns of the general pickle protocol.  This is 
illustrated by the following code (2.7):

case TYPE_CODE:
if (PyEval_GetRestricted()) {
PyErr_SetString(PyExc_RuntimeError,
cannot unmarshal code objects in 
restricted execution mode);
Obviously, this shows that marshal is still expected to work and be useful even 
if not for pickling code objects.

It is good to know that you care about the size of the .pyc files, Martin.  But 
we should bear in mind that this size difference is directly reflected in the 
memory use of the loaded data.  A reduction by 25% of the .pyc size is roughly 
equivalent to a 25% memory use reduction by the loaded code object.

I haven't produced data about the savings of general object reuse because it 
relies on my recode code optimizer module which is still work in progress.  
However, I will do some tests and let you know.  Suffice to say that it is 
enormously frustrating to re-generate code objects with an optimization tool, 
sharing common or identical sub-objects and so on, and then finding that the 
marshal module undoes all of that.

I'll report back with additional figures.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-19 Thread Christian Heimes

Changes by Christian Heimes li...@cheimes.de:


--
nosy: +christian.heimes

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-19 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

There is no many sense to use references for TYPE_INT whose representation size 
not greater then a reference representation size.  I doubt about references to 
mutable objects.

--
nosy: +serhiy.storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-19 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

Ok, I did some tests with my recode module.  The following are the sizes of the 
marshal data:

test2To3 ... 24748 24748 212430 212430
test3To3 ... 18420 17848 178969 174806
test4To3 ... 18425 18411 178969 178550

The columns:
a) test_marshal.py without transform
b) test_marshal.py with recode.intern() (folding common objects)
c) and d): decimal.py module (the largest one in lib)

The lines:
1) Version 2 of the protocol.
2) Version 3 of the protocol (object instancing and the works)
3) Version 4, an dummy version that only instances strings)

As expected, there is no difference between version 3 and 4 unless I employ the 
recode module to fold common subobjects.  This brings an additional saving of 
some 3% bringing the total reduction up to 28% and 
18% respectively.

Note that the transform is a simple recursive folding of objects.  common 
argument lists, such as (self) are subject to this.  No renaming of local 
variables or other stripping is performed.
So, although the recode module is work in progress, and not the subject of 
this defect, its use shows how it is important to be able to support proper 
instancing in serialization protocols.

Implementation note:  The trick of using a bit flag on the type to indicate a 
slot reservation in the instance list is one that has been in use in CCP´s own 
Marshal format, a proprietary serialization format based on marshal back in 
2002 (adding many more special opcodes and other stuff)

Serhiy: There is no reason _not_ to reuse INT objects if we are doing it for 
other immutables to.  As you note, the size of the data is the same. This will 
ensure that integers that are not cached can be folded into the same object, 
e.g. the value 123, if used in two functions, can be the same int object.

I should also point out that the marshal protocol takes care to be able to 
serialize lists, sets and frozensets correctly, the latter being added in 
version 2.4.  This despite the fact that code objects don't make use of these.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-19 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

 The following are the sizes of the marshal data:

Can you please measure the time of unmarshalling? It would be interesting. If 
you can count the statistics about marshalled types (what percent of shared and 
non shared integers, strings, etc), it would also be very interesting.

 There is no reason _not_ to reuse INT objects if we are doing it for other 
 immutables to.

There is at least one reason. This increases size of the refs table.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-19 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 I should also point out that the marshal protocol takes care to be
 able to serialize lists, sets and frozensets correctly, the latter
 being added in version 2.4.  This despite the fact that code objects
 don't make use of these.

Code objects do use frozensets:

 def f(x):
... return x in {1,2,3,4,5,6}
... 
 dis.dis(f)
  2   0 LOAD_FAST0 (x) 
  3 LOAD_CONST   7 (frozenset({1, 2, 3, 4, 5, 6})) 
  6 COMPARE_OP   6 (in) 
  9 RETURN_VALUE 

I don't think marshal supports any type that isn't (or hasn't been)
used in code objects.

 Obviously, this shows that marshal is still expected to work and be
 useful even if not for pickling code objects.

The module officially intended for general-purpose serialization is
pickle; if you use marshal for such a purpose, you're on your own.
If you think pickle is not good enough, your improvements are welcome.

 As expected, there is no difference between version 3 and 4 unless
 I employ the recode module to fold common subobjects.  This brings
 an additional saving of some 3% bringing the total reduction up to
 28% and 18% respectively.

3% doesn't sound like a worthwhile improvement at all.

 The trick of using a bit flag on the type to indicate a slot
 reservation in the instance list is one that has been in use in
 CCP´s own Marshal format, a proprietary serialization format
 based on marshal back in 2002 (adding many more special opcodes
 and other stuff)

Why don't you release your proprietary marshal on pypi? You would
get feedback and a sense of whether people are interested in your
approach.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-18 Thread Gregory P. Smith

Changes by Gregory P. Smith g...@krypto.org:


--
nosy: +gregory.p.smith

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-17 Thread Antoine Pitrou

Antoine Pitrou added the comment:

I agree that restoring the string interning behaviour would be a good thing.
However, I don't agree that supporting recursive objects and instantiation is 
useful. marshal is specialized for code objects, and you shouldn't find any 
recursive constants there.

As for the size of pyc files, who cares? Memory footprint may be useful to 
shrink (especially for cache efficiency reasons), but I don't see why we should 
try to reduce the size of on-disk bytecode. And if we do, it would probably be 
simpler to zlib-compress them.

--
nosy: +loewis

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-17 Thread Martin v . Löwis

Martin v. Löwis added the comment:

When I added interning support to marshal, I specifically cared about the size 
of pyc. I find it sad that this support was thrown out, so I support restoring 
it.

I'm also skeptical about general sharing, and would like to see some specific 
numbers pointing out the gain of such a mechanism (compared to a version that 
merely preserves interned strings).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-16 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

This change is specifically aimed at code objects.
As it is, it is impossible to produce code objects that share common data (e.g. 
filename strings, common tuples, name strings, etc) that don't unserialize to 
separate objects.

Also, separately but related (see second patch) the effort spent in interning 
names when compiling, is lost when code objects are loaded back from disk.

This change is based on work done at CCP to reduce the size of compiled code in 
memory.  Simple preprocessing of code objects prior to writing them to disk can 
result in important memory savings.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-16 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 This change is specifically aimed at code objects.
 As it is, it is impossible to produce code objects that share common
 data (e.g. filename strings, common tuples, name strings, etc) that
 don't unserialize to separate objects.

Shouldn't strings be interned when the code object is unmarshalled?

 This change is based on work done at CCP to reduce the size of
 compiled code in memory.  Simple preprocessing of code objects prior
 to writing them to disk can result in important memory savings.

How important?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-16 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

Basically, reuse of strings (and preservation of their internment status) fell 
by the wayside somewhere in the 3.x transition.  Strings have been reused, and 
interned strings re-interned, since protocol version 1 in 2.x.  This patch adds 
that feature back, and uses that mechanism to reuse not only strings, but also 
any other multiply-referenced object.

It is not desirable to simply intern all strings that are read from marshaled 
data.  Only selected strings are interned by python during compilation and we 
want to keep it that way.  Also, 2.x reuses not only interned strings but other 
strings as well.

Generalizing reuse of strings to other objects is trivial, and a logical step 
forward.  This allows optimizations to be made on code objects where common 
data are identified and instanced, and those code objects to be saved and 
reloaded with that instancing intact.

But even without such code-object optimization, the changes are significant:
The sizes of the marshaled code object of lib/test/test_marshal drops from 
24093 bytes in version 2 to 17841 bytes with version 3, without any additional 
massaging of the module code object.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-15 Thread Kristján Valur Jónsson

New submission from Kristján Valur Jónsson:

The format used by the marshal module does not support instancing.  This 
precludes certain data optimizations, such as sharing string constants, common 
tuples, even common code objects.  Since the marshal format is used to write 
compiled code, this makes it impossible to do data optimization on code prior 
to writing it out.

This patch adds proper instancing support for all the supported types and 
increases the default version to three.

(A separate defect/regression is that interned strings are no longer preserved 
as was implemented in version 1 of the original 2.x branch.  This also negates 
any interning done at compile time.  That is a separate defect.)

--
components: Interpreter Core
files: marshalinstance.patch
keywords: patch
messages: 175603
nosy: kristjan.jonsson
priority: normal
severity: normal
status: open
title: Support object instancing and recursion in marshal
type: enhancement
versions: Python 3.4
Added file: http://bugs.python.org/file27987/marshalinstance.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-15 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

Second patch which adds the missing internment support for strings, including 
unittests.

--
Added file: http://bugs.python.org/file27988/marshalinstance.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16475] Support object instancing and recursion in marshal

2012-11-15 Thread Antoine Pitrou

Antoine Pitrou added the comment:

marshal is only supposed to be used to serialize code objects, not arbitrary 
user data. Why don't you use pickle?

--
nosy: +pitrou

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16475
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com