[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned
Serhiy Storchaka added the comment: Patch updated (comment for load_binstring added). -- Added file: http://bugs.python.org/file28097/pickle_nonportable_size_2.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12848 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned
Roundup Robot added the comment:

New changeset 55fe4b57dd9c by Antoine Pitrou in branch '3.2':
Issue #12848: The pure Python pickle implementation now treats object lengths as unsigned 32-bit integers, like the C implementation does.
http://hg.python.org/cpython/rev/55fe4b57dd9c

New changeset c9d205e2dd02 by Antoine Pitrou in branch '3.3':
Issue #12848: The pure Python pickle implementation now treats object lengths as unsigned 32-bit integers, like the C implementation does.
http://hg.python.org/cpython/rev/c9d205e2dd02

New changeset aac6b313ef5f by Antoine Pitrou in branch 'default':
Issue #12848: The pure Python pickle implementation now treats object lengths as unsigned 32-bit integers, like the C implementation does.
http://hg.python.org/cpython/rev/aac6b313ef5f

-- nosy: +python-dev
[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned
Antoine Pitrou added the comment: I've committed the latest patch (pickle_nonportable_size_2.patch). Thank you for working on this! -- resolution: -> fixed stage: patch review -> committed/rejected status: open -> closed
[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned
Antoine Pitrou added the comment:

> Here is a patch for 3.x. It unifies the behavior of the Python and C implementations and of 32- and 64-bit platforms. For backward compatibility, Pickler can pickle up to 2G of data, but Unpickler can unpickle up to 4G on 64-bit.

I agree the right tradeoff is not easy to find, but I don't think we should introduce a regression in _pickle.c just for the sake of making it more consistent with pickle.py's bugs. So _pickle.c's behaviour should probably be preserved, and pickle.py should be improved to accept unpickling 4G pickles.
[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned
Antoine Pitrou added the comment: I'd like to add that anyone wanting to serialize large data will certainly be using _pickle (or its ancestor cPickle), since using pickle.py is probably excruciatingly slow. Meaning we should favour preserving _pickle/cPickle's behaviour over preserving pickle.py's behaviour here.
[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned
Serhiy Storchaka added the comment:

The issue is not only the difference between the Python and C implementations, but also the difference between 32-bit and 64-bit platforms.

pickle.py on 32-bit accepts data up to 2G.
pickle.py on 64-bit accepts data up to 2G.
_pickle.c on 32-bit accepts data up to 2G.
_pickle.c on 64-bit accepts data up to 4G.

That is 3:1 in favor of 2G. The current _pickle.c behavior is just not portable. Of course, I can rewrite the patch, expanding the limit to 4G on 64-bit, if you insist. But I doubt that this is the best variant.

-- nosy: +loewis
[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned
Martin v. Löwis added the comment: IMO, the right solution is to finish PEP 3154, and support large strings in the format. For the time being, I'd claim that signed lengths in the existing implementations are just a bug, and that unsigned lengths are the intended semantics of these opcodes. I can't see anything that is gained by allowing negative lengths. OTOH, I also think that it won't matter much in practice: if you try to unpickle a string with more than 2GiB on a 32-bit system, chances are really high that you run out of memory. So whether any bug fix needs to be backported, I don't know.
[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned
Serhiy Storchaka added the comment: Here is a patch for 3.x which extends supported size to 4G on 64-bit. -- Added file: http://bugs.python.org/file28010/pickle_nonportable_size.patch
[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned
Antoine Pitrou added the comment:

> OTOH, I also think that it won't matter much in practice: if you try to unpickle a string with more than 2GiB on a 32-bit system, chances are really high that you run out of memory.

Agreed. I think this issue is mostly about 64-bit systems, even though we may want the fix to apply to 32-bit systems as well, if it doesn't make things more complicated. And, yes, PEP 3154 should be finished, but it is currently stalled in issue 15642.
[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned
Serhiy Storchaka added the comment: Here is a patch for 3.x. It unifies the behavior of the Python and C implementations and of 32- and 64-bit platforms. For backward compatibility, Pickler can pickle up to 2G of data, but Unpickler can unpickle up to 4G on 64-bit. -- keywords: +patch stage: -> patch review Added file: http://bugs.python.org/file28005/pickle_portable_size.patch
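The asymmetry described in this message (conservative writer, permissive reader) can be sketched as follows. This is only an illustration, not the code from the attached patch: the helper names `pack_length` and `unpack_length` and the two constants are hypothetical, and a 4-byte little-endian length field is assumed.

```python
import struct

MAX_WRITE = 2**31 - 1   # largest length a signed 32-bit reader still accepts
MAX_READ = 2**32 - 1    # largest value a 4-byte unsigned field can hold


def pack_length(n):
    """Hypothetical writer-side check: stay within the signed range."""
    if not 0 <= n <= MAX_WRITE:
        raise OverflowError("length exceeds the conservative 2G write limit")
    return struct.pack('<I', n)


def unpack_length(data):
    """Hypothetical reader-side decode: accept the full unsigned range."""
    n, = struct.unpack('<I', data)
    return n


print(unpack_length(pack_length(10)))
print(unpack_length(b'\xff\xff\xff\xff') == MAX_READ)
```

With this split, anything the writer emits stays readable by an unpatched signed reader, while the reader still accepts lengths above 2G produced elsewhere.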
[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned
Serhiy Storchaka added the comment: The C implementation writes and reads BINBYTES and BINUNICODE up to 4G (on a 64-bit platform). The Python implementation writes and reads BINBYTES and BINUNICODE up to 2G. What should a compatible fix be? Allow the Python implementation to write and read up to 4G? Then Python can pickle large data which can't be unpickled on non-patched Python (and on 2.7). Limit the size to 2G? Then non-patched Python (including 3.1) can pickle data which can't be unpickled on patched Python. There is also the unpleasant fact that 64-bit Python can pickle data which 32-bit Python can't unpickle.
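For readers unfamiliar with the opcodes discussed here, the following sketch builds a tiny BINUNICODE pickle by hand to show where the 4-byte length sits. It uses only the standard `pickle`, `pickletools` and `struct` modules; the variable names are illustrative.

```python
import pickle
import pickletools
import struct

payload = "abc".encode("utf-8")

# BINUNICODE is opcode b'X', followed by a 4-byte little-endian length and
# the UTF-8 data; b'.' is the STOP opcode that ends the pickle.
data = b'X' + struct.pack('<I', len(payload)) + payload + b'.'

value = pickle.loads(data)
opcodes = [op.name for op, arg, pos in pickletools.genops(data)]
print(value, opcodes)
```

The 2G versus 4G question is entirely about how those four length bytes are interpreted once the high bit is set.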
[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned
Serhiy Storchaka added the comment: What if just add 0x? -- nosy: +serhiy.storchaka versions: +Python 3.4
[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned
Serhiy Storchaka added the comment: Ah, for unpacking 32-bit unsigned big-endian bytes you can use len = int.from_bytes(self.read(4), 'big').
[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned
Serhiy Storchaka added the comment: Or you can use len, = struct.unpack('<I', self.read(4)).
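A minimal sketch of the two decoding options suggested above. It assumes the little-endian byte order that the pickle format uses for its 4-byte length fields, hence 'little' and '<I'; the helper `read_uint32_le` is hypothetical and not part of pickle.py.

```python
import io
import struct


def read_uint32_le(stream):
    """Hypothetical helper: read a 4-byte unsigned little-endian length."""
    data = stream.read(4)
    if len(data) != 4:
        raise EOFError("truncated length field")
    return int.from_bytes(data, 'little')


raw = b'\xff\xff\xff\xff'

via_int = read_uint32_le(io.BytesIO(raw))

# struct.unpack always returns a tuple, hence the trailing-comma unpacking.
via_struct, = struct.unpack('<I', raw)

print(via_int, via_struct)
```

Both calls decode the same bytes as an unsigned value, so the choice between them is mostly a matter of speed and readability.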
[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned
Alexandre Vassalotti added the comment:

pickle.py is the buggy one here. Its use of the marshal module is really a hack. Plus, it is slower than both struct and int.from_bytes.

14:40:57 [~/cpython]$ ./python -m timeit "int.from_bytes(b'\xff\xff\xff\xff', 'big')"
100 loops, best of 3: 0.209 usec per loop
14:38:03 [~/cpython]$ ./python -m timeit -s "import struct" "struct.unpack('I', b'\xff\xff\xff\xff')"
1000 loops, best of 3: 0.147 usec per loop
14:37:44 [~/cpython]$ ./python -m timeit -s "import marshal" "marshal.loads(b'i'+b'\xff\xff\xff\xff')"
100 loops, best of 3: 0.236 usec per loop
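The same comparison can be run from a script with the `timeit` module. This is a rough sketch: the loop count is an arbitrary choice, and absolute numbers will differ by machine and Python version, so it is not expected to reproduce the figures quoted above.

```python
import timeit

LOOPS = 100_000  # arbitrary; small enough for a quick run

# (setup, statement) pairs mirroring the three command lines above.
cases = {
    "int.from_bytes": ("pass", r"int.from_bytes(b'\xff\xff\xff\xff', 'big')"),
    "struct.unpack": ("import struct", r"struct.unpack('I', b'\xff\xff\xff\xff')"),
    "marshal.loads": ("import marshal", r"marshal.loads(b'i' + b'\xff\xff\xff\xff')"),
}

results = {}
for name, (setup, stmt) in cases.items():
    total = timeit.timeit(stmt, setup=setup, number=LOOPS)
    results[name] = total / LOOPS * 1e6  # microseconds per loop
    print(f"{name}: {results[name]:.3f} usec per loop")
```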
[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned
New submission from Antoine Pitrou pit...@free.fr: In several opcodes (BINBYTES, BINUNICODE... what else?), _pickle.c happily accepts 32-bit lengths of more than 2**31, while pickle.py uses marshal's 'i' typecode, which means signed, and therefore fails reading the data. Apparently, pickle.py uses marshal for speed reasons, but marshal doesn't support unsigned types. (seen from http://bugs.python.org/issue11564) -- components: Library (Lib) messages: 143065 nosy: alexandre.vassalotti, pitrou priority: normal severity: normal status: open title: pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned type: behavior versions: Python 3.2, Python 3.3
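To make the mismatch concrete, here is a small sketch that decodes the same four length bytes both ways. It assumes a 4-byte little-endian length field with the high bit set; the variable names are illustrative and not taken from pickle.py.

```python
import marshal
import struct

# A 4-byte length field holding 2**31, i.e. the first value past 2G.
raw = struct.pack('<I', 2**31)

# marshal's 'i' typecode decodes a signed 32-bit integer, so the value
# comes back negative.
signed_len = marshal.loads(b'i' + raw)

# An unsigned decode yields the length the C implementation would see.
unsigned_len, = struct.unpack('<I', raw)

print(signed_len, unsigned_len)
```

A negative length is useless to the reader, which is why the signed decode fails on data that the unsigned decode handles.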