[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned

2012-11-24 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Patch updated (comment for load_binstring added).

--
Added file: http://bugs.python.org/file28097/pickle_nonportable_size_2.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12848
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned

2012-11-24 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 55fe4b57dd9c by Antoine Pitrou in branch '3.2':
Issue #12848: The pure Python pickle implementation now treats object lengths 
as unsigned 32-bit integers, like the C implementation does.
http://hg.python.org/cpython/rev/55fe4b57dd9c

New changeset c9d205e2dd02 by Antoine Pitrou in branch '3.3':
Issue #12848: The pure Python pickle implementation now treats object lengths 
as unsigned 32-bit integers, like the C implementation does.
http://hg.python.org/cpython/rev/c9d205e2dd02

New changeset aac6b313ef5f by Antoine Pitrou in branch 'default':
Issue #12848: The pure Python pickle implementation now treats object lengths 
as unsigned 32-bit integers, like the C implementation does.
http://hg.python.org/cpython/rev/aac6b313ef5f

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12848
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned

2012-11-24 Thread Antoine Pitrou

Antoine Pitrou added the comment:

I've committed the latest patch (pickle_nonportable_size_2.patch). Thank you 
for working on this!

--
resolution:  - fixed
stage: patch review - committed/rejected
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12848
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned

2012-11-17 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 Here is a patch for 3.x. It unify behavior of Python and C
 implementations and unify behavior on 32- and 64-bit platforms.  For 
 backward compatibility Pickler can pickle up to 2G data, but Unpickler 
 can unpickle up to 4G on 64-bit.

I agree the right tradeoff is not easy to find, but I don't think we should 
introduce a regression in _pickle.c just for the sake of making it more 
consistent with pickle.py's bugs. So _pickle.c behaviour should probably be 
preserved, and pickle.py should be improved to accept unpickling 4G pickles.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12848
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned

2012-11-17 Thread Antoine Pitrou

Antoine Pitrou added the comment:

I'd like to add that anyone wanting to serialize large data will certainly be 
using _pickle (or its ancestor cPickle), since using pickle.py is probably 
excruciatingly slow. Meaning we should favour preserving _pickle/cPickle's 
behaviour over preserving pickle.py's behaviour here.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12848
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned

2012-11-17 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

The issue is not only in difference between Python and C implementations, but 
also between 32-bit and 64-bit.

pickle.py on 32-bit accepts data up to 2G.
pickle.py on 64-bit accepts data up to 2G.
_pickle.c on 32-bit accepts data up to 2G.
_pickle.c on 64-bit accepts data up to 4G.

3:1 for 2G.  Current _pickle.c behavior is just not portable.

Of course, I can rewrite the patch, expanding the limit to 4G on 64-bit if you 
insist.  But I doubt that this is the best variant.

--
nosy: +loewis

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12848
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned

2012-11-17 Thread Martin v . Löwis

Martin v. Löwis added the comment:

IMO, the right solution is to finish PEP 3154, and support large strings in the 
format.

For the time being, I'd claim that signed length in the existing 
implementations are just a bug, and that unsigned lengths are the intended 
semantics of these opcodes. I can't see anything that is gained by allowing 
negative lengths.

OTOH, I also think that it won't matter much in practive: if you try to 
unpickle a string with more than 2GiB on a 32-bit system, chances are really 
high that you run out of memory. So whether any bug fix needs to be backported, 
I don't know.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12848
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned

2012-11-17 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Here is a patch for 3.x which extends supported size to 4G on 64-bit.

--
Added file: http://bugs.python.org/file28010/pickle_nonportable_size.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12848
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned

2012-11-17 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 OTOH, I also think that it won't matter much in practive: if you try to 
 unpickle a string with more than 2GiB on a 32-bit system, chances are
 really high that you run out of memory.

Agreed. I think this issue is mostly about 64-bit systems, even though we may 
want to fix to apply to 32-bit systems as well, if it doesn't make things more 
complicated.

And, yes, PEP 3154 should be finished, but it is currently stalled in issue 
15642.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12848
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned

2012-11-16 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Here is a patch for 3.x. It unify behavior of Python and C implementations and 
unify behavior on 32- and 64-bit platforms.  For backward compatibility Pickler 
can pickle up to 2G data, but Unpickler can unpickle up to 4G on 64-bit.

--
keywords: +patch
stage:  - patch review
Added file: http://bugs.python.org/file28005/pickle_portable_size.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12848
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned

2012-11-15 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

The C implementation writes and reads BINBYTES and BINUNICODE up to 4G (on 
64-bit platform). The Python implementation writes and reads BINBYTES and 
BINUNICODE up to 2G. What should be compatible fix? Allow the Python 
implementation to write and read up to 4G? Then Python can pickle a large data 
which can't be unpickled on non-patched Python (and on 2.7). Limit size to 2G? 
Then non-patched Python (including 3.1) can pickle a data which can't be 
unpickled on patched Python.

Also there is an unpleasant fact that 64-bit Python can pickle data which can't 
unpickle 32-bit Python.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12848
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned

2012-11-06 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

What if just add   0x?

--
nosy: +serhiy.storchaka
versions: +Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12848
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned

2012-11-06 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Ah, for unpacking 32-bit unsigned big-endian bytes you can use len = 
int.from_bytes(self.read(4), 'big').

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12848
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned

2012-11-06 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Or you can use len = struct.unpack('I', self.read(4)).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12848
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned

2012-11-06 Thread Alexandre Vassalotti

Alexandre Vassalotti added the comment:

pickle.py is the buggy one here. Its use of the marshal module is really a 
hack. Plus, it is slower than both struct and int.from_bytes.

14:40:57 [~/cpython]$ ./python -m timeit int.from_bytes(b'\xff\xff\xff\xff', 
'big')
100 loops, best of 3: 0.209 usec per loop
14:38:03 [~/cpython]$ ./python -m timeit -s import struct 
struct.unpack('I', b'\xff\xff\xff\xff')
1000 loops, best of 3: 0.147 usec per loop
14:37:44 [~/cpython]$ ./python -m timeit -s import marshal 
marshal.loads(b'i'+b'\xff\xff\xff\xff')
100 loops, best of 3: 0.236 usec per loop

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12848
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned

2011-08-27 Thread Antoine Pitrou

New submission from Antoine Pitrou pit...@free.fr:

In several opcodes (BINBYTES, BINUNICODE... what else?), _pickle.c happily 
accepts 32-bit lengths of more than 2**31, while pickle.py uses marshal's i 
typecode which means signed... and therefore fails reading the data.
Apparently, pickle.py uses marshal for speed reasons, but marshal doesn't 
support unsigned types.

(seen from http://bugs.python.org/issue11564)

--
components: Library (Lib)
messages: 143065
nosy: alexandre.vassalotti, pitrou
priority: normal
severity: normal
status: open
title: pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned
type: behavior
versions: Python 3.2, Python 3.3

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12848
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com