[issue13505] Bytes objects pickled in 3.x with protocol <=2 are unpickled incorrectly in 2.x

2011-12-13 Thread Roundup Robot

Roundup Robot <devn...@psf.upfronthosting.co.za> added the comment:

New changeset 14695b4825dc by Alexandre Vassalotti in branch '3.2':
Issue #13505: Make pickling of bytes object compatible with Python 2.
http://hg.python.org/cpython/rev/14695b4825dc

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue13505
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue13505] Bytes objects pickled in 3.x with protocol <=2 are unpickled incorrectly in 2.x

2011-12-13 Thread Alexandre Vassalotti

Alexandre Vassalotti <alexan...@peadrop.com> added the comment:

Fixed. Thanks for the patch!

--
assignee:  -> alexandre.vassalotti
resolution:  -> fixed
stage: needs patch -> committed/rejected
status: open -> closed




[issue13505] Bytes objects pickled in 3.x with protocol <=2 are unpickled incorrectly in 2.x

2011-12-12 Thread Antoine Pitrou

Antoine Pitrou <pit...@free.fr> added the comment:

> Only worry is that codecs.latin_1_encode.__module__ is '_codecs', and
> _codecs is undocumented.

It seems we have to choose between two evils here. Given that the 
codecs.latin_1_encode produces more compact pickles, I'd say go for it.

Note that for the empty bytes object (b""), the encoding can be massively 
simplified by simply calling bytes() with no argument.
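
A minimal sketch of how one might inspect that special case, assuming a current Python 3 interpreter that already contains the fix for this issue. The variable names are illustrative only, and no particular opcode sequence is claimed here; the disassembly is printed so the two streams can be compared by eye.

```python
import pickle
import pickletools

# Pickle an empty and a non-empty bytes object with protocol 2.
empty_pickle = pickle.dumps(b"", protocol=2)
nonempty_pickle = pickle.dumps(b"abc", protocol=2)

# Disassemble both streams to compare the opcodes that were emitted.
pickletools.dis(empty_pickle)
pickletools.dis(nonempty_pickle)

# Both streams must still round-trip on Python 3.
assert pickle.loads(empty_pickle) == b""
assert pickle.loads(nonempty_pickle) == b"abc"
```

The empty case needs no text payload and no encoding name, which is where the simplification comes from.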




[issue13505] Bytes objects pickled in 3.x with protocol <=2 are unpickled incorrectly in 2.x

2011-12-12 Thread sbt

sbt <shibt...@gmail.com> added the comment:

I now realise latin_1_encode won't work because it returns a pair (bytes_obj, 
length).

I have done a patch using _codecs.encode instead -- the pickles turn out to be 
exactly the same size anyway.

>>> pickletools.dis(pickle.dumps(b"abc", 2))
    0: \x80 PROTO      2
    2: c    GLOBAL     '_codecs encode'
   18: q    BINPUT     0
   20: X    BINUNICODE 'abc'
   28: q    BINPUT     1
   30: X    BINUNICODE 'latin1'
   41: q    BINPUT     2
   43: \x86 TUPLE2
   44: q    BINPUT     3
   46: R    REDUCE
   47: q    BINPUT     4
   49: .    STOP
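
The difference between the two codec functions mentioned in this comment can be checked directly. This is a small sketch for Python 3; the variable names are made up for illustration.

```python
import codecs

text = "abc"

# codecs.latin_1_encode returns a (bytes, length_consumed) pair ...
pair = codecs.latin_1_encode(text)

# ... while codecs.encode returns just the encoded bytes object.
encoded = codecs.encode(text, "latin1")

print(pair)      # (b'abc', 3)
print(encoded)   # b'abc'
```

A REDUCE on latin_1_encode would therefore rebuild a tuple rather than the bytes object itself, which is why codecs.encode is the usable callable here.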

--
Added file: http://bugs.python.org/file23938/issue13505-codecs-encode.patch




[issue13505] Bytes objects pickled in 3.x with protocol <=2 are unpickled incorrectly in 2.x

2011-12-12 Thread Meador Inge

Meador Inge <mead...@gmail.com> added the comment:

On Sun, Dec 11, 2011 at 12:17 PM, sbt <rep...@bugs.python.org> wrote:

>> I don't really know that much about pickle, but Antoine mentioned that
>> 'bytearray' works fine going from 3.2 to 2.7.  Given that, can't we just
>> compose 'bytes' with 'bytearray'?
>
> Yes, although it would only work for 2.6 and 2.7.

Which is fine.  'bytes' and byte literals were not introduced until
2.6 [1,2].  So *any* solution we come up with is for >= 2.6.

> They also produce more compact pickles, particularly codecs.latin_1_encode().

Now that is a better argument.

[1] http://www.python.org/dev/peps/pep-0358/
[2] http://www.python.org/dev/peps/pep-3112/




[issue13505] Bytes objects pickled in 3.x with protocol <=2 are unpickled incorrectly in 2.x

2011-12-12 Thread sbt

sbt <shibt...@gmail.com> added the comment:

> Which is fine.  'bytes' and byte literals were not introduced until
> 2.6 [1,2].  So *any* solution we come up with is for >= 2.6.

In 2.6 and 2.7, bytes is just an alias for str.  In all 2.x versions with 
codecs.encode, the result will be str.  (Although I haven't actually tested 
earlier than 2.6.)

Python 2.6.5 (r265:79063, Jun 12 2010, 17:07:01)
[GCC 4.3.4 20090804 (release) 1] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> pickle.loads('\x80\x02c_codecs\nencode\nq\x00X\x03\x00\x00\x00abcq\x01X\x06\x00\x00\x00latin1q\x02\x86q\x03Rq\x04.')
'abc'
>>> type(_)
<type 'str'>




[issue13505] Bytes objects pickled in 3.x with protocol <=2 are unpickled incorrectly in 2.x

2011-12-11 Thread sbt

sbt <shibt...@gmail.com> added the comment:

> I don't really know that much about pickle, but Antoine mentioned that
> 'bytearray' works fine going from 3.2 to 2.7.  Given that, can't we just
> compose 'bytes' with 'bytearray'?

Yes, although it would only work for 2.6 and 2.7.

codecs.encode() seems to be available back to 2.4 and codecs.latin_1_encode() 
back to at least 2.0.  They also produce more compact pickles, particularly 
codecs.latin_1_encode().

>>> class Bytes(bytes):
...     def __reduce__(self):
...         return latin_1_encode, (latin_1_decode(self),)
...
[70922 refs]
>>> pickletools.dis(pickle.dumps(Bytes(b'abc'), 2))
    0: \x80 PROTO      2
    2: c    GLOBAL     '_codecs latin_1_encode'
   26: q    BINPUT     0
   28: X    BINUNICODE 'abc'
   36: q    BINPUT     1
   38: K    BININT1    3
   40: \x86 TUPLE2
   41: q    BINPUT     2
   43: \x85 TUPLE1
   44: q    BINPUT     3
   46: R    REDUCE
   47: q    BINPUT     4
   49: .    STOP
highest protocol among opcodes = 2

Only worry is that codecs.latin_1_encode.__module__ is '_codecs', and _codecs 
is undocumented.




[issue13505] Bytes objects pickled in 3.x with protocol <=2 are unpickled incorrectly in 2.x

2011-12-10 Thread Meador Inge

Meador Inge <mead...@gmail.com> added the comment:

I don't really know that much about pickle, but Antoine mentioned that
'bytearray' works fine going from 3.2 to 2.7.  Given that, can't we just
compose 'bytes' with 'bytearray'?  Something like:

Python 3.3.0a0 (default:aab45b904141+, Dec 10 2011, 13:34:41)
[GCC 4.6.2 20111027 (Red Hat 4.6.2-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
...
>>> class Bytes(bytes):
...    def __reduce__(self):
...       return bytes, (bytearray(self),)
...
>>> pickletools.dis(pickle.dumps(Bytes(b'abc'), protocol=2))
    0: \x80 PROTO      2
    2: c    GLOBAL     '__builtin__ bytes'
   21: q    BINPUT     0
   23: c    GLOBAL     '__builtin__ bytearray'
   46: q    BINPUT     1
   48: X    BINUNICODE 'abc'
   56: q    BINPUT     2
   58: X    BINUNICODE 'latin-1'
   70: q    BINPUT     3
   72: \x86 TUPLE2
   73: q    BINPUT     4
   75: R    REDUCE
   76: q    BINPUT     5
   78: \x85 TUPLE1
   79: q    BINPUT     6
   81: R    REDUCE
   82: q    BINPUT     7
   84: .    STOP
highest protocol among opcodes = 2
>>> pickle.dumps(Bytes(b'abc'), protocol=2)
b'\x80\x02c__builtin__\nbytes\nq\x00c__builtin__\nbytearray\nq\x01X\x03\x00\x00\x00abcq\x02X\x07\x00\x00\x00latin-1q\x03\x86q\x04Rq\x05\x85q\x06Rq\x07.'

[meadori@motherbrain cpython]$ python
Python 2.7.2 (default, Oct 27 2011, 01:40:22) 
[GCC 4.6.1 20111003 (Red Hat 4.6.1-10)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
...
>>> pickle.loads(b'\x80\x02c__builtin__\nbytes\nq\x00c__builtin__\nbytearray\nq\x01X\x03\x00\x00\x00abcq\x02X\x07\x00\x00\x00latin-1q\x03\x86q\x04Rq\x05\x85q\x06Rq\x07.')
'abc'

If this method is OK, then the patch is pretty simple.  See attached.
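
For readers who want to try the idea without retyping the session, here is a self-contained sketch of the same composition for Python 3. It only checks the Python 3 round trip; the Python 2 side is what the session above demonstrates. The names data and restored are illustrative.

```python
import pickle


class Bytes(bytes):
    """bytes subclass used only to experiment with __reduce__."""

    def __reduce__(self):
        # Rebuild as bytes(bytearray(...)); the bytearray argument is
        # pickled by bytearray's own reduce logic.
        return bytes, (bytearray(self),)


data = pickle.dumps(Bytes(b"abc"), protocol=2)
restored = pickle.loads(data)

print(restored)         # b'abc'
print(type(restored))   # <class 'bytes'>
```

Note the cost of this approach: the stream carries two GLOBAL opcodes and two REDUCE calls, which is the size concern raised later in the thread.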

--
keywords: +patch
Added file: http://bugs.python.org/file23907/issue13505-0.patch




[issue13505] Bytes objects pickled in 3.x with protocol <=2 are unpickled incorrectly in 2.x

2011-12-09 Thread sbt

sbt <shibt...@gmail.com> added the comment:

> sbt, the bug is not that the encoding is inefficient. The problem is we
> cannot unpickle bytes streams from Python 3 using Python 2.

Ah.  Well you can do it using codecs.encode.

Python 3.3.0a0 (default, Dec  8 2011, 17:56:13) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle, codecs
>>>
>>> class MyBytes(bytes):
...     def __reduce__(self):
...         return codecs.encode, (self.decode('latin1'), 'latin1')
...
>>> pickle.dumps(MyBytes(b"hello"), 2)
b'\x80\x02c_codecs\nencode\nq\x00X\x05\x00\x00\x00helloq\x01X\x06\x00\x00\x00latin1q\x02\x86q\x03Rq\x04.'

Actually, I notice that array objects created by Python 3 are not decodable on 
Python 2.  See Issue 13566.
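
The codecs.encode trick relies on latin1 mapping every byte value to the code point with the same number, so the decode/encode pair is lossless. A sketch that exercises all 256 byte values on Python 3 follows; payload, stream and restored are illustrative names, and the Python 2 side is not tested here.

```python
import codecs
import pickle


class MyBytes(bytes):
    def __reduce__(self):
        # Decode to text with latin1 (byte N becomes code point N) and ask
        # the unpickler to call codecs.encode(text, 'latin1') to rebuild it.
        return codecs.encode, (self.decode("latin1"), "latin1")


payload = bytes(range(256))   # every possible byte value, once each
stream = pickle.dumps(MyBytes(payload), protocol=2)
restored = pickle.loads(stream)

assert restored == payload
print(type(restored))         # <class 'bytes'>
```

The restored object is a plain bytes instance, not MyBytes, because the callable named in the stream is codecs.encode.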




[issue13505] Bytes objects pickled in 3.x with protocol <=2 are unpickled incorrectly in 2.x

2011-12-09 Thread Antoine Pitrou

Antoine Pitrou <pit...@free.fr> added the comment:

>> sbt, the bug is not that the encoding is inefficient. The problem is we
>> cannot unpickle bytes streams from Python 3 using Python 2.
>
> Ah.  Well you can do it using codecs.encode.

Great. A bit hackish but functional and not too inefficient (50% average
expansion).




[issue13505] Bytes objects pickled in 3.x with protocol <=2 are unpickled incorrectly in 2.x

2011-12-08 Thread Alexandre Vassalotti

Alexandre Vassalotti <alexan...@peadrop.com> added the comment:

sbt, the bug is not that the encoding is inefficient. The problem is we cannot 
unpickle bytes streams from Python 3 using Python 2.




[issue13505] Bytes objects pickled in 3.x with protocol <=2 are unpickled incorrectly in 2.x

2011-12-06 Thread sbt

sbt <shibt...@gmail.com> added the comment:

> One *dirty* trick I am thinking about would be to use something like
> array.tostring() to construct the byte string.

array('B', ...) objects are pickled using two bytes per character, so there 
would be no advantage:

>>> pickle.dumps(array.array('B', b"hello"), 2)
b'\x80\x02carray\narray\nq\x00X\x01\x00\x00\x00Bq\x01]q\x02(KhKeKlKlKoe\x86q\x03Rq\x04.'

--
nosy: +sbt




[issue13505] Bytes objects pickled in 3.x with protocol <=2 are unpickled incorrectly in 2.x

2011-12-05 Thread Alexandre Vassalotti

Alexandre Vassalotti <alexan...@peadrop.com> added the comment:

I think we are kind of stuck here. I might need to rely on some clever hack to 
generate the desired str object in 2.7 without breaking the bytes support in 
3.3 and without changing 2.7 itself.

One *dirty* trick I am thinking about would be to use something like 
array.tostring() to construct the byte string.

from array import array

class bytes:
    def __reduce__(self):
        return (array.tostring, (array('B', self),))

Of course, this doesn't work because pickle doesn't support method pickling. But, maybe 
someone can figure out a way around this... I don't know.

Also, this is a bit annoying to fix since we changed the semantic meaning of 
the STRING opcodes in 3.x---i.e., it now represents a unicode string instead of 
a byte string.




[issue13505] Bytes objects pickled in 3.x with protocol <=2 are unpickled incorrectly in 2.x

2011-11-30 Thread Antoine Pitrou

Antoine Pitrou <pit...@free.fr> added the comment:

After a bit of testing, my idea was flawed, as str() doesn't accept an encoding 
parameter in 2.x: `str(u'foo', 'latin1')` simply raises a TypeError.




[issue13505] Bytes objects pickled in 3.x with protocol <=2 are unpickled incorrectly in 2.x

2011-11-29 Thread Antoine Pitrou

New submission from Antoine Pitrou <pit...@free.fr>:

In Python 3.2:

>>> pickle.dumps(b'xyz', protocol=2)
b'\x80\x02c__builtin__\nbytes\nq\x00]q\x01(KxKyKze\x85q\x02Rq\x03.'

In Python 2.7:

>>> pickle.loads(b'\x80\x02c__builtin__\nbytes\nq\x00]q\x01(KxKyKze\x85q\x02Rq\x03.')
'[120, 121, 122]'

The problem is that the bytes() constructor argument is a list of ints, which 
gives a different result when reconstructed under 2.x where bytes is an alias 
of str:

>>> pickletools.dis(pickle.dumps(b'xyz', protocol=2))
    0: \x80 PROTO      2
    2: c    GLOBAL     '__builtin__ bytes'
   21: q    BINPUT     0
   23: ]    EMPTY_LIST
   24: q    BINPUT     1
   26: (    MARK
   27: K        BININT1    120
   29: K        BININT1    121
   31: K        BININT1    122
   33: e        APPENDS    (MARK at 26)
   34: \x85 TUPLE1
   35: q    BINPUT     2
   37: R    REDUCE
   38: q    BINPUT     3
   40: .    STOP
highest protocol among opcodes = 2
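
The constructor mismatch can be reproduced on a single Python 3 interpreter. In this sketch the Python 2 behaviour is imitated by calling str(), since bytes is an alias of str there; the variable names are illustrative.

```python
values = [120, 121, 122]

# Python 3: bytes(list_of_ints) builds a bytes object from the integer values.
as_py3_bytes = bytes(values)

# Python 2: bytes is str, so bytes(list_of_ints) is really str(list_of_ints).
# Calling str() here imitates that behaviour on Python 3.
as_py2_bytes = str(values)

print(as_py3_bytes)   # b'xyz'
print(as_py2_bytes)   # [120, 121, 122]
```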

Bytearray objects use a different trick: they pass a (unicode string, encoding) 
pair which has the same constructor semantics under 2.x and 3.x. Additionally, 
such encoding is statistically more efficient: a list of 1-byte ints will take 
2 bytes per encoded char, while a latin1-to-utf8 transcoded string (BINUNICODE 
uses utf-8) will take on average 1.5 bytes per encoded char (assuming a 50% 
probability of higher-than-127 bytes).

>>> pickletools.dis(pickle.dumps(bytearray(b'xyz'), protocol=2))
    0: \x80 PROTO      2
    2: c    GLOBAL     '__builtin__ bytearray'
   25: q    BINPUT     0
   27: X    BINUNICODE 'xyz'
   35: q    BINPUT     1
   37: X    BINUNICODE 'latin-1'
   49: q    BINPUT     2
   51: \x86 TUPLE2
   52: q    BINPUT     3
   54: R    REDUCE
   55: q    BINPUT     4
   57: .    STOP
highest protocol among opcodes = 2
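
The 2 versus 1.5 bytes-per-char estimate can be checked with a back-of-the-envelope sketch. It assumes the input contains each byte value equally often (which gives the 50% share of values above 127) and ignores the fixed per-pickle overhead; all names are illustrative.

```python
all_bytes = bytes(range(256))   # each byte value exactly once

# List-of-ints form: each element is a BININT1 opcode, 'K' plus one byte.
list_form_size = 2 * len(all_bytes)

# Text form: decode as latin-1, then store as UTF-8 (what BINUNICODE does).
utf8_payload = all_bytes.decode("latin-1").encode("utf-8")

print(list_form_size / len(all_bytes))      # 2.0
print(len(utf8_payload) / len(all_bytes))   # 1.5
```

Values 0 to 127 take one UTF-8 byte and values 128 to 255 take two, so pure-ASCII data costs about 1 byte per char while high-byte-heavy data approaches 2.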

--
components: Library (Lib)
messages: 148635
nosy: alexandre.vassalotti, irmen, pitrou
priority: high
severity: normal
status: open
title: Bytes objects pickled in 3.x with protocol <=2 are unpickled incorrectly 
in 2.x
type: behavior
versions: Python 3.2, Python 3.3




[issue13505] Bytes objects pickled in 3.x with protocol <=2 are unpickled incorrectly in 2.x

2011-11-29 Thread Meador Inge

Changes by Meador Inge <mead...@gmail.com>:


--
nosy: +meador.inge
stage:  -> needs patch
