On 13 Mar 2012, at 04:44, Merlijn van Deen wrote:

> http://bugs.python.org/issue6784 ("byte/unicode pickle
> incompatibilities between python2 and python3")
>
> Hello all,
>
> Currently, pickle unpickles python2 'str' objects as python3 'str'
> objects, where the encoding to use is passed to the Unpickler.
> However, there are cases where it makes more sense to unpickle a
> python2 'str' as python3 'bytes' - for instance when it is actually
> binary data, and not text.
>
> Currently, the mapping is as follows, when reading a pickle:
>   python2 'str' -> python3 'str' (using an encoding supplied to Unpickler)
>   python2 'unicode' -> python3 'str'
>
> or, when creating a pickle using protocol <= 2:
>   python3 'str' -> python2 'unicode'
>   python3 'bytes' -> python2 '__builtins__.bytes object'
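
For readers who want to see the read-side mapping quoted above in action, here is a minimal sketch, assuming a current Python 3 interpreter. The pickle bytes are written by hand in the shape Python 2 emits for a two-byte 'str' at protocol 2; the names PY2_STR_PICKLE and load_as_text are made up for this illustration and do not come from the issue or the patch.

```python
import pickle

# Hand-written protocol-2 pickle of the Python 2 str '\xff\xfe':
# PROTO 2, SHORT_BINSTRING (length 2), BINPUT 0, STOP.
PY2_STR_PICKLE = b"\x80\x02U\x02\xff\xfeq\x00."


def load_as_text(data, encoding="latin-1"):
    """Unpickle Python 2 'str' objects as Python 3 'str' using an encoding."""
    return pickle.loads(data, encoding=encoding)


# With an explicit 8-bit encoding the binary payload becomes a text string.
text = load_as_text(PY2_STR_PICKLE)
print(type(text), repr(text))

# With the default encoding (ASCII) non-ASCII payloads cannot be decoded.
try:
    pickle.loads(PY2_STR_PICKLE)
    default_failed = False
except UnicodeDecodeError as exc:
    default_failed = True
    print("default encoding failed:", exc)
```

Either way the result is text or an error, never 'bytes', which is the gap the quoted message is about.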

It does seem unfortunate that, by default, it is impossible for a developer to "do the right thing" as regards pickling / unpickling here. Binary data from Python 2 being unpickled as Unicode on Python 3 is presumably for the convenience of developers doing the *wrong thing* (and only works for ASCII anyway).

All the best,

Michael Foord

> This issue suggests to add a flag to change the behaviour as follows:
>   a) python2 'str' -> python3 'bytes'
>   b) python3 'bytes' -> python2 'str'
>
> The question on this is how to pass this flag. To quote Antoine (with
> permission) on my mail about this issue on core-mentorship:
>
>> I haven't answered because I'm unsure about the approach itself - do we
>> want to add yet another argument to pickle methods, especially this late
>> in the 3.x development cycle?
>
> Currently, I have implemented it using an extra argument for the
> Pickler and Unpickler objects ('bytestr'), which toggles the
> behaviour. I.e.:
> >>> pickled = Pickler(data, bytestr=True); unpickled = Unpickler(data, bytestr=True)
> This is the approach used in pickle_bytestr.patch [1]
>
> Another option would be to implement a separate Pickler/Unpickler
> object, such that
> >>> pickled = BytestrPickler(data, bytestr=True); unpickled = BytestrUnpickler(data, bytestr=True)
> This is the approach I initially implemented [2].
>
> Alternatively, there is the option only to implement the Unpickler,
> leaving the Pickler as it is. This allows
> >>> unpickled = Unpickler(data, encoding=bytes)
> where the bytes type is used as a special 'flag'.
>
> And, of course, there is the option not to implement this in the stdlib at
> all.
>
> What are your ideas on this?
>
> Best,
> Merlijn
>
> [0] http://bugs.python.org/issue6784
> [1] http://bugs.python.org/file24719/pickle_bytestr.patch
> [2] https://github.com/valhallasw/py2/blob/master/bytestrpickle.py
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk

--
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html
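
As an illustration of the Unpickler-only option from the quoted message, the following is a rough sketch rather than the code from either patch. It assumes a current Python 3, where the pickle module accepts the string "bytes" as the encoding argument (the message above wrote the flag as the bytes type itself). The names PY2_STR_PICKLE and load_py2_str_as_bytes are hypothetical and exist only for this example.

```python
import io
import pickle

# Hand-written protocol-2 pickle of the Python 2 str '\xff\xfe'
# (same shape as a pickle produced by Python 2 for binary data).
PY2_STR_PICKLE = b"\x80\x02U\x02\xff\xfeq\x00."


def load_py2_str_as_bytes(data):
    """Unpickle Python 2 'str' objects as Python 3 'bytes'."""
    # encoding="bytes" asks the Unpickler to leave 8-bit strings undecoded.
    unpickler = pickle.Unpickler(io.BytesIO(data), encoding="bytes")
    return unpickler.load()


raw = load_py2_str_as_bytes(PY2_STR_PICKLE)
print(type(raw), raw)
```

Note that the flag applies to every 8-bit string in the pickle, so Python 2 'str' values that really were text also come back as 'bytes' and need decoding by the caller.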