Christian Tanzer added the comment:
IMNSHO, the problem lies in the Python 3 pickle.py and it is **not** restricted
to datetime instances
(for a detailed rambling see
http://c-tanzer.at/en/breaking_py2_pickles_in_py3.html).
In Python 2, 8-bit strings are used for text and for binary data. Well-designed
applications will use unicode for all text, but Python 2 itself forces some
text to be 8-bit strings, e.g., names of attributes, classes, and functions. In
other words, **any 8-bit strings explicitly created by such an application will
contain binary data.**
In Python 2, pickle.dump uses BINSTRING (and SHORT_BINSTRING) for 8-bit
strings; Python 3 uses BINBYTES (and SHORT_BINBYTES) instead.
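To make the opcode difference concrete, here is a minimal sketch. The
pickle bytes are written by hand in the shape a Python 2 protocol-2 dump of
the 8-bit string '\xff\xfe' takes (that is an assumption of the example, not
output captured from a real Python 2 run); the names `PY2_PICKLE` and
`opcode_names` are made up for illustration.

```python
import pickle
import pickletools

# Hand-written protocol-2 pickle: PROTO 2, SHORT_BINSTRING with two
# non-ASCII bytes, BINPUT 0, STOP. Assumed to mirror what Python 2 emits.
PY2_PICKLE = b"\x80\x02U\x02\xff\xfeq\x00."


def opcode_names(data):
    """Return the names of the pickle opcodes found in data."""
    return [op.name for op, arg, pos in pickletools.genops(data)]


py2_ops = opcode_names(PY2_PICKLE)
# Python 3 pickles the same two bytes with a BINBYTES-family opcode.
py3_ops = opcode_names(pickle.dumps(b"\xff\xfe", protocol=3))

# With the default encoding="ASCII", Python 3 tries to decode the
# BINSTRING payload as text and fails on the binary data.
try:
    pickle.loads(PY2_PICKLE)
    default_result = "loaded"
except UnicodeDecodeError:
    default_result = "UnicodeDecodeError"

# encoding="bytes" returns the payload untouched.
as_bytes = pickle.loads(PY2_PICKLE, encoding="bytes")

print(py2_ops, py3_ops, default_result, as_bytes)
```

Note that `encoding="bytes"` applies to every 8-bit string in the pickle, so
ASCII-only values such as attribute names come back as `bytes` as well.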
In Python 3, pickle.load should handle BINSTRING (and SHORT_BINSTRING) like
this:
* convert ASCII values to `str`
* convert non-ASCII values to `bytes`
`bytes` is Python 3's equivalent to Python 2's 8-bit string!
It is only because of the use of 8-bit strings for Python 2 names that the
mapping to `str` is necessary, but all such names are guaranteed to be ASCII!
I would propose to change `load_binstring` and `load_short_binstring` to call a
function like::

    def _decode_binstring(self, value):
        # Used to allow strings from Python 2 to be decoded either as
        # bytes or Unicode strings. This should be used only with the
        # BINSTRING and SHORT_BINSTRING opcodes.
        if self.encoding != "bytes":
            try:
                return value.decode("ASCII")
            except UnicodeDecodeError:
                pass
        return value

instead of the currently called `_decode_string`.
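The proposal can be tried out without patching pickle.py. The following is
only a rough sketch: it subclasses `pickle._Unpickler`, the private
pure-Python unpickler in CPython, and overrides its private `_decode_string`
hook, so it depends on implementation details that may change. The names
`BinstringAwareUnpickler`, `loads_py2`, and `PY2_TUPLE` are hypothetical, and
the pickle bytes are hand-written to resemble a Python 2 protocol-2 dump.

```python
import io
import pickle


class BinstringAwareUnpickler(pickle._Unpickler):
    """Experimental: ASCII 8-bit strings become str, the rest stay bytes."""

    def _decode_string(self, value):
        # Overrides a private CPython hook. Unlike the proposal, this
        # also affects the protocol-0 STRING opcode, because that opcode
        # goes through the same hook.
        if self.encoding != "bytes":
            try:
                return value.decode("ASCII")
            except UnicodeDecodeError:
                pass
        return value


def loads_py2(data, **kwargs):
    """Load a pickle from bytes using the experimental unpickler."""
    return BinstringAwareUnpickler(io.BytesIO(data), **kwargs).load()


# Hand-written pickle of the tuple ('abc', '\xff\xfe'): two
# SHORT_BINSTRING values followed by TUPLE2.
PY2_TUPLE = b"\x80\x02U\x03abcq\x00U\x02\xff\xfeq\x01\x86q\x02."

text, binary = loads_py2(PY2_TUPLE)
print(type(text).__name__, type(binary).__name__)
```

The pure-Python unpickler is slower than the C one, so this is meant for
experimenting with the idea rather than for production use.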
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue22005>
_______________________________________