[issue25457] json dump fails for mixed-type keys when sort_keys is specified
Christian Tanzer added the comment: Json keys *are strings*. That‘s why json.dump stringifies all keys. If you want to argue that this behavior is wrong I wouldn’t protest except for that it breaks extant code. But arguing that sorting the stringified keys would violate user’s expectations or lead to problems down the line makes no sense. The user is asking for an object with string keys and they want the keys sorted. That is unambiguous and well defined. Neither does adding a second argument make any sense, it would just increase the confusion. My problem was that Python 3.x threw an exception about this for a complex json object in a context where it was not at all obvious what was going on. And the code in question had worked for years in Python 2. This bug report is many, many years old and I don’t much care one way or another but I am very sad that Practicality beats purity got utterly lost in and after the transition to Python 3. Christian Tanzer > On 10.07.2021, at 16:12, Andrei Kulakov wrote: > > Andrei Kulakov added the comment: > > Some observations: > > - sort_keys arg does a deep sorting of nested dictionaries. It's a bit too > much to ask users to do this type of preprocessing manually before dumping to > json. > > - the error doesn't seem too onerous to me. 'unorderable types: str() < > int()' If uncertain, a user can go to interactive shell and try `1 < "2"`, > and then the issue is obvious. > > - to me, current behaviour seems preferable to silently guessing that users > wants stringified sorting, especially since it can surface as a problem way > down the line. > > - what makes this issue interesting is that in roughly half of cases (I'm > guessing) the user will want object sorted and then cast to string and would > be surprised if the reverse happened, and in the other half cases the user > would want them stringified, then sorted, and would be surprised if that > didn't happen. 
> > It depends on the perspective: you may think of the json as a representation > of a dict of objects, that just happen to be in json format; or you can think > of it as a json document with string keys (of course) that just happen to > come from a dict of objects. Both can be valid depending on the use case. > > Given all of this, I propose keeping current behavior for the existing arg, > and adding another arg for 'stringify then sort' behavior. Then we'll have no > silent guessing and the "unorderable" type error message can point the user > directly to the new argument. > > If the user reads the docs before using this method, they will see two clear > options with respective tradeoffs and decide which one to use. > > So either by reading the docs or running into the error, the user will have a > clear explanation and a clear and convenient solution. > > -- > nosy: +andrei.avk > > ___ > Python tracker > <https://bugs.python.org/issue25457> > ___ -- ___ Python tracker <https://bugs.python.org/issue25457> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
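For readers who run into this today: one way to get "stringify, then sort" without a new argument is a round trip through the encoder itself. The following is only a minimal sketch and not something proposed in the thread; the helper name `dumps_sorted_stringified` is hypothetical, and it uses nothing beyond `json.dumps` and `json.loads`.

```python
import json

def dumps_sorted_stringified(obj, **kwargs):
    """Dump obj with keys sorted *after* they have been turned into strings.

    Hypothetical helper: the first dumps() stringifies every key (at any
    nesting depth), loads() returns dicts whose keys are all str, and the
    second dumps() can then sort them without comparing mixed types.
    """
    as_text = json.dumps(obj)          # keys become JSON strings here
    normalized = json.loads(as_text)   # every dict key is now a str
    return json.dumps(normalized, sort_keys=True, **kwargs)

data = {1: 42, "foo": {"b": 1, 2: None}, None: "nada"}
print(dumps_sorted_stringified(data))
```

The price is an extra encode/decode pass, and keys that collide once stringified (for example `1` and `"1"`) collapse into a single entry.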
[issue22005] datetime.__setstate__ fails decoding python2 pickle
Christian Tanzer added the comment: Paul Ganssle wrote at Fri, 07 Dec 2018 17:22:36 +: > > Gregory P. Smith (gregory.p.smith) 2017-03-02 18:57 > > TL;DR - Just one more example of why nobody should *ever* use pickle > > under any circumstances. It is useless for data that is not transient > > for consumption by the same exact versions of all software that > > created it. > > This *is* something that users can work around by not abusing pickle > in this way and instead using a proper cross-platform serialization > format. I realize that that makes it *more difficult* for some people > to do so, but as Gregory points out, these people are doing dangerous > stuff that will break in a way that we are not going to be willing or > able to fix at some point *anyway*. This is completely and utterly wrong, to put it mildly. The official documentation of the pickle module states (I checked 2.7 and 3.7): The pickle serialization format is guaranteed to be backwards compatible across Python releases. Considering that this issue is 4.5 years old, one would assume that the pickle documentation would have been changed in the meantime if Gregory's and Paul's view matched reality. But my or your personal views about the usability of pickle don't matter anyway. There are too many libraries and applications that have been using pickle for many years. I personally know about this kind of usage in applications since 1998. In that particular case, the pickled information resides on machines owned by the customers of the applications and **must** be readable by any new version of the application no matter how old the file containing the pickle is. Rewriting history by some Python developers is not going to impress the companies involved! Have a nice day! -- ___ Python tracker <https://bugs.python.org/issue22005> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue25457] json dump fails for mixed-type keys when sort_keys is specified
Christian Tanzer added the comment:

Aaron Hall wrote at Sun, 20 May 2018 16:49:06 +:
> Now that dicts are sortable, does that make the sort_keys argument redundant?
>
> Should this bug be changed to "won't fix"?

https://bugs.python.org/issue25457#msg317216 is as good an answer as I could give.

Considering that I opened the bug more than 2.5 years ago, it doesn't really matter though.
[issue22005] datetime.__setstate__ fails decoding python2 pickle
Christian Tanzer added the comment:

This issue is getting old. Is there any way to solve this for Python 3.6?
[issue25545] email parsing docs: clarify that only ASCII strings are supported
Christian Tanzer added the comment:

Terry J. Reedy wrote at Fri, 06 Nov 2015 22:49:57 +:
> email parsing docs: clarify that only ASCII strings are supported

If that is the decision, `message_from_string` should raise an exception if it gets a non-ASCII argument!
[issue25545] email parsing docs need to be clear that only ASCII strings are supported
Christian Tanzer added the comment:

> If you can suggest ways of improving the string support without
> breaking existing python3 code that may be using it (most likely
> wrongly, but working for them), then I will happily review them.

At the moment, I'm mainly interested in having code that runs correctly in both python2.7 and python3. Having the same method behave totally differently in the two versions is what triggered this bug report. Adding new methods won't help with 2.7.

> To do what you appear to want, to be able to represent non-ascii as
> the equivalent unicode *cannot work*, because email messages may
> contain binary data which *cannot* be represented in printable
> unicode.

I have no problem whatsoever if binary message parts are encoded as necessary for RFC compliance; in fact, I would expect that. My beef is with message parts that are text and are naturally represented as unicode, not as charset- and transfer-encoded 7-bit strings!

I also don't see how such a representation would break existing python3 code but that might just be another example of famous last words.

> But, making unicode easier is one big reason python3 exists (the
> biggest one, in practice).

From what I have seen up to now, that has failed (spectacularly, in my opinion, if you consider things like unpickling python2-created pickles with binary strings, e.g., datetime instances).

Using unicode in python2 worked well enough although there was the problem that one couldn't specify which strings were supposed to be binary. Exactly those strings are a big problem for code that wants to run in both python2 and python3.

python3 solves the problem of binary strings, though badly because of the various missing string functions. But there seem to be bugs all over the standard library and in third party modules. That library APIs still haven't settled down in python3 is even worse!

Maybe python3 would work well if one threw away all existing code and started with completely new code but I don't think that was the intention. -- ___ Python tracker <http://bugs.python.org/issue25545> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue25545] email parsing docs need to be clear that only ASCII strings are supported
Christian Tanzer added the comment: > Yes, the port from python2 to python3 of the email package > was...suboptimal. > ... > The whole concept of using unicode as a 7bit data channel only is > just...weird. +100 to both. > But, we are now stuck with maintaining that API for backward > compatibility reasons. That's a weird definition of backward compatibility, though. The API breaks backward compatibility to Python 2. Any Python 3 user shouldn't use the broken API anyway, IMHO. > To fix it, I rewrote significant parts of the email package, which > is the new API. Which unfortunately isn't any help if one needs to stay compatible to 2.7. > It also is...fraught with the danger of bugs...to talk about > serializing an email message as a string, transforming it, and then > trying to re-parse it as an email message. If your transformations > are simple, it will probably work, but anything at all complex runs > the risk of breaking the message. One of Python's mottos used to be: We are all consenting adults here. But there are other uses for converting a message instance to a unicode string. Display, printing, and grepping come to mind. > And having non-ascii bodies counts as non-trivial. For anybody living in a non-ascii country that statement sounds **very strange**. To start with, I have many friends with names that contain non-ascii characters. > You do have to conditionalize your 2/3 code to use the bytes parser > and generator if you are dealing with 8-bit messages. There's just no > way around that. I did that yesterday. There are problems with that though: * Recognizing the problem for what it is. Trying to run Python 2.7 code that *should* run under 3.5 but breaks with weird errors wastes a lot of time. Multiply with the number of Python programmers that want to migrate and you get a problem. If `message_as_string` and `as_string` just weren't there in 3.x it would be much less of a problem (clear documentation would also help but not as much). 
* Lots of ugly workarounds for the same problem. Most of them (mine certainly included) are done quick and ad-hoc and probably break in many ways. The question then arises: why should one use the email package at all. But of course that way lies madness. Just more roadblocks for the move to Python 3. -- ___ Python tracker <http://bugs.python.org/issue25545> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue25545] email parsing docs need to be clear that only ASCII strings are supported
Christian Tanzer added the comment: R. David Murray wrote at Wed, 04 Nov 2015 15:36:27 +: > There is no problem with supporting both 2.7 and python3 with the same > email API as long as your input strings are ASCII only, which is what > is required by the email RFCs (as I said, they do not support > unicode...even the new one only supports utf8 (a unicode encoding) not > unicode itself). You are talking about byte strings. And of course the email RFCs only talk about byte strings. But the email package offers the use of unicode strings for various functions, including `email.message_from_string`, `email.Message.as_string`, and `email.Message.__str__`. These functions could be useful (and were useful in Python 2) but aren't in Python 3. Assume I load an email satisfying all relevant RFCs from a file. Say that email contains three MIMEText parts with content-transfer-encoding "8bit", all with different encodings: * I don't see any use for `as_string` to obfuscate that by re-encoding each of the three to content-transfer-encoding "base64", which is completely unreadable when it could be converted painlessly to a real unicode string. One of my usage scenarios is something of the form:: >>> print(msg) Of course, in this case I'll better use `utf-8` as my output encoding otherwise the print might fail. If I wanted to output a RFC-compliant byte string, I should have used `as_bytes`, not `as_string`. But that would be a different usage scenario. * The same argument applies in reverse to `message_from_string`. If one wants RFC compliance one should use `message_from_bytes`. But if one builds up a unicode string for an email in Python, it should be possible to convert that to a `email.Message` instance via `message_from_string`. 
I have several use cases where I want to convert an `email.Message` to a unicode string without any embedded content-transfer-encodings like "base64", do some transformations on that string and then convert that back into an `email.Message` instance.

> I have an extensive doc rewrite in process, but I'm not sure when it
> will land. I thought I had already added the note about ASCII-only to
> the parser docs, but I see that I did not. I'll reopen this issue to
> remind myself to do that, since the doc rewrite will only apply to 3.6
> (when the new API will no longer be provisional).

I don't see any point in the semantics of the string functions as they are currently implemented. After all, one can easily do things like `message_from_bytes(s.encode("latin-1"))` or `msg.as_bytes().decode("latin-1")` if one really wants to convert an RFC-compatible byte string to/from unicode strings as-is.

But this as-is conversion normally isn't very useful because it isn't

* human-readable

* well suited to search and replace operations or any other text transformations

So documenting the current situation would improve things slightly but it's more like putting lipstick on a pig.
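To make the "as-is" conversion discussed above concrete, here is a small sketch. The message content and addresses are made up for illustration; the only library calls used are `email.message_from_bytes`, `Message.as_bytes` and `Message.get_payload`. Note the direction of the calls: bytes are turned into text with `decode`, text into bytes with `encode`.

```python
import email

# Made-up sample message with an 8-bit UTF-8 body.
raw = (
    "From: a@example.org\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "\n"
    "Grüssen\n"
).encode("utf-8")

msg = email.message_from_bytes(raw)

# bytes -> str without interpreting anything: each byte maps to one code point.
as_is_text = msg.as_bytes().decode("latin-1")

# str -> bytes -> Message again; latin-1 restores the original bytes exactly.
msg2 = email.message_from_bytes(as_is_text.encode("latin-1"))

print(repr(as_is_text.splitlines()[-1]))               # mojibake, not 'Grüssen'
print(msg2.get_payload(decode=True).decode("utf-8"))   # readable again
```

The round trip is lossless, but the intermediate string shows the UTF-8 bytes as two unrelated characters, which is why it is of little use for reading or for search and replace.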
[issue25545] email.message.get_payload returns wrong encoding
Christian Tanzer added the comment: R. David Murray wrote at Tue, 03 Nov 2015 19:59:53 +: > Your problem is that your input email is ia unicode string. A unicode > string has no RFC defintion as an email, so things do not work right, > as you observed. Whether or not email should throw an error when fed > a non-ascii unicode string is an interesting question, but it hasn't > in the past and so for backward compatibility reasons we won't change > that. Excuse me, I am using `email.message_from_string` which is documented to convert a unicode string to an email object. If you are serious `message_from_string` should not even exist! As long as it is there and documented as:: email.message_from_string(s, _class=email.message.Message, *, policy=policy.compat32) Return a message object structure from a string. This is exactly equivalent to Parser().parsestr(s). _class and policy are interpreted as with the Parser class constructor. Changed in version 3.3: Removed the strict argument. Added the policy keyword. your argument is unfounded and this is definitely a serious bug! > You might also be interested in the newer email API, currently > documented in the 'contentmanager' and 'policy' chapters of the > documentation. It says it is provisional, but the changes (other than > bug fixes) between the current API and what will be final in 3.6 are > trivial. I'm using Python 2.7 and only just exploring 3.5. Unfortunately, there are many bugs and your response is a typical example why moving from 2.7 to 3.x is hard. There is gratuitous breakage but the reaction is:: resolution: -> not a bug I would ask you to reconsider that stance. As long as my code needs to support 2.7, use of any new API doesn't fly. After an eventual switch to 3.5 (probably years in the future), I might use new APIs for new code but changing existing code that used to work won't be in the cards > get_content_charset is None because you don't have any actual headers > in your message, just body. 
This is because of the leading newline in > your triple quoted string, which the email package takes as the end of > the headers. Thanks for the hint. BTW, removing the leading newline doesn't change the buggy behavior of `message_from_string`! -- ___ Python tracker <http://bugs.python.org/issue25545> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
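The effect of the leading newline mentioned in the quoted reply can be checked with a few lines. This is only an illustrative sketch with a made-up message; it says nothing about the encoding problem itself.

```python
import email

# Made-up message text; note the newline right after the opening quotes.
with_leading_newline = """
From: a@example.org
Content-Type: text/plain; charset=utf-8

body
"""

# The leading blank line ends the header section before it starts,
# so everything that follows is treated as body.
msg_a = email.message_from_string(with_leading_newline)
print(msg_a.keys(), msg_a.get_content_charset())

# Without the leading newline the headers are recognized.
msg_b = email.message_from_string(with_leading_newline.lstrip("\n"))
print(msg_b.keys(), msg_b.get_content_charset())
```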
[issue25545] email.message.get_payload returns wrong encoding
New submission from Christian Tanzer: For an email message with `Content-type: text/plain; charset=utf-8`, in Python 3.5, get_payload returns a bytes object encoded with `latin-1`. Python 2.7 returns a str object encoded with `utf-8` as expected. Running the attached test script `email_get_payload__test.py` with Python 2.7 and 3.5 shows the difference. Python 2.7:: 2.7.10.final.0 *** utf8 *** From: Christian Tanzer To: Christian Tanzer Content-type: text/plain; charset=utf-8 Sehr geehrte Damen und Herren, ... Danke und mit freundlichen Grüssen, -- Christian Tanzerhttp://www.c-tanzer.at/ Python 3.5:: 3.5.0.final.0 *** latin-1 *** From: Christian Tanzer To: Christian Tanzer Content-type: text/plain; charset=utf-8 Sehr geehrte Damen und Herren, ... Danke und mit freundlichen Grüssen, -- Christian Tanzerhttp://www.c-tanzer.at/ In both Python versions, `msg.get_content_charset()` returns None, which is not correct, either. -- components: Library (Lib) files: email_get_payload__test.py messages: 253994 nosy: tan...@swing.co.at priority: normal severity: normal status: open title: email.message.get_payload returns wrong encoding type: behavior versions: Python 3.5 Added file: http://bugs.python.org/file40934/email_get_payload__test.py ___ Python tracker <http://bugs.python.org/issue25545> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
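For comparison, a sketch of the bytes-based route in Python 3, which avoids the problem for 8-bit input. The addresses are placeholders and the attached test script is not reproduced here; only `email.message_from_bytes`, `get_content_charset` and `get_payload(decode=True)` are used.

```python
import email

# Placeholder message, encoded the way it would arrive on the wire.
raw = (
    "From: sender@example.org\n"
    "To: recipient@example.org\n"
    "Content-type: text/plain; charset=utf-8\n"
    "\n"
    "Danke und mit freundlichen Grüssen,\n"
).encode("utf-8")

msg = email.message_from_bytes(raw)
charset = msg.get_content_charset()      # taken from the Content-type header
payload = msg.get_payload(decode=True)   # bytes, as transmitted
text = payload.decode(charset or "ascii")

print(charset)
print(text)
```

This does not help code that must also run unchanged on Python 2.7, which is the situation described in this report.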
[issue25457] json dump fails for mixed-type keys when sort_keys is specified
Christian Tanzer added the comment:

Josh Rosenberg wrote at Fri, 23 Oct 2015 02:56:30 +:
> As a workaround (should you absolutely need to sort keys by some
> arbitrary criteria), you can initialize a collections.OrderedDict from
> the sorted items of your original dict (using whatever key function
> you like), then dump without using sort_keys=True.

Sigh...

I already implemented a workaround but it's not as simple as you think: the dictionary in question is nested.

The problem is that this is just another unnecessary difficulty when trying to move to Python 3.x.
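A workaround for nested dictionaries has to recurse. The sketch below is not the workaround referred to above (that code is not shown in the thread); the names `stringify_keys` and `_key_to_str` are hypothetical. It converts keys the way the encoder spells the basic key types, so that `sort_keys=True` only ever compares strings.

```python
import json

def _key_to_str(key):
    # Mirror how the json encoder spells the basic key types.
    if isinstance(key, str):
        return key
    if key is None or isinstance(key, (bool, int, float)):
        return json.dumps(key)   # None -> 'null', True -> 'true', 1 -> '1'
    raise TypeError("unsupported key type: %s" % type(key).__name__)

def stringify_keys(obj):
    """Return a copy of obj in which every dict key is a string."""
    if isinstance(obj, dict):
        return {_key_to_str(k): stringify_keys(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [stringify_keys(item) for item in obj]
    return obj

nested = {"outer": {2: "two", "one": 1, None: []}, 10: [{"b": 0, 3: 0}]}
print(json.dumps(stringify_keys(nested), sort_keys=True))
```

Unlike an `OrderedDict` built once at the top level, this handles dictionaries at any depth, including those inside lists.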
[issue25457] json dump fails for mixed-type keys when sort_keys is specified
Christian Tanzer added the comment:

Josh Rosenberg wrote at Fri, 23 Oct 2015 02:45:51 +:
> The Python 2 sort order is a result of the "arbitrary but consistent
> fallback comparison" (omitting details, it's comparing the names of
> the types), thus the "strange" sort order.

Thanks. I knew that.

> Python 3 (justifiably) said that incomparable types should be
> incomparable rather than silently behaving in non-intuitive ways,
> hiding errors.

"justifiably" is debatable. I consider the change ill-conceived.

Displaying a dictionary (or just its keys) in a readable, or just reproducible, way is useful in many contexts. Python 3 broke that for very little, IMNSHO, gain.

I consider "hiding errors" a myth, to put it politely!

> Python is being rather generous by allowing non-string keys, because
> the JSON spec ( http://json.org/ ) only allows the keys ("names" in
> JSON parlance) to be strings. So you're already a bit in the weeds as
> far as compliant JSON goes if you have non-string keys.

There are two possibilities:

1) Accepting non-string keys is intended. Then `sort_keys` shouldn't break like it does.

   As far as JSON goes, the output of `json.dump[s]` contains string keys.

2) Accepting non-string keys is a bug. Then `json.dump[s]` should be changed to not accept them.

Mixing both approaches is the worst of all worlds.

> Since mixed type keys lack meaningful sort order, I'm not sure it's
> wrong to reject attempts to sort them.

The documentation says:

    If sort_keys is True (default False), then the output of dictionaries
    will be sorted by key; this is useful for regression tests to ensure
    that JSON serializations can be compared on a day-to-day basis.

**Reproducible** is the keyword here.

**Readability** is another one. Even if the sort order is "strange", it is much better than random order, if you are looking for a specific key.

For the record, it was a test failing under Python 3.5 that triggered this bug report.

> > As all they keys are dumped as strings, a simple solution would be to > > sort after converting to strings. > Converting to string is as > arbitrary and full of potential for silently incorrect comparisons as > the Python 2 behavior, and reintroducing it seems like a bad idea. json.dumps already does the conversion:: >>> json.dumps({1 : 42, "foo" : "bar", None : "nada"}) '{"foo": "bar", "1": 42, "null": "nada"}' Another run:: >>> json.dumps({1 : 42, "foo" : "bar", None : "nada"}) '{"1": 42, "foo": "bar", "null": "nada"}' That difference is exactly the reason for `sort_keys`. -- ___ Python tracker <http://bugs.python.org/issue25457> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue25457] json dump fails for mixed-type keys when sort_keys is specified
New submission from Christian Tanzer:

In Python 3, trying to json-dump a dict with keys of different types fails with a TypeError when sort_keys is specified:

python2.7
=========

    Python 2.7.10 (default, May 29 2015, 10:02:30)
    [GCC 4.8.4] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import json
    >>> json.dumps({1 : 42, "foo" : "bar", None : "nada"}, sort_keys = True)
    '{"null": "nada", "1": 42, "foo": "bar"}'

python3.5
=========

    Python 3.5.0 (default, Oct 5 2015, 12:03:13)
    [GCC 4.8.5] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import json
    >>> json.dumps({1 : 42, "foo" : "bar", None : "nada"}, sort_keys = True)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python3.5/json/__init__.py", line 237, in dumps
        **kw).encode(obj)
      File "/usr/lib64/python3.5/json/encoder.py", line 199, in encode
        chunks = self.iterencode(o, _one_shot=True)
      File "/usr/lib64/python3.5/json/encoder.py", line 257, in iterencode
        return _iterencode(o, 0)
    TypeError: unorderable types: str() < int()

Note that the documentation explicitly allows keys of different, if basic, types:

    If skipkeys is True (default: False), then dict keys that are not of
    a basic type (str, int, float, bool, None) will be skipped instead of
    raising a TypeError.

As all the keys are dumped as strings, a simple solution would be to sort after converting to strings.

Looking closely at the output of Python 2, the sort order is a bit strange!

--
components: Library (Lib)
messages: 253324
nosy: tan...@swing.co.at
priority: normal
severity: normal
status: open
title: json dump fails for mixed-type keys when sort_keys is specified
type: behavior
versions: Python 3.5
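The report can be reproduced without an interactive session. This is a minimal sketch; the exact wording of the error message differs between Python 3 versions, so the example only relies on the exception type.

```python
import json

data = {1: 42, "foo": "bar", None: "nada"}

# Without sort_keys the mixed-type keys are accepted and stringified.
unsorted_output = json.dumps(data)
print(unsorted_output)

# With sort_keys the encoder sorts the original keys, so comparing
# int, str and None raises TypeError before anything is stringified.
error = None
try:
    json.dumps(data, sort_keys=True)
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
```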
[issue22005] datetime.__setstate__ fails decoding python2 pickle
Christian Tanzer added the comment: Alexander Belopolsky wrote at Thu, 15 Oct 2015 17:56:42 +: > I don't think your solution will work for date/time/datetime pickles. > There are many values for which pickle payload consists of bytes > within 0-127 range. H. > IIUC, you propose to decode those to Python 3 > strings using ASCII encoding. Yes. There are too many BINSTRING instances that need to be Python 3 strings. > This will in turn require accepting str > type in date/time/datetime constructors. These datetime... constructors are strange beasts already. The documentation says that three integer arguments are required for datetime.datetime but it accepts a single bytes argument anyway. I agree that it would be much nicer if there was a datetime.datetime.load method instead. Unfortunately, that would require Guido's time machine to go back all the way to 2003 (at least). So yes, the only practical solution is to accept a single str typed argument (as long as it is ASCII only). An alternative would be to add a dispatch table for loading functions to Python 3's pickle that would be used by load_global. That would add indirection for the datetime constructors but would allow support for other types requiring arguments of type bytes. The change I proposed in http://bugs.python.org/issue22005#msg253042 to fix the handling of binary 8-bit strings is still necessary. To summarize: IMHO the solution needs to be implemented in Python 3 — otherwise pickles with binary strings created by Python 2.x cannot be loaded in Python 3. Changing the pickle implementation of Python 2 doesn't fix existing pickles and couldn't fix the general problem of binary strings, anyway. -- ___ Python tracker <http://bugs.python.org/issue22005> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue22005] datetime.__setstate__ fails decoding python2 pickle
Christian Tanzer added the comment:

IMNSHO, the problem lies in the Python 3 pickle.py and it is **not** restricted to datetime instances (for a detailed rambling see http://c-tanzer.at/en/breaking_py2_pickles_in_py3.html).

In Python 2, 8-bit strings are used for text and for binary data. Well designed applications will use unicode for all text, but Python 2 itself forces some text to be 8-bit string, e.g., names of attributes, classes, and functions. In other words, **any 8-bit strings explicitly created by such an application will contain binary data.**

In Python 2, pickle.dump uses BINSTRING (and SHORT_BINSTRING) for 8-bit strings; Python 3 uses BINBYTES (and SHORT_BINBYTES) instead.

In Python 3, pickle.load should handle BINSTRING (and SHORT_BINSTRING) like this:

* convert ASCII values to `str`

* convert non-ASCII values to `bytes`

`bytes` is Python 3's equivalent to Python 2's 8-bit string!

It is only because of the use of 8-bit strings for Python 2 names that the mapping to `str` is necessary but all such names are guaranteed to be ASCII!

I would propose to change `load_binstring` and `load_short_binstring` to call a function like::

    def _decode_binstring(self, value):
        # Used to allow strings from Python 2 to be decoded either as
        # bytes or Unicode strings. This should be used only with the
        # BINSTRING and SHORT_BINSTRING opcodes.
        if self.encoding != "bytes":
            try:
                return value.decode("ASCII")
            except UnicodeDecodeError:
                pass
        return value

instead of the currently called `_decode_string`.
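The proposed rule can be tried out in isolation. The function below is a standalone, hypothetical stand-in for the `_decode_binstring` method suggested above; it is not part of the pickle module and takes the encoding as a parameter instead of reading it from an unpickler instance.

```python
def decode_binstring(value, encoding="ASCII"):
    """Sketch of the proposed rule for Python 2 BINSTRING payloads."""
    if encoding != "bytes":
        try:
            return value.decode("ASCII")   # pure ASCII: treat as text
        except UnicodeDecodeError:
            pass                           # non-ASCII: keep as binary data
    return value

name = decode_binstring(b"date-time")
state = decode_binstring(b"\x07\xdf\n\x0c\r\x11*\x01\xe2@")
forced = decode_binstring(b"abc", encoding="bytes")
print(repr(name), repr(state), repr(forced))
```

A binary payload that happens to consist only of bytes in the 0-127 range would still come back as `str` under this rule.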
[issue22005] datetime.__setstate__ fails decoding python2 pickle
Christian Tanzer added the comment:

> The code works when using encoding='bytes'. Thanks Tim for the suggestion.

> So this is not a bug, but is there any sense in having encoding='ASCII' by
> default in pickle ?

It is most definitely a bug. And it adds another road block to moving python applications from 2.7 to 3.x!

encoding='bytes' has serious side effects and isn't useful in the general case. For instance, it will result in dict-keys being unpickled as bytes instead of as str after which hilarity ensues.

I got the exception

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xdf in position 1: ordinal not in range(128)

when testing an application for compatibility in Python 3.5 on a pickle created by Python 2.7. The pickled data is a nested data structure and it took me quite a while to determine that the single datetime instance was the culprit.

Here is a small test case that reproduces the problem::

    # -*- coding: utf-8 -*-
    # pickle_dump.py
    import datetime, pickle, uuid
    dti = datetime.datetime(2015, 10, 12, 13, 17, 42, 123456)
    data = { "ascii" : "abc", "text" : u"äbc", "int" : 42, "date-time" : dti }
    with open("/tmp/pickle.test", "wb") as file :
        pickle.dump(data, file, protocol=2)

    # pickle_load.py
    # -*- coding: utf-8 -*-
    import pickle
    with open("/tmp/pickle.test", "rb") as file :
        data = pickle.load(file)
    print(data)

    $ python2.7 pickle_dump.py
    $ python2.7 pickle_load.py
    {'ascii': 'abc', 'text': u'\xe4bc', 'int': 42, 'date-time': datetime.datetime(2015, 10, 12, 13, 17, 42, 123456)}
    $ python3.5 pickle_load.py
    Traceback (most recent call last):
      File "pickle_load.py", line 6, in <module>
        data = pickle.load(file)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xdf in position 1: ordinal not in range(128)

That error message is spectacularly useless.

--
nosy: +tan...@swing.co.at
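The behaviour can also be examined without a Python 2 interpreter by assembling the relevant protocol 2 opcodes by hand. The sketch below is an illustration under assumptions: the byte streams are hand-built rather than produced by Python 2.7, and loading with `encoding="latin1"` is expected to work only on Python 3 versions whose datetime constructor accepts such strings (3.7 and later, as far as I know). Byte 0xdf at position 1 is the low byte of the year 2015 (0x07DF).

```python
import datetime
import pickle

# State bytes stored for datetime(2015, 10, 12, 13, 17, 42, 123456):
# year (2 bytes), month, day, hour, minute, second, microsecond (3 bytes).
state = bytes([2015 >> 8, 2015 & 0xFF, 10, 12, 13, 17, 42]) + (123456).to_bytes(3, "big")

# Hand-assembled stream: PROTO 2, GLOBAL datetime.datetime,
# SHORT_BINSTRING state, TUPLE1, REDUCE, STOP.
py2_datetime = (b"\x80\x02" b"cdatetime\ndatetime\n"
                b"U" + bytes([len(state)]) + state + b"\x85R.")

default_error = None
try:
    pickle.loads(py2_datetime)            # default encoding="ASCII"
except UnicodeDecodeError as exc:
    default_error = exc
    print("default:", exc)

loaded = pickle.loads(py2_datetime, encoding="latin1")
print("latin1: ", loaded)

# A dict with 8-bit string key and value: EMPTY_DICT, two SHORT_BINSTRINGs, SETITEM.
py2_dict = b"\x80\x02}U\x05asciiU\x03abcs."
as_bytes = pickle.loads(py2_dict, encoding="bytes")
print("bytes:  ", as_bytes)               # keys and values come back as bytes
```

The last line shows the side effect described above: with `encoding="bytes"` even plain ASCII dict keys are no longer `str`.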