from:"Christian Tanzer"

[issue25457] json dump fails for mixed-type keys when sort_keys is specified

2021-07-10 Thread Christian Tanzer


Christian Tanzer  added the comment:

JSON keys *are strings*.

That’s why json.dump stringifies all keys. If you want to argue that this
behavior is wrong, I wouldn’t protest, except that changing it would break
extant code.

But arguing that sorting the stringified keys would violate user’s expectations 
or lead to problems down the line makes no sense. The user is asking for an 
object with string keys and they want the keys sorted. That is unambiguous and 
well defined.

Nor does adding a second argument make any sense; it would just increase 
the confusion.

My problem was that Python 3.x threw an exception about this for a complex json 
object in a context where it was not at all obvious what was going on. And the 
code in question had worked for years in Python 2.
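
For reference, a small sketch of the behavior in question on Python 3; the dict is the one from the original report, and the variable names are made up for illustration:

```python
import json

mixed = {1: 42, "foo": "bar", None: "nada"}

# Without sort_keys, every key is converted to a string.
unsorted_result = json.dumps(mixed)
print(unsorted_result)

# With sort_keys, the original (unconverted) keys are compared,
# which fails for mixed types.
try:
    json.dumps(mixed, sort_keys=True)
    sort_failed = False
except TypeError as exc:
    sort_failed = True
    print("sort_keys failed:", exc)
```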

This bug report is many, many years old and I don’t much care one way or 
another but I am very sad that 

 Practicality beats purity 

got utterly lost in and after the transition to Python 3.

Christian Tanzer

> On 10.07.2021, at 16:12, Andrei Kulakov  wrote:
> 
> Andrei Kulakov  added the comment:
> 
> Some observations:
> 
> - sort_keys arg does a deep sorting of nested dictionaries. It's a bit too 
> much to ask users to do this type of preprocessing manually before dumping to 
> json.
> 
> - the error doesn't seem too onerous to me. 'unorderable types: str() < 
> int()' If uncertain, a user can go to interactive shell and try `1 < "2"`, 
> and then the issue is obvious.
> 
> - to me, current behaviour seems preferable to silently guessing that users 
> wants stringified sorting, especially since it can surface as a problem way 
> down the line.
> 
> - what makes this issue interesting is that in roughly half of cases (I'm 
> guessing) the user will want object sorted and then cast to string and would 
> be surprised if the reverse happened, and in the other half cases the user 
> would want them stringified, then sorted, and would be surprised if that 
> didn't happen.
> 
> It depends on the perspective: you may think of the json as a representation 
> of a dict of objects, that just happen to be in json format; or you can think 
> of it as a json document with string keys (of course) that just happen to 
> come from a dict of objects. Both can be valid depending on the use case.
> 
> Given all of this, I propose keeping current behavior for the existing arg, 
> and adding another arg for 'stringify then sort' behavior. Then we'll have no 
> silent guessing and the "unorderable" type error message can point the user 
> directly to the new argument.
> 
> If the user reads the docs before using this method, they will see two clear 
> options with respective tradeoffs and decide which one to use.
> 
> So either by reading the docs or running into the error, the user will have a 
> clear explanation and a clear and convenient solution.
> 
> --
> nosy: +andrei.avk
> 
> ___
> Python tracker 
> <https://bugs.python.org/issue25457>
> ___

--

___
Python tracker 
<https://bugs.python.org/issue25457>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue22005] datetime.setstate fails decoding python2 pickle

2018-12-09 Thread Christian Tanzer



Christian Tanzer  added the comment:

Paul Ganssle wrote at Fri, 07 Dec 2018 17:22:36 +:

> > Gregory P. Smith (gregory.p.smith) 2017-03-02 18:57
> > TL;DR - Just one more example of why nobody should *ever* use pickle
> > under any circumstances.  It is useless for data that is not transient
> > for consumption by the same exact versions of all software that
> > created it.
>
> This *is* something that users can work around by not abusing pickle
> in this way and instead using a proper cross-platform serialization
> format. I realize that that makes it *more difficult* for some people
> to do so, but as Gregory points out, these people are doing dangerous
> stuff that will break in a way that we are not going to be willing or
> able to fix at some point *anyway*.

This is completely and utterly wrong, to put it mildly.

The official documentation of the pickle module states (I checked 2.7
and 3.7):

The pickle serialization format is guaranteed to be backwards
compatible across Python releases.

Considering that this issue is 4.5 years old, one would assume that the
pickle documentation would have been changed in the meantime if
Gregory's and Paul's view matched reality.

But my or your personal views about the usability of pickle don't
matter anyway. There are too many libraries and applications that have
been using pickle for many years.

I personally know about this kind of usage in applications since 1998.
In that particular case, the pickled information resides on machines
owned by the customers of the applications and **must** be readable by
any new version of the application no matter how old the file
containing the pickle is. Rewriting history by some Python developers
is not going to impress the companies involved!

Have a nice day!

--

___
Python tracker 
<https://bugs.python.org/issue22005>
___

[issue25457] json dump fails for mixed-type keys when sort_keys is specified

2018-05-21 Thread Christian Tanzer


Christian Tanzer  added the comment:

Aaron Hall wrote at Sun, 20 May 2018 16:49:06 +:

> Now that dicts are sortable, does that make the sort_keys argument redundant?
>
> Should this bug be changed to "won't fix"?

https://bugs.python.org/issue25457#msg317216 is as good an answer as I
could give.

Considering that I opened the bug more than 2.5 years ago, it doesn't
really matter though.

--


[issue22005] datetime.setstate fails decoding python2 pickle

2016-04-14 Thread Christian Tanzer


Christian Tanzer added the comment:

This issue is getting old. Is there any way to solve this for Python 3.6?

--


[issue25545] email parsing docs: clarify that only ASCII strings are supported

2015-11-07 Thread Christian Tanzer


Christian Tanzer added the comment:

Terry J. Reedy wrote at Fri, 06 Nov 2015 22:49:57 +:

> email parsing docs: clarify that only ASCII strings are supported

If that is the decision, `message_from_string` should raise an
exception if it gets a non-ASCII argument!
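
A minimal sketch of what such a check could look like as a wrapper in user code. The name `message_from_ascii_string` is hypothetical; nothing like it exists in the email package:

```python
import email


def message_from_ascii_string(s):
    """Parse `s` like email.message_from_string, but reject non-ASCII input."""
    try:
        s.encode("ascii")
    except UnicodeEncodeError as exc:
        raise ValueError(
            "non-ASCII input; use email.message_from_bytes instead"
        ) from exc
    return email.message_from_string(s)


msg = message_from_ascii_string("Subject: hello\n\nbody\n")
print(msg["Subject"])
```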

--

___
Python tracker 
<http://bugs.python.org/issue25545>
___

[issue25545] email parsing docs need to be clear that only ASCII strings are supported

2015-11-06 Thread Christian Tanzer


Christian Tanzer added the comment:

> If you can suggest ways of improving the string support without
> breaking existing python3 code that may be using it (most likely
> wrongly, but working for them), then I will happily review them.

At the moment, I'm mainly interested in having code that runs
correctly in both python2.7 and python3.

Having the same method behave totally differently in the two versions
is what triggered this bug report.

Adding new methods won't help with 2.7.

> To do what you appear to want, to be able to represent non-ascii as
> the equivalent unicode *cannot work*, because email messages may
> contain binary data which *cannot* be represented in printable
> unicode.

I have no problem whatsoever if, and would actually expect that,
binary message parts are encoded as necessary for RFC compliance. My
beef is with message parts that are text and are naturally represented
as unicode not as charset- and transfer-encoded 7-bit strings!

I also don't see how such a representation would break existing
python3 code but that might just be another example of famous last
words.

> But, making unicode easier is one big reason python3 exists (the
> biggest one, in practice).

From what I have seen up to now, that has failed (spectacularly, in my
opinion, if you consider things like unpickling python2-created
pickles with binary strings, e.g., datetime instances).

Using unicode in python2 worked well enough although there was the
problem that one couldn't specify which strings were supposed to be
binary. Exactly those strings are a big problem for code that wants to
run in both python2 and python3.

python3 solves the problem of binary strings, though badly because
of the various missing string functions. But there seem to be bugs all
over the standard library and in third party modules.

That library APIs still haven't settled down yet in python3 is even
worse!

Maybe python3 would work well if one threw away all existing code and
started with completely new code but I don't think that was the
intention.

--


[issue25545] email parsing docs need to be clear that only ASCII strings are supported

2015-11-05 Thread Christian Tanzer


Christian Tanzer added the comment:

> Yes, the port from python2 to python3 of the email package
> was...suboptimal.
> ...
> The whole concept of using unicode as a 7bit data channel only is
> just...weird.

+100 to both.

> But, we are now stuck with maintaining that API for backward
> compatibility reasons.

That's a weird definition of backward compatibility, though. The API
breaks backward compatibility to Python 2. Any Python 3 user shouldn't
use the broken API anyway, IMHO.

> To fix it, I rewrote significant parts of the email package, which
> is the new API.

Which unfortunately isn't any help if one needs to stay compatible with
2.7.

> It also is...fraught with the danger of bugs...to talk about
> serializing an email message as a string, transforming it, and then
> trying to re-parse it as an email message.  If your transformations
> are simple, it will probably work, but anything at all complex runs
> the risk of breaking the message.

One of Python's mottos used to be:

   We are all consenting adults here.

But there are other uses for converting a message instance to a
unicode string. Display, printing, and grepping come to mind.

> And having non-ascii bodies counts as non-trivial.

For anybody living in a non-ascii country that statement sounds
**very strange**.

To start with, I have many friends with names that contain non-ascii
characters.

> You do have to conditionalize your 2/3 code to use the bytes parser
> and generator if you are dealing with 8-bit messages. There's just no
> way around that.

I did that yesterday. There are problems with that though:

* Recognizing the problem for what it is.

  Trying to run Python 2.7 code that *should* run under 3.5 but breaks
  with weird errors wastes a lot of time.

  Multiply by the number of Python programmers that want to migrate
  and you get a problem.

  If `message_from_string` and `as_string` just weren't there in 3.x it
  would be much less of a problem (clear documentation would also help
  but not as much).

* Lots of ugly workarounds for the same problem.

  Most of them (mine certainly included) are done quick and ad-hoc and
  probably break in many ways.

  The question then arises: why should one use the email package at
  all? But of course that way lies madness.

Just more roadblocks for the move to Python 3.

--


[issue25545] email parsing docs need to be clear that only ASCII strings are supported

2015-11-04 Thread Christian Tanzer


Christian Tanzer added the comment:

R. David Murray wrote at Wed, 04 Nov 2015 15:36:27 +:

> There is no problem with supporting both 2.7 and python3 with the same
> email API as long as your input strings are ASCII only, which is what
> is required by the email RFCs (as I said, they do not support
> unicode...even the new one only supports utf8 (a unicode encoding) not
> unicode itself).

You are talking about byte strings. And of course the email RFCs only
talk about byte strings.

But the email package offers the use of unicode strings for various
functions, including `email.message_from_string`,
`email.Message.as_string`, and `email.Message.__str__`. These
functions could be useful (and were useful in Python 2) but aren't in
Python 3.

Assume I load an email satisfying all relevant RFCs from a file. Say
that email contains three MIMEText parts with
content-transfer-encoding "8bit", all with different
encodings:

* I don't see any use for `as_string` to obfuscate that by
  re-encoding each of the three to content-transfer-encoding "base64",
  which is completely unreadable when it could be converted painlessly
  to a real unicode string.

  One of my usage scenarios is something of the form::

      >>> print(msg)

  Of course, in this case I'd better use `utf-8` as my output
  encoding otherwise the print might fail.

  If I wanted to output a RFC-compliant byte string, I should have
  used `as_bytes`, not `as_string`. But that would be a different
  usage scenario.

* The same argument applies in reverse to `message_from_string`. If
  one wants RFC compliance one should use `message_from_bytes`.

  But if one builds up a unicode string for an email in Python, it
  should be possible to convert that to a `email.Message` instance via
  `message_from_string`.

I have several use cases where I want to convert an `email.Message`
to a unicode string without any embedded content-transfer-encodings
like "base64", do some transformations on that string and then
convert that back into an `email.Message` instance.
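
For a single text part, the readable unicode form can be obtained along these lines in Python 3. This is only a sketch under assumptions: the message content and the example.com address are made up, and only calls from the legacy (compat32) API are used:

```python
import email

# An RFC-style message with an 8bit, utf-8 encoded text body.
raw = (
    "From: sender@example.com\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "\n"
    "Danke und mit freundlichen Gr\u00fc\u00dfen\n"
).encode("utf-8")

msg = email.message_from_bytes(raw)

# get_payload(decode=True) returns the body as bytes; decode it with
# the charset declared in the Content-Type header.
payload = msg.get_payload(decode=True)
text = payload.decode(msg.get_content_charset() or "ascii")
print(text)
```

This handles one non-multipart part only; a multipart message would need the same treatment for each text part found by `msg.walk()`.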

> I have an extensive doc rewrite in process, but I'm not sure when it
> will land.  I thought I had already added the note about ASCII-only to
> the parser docs, but I see that I did not.  I'll reopen this issue to
> remind myself to do that, since the doc rewrite will only apply to 3.6
> (when the new API will no longer be provisional).

I don't see any point in the semantics of the string functions as they
are currently implemented. After all, one can easily do things like
`message_from_bytes(s.encode("latin-1"))` or
`msg.as_bytes().decode("latin-1")` if one really wants to convert an
RFC-compatible byte string to/from unicode strings as-is. But this
as-is conversion normally isn't very useful because it isn't
* human-readable

* well suited to search and replace operations or any other text
  transformations

So documenting the current situation would improve the situation slightly
but it's more like putting lipstick on a pig.

--


[issue25545] email.message.get_payload returns wrong encoding

2015-11-04 Thread Christian Tanzer


Christian Tanzer added the comment:

R. David Murray wrote at Tue, 03 Nov 2015 19:59:53 +:

>  Your problem is that your input email is a unicode string.  A unicode
>  string has no RFC definition as an email, so things do not work right,
>  as you observed.  Whether or not email should throw an error when fed
>  a non-ascii unicode string is an interesting question, but it hasn't
>  in the past and so for backward compatibility reasons we won't change
>  that.

Excuse me, I am using `email.message_from_string` which is documented
to convert a unicode string to an email object. If you are serious
`message_from_string` should not even exist! As long as it is there
and documented as::

    email.message_from_string(s, _class=email.message.Message, *,
                              policy=policy.compat32)

        Return a message object structure from a string. This is exactly
        equivalent to Parser().parsestr(s). _class and policy are
        interpreted as with the Parser class constructor.

        Changed in version 3.3: Removed the strict argument. Added the
        policy keyword.

your argument is unfounded and this is definitely a serious bug!

> You might also be interested in the newer email API, currently
> documented in the 'contentmanager' and 'policy' chapters of the
> documentation.  It says it is provisional, but the changes (other than
> bug fixes) between the current API and what will be final in 3.6 are
> trivial.

I'm using Python 2.7 and only just exploring 3.5.

Unfortunately, there are many bugs and your response is a typical
example why moving from 2.7 to 3.x is hard.

There is gratuitous breakage but the reaction is::

    resolution:  -> not a bug

I would ask you to reconsider that stance.

As long as my code needs to support 2.7, use of any new API doesn't
fly. After an eventual switch to 3.5 (probably years in the future), I
might use new APIs for new code but changing existing code that used
to work won't be in the cards.

> get_content_charset is None because you don't have any actual headers
> in your message, just body.  This is because of the leading newline in
> your triple quoted string, which the email package takes as the end of
> the headers.

Thanks for the hint. BTW, removing the leading newline doesn't change
the buggy behavior of `message_from_string`!
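
The effect of the leading newline on header parsing (not the encoding problem itself) can be seen in this small sketch, which uses made-up ASCII-only input:

```python
import email

headers_and_body = "Content-Type: text/plain; charset=utf-8\n\nbody\n"

# A leading blank line ends the header section immediately, so the
# Content-Type line becomes part of the body.
with_newline = email.message_from_string("\n" + headers_and_body)
without_newline = email.message_from_string(headers_and_body)

print(with_newline.get_content_charset())
print(without_newline.get_content_charset())
```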

--


[issue25545] email.message.get_payload returns wrong encoding

2015-11-03 Thread Christian Tanzer


New submission from Christian Tanzer:

For an email message with `Content-type: text/plain; charset=utf-8`, in Python 
3.5, get_payload returns a bytes object encoded with `latin-1`. Python 2.7 
returns a str object encoded with `utf-8` as expected.

Running the attached test script `email_get_payload__test.py`  with Python 2.7 
and 3.5 shows the difference.

Python 2.7::

2.7.10.final.0 *** utf8 ***
From: Christian Tanzer 
To: Christian Tanzer 
Content-type: text/plain; charset=utf-8


Sehr geehrte Damen und Herren,

...

Danke und mit freundlichen Grüssen,

--
Christian Tanzer                    http://www.c-tanzer.at/

Python 3.5::

3.5.0.final.0 *** latin-1 ***
From: Christian Tanzer 
To: Christian Tanzer 
Content-type: text/plain; charset=utf-8


Sehr geehrte Damen und Herren,

...

Danke und mit freundlichen Grüssen,

--
Christian Tanzer                    http://www.c-tanzer.at/

In both Python versions, `msg.get_content_charset()` returns None, which is not 
correct, either.

--
components: Library (Lib)
files: email_get_payload__test.py
messages: 253994
nosy: tan...@swing.co.at
priority: normal
severity: normal
status: open
title: email.message.get_payload returns wrong encoding
type: behavior
versions: Python 3.5
Added file: http://bugs.python.org/file40934/email_get_payload__test.py


[issue25457] json dump fails for mixed-type keys when sort_keys is specified

2015-10-23 Thread Christian Tanzer


Christian Tanzer added the comment:

Josh Rosenberg wrote at Fri, 23 Oct 2015 02:56:30 +:

> As a workaround (should you absolutely need to sort keys by some
> arbitrary criteria), you can initialize a collections.OrderedDict from
> the sorted items of your original dict (using whatever key function
> you like), then dump without using sort_keys=True.

Sigh...

I already implemented a workaround but it's not as simple as you
think — the dictionary in question is nested.
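
A sketch of what the suggested OrderedDict workaround has to look like once nesting is involved. The helper name `sorted_deep` is hypothetical, and sorting by `str(key)` is just one possible choice of key function:

```python
import json
from collections import OrderedDict


def sorted_deep(obj, key=str):
    """Return a copy of `obj` with every nested dict ordered by key(k)."""
    if isinstance(obj, dict):
        items = sorted(obj.items(), key=lambda kv: key(kv[0]))
        return OrderedDict((k, sorted_deep(v, key)) for k, v in items)
    if isinstance(obj, (list, tuple)):
        return [sorted_deep(v, key) for v in obj]
    return obj


data = {"b": {2: "x", "a": "y"}, 1: 42}
# Dump without sort_keys; the order comes from the OrderedDicts.
result = json.dumps(sorted_deep(data))
print(result)
```

Note that `str(None)` is `'None'` while JSON emits `null`, so the resulting order can differ from sorting the keys as they appear in the output.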

The problem is that this is just another unnecessary difficulty when
trying to move to Python 3.x.

--


[issue25457] json dump fails for mixed-type keys when sort_keys is specified

2015-10-23 Thread Christian Tanzer


Christian Tanzer added the comment:

Josh Rosenberg wrote at Fri, 23 Oct 2015 02:45:51 +:

> The Python 2 sort order is a result of the "arbitrary but consistent
> fallback comparison" (omitting details, it's comparing the names of
> the types), thus the "strange" sort order.

Thanks. I knew that.

> Python 3 (justifiably) said that incomparable types should be
> incomparable rather than silently behaving in non-intuitive ways,
> hiding errors.

"justifiably" is debatable. I consider the change ill-conceived.

Displaying a dictionary (or just its keys) in a readable, or just
reproducible, way is useful in many contexts. Python 3 broke that for
very little, IMNSHO, gain.

I consider "hiding errors" a myth, to say it politely!

> Python is being rather generous by allowing non-string keys, because
> the  JSON spec ( http://json.org/ ) only allows the keys ("names" in
> JSON parlance) to be strings. So you're already a bit in the weeds as
> far as compliant JSON goes if you have non-string keys.

There are two possibilities:

1) Accepting non-string keys is intended. Then `sort_keys` shouldn't
   break like it does.

   As far as JSON goes, the output of `json.dump[s]` contains string keys.

2) Accepting non-string keys is a bug. Then `json.dump[s]` should be
   changed to not accept them.

Mixing both approaches is the worst of all worlds.

> Since mixed type keys lack meaningful sort order, I'm not sure it's
> wrong to reject attempts to sort them.

The documentation says:

If sort_keys is True (default False), then the output of dictionaries
will be sorted by key; this is useful for regression tests to ensure
that JSON serializations can be compared on a day-to-day basis.

**Reproducible** is the keyword here.

**Readability** is another one. Even if the sort order is "strange",
it is much better than random order, if you are looking for a specific
key.

For the record, it was a test failing under Python 3.5 that triggered
this bug report.

> > As all the keys are dumped as strings, a simple solution would be to
> > sort after converting to strings.
> Converting to string is as
> arbitrary and full of potential for silently incorrect comparisons as
> the Python 2 behavior, and reintroducing it seems like a bad idea.

json.dumps already does the conversion::

    >>> json.dumps({1 : 42, "foo" : "bar", None : "nada"})
    '{"foo": "bar", "1": 42, "null": "nada"}'

Another run::

    >>> json.dumps({1 : 42, "foo" : "bar", None : "nada"})
    '{"1": 42, "foo": "bar", "null": "nada"}'

That difference is exactly the reason for `sort_keys`.

--


[issue25457] json dump fails for mixed-type keys when sort_keys is specified

2015-10-22 Thread Christian Tanzer


New submission from Christian Tanzer:

In Python 3, trying to json-dump a dict with keys of different types fails with 
a TypeError when sort_keys is specified:

python2.7
===

Python 2.7.10 (default, May 29 2015, 10:02:30) 
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> json.dumps({1 : 42, "foo" : "bar", None : "nada"}, sort_keys = True)
'{"null": "nada", "1": 42, "foo": "bar"}'

python3.5
===

Python 3.5.0 (default, Oct  5 2015, 12:03:13) 
[GCC 4.8.5] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> json.dumps({1 : 42, "foo" : "bar", None : "nada"}, sort_keys = True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.5/json/__init__.py", line 237, in dumps
    **kw).encode(obj)
  File "/usr/lib64/python3.5/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib64/python3.5/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
TypeError: unorderable types: str() < int()

Note that the documentation explicitly allows keys of different, if basic, 
types:

  If skipkeys is True (default: False), then dict keys that are not of a basic 
type (str, int, float, bool, None) will be skipped instead of raising a 
TypeError.

As all the keys are dumped as strings, a simple solution would be to sort 
after converting to strings. Looking closely at the output of Python 2, the 
sort order is a bit strange!

--
components: Library (Lib)
messages: 253324
nosy: tan...@swing.co.at
priority: normal
severity: normal
status: open
title: json dump fails for mixed-type keys when sort_keys is specified
type: behavior
versions: Python 3.5


[issue22005] datetime.setstate fails decoding python2 pickle

2015-10-16 Thread Christian Tanzer


Christian Tanzer added the comment:

Alexander Belopolsky wrote at Thu, 15 Oct 2015 17:56:42 +:

> I don't think your solution will work for date/time/datetime pickles.
> There are many values for which pickle payload consists of bytes
> within 0-127 range.

H.

> IIUC, you propose to decode those to Python 3
> strings using ASCII encoding.

Yes. There are too many BINSTRING instances that need to be Python 3
strings.

> This will in turn require accepting str
> type in date/time/datetime constructors.

These datetime... constructors are strange beasts already.

The documentation says that three integer arguments are required for
datetime.datetime but it accepts a single bytes argument anyway. I
agree that it would be much nicer if there was a
datetime.datetime.load method instead. Unfortunately, that would
require Guido's time machine to go back all the way to 2003 (at least).

So yes, the only practical solution is to accept a single str typed
argument (as long as it is ASCII only). An alternative would be to add
a dispatch table for loading functions to Python 3's pickle that would
be used by load_global. That would add indirection for the datetime
constructors but would allow support for other types requiring
arguments of type bytes.

The change I proposed in http://bugs.python.org/issue22005#msg253042
to fix the handling of binary 8-bit strings is still necessary.

To summarize:

IMHO the solution needs to be implemented in Python 3 — otherwise
pickles with binary strings created by Python 2.x cannot be loaded in
Python 3. Changing the pickle implementation of Python 2 doesn't fix
existing pickles and couldn't fix the general problem of binary
strings, anyway.

--


[issue22005] datetime.setstate fails decoding python2 pickle

2015-10-15 Thread Christian Tanzer


Christian Tanzer added the comment:

IMNSHO, the problem lies in the Python 3 pickle.py and it is **not** restricted 
to datetime instances (for a detailed rambling see 
http://c-tanzer.at/en/breaking_py2_pickles_in_py3.html).

In Python 2, 8-bit strings are used for text and for binary data. Well designed 
applications will use unicode for all text, but Python 2 itself forces some 
text to be 8-bit string, e.g., names of attributes, classes, and functions. In 
other words, **any 8-bit strings explicitly created by such an application will 
contain binary data.**

In Python 2, pickle.dump uses BINSTRING (and SHORT_BINSTRING) for 8-bit 
strings; Python 3 uses BINBYTES (and SHORT_BINBYTES) instead.

In Python 3, pickle.load should handle BINSTRING (and SHORT_BINSTRING) like 
this:

* convert ASCII values to `str`

* convert non-ASCII values to `bytes`

`bytes` is Python 3's equivalent to Python 2's 8-bit string! 

It is only because of the use of 8-bit strings for Python 2 names that the 
mapping to `str` is necessary but all such names are guaranteed to be ASCII!

I would propose to change `load_binstring` and `load_short_binstring` to call a 
function like::

    def _decode_binstring(self, value):
        # Used to allow strings from Python 2 to be decoded either as
        # bytes or Unicode strings.  This should be used only with the
        # BINSTRING and SHORT_BINSTRING opcodes.
        if self.encoding != "bytes":
            try :
                return value.decode("ASCII")
            except UnicodeDecodeError:
                pass
        return value

instead of the currently called `_decode_string`.
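
The intended behavior can be tried outside of pickle.py with a standalone version of the same logic. In this sketch `decode_binstring` is a hypothetical free function (the `encoding` parameter stands in for `self.encoding`), and the sample values are made up:

```python
def decode_binstring(value, encoding="ASCII"):
    """Decode a Python 2 8-bit string: ASCII becomes str, the rest stays bytes."""
    if encoding != "bytes":
        try:
            return value.decode("ASCII")
        except UnicodeDecodeError:
            pass
    return value


# An attribute name is pure ASCII and becomes a str.
name = decode_binstring(b"date-time")
# Binary data with bytes above 127 (such as a datetime state) stays bytes.
state = decode_binstring(b"\x07\xdf\n\x0c\r\x11*\x01\xe2@")
print(repr(name), repr(state))
```

As noted elsewhere in this thread, binary payloads that happen to consist only of bytes in the 0-127 range would still come out as `str` with this rule.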

--


[issue22005] datetime.setstate fails decoding python2 pickle

2015-10-12 Thread Christian Tanzer


Christian Tanzer added the comment:

> The code works when using encoding='bytes'. Thanks Tim for the suggestion.

> So this is not a bug, but is there any sense in having encoding='ASCII' by 
> default in pickle ?

It is most definitely a bug. And it adds another road block to moving python 
applications from 2.7 to 3.x!

encoding='bytes' has serious side effects and isn't useful in the general case. 
For instance, it will result in dict-keys being unpickled as bytes instead of 
as str after which hilarity ensues.
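
A small sketch of that side effect. The pickle bytes below are hand-written to mimic what Python 2 emits at protocol 2 for `{'a': 1}` (an assumption made for illustration, not output captured from a real Python 2 run); the key uses the SHORT_BINSTRING opcode:

```python
import pickle

# PROTO 2, EMPTY_DICT, BINPUT 0, SHORT_BINSTRING 'a', BINPUT 1,
# BININT1 1, SETITEM, STOP
py2_style = b"\x80\x02}q\x00U\x01aq\x01K\x01s."

# The default encoding ("ASCII") turns the 8-bit string key into a str.
default_keys = list(pickle.loads(py2_style))
# encoding="bytes" leaves every 8-bit string as bytes, including dict keys.
bytes_keys = list(pickle.loads(py2_style, encoding="bytes"))
print(default_keys, bytes_keys)
```

Any code that later looks up `data["a"]` fails with a KeyError on the second result, because the key is now `b"a"`.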

I got the exception

  UnicodeDecodeError: 'ascii' codec can't decode byte 0xdf in position 1: 
ordinal not in range(128)

when testing an application for compatibility in Python 3.5 on a pickle created 
by Python 2.7. The pickled data is a nested data structure and it took me quite 
a while to determine that the single datetime instance was the culprit.

Here is a small test case that reproduces the problem::

    # -*- coding: utf-8 -*-
    # pickle_dump.py
    import datetime, pickle, uuid
    dti = datetime.datetime(2015, 10, 12, 13, 17, 42, 123456)
    data = { "ascii" : "abc", "text" : u"äbc", "int" :  42, "date-time" : dti }
    with open("/tmp/pickle.test", "wb") as file :
        pickle.dump(data, file, protocol=2)

    # pickle_load.py
    # -*- coding: utf-8 -*-
    import pickle
    with open("/tmp/pickle.test", "rb") as file :
        data = pickle.load(file)
    print(data)

    $ python2.7 pickle_dump.py
    $ python2.7 pickle_load.py
    {'ascii': 'abc', 'text': u'\xe4bc', 'int': 42, 'date-time': datetime.datetime(2015, 10, 12, 13, 17, 42, 123456)}
    $ python3.5 pickle_load.py
    Traceback (most recent call last):
      File "pickle_load.py", line 6, in <module>
        data = pickle.load(file)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xdf in position 1: ordinal not in range(128)

That error message is spectacularly useless.

--
nosy: +tan...@swing.co.at

