[issue22767] `separators` argument to json.dumps() behaves unexpectedly across 2.x vs 3.x

2014-10-30 Thread R. David Murray

R. David Murray added the comment:

Yes, that third party problem is a prime example of exactly why this needed to 
be fixed, but it required python3 to fix it.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com




2014-10-30 Thread Tom Christie

Tom Christie added the comment:

> So, as soon as (but only as soon as) you mix unicode with your non-ascii 
> data, your program blows up.

Indeed. For context, though: in my case the unicode literals used as separators 
weren't even in the same package as the non-ASCII binary strings (JSONRenderer 
in Django REST framework, being exercised by some third-party test code). The 
end result was a non-obvious exception.

Anyway, I'm okay with this resolution, although I'm now using a compat branch to 
ensure that we use binary separators in py2, so that we continue to get the more 
lax rendering style.
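The compat branch itself isn't shown in the thread. A minimal sketch, assuming the goal is simply "native string separators on each version": `str` is the byte-string type on Python 2 and the text type on Python 3, so coercing the separators through `str()` yields binary separators on py2 (even when `unicode_literals` makes the literals unicode) and ordinary text separators on py3. `COMPACT_SEPARATORS` and `render()` are hypothetical names, not Django REST framework code.

```python
# -*- coding: utf-8 -*-
import json

# `str` is the *native* string type: bytes on Python 2, text on Python 3.
# Coercing through str() therefore gives byte-string separators on Python 2
# (even under `from __future__ import unicode_literals`), keeping the lax
# "bytes or unicode" behaviour, and plain text separators on Python 3.
COMPACT_SEPARATORS = (str(','), str(':'))  # (item_separator, key_separator)


def render(data):
    """Hypothetical helper: compact JSON without forcing ASCII escapes."""
    return json.dumps(data, separators=COMPACT_SEPARATORS, ensure_ascii=False)


print(render({'snowman': '☃'}))
```

On Python 3 this prints `{"snowman":"☃"}`.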

--


2014-10-30 Thread R. David Murray

R. David Murray added the comment:

Or, to put it another way, we agree with you that both cases should behave the 
same: using binary data in a json dumps call should raise an error.  And in 
python3 they do.  But in python2 there is a confusion as to what is text and 
what is binary, and so sometimes things work that shouldn't.  In python2 a 
binary string with non-ascii characters is accepted by the dumps call...it 
shouldn't be, since json is defined as a text protocol.  But it is baked into 
the python2 string model that such binary data does work, because in python2 it 
was assumed that the programmer was responsible for making sure that the 
encoding of all their binary strings was consistent.  But to mix unicode and 
binary, you *must* make the encoding of the binary strings explicit; otherwise 
there's no way to correctly compose the binary data with the text data.
So, as soon as (but only as soon as) you mix unicode with your non-ascii data, 
your program blows up.

Thus python3.
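A short Python 3 illustration of that last point (not taken from the thread): text and binary data never compose implicitly, so the encoding has to be applied explicitly before the pieces can be joined. The `chunks` list is only loosely modelled on the pieces a JSON encoder joins together; the names are illustrative.

```python
# A UTF-8 encoded snowman is binary data (bytes), not text.
snowman_bytes = '☃'.encode('utf-8')  # b'\xe2\x98\x83'

chunks = ['{"snowman": "', snowman_bytes, '"}']

# Python 3 refuses to guess an encoding when text and binary meet.
try:
    ''.join(chunks)
except TypeError as exc:
    print(exc)  # sequence item 1: expected str instance, bytes found

# Making the encoding explicit is what allows the two to be composed.
decoded = [c.decode('utf-8') if isinstance(c, bytes) else c for c in chunks]
print(''.join(decoded))  # {"snowman": "☃"}
```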

--


2014-10-30 Thread R. David Murray

R. David Murray added the comment:

No, it is introducing the unicode that is the problem.  Your first example is 
entirely binary.  It is only when you *mix* binary and unicode that you have 
encoding problems (because Python doesn't know the encoding of the binary 
data... well, more precisely, the binary data doesn't have one).

This confusion is a large part of why python3 exists :)

--


2014-10-30 Thread Tom Christie

Tom Christie added the comment:

> But only if you use non-ascii in the binary input, in which case you get an 
> encoding error, which is a correct error.

Kind of, except that this (python 2.7) works just fine:

>>> data = {'snowman': '☃'}
>>> json.dumps(data, ensure_ascii=False)
'{"snowman": "\xe2\x98\x83"}'

Whereas this raises an exception:

>>> json.dumps(data, separators=(u',', u':'), ensure_ascii=False)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)

If it were the same in both cases, I wouldn't consider it a problem.
As it is, introducing the `separators` parameter changes the behaviour.

Anyways, I'll get off my high horse now. :)

--


2014-10-30 Thread R. David Murray

R. David Murray added the comment:

But only if you use non-ascii in the binary input, in which case you get an 
encoding error, which is a correct error.

--


2014-10-30 Thread Tom Christie

Tom Christie added the comment:

Not too fussed whether this is addressed or not, but I think this was closed a 
little prematurely.

I don't think there's a problem under Python 3; that behaviour is entirely reasonable.

However under Python 2, `json.dumps()` will normally handle *either* bytes or 
unicode transparently for you (just altering the return type accordingly).

If you happen to be using unicode separators, however, that normally lax 
"either unicode or bytes" behaviour stops being the case.

--


2014-10-30 Thread R. David Murray

R. David Murray added the comment:

And that works, including with the future import.  I don't remember if this is 
a bug we've fixed since 2.7.2, but I don't think so.

In Python3, json explicitly does not support bytes.
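To make the Python 3 side concrete, a small check (written for this page, not from the tracker) showing that `json.dumps()` rejects bytes consistently, whether they appear in the data or in `separators`. `rejects()` is a hypothetical helper.

```python
import json


def rejects(**kwargs):
    """Hypothetical helper: True if json.dumps() raises TypeError for these arguments."""
    try:
        json.dumps(**kwargs)
    except TypeError:
        return True
    return False


# Bytes in the data are rejected...
print(rejects(obj={'snowman': '☃'.encode('utf-8')}))         # True
# ...and so are bytes separators, even when the data is all text.
print(rejects(obj={'snowman': '☃'}, separators=(b',', b':')))  # True
# Text everywhere is the only supported combination.
print(rejects(obj={'snowman': '☃'}, separators=(',', ':')))    # False
```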

--
nosy: +r.david.murray
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed


2014-10-30 Thread Georg Brandl

Georg Brandl added the comment:

> in the second example

or even, in both examples.

--


2014-10-30 Thread Georg Brandl

Georg Brandl added the comment:

IMO the snowman should be a Unicode string in the second example for Python 2.7.
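On Python 2.7 that simply means writing `u'☃'`. Where the data arrives as bytes from elsewhere, one option (a sketch, not something proposed in the thread) is to decode everything to text before serialising, so json only ever sees unicode. The helper below is written for Python 3, assumes UTF-8 input, and `to_text()` is a hypothetical name.

```python
import json


def to_text(obj, encoding='utf-8'):
    """Hypothetical helper: recursively decode bytes to str so json only sees text."""
    if isinstance(obj, bytes):
        return obj.decode(encoding)
    if isinstance(obj, dict):
        return {to_text(k, encoding): to_text(v, encoding) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_text(item, encoding) for item in obj]
    return obj


data = {b'snowman': '☃'.encode('utf-8'), 'tags': [b'winter', 'cold']}
print(json.dumps(to_text(data), separators=(',', ':'), ensure_ascii=False))
```

This prints `{"snowman":"☃","tags":["winter","cold"]}`.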

--
nosy: +georg.brandl


2014-10-30 Thread Tom Christie

New submission from Tom Christie:

This is one of those behavioural issues that is a borderline bug.

The `separators` argument to `json.dumps()` behaves differently across Python 2 
and 3.

* In Python 2 it should be provided as a bytestring, and can cause a 
UnicodeDecodeError otherwise.
* In Python 3 it should be provided as unicode, and can cause a TypeError 
otherwise.

Examples:

Python 2.7.2
>>> print json.dumps({'snowman': '☃'}, separators=(',', ':'), ensure_ascii=False)
{"snowman":"☃"}
>>> print json.dumps({'snowman': '☃'}, separators=(u',', u':'), ensure_ascii=False)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)

And:

Python 3.4.0
>>> print(json.dumps({'snowman': '☃'}, separators=(',', ':'), ensure_ascii=False))
{"snowman":"☃"}
>>> print(json.dumps({'snowman': '☃'}, separators=(b',', b':'), ensure_ascii=False))
<...>
TypeError: sequence item 2: expected str instance, bytes found

Technically this isn't out of line with the documentation: in both cases it 
uses `separators=(',', ':')`, which is indeed the correct type in both v2 and 
v3. However, it's unexpected that the required type changes between versions 
without this being called out.

Working on a codebase with `from __future__ import unicode_literals`, this is 
particularly unexpected, because we get a `UnicodeDecodeError` when running code 
that otherwise looks correct.

It's also slightly awkward to work around, because it needs a somewhat weird 
branch condition.

The fix would probably be to forcibly coerce it to the correct type regardless 
of whether it is supplied as unicode or a bytestring, or at least to do so for 
Python 2.7.
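The coercion suggested here could look roughly like the wrapper below. It is a Python 3 sketch only, assuming UTF-8 for byte-string separators; `dumps()` and `_as_text()` are hypothetical names, not part of the `json` module (the issue was closed as "not a bug", so the standard library does not do this).

```python
import json


def _as_text(value, encoding='utf-8'):
    # Decode bytes; pass str (text) through unchanged.
    return value.decode(encoding) if isinstance(value, bytes) else value


def dumps(obj, separators=None, **kwargs):
    """Hypothetical wrapper: accept separators as either bytes or text."""
    if separators is not None:
        separators = tuple(_as_text(s) for s in separators)
    return json.dumps(obj, separators=separators, **kwargs)


print(dumps({'snowman': '☃'}, separators=(b',', b':'), ensure_ascii=False))
print(dumps({'snowman': '☃'}, separators=(',', ':'), ensure_ascii=False))
```

Both calls print `{"snowman":"☃"}`. The trade-off is that the wrapper silently picks an encoding for byte separators, which is exactly the kind of guess Python 3's json module declines to make.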

Possibly related to http://bugs.python.org/issue22701, but I wasn't able to 
tell whether that ticket was in fact a different user error.

--
messages: 230274
nosy: Tom.Christie
priority: normal
severity: normal
status: open
title: `separators` argument to json.dumps() behaves unexpectedly across 2.x vs 3.x
type: behavior
versions: Python 2.7, Python 3.4
