[issue26045] Improve error message for http.client when posting unicode string

2016-02-08 Thread Martin Panter

Changes by Martin Panter :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26045] Improve error message for http.client when posting unicode string

2016-02-08 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 966bd147ccb5 by Martin Panter in branch '3.5':
Issue #26045: Add UTF-8 suggestion to error in http.client
https://hg.python.org/cpython/rev/966bd147ccb5

New changeset 9896ead3cc1d by Martin Panter in branch 'default':
Issue #26045: Merge http.client error addition from 3.5
https://hg.python.org/cpython/rev/9896ead3cc1d

--
nosy: +python-dev




[issue26045] Improve error message for http.client when posting unicode string

2016-01-31 Thread Guido van Rossum

Guido van Rossum added the comment:

LGTM.

--




[issue26045] Improve error message for http.client when posting unicode string

2016-01-31 Thread Martin Panter

Martin Panter added the comment:

Here is my cut down version of Guido’s patch. Now it only adds the message when 
someone passes a text string as the HTTPConnection.request(body=...) parameter:

>>> c.request("POST", "", body="Celebrate \U0001F389")
Traceback (most recent call last):
  File "", line 1, in 
  File "/home/proj/python/cpython/Lib/http/client.py", line 1098, in request
    self._send_request(method, url, body, headers)
  File "/home/proj/python/cpython/Lib/http/client.py", line 1142, in _send_request
    body = _encode(body, 'body')
  File "/home/proj/python/cpython/Lib/http/client.py", line 161, in _encode
    (name.title(), data[err.start:err.end], name)) from None
UnicodeEncodeError: 'latin-1' codec can't encode character '\U0001f389' in 
position 10: Body ('🎉') is not valid Latin-1. Use body.encode('utf-8') if you 
want to send it encoded in UTF-8.

What do people think?

--
Added file: http://bugs.python.org/file41768/utfpatch.v2.diff




Re: [issue26045] Improve error message for http.client when posting unicode string

2016-01-09 Thread Random832

Guido van Rossum writes:
> UnicodeEncodeError: 'ascii' codec can't encode character '\u1234' in
                       ^^^^^
>   position 3: Header ('ሴ') is not valid Latin-1. Use
                                          ^^^^^^^
>   header.encode('utf-8') if you want to send it encoded in UTF-8.

Er...




[issue26045] Improve error message for http.client when posting unicode string

2016-01-08 Thread Guido van Rossum

Guido van Rossum added the comment:

Martin, please make a patch along those lines! The only reason I generalized 
this to headers is that one of the three Requests issues referenced in the 
original post seemed to be about a header value 
(https://github.com/kennethreitz/requests/issues/1926). But that one seems 
different than the other two anyways, and it's about Python 2.7 so it wouldn't 
be helped by anything we're doing here anyways.

--




[issue26045] Improve error message for http.client when posting unicode string

2016-01-08 Thread Martin Panter

Martin Panter added the comment:

Personally I am skeptical about whether suggesting UTF-8 for the body data is 
a good idea, but I can go along with it, since other people want it. But I do 
strongly question whether it is right to suggest UTF-8 for header fields. The 
RFC only mentions ASCII and Latin-1.

Newer protocols based on HTTP (RTSP comes to mind) do specify UTF-8 for the 
header, but that is probably out of scope of both the HTTP module and 
beginner-targeted errors.

If I were redoing this patch, I would drop all the changes except at the 
body.encode() line in Emil’s original post.

--
stage:  -> patch review




[issue26045] Improve error message for http.client when posting unicode string

2016-01-08 Thread Guido van Rossum

Guido van Rossum added the comment:

I think this would be okay for 3.5.2 as well.

--
versions: +Python 3.5




[issue26045] Improve error message for http.client when posting unicode string

2016-01-08 Thread Terry J. Reedy

Changes by Terry J. Reedy :


--
versions:  -Python 3.2, Python 3.3, Python 3.4, Python 3.5




[issue26045] Improve error message for http.client when posting unicode string

2016-01-08 Thread Guido van Rossum

Guido van Rossum added the comment:

BTW the error and traceback will look something like this:

Traceback (most recent call last):
  File "", line 1, in 
  File "/Users/guido/src/cpython/Lib/http/client.py", line 1138, in _send_request
    self.putheader(hdr, value)
  File "/Users/guido/src/cpython/Lib/http/client.py", line 1062, in putheader
    header = _encode(header, 'ascii', 'header')
  File "/Users/guido/src/cpython/Lib/http/client.py", line 161, in _encode
    (name.title(), data[err.start:err.end], name)) from None
UnicodeEncodeError: 'ascii' codec can't encode character '\u1234' in position 
3: Header ('ሴ') is not valid Latin-1. Use header.encode('utf-8') if you want to 
send it encoded in UTF-8.

--




[issue26045] Improve error message for http.client when posting unicode string

2016-01-08 Thread Guido van Rossum

Guido van Rossum added the comment:

Here's a patch. I noticed there are lots of other places where a similar 
encode() call exists -- I wrapped them all using a helper function. Please 
review carefully.

--
keywords: +patch
Added file: http://bugs.python.org/file41537/utfpatch.diff




[issue26045] Improve error message for http.client when posting unicode string

2016-01-08 Thread Emil Stenström

Emil Stenström added the comment:

I think changing the error message is enough for the short term, and 
deprecation of automatic encoding is the correct way in the long term.

A text that mentions "utf-8", which will likely be the correct solution, 
definitely gets my vote, so Guido's suggestion sounds good to me:

UnicodeEncodeError("Use data.encode('utf-8') if you want the data to be encoded 
in UTF-8")

Andrew's and Paul's suggestions don't point to a solution to the problem, and 
I think pointing to one is a great thing for something this basic. Also, the 
error message only gets shown when latin-1 fails, so we can't use text that 
speaks about "no encoding" in general.

--




[issue26045] Improve error message for http.client when posting unicode string

2016-01-07 Thread Martin Panter

Martin Panter added the comment:

After reading through the linked thread, there are a few error message 
proposals:

Guido: "use data.encode('utf-8') if you want the data to be encoded in UTF-8". 
(Then of course the server might not like it.)

Andrew Barnert: A UnicodeEncodeError (or subclass of it?) with text like "HTTP 
body without encoding defaults to 'latin-1', which can't encode character 
'\u' in position 30: ordinal not in range(256)")

Paul Moore: Encode as ASCII and catch UnicodeEncodeError and re-raise as a 
TypeError "Unicode string supplied without an explicit encoding".

Emil, do you think any of these would help?

--




[issue26045] Improve error message for http.client when posting unicode string

2016-01-07 Thread Guido van Rossum

Guido van Rossum added the comment:

Any solution that encodes Unicode in a way that works for some characters but 
fails for others has the same problem that Unicode had in Python 2. 
Unfortunately we're stuck with such a solution (Latin-1) and for backwards 
compatibility reasons we can't change it. If we were to deprecate it, we should 
warn for *any* data given as a Unicode string, even if it's plain ASCII (even 
if it's an empty string :-).

But even if we don't deprecate it, we can still change the text of the error 
message (but not the type of the exception used) to be more clear.

Can we please start drafting a suitable error message here?

--




[issue26045] Improve error message for http.client when posting unicode string

2016-01-07 Thread Martin Panter

Martin Panter added the comment:

For the record, this is what Requests sent when I passed a Latin-1-encodable 
string:

b'POST / HTTP/1.1\r\n'
b'Host: example.com\r\n'
b'Content-Length: 11\r\n'
b'Connection: keep-alive\r\n'
b'Accept: */*\r\n'
b'Accept-Encoding: gzip, deflate\r\n'
b'User-Agent: python-requests/2.9.1\r\n'
b'\r\n'
b'Celebrate \xa9'

There is no Content-Type header field, nor any indication of the encoding used. 
This is also how the lower-level HTTPConnection.request() method works.
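
As a small illustration of why the missing encoding information matters, the
sketch below encodes the same text both ways. It uses only str.encode() and
makes no claim about what Requests or http.client do internally beyond what
is shown in the dump above.

```python
text = "Celebrate \xa9"  # "Celebrate ©", a Latin-1-encodable string

latin1_body = text.encode("latin-1")
utf8_body = text.encode("utf-8")

# Same text, different bytes and different lengths on the wire.
print(latin1_body, len(latin1_body))  # b'Celebrate \xa9' 11
print(utf8_body, len(utf8_body))      # b'Celebrate \xc2\xa9' 12
```

The 11-byte form matches the Content-Length in the request above; a server
expecting UTF-8 would see the lone 0xA9 byte as invalid input.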

The documentation already mentions that a text string gets encoded with 
ISO-8859-1 (a.k.a. Latin-1). How do you propose to improve the error message?

Encoding with either Latin-1 or UTF-8 depending on the characters sounds like a 
terrible idea. We may as well send the request without any body and pretend 
everything is okay. I don’t understand the point of changing to UTF-8 either. 
If you actually want UTF-8 encoded text, why not explicitly encode it yourself?

Failing for any unencoded text string would be a serious backwards 
compatibility problem. It would break the POST example using urlencode(), for 
instance.

IMO the Latin-1 encoding feature is a bad API design, maybe based on a 
misunderstanding of HTTP. Perhaps it would be more reasonable to deprecate the 
automatic Latin-1 encoding, and only allow ASCII characters in a text string. 
That would still cater for the urlencode() scenario in the POST example.

Of the links you posted, they seem to be different problems with separate 
solutions:

Requests bug 2838: Perhaps the user was trying to send URL-encoded form data. 
If so, textual fields should be UTF-8 encoded and then percent-encoded, 
resulting in only ASCII codes in the “data” argument. Python has 
urllib.parse.urlencode() which does this.

Requests bug 1822: It sounds like the user or a library intended to send UTF-8, 
so they should encode it themselves.

Stack Overflow: Custom web service needed fixing, and the user had to encode as 
UTF-8. This is a custom agreement between the client and server, it is not up 
to Python.

Ebay: I’m not familiar with any Ebay API and it is not clear from the post, but 
I suspect the user wasn’t encoding their data properly. Maybe similar to the 
first case.

For the rest it is not clear what the problem or solution was. Some of them 
sound like they were somehow sending text when they really wanted to send 
arbitrary bytes, in which case UTF-8 is not going to help.
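
To make the first two cases concrete, here is a rough sketch of what encoding
the data yourself could look like. The field name "message" is made up for
illustration, and whether a given server wants form data or raw UTF-8 is an
agreement between client and server that this sketch does not settle.

```python
from urllib.parse import urlencode

fields = {"message": "Celebrate \U0001F389"}  # hypothetical form field

# Form data: urlencode() UTF-8 encodes and then percent-encodes each value,
# so the resulting string contains only ASCII characters.
form_body = urlencode(fields)
print(form_body)  # message=Celebrate+%F0%9F%8E%89

# Plain text payload: encode explicitly and pass the bytes as the body.
raw_body = fields["message"].encode("utf-8")
print(raw_body)  # b'Celebrate \xf0\x9f\x8e\x89'
```

Either result can be handed to HTTPConnection.request(body=...) without
hitting the Latin-1 encoding step, since the first is pure ASCII and the
second is already bytes.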

--
nosy: +martin.panter




[issue26045] Improve error message for http.client when posting unicode string

2016-01-07 Thread Guido van Rossum

Changes by Guido van Rossum :


--
nosy: +gvanrossum




[issue26045] Improve error message for http.client when posting unicode string

2016-01-07 Thread Emil Stenström

New submission from Emil Stenström:

This issue is in response to this thread on python-ideas: 
https://mail.python.org/pipermail/python-ideas/2016-January/037678.html

Note that Cory did a lot of encoding background work here:
https://mail.python.org/pipermail/python-ideas/2016-January/037680.html

---
Bug description:

When posting an unencoded unicode string directly with python-requests you get 
the following stacktrace:

import requests
r = requests.post("http://example.com", data="Celebrate 🎉")
...
  File "../lib/python3.4/http/client.py", line 1127, in _send_request
    body = body.encode('iso-8859-1')
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 14-15: 
ordinal not in range(256) 

This is because requests uses http.client, and http.client assumes the encoding 
to be latin-1 if given a unicode string. This is a very common source of bugs 
for beginners who assume sending in unicode would automatically encode it in 
utf-8, like in the libraries of many other languages.

The simplest fix here is to catch the UnicodeEncodeError and improve the error 
message to something that points beginners in the right direction.

Another option would be to:
- Keep encoding in latin-1 first, and if that fails try utf-8

Other possible solutions (that would be backwards incompatible) include:
- Changing the default encoding to utf-8 instead of latin-1
- Detect an unencoded unicode string and fail without encoding it with a 
descriptive error message
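
For discussion purposes, the "latin-1 first, then utf-8" option above could
be sketched roughly as follows. The helper name encode_with_fallback is
hypothetical and nothing like it exists in http.client; the point is only to
show what the behaviour would be.

```python
def encode_with_fallback(text):
    """Hypothetical helper: try Latin-1 first, fall back to UTF-8."""
    try:
        return text.encode("latin-1")
    except UnicodeEncodeError:
        return text.encode("utf-8")


# The same "é" is sent as different bytes depending on what else is in
# the string, and the receiver is not told which encoding was chosen.
print(encode_with_fallback("caf\xe9"))             # b'caf\xe9'
print(encode_with_fallback("caf\xe9 \U0001F389"))  # b'caf\xc3\xa9 \xf0\x9f\x8e\x89'
```

One trade-off is visible right away: the encoding of a character depends on
its neighbours, so a server cannot reliably decode the body without extra
information.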

---

Just to show that this is a problem that exists in the wild, here are a few 
examples that all crash on the same line in http.client (not all going 
through the requests library):

- https://github.com/kennethreitz/requests/issues/2838
- https://github.com/kennethreitz/requests/issues/1822
- http://stackoverflow.com/questions/34618149/post-unicode-string-to-web-service-using-python-requests-library
- https://www.reddit.com/r/learnpython/comments/3violw/unicodeencodeerror_when_searching_ebay_with/
- https://github.com/codecov/codecov-python/issues/35
- https://github.com/google/google-api-python-client/issues/145
- https://bugs.launchpad.net/ubuntu/+source/lazr.restfulclient/+bug/1414063

--
components: Unicode
messages: 257721
nosy: Emil Stenström, ezio.melotti, haypo
priority: normal
severity: normal
status: open
title: Improve error message for http.client when posting unicode string
type: enhancement
versions: Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 3.6
