[issue17214] http.client.HTTPConnection.putrequest encode error

2016-05-16 Thread Martin Panter

Martin Panter added the comment:

I restored the “redundant” encoding of space, in case someone’s code was 
relying on this behaviour, and because redirect_request() is a publicly 
documented method.

--
resolution:  -> fixed
stage: commit review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17214] http.client.HTTPConnection.putrequest encode error

2016-05-16 Thread Roundup Robot

Roundup Robot added the comment:

New changeset cb09fdef19f5 by Martin Panter in branch '3.5':
Issue #17214: Percent-encode non-ASCII bytes in redirect targets
https://hg.python.org/cpython/rev/cb09fdef19f5

New changeset 841a9a3f3cf6 by Martin Panter in branch 'default':
Issue #14132, Issue #17214: Merge two redirect handling fixes from 3.5
https://hg.python.org/cpython/rev/841a9a3f3cf6

--
nosy: +python-dev




[issue17214] http.client.HTTPConnection.putrequest encode error

2016-05-14 Thread Martin Panter

Martin Panter added the comment:

I will look at committing this soon

--
stage: patch review -> commit review
versions:  -Python 3.4




[issue17214] http.client.HTTPConnection.putrequest encode error

2015-10-31 Thread Terry J. Reedy

Changes by Terry J. Reedy :


--
nosy:  -terry.reedy




[issue17214] http.client.HTTPConnection.putrequest encode error

2015-10-30 Thread Martin Panter

Martin Panter added the comment:

This bug only applies to Python 3. In Python 2, the non-ASCII bytes are sent 
through to the redirect target verbatim. I think this would also be the ideal 
way to handle the problem in 3, but percent-encoding them as proposed also 
seems good enough, and does not require hacking the HTTPConnection.putrequest() 
internals.

My patch updates Christian’s patch:
* Tested, so hopefully no typos :)
* Add test cases based on Issue 22248, as well as a URL already including a 
percent sign
* Process entire URL, not just the path component. A non-ASCII byte could just 
as easily be in the query component, for example.
* Remove redundant encoding of space character from redirect_request() method.
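
To illustrate the approach in the list above, here is a minimal sketch (it is 
not the attached patch) of percent-encoding an entire redirect target. It 
assumes the header value was decoded as ISO-8859-1, so each non-ASCII character 
maps back to exactly one original byte. The helper name 
encode_redirect_target is hypothetical.

```python
import string
from urllib.parse import quote


def encode_redirect_target(location):
    """Percent-encode spaces and non-ASCII characters in a redirect URL.

    Hypothetical helper for illustration only. Assumes `location` was
    decoded from the raw header bytes as ISO-8859-1.
    """
    # All ASCII punctuation is treated as safe, so existing escapes such
    # as %20 and delimiters such as ? and & pass through unchanged.
    return quote(location, encoding="iso-8859-1", safe=string.punctuation)


if __name__ == "__main__":
    # UTF-8 bytes of "ä" as they appear after ISO-8859-1 decoding
    print(encode_redirect_target("/dettaglio/K\u00c3\u00a4fer"))
    # A URL that already contains a percent sign
    print(encode_redirect_target("/spaced%20path/"))
```

Because the whole string is processed, a non-ASCII byte in the query component 
is handled the same way as one in the path.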

--
stage: test needed -> patch review
versions: +Python 3.5, Python 3.6
Added file: http://bugs.python.org/file40907/issue17214.redirect.v2.patch




[issue17214] http.client.HTTPConnection.putrequest encode error

2015-10-25 Thread Michael

Michael added the comment:

I should have looked more closely. 

The typo is part of the patch. It should be corrected there.

--




[issue17214] http.client.HTTPConnection.putrequest encode error

2015-10-25 Thread Berker Peksag

Changes by Berker Peksag :


--
nosy: +berker.peksag




[issue17214] http.client.HTTPConnection.putrequest encode error

2015-10-25 Thread Michael

Michael added the comment:

The patch issue17214 did fix this issue in my 3.4.2 install on Ubuntu LTS.

However, it triggered another bug:

  File "/usr/local/lib/python3.4/urllib/request.py", line 646, in http_error_302
    path = urlparts.path if urlpaths.path else "/"
NameError: name 'urlpaths' is not defined

This is obviously a typo. 

I'm not sure whether it has been reported yet (a short Google search didn't 
find anything), and I don't know how to provoke it independently.

--
nosy: +Strecke
versions:  -Python 3.2, Python 3.3




[issue17214] http.client.HTTPConnection.putrequest encode error

2015-03-30 Thread Martin Panter

Martin Panter added the comment:

I think this patch needs a test. I left some comments on Rietveld as well. 
Perhaps there should also be a test to prove that redirects to URLs like 
/spaced%20path/ do not get mangled.

Have a look at the HTTPRedirectHandler.redirect_request() method. Perhaps the 
code translating spaces to %20 could be merged with the fix for this issue.
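
The mangling concern can be shown with a small sketch. It relies only on 
urllib.parse.quote; both helper names are hypothetical, and neither is the 
code under review.

```python
from urllib.parse import quote


def naive_encode(path):
    # The default safe="/" treats "%" as unsafe, so an existing escape
    # such as %20 is escaped a second time.
    return quote(path)


def escape_preserving_encode(path):
    # Marking "%" as safe keeps existing escapes intact while spaces
    # and non-ASCII characters are still percent-encoded.
    return quote(path, safe="/%")


if __name__ == "__main__":
    print(naive_encode("/spaced%20path/"))
    print(escape_preserving_encode("/spaced%20path/"))
    print(escape_preserving_encode("/spaced path/"))
```

A regression test along these lines would assert that an already-escaped 
path comes back unchanged.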

--
nosy: +vadmium

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17214
___



[issue17214] http.client.HTTPConnection.putrequest encode error

2013-07-19 Thread Christian Heimes

Christian Heimes added the comment:

Something else is going on here. A valid server never returns a URL with 
non-ASCII chars. Your test server does the right thing, too:

$ LC_ALL=C wget http://www.libon.it/libon/search/isbn/3499155443
--2013-07-19 11:01:54--  http://www.libon.it/libon/search/isbn/3499155443
Resolving www.libon.it (www.libon.it)... 83.103.59.131
Connecting to www.libon.it (www.libon.it)|83.103.59.131|:80... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: http://www.libon.it/ricerca/7818684/3499155443/dettaglio/3102314/Onkel-Oswald-und-der-Sudan-K%C3%A4fer/order/date_desc [following]
Incomplete or invalid multibyte sequence encountered
--2013-07-19 11:01:54--  http://www.libon.it/ricerca/7818684/3499155443/dettaglio/3102314/Onkel-Oswald-und-der-Sudan-K%C3%A4fer/order/date_desc
Reusing existing connection to www.libon.it:80.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

I have dug through the code, and now I think I know what's going on here. 
The header parsing code unquotes and converts the Location header. The code in 
the 302 handler doesn't compensate and therefore fails.

Here is a patch that corrects the code in the 302 function.
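
As a rough illustration of the conversion step only (it does not reproduce 
the patch, and it does not verify whether any unquoting takes place), the 
sketch below assumes header bytes are decoded as ISO-8859-1, one character 
per byte. Both function names are hypothetical.

```python
def decode_header_value(raw):
    # Assumption: header bytes are decoded as ISO-8859-1, so every byte
    # becomes exactly one character and no information is lost.
    return raw.decode("iso-8859-1")


def is_ascii(text):
    # True if the text could be written into an ASCII-only request line.
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    raw = b"/dettaglio/K\xc3\xa4fer"  # UTF-8 bytes sent without escaping
    value = decode_header_value(raw)
    print(is_ascii(value))
    # The original bytes can be recovered by reversing the decoding.
    print(value.encode("iso-8859-1") == raw)
```

If a Location value reaches the redirect handler in this form, it can no 
longer be encoded as ASCII unless something re-escapes it first.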

--
keywords: +patch
Added file: http://bugs.python.org/file30976/issue17214.patch




[issue17214] http.client.HTTPConnection.putrequest encode error

2013-07-18 Thread Vajrasky Kok

Vajrasky Kok added the comment:

The script demonstrating the bug can be simplified to:

---
import urllib.request

# URL with a non-ASCII character ("ä") in the path
url = "http://www.libon.it/ricerca/7817940/3499155443/dettaglio/3102314/Onkel-Oswald-und-der-Sudan-Käfer/order/date_desc"

req = urllib.request.Request(url)
response = urllib.request.urlopen(req, timeout=30)
the_page = response.read().decode('utf-8')
print(the_page)
---

Attached the simple patch to solve this problem.

The question is whether we should fix this problem in urllib at all, because 
strictly speaking a URL should contain ASCII characters only. But if Firefox 
can open this URL, why not urllib?

I will think about this problem, and if I (or other people) conclude that 
urllib should handle URLs containing non-ASCII characters, I will add a 
unit test.

Until then, people can use a third-party package, namely the 
requests package from http://docs.python-requests.org/en/latest/


import requests

r = requests.get("http://www.libon.it/ricerca/7817940/3499155443/dettaglio/3102314/Onkel-Oswald-und-der-Sudan-Käfer/order/date_desc")
print(r.text)


--
nosy: +vajrasky
Added file: 
http://bugs.python.org/file30964/patch_to_urllib_handle_non_ascii_char_in_url.txt




[issue17214] http.client.HTTPConnection.putrequest encode error

2013-07-18 Thread Christian Heimes

Christian Heimes added the comment:

The problem may not be a bug but a deliberate design choice. urllib is rather 
low level and doesn't implement some browser magic. Browsers handle stuff like 
'ä' -> '%C3%A4', ' ' -> '%20' or IDNA, but urllib doesn't. I always saw it as 
my responsibility to quote and encode everything myself. Higher level APIs 
such as requests are free to implement browser magic.
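
A minimal sketch of doing that quoting and encoding by hand is shown below. 
It is only an illustration under stated assumptions: the function name 
to_ascii_url and the choice of safe characters are hypothetical, and any 
user:password part of the URL is dropped for brevity.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(url):
    """Return an ASCII-only form of `url` (hypothetical helper)."""
    parts = urlsplit(url)
    host = ""
    if parts.hostname:
        # IDNA turns a non-ASCII host name into its ASCII form.
        host = parts.hostname.encode("idna").decode("ascii")
    if parts.port is not None:
        host = "%s:%d" % (host, parts.port)
    # "%" stays safe so that existing escapes are not encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%+;/?:@")
    fragment = quote(parts.fragment, safe="=&%+;/?:@")
    return urlunsplit((parts.scheme, host, path, query, fragment))


if __name__ == "__main__":
    print(to_ascii_url("http://www.libon.it/dettaglio/K\u00e4fer/order"))
    print(to_ascii_url("http://example.com/a b"))
```

The result can then be passed to urllib.request.Request as an ordinary 
ASCII URL.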

Contrary to common belief, a URL with an umlaut or space is *not* a valid 
URI. From 
http://docs.python.org/3/library/urllib.request.html#urllib.request.Request

 url should be a string containing a valid URL.

I suggest that this ticket be closed as won't fix.

--
nosy: +christian.heimes




[issue17214] http.client.HTTPConnection.putrequest encode error

2013-07-18 Thread Vajrasky Kok

Vajrasky Kok added the comment:

I have no problem if this ticket is classified as won't fix.

I am writing this for the confused souls who want to use urllib to access a 
URL containing non-ASCII characters:

import urllib.request
from urllib.parse import quote

url = "http://www.libon.it/ricerca/7817940/3499155443/dettaglio/3102314/Onkel-Oswald-und-der-Sudan-Käfer/order/date_desc"

req = urllib.request.Request(url)
try:
    # The request line sent to the server must be pure ASCII
    req.selector.encode('ascii')
except UnicodeEncodeError:
    # Percent-encode the non-ASCII characters in the selector
    req.selector = quote(req.selector)
response = urllib.request.urlopen(req, timeout=30)
the_page = response.read().decode('utf-8')
print(the_page)

--




[issue17214] http.client.HTTPConnection.putrequest encode error

2013-07-18 Thread Lars Ivarsson

Lars Ivarsson added the comment:

The problem isn't the originally requested URL, as it is legitimate. The 
problem appears after the 302 redirect, when a new (malformed) URL is received 
from the server. There needs to be some kind of check of the validity of that 
second URL. And, preferably, a URLError should be raised if something is wrong.
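
One way such a check could look is sketched below. It is an assumption-laden 
illustration, not urllib's behaviour: StrictRedirectHandler and 
build_strict_opener are hypothetical names, and only the documented 
HTTPRedirectHandler.redirect_request() hook is overridden. On interpreters 
that already percent-encode redirect targets before calling this hook, the 
check may never trigger.

```python
import urllib.error
import urllib.request


class StrictRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler that rejects non-ASCII redirect targets."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        try:
            newurl.encode("ascii")
        except UnicodeEncodeError:
            # Name both URLs so the user can see where the bad one came from.
            raise urllib.error.URLError(
                "redirect from %r points to a URL with non-ASCII "
                "characters: %r" % (req.full_url, newurl)
            ) from None
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def build_strict_opener():
    # A handler instance passed to build_opener() replaces the default
    # handler of the same class.
    return urllib.request.build_opener(StrictRedirectHandler())
```

Calling build_strict_opener().open(url, timeout=30) would then report the 
malformed redirect as a URLError instead of a UnicodeEncodeError.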

--




[issue17214] http.client.HTTPConnection.putrequest encode error

2013-07-18 Thread Vajrasky Kok

Vajrasky Kok added the comment:

Lars, I see.

For the uninitiated, the issue is that the original URL (containing only ASCII 
characters) redirects to a URL containing non-ASCII characters, which upsets 
urllib.

To handle that situation, you can do something like this:
-
import urllib.request
from urllib.parse import quote

url = "http://www.libon.it/libon/search/isbn/3499155443"
req = urllib.request.Request(url)
# Percent-encode the selector before sending the request
req.selector = quote(req.selector)
response = urllib.request.urlopen(req, timeout=30)
the_page = response.read().decode('utf-8')
print(the_page)
-

I admit that this code is clunky and not Pythonic.

I also believe that the Python standard library should offer an easy way to 
access URLs containing non-ASCII characters.

At the very least, maybe we can give a proper error message. Something like 
this would be nice:

"The url is not valid and contains non-ascii character: 
http://www.libon.it/ricerca/7817940/3499155443/dettaglio/3102314/Onkel-Oswald-und-der-Sudan-Käfer/order/date_desc.
 This url is redirected from this url: 
http://www.libon.it/libon/search/isbn/3499155443"

Otherwise users can be confused: they believe they gave urllib an ASCII-only 
URL (http://www.libon.it/libon/search/isbn/3499155443), so why did they get 
an encoding error?

What do you say, Christian?

--




[issue17214] http.client.HTTPConnection.putrequest encode error

2013-07-16 Thread LDTech

LDTech added the comment:

This problem still exists in Python 3.3.2. The following code gives you an 
example:

import urllib.request

url = "http://www.libon.it/libon/search/isbn/3499155443"
req = urllib.request.Request(url)
response = urllib.request.urlopen(req, timeout=30)
the_page = response.read().decode('utf-8')
print(the_page)


Traceback (most recent call last):
  File "C:\X\webpy.py", line 4, in <module>
    response = urllib.request.urlopen(req, timeout=30)
  File "C:\Python33\lib\urllib\request.py", line 156, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python33\lib\urllib\request.py", line 475, in open
    response = meth(req, response)
  File "C:\Python33\lib\urllib\request.py", line 587, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python33\lib\urllib\request.py", line 507, in error
    result = self._call_chain(*args)
  File "C:\Python33\lib\urllib\request.py", line 447, in _call_chain
    result = func(*args)
  File "C:\Python33\lib\urllib\request.py", line 692, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Python33\lib\urllib\request.py", line 469, in open
    response = self._open(req, data)
  File "C:\Python33\lib\urllib\request.py", line 487, in _open
    '_open', req)
  File "C:\Python33\lib\urllib\request.py", line 447, in _call_chain
    result = func(*args)
  File "C:\Python33\lib\urllib\request.py", line 1268, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "C:\Python33\lib\urllib\request.py", line 1248, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "C:\Python33\lib\http\client.py", line 1061, in request
    self._send_request(method, url, body, headers)
  File "C:\Python33\lib\http\client.py", line 1089, in _send_request
    self.putrequest(method, url, **skips)
  File "C:\Python33\lib\http\client.py", line 953, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 78-79: ordinal not in range(128)

--
nosy: +LDTech




[issue17214] http.client.HTTPConnection.putrequest encode error

2013-02-22 Thread Terry J. Reedy

Terry J. Reedy added the comment:

Please give us
1. the exact Python version used. 3.2.3? or something earlier?
2. A minimal but complete example that we can run. What is 'headers'?
3. The complete traceback, not just the last two entries.
4. The result of running with the newer 3.3.0, if you possibly can. Perhaps the 
problem has already been fixed.

While line numbers have changed, even in 3.2.4 in repository, 3.2-3.4 all have

request = '%s %s %s' % (method, url, self._http_vsn_str)
# Non-ASCII characters should have been eliminated earlier
self._output(request.encode('ascii'))

Since there is nothing earlier in the function that would eliminate non-ascii, 
there must be an assumption about what happens earlier in the call chain. That 
might have already been fixed, which is why we need an example to test.
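
The failure mode of the lines quoted above can be reproduced in isolation. 
The sketch below is not the http.client source; build_request_line is a 
hypothetical stand-in that formats and encodes a request line the same way, 
with the HTTP version string assumed to be "HTTP/1.1".

```python
def build_request_line(method, url, http_version="HTTP/1.1"):
    # Same formatting and encoding steps as the quoted snippet.
    request = "%s %s %s" % (method, url, http_version)
    return request.encode("ascii")


if __name__ == "__main__":
    print(build_request_line("GET", "/libon/search/isbn/3499155443"))
    try:
        build_request_line("GET", "/dettaglio/K\u00e4fer")
    except UnicodeEncodeError as exc:
        # Nothing earlier in this function removes non-ASCII characters,
        # so the caller has to guarantee an ASCII-only URL.
        print("encode failed:", exc.reason)
```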

--
components: +Library (Lib) -Unicode
nosy: +orsenthil, terry.reedy
stage:  -> test needed
type:  -> behavior
versions: +Python 3.3, Python 3.4




[issue17214] http.client.HTTPConnection.putrequest encode error

2013-02-16 Thread Mi Zou

Changes by Mi Zou aaasuoliw...@gmail.com:


--
title: urllib.client.HTTPConnection.putrequest encode  error -> 
http.client.HTTPConnection.putrequest encode  error




[issue17214] http.client.HTTPConnection.putrequest encode error

2013-02-16 Thread Mi Zou

Mi Zou added the comment:

While urllib is following the redirection (302), 
http.client.HTTPConnection.putrequest raises an error:
#--
...
  File "D:\Program Files\Python32\lib\http\client.py", line 1004, in _send_request
    self.putrequest(method, url, **skips)
  File "D:\Program Files\Python32\lib\http\client.py", line 868, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 108-111: ordinal not in range(128)
#--
In the source code I found, at line 811:

def putrequest(self, method, url, skip_host=0,skip_accept_en...)
...

The url argument may be a Unicode string, and here it was unquoted.
Note, in my case:
...
purl="http://bbs.dospy.com/258attachdown.php?aid=14361277bbsid=349"
req=urllib.request.Request(purl,headers=headers)
response=urllib.request.urlopen(req)
...

Then the HTTP server redirected me to a file download URL, and that URL 
contains some Chinese characters. I printed out the url argument:

/f/1ba1f70606223af2aa5c3aeff6c6a46a/511f7b4c/day_111015/20111015_5949e996881b2e28403d26Ch6dOfj6LZ.rar/p/ÒâÁÖ03-08.part1.rar

--
resolution: invalid -> 
status: closed -> open
