Bugs item #1153027, was opened at 2005-02-27 20:16
Message generated for change (Comment added) made by jjlee
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1153027&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: pristine777 (pristine777)
Assigned to: Nobody/Anonymous (nobody)
Summary: http_error_302() crashes with 'HTTP/1.1 400 Bad Request'
Initial Comment:
I was able to get to a website using both IE and
Firefox, but my Python code kept getting an HTTP 400 Bad
Request error. To debug, I called set_http_debuglevel(1) as
in the following code:
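    import cookielib
    import urllib2

    cj = cookielib.CookieJar()  # the original code used an existing self.cj
    hh = urllib2.HTTPHandler()
    hh.set_http_debuglevel(1)  # print request/response lines and headers
    opener = urllib2.build_opener(hh, urllib2.HTTPCookieProcessor(cj))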
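For readers on Python 3, where urllib2 was split into urllib.request and cookielib was renamed http.cookiejar, roughly the same debugging opener might look like the sketch below. build_debug_opener is a hypothetical helper name, and a fresh CookieJar stands in for the reporter's self.cj.

```python
import http.cookiejar
import urllib.request


def build_debug_opener(cookiejar=None):
    """Hypothetical helper: an opener that prints HTTP traffic and keeps cookies."""
    if cookiejar is None:
        # Stand-in for the reporter's self.cj.
        cookiejar = http.cookiejar.CookieJar()
    hh = urllib.request.HTTPHandler()
    hh.set_http_debuglevel(1)  # print "send:", "reply:" and "header:" lines
    return urllib.request.build_opener(
        hh, urllib.request.HTTPCookieProcessor(cookiejar))
```

Passing an HTTPHandler instance to build_opener() replaces the default plain-HTTP handler, so every http:// request made through the returned opener, including the one issued after a redirect, is echoed to stdout.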
The printed debug messages show that the failure
happens when there is a space in the redirect
location. Here's a cut-and-paste of the relevant debug
messages (note the line starting with "send:", which is
the request that http_error_302 sends):
reply: 'HTTP/1.1 302 Moved Temporarily\r\n'
header: Connection: close
header: Date: Sun, 27 Feb 2005 19:52:51 GMT
header: Server: Microsoft-IIS/6.0
<---other header data-->
send: 'GET /myEmail/User?asOf=02/26/2005 11:38:12 PM&ddn=87cb51501730
<---remaining header data-->
reply: 'HTTP/1.1 400 Bad Request\r\n'
header: Content-Type: text/html
header: Date: Sun, 27 Feb 2005 19:56:45 GMT
header: Connection: close
header: Content-Length: 20
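The log suggests why the server answers 400: an HTTP/1.1 request line is the method, the Request-URI and the protocol version separated by single spaces (RFC 2616, section 5.1), so a literal space inside the URI produces extra fields. The sketch below uses a hypothetical parse_request_line helper and a request target modelled on the one in the log; it shows a strict parser rejecting the unescaped line and accepting the %20 version. IIS may apply its own rules, so this only illustrates the principle.

```python
def parse_request_line(line):
    """Hypothetical strict parser for 'METHOD SP Request-URI SP HTTP-Version'."""
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) != 3:
        # A literal space inside the URI yields more than three fields.
        raise ValueError("malformed request line (%d fields)" % len(parts))
    method, target, version = parts
    return method, target, version


bad = "GET /myEmail/User?asOf=02/26/2005 11:38:12 PM&ddn=87cb51501730 HTTP/1.1\r\n"
good = bad.replace("2005 11:38:12 PM", "2005%2011:38:12%20PM")

try:
    parse_request_line(bad)
except ValueError as exc:
    print(exc)                      # malformed request line (5 fields)
print(parse_request_line(good)[1])  # /myEmail/User?asOf=02/26/2005%2011:38:12%20PM&ddn=87cb51501730
```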
To fix this, I first tried to encode the redirect location
in the function http_error_302() in urllib2 using
urllib.quote and urllib.urlencode, but to no avail
(they escape other characters as well).
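To see why these functions are not a drop-in fix, here is their behaviour on the redirect target, sketched with Python 3's urllib.parse (the successor of urllib.quote and urllib.urlencode). quote() is meant for escaping a path or a single component, so by default it leaves only '/' unescaped besides letters, digits and '_.-~'; the query delimiters '?', '=' and '&' get escaped along with the spaces. urlencode() builds a query string from key/value pairs rather than repairing an existing URL.

```python
from urllib.parse import quote, urlencode

target = "/myEmail/User?asOf=02/26/2005 11:38:12 PM&ddn=87cb51501730"

# quote() keeps letters, digits, '_.-~' and its `safe` argument (default '/'),
# so '?', '=', '&' and ':' are escaped too, breaking the URL structure.
print(quote(target))
# /myEmail/User%3FasOf%3D02/26/2005%2011%3A38%3A12%20PM%26ddn%3D87cb51501730

# urlencode() encodes individual values (spaces become '+', '/' becomes %2F).
print(urlencode({"asOf": "02/26/2005 11:38:12 PM"}))
# asOf=02%2F26%2F2005+11%3A38%3A12+PM
```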
A temporary solution that works is to replace every space
in the redirect URL with '%20'. Below is a snippet of the
function http_error_302 in urllib2 with this suggested fix:
    def http_error_302(self, req, fp, code, msg, headers):
        # Some servers (incorrectly) return multiple Location headers
        # (so probably same goes for URI).  Use first header.
        if 'location' in headers:
            newurl = headers.getheaders('location')[0]
        elif 'uri' in headers:
            newurl = headers.getheaders('uri')[0]
        else:
            return
        # <<< TEMP FIX - inserting this line temporarily fixes this problem
        newurl = newurl.replace(' ', '%20')
        newurl = urlparse.urljoin(req.get_full_url(), newurl)
        # ... remainder of this function ...
Thanks!
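The reporter's temporary fix can be tried outside urllib2. The sketch below is a Python 3 rendering with a hypothetical resolve_redirect helper and a placeholder host (example.com); like the patched http_error_302, it escapes only literal spaces in the Location value and then resolves it against the original request URL.

```python
from urllib.parse import urljoin


def resolve_redirect(request_url, location):
    """Hypothetical helper mirroring the temporary fix in http_error_302."""
    location = location.replace(" ", "%20")  # escape literal spaces only
    return urljoin(request_url, location)    # resolve relative Location values


print(resolve_redirect(
    "http://example.com/login",
    "/myEmail/User?asOf=02/26/2005 11:38:12 PM&ddn=87cb51501730"))
# http://example.com/myEmail/User?asOf=02/26/2005%2011:38:12%20PM&ddn=87cb51501730
```

Because only spaces are handled, any other character that is illegal in a URL would still be sent to the server unescaped.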
----------------------------------------------------------------------
Comment By: John J Lee (jjlee)
Date: 2005-05-19 20:30
Message:
Logged In: YES
user_id=261020
Sure, but if Firefox and IE do it, we should probably do the
same.
I think cookielib.escape_path(), or something similar
(perhaps without the case normalisation), is probably the
right thing to do. That's not part of any documented API; I
suppose that function or a similar one should be added to
the urlparse module and used by urllib2 and urllib when
redirecting.
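One way this suggestion might look, sketched in Python 3: escape_url is a hypothetical helper, not an existing urlparse or urllib.parse function. It splits the URL so that only the path, query and fragment are escaped, leaves '%' and the reserved characters alone so existing escapes and delimiters survive, and, unlike cookielib.escape_path(), does not upper-case existing %xx escapes. The path safe set copies cookielib's HTTP_PATH_SAFE; adding '?' for the query and fragment is an assumption.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Copied from cookielib/http.cookiejar HTTP_PATH_SAFE (an assumption, not public API).
PATH_SAFE = "%/;:@&=+$,!~*'()"
# Assumption: a query or fragment may also contain a literal '?'.
QUERY_SAFE = PATH_SAFE + "?"


def escape_url(url):
    """Hypothetical helper: percent-escape characters such as spaces that may
    not appear literally in a URL, without touching existing %xx escapes."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    return urlunsplit((scheme, netloc,
                       quote(path, safe=PATH_SAFE),
                       quote(query, safe=QUERY_SAFE),
                       quote(fragment, safe=QUERY_SAFE)))


print(escape_url("/myEmail/User?asOf=02/26/2005 11:38:12 PM&ddn=87cb51501730"))
# /myEmail/User?asOf=02/26/2005%2011:38:12%20PM&ddn=87cb51501730
```

A redirect handler could then call urljoin(request_url, escape_url(location)); a Location that is already correctly escaped passes through unchanged.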
----------------------------------------------------------------------
Comment By: Jeff Epler (jepler)
Date: 2005-03-01 17:41
Message:
Logged In: YES
user_id=2772
When the server sends the 302 response with 'Location:
http://example.com/url%20with%20whitespace', urllib2 seems
to work just fine.
Based on my reading of RFC 2396, I believe that a URL that
contains spaces must contain quoted spaces (%20), not literal
spaces, because space is not an "unreserved character" [2.3] and
"[d]ata must be escaped if it does not have a representation
using an unreserved character" [2.4].
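This reading can be checked mechanically. The sketch below encodes the RFC 2396 character classes (reserved in section 2.2, unreserved = alphanum | mark in section 2.3) and reports every character of a URI outside them, treating '%' as allowed because it introduces an escape. illegal_uri_chars is a hypothetical name, and the check is simplified: it does not verify that '%' is followed by two hex digits, and it flags the '#' that separates a fragment in a URI reference.

```python
import string

# RFC 2396, section 2.2: reserved characters.
RESERVED = set(";/?:@&=+$,")
# RFC 2396, section 2.3: unreserved = alphanum | mark.
MARK = set("-_.!~*'()")
UNRESERVED = set(string.ascii_letters + string.digits) | MARK
# '%' starts an escape sequence (section 2.4.1).
ALLOWED = RESERVED | UNRESERVED | {"%"}


def illegal_uri_chars(uri):
    """Hypothetical helper: characters that RFC 2396 says must be escaped."""
    return sorted({c for c in uri if c not in ALLOWED})


print(illegal_uri_chars("/myEmail/User?asOf=02/26/2005 11:38:12 PM&ddn=87cb51501730"))  # [' ']
print(illegal_uri_chars("http://example.com/url%20with%20whitespace"))                  # []
```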
----------------------------------------------------------------------
_______________________________________________
Python-bugs-list mailing list