[ python-Bugs-1411097 ] urllib2.urlopen() hangs due to use of socket._fileobject?

SourceForge.net Sat, 21 Jan 2006 14:10:56 -0800

Bugs item #1411097, was opened at 2006-01-20 20:26
Message generated for change (Comment added) made by jjlee
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1411097&group_id=5470


Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: John J Lee (jjlee)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib2.urlopen() hangs due to use of socket._fileobject?

Initial Comment:
To reproduce:

import urllib2
print urllib2.urlopen("http://66.117.37.13/").read()


The attached patch "fixes" the hang, but that patch is
not acceptable because it also removes the .readline()
and .readlines() methods on the response object
returned by urllib2.urlopen().

The patch seems to demonstrate that the problem is
caused by the (ab)use of socket._fileobject in
urllib2.AbstractHTTPHandler (I believe this hack was
introduced when urllib2 switched to using
httplib.HTTPConnection).

Not sure yet what the actual problem is...


----------------------------------------------------------------------

>Comment By: John J Lee (jjlee)
Date: 2006-01-21 22:10

Message:
Logged In: YES 
user_id=261020

In fact the commit message for rev 36871 states the real
reason _fileobject is used (handling chunked encoding),
showing my workaround is even more harmful than I thought. 
Moreover, doing a urlopen on 66.117.37.13 shows the response
*is* chunked.

The problem seems to be caused by httplib failing to find a
CRLF at the end of the chunked response, so the loop at the
end of _read_chunked() never terminates.  Haven't looked in
detail yet, but I'm guessing a) it's the server's fault and
b) httplib should work around it.
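
To illustrate the suspected failure mode, here is a minimal sketch of a
chunked-body reader. It is not httplib's actual code: read_chunked() is a
hypothetical helper, and it assumes the hang comes from a trailer loop that
waits for a line equal to CRLF even after readline() has started returning
an empty string at EOF. Treating an empty read as end of input is one
possible workaround.

```python
import io


def read_chunked(fp):
    """Decode a chunked HTTP body from a binary file-like object.

    Hypothetical helper for illustration only; not part of httplib.
    """
    body = []
    while True:
        size_line = fp.readline()
        # Drop any chunk extension after ';' and parse the hexadecimal size.
        size_field = size_line.split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        if size == 0:
            break  # the zero-length chunk marks the end of the data
        body.append(fp.read(size))
        fp.read(2)  # discard the CRLF that follows each chunk's data

    # Read and discard trailer lines up to the terminating CRLF.
    while True:
        line = fp.readline()
        if not line:
            # EOF: the peer omitted the final CRLF. Without this check the
            # loop never terminates, because b"" never equals b"\r\n".
            break
        if line == b"\r\n":
            break
    return b"".join(body)


# A well-formed response, and one missing the final CRLF (assumed to
# resemble what the server in this report sends).
well_formed = io.BytesIO(b"5\r\nhello\r\n0\r\n\r\n")
truncated = io.BytesIO(b"5\r\nhello\r\n0\r\n")
result_ok = read_chunked(well_formed)
result_truncated = read_chunked(truncated)
```

With the EOF check in place, both inputs decode to the same body. Without
it, the second call would loop forever on empty reads.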


Here's the commit message from 36871:


Fix urllib2.urlopen() handling of chunked content encoding.

The change to use the newer httplib interface admitted the possibility
that we'd get an HTTP/1.1 chunked response, but the code didn't handle
it correctly.  The raw socket object can't be passed to addinfourl(),
because it would read the undecoded response.  Instead, addinfourl()
must call HTTPResponse.read(), which will handle the decoding.

One extra wrinkle is that the HTTPResponse object can't be passed to
addinfourl() either, because it doesn't implement readline() or
readlines().  As a quick hack, use socket._fileobject(), which
implements those methods on top of a read buffer.  (suggested by mwh)

Finally, add some tests based on test_urllibnet.

Thanks to Andrew Sawyers for originally reporting the chunked problem.


----------------------------------------------------------------------
