This is a problem: while HTTP/1.0 servers are expected to close the connection once the response is finished, HTTP/1.1 servers are allowed to keep it open. urllib assumes the connection always closes, so when it receives an HTTP/1.1 response it hangs until the server feels inclined to close the connection. Obviously the server is wrong, since an HTTP/1.0 request should only receive an HTTP/1.0 response, but I can't do anything about that.
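To make the hang concrete, here is a small sketch (hypothetical demo, written against Python 3's http.server rather than 2.3.5): a server configured for HTTP/1.1 keeps the connection open after replying, so a client that reads until EOF would block forever. Bounding the read by Content-Length, as an HTTP/1.1-aware client must, returns immediately.

```python
# Hypothetical demo: an HTTP/1.1 server keeps the connection open,
# so "read until EOF" (what urllib effectively does) would hang.
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # server answers HTTP/1.1 and keeps alive
    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):  # silence request logging
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Raw HTTP/1.0-style request, as urllib would send it.
sock = socket.create_connection(("127.0.0.1", server.server_port))
sock.sendall(b"GET / HTTP/1.0\r\nConnection: keep-alive\r\n\r\n")
f = sock.makefile("rb")
status = f.readline()                   # server still says HTTP/1.1
length = 0
for line in iter(f.readline, b"\r\n"):  # headers end at a blank line
    name, _, value = line.partition(b":")
    if name.lower() == b"content-length":
        length = int(value)
body = f.read(length)  # read exactly length bytes; a bare f.read() would hang
sock.close()
server.shutdown()
print(status.split()[0], body)
```

The point of the sketch is the last read: with keep-alive there is no EOF to wait for, so the body's end has to come from the message framing (Content-Length here, or chunking), not from the connection closing.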
Now, there is actually code in httplib that would let urllib understand HTTP/1.1 responses correctly, if only urllib used it. After the headers have been parsed, urllib calls getfile(), but I think it should call getresponse() instead. The object returned by getresponse() is almost file-like; it only lacks readline()/readlines() and iteration. In fact, perhaps httplib's getfile() should be deprecated: HTTP/1.1 has several ways of encoding the response body (chunked transfer and compression), and users of httplib shouldn't have to know about those encodings. They should use getresponse() instead.
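For what it's worth, this is roughly the direction later Pythons took: in Python 3, httplib became http.client, its response object grew the file-like interface, and urllib builds on it. A hedged sketch against that API, showing that getresponse() handles the framing so reads terminate even though the keep-alive connection stays open:

```python
# Hypothetical demo using Python 3's http.client (the successor to httplib).
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep-alive: connection is NOT closed
    def do_GET(self):
        body = b"line one\nline two\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):  # silence request logging
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/")
resp = conn.getresponse()  # response framing handled for us
first = resp.readline()    # file-like: readline() works
rest = resp.read()         # stops at Content-Length, no hang
conn.close()
server.shutdown()
print(first, rest)
```

The caller never sees Content-Length, chunking, or connection management; it just reads the response object like a file, which is exactly what getfile() users want anyway.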
So, is it worth my time to fix urllib and httplib, making urllib use getresponse() instead of getfile()? Would the changes be accepted? Is anyone else working on something similar?
BTW, this is on Python 2.3.5, but I haven't spotted any changes between Python 2.3.5 and current CVS that would have fixed the problem. I'll start trying Python CVS in a moment.
Shane -- http://mail.python.org/mailman/listinfo/python-list