Re: [Python-Dev] httplib and bad response chunking

2006-07-31 Thread Greg Ward
[me, on 25 July]
> I have
> discovered other hypothetical cases of bad chunking that cause httplib
> to go into an infinite loop or block forever on socket.readline().
> Should we worry about those cases as well, despite not having seen them
> happen in the wild?  More annoying, I can reproduce the block forever
> case using a real socket, but not using the StringIO-based FakeSocket
> class in test_httplib.

[John J Lee]
> They have been seen in the wild :-)
>
> http://python.org/sf/1411097

Thanks -- that was really all the encouragement I needed to keep banging
away at this bug.

Did you look at the crude attempt at testing for this bug that I hacked
into test_httplib.py?  I posted it to bug #1486335 here:

  
http://sourceforge.net/tracker/download.php?group_id=5470&atid=105470&file_id=186245&aid=1486335

The idea is simple: put various chunked responses into strings and then
feed those strings to HTTPConnection.  The trouble is that StringIO does
not behave the same as a real socket: where HTTPResponse fails one way
reading from a real socket (e.g. infinite loop), it fails differently (or
not at all) reading from a StringIO.  That makes testing with the
FakeSocket class in test_httplib.py problematic.
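
One way to see the difference described above is to read past the end of
the same bytes from an in-memory buffer and from a connected socket.  This
is only a sketch under assumptions: it uses Python 3, where io.BytesIO
stands in for StringIO, and socket.socketpair() stands in for a real
client/server connection.  The 0.5 second timeout is there purely so the
demo terminates; without it the socket read would wait indefinitely.

```python
import io
import socket

data = b"0004\r\nabc\n\r\n"   # one chunk, then the "server" goes quiet

# In-memory fake: reading past the data returns b"" immediately (EOF).
fake = io.BytesIO(data)
fake.read(len(data))
fake_result = fake.readline()

# Connected socket pair: the peer stays open, so the same read waits
# for more data instead of seeing EOF.
server_end, client_end = socket.socketpair()
client_end.settimeout(0.5)    # assumption: timeout only to end the demo
server_end.sendall(data)
reader = client_end.makefile("rb")
reader.read(len(data))
try:
    reader.readline()
    real_result = "returned"
except socket.timeout:
    real_result = "timed out"
finally:
    reader.close()
    client_end.close()
    server_end.close()

print(fake_result, real_result)
```

The fake hits end-of-file where the socket keeps waiting, which is
consistent with the block-forever case showing up only on a real socket.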

Maybe the right way to test httplib is to spawn a server process
(thread?) to listen on some random port, feed various HTTP responses at
HTTPConnection/HTTPResponse, and see what happens.  I'm not sure how to
do that portably, though.  Well, I'll see if I can whip up a Unix-y
solution and see if anyone knows how to make it portable.
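
A rough sketch of the server-thread idea, assuming Python 3 (where httplib
is named http.client).  The helper names serve_once and fetch_body are
made up for illustration and are not part of test_httplib.  The server
binds to port 0 so the OS picks a free port, sends one canned response,
and closes the connection.

```python
import http.client
import socket
import threading

def serve_once(raw_response):
    """Start a one-shot server on an OS-chosen port; return (port, thread)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))       # port 0: let the OS pick
    listener.listen(1)
    port = listener.getsockname()[1]

    def run():
        conn, _ = listener.accept()
        try:
            conn.recv(65536)              # read and ignore the request
            conn.sendall(raw_response)    # send the canned bytes verbatim
        finally:
            conn.close()
            listener.close()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return port, thread

def fetch_body(raw_response, timeout=5.0):
    """Feed raw_response to HTTPConnection over a real socket."""
    port, thread = serve_once(raw_response)
    client = http.client.HTTPConnection("127.0.0.1", port, timeout=timeout)
    try:
        client.request("GET", "/")
        return client.getresponse().read()
    finally:
        client.close()
        thread.join(timeout)

HEADERS = b"HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n"
GOOD = HEADERS + b"0005\r\nabcd\n\r\n0004\r\nabc\n\r\n0\r\n\r\n"

body = fetch_body(GOOD)
print(body)
```

Swapping a malformed body in for GOOD exercises the client over a real
socket; the client-side timeout keeps a blocking read from hanging the
test run.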

Greg
-- 
Greg Ward [EMAIL PROTECTED] http://www.gerg.ca/
Be careful: sometimes, you're only standing on the shoulders of idiots.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] httplib and bad response chunking

2006-07-30 Thread Gregory P. Smith
On Tue, Jul 25, 2006 at 10:32:13PM -0400, Greg Ward wrote:

> what I discovered in the wild the other day was a response like this:
>
>   0005\r\nabcd\n\r\n0004\r\nabc\n\r\n\r\n
>
> i.e. the chunk-size for the terminating empty chunk was missing.
> This caused httplib.py to blow up with ValueError because it tried to
> call
>
>   int(line, 16)
>
> assuming that 'line' contained a hex number, when in fact it was the
> empty string.  Oops.
>
> IMHO the minimal fix is to turn ValueError into HTTPException (or a
> subclass thereof); httplib should not raise ValueError just because some
> server sends a bad response.  (The server in question was Apache 2.0.52
> running PHP 4.3.9 sending a big hairy error page because the database
> was down.)

IMNSHO httplib should be fixed and this shouldn't be an error at all,
as it's in the wild and will only show up more and more in the future.
Also, file a bug with the Apache or PHP project, as appropriate, for
sending a non-RFC-compliant response.  This is part of the good old
network programming adage of being lenient in what you accept.
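
One possible reading of "lenient" here, sketched in Python 3 and not
taken from httplib itself: treat a blank chunk-size line (or end of
input) as if it were the missing terminating "0".  The function name
decode_chunked_lenient is hypothetical, and a size line containing
non-hex garbage would still raise ValueError in this sketch.

```python
import io

def decode_chunked_lenient(stream):
    """Decode a chunked body, tolerating a missing final chunk size."""
    body = []
    while True:
        # Drop any chunk extension after ';' and surrounding whitespace.
        size_field = stream.readline().split(b";", 1)[0].strip()
        # Assumption: a blank size line or EOF stands in for the final "0".
        size = int(size_field, 16) if size_field else 0
        if size == 0:
            break
        body.append(stream.read(size))
        stream.readline()   # consume the CRLF that ends the chunk data
    return b"".join(body)

bad = b"0005\r\nabcd\n\r\n0004\r\nabc\n\r\n\r\n"
print(decode_chunked_lenient(io.BytesIO(bad)))
```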

-g



[Python-Dev] httplib and bad response chunking

2006-07-28 Thread Greg Ward
So I accidentally discovered the other day that httplib does not handle
a particular type of mangled HTTP response very well.  In particular, it
tends to blow up with an undocumented ValueError when the server screws
up chunked encoding.  I'm not the first to discover this, either: see
http://www.python.org/sf/1486335 .

<digression>
HTTP 1.1 response chunking allows clients to know how many bytes of
response to expect for dynamic content, i.e. when it's not possible to
include a Content-Length header.  A chunked response might look like
this:

  0005\r\nabcd\n\r\n0004\r\nabc\n\r\n0\r\n\r\n

which means:
  0x0005 bytes in first chunk, which is abcd\n
  0x0004 bytes in second chunk, which is abc\n
  0 bytes in the terminating chunk

Each chunk size is terminated with \r\n; each chunk is terminated with
\r\n; end of response is indicated by a chunk of 0 bytes, hence the
0\r\n\r\n at the end.

Details in RFC 2616:
  http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.6.1
</digression>
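
To make the framing described in the digression concrete, here is a
minimal decoder sketch in Python 3.  It is not httplib's implementation;
the function name decode_chunked is made up, and it does no validation at
all, so a malformed size line makes int() raise ValueError.

```python
import io

def decode_chunked(stream):
    """Decode a chunked body from a binary file-like object."""
    body = []
    while True:
        size_line = stream.readline()
        # Drop any chunk extension after ';' before parsing the hex size.
        size = int(size_line.split(b";", 1)[0], 16)
        if size == 0:
            break                      # zero-length chunk ends the body
        body.append(stream.read(size))
        stream.readline()              # consume the CRLF after the data
    # Skip optional trailer lines up to the blank line (or EOF).
    while stream.readline() not in (b"\r\n", b""):
        pass
    return b"".join(body)

wire = b"0005\r\nabcd\n\r\n0004\r\nabc\n\r\n0\r\n\r\n"
decoded = decode_chunked(io.BytesIO(wire))
print(decoded)
```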

Anyways, what I discovered in the wild the other day was a response like
this:

  0005\r\nabcd\n\r\n0004\r\nabc\n\r\n\r\n

i.e. the chunk-size for the terminating empty chunk was missing.
This caused httplib.py to blow up with ValueError because it tried to
call

  int(line, 16)

assuming that 'line' contained a hex number, when in fact it was the
empty string.  Oops.

IMHO the minimal fix is to turn ValueError into HTTPException (or a
subclass thereof); httplib should not raise ValueError just because some
server sends a bad response.  (The server in question was Apache 2.0.52
running PHP 4.3.9 sending a big hairy error page because the database
was down.)
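
A sketch of what that minimal fix could look like, written against
Python 3's http.client (the later name for httplib).  BadChunkSize and
read_chunk_size are hypothetical names for illustration, not names from
the library or from the patch.

```python
import http.client
import io

class BadChunkSize(http.client.HTTPException):
    """Hypothetical exception for an unparseable chunk-size line."""

def read_chunk_size(stream):
    """Read one chunk-size line and return the size as an int."""
    line = stream.readline()
    try:
        # Drop any chunk extension after ';' before parsing the hex size.
        return int(line.split(b";", 1)[0], 16)
    except ValueError:
        # Report a protocol error instead of leaking ValueError.
        raise BadChunkSize("bad chunk size line: %r" % line) from None

bad = io.BytesIO(b"0005\r\nabcd\n\r\n0004\r\nabc\n\r\n\r\n")
for _ in range(2):
    size = read_chunk_size(bad)
    bad.read(size + 2)        # chunk data plus its trailing CRLF
try:
    read_chunk_size(bad)      # the size line for the last chunk is missing
    caught = None
except http.client.HTTPException as exc:
    caught = exc
print("caught:", caught)
```

Callers that already catch HTTPException then handle the bad response
without needing a separate except clause for ValueError.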

Where I'm getting hung up is how far to test this stuff.  I have
discovered other hypothetical cases of bad chunking that cause httplib
to go into an infinite loop or block forever on socket.readline().
Should we worry about those cases as well, despite not having seen them
happen in the wild?  More annoying, I can reproduce the block forever
case using a real socket, but not using the StringIO-based FakeSocket
class in test_httplib.

Anyways, I've cobbled together a crude hack to test_httplib.py that
exposes the problem:

  
http://sourceforge.net/tracker/download.php?group_id=5470&atid=105470&file_id=186245&aid=1486335

Feedback welcome.  (Fixing the inadvertent ValueError is trivial, so I'm
concentrating on getting the tests right first.)

Oh yeah, my patch is relative to the 2.4 branch.

Greg
-- 
Greg Ward [EMAIL PROTECTED] http://www.gerg.ca/
I don't believe there really IS a GAS SHORTAGE.. I think it's all just
a BIG HOAX on the part of the plastic sign salesmen -- to sell more numbers!!


Re: [Python-Dev] httplib and bad response chunking

2006-07-28 Thread John J Lee
On Tue, 25 Jul 2006, Greg Ward wrote:
[...]
> Where I'm getting hung up is how far to test this stuff.

Stop when you run out of time ;-)

> I have
> discovered other hypothetical cases of bad chunking that cause httplib
> to go into an infinite loop or block forever on socket.readline().
> Should we worry about those cases as well, despite not having seen them
> happen in the wild?  More annoying, I can reproduce the block forever
> case using a real socket, but not using the StringIO-based FakeSocket
> class in test_httplib.

They have been seen in the wild :-)

http://python.org/sf/1411097


The IP address referenced isn't under my control, and I don't know if it
still provokes the error, but the problem is clear.


John
