Edit report at http://bugs.php.net/bug.php?id=51775&edit=1

 ID:               51775
 Comment by:       vic at zymsys dot com
 Reported by:      vic at zymsys dot com
 Summary:          Chunked response parsing error
 Status:           Feedback
 Type:             Bug
 Package:          SOAP related
 Operating System: CentOS 4.8
 PHP Version:      5.3SVN-2010-05-09 (snap)
 Assigned To:      dmitry

 New Comment:

Do you suggest we re-assign this bug to someone who handles the
php_stream_gets() function then?



I can try to create a test case for this, but I'll have a learning curve
to overcome, and I won't have enough time for a while to dedicate to
that.



Digging deeper, php_stream_gets() is really _php_stream_get_line(), and
_php_stream_get_line() calls php_stream_locate_eol() to get a pointer to
the end-of-line byte.  It looks like php_stream_locate_eol() tries to be
clever about detecting Mac or Unix EOL, and that it treats CR/LF pairs
as Unix, so it would return the CR as part of the string.  For it to
behave the way it does in this case, it must somehow be detecting this
response as containing old Mac-style line breaks, and stopping at the
CR.  Here's a hex dump of what comes in, from the start of the HTTP
response to the chunk where it gets thrown off.  It looks like
consistent CR/LF pairs to me, so I'm not sure why it would think this
response was using just CR EOL markers.



I'll have to see if I can get this running under gdb to see why the
stream thinks it is Mac style (CR) EOL markers.



00000760  00 00 01 01 08 0a 01 17  ed ec c0 12 e4 f1 48 54  |..............HT|
00000770  54 50 2f 31 2e 31 20 32  30 30 20 4f 4b 0d 0a 43  |TP/1.1 200 OK..C|
00000780  6f 6e 6e 65 63 74 69 6f  6e 3a 20 6b 65 65 70 2d  |onnection: keep-|
00000790  61 6c 69 76 65 0d 0a 54  72 61 6e 73 66 65 72 2d  |alive..Transfer-|
000007a0  45 6e 63 6f 64 69 6e 67  3a 20 63 68 75 6e 6b 65  |Encoding: chunke|
000007b0  64 0d 0a 56 69 61 3a 20  31 2e 31 20 42 50 4c 30  |d..Via: 1.1 BPL0|
000007c0  34 32 20 28 56 6f 72 64  65 6c 29 2c 20 31 2e 31  |42 (Vordel), 1.1|
000007d0  20 65 70 6c 32 30 33 20  28 56 6f 72 64 65 6c 29  | epl203 (Vordel)|
000007e0  0d 0a 44 61 74 65 3a 20  46 72 69 2c 20 30 37 20  |..Date: Fri, 07 |
000007f0  4d 61 79 20 32 30 31 30  20 32 30 3a 30 34 3a 32  |May 2010 20:04:2|
00000800  34 20 47 4d 54 0d 0a 53  4f 41 50 41 63 74 69 6f  |4 GMT..SOAPActio|
00000810  6e 3a 20 22 22 0d 0a 58  2d 50 6f 77 65 72 65 64  |n: ""..X-Powered|
00000820  2d 42 79 3a 20 53 65 72  76 6c 65 74 2f 32 2e 35  |-By: Servlet/2.5|
00000830  20 4a 53 50 2f 32 2e 31  0d 0a 43 6f 6e 74 65 6e  | JSP/2.1..Conten|
00000840  74 2d 54 79 70 65 3a 20  74 65 78 74 2f 78 6d 6c  |t-Type: text/xml|
00000850  3b 20 63 68 61 72 73 65  74 3d 22 75 74 66 2d 38  |; charset="utf-8|
00000860  22 0d 0a 0d 0a 30 30 30  30 30 31 62 63 0d 0a 3c  |"....000001bc..<|
00000870  65 6e 76 3a 45 6e 76 65  6c 6f 70 65 20 78 6d 6c  |env:Envelope xml|
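
One hypothesis worth checking under gdb is that EOL detection runs on a
partially filled buffer.  The sketch below is NOT the code of
php_stream_locate_eol(); it is a simplified, hypothetical detector
(detect_eol and the enum names are made up for illustration) that shows
how a detector of this general shape could classify CR/LF input as
CR-only if the buffer happens to end right after the CR.  Whether that
is what happens here is unverified.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical EOL styles; these names are illustrative, not PHP's. */
enum eol_style { EOL_UNKNOWN, EOL_LF_OR_CRLF, EOL_CR_ONLY };

/*
 * Simplified detector: classify the line ending from whatever bytes are
 * buffered right now.  Assumption: detection looks only at the bytes
 * already received and does not wait for more data.
 */
static enum eol_style detect_eol(const char *buf, size_t avail)
{
    const char *cr, *lf;

    assert(buf != NULL);
    cr = memchr(buf, '\r', avail);
    lf = memchr(buf, '\n', avail);

    /* A CR that is not directly followed by an LF, and no earlier LF. */
    if (cr != NULL && lf != cr + 1 && !(lf != NULL && lf < cr)) {
        return EOL_CR_ONLY;
    }
    /* A bare LF, or an LF directly after a CR. */
    if (lf != NULL) {
        return EOL_LF_OR_CRLF;
    }
    return EOL_UNKNOWN;
}
```

With the full "000001bc\r\n<env" in the buffer this returns
EOL_LF_OR_CRLF, but with only "000001bc\r" buffered (the LF still in
flight) it returns EOL_CR_ONLY, which would match the symptom of
stopping at the CR.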


Previous Comments:
------------------------------------------------------------------------
[2010-05-28 11:35:04] dmi...@php.net

I can't reproduce the issue. Anyway, php_stream_gets() must do its work
properly, and ext/soap shouldn't have to care about its mistakes,
especially because '\n' may be a valid character in the data stream.

------------------------------------------------------------------------
[2010-05-09 06:07:09] vic at zymsys dot com

Description:
------------
I was getting an error from a SoapClient call:  "Error Fetching http
body, No Content-Length, connection closed or chunked data".  Thing was
I couldn't see any problem with the HTTP response.



I tracked the problem down to the get_http_body function in
ext/soap/php_http.c, where it reads the chunk size using
php_stream_gets().  That's returning the chunk size plus the CR (0d) but
leaving the LF (0a) unread.  Then the unread LF gets read as the first
byte of the HTTP chunk, and the attempt to read the next chunk size
starts with the last character of the HTTP chunk, since the reader is
one byte behind thanks to the unread LF.



Here's a chunk of the response that throws it off, with the chunk size
(000001bc) in the middle, surrounded by CR/LF pairs.



00000850  3b 20 63 68 61 72 73 65  74 3d 22 75 74 66 2d 38  |; charset="utf-8|
00000860  22 0d 0a 0d 0a 30 30 30  30 30 31 62 63 0d 0a 3c  |"....000001bc..<|
00000870  65 6e 76 3a 45 6e 76 65  6c 6f 70 65 20 78 6d 6c  |env:Envelope xml|
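
To make the off-by-one concrete, here is a minimal standalone sketch.
It does not use the PHP stream API: struct mem_stream, read_line() and
chunk_body_offset() are hypothetical stand-ins, and the stop_at_cr flag
just simulates a line reader that ends the line at the CR.  The chunk
size parses the same either way (0x1bc is 444), but the position where
the chunk body read begins differs by one byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical in-memory stand-in for the socket stream. */
struct mem_stream {
    const char *data;
    size_t len;
    size_t pos;
};

/*
 * Read one line into out (NUL-terminated) and return the bytes stored.
 * With stop_at_cr set, a CR ends the line and a following LF is left
 * unread, as described in this report.  Otherwise the line ends at LF.
 */
static size_t read_line(struct mem_stream *s, char *out, size_t cap,
                        int stop_at_cr)
{
    size_t n = 0;

    assert(s != NULL && out != NULL && cap > 0);
    while (s->pos < s->len && n + 1 < cap) {
        char c = s->data[s->pos++];
        out[n++] = c;
        if (c == '\n' || (stop_at_cr && c == '\r')) {
            break;
        }
    }
    out[n] = '\0';
    return n;
}

/*
 * Read the chunk-size line, parse it as hex, and return the stream
 * offset at which the chunk body read would start.
 */
static size_t chunk_body_offset(const char *wire, size_t wire_len,
                                int stop_at_cr, long *chunk_size)
{
    struct mem_stream s = { wire, wire_len, 0 };
    char line[32];

    read_line(&s, line, sizeof line, stop_at_cr);
    *chunk_size = strtol(line, NULL, 16); /* stops at the CR */
    return s.pos;
}
```

For the bytes "000001bc\r\n<env:Envelope", the LF-terminated reader
leaves the stream at offset 10, on the '<'.  The CR-terminated reader
leaves it at offset 9, on the LF, so a 444-byte body read would start
with the LF and end one byte short of the real chunk.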



I added a little code under the line that adjusts the http buffer
allocation, and above the loop that reads the chunk, and this solved the
problem for me:



/* php_stream_gets() may stop after the CR and leave the LF in the
   buffer.  If so, we need to eat it. */
ch = php_stream_getc(stream);
if (ch != '\n') {
    /* Nope, it wasn't an LF.  Put it at the start of the current
       buffer, and advance one character. */
    http_buf[http_buf_size] = ch;
    len_size++;
    http_buf_size++;
}



This reads the next character, and if it is an LF it eats it, and if it
isn't it adds it to the http buffer.
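
The same idea can be tried outside the PHP tree with a small standalone
sketch.  Everything here is hypothetical scaffolding (struct mem_stream,
mem_getc() and skip_stray_lf() are not PHP functions); it only mirrors
the "eat the LF, otherwise keep the byte" logic.  The end-of-input check
is an addition that the snippet above does not have.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the stream; only the cursor matters. */
struct mem_stream {
    const char *data;
    size_t len;
    size_t pos;
};

/* Return the next byte, or -1 at end of input. */
static int mem_getc(struct mem_stream *s)
{
    assert(s != NULL);
    return s->pos < s->len ? (unsigned char)s->data[s->pos++] : -1;
}

/*
 * Call after the chunk-size line has been read.  Consumes one byte:
 * a stray LF (or end of input) is dropped, anything else is stored as
 * the first byte of the chunk body.  Returns the number of bytes
 * stored in body, which is 0 or 1.
 */
static size_t skip_stray_lf(struct mem_stream *s, char *body)
{
    int ch;

    assert(body != NULL);
    ch = mem_getc(s);
    if (ch == '\n' || ch == -1) {
        return 0;
    }
    body[0] = (char)ch;
    return 1;
}
```

The caller would then read the chunk size minus the returned count, so
the total body length stays the same whether or not an LF was eaten.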



I wanted to run this by someone more experienced hacking on the php
source before going any further to make sure the bug is legit, and the
fix looks at all sane.



------------------------------------------------------------------------


