Bug#745823: libwww-perl: an https request with iso-8859-1 headers, chunked transfer and data with utf8 bit on is corrupted.
On 26/04/14 18:49, John Hughes wrote:
>         if (ref($content_ref) eq 'CODE') {
>             my $buf = &$content_ref();
>             $buf = "" unless defined($buf);
> +           utf8::downgrade ($buf);
>             $buf = sprintf "%x%s%s%s", length($buf), $CRLF, $buf, $CRLF
>                 if $chunked;
>             substr($buf, 0, 0) = $req_buf if $req_buf;

But that's the wrong place to fix it. The bug is really in $socket->syswrite, aka Crypt::SSLeay::Conn::write. That's where the bug should be fixed.
Bug#745823: libwww-perl: an https request with iso-8859-1 headers, chunked transfer and data with utf8 bit on is corrupted.
On 27/04/14 10:35, John Hughes wrote:
> But that's the wrong place to fix it. The bug is really in
> $socket->syswrite, aka Crypt::SSLeay::Conn::write. That's where the bug
> should be fixed.

This patch fixes it for me.

--- SSLeay.xs.dist 2007-08-13 19:42:33.0 +0200
+++ SSLeay.xs 2014-04-27 13:43:47.0 +0200
@@ -283,20 +283,40 @@
            int len;
            int offset = 0;
            int n;
+           U8* tmpbuf = NULL;
         INPUT:
            char* buf = SvPV(ST(1), blen);
         CODE:
+
+           if (DO_UTF8(ST(1))) {
+               STRLEN tmplen = blen;
+               bool is_utf8 = TRUE;
+               U8 * const result = bytes_from_utf8((const U8*) buf, &tmplen, &is_utf8);
+               if (is_utf8)
+                   croak("Wide character in SSL write (bytes required)");
+
+               if (result != (U8*)buf) {
+                   tmpbuf = result;
+                   buf = (char*) tmpbuf;
+                   blen = tmplen;
+               }
+           }
+
            if (items > 2) {
                len = SvOK(ST(2)) ? SvIV(ST(2)) : blen;
                if (items > 3) {
                    offset = SvIV(ST(3));
                    if (offset < 0) {
-                       if (-offset > blen)
+                       if (-offset > blen) {
+                           Safefree(tmpbuf);
                            croak("Offset outside string");
+                       }
                        offset += blen;
                    }
-                   else if (offset >= blen && blen > 0)
+                   else if (offset >= blen && blen > 0) {
+                       Safefree(tmpbuf);
                        croak("Offset outside string");
+                   }
                }
                if (len > blen - offset)
                    len = blen - offset;
@@ -311,6 +331,7 @@
            else {
                RETVAL = PL_sv_undef;
            }
+           Safefree(tmpbuf);
         OUTPUT:
            RETVAL
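At the Perl level, the effect of the patched write can be modelled roughly as below. This is only a sketch: `bytes_to_write` is a hypothetical helper, not part of Crypt::SSLeay, and it merely mimics the patched XS logic (downgrade a copy of the buffer, then apply length and offset in bytes).

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical pure-Perl model of the patched write(): returns the bytes
# that would be handed to the SSL layer for ($buf, $len, $offset).
sub bytes_to_write {
    my ($buf, $len, $offset) = @_;   # $buf is a copy, like tmpbuf in the patch

    # FAIL_OK (second argument true) makes downgrade return false
    # instead of dying, so we can raise the patch's own message.
    utf8::downgrade($buf, 1)
        or croak "Wide character in SSL write (bytes required)";

    my $blen = length $buf;          # a byte count after the downgrade
    $offset = 0 unless defined $offset;
    if ($offset < 0) {
        croak "Offset outside string" if -$offset > $blen;
        $offset += $blen;
    }
    elsif ($offset >= $blen && $blen > 0) {
        croak "Offset outside string";
    }
    $len = $blen unless defined $len;
    $len = $blen - $offset if $len > $blen - $offset;

    return substr($buf, $offset, $len);
}

my $flagged = substr("\x{f00f}\xae", 1, 1);   # "\xae" with the utf8 bit on
my $out     = bytes_to_write("Subject: " . $flagged, 10);
```

With the downgrade in place, the length passed by the caller and the buffer length seen by the writer are both counted in bytes, which is the invariant the patch appears to restore.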
Bug#745823: libwww-perl: an https request with iso-8859-1 headers, chunked transfer and data with utf8 bit on is corrupted.
On 25/04/14 22:01, Niko Tyni wrote:
> found 745823 6.06-1
> thanks

My pleasure.

> Interesting. I can reproduce this on (mostly current) sid
> with libwww-perl 6.06-1.

Ah, I was going to test that Monday :-)

> Quoting HTTP::Request documentation:
>
>     $r->content( $bytes )
>
>         Note that the content should be a string of bytes.
>         Strings in perl can contain characters outside the range of a
>         byte. The Encode module can be used to turn such strings
>         into a string of bytes.
>
> So this is not totally unexpected, but the particular failure mode
> you've run into is certainly rather horrible. Possibly the content()
> method should croak when the UTF8 bit is set?

Interestingly, in my case, although the UTF8 bit is set, the data is all code points below 256. In fact, the first time I ran into the bug the data was "XXX" (read from a file with binmode :utf8 on).

Maybe something like

    if (utf8::is_utf8($data)) {
        eval { utf8::downgrade ($data); };
        croak "content not bytes" if $@;
    }
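For comparison, the caller-side route that the quoted documentation points to (the Encode module) might look like the sketch below. The choice of ISO-8859-1 or UTF-8 as the wire encoding is an assumption made for illustration; the report does not say which one the server expects.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A string with the utf8 bit on whose only character is below 256,
# built the same way as in the test program.
my $text = substr("\x{f00f}\xae", 1, 1);

# The caller picks the wire encoding explicitly; encode() returns octets.
my $latin1 = encode('ISO-8859-1', $text);   # one byte:  AE
my $utf8   = encode('UTF-8',      $text);   # two bytes: C2 AE

printf "latin1: %d byte(s), utf8 flag %s\n",
    length($latin1), utf8::is_utf8($latin1) ? "on" : "off";
printf "utf-8:  %d byte(s), utf8 flag %s\n",
    length($utf8), utf8::is_utf8($utf8) ? "on" : "off";
```

Either result is a plain byte string, so passing it as the request content avoids the flagged-string case altogether.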
Bug#745823: libwww-perl: an https request with iso-8859-1 headers, chunked transfer and data with utf8 bit on is corrupted.
On 26/04/14 16:11, John Hughes wrote:
> Maybe something like
>
>     if (utf8::is_utf8($data)) {
>         eval { utf8::downgrade ($data); };
>         croak "content not bytes" if $@;
>     }

That's ridiculously over the top. We could just unconditionally call

    utf8::downgrade ($data);
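A small sketch of how utf8::downgrade behaves on the three kinds of input discussed here, which may help when weighing the unconditional call against the guarded one. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $bytes   = "\xae";                          # plain byte string
my $flagged = substr("\x{f00f}\xae", 1, 1);    # same character, utf8 bit on
my $wide    = "\x{f00f}";                      # cannot fit in a byte

utf8::downgrade($bytes);                       # no-op on byte strings
utf8::downgrade($flagged);                     # clears the utf8 bit

# Without FAIL_OK the call dies by itself on wide characters...
my $died = !eval { utf8::downgrade($wide); 1 };

# ...and with FAIL_OK set (second argument true) it returns false instead.
my $ok = utf8::downgrade($wide, 1);

print "flag cleared\n"   unless utf8::is_utf8($flagged);
print "plain call died\n" if $died;
print "FAIL_OK call returned false\n" unless $ok;
```

So the unconditional call already fails loudly on content that is not bytes; the guarded version mainly changes the error message.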
Bug#745823: libwww-perl: an https request with iso-8859-1 headers, chunked transfer and data with utf8 bit on is corrupted.
On 26/04/14 18:12, John Hughes wrote:
> We could just unconditionally call
>
>     utf8::downgrade ($data);

Interestingly, there is code in HTTP::Message that does that, but we're going from LWP::Protocol::http::request to LWP::Protocol::https::Socket->syswrite.

This fixes it for me.

--- /usr/share/perl5/LWP/Protocol/http.pm 2010-01-22 22:44:52.0 +0100
+++ /usr/local/share/perl/5.10.1/LWP/Protocol/http.pm 2014-04-26 18:45:56.0 +0200
@@ -240,6 +240,7 @@
         if (ref($content_ref) eq 'CODE') {
             my $buf = &$content_ref();
             $buf = "" unless defined($buf);
+            utf8::downgrade ($buf);
             $buf = sprintf "%x%s%s%s", length($buf), $CRLF, $buf, $CRLF
                 if $chunked;
             substr($buf, 0, 0) = $req_buf if $req_buf;
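The patched lines can be exercised on their own, outside LWP, roughly as follows. `frame_chunk` is a hypothetical helper written for this sketch; only the downgrade and the sprintf framing are taken from the patch.

```perl
use strict;
use warnings;

my $CRLF = "\015\012";

# Hypothetical helper mirroring the patched lines: downgrade first, so that
# length() counts bytes, then add the chunked-encoding framing.
sub frame_chunk {
    my ($buf) = @_;
    utf8::downgrade($buf);                   # dies on characters above 255
    return sprintf "%x%s%s%s", length($buf), $CRLF, $buf, $CRLF;
}

my $body  = substr("\x{f00f}\xae", 1, 1);    # utf8 bit on, one character
my $chunk = frame_chunk($body);

printf "chunk is %d bytes, utf8 flag %s\n",
    length($chunk), utf8::is_utf8($chunk) ? "on" : "off";
```

Because the framed chunk is a byte string, prepending byte-string request headers to it no longer upgrades those headers.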
Bug#745823: libwww-perl: an https request with iso-8859-1 headers, chunked transfer and data with utf8 bit on is corrupted.
Package: libwww-perl
Version: 5.836-1
Severity: normal

This was horrible to narrow down, but:

1. I'm doing a POST to a HTTPS url
2. Some of my headers contain iso-8859-1 data
3. The body is sent with transfer-encoding: chunked
4. The is_utf8 bit was set on the data (although it happens to be all in code points < 256).

(Changing *any* of these conditions makes the bug go away.)

The request headers get corrupted, sent in utf-8 instead of iso-8859-1, and some of the data doesn't get sent, messing up the chunked counts, or even trashing the request headers. The number of missing bytes seems related to the difference in length between the iso-8859-1 headers and the incorrect utf-8 versions.

For example my request should look like:

    POST / HTTP/1.1
    TE: deflate,gzip;q=0.3
    Connection: TE, close
    Host: localhost:4433
    User-Agent: LWP UTF8 BUG
    Subject:
    Transfer-Encoding: chunked

    1
    ®
    0

But it is sent as:

    POST / HTTP/1.1
    TE: deflate,gzip;q=0.3
    Connection: TE, close
    Host: localhost:4433
    User-Agent: LWP UTF8 BUG
    Subject: ®®®®®®®®®®®®
    Transfer-Encoding: chunk0

Here's my test program:

    #! /usr/bin/perl

    use strict;
    use LWP::UserAgent;

    my $agent = LWP::UserAgent->new (agent => 'LWP UTF8 BUG');

    # Bug only happens if https
    my $req = HTTP::Request->new (POST => 'https://localhost:4433');

    # Bug only happens if utf8 bit is set on data to be written
    my $body = substr ("\x{f00f}\xae", 1, 1);
    print "utf8 bit set\n" if utf8::is_utf8($body);

    # Bug only happens with chunked content
    my $read_body = sub { my $buf = $body; $body = ""; $buf };
    $req->content ($read_body);

    # Bug only happens if header with iso-8859-1 data
    $req->header (Subject => "\xae" x 12);

    my $ret = $agent->request ($req);

    # Request sent is malformed - iso-8859-1 data sent as utf-8 and
    # bytes missing from output (number of bytes missing equal to
    # difference in length between iso-8859-1 and utf-8 representations).
---

-- System Information:
Debian Release: 6.0.7
  APT prefers oldstable
  APT policy: (500, 'oldstable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-amd64 (SMP w/1 CPU core)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages libwww-perl depends on:
ii  libhtml-parser-perl    3.66-1            collection of modules that parse H
ii  libhtml-tagset-perl    3.20-2            Data tables pertaining to HTML
ii  libhtml-tree-perl      3.23-2            Perl module to represent and creat
ii  liburi-perl            1.54-2            module to manipulate and access UR
ii  netbase                4.45              Basic TCP/IP networking system
ii  perl                   5.10.1-17squeeze6 Larry Wall's Practical Extraction

Versions of packages libwww-perl recommends:
ii  libhtml-format-perl    2.04-2            format HTML syntax trees into text
ii  libio-compress-perl    2.024-1           bundle of IO::Compress modules
ii  libmailtools-perl      2.06-1            Manipulate email in perl programs
ii  perl [libio-compress-p 5.10.1-17squeeze6 Larry Wall's Practical Extraction

Versions of packages libwww-perl suggests:
ii  libcrypt-ssleay-perl   0.57-2            Support for https protocol in LWP
ii  libio-socket-ssl-perl  1.33-1+squeeze1   Perl module implementing object or

-- no debconf information
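One plausible reading of the symptoms can be sketched without any network code: concatenating the flagged body with the byte-string headers upgrades the whole buffer, so its internal UTF-8 form is longer than length() reports. The truncation step below is an assumption about the write path (a byte-oriented write that is given a character count), not something established in this report, and the header block is cut down to the Subject line for brevity.

```perl
use strict;
use warnings;

my $headers = "Subject: " . ("\xae" x 12) . "\015\012";   # bytes, no utf8 bit
my $body    = substr("\x{f00f}\xae", 1, 1);                # 1 char, utf8 bit on

# Concatenation with a flagged string upgrades the header bytes too.
my $buf = $headers . $body;

my $chars = length $buf;                     # what the caller sees

# Copy and encode to look at the internal UTF-8 representation.
my $internal = $buf;
utf8::encode($internal);
my $octets = length $internal;

# Assumption: a writer that takes $chars octets from the internal buffer
# sends the headers as UTF-8 and drops the tail of the request.
my $sent    = substr($internal, 0, $chars);
my $missing = $octets - $chars;

print "chars=$chars octets=$octets missing=$missing\n";
```

In this reduced case the shortfall equals the number of 0xAE characters, each of which takes two bytes in UTF-8, which matches the observation that the missing byte count tracks the difference between the iso-8859-1 and utf-8 lengths.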
Bug#745823: libwww-perl: an https request with iso-8859-1 headers, chunked transfer and data with utf8 bit on is corrupted.
found 745823 6.06-1
thanks

On Fri, Apr 25, 2014 at 05:10:36PM +0200, John Hughes wrote:
> Package: libwww-perl
> Version: 5.836-1
> Severity: normal
>
> This was horrible to narrow down, but:
>
> 1. I'm doing a POST to a HTTPS url
> 2. Some of my headers contain iso-8859-1 data
> 3. The body is sent with transfer-encoding: chunked
> 4. The is_utf8 bit was set on the data (although it happens to be all
>    in code points < 256).
>
> (Changing *any* of these conditions makes the bug go away.)
>
> The request headers get corrupted, sent in utf-8 instead of iso-8859-1,
> and some of the data doesn't get sent, messing up the chunked counts, or
> even trashing the request headers. The number of missing bytes seems
> related to the difference in length between the iso-8859-1 headers and
> the incorrect utf-8 versions.

Interesting. I can reproduce this on (mostly current) sid with libwww-perl 6.06-1.

> Here's my test program:

[...]

> # Bug only happens if https
> my $req = HTTP::Request->new (POST => 'https://localhost:4433');
>
> # Bug only happens if utf8 bit is set on data to be written
> my $body = substr ("\x{f00f}\xae", 1, 1);
> print "utf8 bit set\n" if utf8::is_utf8($body);
>
> # Bug only happens with chunked content
> my $read_body = sub { my $buf = $body; $body = ""; $buf };
> $req->content ($read_body);

Quoting HTTP::Request documentation:

    $r->content( $bytes )

        This is used to get/set the content and it is inherited from the
        HTTP::Message base class. See HTTP::Message for details and
        other methods that can be used to access the content.

        Note that the content should be a string of bytes.
        Strings in perl can contain characters outside the range of a
        byte. The Encode module can be used to turn such strings
        into a string of bytes.

So this is not totally unexpected, but the particular failure mode you've run into is certainly rather horrible. Possibly the content() method should croak when the UTF8 bit is set?

(I suppose it can't encode the string automatically as it doesn't know which encoding should be used.)
-- 
Niko Tyni nt...@debian.org