Bug#745823: libwww-perl: an https request with iso-8859-1 headers, chunked transfer and data with utf8 bit on is corrupted.

2014-04-27 Thread John Hughes

On 26/04/14 18:49, John Hughes wrote:


>  if (ref($content_ref) eq 'CODE') {
>      my $buf = &$content_ref();
>      $buf = "" unless defined($buf);
> +    utf8::downgrade ($buf);
>      $buf = sprintf "%x%s%s%s", length($buf), $CRLF, $buf, $CRLF
>          if $chunked;
>      substr($buf, 0, 0) = $req_buf if $req_buf



But that's the wrong place to fix it.  The bug is really in
$socket->syswrite, aka Crypt::SSLeay::Conn::write.


That's where the bug should be fixed.


--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#745823: libwww-perl: an https request with iso-8859-1 headers, chunked transfer and data with utf8 bit on is corrupted.

2014-04-27 Thread John Hughes

On 27/04/14 10:35, John Hughes wrote:
> But that's the wrong place to fix it.  The bug is really in
> $socket->syswrite, aka Crypt::SSLeay::Conn::write.
>
> That's where the bug should be fixed.




This patch fixes it for me.


--- SSLeay.xs.dist	2007-08-13 19:42:33.0 +0200
+++ SSLeay.xs	2014-04-27 13:43:47.0 +0200
@@ -283,20 +283,40 @@
            int len;
            int offset = 0;
            int n;
+           U8* tmpbuf = NULL;
         INPUT:
            char* buf = SvPV(ST(1), blen);
         CODE:
+
+           if (DO_UTF8(ST(1))) {
+               STRLEN tmplen = blen;
+               bool is_utf8 = TRUE;
+               U8 * const result = bytes_from_utf8((const U8*) buf, &tmplen, &is_utf8);
+               if (is_utf8)
+                   croak("Wide character in SSL write (bytes required)");
+
+               if (result != (U8*)buf) {
+                   tmpbuf = result;
+                   buf = (char*) tmpbuf;
+                   blen = tmplen;
+               }
+           }
+
            if (items > 2) {
                len = SvOK(ST(2)) ? SvIV(ST(2)) : blen;
                if (items > 3) {
                    offset = SvIV(ST(3));
                    if (offset < 0) {
-                       if (-offset > blen)
+                       if (-offset > blen) {
+                           Safefree(tmpbuf);
                            croak("Offset outside string");
+                       }
                        offset += blen;
                    }
-                   else if (offset >= blen && blen > 0)
+                   else if (offset >= blen && blen > 0) {
+                       Safefree(tmpbuf);
                        croak("Offset outside string");
+                   }
                }
                if (len > blen - offset)
                    len = blen - offset;
@@ -311,6 +331,7 @@
            else {
                RETVAL = &PL_sv_undef;
            }
+           Safefree(tmpbuf);
         OUTPUT:
            RETVAL
 


Bug#745823: libwww-perl: an https request with iso-8859-1 headers, chunked transfer and data with utf8 bit on is corrupted.

2014-04-26 Thread John Hughes

On 25/04/14 22:01, Niko Tyni wrote:

> found 745823 6.06-1
> thanks

My pleasure.

> Interesting. I can reproduce this on (mostly current) sid with
> libwww-perl 6.06-1.

Ah, I was going to test that Monday :-)

> Quoting HTTP::Request documentation:
>
>   $r->content( $bytes )
>
> Note that the content should be a string of bytes.  Strings in
> perl can contain characters outside the range of a byte.
> The Encode module can be used to turn such strings into a
> string of bytes.
>
> So this is not totally unexpected, but the particular failure mode you've
> run into is certainly rather horrible.
>
> Possibly the content() method should croak when the UTF8 bit is set?

Interestingly, in my case, although the UTF8 bit is set, the data is
all code points below 256.  In fact the first time I ran into the bug
the data was XXX.  (Read from a file with binmode :utf8 on).


Maybe something like

if (utf8::is_utf8($data)) {
    eval {
        utf8::downgrade ($data);
    };
    croak "content not bytes" if $@;
}





Bug#745823: libwww-perl: an https request with iso-8859-1 headers, chunked transfer and data with utf8 bit on is corrupted.

2014-04-26 Thread John Hughes

On 26/04/14 16:11, John Hughes wrote:


> Maybe something like
>
> if (utf8::is_utf8($data)) {
>     eval {
>         utf8::downgrade ($data);
>     };
>     croak "content not bytes" if $@;
> }



That's ridiculously over the top.

We could just unconditionally call utf8::downgrade ($data);
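
A minimal sketch of what an unconditional utf8::downgrade does, using only core Perl. The variable names are illustrative and not taken from LWP. It assumes the documented behaviour that utf8::downgrade dies on characters above 255 unless its second argument (FAIL_OK) is true:

```perl
use strict;
use warnings;

# Character string with the UTF8 flag set but only code points below 256,
# built the same way as in the test program from the original report.
my $data = substr("\x{f00f}\xae", 1, 1);
utf8::downgrade($data);    # succeeds: $data is now a plain byte string
print utf8::is_utf8($data) ? "still flagged\n" : "downgraded\n";

# A string holding a real wide character cannot be downgraded.
# With a true FAIL_OK argument the call returns false instead of dying.
my $wide = "\x{f00f}";
my $ok   = utf8::downgrade($wide, 1);
print $ok ? "downgraded\n" : "cannot downgrade\n";
```

Without FAIL_OK the second call would die, so an unconditional downgrade already rejects content that is not representable as bytes.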





Bug#745823: libwww-perl: an https request with iso-8859-1 headers, chunked transfer and data with utf8 bit on is corrupted.

2014-04-26 Thread John Hughes

On 26/04/14 18:12, John Hughes wrote:



> We could just unconditionally call utf8::downgrade ($data);


Interestingly there is code in HTTP::Message that does that, but we're 
going from LWP::Protocol::http::request to 
LWP::Protocol::https::Socket->syswrite.


This fixes it for me.

--- /usr/share/perl5/LWP/Protocol/http.pm   2010-01-22 22:44:52.0 +0100
+++ /usr/local/share/perl/5.10.1/LWP/Protocol/http.pm   2014-04-26 18:45:56.0 +0200
@@ -240,6 +240,7 @@
             if (ref($content_ref) eq 'CODE') {
                 my $buf = &$content_ref();
                 $buf = "" unless defined($buf);
+                utf8::downgrade ($buf);
                 $buf = sprintf "%x%s%s%s", length($buf), $CRLF, $buf, $CRLF
                     if $chunked;
                 substr($buf, 0, 0) = $req_buf if $req_buf





Bug#745823: libwww-perl: an https request with iso-8859-1 headers, chunked transfer and data with utf8 bit on is corrupted.

2014-04-25 Thread John Hughes
Package: libwww-perl
Version: 5.836-1
Severity: normal

This was horrible to narrow down, but:

1. I'm doing a POST to an HTTPS URL
2. Some of my headers contain iso-8859-1 data
3. The body is sent with transfer-encoding: chunked
4. The is_utf8 bit was set on the data (although it happens to be
   all in code points < 256).

(Changing *any* of these conditions makes the bug go away.)

The request headers get corrupted: they are sent in utf-8 instead of
iso-8859-1.

Some of the data doesn't get sent, messing up the chunked counts, or
even trashing the request headers.

The number of missing bytes seems related to the difference in length
between the iso-8859-1 headers and the incorrect utf-8 versions.
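
As a rough illustration of that length difference, here is a sketch using only core Perl. It takes the 12-character Subject value from the test program and compares its iso-8859-1 and utf-8 byte lengths. The variable names are illustrative only:

```perl
use strict;
use warnings;

my $header = "\xae" x 12;    # 12 characters, all below 256

my $latin1 = $header;
utf8::downgrade($latin1);    # iso-8859-1: one byte per character

my $utf8 = $header;
utf8::encode($utf8);         # utf-8: two bytes for each of these characters

# prints "latin1=12 utf8=24 diff=12"
printf "latin1=%d utf8=%d diff=%d\n",
    length($latin1), length($utf8), length($utf8) - length($latin1);
```

If the writer sizes the output by the character count but sends the utf-8 bytes, a shortfall of this order at the end of the buffer is what one would expect to see.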

For example my request should look like:


POST / HTTP/1.1
TE: deflate,gzip;q=0.3
Connection: TE, close
Host: localhost:4433
User-Agent: LWP UTF8 BUG
Subject: 
Transfer-Encoding: chunked

1
®
0



But it is sent as:


POST / HTTP/1.1
TE: deflate,gzip;q=0.3
Connection: TE, close
Host: localhost:4433
User-Agent: LWP UTF8 BUG
Subject: ®®®®®®®®®®®®
Transfer-Encoding: chunk0



Here's my test program:


#! /usr/bin/perl

use strict;
use LWP::UserAgent;

my $agent = LWP::UserAgent->new (agent => 'LWP UTF8 BUG');

# Bug only happens if https
my $req = HTTP::Request->new (POST => 'https://localhost:4433');

# Bug only happens if utf8 bit is set on data to be written
my $body = substr ("\x{f00f}\xae", 1, 1);

print "utf8 bit set\n" if utf8::is_utf8($body);

# Bug only happens with chunked content
my $read_body = sub {
    my $buf = $body;
    $body = "";
    $buf
};

$req->content ($read_body);

# Bug only happens if header with iso-8859-1 data
$req->header (Subject => "\xae" x 12);

my $ret = $agent->request ($req);

# Request sent is malformed - iso-8859-1 data sent as utf-8 and
# bytes missing from output (number of bytes missing equal to
# difference in length between iso-8859-1 and utf-8 representations).
---



-- System Information:
Debian Release: 6.0.7
  APT prefers oldstable
  APT policy: (500, 'oldstable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-amd64 (SMP w/1 CPU core)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages libwww-perl depends on:
ii  libhtml-parser-perl    3.66-1             collection of modules that parse H
ii  libhtml-tagset-perl    3.20-2             Data tables pertaining to HTML
ii  libhtml-tree-perl      3.23-2             Perl module to represent and creat
ii  liburi-perl            1.54-2             module to manipulate and access UR
ii  netbase                4.45               Basic TCP/IP networking system
ii  perl                   5.10.1-17squeeze6  Larry Wall's Practical Extraction

Versions of packages libwww-perl recommends:
ii  libhtml-format-perl    2.04-2             format HTML syntax trees into text
ii  libio-compress-perl    2.024-1            bundle of IO::Compress modules
ii  libmailtools-perl      2.06-1             Manipulate email in perl programs
ii  perl [libio-compress-p 5.10.1-17squeeze6  Larry Wall's Practical Extraction

Versions of packages libwww-perl suggests:
ii  libcrypt-ssleay-perl   0.57-2             Support for https protocol in LWP
ii  libio-socket-ssl-perl  1.33-1+squeeze1    Perl module implementing object or

-- no debconf information





Bug#745823: libwww-perl: an https request with iso-8859-1 headers, chunked transfer and data with utf8 bit on is corrupted.

2014-04-25 Thread Niko Tyni
found 745823 6.06-1
thanks

On Fri, Apr 25, 2014 at 05:10:36PM +0200, John Hughes wrote:
> Package: libwww-perl
> Version: 5.836-1
> Severity: normal
>
> This was horrible to narrow down, but:
>
> 1. I'm doing a POST to a HTTPS url
> 2. Some of my headers contain iso-8859-1 data
> 3. The body is sent with transfer-encoding: chunked
> 4. the is_utf8 bit was set on the data (although it happens to be
>    all in code points < 256).
>
> (changing *any* of these conditions makes the bug go away).
>
> The request headers get corrupted, sent in utf-8 instead of iso-8859-1
>
> some of the data doesn't get sent, messing up the chunked counts, or
> even trashing the request headers.
>
> The number of missing bytes seems related to the difference in length
> between the iso-8859-1 headers and the incorrect utf-8 versions.

Interesting. I can reproduce this on (mostly current) sid with
libwww-perl 6.06-1.

> Here's my test program:

[...]
> # Bug only happens if https
> my $req = HTTP::Request->new (POST => 'https://localhost:4433');
>
> # Bug only happens if utf8 bit is set on data to be written
> my $body = substr ("\x{f00f}\xae", 1, 1);
>
> print "utf8 bit set\n" if utf8::is_utf8($body);
>
> # Bug only happens with chunked content
> my $read_body = sub {
>     my $buf = $body;
>     $body = "";
>     $buf
> };
>
> $req->content ($read_body);

Quoting HTTP::Request documentation:

 $r->content( $bytes )
   This is used to get/set the content and it is inherited from
   the HTTP::Message base class.  See HTTP::Message for details
   and other methods that can be used to access the content.

   Note that the content should be a string of bytes.  Strings in
   perl can contain characters outside the range of a byte.
   The Encode module can be used to turn such strings into a
   string of bytes.

So this is not totally unexpected, but the particular failure mode you've
run into is certainly rather horrible.

Possibly the content() method should croak when the UTF8 bit is set? 
(I suppose it can't encode the string automatically as it doesn't know
which encoding should be used.)
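
To illustrate why the choice has to be left to the caller, here is a small sketch using the core Encode module. It encodes the same one-character string from the test program in two different encodings and prints the resulting bytes in hex. The variable names are illustrative only:

```perl
use strict;
use warnings;
use Encode qw(encode);

# One character, U+00AE, with the UTF8 flag on (as in the test program).
my $text = substr("\x{f00f}\xae", 1, 1);

my $latin1 = encode('ISO-8859-1', $text);    # one byte
my $utf8   = encode('UTF-8',      $text);    # two bytes

printf "%-10s -> %s\n", 'ISO-8859-1', unpack('H*', $latin1);    # ae
printf "%-10s -> %s\n", 'UTF-8',      unpack('H*', $utf8);      # c2ae
```

Either result is a valid byte string to hand to content(), but only the caller knows which one the server expects.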
-- 
Niko Tyni   nt...@debian.org

