On Thu, 19 May 2016 05:32:10 -0700, hmbrand wrote:
> Two issues
> 
> I use this to test:
> --8<---
> #!perl6
> 
> use v6;
> 
> use Test;
> 
> for ^32 {
> 
> say "";
> say $_;
> 
> my @data = ^20 .map({ 256.rand.Int }).list;
> @data.unshift: 61;
> 
> #dd @data;
> 
> my $b = Buf.new(@data);
> 
> ok((my Str $u = $b.decode("utf8-c8")), "decode");
> 
> my @back = $u.encode("utf8-c8").list;
> 
> #dd @back;
> 
> my $n = Buf.new(@back);
> 
> is-deeply($n, $b, "Data");
> }
> -->8---
> 
> First issue is that the buffer returns something longer than the
> original (a \0 is added):
> 
> # expected:
> Buf.new(61,29,61,200,30,99,107,150,71,11,253,134,110,27,35,227,88,140,180,158,209)
> #      got:
> Buf.new(61,29,61,200,30,99,107,150,71,11,253,134,110,27,35,227,88,140,180,158,209,0)
> 
> # expected:
> Buf.new(61,2,71,91,58,252,6,247,88,58,121,32,124,129,191,126,36,222,185,109,213)
> #      got:
> Buf.new(61,2,71,91,58,252,6,247,88,58,121,32,124,129,191,126,36,222,185,109,213,0)
> 
> The second issue is more fun, pairs are swapped:
> 
> # expected:
> Buf.new(61,147,135,8,82,78,208,66,205,164,204,162,140,97,175,37,108,194,27,192,119)
> #      got:
> Buf.new(61,147,135,8,82,78,208,66,204,162,205,164,140,97,175,37,108,194,27,192,119)
> 
> 205,164,204,162 => 204,162,205,164

I've done a total rewrite of the UTF8-C8 decoder. The original approach turned 
out to be far too fragile, so I replaced it with a different design. Along the 
way, I got it to properly handle the cases where normalization would re-order 
codepoints, fixing all of the examples above. It also fixes the various 
problems that ASAN/Valgrind tripped over in the failing tests, and the tests - 
plus a number of new ones I've added - now come out clean under both.
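
For anyone wondering where the swapped pairs in the second issue came from, 
here is a minimal sketch using the plain "utf8" encoding rather than utf8-c8. 
It is an illustration only, not code from the decoder. The bytes 205,164 are 
U+0364 (canonical combining class 230) and 204,162 are U+0322 (class 202); 
both follow 66 ("B") in the failing buffer, so canonical ordering moves the 
lower class first. The variable names are made up for the example.

```raku
use v6;

# "B" followed by two combining marks, in the order found in the report:
# 205,164 = U+0364 (ccc 230), then 204,162 = U+0322 (ccc 202).
my $bytes = Buf.new(66, 205, 164, 204, 162);

# Decoding produces a normalized Str, so the marks are put into
# canonical order: the lower combining class comes first.
my $str = $bytes.decode("utf8");

# Codepoints after normalization; expected: 66 802 868
# (that is, U+0042 U+0322 U+0364).
say $str.ords;

# Encoding again therefore yields the pairs swapped;
# expected: 66 204 162 205 164
say $str.encode("utf8").list;
```

That is the right result for ordinary UTF-8 text, but utf8-c8 is supposed to 
give back the original bytes, so the decoder has to cope with such sequences 
itself.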

So far as I'm aware, this deals with the outstanding issues in utf8-c8. The 
tests are unfudged, though I moved them to S32-str/utf8-c8.t to make fudging of 
the added test cases easier (we fudge the whole file for JVM at present).

/jnthn
