On Fri, 25 Aug 2017 18:56:37 -0700, alex.jakime...@gmail.com wrote:
> The input file for this problem is ≈15 MB, so please bear with the
> external link:
> https://files.progarm.org/golfed.gz (1.6 MB compressed)
> 
> Command:
> perl6 -ne 'say $++' golfed
> # or
> perl6 -ne 'say $++' < golfed
> 
> Result:
> 0
> 1
> 2
> … … …
> 257568
> 257569
> 257570
> Malformed UTF-8
>   in block <unit> at -e line 1
> 
> 
> There's no malformed UTF-8 in the file. And if you don't believe me,
> try this:
> 
> cat golfed | perl6 -ne 'say $++'
> 
> There are at least three possible outcomes (it is not as stable as
> previous examples):
> (*) Fails after 257570, just like in the previous example
> (*) Fails after 121712
> (*) No error, goes through the whole file just fine
> 
> 
> <geekosaur> sounds more likely to be I/O related than unicode related
> <geekosaur> like it's dropping bytes on the floor and if the utf8
> decoder was in (or lands in) the middle of a sequence, boom
> 
> 
> IRC log: https://irclog.perlgeek.de/perl6-dev/2017-08-26#i_15071860
> 
> This issue may be related:
> https://gist.github.com/coke/3feef738886b1e5af79a1ca636146075

It was actually the decoder itself dropping bytes during the transition from
its fast path to its slow path. Fixed, and a test was added in
S32-io/io-handle.t.
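This failure mode is easy to reproduce outside Rakudo. Below is a minimal Python sketch using the standard library's incremental UTF-8 decoder. It is an illustration only and is not the MoarVM/Rakudo decoder code. `decode_stream` and its `drop_partial` flag are hypothetical names. Calling `reset()` after each chunk stands in for "bytes lost at the path transition": any partial multi-byte sequence buffered at a chunk boundary is thrown away, so the next chunk begins with a bare continuation byte and strict decoding fails, just as geekosaur guessed. Where the chunk boundaries fall depends on how reads return data. That would also be consistent with the piped `cat` case failing at different lines, or not failing at all, from run to run.

```python
import codecs


def decode_stream(chunks, drop_partial=False):
    """Decode an iterable of UTF-8 byte chunks into one string.

    With drop_partial=True the decoder is reset after every chunk, which
    discards any bytes of a multi-byte sequence that straddle the chunk
    boundary (a simulation of the bug, not Rakudo's actual code).
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    parts = []
    for chunk in chunks:
        # A trailing incomplete sequence is buffered inside the decoder
        # and completed by the next chunk.
        parts.append(decoder.decode(chunk))
        if drop_partial:
            decoder.reset()  # the buffered partial sequence is lost here
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


if __name__ == "__main__":
    data = "café\n".encode("utf-8")   # b'caf\xc3\xa9\n'
    chunks = [data[:4], data[4:]]     # boundary falls inside the 2-byte 'é'
    print(repr(decode_stream(chunks)))
    try:
        decode_stream(chunks, drop_partial=True)
    except UnicodeDecodeError as e:
        # The second chunk now starts with the continuation byte 0xA9.
        print("Malformed UTF-8:", e.reason)
```

The correct path carries the incomplete sequence across the boundary. The simulated buggy path loses it, so perfectly valid input is reported as malformed.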