On Fri, 25 Aug 2017 18:56:37 -0700, alex.jakime...@gmail.com wrote:
> The input file for this problem is ≈15 MB so please bear with external
> link:
> https://files.progarm.org/golfed.gz (1.6 MB compressed)
>
> Command:
> perl6 -ne 'say $++' golfed
> # or
> perl6 -ne 'say $++' < golfed
>
> Result:
> 0
> 1
> 2
> … … …
> 257568
> 257569
> 257570
> Malformed UTF-8
> in block <unit> at -e line 1
>
>
> There's no malformed UTF-8 in the file. And if you don't believe me,
> try this:
>
> cat golfed | perl6 -ne 'say $++'
>
> There are at least three possible outcomes (it is not as stable as
> previous examples):
> (*) Fails after 257570, just like in the previous example
> (*) Fails after 121712
> (*) No error, goes through the whole file just fine
>
>
> <geekosaur> sounds more likely to be I/O related than unicode related
> <geekosaur> like it's dropping bytes on the floor and if the utf8
> decoder was in (or lands in) the middle of a sequence, boom
>
>
> IRC log: https://irclog.perlgeek.de/perl6-dev/2017-08-26#i_15071860
>
> This issue may be related:
> https://gist.github.com/coke/3feef738886b1e5af79a1ca636146075
It was actually the decoder itself dropping the bytes, during a fast-path to slow-path transition. Fixed, and a test was added in S32-io/io-handle.t.
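
For readers unfamiliar with this class of bug, here is a minimal sketch in Python. It is not the MoarVM decoder and not the test that was added. It only illustrates why a valid UTF-8 stream can be reported as malformed when a multi-byte sequence spans two reads, or when a single byte goes missing. All function names are hypothetical and chosen for illustration.

```python
import codecs


def split_into_chunks(data, size):
    """Split a byte string into fixed-size chunks, as a buffered reader might."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def decode_naive(chunks):
    """Decode every chunk on its own.

    Raises UnicodeDecodeError if a multi-byte sequence straddles two chunks,
    even though the stream as a whole is valid UTF-8.
    """
    return "".join(chunk.decode("utf-8") for chunk in chunks)


def decode_incremental(chunks):
    """Decode chunk by chunk, carrying incomplete trailing bytes forward."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = [decoder.decode(chunk) for chunk in chunks]
    # final=True makes the decoder complain about any dangling partial sequence.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


def drop_byte(data, index):
    """Simulate a decoder losing one byte of its input (illustration only)."""
    return data[:index] + data[index + 1:]


if __name__ == "__main__":
    text = "line with é\n" * 4          # 'é' is two bytes in UTF-8
    data = text.encode("utf-8")
    chunks = split_into_chunks(data, 11)  # boundary falls inside the first 'é'

    print(decode_incremental(chunks) == text)

    try:
        decode_naive(chunks)
    except UnicodeDecodeError as err:
        print("naive decode failed:", err.reason)

    try:
        # Drop the continuation byte of the first 'é'.
        decode_incremental(split_into_chunks(drop_byte(data, 11), 11))
    except UnicodeDecodeError as err:
        print("after a dropped byte:", err.reason)
```

The last case is the closest analogue to what is described above: the input is valid, but once one byte is lost inside the decoder, the remaining bytes no longer form valid sequences and the error surfaces far from the real cause.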