On Fri, Mar 2, 2012 at 2:03 PM, Eric Brine <ikeg...@adaelis.com> wrote:

> On Fri, Mar 2, 2012 at 9:11 AM, Craig A. Berry <craigbe...@mac.com> wrote:
>
>> I was thinking of a situation where something external to Perl limits how
>> much data you can get in one read and thus gives you less than the full
>> amount requested by $/.
>>
>
> That's exactly the situation I described. Here, let me provide the strace
> output.
>
> $ strace perl -e'$/=\40; <>;' < /dev/random
> ...
> read(0, "\5|\200\"\360T0*\325\223\276\322\20S\244\16\341", 8192) = 17
> read(0, "\370\356 \2652\236\27>", 8192) = 8
> read(0, "\0\270\ve\332\223\225\312", 8192) = 8
> read(0, "\316\366\272\311\215.\204\361", 8192) = 8
> ...
>
>
>> I'm pretty sure you'll get mangled UTF-8 if you happen to be
>> mid-character when you hit the end of the device buffer.
>
>
> No, because Perl will just ask for more. You'll get mangled UTF-8 if you
> happen to request a number of bytes that ends you mid-character (which is
> what this ticket is about).
>
> (If we were talking about sysread instead of readline or read, then yes,
> it could happen then. Unlike read and readline, sysread returns as soon as
> bytes are available.)
>

And here's an example where one character is read using two reads:

$ perl -C -e'print "a"x8191, chr(0x2660)' > x

$ ls -l x
-rw------- 1 ikegami group 8194 Mar  2 23:26 x

$ perl -le'use open ":std", ":utf8"; $/=\8194; $_=<>; print $_ eq
("a"x8191).chr(0x2660) ?1:0;' < x
1

strace:

read(0, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 8192) = 8192
read(0, "\231\240", 8192)               = 2
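
For anyone who wants to see the same effect outside Perl, here is a rough sketch in Python of the buffering idea. It is only an illustration of the principle, not how Perl's I/O layers are actually implemented. The helper name decode_chunks is made up for this example. The two chunks mirror the two read() calls in the strace output above: U+2660 encodes as the three bytes E2 99 A0, so the first 8192-byte read ends on E2 and the second read supplies 99 A0.

```python
import codecs


def decode_chunks(chunks, encoding="utf-8"):
    """Decode byte chunks in order, carrying an incomplete trailing
    character over to the next chunk (hypothetical helper)."""
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = [decoder.decode(chunk) for chunk in chunks]
    # final=True raises if the stream ends in the middle of a character.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# Same content as the file "x": 8191 "a" characters followed by U+2660.
data = ("a" * 8191 + chr(0x2660)).encode("utf-8")

# Split the bytes the way the two read() calls did: 8192 bytes, then 2.
chunks = [data[:8192], data[8192:]]

# Buffered decoding reassembles the character that was split.
text = decode_chunks(chunks)

# Decoding the first chunk on its own (roughly what you risk when you
# decode each sysread result separately) fails mid-character.
try:
    chunks[0].decode("utf-8")
    first_chunk_alone_ok = True
except UnicodeDecodeError:
    first_chunk_alone_ok = False
```

Here text compares equal to the original string, while first_chunk_alone_ok ends up False, because the first chunk stops one byte into the three-byte character.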
