On Fri, Mar 2, 2012 at 2:03 PM, Eric Brine <ikeg...@adaelis.com> wrote:
> On Fri, Mar 2, 2012 at 9:11 AM, Craig A. Berry <craigbe...@mac.com> wrote:
>
>> I was thinking of a situation where something external to Perl limits how
>> much data you can get in one read and thus gives you less than the full
>> amount requested by $/.
>
> That's exactly the situation I described. Here, let me provide the strace
> output.
>
> $ strace perl -e'$/=\40; <>;' < /dev/random
> ...
> read(0, "\5|\200\"\360T0*\325\223\276\322\20S\244\16\341", 8192) = 17
> read(0, "\370\356 \2652\236\27>", 8192) = 8
> read(0, "\0\270\ve\332\223\225\312", 8192) = 8
> read(0, "\316\366\272\311\215.\204\361", 8192) = 8
> ...
>
>> I'm pretty sure you'll get mangled UTF-8 if you happen to be
>> mid-character when you hit the end of the device buffer.
>
> No, because Perl will just ask for more. You'll get mangled UTF-8 if you
> happen to request a number of bytes that ends you mid-character (which is
> what this ticket is about).
>
> (If we were talking about sysread instead of readline or read, then yes,
> it could happen then. Unlike read and readline, sysread returns as soon as
> bytes are available.)

And here's an example where one character is read using two reads:

$ perl -C -e'print "a"x8191, chr(0x2660)' > x

$ ls -l x
-rw------- 1 ikegami group 8194 Mar 2 23:26 x

$ perl -le'use open ":std", ":utf8"; $/=\8194; $_=<>; print $_ eq ("a"x8191).chr(0x2660) ?1:0;' < x
1

strace:
read(0, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 8192) = 8192
read(0, "\231\240", 8192) = 2