On Mar 3, 2012, at 1:29 AM, Eric Brine wrote:

> On Fri, Mar 2, 2012 at 2:03 PM, Eric Brine <ikeg...@adaelis.com> wrote:
> On Fri, Mar 2, 2012 at 9:11 AM, Craig A. Berry <craigbe...@mac.com> wrote:
> I was thinking of a situation where something external to Perl limits how
> much data you can get in one read and thus gives you less than the full
> amount requested by $/.
>
> That's exactly the situation I described. Here, let me provide the strace
> output.
>
> $ strace perl -e'$/=\40; <>;' < /dev/random
> ...
> read(0, "\5|\200\"\360T0*\325\223\276\322\20S\244\16\341", 8192) = 17
> read(0, "\370\356 \2652\236\27>", 8192) = 8
> read(0, "\0\270\ve\332\223\225\312", 8192) = 8
> read(0, "\316\366\272\311\215.\204\361", 8192) = 8
> ...
>
> I'm pretty sure you'll get mangled UTF-8 if you happen to be mid-character
> when you hit the end of the device buffer.
>
> No, because Perl will just ask for more. You'll get mangled UTF-8 if you
> happen to request a number of bytes that ends you mid-character (which is
> what this ticket is about).
>
> (If we were talking about sysread instead of readline or read, then yes, it
> could happen then. Unlike read and readline, sysread returns as soon as bytes
> are available.)
>
> And here's an example where one character is read using two reads:
>
> $ perl -C -e'print "a"x8191, chr(0x2660)' > x
>
> $ ls -l x
> -rw------- 1 ikegami group 8194 Mar 2 23:26 x
>
> $ perl -le'use open ":std", ":utf8"; $/=\8194; $_=<>; print $_ eq
> ("a"x8191).chr(0x2660) ?1:0;' < x
> 1
>
> strace:
>
> read(0, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 8192) = 8192
> read(0, "\231\240", 8192) = 2
>
Thanks for clarifying my muddy thinking, Eric. I was neglecting the effects of
the buffering layer because it's not used for record mode on VMS and I had
erroneously convinced myself that it's not used elsewhere either, but it is.
As long as the perlio buffer is larger than the requested record size, it
looks like it will insulate you from anything external to Perl giving you less
than the requested size.

So does your second example demonstrate that if you request something larger
than the perlio buffer, then you can get caught mid-character on buffer
boundaries as well as record boundaries? And does that first 8192-byte chunk
get loaded into an SV that is then invalid if its UTF-8 flag is on?

________________________________________
Craig A. Berry
mailto:craigbe...@mac.com

"... getting out of a sonnet is much more difficult than getting in."
                 Brad Leithauser
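
P.S. For anyone who wants to poke at the boundary case without a file or strace, here is a minimal sketch that splits the same 8194 bytes at offset 8192 in memory and decodes them chunk by chunk. It assumes the behavior the Encode documentation describes for FB_QUIET (decode() stops at an incomplete sequence and leaves the undecoded bytes in the source buffer). The variable names are mine, and it says nothing about what perlio itself does with the first chunk, which is the open question above.

```perl
use strict;
use warnings;
use Encode qw(encode decode FB_QUIET);

# Same data as the quoted example: 8191 ASCII bytes plus U+2660,
# which takes 3 bytes in UTF-8, for 8194 bytes in total.
my $expected = ('a' x 8191) . chr(0x2660);
my $bytes    = encode('UTF-8', $expected);

# Split where the two read() calls in the strace output end.
my @chunks = (substr($bytes, 0, 8192), substr($bytes, 8192));

my ($pending, $text) = ('', '');
my @leftover;    # number of bytes held back after each chunk
for my $chunk (@chunks) {
    $pending .= $chunk;
    # With FB_QUIET, decode() consumes the complete characters and
    # leaves any trailing partial sequence in $pending.
    $text .= decode('UTF-8', $pending, FB_QUIET);
    push @leftover, length $pending;
}

printf "bytes held back after each chunk: %s\n", join ', ', @leftover;
print $text eq $expected ? "round trip ok\n" : "round trip failed\n";
```

The point of carrying $pending across iterations is that no string with a dangling lead byte is ever handed back as decoded text; whether the buffering layer does the equivalent is what I'm asking.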