On Mar 3, 2012, at 1:29 AM, Eric Brine wrote:

> On Fri, Mar 2, 2012 at 2:03 PM, Eric Brine <ikeg...@adaelis.com> wrote:
> On Fri, Mar 2, 2012 at 9:11 AM, Craig A. Berry <craigbe...@mac.com> wrote:
> I was thinking of a situation where something external to Perl limits how 
> much data you can get in one read and thus gives you less than the full 
> amount requested by $/.
> 
> That's exactly the situation I described. Here, let me provide the strace 
> output.
> 
> $ strace perl -e'$/=\40; <>;' < /dev/random
> ...
> read(0, "\5|\200\"\360T0*\325\223\276\322\20S\244\16\341", 8192) = 17
> read(0, "\370\356 \2652\236\27>", 8192) = 8
> read(0, "\0\270\ve\332\223\225\312", 8192) = 8
> read(0, "\316\366\272\311\215.\204\361", 8192) = 8
> ...
> 
> I'm pretty sure you'll get mangled UTF-8 if you happen to be mid-character 
> when you hit the end of the device buffer.
> 
> No, because Perl will just ask for more. You'll get mangled UTF-8 if you 
> happen to request a number of bytes that ends you mid-character (which is 
> what this ticket is about).
> 
> (If we were talking about sysread instead of readline or read, then yes, it 
> could happen then. Unlike read and readline, sysread returns as soon as bytes 
> are available.)
> 
> And here's an example where one character is read using two reads:
> 
> $ perl -C -e'print "a"x8191, chr(0x2660)' > x
> 
> $ ls -l x
> -rw------- 1 ikegami group 8194 Mar  2 23:26 x
> 
> $ perl -le'use open ":std", ":utf8"; $/=\8194; $_=<>; print $_ eq 
> ("a"x8191).chr(0x2660) ?1:0;' < x
> 1
> 
> strace:
> 
> read(0, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 8192) = 8192
> read(0, "\231\240", 8192)               = 2
> 

Thanks for clarifying my muddy thinking, Eric.  I was neglecting the effects of 
the buffering layer because it's not used for record mode on VMS and I had 
erroneously convinced myself that it's not used elsewhere either, but it is.
As long as the perlio buffer is larger than the requested record size, it looks 
like it will insulate you from anything external to Perl giving you less than 
the requested size.

So does your second example demonstrate that if you request something larger 
than the perlio buffer, then you can get caught mid-character on buffer 
boundaries as well as record boundaries?  And does that first 8192-byte chunk 
get loaded into an SV that is then invalid if its UTF-8 flag is on?

________________________________________
Craig A. Berry
mailto:craigbe...@mac.com

"... getting out of a sonnet is much more
 difficult than getting in."
                 Brad Leithauser
