On Thu, Mar 01, 2012 at 05:17:27PM -0600, Craig A. Berry wrote:
>
> On Mar 1, 2012, at 12:30 PM, Nicholas Clark wrote:
>
> > On Thu, Mar 01, 2012 at 10:13:15AM -0800, Karl Williamson via RT wrote:
> >> I can't find my proposal in the record of this ticket, nor anyone
> >> responding to it. The documentation says that $/ gives the *maximum*
> >> record size. So why not return as many whole characters as will fit in
> >> $/ bytes?
>
> I think that would require making :utf8 into its own layer with its own
> buffer, which has been discussed over in [perl #100058].
>
> > Specifically, the code is emulated on "everything else", but intended to
> > do something real and useful on VMS:
> >
> > #ifdef VMS
> >     /* VMS wants read instead of fread, because fread doesn't respect */
> >     /* RMS record boundaries. This is not necessarily a good thing to be */
> >     /* doing, but we've got no other real choice - except avoid stdio
> >        as implementation - perhaps write a :vms layer ?
> >     */
> >     fd = PerlIO_fileno(fp);
> >     if (fd != -1) {
> >         bytesread = PerlLIO_read(fd, buffer, recsize);
> >     }
> >     else /* in-memory file from PerlIO::Scalar */
> > #endif
>
> I don't think this code is as meaningful as it used to be since unix I/O is
> the bottom layer for PerlIO now. Which means that PerlLIO_read and
> PerlIO_read (differing only by the "L") are really the same thing, i.e.,
> both boil down to read(). I guess we can't simplify this code until and
> unless using stdio as the bottom layer is truly deprecated and expunged.

I don't think you're correct on that one. read() is not stdio. It's (at least
on Unix) a syscall. fread() is stdio, and loops on read() until it gets enough
octets. So the code for VMS (if I'm following it correctly) is still grabbing
a single record.

On Fri, Mar 02, 2012 at 08:11:10AM -0600, Craig A. Berry wrote:
>
> On Mar 2, 2012, at 3:07 AM, Eric Brine wrote:
>
> > On Thu, Mar 1, 2012 at 6:17 PM, Craig A. Berry <craigbe...@mac.com> wrote:
> > What happens on Unix when you have a pipe buffer that is 8192 bytes and you
> > set $/ to 8193 and read a record containing UTF-8 data through the pipe?

You mean set $/ to \8193

> > Perl requests 8K (formerly 4K) chunks until it has received enough. It
> > requests 8K even if it only needs 1 byte.
>
> I think you're thinking of the PerlIO buffer that I increased from 4K to the
> larger of 8K and BUFSIZ in 5.14, and which only applies to the perlio layer.
> But S_sv_gets_read_record calls PerlIO_read, which just retrieves the base
> layer (formerly stdio, currently unix) and calls its Read method, which is
> just read(). So there is no buffering under Perl's control.
>
> I was thinking of a situation where something external to Perl limits how
> much data you can get in one read and thus gives you less than the full
> amount requested by $/. I'm pretty sure you'll get mangled UTF-8 if you
> happen to be mid-character when you hit the end of the device buffer. To
> test this, you'd need to know something about the internals of your system's
> pipe implementation (or other device with a fixed buffer).

I don't think that discussing this in terms of what non-VMS does with $/ set
to a reference to an integer is necessarily that useful. I think it's really
only been added as "this feature can't be VMS only" (all added in commit
http://perl5.git.perl.org/perl.git/commitdiff/5b2b9c687790241e85aa7b76 )

In that, the whole bug report is about "what *should* this do?" because what
it currently does is badly broken.

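
To make the "as many whole characters as will fit" idea from the quoted
proposal concrete, here is a minimal C sketch. It is not code from the perl
source: the helper name whole_utf8_prefix is hypothetical, and it assumes the
bytes are well-formed UTF-8 apart from a possibly cut-off final character. It
does no validation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of perl): given the `len` bytes that one
 * read returned, report how many leading bytes form complete UTF-8
 * characters. Assumes well-formed UTF-8 except for a truncated tail. */
size_t whole_utf8_prefix(const unsigned char *buf, size_t len)
{
    size_t i = len;
    size_t lead_pos, need;
    unsigned char lead;

    assert(buf != NULL || len == 0);

    /* Walk back over trailing continuation bytes (10xxxxxx). */
    while (i > 0 && (buf[i - 1] & 0xC0) == 0x80)
        i--;

    /* No start byte at all: empty or malformed input, leave it untouched. */
    if (i == 0)
        return len;

    lead_pos = i - 1;
    lead = buf[lead_pos];

    /* How many bytes does the final character claim to need? */
    if (lead < 0x80)
        need = 1;
    else if ((lead & 0xE0) == 0xC0)
        need = 2;
    else if ((lead & 0xF0) == 0xE0)
        need = 3;
    else if ((lead & 0xF8) == 0xF0)
        need = 4;
    else
        need = 1;   /* invalid start byte: treat as a single byte */

    /* Keep the final character only if all of its bytes are present. */
    return (len - lead_pos >= need) ? len : lead_pos;
}
```

Called on the bytes of a short read that ends with "a" followed by the first
byte of a two-byte character, it reports 1, so the dangling byte could be held
back for the next read rather than handed to the caller as mangled UTF-8.
Where the held-back byte would live is exactly the buffering question raised
above, and this sketch does not answer it.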

The reason I'm specifically asking "what does a VMS programmer *want*?" is
because the fixed size records feature was put in for VMS, with non-VMS an
afterthought. So

1) is there a sane VMS native interpretation of "UTF-8 coming from a fixed
   record file" ?

and only when that's answered is there

2) what do we fake on other platforms?

[and I think it's also premature to consider whether this needs :utf8 as a
real layer to implement. I'd like to get a feeling for what the Perl space
behaviour, if any, should be]

The possibly useful analogy is "what happens with a :utf8 layer on sysread?"
which is, well, summed up with:

    goto more_bytes;

ie - it's actually a different behaviour. It makes multiple syscalls. Blech.

[and, thinking about it now, about 14 years later, possibly that non-VMS code
in sv_gets() should have been using read(), not fread(), so that it would be
useful on a datagram socket. But that's a bit late to fix]

Nicholas Clark
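
As an illustration of the read() versus fread() distinction, and of why it
matters on a datagram socket, here is a small sketch. It assumes a POSIX
system with AF_UNIX datagram socketpairs. read_until_full is a hypothetical
stand-in that mimics the way fread() keeps calling read() until it has enough
octets; it is not stdio's actual code, and datagram_demo is likewise made up
for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for fread()'s behaviour: keep calling read() until
 * `want` bytes have arrived. Returns the byte count, or -1 on error. */
ssize_t read_until_full(int fd, char *buf, size_t want)
{
    size_t got = 0;

    assert(buf != NULL);
    while (got < want) {
        ssize_t n = read(fd, buf + got, want - got);
        if (n < 0)
            return -1;
        if (n == 0)     /* EOF (or an empty datagram): stop early */
            break;
        got += (size_t)n;
    }
    return (ssize_t)got;
}

/* Send two "records" ("abc" and "defgh") as separate datagrams, twice.
 * First pass: two plain read() calls, each asking for 8 bytes.
 * Second pass: one looping read asking for 8 bytes.
 * Returns 0 on success, -1 if setting up or sending fails. */
int datagram_demo(ssize_t *first, ssize_t *second, ssize_t *looped)
{
    int sv[2];
    char buf[8];
    int rc = -1;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
        return -1;

    if (write(sv[0], "abc", 3) == 3 && write(sv[0], "defgh", 5) == 5) {
        *first = read(sv[1], buf, sizeof buf);    /* one datagram */
        *second = read(sv[1], buf, sizeof buf);   /* the next datagram */
        if (write(sv[0], "abc", 3) == 3 && write(sv[0], "defgh", 5) == 5) {
            *looped = read_until_full(sv[1], buf, sizeof buf);
            rc = 0;
        }
    }
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

Where this runs as intended, the two plain read() calls report 3 and 5, one
record each, while the looping read reports 8, having merged both records into
a single result. That merging is the loss of record boundaries that the VMS
comment about fread complains of.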