On Mar 1, 2012, at 12:30 PM, Nicholas Clark wrote:

> On Thu, Mar 01, 2012 at 10:13:15AM -0800, Karl Williamson via RT wrote:
>> I can't find my proposal in the record of this ticket, nor anyone
>> responding to it. The documentation says that $/ gives the *maximum*
>> record size. So why not return as many whole characters as will fit in
>> $/ bytes?
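To make Karl's proposal concrete, here is a minimal C sketch of what "as many whole characters as will fit in $/ bytes" could mean for a buffer that has already been read. It is not perl code; the function name utf8_whole_prefix is hypothetical. It walks the UTF-8 lead bytes and stops before any character that would straddle the byte limit. It assumes well-formed UTF-8 and says nothing about where the leftover bytes would be kept for the next read, which is exactly the buffering question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of perl: given a byte buffer holding
 * UTF-8 text and a byte limit (the fixed-length $/ value), return how
 * many leading bytes can be returned without splitting a character.
 * Assumes well-formed UTF-8; stray continuation bytes or invalid lead
 * bytes are simply treated as 1-byte characters. */
size_t utf8_whole_prefix(const unsigned char *buf, size_t len, size_t limit)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);
    if (limit > len)
        limit = len;               /* never look past the data we have */

    while (i < limit) {
        unsigned char c = buf[i];
        size_t n;                  /* byte length of the character at i */

        if (c < 0x80)
            n = 1;                 /* ASCII */
        else if ((c & 0xE0) == 0xC0)
            n = 2;                 /* 110xxxxx lead byte */
        else if ((c & 0xF0) == 0xE0)
            n = 3;                 /* 1110xxxx lead byte */
        else if ((c & 0xF8) == 0xF0)
            n = 4;                 /* 11110xxx lead byte */
        else
            n = 1;                 /* continuation or invalid byte */

        if (i + n > limit)
            break;                 /* next character would straddle the limit */
        i += n;
    }
    return i;
}
```

For example, with the 6-byte string "caf\xC3\xA9!" ("café!") and a limit of 4, it returns 3, because the 2-byte \xC3\xA9 sequence would straddle the limit; with a limit of 5 it returns 5. The bytes it holds back would still have to live somewhere until the next read.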
I think that would require making :utf8 into its own layer with its own buffer, which has been discussed over in [perl #100058].

> Specifically, the code is emulated on "everything else", but intended to
> do something real and useful on VMS:
>
> #ifdef VMS
>     /* VMS wants read instead of fread, because fread doesn't respect */
>     /* RMS record boundaries. This is not necessarily a good thing to be */
>     /* doing, but we've got no other real choice - except avoid stdio
>        as implementation - perhaps write a :vms layer ?
>      */
>     fd = PerlIO_fileno(fp);
>     if (fd != -1) {
>         bytesread = PerlLIO_read(fd, buffer, recsize);
>     }
>     else /* in-memory file from PerlIO::Scalar */
> #endif

I don't think this code is as meaningful as it used to be, since unix I/O is the bottom layer for PerlIO now, which means that PerlLIO_read and PerlIO_read (differing only by the "L") are really the same thing, i.e., both boil down to read(). I guess we can't simplify this code until and unless using stdio as the bottom layer is truly deprecated and expunged.

> perlvar.pod says:
>
>     On VMS, record reads are done with the equivalent of C<sysread>,
>     so it's best not to mix record and non-record reads on the same
>     file. (This is unlikely to be a problem, because any file you'd
>     want to read in record mode is probably unusable in line mode.)
>     Non-VMS systems do normal I/O, so it's safe to mix record and
>     non-record reads of a file.
>
>> I think we need to do something on this for 5.16. At the minimum, we
>> could emit a warning when a variable length encoded file is opened under
>> a fixed-length $/.
>>
>> If even that isn't acceptable, we could add this to the
>> intend-to-deprecate section in perldelta.
>
> So I'd like to know, if a programmer on VMS sets $/ to read records, but on
> a file handle marked with :utf8, what do they want?
>
> (and if the answer is "their head examining", that's actually useful, as it
> means that the least insane thing to implement is what we get)

Yes, it's pretty daft to expect whole, varying-width characters to stay whole when you can only get a fixed-width chunk at a time and the chunks are measured in bytes.

So far the only difference for VMS that I've thought of derives from this note in the CRTL help entry on read():

    The read function does not span record boundaries in a record file
    and, therefore, reads at most one record. A separate read must be
    done for each record.

So that means that if you set $/ to N on a record-oriented file and N is larger than the record size, you won't get as much as you asked for, and you may chop varying-width characters into pieces at the record boundaries. Trying to overload the meaning of $/ so that N means a number of characters rather than a number of bytes obviously could not make it give you more bytes than the record holds.

While it might be less of a corner case and more of a mainstream thing to do on VMS, I can't think of any way that this is substantively different from what would happen on any OS when reading through a pipe or a socket or a PerlIO layer or /dev/mumble that has a fixed-size buffer measured in bytes. What happens on Unix when you have a pipe buffer that is 8192 bytes and you set $/ to 8193 and read a record containing UTF-8 data through the pipe?

________________________________________
Craig A. Berry
mailto:craigbe...@mac.com

"... getting out of a sonnet is much more difficult than getting in."
                                                 Brad Leithauser