On Mar 1, 2012, at 12:30 PM, Nicholas Clark wrote:
> On Thu, Mar 01, 2012 at 10:13:15AM -0800, Karl Williamson via RT wrote:
>> I can't find my proposal in the record of this ticket, nor anyone
>> responding to it. The documentation says that $/ gives the *maximum*
>> record size. So why not return as many whole characters as will fit in
>> $/ bytes?
I think that would require making :utf8 into its own layer with its own buffer,
which has been discussed over in [perl #100058].
> Specifically, the code is emulated on "everything else", but intended to
> do something real and useful on VMS:
>
> #ifdef VMS
> /* VMS wants read instead of fread, because fread doesn't respect */
> /* RMS record boundaries. This is not necessarily a good thing to be */
> /* doing, but we've got no other real choice - except avoid stdio
> as implementation - perhaps write a :vms layer ?
> */
> fd = PerlIO_fileno(fp);
> if (fd != -1) {
> bytesread = PerlLIO_read(fd, buffer, recsize);
> }
> else /* in-memory file from PerlIO::Scalar */
> #endif
I don't think this code is as meaningful as it used to be since unix I/O is the
bottom layer for PerlIO now. Which means that PerlLIO_read and PerlIO_read
(differing only by the "L") are really the same thing, i.e., both boil down to
read(). I guess we can't simplify this code until and unless using stdio as
the bottom layer is truly deprecated and expunged.
> perlvar.pod says:
>
> On VMS, record reads are done with the equivalent of C<sysread>,
> so it's best not to mix record and non-record reads on the same
> file. (This is unlikely to be a problem, because any file you'd
> want to read in record mode is probably unusable in line mode.)
> Non-VMS systems do normal I/O, so it's safe to mix record and
> non-record reads of a file.
>
>> I think we need to do something on this for 5.16. At the minimum, we
>> could emit a warning when a variable length encoded file is opened under
>> a fixed-length $/.
>>
>> If even that isn't acceptable, we could add this to the
>> intend-to-deprecate section in perldelta.
>
> So I'd like to know, if a programmer on VMS sets $/ to read records, but on
> a file handle marked with :utf8, what do they want?
>
> (and if the answer is "their head examining", that's actually useful, as it
> means that the least insane thing to implement is what we get)
Yes, it's pretty daft to expect whole, varying-width characters to stay whole
when you can only get a fixed-width chunk at a time and the chunks are measured
in bytes. So far the only difference for VMS that I've thought of derives from
this note in the CRTL help entry on read():
    The read function does not span record boundaries in a
    record file and, therefore, reads at most one record. A
    separate read must be done for each record.
So that means that if you set $/ to N on a record-oriented file and N is larger
than the record size, you won't get as much as you asked for and you may chop
varying-width characters in pieces around the record boundaries. Trying to
overload the meaning of $/ so that N means number of characters rather than
number of bytes obviously could not make it give you more bytes than the record
holds.
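
To make the chopping concrete, here is a minimal sketch in C of the "as many whole characters as will fit" idea from earlier in the thread. It is not code from perl: the names whole_char_prefix and utf8_seq_len are hypothetical, and the sketch assumes the chunk starts on a character boundary and holds well-formed UTF-8. It only shows that a byte-sized chunk can end mid-character and that the returned count can never exceed the bytes actually delivered.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte length of a UTF-8 sequence, judged from its
 * lead byte. Assumes well-formed input. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)           return 1;  /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 1;  /* stray continuation or invalid byte: step over one byte */
}

/* Hypothetical helper: given `len` bytes that start on a character
 * boundary, return how many leading bytes form whole characters.
 * Any bytes past that point are the head of a character that the
 * chunk boundary cut in two. */
size_t whole_char_prefix(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);

    while (i < len) {
        size_t need = utf8_seq_len(buf[i]);
        if (need > len - i)
            break;          /* this character continues past the chunk */
        i += need;
    }
    return i;
}
```

Under those assumptions, a chunk shorter than the first character yields a count of zero, so a caller would still need to decide what to do when the record or buffer is too small to hold even one character.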
While it might be less of a corner case and more of a mainstream thing to do on
VMS, I can't think of any way that this is substantively different from what
would happen on any OS when reading through a pipe or a socket or a PerlIO
layer or /dev/mumble that has a fixed-size buffer measured in bytes. What
happens on Unix when you have a pipe buffer that is 8192 bytes and you set $/
to 8193 and read a record containing UTF-8 data through the pipe?
________________________________________
Craig A. Berry
"... getting out of a sonnet is much more
difficult than getting in."
Brad Leithauser