On Mar 1, 2012, at 12:30 PM, Nicholas Clark wrote:

> On Thu, Mar 01, 2012 at 10:13:15AM -0800, Karl Williamson via RT wrote:
>> I can't find my proposal in the record of this ticket, nor anyone
>> responding to it.  The documentation says that $/ gives the *maximum*
>> record size.  So why not return as many whole characters as will fit in
>> $/ bytes?

I think that would require making :utf8 into its own layer with its own buffer, 
which has been discussed over in [perl #100058].  

> Specifically, the code is emulated on "everything else", but intended to
> do something real and useful on VMS:
> 
> #ifdef VMS
>    /* VMS wants read instead of fread, because fread doesn't respect */
>    /* RMS record boundaries. This is not necessarily a good thing to be */
>    /* doing, but we've got no other real choice - except avoid stdio
>       as implementation - perhaps write a :vms layer ?
>    */
>    fd = PerlIO_fileno(fp);
>    if (fd != -1) {
>       bytesread = PerlLIO_read(fd, buffer, recsize);
>    }
>    else /* in-memory file from PerlIO::Scalar */
> #endif

I don't think this code is as meaningful as it used to be, since unix I/O is the 
bottom layer for PerlIO now.  That means PerlLIO_read and PerlIO_read (differing 
only by the "L") are really the same thing, i.e., both boil down to read().  I 
guess we can't simplify this code unless and until using stdio as the bottom 
layer is truly deprecated and expunged.

> perlvar.pod says:
> 
>    On VMS, record reads are done with the equivalent of C<sysread>,
>    so it's best not to mix record and non-record reads on the same
>    file.  (This is unlikely to be a problem, because any file you'd
>    want to read in record mode is probably unusable in line mode.)
>    Non-VMS systems do normal I/O, so it's safe to mix record and
>    non-record reads of a file.
> 
>> I think we need to do something on this for 5.16.  At the minimum, we
>> could emit a warning when a variable length encoded file is opened under
>> a fixed-length $/.
>> 
>> If even that isn't acceptable, we could add this to the
>> intend-to-deprecate section in perldelta.
> 
> So I'd like to know, if a programmer on VMS sets $/ to read records, but on
> a file handle marked with :utf8, what do they want?
> 
> (and if the answer is "their head examining", that's actually useful, as it
> means that the least insane thing to implement is what we get)

Yes, it's pretty daft to expect whole, varying-width characters to stay whole 
when you can only get a fixed-width chunk at a time and the chunks are measured 
in bytes.  So far the only difference for VMS that I've thought of derives from 
this note in the CRTL help entry on read():

        The read function does not span record boundaries in a
        record file and, therefore, reads at most one record. A
        separate read must be done for each record.

So that means that if you set $/ to N on a record-oriented file and N is larger 
than the record size, you won't get as much as you asked for and you may chop 
varying-width characters in pieces around the record boundaries.  Overloading 
$/ so that N means the number of characters rather than the number of bytes 
obviously could not make it give you more bytes than the record holds.
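
To make the chopping concrete, here is a minimal sketch in C of the "as many 
whole characters as will fit" idea quoted above.  The helper name 
utf8_whole_prefix is hypothetical and is not anything in the Perl source.  It 
assumes the data is otherwise well-formed UTF-8 and only trims a multi-byte 
sequence that was cut off at the end of the chunk:

```c
#include <stddef.h>

/* Hypothetical helper (not from the Perl source): given len bytes returned
 * by a fixed-size read, report how many of them make up whole UTF-8
 * characters.  Assumes otherwise well-formed UTF-8; it only detects a
 * multi-byte sequence that was cut off at the end of the chunk. */
static size_t utf8_whole_prefix(const unsigned char *buf, size_t len)
{
    size_t start, need, have;
    unsigned char lead;

    if (len == 0)
        return 0;

    /* Back up over continuation bytes (10xxxxxx) to the start of the
     * last character, looking at no more than four bytes. */
    start = len - 1;
    while (start > 0 && len - start < 4 && (buf[start] & 0xC0) == 0x80)
        start--;

    /* The lead byte says how long the last character should be. */
    lead = buf[start];
    if (lead < 0x80)                need = 1;
    else if ((lead & 0xE0) == 0xC0) need = 2;
    else if ((lead & 0xF0) == 0xE0) need = 3;
    else if ((lead & 0xF8) == 0xF0) need = 4;
    else                            need = 1;  /* malformed: leave as-is */

    /* If the last character is incomplete, stop just before it. */
    have = len - start;
    return have < need ? start : len;
}
```

For a chunk consisting of "a" followed by the first two bytes of a three-byte 
character, this would report 1 usable byte.  The two leftover bytes would still 
have to be held somewhere until the next read, which is where the need for a 
separate buffer comes in.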

While it might be less of a corner case and more of a mainstream thing to do on 
VMS, I can't think of any way that this is substantively different from what 
would happen on any OS when reading through a pipe or a socket or a PerlIO 
layer or /dev/mumble that has a fixed-sized buffer measured in bytes.  What 
happens on Unix when you have a pipe buffer that is 8192 bytes and you set $/ 
to 8193 and read a record containing UTF-8 data through the pipe?

________________________________________
Craig A. Berry
mailto:craigbe...@mac.com

"... getting out of a sonnet is much more
 difficult than getting in."
                 Brad Leithauser
