On Thu, Mar 01, 2012 at 05:17:27PM -0600, Craig A. Berry wrote:
> 
> On Mar 1, 2012, at 12:30 PM, Nicholas Clark wrote:
> 
> > On Thu, Mar 01, 2012 at 10:13:15AM -0800, Karl Williamson via RT wrote:
> >> I can't find my proposal in the record of this ticket, nor anyone
> >> responding to it.  The documentation says that $/ gives the *maximum*
> >> record size.  So why not return as many whole characters as will fit in
> >> $/ bytes?
> 
> I think that would require making :utf8 into its own layer with its own 
> buffer, which has been discussed over in [perl #100058].  
> 
> > Specifically, the code is emulated on "everything else", but intended to
> > do something real and useful on VMS:
> > 
> > #ifdef VMS
> >    /* VMS wants read instead of fread, because fread doesn't respect */
> >    /* RMS record boundaries. This is not necessarily a good thing to be */
> >    /* doing, but we've got no other real choice - except avoid stdio
> >       as implementation - perhaps write a :vms layer ?
> >    */
> >    fd = PerlIO_fileno(fp);
> >    if (fd != -1) {
> >     bytesread = PerlLIO_read(fd, buffer, recsize);
> >    }
> >    else /* in-memory file from PerlIO::Scalar */
> > #endif
> 
> I don't think this code is as meaningful as it used to be since unix I/O is 
> the bottom layer for PerlIO now.  Which means that PerlLIO_read and 
> PerlIO_read (differing only by the "L") are really the same thing, i.e.,  
> both boil down to read().  I guess we can't simplify this code until and 
> unless using stdio as the bottom layer is truly deprecated and expunged.

I don't think you're correct on that one. read() is not stdio. It's (at least
on Unix) a syscall. fread() is stdio, and loops on read() until it gets enough
octets. So the code for VMS (if I'm following it correctly) is still grabbing
a single record.
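
To make the distinction concrete, here is a minimal sketch in C. It is not
the perl source; read_once() and read_fully() are names made up for this
illustration. The first issues a single read() and returns whatever that one
call delivers (one record, one datagram, or a partial pipe buffer). The second
does what fread() effectively does, looping on read() until it has the
requested number of octets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: one syscall, so the caller sees exactly what the
 * device handed over (possibly fewer than `want` octets). */
static ssize_t read_once(int fd, char *buf, size_t want)
{
    assert(buf != NULL);
    return read(fd, buf, want);
}

/* Hypothetical helper: fread()-like behaviour. Keeps calling read() until
 * `want` octets have arrived, or EOF, or an error. Record boundaries are
 * lost because successive reads are glued together. */
static ssize_t read_fully(int fd, char *buf, size_t want)
{
    size_t got = 0;

    assert(buf != NULL);
    while (got < want) {
        ssize_t n = read(fd, buf + got, want - got);
        if (n < 0)
            return -1;      /* error: give up */
        if (n == 0)
            break;          /* EOF: return what we have */
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

On a datagram socket pair carrying a 3 octet message followed by a 5 octet
message, asking read_once() for 8 octets should return 3, whereas read_fully()
asked for 8 returns all 8 with the boundary between the two messages erased.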


On Fri, Mar 02, 2012 at 08:11:10AM -0600, Craig A. Berry wrote:
> 
> On Mar 2, 2012, at 3:07 AM, Eric Brine wrote:
> 
> > On Thu, Mar 1, 2012 at 6:17 PM, Craig A. Berry <craigbe...@mac.com> wrote:
> > What happens on Unix when you have a pipe buffer that is 8192 bytes and you 
> > set $/ to 8193 and read a record containing UTF-8 data through the pipe?

You mean set $/ to \8193

> > Perl requests 8K (formerly 4K) chunks until it has received enough. It 
> > requests 8K even if it only needs 1 byte.
> 
> I think you're thinking of the PerlIO buffer that I increased from 4K to the 
> larger of 8K and BUFSIZ in 5.14, and which only applies to the perlio layer.  
> But S_sv_gets_read_record calls PerlIO_read, which just retrieves the base 
> layer (formerly stdio, currently unix) and calls its Read method, which is 
> just read().  So there is no buffering under Perl's control.
> 
> I was thinking of a situation where something external to Perl limits how 
> much data you can get in one read and thus gives you less than the full 
> amount requested by $/.  I'm pretty sure you'll get mangled UTF-8 if you 
> happen to be mid-character when you hit the end of the device buffer.  To 
> test this, you'd need to know something about the internals of your system's 
> pipe implementation (or other device with a fixed buffer).

I don't think that discussing this in terms of what non-VMS does with $/ set
to a reference to an integer is necessarily that useful. I think it's really
only been added as "this feature can't be VMS only" (all added in commit

http://perl5.git.perl.org/perl.git/commitdiff/5b2b9c687790241e85aa7b76

)

That is, the whole bug report is about "what *should* this do?" because what
it currently does is badly broken.

The reason I'm specifically asking "what does a VMS programmer *want*?" is
because the fixed size records feature was put in for VMS, with non-VMS an
afterthought. So

1) is there a sane VMS native interpretation of "UTF-8 coming from a fixed
   record file" ?

and only when that's answered is there

2) what do we fake on other platforms?

[and I think it's also premature to consider whether this needs :utf8 as a
real layer to implement. I'd like to get a feeling for what the Perl space
behaviour, if any, should be]


The possibly useful analogy is "what happens with a :utf8 layer on sysread?"
which is, well, summed up with:

            goto more_bytes;

i.e. it's actually a different behaviour: it makes multiple syscalls. Blech.
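
For what it's worth, the two candidate behaviours (trim the record back to
whole characters, or go and fetch more bytes to finish the last character)
both start from the same question: does the buffer end mid-character? A rough
sketch of that check in C follows. It is not taken from the perl source, the
function names are invented for illustration, and it assumes otherwise
well-formed UTF-8 (no validation of overlongs, surrogates and so on).

```c
#include <stddef.h>

/* Total length of a UTF-8 sequence given its lead byte,
 * or 0 if the byte is a continuation byte or not a valid lead. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)           return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Number of leading octets of buf that form only whole characters.
 * If the tail is malformed rather than merely truncated, nothing is
 * trimmed and the caller gets len back. */
static size_t utf8_whole_prefix(const unsigned char *buf, size_t len)
{
    size_t i = len;
    size_t back = 0;

    /* Step back over at most 3 trailing continuation bytes. */
    while (i > 0 && back < 3 && (buf[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0)
        return len;                     /* no lead byte in range */

    size_t need = utf8_seq_len(buf[i - 1]);
    size_t have = len - (i - 1);
    if (need == 0 || have >= need)
        return len;                     /* complete, or malformed */
    return i - 1;                       /* cut before the partial char */
}

/* Octets still required to complete the trailing character (0 if none).
 * This is the "read more bytes" side of the choice. */
static size_t utf8_missing_bytes(const unsigned char *buf, size_t len)
{
    size_t whole = utf8_whole_prefix(buf, len);

    if (whole == len)
        return 0;
    return utf8_seq_len(buf[whole]) - (len - whole);
}
```

A "maximum record size" interpretation would return utf8_whole_prefix()
octets and keep the remainder for the next record; a sysread-style
interpretation would issue another read for utf8_missing_bytes() octets.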

[and, thinking about it now, about 14 years later, possibly that non-VMS
code in sv_gets() should have been using read(), not fread(), so that it
would be useful on a datagram socket. But that's a bit late to fix]

Nicholas Clark
