On Thu, Mar 01, 2012 at 05:17:27PM -0600, Craig A. Berry wrote:
>
> On Mar 1, 2012, at 12:30 PM, Nicholas Clark wrote:
>
> > On Thu, Mar 01, 2012 at 10:13:15AM -0800, Karl Williamson via RT wrote:
> >> I can't find my proposal in the record of this ticket, nor anyone
> >> responding to it. The documentation says that $/ gives the *maximum*
> >> record size. So why not return as many whole characters as will fit in
> >> $/ bytes?
>
> I think that would require making :utf8 into its own layer with its own
> buffer, which has been discussed over in [perl #100058].
>
> > Specifically, the code is emulated on "everything else", but intended to
> > do something real and useful on VMS:
> >
> > #ifdef VMS
> >     /* VMS wants read instead of fread, because fread doesn't respect */
> >     /* RMS record boundaries. This is not necessarily a good thing to be */
> >     /* doing, but we've got no other real choice - except avoid stdio
> >        as implementation - perhaps write a :vms layer ?
> >     */
> >     fd = PerlIO_fileno(fp);
> >     if (fd != -1) {
> >         bytesread = PerlLIO_read(fd, buffer, recsize);
> >     }
> >     else /* in-memory file from PerlIO::Scalar */
> > #endif
>
> I don't think this code is as meaningful as it used to be since unix I/O is
> the bottom layer for PerlIO now. Which means that PerlLIO_read and
> PerlIO_read (differing only by the "L") are really the same thing, i.e.,
> both boil down to read(). I guess we can't simplify this code until and
> unless using stdio as the bottom layer is truly deprecated and expunged.
I don't think you're correct on that one. read() is not stdio. It's (at least
on Unix) a syscall. fread() is stdio, and loops on read() until it gets enough
octets. So the code for VMS (if I'm following it correctly) is still grabbing
a single record.
On Fri, Mar 02, 2012 at 08:11:10AM -0600, Craig A. Berry wrote:
>
> On Mar 2, 2012, at 3:07 AM, Eric Brine wrote:
>
> > On Thu, Mar 1, 2012 at 6:17 PM, Craig A. Berry wrote:
> > What happens on Unix when you have a pipe buffer that is 8192 bytes and you
> > set $/ to 8193 and read a record containing UTF-8 data through the pipe?
You mean set $/ to \8193
> > Perl requests 8K (formerly 4K) chunks until it has received enough. It
> > requests 8K even if it only needs 1 byte.
>
> I think you're thinking of the PerlIO buffer that I increased from 4K to the
> larger of 8K and BUFSIZ in 5.14, and which only applies to the perlio layer.
> But S_sv_gets_read_record calls PerlIO_read, which just retrieves the base
> layer (formerly stdio, currently unix) and calls its Read method, which is
> just read(). So there is no buffering under Perl's control.
>
> I was thinking of a situation where something external to Perl limits how
> much data you can get in one read and thus gives you less than the full
> amount requested by $/. I'm pretty sure you'll get mangled UTF-8 if you
> happen to be mid-character when you hit the end of the device buffer. To
> test this, you'd need to know something about the internals of your system's
> pipe implementation (or other device with a fixed buffer).
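
To make the "mid-character" case concrete, here is a hedged sketch of how a
short read could be checked for a split trailing character, and how many bytes
would survive under the "as many whole characters as will fit" idea quoted
earlier in the thread. This is not Perl's implementation; utf8_seq_len and
utf8_whole_prefix are hypothetical names, and only the tail of the buffer is
inspected (the rest is assumed to be well-formed UTF-8).

```c
#include <stddef.h>

/* Length of a UTF-8 sequence implied by its lead byte; 0 if the byte cannot
 * start a sequence (continuation byte or invalid lead). */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Number of leading bytes of buf[0..len) that form whole characters.
 * ASSUMPTION: everything before the final character is well-formed. */
size_t utf8_whole_prefix(const unsigned char *buf, size_t len)
{
    size_t i = len;

    /* Step back over at most 3 trailing continuation bytes (10xxxxxx). */
    while (i > 0 && len - i < 3 && (buf[i - 1] & 0xC0) == 0x80)
        i--;
    if (i == 0)
        return len;           /* no lead byte found: nothing to trim */

    size_t need = utf8_seq_len(buf[i - 1]);   /* bytes the last char wants */
    size_t have = len - (i - 1);              /* bytes actually present */

    /* A lead byte whose sequence runs past the end marks a split character. */
    return (need > have) ? i - 1 : len;
}
```

If a read stops after the first two bytes of a three-byte character, this
reports 0 whole bytes, so the caller would have to hold those two bytes back
rather than hand them to Perl space as a mangled character.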
I don't think that discussing this in terms of what non-VMS does with $/ set
to a reference to an integer is necessarily that useful. I think it's really
only been added as "this feature can't be VMS only" (all added in commit
http://perl5.git.perl.org/perl.git/commitdiff/5b2b9c687790241e85aa7b76 )
Given that, the whole bug report is about "what *should* this do?" because what
it currently does is badly broken.
The reason I'm specifically asking "what does a VMS programmer *want*?" is
because the fixed size records feature was put in for VMS, with non-VMS an
afterthought. So
1) is there a sane VMS native interpretation of "UTF-8 coming from a fixed
record file" ?
and only when that's answered is there
2) what do we fake on other platforms?
[and I think it's also premature to consider whether this needs :utf8 as a
real layer to implement. First I'd like to get a feeling for what the
Perl-space behaviour, if any, should be]
The possibly useful analogy is "what happens with a :utf8 layer on sysread?"
which is, well, summed up with:
goto more_bytes;
i.e. it's actually a different behaviour. It makes multiple syscalls. Blech.
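
A rough sketch of what that "more_bytes" style of behaviour amounts to, written
against plain POSIX read() rather than copied from the sysread code. The names
seq_len and read_completing_char are hypothetical, and the caller is assumed to
supply a buffer with room for want + 3 bytes.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Length implied by a UTF-8 lead byte; anything unexpected counts as 1. */
static size_t seq_len(unsigned char lead)
{
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Read up to `want` bytes with one read(), then keep issuing read() calls
 * until the final character is complete.
 * ASSUMPTION: buf has room for want + 3 bytes. */
ssize_t read_completing_char(int fd, unsigned char *buf, size_t want)
{
    ssize_t n = read(fd, buf, want);
    if (n <= 0)
        return n;
    size_t len = (size_t)n;

    /* Locate the lead byte of the last character in the buffer. */
    size_t start = len - 1;
    while (start > 0 && len - start < 4 && (buf[start] & 0xC0) == 0x80)
        start--;

    /* The extra syscalls: fetch only the bytes the last character lacks. */
    size_t need = seq_len(buf[start]);
    while (len - start < need) {
        n = read(fd, buf + len, need - (len - start));
        if (n <= 0)
            break;            /* EOF or error: return what we have */
        len += (size_t)n;
    }
    return (ssize_t)len;
}
```

Asking for 2 bytes when the stream starts with a 3-byte character costs two
read() calls and returns 3 bytes, so the result is neither "one syscall" nor
"at most N bytes".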
[and, thinking about it now, about 14 years later, possibly that non-VMS
code in sv_gets() should have been using read(), not fread(), so that it
would be useful on a datagram socket. But that's a bit late to fix]
Nicholas Clark