Hi Andrej,
Andrey Chernov wrote on Wed, Aug 24, 2016 at 10:19:35PM +0300:
> On 24.08.2016 22:03, Ingo Schwarze wrote:
>> Andrey Chernov wrote:
>>> Could you show some code? In my testing fgetwln() fails on next read if
>>> previously there was partial line with tail EILSEQ. Stdio not advance
>>> its pointer over the sequence with EILSEQ.
>> See below for a radically stripped down version of FreeBSD rev(1).
>> When i revert my fgetwln(3) patch (as you did in FreeBSD) and compile
>> and run that stripped down rev(1) on OpenBSD, i get this:
>>
>> $ export LC_CTYPE=en_US.UTF-8
>> $ printf "one\200two\200three" | ./frev
>> eno
>> owt
>> eerht
>> frev: Illegal byte sequence
>>
>> Is there maybe yet another bug, maybe somewhere in OpenBSD fgetwc(3),
>> advancing a pointer where it shouldn't? What result do you see
>> when you run that test program on FreeBSD?
> Even on FreeBSD stable/10 I got different (i.e. correct) results:
>
> $ export LC_CTYPE=en_US.UTF-8
> $ printf "one\200two\200three" | ./frev
> eno
> frev: Illegal byte sequence
>
> It stops on the first \200 as it should.
I investigated and got the following result.
The function fgetwln(3) is implemented in terms of fgetwc(3).
Regarding fgetwc(3), POSIX says:
If an error occurs, the resulting value of the file position
indicator for the stream is unspecified.
And indeed, FreeBSD leaves the file position indicator unchanged
on failure, while OpenBSD advances it to the byte after the last
one that must be read to be able to detect the failure.
Relying on the FreeBSD fgetwc(3) behaviour for the fgetwln(3)
implementation inside the FreeBSD libc seems possible on first
sight, even though it means that the FreeBSD implementation of
fgetwln(3) is not portable - as i found when trying to run it on
OpenBSD. Actually, "unspecified" is much worse than "implementation
defined", so strictly speaking, relying on the fgetwc(3) behaviour
is not even safe on FreeBSD, because theoretically, the C compiler
is free to optimize away a call to fgetwc(3) and destroy the file
position pointer if it can somehow determine that the call will
fail, or to just destroy the file position pointer during fgetwc(3)
failure.
If you really want to specify fgetwln(3) to set the file position
pointer to a well-defined position on encoding errors - currently,
nothing of that kind is documented - it would mean that you would
have to stop using fgetwc(3) in the fgetwln(3) implementation and
instead inspect the libc internal buffers directly. That doesn't
seem reasonable to me.
But above all, i think it's a bad idea to have diverging requirements
for a non-standard high-level function like fgetwln(3) with respect
to the similar low-level standard function, here fgetwc(3). So
given that fgetwc(3) is allowed to destroy the file position pointer
on failure, fgetwln(3) should be allowed to do that, too. And given
that POSIX requires that fgetwc(3) must not change errno(2) when
successful, fgetwln(3) should satisfy the same restriction, which
means that it cannot return partial strings for two reasons: Both
the file position indicator and errno are already destroyed at the
point where the partial string could be returned.
I strongly feel that fgetln(3) ought to behave the same: either
succeed or fail. It should not return a string but set errno and
__SERR at the same time. So i'd very much like to commit my fgetln.c
patch. I have an OK from millert@, which is sufficient for commit
in OpenBSD. Do you still object, given the above results and
arguments?
To me, historic behaviour that nobody is likely to rely on is not
a strong argument. We fix bugs in historic code all the time, and
even apply functional improvements where they make things better.
Yours,
Ingo
Index: fgetln.c
===================================================================
RCS file: /cvs/src/lib/libc/stdio/fgetln.c,v
retrieving revision 1.14
diff -u -r1.14 fgetln.c
--- fgetln.c 31 Aug 2015 02:53:57 -0000 1.14
+++ fgetln.c 25 Aug 2016 14:45:08 -0000
@@ -115,8 +115,11 @@
(void)memcpy((void *)(fp->_lb._base + off), (void *)fp->_p,
len - off);
off = len;
- if (__srefill(fp))
- break; /* EOF or error: return partial line */
+ if (__srefill(fp)) {
+ if (fp->_flags & __SEOF)
+ break;
+ goto error;
+ }
if ((p = memchr((void *)fp->_p, '\n', fp->_r)) == NULL)
continue;