Re: bug in fputwc(3) error reporting
Ingo Schwarze wrote: > Consequently, i propose to not revert our fgetwc(3) patch and to > commit this fputwc(3) patch, too, making us agree with FreeBSD, > NetBSD, Dragonfly, SunOS, glibc, and POSIX, even tough nominally > violating the C standard (but in a way that seems less dangerous > than the inverse violation of POSIX). when in doubt, do what the cool kids do. :)
Re: bug in fputwc(3) error reporting
On Sat, 23 Jan 2016 21:14:24 +0100, Ingo Schwarze wrote: > In conclusion, i would consider it a very bad idea to change all the > operating systems to not set the error indicator, violating POSIX > and making correct coding harder, just to please the not very nice > C standard, then possibly change them all a second time once the C > standard gets fixed. > > Consequently, i propose to not revert our fgetwc(3) patch and to > commit this fputwc(3) patch, too, making us agree with FreeBSD, > NetBSD, Dragonfly, SunOS, glibc, and POSIX, even tough nominally > violating the C standard (but in a way that seems less dangerous > than the inverse violation of POSIX). OK millert@. - todd
Re: bug in fputwc(3) error reporting
Hi, Ingo Schwarze wrote on Sun, Jan 10, 2016 at 10:22:28PM +0100: > Todd C. Miller wrote on Sun, Jan 10, 2016 at 12:33:07PM -0700: >> On Sun, 10 Jan 2016 19:55:53 +0100, Ingo Schwarze wrote: >>> So, my conclusion is that it's the C standard that is carelessly >>> worded, not POSIX. I don't think the C standard intends to say >>> that fgetwc(3) and fputwc(3) are not allowed to set the error >>> indicator on an encoding error, it just doesn't clearly say that >>> they are required to. POSIX then adds the additional requirement, >>> merely failing to make it clear that it's resolving a careless >>> wording in the C standard. >> POSIX marks its extensions to ISO C, but this behavior is not >> documented as an extension. I suggest you contant the Austin group >> for clarification. > Done: > > http://austingroupbugs.net/view.php?id=1022 > > Let's wait for feedback, then take that into consideration for our > decision. So, here is the result. The only answer that can be taken seriously is from Geoff Clare (of the Open Group). He says this: Geoff Clare wrote: > Ingo Schwarze wrote: :: -- :: (0003030) schwarze (reporter) - 2016-01-22 02:02 :: http://austingroupbugs.net/view.php?id=1022#c3030 :: -- :: Re: 0003027 :: :: Thanks for your feedback, geoffclare. To make really sure that i :: understand correctly what you are saying: Your intention is that POSIX :: should not be changed, that all the operating systems i listed should :: continue what they are already doing, and that the C standard should be :: amended to also require setting the error flag in case of an encoding :: error. Right? : I imagine that's what we would propose to the C commitee. Whether : they accept our recommendation is another matter. In conclusion, i would consider it a very bad idea to change all the operating systems to not set the error indicator, violating POSIX and making correct coding harder, just to please the not very nice C standard, then possibly change them all a second time once the C standard gets fixed. Consequently, i propose to not revert our fgetwc(3) patch and to commit this fputwc(3) patch, too, making us agree with FreeBSD, NetBSD, Dragonfly, SunOS, glibc, and POSIX, even tough nominally violating the C standard (but in a way that seems less dangerous than the inverse violation of POSIX). By the way, i'm running with this ever since and didn't notice any adverse effects. OK? Ingo Index: fputwc.c === RCS file: /cvs/src/lib/libc/stdio/fputwc.c,v retrieving revision 1.6 diff -u -p -r1.6 fputwc.c --- fputwc.c1 Oct 2015 02:32:07 - 1.6 +++ fputwc.c23 Jan 2016 20:00:59 - @@ -62,7 +62,7 @@ __fputwc_unlock(wchar_t wc, FILE *fp) size = wcrtomb(buf, wc, st); if (size == (size_t)-1) { - errno = EILSEQ; + fp->_flags |= __SERR; return WEOF; }
Re: bug in fputwc(3) error reporting
Hi Todd, Todd C. Miller wrote on Sun, Jan 10, 2016 at 12:33:07PM -0700: > On Sun, 10 Jan 2016 19:55:53 +0100, Ingo Schwarze wrote: >> So, my conclusion is that it's the C standard that is carelessly >> worded, not POSIX. I don't think the C standard intends to say >> that fgetwc(3) and fputwc(3) are not allowed to set the error >> indicator on an encoding error, it just doesn't clearly say that >> they are required to. POSIX then adds the additional requirement, >> merely failing to make it clear that it's resolving a careless >> wording in the C standard. > POSIX marks its extensions to ISO C, but this behavior is not > documented as an extension. I suggest you contant the Austin group > for clarification. Done: http://austingroupbugs.net/view.php?id=1022 Let's wait for feedback, then take that into consideration for our decision. Yours, Ingo
Re: bug in fputwc(3) error reporting
On Sun, 10 Jan 2016 19:55:53 +0100, Ingo Schwarze wrote: > So, my conclusion is that it's the C standard that is carelessly > worded, not POSIX. I don't think the C standard intends to say > that fgetwc(3) and fputwc(3) are not allowed to set the error > indicator on an encoding error, it just doesn't clearly say that > they are required to. POSIX then adds the additional requirement, > merely failing to make it clear that it's resolving a careless > wording in the C standard. POSIX marks its extensions to ISO C, but this behavior is not documented as an extension. I suggest you contant the Austin group for clarification. - todd
Re: bug in fputwc(3) error reporting
Hi, Ted Unangst wrote on Wed, Dec 30, 2015 at 01:22:17AM -0500: > Philip Guenther wrote: >> On Tue, Dec 29, 2015 at 2:34 PM, Todd C. Miller wrote: >>> Since POSIX defers to ISO C we should be following the ISO C standard >>> with respect to behavior when an encoding error occurs. As such, >>> I've changed my mind and now believe we should not be setting the >>> error flag in those cases. >> That sounds like it should be reported to austin-group so they can fix >> the misleading wording. > What do solaris and glibc do? I'd expect the wording to be clarified > in the direction of existing practice. fgetwc(3): FreeBSD sets the error flag since Oct 16, 2002 (rev. 105234). NetBSD sets the error flag since July 3, 2006 (rev. 1.5), referencing SUSv3. Dragonfly sets the error flag since April 21, 2009, (e0f95098eeba0176864b9cafe6d69b5b7bc0e73f), sync from FreeBSD. fputwc(3): FreeBSD sets the error flag since Oct 16, 2002 (rev. 105234). NetBSD does NOT set the error flag. Dragonfly sets the error flag since Sep 29, 2013, (0d5acd7467c4e95f792ef49fceb3ab8e917ce86b), sync from FreeBSD. In don't understand the illumos code at all, and the glibc code looks even worse, but i tested on these systems, and they set the error flag for encoding errors in fgetwc(3): SunOS unstable11s 5.11 11.2 sun4u sparc SUNW,SPARC-Enterprise SunOS unstable10s 5.10 Generic_150400-17 sun4v sparc SUNW,SPARC-Enterprise-T5220 Linux donnerwolke.usta.de 3.16.0-4-686-pae #1 \ SMP Debian 3.16.7-ckt11-1+deb8u3 (2015-08-04) i686 GNU/Linux This one does not set the error flag: SunOS unstable9s 5.9 Generic_Virtual sun4u sparc SUNW,SPARC-Enterprise-T5220 Solaris and Linux tests of fputwc(3) are inconclusive, i don't understand how to set up invalid wchar_t values on Solaris and Linux. So, my conclusion is that it's the C standard that is carelessly worded, not POSIX. I don't think the C standard intends to say that fgetwc(3) and fputwc(3) are not allowed to set the error indicator on an encoding error, it just doesn't clearly say that they are required to. POSIX then adds the additional requirement, merely failing to make it clear that it's resolving a careless wording in the C standard. The interpretation that setting the error indicator is deliberately forbidden is not reasonable because that would make detection of encoding errors fragile, complicated, and error-prone. Assuming the POSIX specification, fgetwc(3) is easy and safe to use: if (fgetwc(...) == WEOF) { /* EOF or error */ if (ferror(...)) /* error */ inspect errno for the reason else /* EOF */ ... } Assuming setting the error indicator is forbidden, there is one way of using the interface that is counter-intutive and fragile: if (fgetwc(...) == WEOF) { /* EOF or error */ if (feof(...))/* EOF */ ... else /* error, no matter what ferror(...) */ inspect errno for the reason } There is another way of using the error indicator that is safe, but so contorted that we should not recommend it: int save_errno = errno; errno = 0; if (fgetwc(...) == WEOF) { /* EOF or error */ if (errno == EILSEQ) /* encoding error */ ... else if (ferror(...)) /* read error */ inspect errno for the reason else /* EOF */ ... } errno = save_errno I think we should NOT revert the patch i committed that sets the error indicator in fgetwc(3). It's reasonable and agrees with what everybody else does. I think we should also commit the following patch to fputwc(3), for the same reasons, and for consistency. OK? Ingo Index: fputwc.c === RCS file: /cvs/src/lib/libc/stdio/fputwc.c,v retrieving revision 1.6 diff -u -p -r1.6 fputwc.c --- fputwc.c1 Oct 2015 02:32:07 - 1.6 +++ fputwc.c10 Jan 2016 18:49:37 - @@ -62,7 +62,7 @@ __fputwc_unlock(wchar_t wc, FILE *fp) size = wcrtomb(buf, wc, st); if (size == (size_t)-1) { - errno = EILSEQ; + fp->_flags |= __SERR; return WEOF; }
Re: bug in fputwc(3) error reporting
Philip Guenther wrote: > On Tue, Dec 29, 2015 at 2:34 PM, Todd C. Miller > wrote: > ... > > Since POSIX defers to ISO C we should be following the ISO C standard > > with respect to behavior when an encoding error occurs. As such, > > I've changed my mind and now believe we should not be setting the > > error flag in those cases. > > That sounds like it should be reported to austin-group so they can fix > the misleading wording. What do solaris and glibc do? I'd expect the wording to be clarified in the direction of existing practice.
Re: bug in fputwc(3) error reporting
On Tue, Dec 29, 2015 at 2:34 PM, Todd C. Miller wrote: ... > Since POSIX defers to ISO C we should be following the ISO C standard > with respect to behavior when an encoding error occurs. As such, > I've changed my mind and now believe we should not be setting the > error flag in those cases. That sounds like it should be reported to austin-group so they can fix the misleading wording. Philip Guenther
Re: bug in fputwc(3) error reporting
On Sun, 27 Dec 2015 00:42:52 +0100, =?utf-8?Q?J=C3=A9r=C3=A9mie_Courr=C3=A8ges- Anglas?= wrote: > The way I read it, [CX] here only affects the last part of the sentence, > > "and errno shall be set to indicate the error." > > (because the C standard doesn't require errno to be set in all error > cases.) > > Thus [CX] here doesn't apply to the part that speaks about the error > indicator. Correct, it is the setting of errno that is the extension as can be seen in the html version where that phrase is bracketed by the [Option Start] and [Option End] glyphs (see the alt text). Since POSIX defers to ISO C we should be following the ISO C standard with respect to behavior when an encoding error occurs. As such, I've changed my mind and now believe we should not be setting the error flag in those cases. - todd
Re: bug in fputwc(3) error reporting
OK millert@. I believe there are also instances of the same problem in the non-wide code. In fact, we don't even document that the error indicator is set in putc(3)! Someone else may wish to tackle that one :-) - todd
Re: bug in fputwc(3) error reporting
"Anthony J. Bentley" writes: > Hi Jérémie, Hi, > Jérémie Courrèges-Anglas writes: >> Hmm, the C standard and POSIX have slightly different texts regarding >> this. >> >> Quoting POSIX-2013: >> -->8-- >> RETURN VALUE >> >> Upon successful completion, fputwc() shall return wc. Otherwise, it >> shall return WEOF, the error indicator for the stream shall be set, >> [CX] and errno shall be set to indicate the error. > ... >> So, the C standard doesn't say that the error indicator should be set on >> the FILE in case of an encoding error, it only speaks about errno being >> set to EILSEQ. I'd say that we should follow the C standard here - >> after all "This volume of POSIX.1-2008 defers to the ISO C standard". > > The [CX] there means it's an intentional extension of ISO C. The way I read it, [CX] here only affects the last part of the sentence, "and errno shall be set to indicate the error." (because the C standard doesn't require errno to be set in all error cases.) Thus [CX] here doesn't apply to the part that speaks about the error indicator. > This is > more obvious in the HTML copy of POSIX, > http://pubs.opengroup.org/onlinepubs/9699919799/functions/fputwc.html > > Quoting more POSIX (the [CX] defintion), > > The functionality described is an extension to the ISO C standard. > Application developers may make use of an extension as it is > supported on all POSIX.1-2008-conforming systems. > > With each function or header from the ISO C standard, a statement to > the effect that "any conflict is unintentional" is included. That is > intended to refer to a direct conflict. POSIX.1-2008 acts in part as > a profile of the ISO C standard, and it may choose to further > constrain behaviors allowed to vary by the ISO C standard. -- jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF DDCC 0DFA 74AE 1524 E7EE
Re: bug in fputwc(3) error reporting
Hi Jérémie, Jérémie Courrèges-Anglas writes: > Hmm, the C standard and POSIX have slightly different texts regarding > this. > > Quoting POSIX-2013: > -->8-- > RETURN VALUE > > Upon successful completion, fputwc() shall return wc. Otherwise, it > shall return WEOF, the error indicator for the stream shall be set, > [CX] and errno shall be set to indicate the error. ... > So, the C standard doesn't say that the error indicator should be set on > the FILE in case of an encoding error, it only speaks about errno being > set to EILSEQ. I'd say that we should follow the C standard here - > after all "This volume of POSIX.1-2008 defers to the ISO C standard". The [CX] there means it's an intentional extension of ISO C. This is more obvious in the HTML copy of POSIX, http://pubs.opengroup.org/onlinepubs/9699919799/functions/fputwc.html Quoting more POSIX (the [CX] defintion), The functionality described is an extension to the ISO C standard. Application developers may make use of an extension as it is supported on all POSIX.1-2008-conforming systems. With each function or header from the ISO C standard, a statement to the effect that "any conflict is unintentional" is included. That is intended to refer to a direct conflict. POSIX.1-2008 acts in part as a profile of the ISO C standard, and it may choose to further constrain behaviors allowed to vary by the ISO C standard. -- Anthony J. Bentley
Re: bug in fputwc(3) error reporting
Ingo Schwarze writes: [...] > When fputwc(3) encounters an encoding error, it neglects to set the > error indicator, just like fgetwc(3) did before i fixed it today. > Setting the error indicator is required by the manual and by the > standard. Hmm, the C standard and POSIX have slightly different texts regarding this. Quoting POSIX-2013: -->8-- RETURN VALUE Upon successful completion, fputwc() shall return wc. Otherwise, it shall return WEOF, the error indicator for the stream shall be set, [CX] and errno shall be set to indicate the error. --<8-- Quoting the C99 draft, section 7.24.3.3: --8<-- Returns 3 The fputwc function returns the wide character written. If a write error occurs, the error indicator for the stream is set and fputwc returns WEOF. If an encoding error occurs, the value of the macro EILSEQ is stored in errno and fputwc returns WEOF. -->8-- Note that the C11 draft has the same text. So, the C standard doesn't say that the error indicator should be set on the FILE in case of an encoding error, it only speaks about errno being set to EILSEQ. I'd say that we should follow the C standard here - after all "This volume of POSIX.1-2008 defers to the ISO C standard". The same can be said about the fgetwc(3) diff that was recently committed, and for which I gave you an ok. > Instead, it overrides errno again. That's pointless because > wcrtomb(3) already did that, and is required to do so by the standard. > It is even slightly dangerous because it might hide internal coding > errors in libc. For example, if libc would ever neglect to initialize > the state object correctly, wcrtomb(3) might fail with EINVAL. > That should NOT be plastered over by setting errno to EILSEQ. I agree that errno doesn't need to be overriden, but please leave _flags alone. [...] > Index: fputwc.c > === > RCS file: /cvs/src/lib/libc/stdio/fputwc.c,v > retrieving revision 1.6 > diff -u -p -r1.6 fputwc.c > --- fputwc.c 1 Oct 2015 02:32:07 - 1.6 > +++ fputwc.c 24 Dec 2015 21:01:10 - > @@ -62,7 +62,7 @@ __fputwc_unlock(wchar_t wc, FILE *fp) > > size = wcrtomb(buf, wc, st); > if (size == (size_t)-1) { > - errno = EILSEQ; > + fp->_flags |= __SERR; > return WEOF; > } > -- jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF DDCC 0DFA 74AE 1524 E7EE
bug in fputwc(3) error reporting
Third file, third bug. SO FAR, EACH AND EVERY BLOODY PLACE I LOOKED AT WAS BROKEN. When fputwc(3) encounters an encoding error, it neglects to set the error indicator, just like fgetwc(3) did before i fixed it today. Setting the error indicator is required by the manual and by the standard. Instead, it overrides errno again. That's pointless because wcrtomb(3) already did that, and is required to do so by the standard. It is even slightly dangerous because it might hide internal coding errors in libc. For example, if libc would ever neglect to initialize the state object correctly, wcrtomb(3) might fail with EINVAL. That should NOT be plastered over by setting errno to EILSEQ. Consider this test program: #include #include #include int main(void) { wchar_t ws[2]; int irc; ws[0] = 0xdc00; /* UTF-16 surrogate, invalid code point. */ ws[1] = L'\0'; irc = fputws(ws, stdout); printf("fputws returned %d\n", irc); printf("error status %d\n", ferror(stdout)); err(1, "fputws"); } Output before: $ ./putws fputws returned -1 error status 0 putws: fputws: Illegal byte sequence Output with the patch below: $ ./putws fputws returned -1 error status 1 putws: fputws: Illegal byte sequence OK? Ingo Index: fputwc.c === RCS file: /cvs/src/lib/libc/stdio/fputwc.c,v retrieving revision 1.6 diff -u -p -r1.6 fputwc.c --- fputwc.c1 Oct 2015 02:32:07 - 1.6 +++ fputwc.c24 Dec 2015 21:01:10 - @@ -62,7 +62,7 @@ __fputwc_unlock(wchar_t wc, FILE *fp) size = wcrtomb(buf, wc, st); if (size == (size_t)-1) { - errno = EILSEQ; + fp->_flags |= __SERR; return WEOF; }