Re: bug in fputwc(3) error reporting

2016-01-23 Thread Ted Unangst
Ingo Schwarze wrote:
> Consequently, i propose to not revert our fgetwc(3) patch and to
> commit this fputwc(3) patch, too, making us agree with FreeBSD,
> NetBSD, Dragonfly, SunOS, glibc, and POSIX, even tough nominally
> violating the C standard (but in a way that seems less dangerous
> than the inverse violation of POSIX).

when in doubt, do what the cool kids do. :) 



Re: bug in fputwc(3) error reporting

2016-01-23 Thread Todd C. Miller
On Sat, 23 Jan 2016 21:14:24 +0100, Ingo Schwarze wrote:

> In conclusion, i would consider it a very bad idea to change all the
> operating systems to not set the error indicator, violating POSIX
> and making correct coding harder, just to please the not very nice
> C standard, then possibly change them all a second time once the C
> standard gets fixed.
> 
> Consequently, i propose to not revert our fgetwc(3) patch and to
> commit this fputwc(3) patch, too, making us agree with FreeBSD,
> NetBSD, Dragonfly, SunOS, glibc, and POSIX, even tough nominally
> violating the C standard (but in a way that seems less dangerous
> than the inverse violation of POSIX).

OK millert@.

 - todd



Re: bug in fputwc(3) error reporting

2016-01-23 Thread Ingo Schwarze
Hi,

Ingo Schwarze wrote on Sun, Jan 10, 2016 at 10:22:28PM +0100:
> Todd C. Miller wrote on Sun, Jan 10, 2016 at 12:33:07PM -0700:
>> On Sun, 10 Jan 2016 19:55:53 +0100, Ingo Schwarze wrote:
  
>>> So, my conclusion is that it's the C standard that is carelessly
>>> worded, not POSIX.  I don't think the C standard intends to say
>>> that fgetwc(3) and fputwc(3) are not allowed to set the error
>>> indicator on an encoding error, it just doesn't clearly say that
>>> they are required to.  POSIX then adds the additional requirement,
>>> merely failing to make it clear that it's resolving a careless
>>> wording in the C standard.

>> POSIX marks its extensions to ISO C, but this behavior is not
>> documented as an extension.  I suggest you contant the Austin group
>> for clarification.

> Done:
> 
>   http://austingroupbugs.net/view.php?id=1022
> 
> Let's wait for feedback, then take that into consideration for our
> decision.

So, here is the result.  The only answer that can be taken
seriously is from Geoff Clare (of the Open Group).  He says this:

Geoff Clare wrote:
> Ingo Schwarze wrote:

:: --
::  (0003030) schwarze (reporter) - 2016-01-22 02:02
::  http://austingroupbugs.net/view.php?id=1022#c3030
:: --
:: Re: 0003027
::
:: Thanks for your feedback, geoffclare.  To make really sure that i
:: understand correctly what you are saying:  Your intention is that POSIX
:: should not be changed, that all the operating systems i listed should
:: continue what they are already doing, and that the C standard should be
:: amended to also require setting the error flag in case of an encoding
:: error.  Right?

: I imagine that's what we would propose to the C commitee.  Whether
: they accept our recommendation is another matter.

In conclusion, i would consider it a very bad idea to change all the
operating systems to not set the error indicator, violating POSIX
and making correct coding harder, just to please the not very nice
C standard, then possibly change them all a second time once the C
standard gets fixed.

Consequently, i propose to not revert our fgetwc(3) patch and to
commit this fputwc(3) patch, too, making us agree with FreeBSD,
NetBSD, Dragonfly, SunOS, glibc, and POSIX, even tough nominally
violating the C standard (but in a way that seems less dangerous
than the inverse violation of POSIX).

By the way, i'm running with this ever since and didn't notice
any adverse effects.

OK?
  Ingo


Index: fputwc.c
===
RCS file: /cvs/src/lib/libc/stdio/fputwc.c,v
retrieving revision 1.6
diff -u -p -r1.6 fputwc.c
--- fputwc.c1 Oct 2015 02:32:07 -   1.6
+++ fputwc.c23 Jan 2016 20:00:59 -
@@ -62,7 +62,7 @@ __fputwc_unlock(wchar_t wc, FILE *fp)
 
size = wcrtomb(buf, wc, st);
if (size == (size_t)-1) {
-   errno = EILSEQ;
+   fp->_flags |= __SERR;
return WEOF;
}



Re: bug in fputwc(3) error reporting

2016-01-10 Thread Ingo Schwarze
Hi Todd,

Todd C. Miller wrote on Sun, Jan 10, 2016 at 12:33:07PM -0700:
> On Sun, 10 Jan 2016 19:55:53 +0100, Ingo Schwarze wrote:
 
>> So, my conclusion is that it's the C standard that is carelessly
>> worded, not POSIX.  I don't think the C standard intends to say
>> that fgetwc(3) and fputwc(3) are not allowed to set the error
>> indicator on an encoding error, it just doesn't clearly say that
>> they are required to.  POSIX then adds the additional requirement,
>> merely failing to make it clear that it's resolving a careless
>> wording in the C standard.

> POSIX marks its extensions to ISO C, but this behavior is not
> documented as an extension.  I suggest you contant the Austin group
> for clarification.

Done:

  http://austingroupbugs.net/view.php?id=1022

Let's wait for feedback, then take that into consideration for our
decision.

Yours,
  Ingo



Re: bug in fputwc(3) error reporting

2016-01-10 Thread Todd C. Miller
On Sun, 10 Jan 2016 19:55:53 +0100, Ingo Schwarze wrote:

> So, my conclusion is that it's the C standard that is carelessly
> worded, not POSIX.  I don't think the C standard intends to say
> that fgetwc(3) and fputwc(3) are not allowed to set the error
> indicator on an encoding error, it just doesn't clearly say that
> they are required to.  POSIX then adds the additional requirement,
> merely failing to make it clear that it's resolving a careless
> wording in the C standard.

POSIX marks its extensions to ISO C, but this behavior is not
documented as an extension.  I suggest you contant the Austin group
for clarification.

 - todd



Re: bug in fputwc(3) error reporting

2016-01-10 Thread Ingo Schwarze
Hi,

Ted Unangst wrote on Wed, Dec 30, 2015 at 01:22:17AM -0500:
> Philip Guenther wrote:
>> On Tue, Dec 29, 2015 at 2:34 PM, Todd C. Miller wrote:

>>> Since POSIX defers to ISO C we should be following the ISO C standard
>>> with respect to behavior when an encoding error occurs.  As such,
>>> I've changed my mind and now believe we should not be setting the
>>> error flag in those cases.

>> That sounds like it should be reported to austin-group so they can fix
>> the misleading wording.

> What do solaris and glibc do?  I'd expect the wording to be clarified
> in the direction of existing practice.

fgetwc(3):
FreeBSD sets the error flag since Oct 16, 2002 (rev. 105234).
NetBSD sets the error flag since July 3, 2006 (rev. 1.5), referencing SUSv3.
Dragonfly sets the error flag since April 21, 2009,
  (e0f95098eeba0176864b9cafe6d69b5b7bc0e73f), sync from FreeBSD.

fputwc(3):
FreeBSD sets the error flag since Oct 16, 2002 (rev. 105234).
NetBSD does NOT set the error flag.
Dragonfly sets the error flag since Sep 29, 2013,
  (0d5acd7467c4e95f792ef49fceb3ab8e917ce86b), sync from FreeBSD.


In don't understand the illumos code at all, and the glibc code
looks even worse, but i tested on these systems, and they set the
error flag for encoding errors in fgetwc(3):

SunOS unstable11s 5.11 11.2 sun4u sparc SUNW,SPARC-Enterprise
SunOS unstable10s 5.10 Generic_150400-17 sun4v sparc SUNW,SPARC-Enterprise-T5220
Linux donnerwolke.usta.de 3.16.0-4-686-pae #1 \
  SMP Debian 3.16.7-ckt11-1+deb8u3 (2015-08-04) i686 GNU/Linux

This one does not set the error flag:

SunOS unstable9s 5.9 Generic_Virtual sun4u sparc SUNW,SPARC-Enterprise-T5220

Solaris and Linux tests of fputwc(3) are inconclusive, i don't
understand how to set up invalid wchar_t values on Solaris and
Linux.


So, my conclusion is that it's the C standard that is carelessly
worded, not POSIX.  I don't think the C standard intends to say
that fgetwc(3) and fputwc(3) are not allowed to set the error
indicator on an encoding error, it just doesn't clearly say that
they are required to.  POSIX then adds the additional requirement,
merely failing to make it clear that it's resolving a careless
wording in the C standard.

The interpretation that setting the error indicator is deliberately
forbidden is not reasonable because that would make detection of
encoding errors fragile, complicated, and error-prone.

Assuming the POSIX specification, fgetwc(3) is easy and safe to
use:

  if (fgetwc(...) == WEOF) {  /* EOF or error */
if (ferror(...))  /* error */
  inspect errno for the reason
else  /* EOF */
  ...
  }

Assuming setting the error indicator is forbidden, there is one way
of using the interface that is counter-intutive and fragile:

  if (fgetwc(...) == WEOF) {  /* EOF or error */
if (feof(...))/* EOF */
  ...
else  /* error, no matter what ferror(...) */
  inspect errno for the reason
  }

There is another way of using the error indicator that is safe,
but so contorted that we should not recommend it:

  int save_errno = errno;
  errno = 0;
  if (fgetwc(...) == WEOF) {  /* EOF or error */
if (errno == EILSEQ)  /* encoding error */
  ...
else if (ferror(...)) /* read error */
  inspect errno for the reason
else  /* EOF */
  ...
  }
  errno = save_errno


I think we should NOT revert the patch i committed that sets the
error indicator in fgetwc(3).  It's reasonable and agrees with what
everybody else does.  I think we should also commit the following
patch to fputwc(3), for the same reasons, and for consistency.

OK?
  Ingo


Index: fputwc.c
===
RCS file: /cvs/src/lib/libc/stdio/fputwc.c,v
retrieving revision 1.6
diff -u -p -r1.6 fputwc.c
--- fputwc.c1 Oct 2015 02:32:07 -   1.6
+++ fputwc.c10 Jan 2016 18:49:37 -
@@ -62,7 +62,7 @@ __fputwc_unlock(wchar_t wc, FILE *fp)
 
size = wcrtomb(buf, wc, st);
if (size == (size_t)-1) {
-   errno = EILSEQ;
+   fp->_flags |= __SERR;
return WEOF;
}
 



Re: bug in fputwc(3) error reporting

2015-12-29 Thread Ted Unangst
Philip Guenther wrote:
> On Tue, Dec 29, 2015 at 2:34 PM, Todd C. Miller
>  wrote:
> ...
> > Since POSIX defers to ISO C we should be following the ISO C standard
> > with respect to behavior when an encoding error occurs.  As such,
> > I've changed my mind and now believe we should not be setting the
> > error flag in those cases.
> 
> That sounds like it should be reported to austin-group so they can fix
> the misleading wording.

What do solaris and glibc do? I'd expect the wording to be clarified in the
direction of existing practice.



Re: bug in fputwc(3) error reporting

2015-12-29 Thread Philip Guenther
On Tue, Dec 29, 2015 at 2:34 PM, Todd C. Miller
 wrote:
...
> Since POSIX defers to ISO C we should be following the ISO C standard
> with respect to behavior when an encoding error occurs.  As such,
> I've changed my mind and now believe we should not be setting the
> error flag in those cases.

That sounds like it should be reported to austin-group so they can fix
the misleading wording.


Philip Guenther



Re: bug in fputwc(3) error reporting

2015-12-29 Thread Todd C. Miller
On Sun, 27 Dec 2015 00:42:52 +0100, =?utf-8?Q?J=C3=A9r=C3=A9mie_Courr=C3=A8ges-
Anglas?= wrote:

> The way I read it, [CX] here only affects the last part of the sentence,
> 
>   "and errno shall be set to indicate the error."
> 
> (because the C standard doesn't require errno to be set in all error
> cases.)
> 
> Thus [CX] here doesn't apply to the part that speaks about the error
> indicator.

Correct, it is the setting of errno that is the extension as can
be seen in the html version where that phrase is bracketed by the
[Option Start] and [Option End] glyphs (see the alt text).

Since POSIX defers to ISO C we should be following the ISO C standard
with respect to behavior when an encoding error occurs.  As such,
I've changed my mind and now believe we should not be setting the
error flag in those cases.

 - todd



Re: bug in fputwc(3) error reporting

2015-12-28 Thread Todd C. Miller
OK millert@.

I believe there are also instances of the same problem in the
non-wide code.  In fact, we don't even document that the error
indicator is set in putc(3)!  Someone else may wish to tackle that
one :-)

 - todd



Re: bug in fputwc(3) error reporting

2015-12-26 Thread Jérémie Courrèges-Anglas
"Anthony J. Bentley"  writes:

> Hi Jérémie,

Hi,

> Jérémie Courrèges-Anglas writes:
>> Hmm, the C standard and POSIX have slightly different texts regarding
>> this.
>> 
>> Quoting POSIX-2013:
>> -->8--
>>   RETURN VALUE
>> 
>> Upon successful completion, fputwc() shall return wc. Otherwise, it
>> shall return WEOF, the error indicator for the stream shall be set,
>> [CX] and errno shall be set to indicate the error.
> ...
>> So, the C standard doesn't say that the error indicator should be set on
>> the FILE in case of an encoding error, it only speaks about errno being
>> set to EILSEQ.  I'd say that we should follow the C standard here -
>> after all "This volume of POSIX.1-2008 defers to the ISO C standard".
>
> The [CX] there means it's an intentional extension of ISO C.

The way I read it, [CX] here only affects the last part of the sentence,

  "and errno shall be set to indicate the error."

(because the C standard doesn't require errno to be set in all error
cases.)

Thus [CX] here doesn't apply to the part that speaks about the error
indicator.

> This is
> more obvious in the HTML copy of POSIX,
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/fputwc.html
>
> Quoting more POSIX (the [CX] defintion),
>
> The functionality described is an extension to the ISO C standard.
> Application developers may make use of an extension as it is
> supported on all POSIX.1-2008-conforming systems.
>
> With each function or header from the ISO C standard, a statement to
> the effect that "any conflict is unintentional" is included. That is
> intended to refer to a direct conflict. POSIX.1-2008 acts in part as
> a profile of the ISO C standard, and it may choose to further
> constrain behaviors allowed to vary by the ISO C standard.

-- 
jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF  DDCC 0DFA 74AE 1524 E7EE



Re: bug in fputwc(3) error reporting

2015-12-26 Thread Anthony J. Bentley
Hi Jérémie,

Jérémie Courrèges-Anglas writes:
> Hmm, the C standard and POSIX have slightly different texts regarding
> this.
> 
> Quoting POSIX-2013:
> -->8--
>   RETURN VALUE
> 
> Upon successful completion, fputwc() shall return wc. Otherwise, it
> shall return WEOF, the error indicator for the stream shall be set,
> [CX] and errno shall be set to indicate the error.
...
> So, the C standard doesn't say that the error indicator should be set on
> the FILE in case of an encoding error, it only speaks about errno being
> set to EILSEQ.  I'd say that we should follow the C standard here -
> after all "This volume of POSIX.1-2008 defers to the ISO C standard".

The [CX] there means it's an intentional extension of ISO C. This is
more obvious in the HTML copy of POSIX,
http://pubs.opengroup.org/onlinepubs/9699919799/functions/fputwc.html

Quoting more POSIX (the [CX] defintion),

The functionality described is an extension to the ISO C standard.
Application developers may make use of an extension as it is
supported on all POSIX.1-2008-conforming systems.

With each function or header from the ISO C standard, a statement to
the effect that "any conflict is unintentional" is included. That is
intended to refer to a direct conflict. POSIX.1-2008 acts in part as
a profile of the ISO C standard, and it may choose to further
constrain behaviors allowed to vary by the ISO C standard.

-- 
Anthony J. Bentley



Re: bug in fputwc(3) error reporting

2015-12-26 Thread Jérémie Courrèges-Anglas
Ingo Schwarze  writes:

[...]

> When fputwc(3) encounters an encoding error, it neglects to set the
> error indicator, just like fgetwc(3) did before i fixed it today.
> Setting the error indicator is required by the manual and by the
> standard.

Hmm, the C standard and POSIX have slightly different texts regarding
this.

Quoting POSIX-2013:
-->8--
  RETURN VALUE

Upon successful completion, fputwc() shall return wc. Otherwise, it
shall return WEOF, the error indicator for the stream shall be set,
[CX] and errno shall be set to indicate the error.
--<8--

Quoting the C99 draft, section 7.24.3.3:
--8<--
Returns

3 The fputwc function returns the wide character written. If a write
error occurs, the error indicator for the stream is set and fputwc
returns WEOF. If an encoding error occurs, the value of the macro EILSEQ
is stored in errno and fputwc returns WEOF.
-->8--

Note that the C11 draft has the same text.

So, the C standard doesn't say that the error indicator should be set on
the FILE in case of an encoding error, it only speaks about errno being
set to EILSEQ.  I'd say that we should follow the C standard here -
after all "This volume of POSIX.1-2008 defers to the ISO C standard".

The same can be said about the fgetwc(3) diff that was recently
committed, and for which I gave you an ok.

> Instead, it overrides errno again.  That's pointless because
> wcrtomb(3) already did that, and is required to do so by the standard.
> It is even slightly dangerous because it might hide internal coding
> errors in libc.  For example, if libc would ever neglect to initialize
> the state object correctly, wcrtomb(3) might fail with EINVAL.
> That should NOT be plastered over by setting errno to EILSEQ.

I agree that errno doesn't need to be overriden, but please leave _flags
alone.

[...]

> Index: fputwc.c
> ===
> RCS file: /cvs/src/lib/libc/stdio/fputwc.c,v
> retrieving revision 1.6
> diff -u -p -r1.6 fputwc.c
> --- fputwc.c  1 Oct 2015 02:32:07 -   1.6
> +++ fputwc.c  24 Dec 2015 21:01:10 -
> @@ -62,7 +62,7 @@ __fputwc_unlock(wchar_t wc, FILE *fp)
>  
>   size = wcrtomb(buf, wc, st);
>   if (size == (size_t)-1) {
> - errno = EILSEQ;
> + fp->_flags |= __SERR;
>   return WEOF;
>   }
>  


-- 
jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF  DDCC 0DFA 74AE 1524 E7EE



bug in fputwc(3) error reporting

2015-12-24 Thread Ingo Schwarze
Third file, third bug.
SO FAR, EACH AND EVERY BLOODY PLACE I LOOKED AT WAS BROKEN.

When fputwc(3) encounters an encoding error, it neglects to set the
error indicator, just like fgetwc(3) did before i fixed it today.
Setting the error indicator is required by the manual and by the
standard.

Instead, it overrides errno again.  That's pointless because
wcrtomb(3) already did that, and is required to do so by the standard.
It is even slightly dangerous because it might hide internal coding
errors in libc.  For example, if libc would ever neglect to initialize
the state object correctly, wcrtomb(3) might fail with EINVAL.
That should NOT be plastered over by setting errno to EILSEQ.

Consider this test program:


#include 
#include 
#include 

int
main(void)
{
wchar_t ws[2];
int irc;

ws[0] = 0xdc00;  /* UTF-16 surrogate, invalid code point. */
ws[1] = L'\0';
irc = fputws(ws, stdout);
printf("fputws returned %d\n", irc);
printf("error status %d\n", ferror(stdout));
err(1, "fputws");
}


Output before:

   $ ./putws
  fputws returned -1
  error status 0
  putws: fputws: Illegal byte sequence

Output with the patch below:

   $ ./putws 
  fputws returned -1
  error status 1
  putws: fputws: Illegal byte sequence


OK?
  Ingo


Index: fputwc.c
===
RCS file: /cvs/src/lib/libc/stdio/fputwc.c,v
retrieving revision 1.6
diff -u -p -r1.6 fputwc.c
--- fputwc.c1 Oct 2015 02:32:07 -   1.6
+++ fputwc.c24 Dec 2015 21:01:10 -
@@ -62,7 +62,7 @@ __fputwc_unlock(wchar_t wc, FILE *fp)
 
size = wcrtomb(buf, wc, st);
if (size == (size_t)-1) {
-   errno = EILSEQ;
+   fp->_flags |= __SERR;
return WEOF;
}