Hi Theo,

Quick answer, more later:

Theo de Raadt wrote on Thu, Nov 16, 2017 at 09:52:39AM -0700:
> Todd Miller wrote:

>> Also, POSIX isn't explicit as to whether that restriction applies
>> to the format string or just the arguments to %lc and %ls conversions.
>> 
>> What it does say is:
>> 
>>     The format is composed of zero or more directives: ordinary
>>     characters, which are simply copied to the output stream, and
>>     conversion specifications, each of which shall result in the
>>     fetching of zero or more arguments.

> Well that says the format string is a string, not a wide string.

There are three kinds of strings, not two: byte strings, multibyte
strings, and wide strings.  You are confusing wide strings and
multibyte strings.  The format is certainly not a wide string.
It is a multibyte string; that is what the use of the word "character"
indicates.  If it were a byte string, POSIX would talk about bytes
instead; see, for example, how it describes %s:

  The argument shall be a pointer to an array of char.  Bytes from
  the array shall be written up to (but not including) any terminating
  null byte.
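To make the three kinds concrete, here is a small sketch.  The helper names (use_utf8_locale, byte_count, char_count, wide_count) are hypothetical, and the list of UTF-8 locale names tried is only a guess, because which names exist is system-dependent.  The point is that a byte string is measured in bytes regardless of locale, a multibyte string is measured in characters according to LC_CTYPE (and can be invalid), and a wide string is an array of wchar_t with one element per character.

```c
#include <locale.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Try a few common UTF-8 locale names for LC_CTYPE; which of them
 * exist is system-dependent, so this is only a convenience guess. */
static bool use_utf8_locale(void)
{
	static const char *const names[] = { "C.UTF-8", "en_US.UTF-8", "C.utf8" };

	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		if (setlocale(LC_CTYPE, names[i]) != NULL)
			return true;
	return false;
}

/* Byte string: counted in bytes, independent of the locale. */
static size_t byte_count(const char *s)
{
	return strlen(s);
}

/* Multibyte string: counted in characters according to LC_CTYPE.
 * Returns (size_t)-1 if s is not a valid multibyte string. */
static size_t char_count(const char *s)
{
	return mbstowcs(NULL, s, 0);
}

/* Wide string: an array of wchar_t, one element per character. */
static size_t wide_count(const wchar_t *ws)
{
	return wcslen(ws);
}
```

Under a UTF-8 locale, "\xc3\xa9t\xc3\xa9" ("été") has a byte count of 5 but a character count of 3, the wide string L"\u00e9t\u00e9" has 3 elements, and char_count("\xfft") fails with (size_t)-1 because the byte 0xff never occurs in valid UTF-8.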


There isn't the slightest doubt that passing a format containing
non-UTF-8 bytes under a UTF-8 locale is invalid.  The only question
is whether the standard merely leaves that undefined or whether it
requires failure.  If it requires failure, some think we should
ignore the standard; I say that isn't safe in this case (I'll explain
why in more detail later).  If it is undefined behaviour, some seem
to say it doesn't matter; I say failing closed is safer even then.
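A sketch of what "failing closed" means from the caller's side: checked_snprintf() below is a hypothetical wrapper, not how any libc implements *printf().  If the format is not a valid multibyte string in the current LC_CTYPE locale, it produces no output, sets errno to EILSEQ and returns -1, instead of printing a partial or garbled result.  Using mbstowcs(3) for the validity check is just a brief way to do it.

```c
#include <errno.h>
#include <locale.h>   /* callers select LC_CTYPE with setlocale() */
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: if fmt is not a valid multibyte string in the
 * current LC_CTYPE locale, fail closed: write an empty string, set
 * errno to EILSEQ and return -1, like a failed *printf() call. */
static int checked_vsnprintf(char *buf, size_t size, const char *fmt, va_list ap)
{
	if (mbstowcs(NULL, fmt, 0) == (size_t)-1) {
		if (size > 0)
			buf[0] = '\0';
		errno = EILSEQ;
		return -1;
	}
	return vsnprintf(buf, size, fmt, ap);
}

static int checked_snprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int ret;

	va_start(ap, fmt);
	ret = checked_vsnprintf(buf, size, fmt, ap);
	va_end(ap);
	return ret;
}
```

Under a UTF-8 locale, checked_snprintf(buf, sizeof(buf), "caf\xc3\xa9 %d", 1) prints "café 1" and returns 7, while the same call with the Latin-1 format "caf\xe9 %d" returns -1 with errno set to EILSEQ and leaves buf empty.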

> I think EILSEQ and -1 are intended to apply entirely to failed
> conversions,

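That is not true.  For example, mblen(3) is specified to return -1
and set errno to EILSEQ for an invalid character sequence, and it
does so.  mblen(3) converts nothing; it only reports how many bytes
the next character occupies.  So EILSEQ is used for pure validation
elsewhere, too, not only for failed conversions.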
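A minimal demonstration of mblen(3) used purely for validation.  first_char_len() is a hypothetical helper that reports how many bytes the first character of s occupies, or -1 for an invalid sequence.  Checking errno for EILSEQ assumes an implementation that sets it here, as glibc and musl do, for example; POSIX itself only lists EILSEQ as a "may fail" error for mblen().

```c
#include <errno.h>
#include <locale.h>   /* callers select LC_CTYPE with setlocale() */
#include <stdlib.h>

/* Length in bytes of the first character of s (looking at no more
 * than n bytes), 0 for the empty string, or -1 for an invalid
 * sequence.  mblen() stores no converted character; it only measures. */
static int first_char_len(const char *s, size_t n)
{
	(void)mblen(NULL, 0);  /* reset mblen's internal shift state */
	errno = 0;
	return mblen(s, n);
}
```

Under a UTF-8 locale, first_char_len("\xc3\xa9", 2) returns 2, first_char_len("a", 1) returns 1, and first_char_len("\xff", 1) returns -1, typically with errno set to EILSEQ.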

> and these checks were mistakenly added to printf a while
> ago.

The *printf() functions have set EILSEQ in these cases ever since
revision 1.1 in 1995.

Yours,
  Ingo
