Hi Theo,

it's bad that i slacked on this, causing people to spend so much effort,
but i failed to make up my mind at first. I think now i see clearly.

Theo Buehler wrote on Tue, Nov 14, 2017 at 09:26:47AM +0100:

> There is a simplification and optimization for __vfprintf()
> from android pointed out by tedu:
> 
> https://github.com/aosp-mirror/platform_bionic/commit/5305a4d4a723b06494b93f2df81733b83a0c46d3
> 
> If we only support UTF-8 and ASCII, we do not need complicated multibyte
> decoding to recognize a '%' in the format string.

While that is true, the multibyte decoding is still required to
validate the format string, which is in turn required for correct
operation.  Consequently, the diff is incorrect and introduces
a bug.

POSIX says:
(http://pubs.opengroup.org/onlinepubs/9699919799/functions/printf.html)

  DESCRIPTION
    The format is a character string, [...]

  ERRORS
    In addition, all forms of fprintf() shall fail if:

    [EILSEQ]
      A wide-character code that does not correspond to a valid
      character has been detected.

That means that the functions are *required* to fail ("shall fail")
if encoding errors can be detected, that -1 must be returned, and
that errno must be set.

Even if someone finds a loophole to evade this reading of the
standard, after some thought, i consider the change reckless.
If somebody is using *printf(3) to format text output under a UTF-8
locale, the output is likely intended to be consumed by arbitrary
other multibyte-character-handling software, and such software,
in practice, is often of miserable quality regarding the handling
of encoding errors, and often misbehaves in scary ways when fed
invalidly encoded input.  DOS-like exploitability is a common
consequence, and arbitrary data corruption and other unpredictable
behaviour may also ensue.

Invalid UTF-8 encoding in a printf format string under the UTF-8
locale is invariably a severe coding error.  It is the job of the
operating system to catch dangerous coding errors as early as
possible, even if the standards would not absolutely require it,
and in particular if even the standards allow the interpretation
that the system is required to catch such errors.

While variable format strings are a thoroughly bad idea in the
vast majority of cases, they are not illegal, and there might
be rare valid use cases where this validation is even more
important than in the case of constant format strings.

> In his commit message, enh claims that there is a 10x speedup.

That's completely irrelevant.  Three things are at stake here:
correctness, simplicity, and performance.  Simplicity and performance
obviously point into the same direction, so there is no need to
even measure performance.  But correctness unconditionally
trumps both.

Sorry again,
  Ingo

Reply via email to