Ingo Schwarze wrote on Fri, Nov 17, 2017 at 03:07:48PM +0100: [ regarding cases where this may matter in practice ] > (2) Programs legitimately calling *printf() with a variable format > string in any non-POSIX locale, even if it's just UTF-8.
Whoa. I just realized there is a very widespread subclass of this: Programs using gettext(3) for printf(3) format strings. They are easy to grep for with /printf\(_/. Here are a few examples: aspell-0.60.6.1/prog/aspell.cpp: setlocale (LC_ALL, ""); aspell-0.60.6.1/prog/aspell.cpp: printf(_("\n %s filter: %s\n"), ... and many other instances, but i don't see any return value checks avahi-0.7/avahi-utils/avahi-browse.c: setlocale(LC_ALL, ""); avahi-0.7/avahi-utils/avahi-browse.c: printf(_(": Cache exhausted\n")); and some others, again no return value checks e2fsprogs-1.42.12/resize/main.c: setlocale(LC_CTYPE, ""); e2fsprogs-1.42.12/resize/online.c: printf(_("Filesystem at %s is mounted... and hundreds of others, no return value checks geeqie-1.3/src/main.c: setlocale(LC_ALL, ""); geeqie-1.3/src/main.c: log_printf(_("Creating %s dir:%s\n"), ... dozens of them, no return value checks in sight git-2.14.1/gettext.c: setlocale(LC_CTYPE, ""); git-2.14.1/builtin/merge.c: printf(_("Wonderful.\n")); hundreds of them, hard to say whether there are any return value checks, quite possibly not gnupg-2.1.23/common/i18n.c: setlocale (LC_ALL, "" ); gnupg-2.1.23/g10/keygen.c: tty_printf(_("Invalid selection.\n")); dozens of them, no return value checks in sight There are also many ports using g_strdup_printf(3) in this way, no idea what that does, but is seems likely to call *printf(3) internally in some way. The show goes on (without checking for setlocale(3) to save time): gnutls, libiconv, libv4l, mutt, openjp2, openldap, postgresql, vlc, xz, ... These are just some ports that i happened to build from source recently. So basically, almost *everybody* is using this, but hardly anybody ever checks for success or failure. When the return value is not checked, the change still makes the following difference: Without the change, an invalidly encoded format string prints nothing. With the change, an invalidly encoded format string prints invalidly encoded output. The former may sometimes be safer, but the missing information might sometimes lead to trouble. That said, i just checked what glibc and commercial Solaris 11 do, and lo and behold: schwarze@donnerwolke:~/Test/printf$ uname -a Linux donnerwolke.asta-kit.de 4.9.0-0.bpo.3-686 #1 SMP \ Debian 4.9.30-2+deb9u2~bpo8+1 (2017-06-27) i686 GNU/Linux schwarze@donnerwolke:~/Test/printf$ cat printf.c #include <err.h> #include <locale.h> #include <stdio.h> int main(void) { int irc; if (setlocale(LC_CTYPE, "en_US.UTF-8") == NULL) errx(1, "setlocale"); irc = printf("start\xc3\xa9middle\xc3%s\n", "end"); printf("%d\n", irc); return 0; } schwarze@donnerwolke:~/Test/printf$ make printf cc printf.c -o printf schwarze@donnerwolke:~/Test/printf$ ./printf > tmp.txt ; echo $? 0 schwarze@donnerwolke:~/Test/printf$ hexdump -C tmp.txt 00000000 73 74 61 72 74 c3 a9 6d 69 64 64 6c 65 c3 65 6e |start..middle.en| 00000010 64 0a 31 38 0a |d.18.| 00000015 Same thing on Solaris. Judging from a superficial look at the FreeBSD and NetBSD sources, they don't appear to validate the format either. So even if my reading of the standard should be correct (which some here have challenged), given that everybody else here intuitively expected the function to behave differently, that the stuff is actually used a lot in practice, and given that most (if not all) other implementations appear to behave the way that people intuitively expect, i think i should stand down and no longer object to the change. Maybe i should consider a small clarification in the manual page afterwards, not yet sure whether that is needed. I don't think, though, that the commit message should advertise this as a performance improvement. It should be called an intentional change of behaviour, now using the format string as a byte string like everyone else, no matter whether POSIX explicitly specifies it as a character string instead. Yours, Ingo