Re: LC_NUMERIC in wprintf(3)
Hi Todd, Todd C. Miller wrote on Sun, Jan 13, 2019 at 08:06:22AM -0700: > On Sun, 13 Jan 2019 05:20:47 +0100, Ingo Schwarze wrote: >> So adding a subsection to the setlocale(3) manual, containing lists >> of functions that are likely to break when you call setlocale(3) >> with bad arguments on non-OpenBSD systems, might make sense. >> Even if we can't guarantee completeness of these lists, they will >> give programmers a better idea about the approximate scopes and sizes >> of the traps. >> >> Once we have that, very short pointers in individual pages become >> sufficient: one short sentence including .Xr setlocale 3 below >> CAVEATS. Less bloat everywhere, better overview in the one critical >> place. >> >> Does that make sense? > Sure, that would be similar to how we list signal safe functions > in sigaction(2). Here is a first draft. Maybe it is already good enough for commit; it can always be added to when additional dangers are discovered. OK? I'd like to keep the cleanup in the individual pages separate from the commit to setlocale(3), or i fear the diff might get unwieldy. Yours, Ingo Index: setlocale.3 === RCS file: /cvs/src/lib/libc/locale/setlocale.3,v retrieving revision 1.22 diff -u -r1.22 setlocale.3 --- setlocale.3 4 Apr 2018 14:57:51 - 1.22 +++ setlocale.3 13 Jan 2019 18:02:11 - @@ -78,7 +78,8 @@ C library. A category of .Dv LC_ALL -sets the entire locale generically. +sets the entire locale generically, which is strongly discouraged for +security reasons in portable programs. .Pp The syntax and semantics of the .Fa locale @@ -126,6 +127,29 @@ The only function in the library that sets the locale is .Fn setlocale ; the locale is never changed as a side effect of some other routine. +.Pp +The +.Dv LC_CTYPE +category modifies the behaviour of at least the following functions: +.Xr iswctype 3 , +.Xr mblen 3 , +.Xr mbrlen 3 , +.Xr mbrtowc 3 , +.Xr mbsrtowcs 3 , +.Xr mbstowcs 3 , +.Xr mbtowc 3 , +.Xr towctrans 3 , +.Xr towlower 3 , +.Xr towupper 3 , +.Xr wcrtomb 3 , +.Xr wcscasecmp 3 , +.Xr wcsrtombs 3 , +.Xr wcstombs 3 , +.Xr wctomb 3 , +.Xr wctrans 3 , +.Xr wctype 3 , +and the functions documented in +.Xr iswalnum 3 . .Sh RETURN VALUES In case of success, .Fn setlocale @@ -205,3 +229,99 @@ .Fn setlocale function first appeared in .Bx 4.4 . +.Sh CAVEATS +On systems other than +.Ox , +calling +.Fn setlocale +or +.Xr uselocale 3 +with a +.Fa category +other than +.Dv LC_CTYPE +can cause erratic behaviour of many library functions. +For security reasons, make sure that portable programs never do that +but only use +.Dv LC_CTYPE . +.Pp +For example, the following functions may be affected. +The list is probably incomplete. +For example, additional library functions may be impacted +if they directly or indirectly call affected functions, +or if they attempt to imitate aspects of their behaviour. +Functions that are not standardized may be affected, too. +.Bl -tag -width Ds +.It Dv LC_COLLATE +.Xr glob 3 , +.Xr strcoll 3 , +.Xr strxfrm 3 , +.Xr wcscoll 3 , +.Xr wcsxfrm 3 , +and the functions documented in +.Xr regexec 3 +.It Dv LC_MESSAGES +.Xr catgets 3 , +.Xr catopen 3 , +.Xr nl_langinfo 3 , +.Xr perror 3 , +.Xr psignal 3 , +.Xr strerror 3 , +.Xr strsignal 3 , +and the functions documented in +.Xr err 3 +.It Dv LC_MONETARY +.Xr localeconv 3 , +.Xr nl_langinfo 3 , +.Fn strfmon +.It Dv LC_NUMERIC +.Xr atof 3 , +.Xr localeconv 3 , +.Xr nl_langinfo 3 , +.Fn strfmon , +and the functions documented in +.Xr printf 3 , +.Xr scanf 3 , +.Xr strtod 3 , +.Xr wcstod 3 , +.Xr wprintf 3 , +.Xr wscanf 3 . +This category is particularly dangerous because it can cause bugs +in the parsing and formatting of numbers, for example, but not +limited to, failures to recognize or properly write decimal points. +.It Dv LC_TIME +.Fn getdate , +.Xr nl_langinfo 3 , +.Xr strftime 3 , +.Xr strptime 3 . +Similarly, this is prone to causing bugs in the parsing and formatting +of date strings. +.It Dv LC_CTYPE +On systems other than +.Ox , +this category may affect the behaviour of additional functions, +for example: +.Xr btowc 3 , +.Xr isalnum 3 , +.Xr isalpha 3 , +.Xr isblank 3 , +.Xr iscntrl 3 , +.Xr isdigit 3 , +.Xr isgraph 3 , +.Xr islower 3 , +.Xr isprint 3 , +.Xr ispunct 3 , +.Xr isspace 3 , +.Xr isupper 3 , +.Xr isxdigit 3 , +.Xr mbsinit 3 , +.Xr strcasecmp 3 , +.Xr strcoll 3 , +.Xr strxfrm 3 , +.Xr tolower 3 , +.Xr toupper 3 , +.Xr vis 3 , +.Xr wcscoll 3 , +.Xr wcsxfrm 3 , +.Xr wctob 3 +.El
Re: LC_NUMERIC in wprintf(3)
On Sun, 13 Jan 2019 05:20:47 +0100, Ingo Schwarze wrote: > So adding a subsection to the setlocale(3) manual, containing lists > of functions that are likely to break when you call setlocale(3) > with bad arguments on non-OpenBSD systems, might make sense. > Even if we can't guarantee completeness of these lists, they will > give programmers a better idea about the approximate scopes and sizes > of the traps. > > Once we have that, very short pointers in individual pages become > sufficient: one short sentence including .Xr setlocale 3 below > CAVEATS. Less bloat everywhere, better overview in the one critical > place. > > Does that make sense? Sure, that would be similar to how we list signal safe functions in sigaction(2). - todd
Re: LC_NUMERIC in wprintf(3)
Hi Todd, Todd C. Miller wrote on Sat, Jan 12, 2019 at 08:04:31PM -0700: > On Sat, 12 Jan 2019 16:45:46 +0100, Ingo Schwarze wrote: >> I did not attempt to hunt down all offenders. >> Would you regard that as a whorthwhile endeavour? >> Note that would not only require considering bogus leftovers >> in our code, but also considering what other systems might do. >> >> I just thought it made sense to remove misleading statements about >> our own code, but not not without warning about related portability >> issues while there. > I just thought that if we were going to mention this problem at > all, then we should do so in strtod() as well. And since printf(3) > and scanf(3) suffer from the same issue as wprintf(3) and wscanf(3) > I see no reason to only include the warning in the wide versions. Hm, that makes sense. We can mention cases that we are already aware of even if we are not sure the list is complete, even if it is not easy to make it complete. > I'm not advocating that we try to hunt down all possible affected > functions. Actually, I wish there was a single central place we > could document this where people would see it, but I'm not sure > there really is one. I think there is. The only functions that are really dangerous in this respect are setlocale(3) and uselocale(3). As long as you don't call either of these two, not much can go wrong. So adding a subsection to the setlocale(3) manual, containing lists of functions that are likely to break when you call setlocale(3) with bad arguments on non-OpenBSD systems, might make sense. Even if we can't guarantee completeness of these lists, they will give programmers a better idea about the approximate scopes and sizes of the traps. Once we have that, very short pointers in individual pages become sufficient: one short sentence including .Xr setlocale 3 below CAVEATS. Less bloat everywhere, better overview in the one critical place. Does that make sense? Yours, Ingo
Re: LC_NUMERIC in wprintf(3)
On Sat, 12 Jan 2019 16:45:46 +0100, Ingo Schwarze wrote: > I did not attempt to hunt down all offenders. > Would you regard that as a whorthwhile endeavour? > Note that would not only require considering bogus leftovers > in our code, but also considering what other systems might do. > > I just thought it made sense to remove misleading statements about > our own code, but not not without warning about related portability > issues while there. I just thought that if we were going to mention this problem at all, then we should do so in strtod() as well. And since printf(3) and scanf(3) suffer from the same issue as wprintf(3) and wscanf(3) I see no reason to only include the warning in the wide versions. I'm not advocating that we try to hunt down all possible affected functions. Actually, I wish there was a single central place we could document this where people would see it, but I'm not sure there really is one. - todd
Re: LC_NUMERIC in wprintf(3)
Hi Todd, Todd C. Miller wrote on Fri, Jan 11, 2019 at 12:31:02PM -0700: > Don't we have the same issue with printf(3) and scanf(3)? > For example, in src/lib/libc/stdio/vfprintf.c:1022 > > decimal_point = nl_langinfo(RADIXCHAR); > > Also, anything using strtod(3), including scanf(3), is likewise > affected by gdtoa's use of localeconv()->decimal_point. Sure. I did not attempt to hunt down all offenders. Would you regard that as a whorthwhile endeavour? Note that would not only require considering bogus leftovers in our code, but also considering what other systems might do. I just thought it made sense to remove misleading statements about our own code, but not not without warning about related portability issues while there. Yours, Ingo
Re: LC_NUMERIC in wprintf(3)
Don't we have the same issue with printf(3) and scanf(3)? For example, in src/lib/libc/stdio/vfprintf.c:1022 decimal_point = nl_langinfo(RADIXCHAR); Also, anything using strtod(3), including scanf(3), is likewise affected by gdtoa's use of localeconv()->decimal_point. - todd
Re: LC_NUMERIC in wprintf(3)
Hi, Jan Stary wrote on Thu, Jan 10, 2019 at 03:15:38PM +0100: > same for wscanf(3): Fixed, but in a way similar to wprintf(3). Yours, Ingo $ man wscanf | col -b | grep -A 6 ^CAVEATS CAVEATS On systems other than OpenBSD, the LC_NUMERIC locale(1) category can cause parsing failures, for example because it may cause a different character to be expected instead of the decimal point. For security reasons, make sure that portable programs calling these functions never call setlocale(3) with the LC_ALL or LC_NUMERIC categories but only use LC_CTYPE.
Re: LC_NUMERIC in wprintf(3)
Hi Ted & Scott, Ted Unangst wrote on Fri, Jan 11, 2019 at 04:03:32AM -0500: > Ingo Schwarze wrote: >> That stuff is a very serious trap not only for the unwary, but also >> for experienced programmers. Less than a month ago, i discussed >> the matter with a senior programmer from the GNU project who has >> been maintaining important software over there for a long time and >> guess what he said? Something like "yes, you are right, that stuff >> is dangerous. I had a nasty bug in one of my programs just last >> week because a printf mangled numbers in a locale-dependent way i >> didn't anticipate." >> >> I think this is exactly what CAVEATS is good for. > ok. mangled numbers from code generators is one of my favorites. :) Committed. Thanks for checking! Ingo
Re: LC_NUMERIC in wprintf(3)
Ingo Schwarze wrote: > That stuff is a very serious trap not only for the unwary, but also > for experienced programmers. Less than a month ago, i discussed > the matter with a senior programmer from the GNU project who has > been maintaining important software over there for a long time and > guess what he said? Something like "yes, you are right, that stuff > is dangerous. I had a nasty bug in one of my programs just last > week because a printf mangled numbers in a locale-dependent way i > didn't anticipate." > > I think this is exactly what CAVEATS is good for. ok. mangled numbers from code generators is one of my favorites. :)
Re: LC_NUMERIC in wprintf(3)
On Jan 10 14:43:39, h...@stare.cz wrote: > The wprintf(3) manpage says > > The decimal point character is defined > in the program's locale (category LC_NUMERIC) > > but LC_NUMERIC is ignored in OpenBSD's C library, > as explained in setlocale(3). > > Would it be an improvement to remove that sentence? > (Removing a needless newline while here.) > > Jan > > > Index: wprintf.3 > === > RCS file: /cvs/src/lib/libc/stdio/wprintf.3,v > retrieving revision 1.5 > diff -u -p -r1.5 wprintf.3 > --- wprintf.3 1 Dec 2017 10:56:07 - 1.5 > +++ wprintf.3 10 Jan 2019 13:29:41 - > @@ -578,14 +578,9 @@ is > .Ql %% . > .El > .Pp > -The decimal point > -character is defined in the program's locale (category > -.Dv LC_NUMERIC ) . > -.Pp > In no case does a non-existent or small field width cause truncation of > a numeric field; if the result of a conversion is wider than the field > -width, the > -field is expanded to contain the conversion result. > +width, the field is expanded to contain the conversion result. > .Sh SEE ALSO > .Xr btowc 3 , > .Xr fputws 3 , same for wscanf(3): Index: wscanf.3 === RCS file: /cvs/src/lib/libc/stdio/wscanf.3,v retrieving revision 1.2 diff -u -p -r1.2 wscanf.3 --- wscanf.32 Nov 2011 22:29:07 - 1.2 +++ wscanf.310 Jan 2019 14:14:08 - @@ -389,10 +389,6 @@ a conversion, although it can be suppres flag. .El .Pp -The decimal point -character is defined in the program's locale (category -.Dv LC_NUMERIC ) . -.Pp For backwards compatibility, a .Dq conversion of
Re: LC_NUMERIC in wprintf(3)
Hi Jan, Jan Stary wrote on Thu, Jan 10, 2019 at 02:43:39PM +0100: > The wprintf(3) manpage says > > The decimal point character is defined > in the program's locale (category LC_NUMERIC) > > but LC_NUMERIC is ignored in OpenBSD's C library, > as explained in setlocale(3). Right, i agree that is bogus. > Would it be an improvement to remove that sentence? I wouldn't remove it outright. That stuff is a very serious trap not only for the unwary, but also for experienced programmers. Less than a month ago, i discussed the matter with a senior programmer from the GNU project who has been maintaining important software over there for a long time and guess what he said? Something like "yes, you are right, that stuff is dangerous. I had a nasty bug in one of my programs just last week because a printf mangled numbers in a locale-dependent way i didn't anticipate." I think this is exactly what CAVEATS is good for. OK? Ingo Index: wprintf.3 === RCS file: /cvs/src/lib/libc/stdio/wprintf.3,v retrieving revision 1.5 diff -u -p -r1.5 wprintf.3 --- wprintf.3 1 Dec 2017 10:56:07 - 1.5 +++ wprintf.3 10 Jan 2019 15:29:20 - @@ -225,17 +225,17 @@ A .Cm + overrides a space if both are used. .It Sq Cm ' -Decimal conversions -.Cm ( d , u , +On +.Ox , +this flag has no effect. +On other systems, it may cause the insertion of +.Xr locale 1 Ns -dependent +thousands separator characters into the integral parts of arguments +of the +.Cm d , i , u , f , or -.Cm i ) -or the integral portion of a floating point conversion -.Cm ( f -or -.Cm F ) -should be grouped and separated by thousands using -the non-monetary separator returned by -.Xr localeconv 3 . +.Cm F +conversions. .El .It An optional decimal digit string specifying a minimum field width. @@ -578,10 +578,6 @@ is .Ql %% . .El .Pp -The decimal point -character is defined in the program's locale (category -.Dv LC_NUMERIC ) . -.Pp In no case does a non-existent or small field width cause truncation of a numeric field; if the result of a conversion is wider than the field width, the @@ -605,3 +601,21 @@ and functions conform to .St -isoC-99 . +.Sh CAVEATS +On systems other than +.Ox , +the +.Dv LC_NUMERIC +.Xr locale 1 +category can cause erratic output, for example different characters +substituted for the decimal point or thousands separator characters +inserted into numbers. +For security reasons, make sure that portable programs calling these +functions never call +.Xr setlocale 3 +with the +.Dv LC_ALL +or +.Dv LC_NUMERIC +categories but only use +.Dv LC_CTYPE .
LC_NUMERIC in wprintf(3)
The wprintf(3) manpage says The decimal point character is defined in the program's locale (category LC_NUMERIC) but LC_NUMERIC is ignored in OpenBSD's C library, as explained in setlocale(3). Would it be an improvement to remove that sentence? (Removing a needless newline while here.) Jan Index: wprintf.3 === RCS file: /cvs/src/lib/libc/stdio/wprintf.3,v retrieving revision 1.5 diff -u -p -r1.5 wprintf.3 --- wprintf.3 1 Dec 2017 10:56:07 - 1.5 +++ wprintf.3 10 Jan 2019 13:29:41 - @@ -578,14 +578,9 @@ is .Ql %% . .El .Pp -The decimal point -character is defined in the program's locale (category -.Dv LC_NUMERIC ) . -.Pp In no case does a non-existent or small field width cause truncation of a numeric field; if the result of a conversion is wider than the field -width, the -field is expanded to contain the conversion result. +width, the field is expanded to contain the conversion result. .Sh SEE ALSO .Xr btowc 3 , .Xr fputws 3 ,