Hi,
This topic is quite tricky, so I'm sending it to a wider audience.
The patch below improves printf(1) in three respects, all related
to the locale(1):
1. It fixes an outright documentation bug. According to both
our implementation and the POSIX standard, %s with a
precision limits the number of bytes written, not the
number of characters, and chops up characters when there
is insufficient space.
2. It mentions that our printf(1) does not have any locale(1)
support. I consider that intentional. I don't want support
for LC_NUMERIC, ever. It's insane.
3. It simplifies the code by removing the no-op setlocale(3).
That part of the diff is from Jan Stary.
To understand what is going on, first consider printf(3).
1. That C library function *is* locale-dependent, even in OpenBSD,
even now, in particular with respect to %lc and %ls. But the utility
program printf(1) does not support %lc and %ls, and supporting them
wouldn't make sense because you cannot provide wide characters in a shell
program, only multibyte characters, and the latter are handled by
plain %s. So even though printf(1) does of course call printf(3),
it does not exercise any of the locale-dependent code paths.
2. OpenBSD printf(3) locale support is not as complete as required
by POSIX. All the other parts not discussed above, in particular
support for LC_NUMERIC, don't exist.
For these two reasons, the printf.c patch implies no functional
change. It only improves simplicity and clarity.
OK?
Ingo
P.S.
In my TODO-UTF8.txt file on cvs.openbsd.org, I have the following
note in the chapter "programs believed to be UTF-8 transparent",
which may also help to understand the situation.
printf [%c is explicitly specified to deal with bytes only, not
with characters. %s without a precision is transparent.
%s with a precision takes a field width in bytes, not
in display positions, and the standard mandates that
characters be chopped, which we do. As opposed to
printf(3), printf(1) should not and does not support
%lc and %ls modifiers. Consequently, nothing to do.]
Index: printf.1
===================================================================
RCS file: /cvs/src/usr.bin/printf/printf.1,v
retrieving revision 1.29
diff -u -p -r1.29 printf.1
--- printf.1 28 Feb 2015 21:51:57 -0000 1.29
+++ printf.1 27 Oct 2016 15:45:55 -0000
@@ -196,7 +196,7 @@ for
.Cm e
and
.Cm f
-formats, or the maximum number of characters to be printed
+formats, or the maximum number of bytes to be printed
from a string; if the digit string is missing, the precision is treated
as zero.
.It Format:
@@ -338,7 +338,7 @@ is printed.
.It Cm s
Characters from the string
.Ar argument
-are printed until the end is reached or until the number of characters
+are printed until the end is reached or until the number of bytes
indicated by the precision specification is reached; however if the
precision is 0 or missing, all characters in the string are printed.
.It Cm \&%
@@ -369,7 +369,12 @@ The
.Nm
utility is compliant with the
.St -p1003.1-2008
-specification.
+specification, but in order to produce predictable output,
+it deliberately ignores the
+.Xr locale 1
+and always operates as if
+.Ev LC_ALL Ns =C
+were set.
.Pp
The escape sequences \ee and \e' are extensions to that specification.
.Sh HISTORY
Index: printf.c
===================================================================
RCS file: /cvs/src/usr.bin/printf/printf.c,v
retrieving revision 1.25
diff -u -p -r1.25 printf.c
--- printf.c 27 Jul 2016 01:52:03 -0000 1.25
+++ printf.c 27 Oct 2016 15:45:55 -0000
@@ -30,14 +30,13 @@
*/
#include <ctype.h>
+#include <err.h>
+#include <errno.h>
+#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
-#include <unistd.h>
#include <string.h>
-#include <limits.h>
-#include <locale.h>
-#include <errno.h>
-#include <err.h>
+#include <unistd.h>
static int print_escape_str(const char *);
static int print_escape(const char *);
@@ -50,7 +49,6 @@ static unsigned long getulong(void);
static char *getstr(void);
static char *mklong(const char *, int);
static void check_conversion(const char *, const char *);
-static void usage(void);
static int rval;
static char **gargv;
@@ -80,8 +78,6 @@ main(int argc, char *argv[])
char convch, nextch;
char *format;
- setlocale (LC_ALL, "");
-
if (pledge("stdio", NULL) == -1)
err(1, "pledge");
@@ -92,7 +88,7 @@ main(int argc, char *argv[])
}
if (argc < 2) {
- usage();
+ fprintf(stderr, "usage: printf format [argument ...]\n");
return (1);
}
@@ -496,10 +492,4 @@ check_conversion(const char *s, const ch
warnc(ERANGE, "%s", s);
rval = 1;
}
-}
-
-static void
-usage(void)
-{
- (void)fprintf(stderr, "usage: printf format [argument ...]\n");
}