Hi,

This topic is quite tricky, so I'm sending it to a wider audience.

The patch below improves how printf(1) deals with the locale(1)
in three respects:

 1. It fixes an outright documentation bug.  According to both
    our implementation and the POSIX standard, %s with a
    precision limits the number of bytes written, not the
    number of characters, and chops up characters when there
    is insufficient space.

 2. It mentions that our printf(1) does not have any locale(1)
    support.  I consider that intentional.  I don't want support
    for LC_NUMERIC, ever.  It's insane.

 3. It simplifies the code by removing the no-op setlocale(3).
    That part of the diff is from Jan Stary.
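
To illustrate item 1, here is a minimal sketch in terms of printf(3),
which printf(1) calls for %s.  The helper name format_prefix is made
up for this mail and is not part of printf.c; the input is assumed
to be UTF-8.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of printf.c: format at most
 * `precision` bytes of `s` into `buf` with "%.*s".
 * Returns the number of bytes written, or -1 on error or truncation.
 */
static int
format_prefix(char *buf, size_t bufsize, const char *s, int precision)
{
	int n;

	assert(buf != NULL && bufsize > 0);

	n = snprintf(buf, bufsize, "%.*s", precision, s);
	if (n < 0 || (size_t)n >= bufsize)
		return -1;
	return n;
}
```

With the three-byte UTF-8 string "a\xc3\xa9" (an "a" followed by
e-acute) and a precision of 2, buf ends up holding the byte 'a' and
the lone lead byte 0xc3, so the two-byte character is cut in half.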

To understand what is going on, first consider printf(3).

1. That C library function *is* locale-dependent, even in OpenBSD,
even now, in particular with respect to %lc and %ls.  But the utility
program printf(1) does not support %lc and %ls, and supporting them
wouldn't make sense because you cannot provide wide characters in a shell
program, only multibyte characters, and the latter are handled by
plain %s.  So even though printf(1) does of course call printf(3),
it does not exercise any of the locale-dependent code paths.

2. OpenBSD printf(3) locale support is not as complete as required
by POSIX.  All the other parts not discussed above, in particular
support for LC_NUMERIC, don't exist.

For these two reasons, the printf.c patch implies no functional
change.  It only improves simplicity and clarity.
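
For plain %s, this can be checked with a small sketch like the
following.  The function name percent_s_is_transparent is made up
for this mail, and the check covers %s only, not the numeric
conversions.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical check, not part of printf.c: returns 1 if "%s" copies
 * the bytes of `s` unchanged both in the "C" locale and after
 * setlocale(LC_ALL, ""), and 0 otherwise.
 */
static int
percent_s_is_transparent(const char *s)
{
	char before[64], after[64];
	int n1, n2;

	assert(strlen(s) < sizeof(before));

	setlocale(LC_ALL, "C");
	n1 = snprintf(before, sizeof(before), "%s", s);
	setlocale(LC_ALL, "");		/* the call the patch removes */
	n2 = snprintf(after, sizeof(after), "%s", s);
	setlocale(LC_ALL, "C");		/* restore the default */

	return n1 == n2 && n1 == (int)strlen(s) &&
	    memcmp(before, s, (size_t)n1) == 0 &&
	    memcmp(after, s, (size_t)n1) == 0;
}
```

Whatever the environment says, the bytes come out the way they went
in, because %s without the l modifier never looks at the locale.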

OK?
  Ingo


P.S.
In my TODO-UTF8.txt file on cvs.openbsd.org, I have the following
note in the chapter "programs believed to be UTF-8 transparent",
which may also help to understand the situation.

printf [%c is explicitly specified to deal with bytes only, not
        with characters.  %s without a precision is transparent.
        %s with a precision takes a field width in bytes, not
        in display positions, and the standard mandates that
        characters be chopped, which we do.  As opposed to
        printf(3), printf(1) should not and does not support
        %lc and %ls modifiers.  Consequently, nothing to do.]


Index: printf.1
===================================================================
RCS file: /cvs/src/usr.bin/printf/printf.1,v
retrieving revision 1.29
diff -u -p -r1.29 printf.1
--- printf.1    28 Feb 2015 21:51:57 -0000      1.29
+++ printf.1    27 Oct 2016 15:45:55 -0000
@@ -196,7 +196,7 @@ for
 .Cm e
 and
 .Cm f
-formats, or the maximum number of characters to be printed
+formats, or the maximum number of bytes to be printed
 from a string; if the digit string is missing, the precision is treated
 as zero.
 .It Format:
@@ -338,7 +338,7 @@ is printed.
 .It Cm s
 Characters from the string
 .Ar argument
-are printed until the end is reached or until the number of characters
+are printed until the end is reached or until the number of bytes
 indicated by the precision specification is reached; however if the
 precision is 0 or missing, all characters in the string are printed.
 .It Cm \&%
@@ -369,7 +369,12 @@ The
 .Nm
 utility is compliant with the
 .St -p1003.1-2008
-specification.
+specification, but in order to produce predictable output,
+it deliberately ignores the
+.Xr locale 1
+and always operates as if
+.Ev LC_ALL Ns =C
+were set.
 .Pp
 The escape sequences \ee and \e' are extensions to that specification.
 .Sh HISTORY
Index: printf.c
===================================================================
RCS file: /cvs/src/usr.bin/printf/printf.c,v
retrieving revision 1.25
diff -u -p -r1.25 printf.c
--- printf.c    27 Jul 2016 01:52:03 -0000      1.25
+++ printf.c    27 Oct 2016 15:45:55 -0000
@@ -30,14 +30,13 @@
  */
 
 #include <ctype.h>
+#include <err.h>
+#include <errno.h>
+#include <limits.h>
 #include <stdio.h>
 #include <stdlib.h>
-#include <unistd.h>
 #include <string.h>
-#include <limits.h>
-#include <locale.h>
-#include <errno.h>
-#include <err.h>
+#include <unistd.h>
 
 static int      print_escape_str(const char *);
 static int      print_escape(const char *);
@@ -50,7 +49,6 @@ static unsigned long getulong(void);
 static char    *getstr(void);
 static char    *mklong(const char *, int); 
 static void      check_conversion(const char *, const char *);
-static void     usage(void); 
      
 static int     rval;
 static char  **gargv;
@@ -80,8 +78,6 @@ main(int argc, char *argv[])
        char convch, nextch;
        char *format;
 
-       setlocale (LC_ALL, "");
-
        if (pledge("stdio", NULL) == -1)
                err(1, "pledge");
 
@@ -92,7 +88,7 @@ main(int argc, char *argv[])
        }
 
        if (argc < 2) {
-               usage();
+               fprintf(stderr, "usage: printf format [argument ...]\n");
                return (1);
        }
 
@@ -496,10 +492,4 @@ check_conversion(const char *s, const ch
                warnc(ERANGE, "%s", s);
                rval = 1;
        }
-}
-
-static void
-usage(void)
-{
-       (void)fprintf(stderr, "usage: printf format [argument ...]\n");
 }
