When committing ksh(1) vi input mode UTF-8 support recently,
i added a setlocale(3) call to the shell.  Now, when considering
how to document LC_CTYPE in the ksh(1) manual, i realized that
inspecting that variable is not really useful, so we can simplify
things; see the diff below.

First, note that emacs mode doesn't use LC_CTYPE in the first
place, nor does anything else in the shell except vi mode.

Assuming you use vi mode (VISUAL=vi), three settings influence
how things work for you:

 1. Whether escaping non-ASCII bytes is disabled (set +o vi-show8,
    the default) or enabled (set -o vi-show8).

 2. LC_CTYPE=C (the default) or UTF-8.

 3. Whether your xterm is UTF-8 enabled (-u8, the default)
    or not (+u8).

So there are 2^3 = 8 combinations.  Let's look at them
in turn.

 A. Escaping non-ASCII bytes enabled (set -o vi-show8):

    In this case, the diff changes nothing because isu8cont()
    already returns 0 unconditionally, so LC_CTYPE is already
    effectively ignored.

    I considered whether this mode is useful at all or whether
    it might be better to just delete the vi-show8 switch
    outright and simplify the code.

    But there is a potential use case, however rare it may be:
    Sometimes, you may want to edit individual raw 8-bit bytes
    on shell command lines, either for testing purposes or
    to call programs that require binary command line arguments
    or input.  That wish may occur for any LC_CTYPE setting
    and on any terminal.

    So 4 of the 8 cases are taken care of so far.

In the following, we know that non-ASCII bytes will not be
escaped (set +o vi-show8).

 B. The user wants to use UTF-8:

    In that case, having LC_CTYPE=en_US.UTF-8 is required
    and the patch changes nothing.

    Of course, the xterm must also be UTF-8 enabled.
    The combination with +u8 is just useless and dangerous,
    with or without the patch.

    So far, 6 of the 8 cases are taken care of.

What remains is set +o vi-show8 with LC_CTYPE=C (the default, actually).

 C. On a UTF-8 terminal (This is the default case!):

    In this case, if the user presses non-ASCII keys on the
    keyboard, they will result in UTF-8-encoded multibyte
    strings in the shell's internal buffers and they will be
    shown as Unicode glyphs.

    In that situation, respecting LC_CTYPE=C allows the user
    to move the cursor to individual bytes, but without being
    able to see where the cursor really is, and to delete and
    insert single bytes in the middle of characters, causing
    the display to show stuff that disagrees with the actual
    content of the buffers.  This is not useful at all and
    potentially dangerous.

    The diff below actually makes things better.  It improves
    the chances that the display remains consistent with actual
    buffer content, by effectively editing in UTF-8 mode.

    Admittedly, that may not be what the user wants.
    But if the user really wants to mess with arbitrary bytes,
    they ought to set -o vi-show8 as explained above.
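
    To illustrate what "effectively editing in UTF-8 mode" means
    for cursor movement, here is a minimal sketch.  It is not the
    actual vi.c code: the helper names is_cont(), next_byte() and
    next_char() are made up for this illustration, and only the
    bit test is taken from the diff below.

```c
#include <stddef.h>

/*
 * Same bit test as in isu8cont(), but without the vi-show8 flag:
 * UTF-8 continuation bytes have the form 10xxxxxx.
 */
int
is_cont(unsigned char c)
{
	return (c & (0x80 | 0x40)) == 0x80;
}

/* Byte-wise movement: advance by exactly one byte. */
size_t
next_byte(size_t len, size_t pos)
{
	return pos < len ? pos + 1 : len;
}

/*
 * Character-wise movement: advance by one byte, then skip any
 * continuation bytes so the cursor lands on the start of the
 * next character.
 */
size_t
next_char(const char *buf, size_t len, size_t pos)
{
	if (pos < len)
		pos++;
	while (pos < len && is_cont((unsigned char)buf[pos]))
		pos++;
	return pos;
}
```

    For the four-byte buffer 'a', 0xc3, 0xa4, 'b' (a, a-umlaut, b),
    next_byte() called at position 1 lands on position 2, in the
    middle of the two-byte character, whereas next_char() lands on
    position 3, the 'b'.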

 D. On a terminal in non-UTF-8 legacy latin-1 mode:

    The only reason i can imagine for using such a mode combination
    is to manipulate arbitrary bytes individually, but actually,
    this mode combination is unusable for that purpose because
    several bytes will corrupt or lock up the terminal, both with
    and without this patch.  So it doesn't really matter that the
    patch changes behaviour here; the mode is useless and dangerous
    in the first place.  Besides, the mode is hardly usable at all
    because most characters that can be entered are interpreted and
    shown as ISO-LATIN-1 characters, which is not a useful way to
    represent arbitrary binary bytes.

So, the patch

 - makes the code simpler,
 - changes nothing for many use cases,
 - improves the default use case, and
 - otherwise only affects a mode combination that is useless anyway.
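
The effect of the vi.c change can be spelled out as a small sketch.
The two functions below are hypothetical stand-ins for the old and
the new isu8cont(): the vi-show8 flag and the value of MB_CUR_MAX
are passed as explicit parameters (show8 and mb_cur_max, names
assumed for this illustration) instead of being read from
Flag(FVISHOW8) and from the locale.

```c
#include <stdbool.h>

/* Old test: continuation bytes only count in a multibyte locale. */
bool
isu8cont_old(unsigned char c, int mb_cur_max, bool show8)
{
	return mb_cur_max > 1 && !show8 && (c & (0x80 | 0x40)) == 0x80;
}

/* New test: the locale is no longer consulted. */
bool
isu8cont_new(unsigned char c, bool show8)
{
	return !show8 && (c & (0x80 | 0x40)) == 0x80;
}
```

The two agree whenever show8 is set or mb_cur_max is greater than 1.
They differ only for mb_cur_max == 1 (as in the C locale) with show8
unset, which corresponds to cases C and D above.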

OK to put it in?

Index: main.c
RCS file: /cvs/src/bin/ksh/main.c,v
retrieving revision 1.81
diff -u -p -r1.81 main.c
--- main.c      11 Oct 2016 19:52:54 -0000      1.81
+++ main.c      14 Oct 2016 12:27:31 -0000
@@ -8,7 +8,6 @@
 #include <errno.h>
 #include <fcntl.h>
-#include <locale.h>
 #include <paths.h>
 #include <pwd.h>
 #include <stdio.h>
@@ -152,8 +151,6 @@ main(int argc, char *argv[])
        pid_t ppid;
        kshname = argv[0];
-       setlocale(LC_CTYPE, "");
        if (pledge("stdio rpath wpath cpath fattr flock getpw proc exec tty",
            NULL) == -1) {
Index: vi.c
RCS file: /cvs/src/bin/ksh/vi.c,v
retrieving revision 1.40
diff -u -p -r1.40 vi.c
--- vi.c        11 Oct 2016 19:52:54 -0000      1.40
+++ vi.c        14 Oct 2016 12:27:31 -0000
@@ -2224,6 +2224,6 @@ vi_macro_reset(void)
 static int
 isu8cont(unsigned char c)
 {
-       return MB_CUR_MAX > 1 && !Flag(FVISHOW8) && (c & (0x80 | 0x40)) == 0x80;
+       return !Flag(FVISHOW8) && (c & (0x80 | 0x40)) == 0x80;
 }
 #endif /* VI */
