Re: Small change to let mg handle localized characters

Stefan Sperling Thu, 23 Aug 2012 12:53:31 -0700

On Thu, Aug 23, 2012 at 08:58:53PM +0200, Eivind Evensen wrote:
> Since version 1.10 of lib/libc/gen/ctype_.c, I've been
> unable to use localized characters in mg properly (they're printed
> as an octal value only).
> 
> I've been using the below change to regain support for printing them
> normally.
> 
> Best regards, Eivind Evensen
> 
> 
> Index: main.c
> ===================================================================
> RCS file: /data/openbsd/src/usr.bin/mg/main.c,v
> retrieving revision 1.67
> diff -u -r1.67 main.c
> --- main.c    29 May 2012 06:08:48 -0000      1.67
> +++ main.c    23 Aug 2012 10:35:23 -0000
> @@ -12,6 +12,7 @@
>  #include "macro.h"
>  
>  #include <err.h>
> +#include <locale.h>
>  
>  int           thisflag;                      /* flags, this command  */
>  int           lastflag;                      /* flags, last command  */
> @@ -45,6 +46,8 @@
>       int              o, i, nfiles;
>       int              nobackups = 0;
>       struct buffer   *bp = NULL;
> +
> +     setlocale(LC_ALL, "");
>  
>       while ((o = getopt(argc, argv, "nf:")) != -1)
>               switch (o) {
> 
> -- 
> Eivind


This kind of change has been proposed before.
In my opinion it is not the right way of solving this problem.

It won't work correctly with files in multi-byte encodings (like UTF-8).
E.g. typing backspace to delete one character will delete one byte instead
of the entire character, which messes up the display. To properly support
multi-byte encodings, mg needs to learn the difference between a byte and
a character.
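
To illustrate the byte vs. character distinction, here is a rough sketch
of what backspace would have to compute for UTF-8 text. This is not mg
code; the function name is made up for this example, and it assumes the
buffer holds valid UTF-8.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of mg: return how many bytes a backspace
 * at byte offset `pos` must delete so that one whole UTF-8 character is
 * removed. Assumes buf[0..pos) is valid UTF-8.
 */
size_t
utf8_prev_char_len(const unsigned char *buf, size_t pos)
{
	size_t n = 0;

	/* Step back over continuation bytes (10xxxxxx), at most 3 of them. */
	while (pos > 0 && n < 3 && (buf[pos - 1] & 0xC0) == 0x80) {
		pos--;
		n++;
	}
	/* Count the lead byte, or the single byte of an ASCII character. */
	if (pos > 0)
		n++;
	return n;
}
```

With the patch as posted, mg would always delete a single byte here,
leaving a dangling lead byte behind for anything outside ASCII.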

The locale mechanism and wchar_t are only useful for applications that do
not care about details of character encodings, and which only need to deal
with a single character set at a time. They are not very useful for editors
because editors need to handle files in various encodings and be aware of
the encoding currently in use.

Some applications in base (less and tmux, for example) have special
support code for UTF-8. This could be done for mg as well, so that
it can support single-byte character sets (ASCII, latin1) and also
UTF-8 (but no other multi-byte character set). You'd activate the
special UTF-8 mode if nl_langinfo(CODESET) returns "UTF-8".
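
A minimal sketch of that check, assuming only LC_CTYPE is set from the
environment. The helper names are invented for this example and are not
taken from mg, less or tmux.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: nonzero if `codeset` names UTF-8. */
int
codeset_is_utf8(const char *codeset)
{
	return codeset != NULL && strcmp(codeset, "UTF-8") == 0;
}

/*
 * Hypothetical helper: nonzero if the user's locale uses UTF-8.
 * Only LC_CTYPE is set, so the other locale categories stay in "C".
 */
int
locale_is_utf8(void)
{
	if (setlocale(LC_CTYPE, "") == NULL)
		return 0;	/* locale not supported; stay single-byte */
	return codeset_is_utf8(nl_langinfo(CODESET));
}
```

The editor would call locale_is_utf8() once at startup and fall back to
its existing single-byte behaviour whenever it returns 0.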

To properly support arbitrary multi-byte character sets (UTF-8, UTF-16,
special Asian language encodings, etc.) mg needs iconv, which we don't have
in base. I have some work-in-progress iconv code, but it's not ready for
the tree yet and I'm not actively working on it at the moment.
If you want to help out with this, let me know.
