Re: [Debian BTS] ru_RU.UTF-8 locale
On Tue, Feb 25, 2003 at 11:13:54AM -0500, Pavel Roskin wrote:
> > On Mon, Feb 24, 2003 at 11:49:14AM +0100, Adam Byrtek / alpha wrote:
> > > Hi, I know there are several people here who use the Russian locale.
> > > Could you please try to reproduce this bug report or tell me whether I
> > > can close it? Maybe this guy just doesn't know how to configure
> > > UTF-8 terminal properly? Unfortunately I can't contact him...
>
> The problem is that gettext returns strings in UTF-8, and they are passed
> to the screen library (S-Lang or ncurses), that is supposed to show those
> strings correctly. Part of the problem is the need to measure the actual

It is not just about gettext strings, but about filenames too.

> I also don't feel it's such a good idea to use locale to figure out the
> properties of the terminal. The locale is meant to define locale-specific
> preferences of the user, not the properties of any software.

Well, the locale tells you what charset gettext strings are in, what
characters are printable, etc. Running on a UTF-8 terminal with a
non-UTF-8 locale is a bad idea, as is running on a non-UTF-8 terminal
with a UTF-8 locale.

The mc changes which are needed to support UTF-8 are at least:
a) stop assuming strlen () gives both the byte length of a string and
   its length on the screen
b) when truncating (etc.) strings intended for display, the visible
   length has to be taken into account, and the result must never
   contain only part of a multibyte character
c) view/edit should be able to iconv from the selected data charset to
   the display charset

When dealing with strings returned by gettext, mbstrlen could be made way
faster by assuming all strings are valid UTF-8 in a UTF-8 locale:
basically a loop that only counts the bytes for which
(char & 0xc0) != 0x80. Unfortunately, this is not necessarily true of
filenames and file content.

> Maybe it's better to use ncurses instead of S-Lang for the build with
> UTF-8 support? ncurses has a longer history of supporting Unicode.
> Also, it is developed by Thomas Dickey, who also maintains xterm and
> terminfo. If there are any issues with the standards (like the one I just
> mentioned), he is the person who can do something.

I don't know, I haven't ever looked at ncurses UTF-8 support.
Ideally we should support both S-Lang and ncurses, each with both
non-UTF-8 and UTF-8.

> > mc doesn't work in UTF-8 locales. A few days ago I hacked mc up so that at
> > least the things I use often in mc sort-of work with UTF-8, you can find
> > the patch in ftp://people.redhat.com/jakub/mc/
>
> First of all, to override the check for UTF-8 S-Lang, use
> --with-screen=slang instead of hacking configure.

If I remember well, that did not work: even when I specified this, it was
overridden by configure.

> It would be nice if you commented your patches. I may consider applying
> some of them. I really don't understand why /bin/rm is better than rm.

Most of them aren't mine, I just forward ported them from an older mc rpm.

> > But view is not done at all and there is still a lot of places which
> > need changing. The first thing to decide is what all locales mc wants to
> > support. E.g. supporting just ASCII compatible charsets (like UTF-8) is
> > easier than supporting ASCII incompatible ones.
>
> I think we can limit ourselves to the ASCII compatible charsets for now.

This simplifies things.

	Jakub
___
Mc-devel mailing list
[EMAIL PROTECTED]
http://mail.gnome.org/mailman/listinfo/mc-devel
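The counting shortcut mentioned in this message can be sketched as follows. This is only an illustration under the stated assumption that the input is valid UTF-8 (which, as noted, does not hold for arbitrary filenames or file content). The names utf8_strlen_fast and utf8_prefix_bytes are hypothetical and are not part of mc or of the patch.

```c
#include <stddef.h>

/* Count the characters of a string that is ASSUMED to be valid UTF-8.
 * Every character contains exactly one byte that is not a continuation
 * byte (10xxxxxx), so counting those bytes counts the characters.
 * Hypothetical helper, not taken from mc. */
size_t
utf8_strlen_fast (const char *str)
{
    size_t n = 0;

    for (; *str != '\0'; str++)
        if (((unsigned char) *str & 0xc0) != 0x80)
            n++;
    return n;
}

/* Return the byte offset just past the first max_chars characters, so
 * that cutting the string there never leaves part of a multibyte
 * character behind (point b above). Same validity assumption. */
size_t
utf8_prefix_bytes (const char *str, size_t max_chars)
{
    size_t i = 0;

    while (str[i] != '\0') {
        if (((unsigned char) str[i] & 0xc0) != 0x80) {
            if (max_chars == 0)
                break;          /* first byte of the character after the prefix */
            max_chars--;
        }
        i++;
    }
    return i;
}
```

Both helpers count characters, not screen columns, so they would still have to be combined with a width lookup for double-width or combining characters.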
Re: [Debian BTS] ru_RU.UTF-8 locale
Hello, Jakub!

> On Mon, Feb 24, 2003 at 11:49:14AM +0100, Adam Byrtek / alpha wrote:
> > Hi, I know there are several people here who use the Russian locale.
> > Could you please try to reproduce this bug report or tell me whether I
> > can close it? Maybe this guy just doesn't know how to configure
> > UTF-8 terminal properly? Unfortunately I can't contact him...

The problem is that gettext returns strings in UTF-8, and they are passed
to the screen library (S-Lang or ncurses), which is supposed to show those
strings correctly. Part of the problem is the need to measure the actual
length of the string as displayed on the screen. It's not trivial for
UTF-8 because we may have one or two bytes representing one symbol. Not
to mention that some symbols (e.g. Chinese) can be twice as wide as Latin
characters.

I also don't feel it's such a good idea to use locale to figure out the
properties of the terminal. The locale is meant to define locale-specific
preferences of the user, not the properties of any software.

Maybe it's better to use ncurses instead of S-Lang for the build with
UTF-8 support? ncurses has a longer history of supporting Unicode.
Also, it is developed by Thomas Dickey, who also maintains xterm and
terminfo. If there are any issues with the standards (like the one I just
mentioned), he is the person who can do something.

S-Lang is basically a clone of ncurses when it comes to the library part
(the S-Lang language is not a clone). It's easier to change S-Lang to
match ncurses than to change S-Lang and try to change ncurses to match.

> mc doesn't work in UTF-8 locales. A few days ago I hacked mc up so that at
> least the things I use often in mc sort-of work with UTF-8, you can find
> the patch in ftp://people.redhat.com/jakub/mc/

First of all, to override the check for UTF-8 S-Lang, use
--with-screen=slang instead of hacking configure.

It would be nice if you commented your patches. I may consider applying
some of them. I really don't understand why /bin/rm is better than rm.

> But view is not done at all and there is still a lot of places which
> need changing. The first thing to decide is what all locales mc wants to
> support. E.g. supporting just ASCII compatible charsets (like UTF-8) is
> easier than supporting ASCII incompatible ones.

I think we can limit ourselves to the ASCII compatible charsets for now.

--
Regards,
Pavel Roskin
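The measuring problem described in this message can be illustrated with a small locale-independent sketch. It decodes UTF-8 by hand (in general a sequence is one to four bytes long, not only one or two) and uses a deliberately rough width table in place of the locale-aware wcwidth (). The function names and the table ranges are assumptions made for illustration; nothing here is taken from mc, S-Lang or ncurses.

```c
#include <stddef.h>

/* Decode one UTF-8 sequence at s into *cp. Returns the number of bytes
 * consumed, or 0 for an invalid or truncated sequence. Simplified:
 * overlong forms and surrogates are not rejected. */
static size_t
utf8_decode (const unsigned char *s, unsigned long *cp)
{
    size_t len, i;

    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xe0) == 0xc0) {
        *cp = s[0] & 0x1f;
        len = 2;
    } else if ((s[0] & 0xf0) == 0xe0) {
        *cp = s[0] & 0x0f;
        len = 3;
    } else if ((s[0] & 0xf8) == 0xf0) {
        *cp = s[0] & 0x07;
        len = 4;
    } else
        return 0;               /* stray continuation or invalid lead byte */

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xc0) != 0x80)
            return 0;           /* also stops at the terminating NUL */
        *cp = (*cp << 6) | (s[i] & 0x3f);
    }
    return len;
}

/* Rough stand-in for wcwidth (): two columns for a few East Asian
 * ranges, one column otherwise. Combining characters are ignored. */
static int
approx_columns (unsigned long cp)
{
    if ((cp >= 0x1100 && cp <= 0x115f)          /* Hangul Jamo */
        || (cp >= 0x2e80 && cp <= 0xa4cf)       /* CJK radicals .. Yi */
        || (cp >= 0xac00 && cp <= 0xd7a3)       /* Hangul syllables */
        || (cp >= 0xf900 && cp <= 0xfaff)       /* CJK compatibility */
        || (cp >= 0xff00 && cp <= 0xff60))      /* fullwidth forms */
        return 2;
    return 1;
}

/* Approximate number of screen columns used by a UTF-8 string,
 * or -1 if the string is not decodable. */
int
utf8_display_width (const char *str)
{
    const unsigned char *s = (const unsigned char *) str;
    int columns = 0;

    while (*s != '\0') {
        unsigned long cp;
        size_t len = utf8_decode (s, &cp);

        if (len == 0)
            return -1;
        columns += approx_columns (cp);
        s += len;
    }
    return columns;
}
```

With this sketch a three-letter Russian word stored in six bytes measures three columns, while two Chinese ideographs stored in six bytes measure four, which is exactly the mismatch with strlen () that the message points out. A real implementation would more likely rely on mbrtowc () and wcwidth () in the user's locale.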
Re: [Debian BTS] ru_RU.UTF-8 locale
On Mon, Feb 24, 2003 at 04:07:57PM +0200, Andrew V. Samoilov wrote:
> Jakub Jelinek wrote:
> > On Mon, Feb 24, 2003 at 11:49:14AM +0100, Adam Byrtek / alpha wrote:
> >
> > > Hi, I know there are several people here who use the Russian locale.
> > > Could you please try to reproduce this bug report or tell me whether I
> > > can close it? Maybe this guy just doesn't know how to configure
> > > UTF-8 terminal properly? Unfortunately I can't contact him...
> >
> > mc doesn't work in UTF-8 locales.
> > A few days ago I hacked mc up so that at least the things I use often in mc
> > sort-of work with UTF-8, you can find the patch in
> > ftp://people.redhat.com/jakub/mc/
> > But view is not done at all and there is still a lot of places which need
> > changing.
> > The first thing to decide is what all locales mc wants to support.
> > E.g. supporting just ASCII compatible charsets (like UTF-8) is easier
> > than supporting ASCII incompatible ones.
> >
> > 	Jakub
>
> Can you upload UTF8 related patches there?

This is the UTF-8 patch, which assumes a UTF-8ized slang (AFAIK the
original UTF-8 patch we use in slang is from Debian, then we have a
linedrawing patch, and I had to fix two places in slang so that
linedrawing worked even in, say, the cs_CZ locale or some other
non-latin1 non-UTF-8 locale).
MB_CUR_MAX == 1 assumptions are in about every file in mc/src :(.
--- mc-4.6.0/src/util.c.jj 2003-01-28 17:58:23.0 -0500
+++ mc-4.6.0/src/util.c 2003-02-21 08:36:36.0 -0500
@@ -35,6 +35,7 @@
 #include
 #include
 
+#include "tty.h"
 #include "global.h"
 #include "profile.h"
 #include "main.h" /* mc_home */
@@ -47,6 +48,10 @@
 #include "charsets.h"
 #endif
 
+#ifdef UTF8
+#include
+#endif
+
 static const char app_text [] = "Midnight-Commander";
 
 int easy_patterns = 1;
@@ -73,8 +78,31 @@ is_8bit_printable (unsigned char c)
 }
 
 int
+mbstrlen (const char *str)
+{
+#ifdef UTF8
+    if (SLsmg_Is_Unicode) {
+        static mbstate_t s;
+        int len;
+
+        len = mbsrtowcs (NULL, &str, -1, &s);
+        if (len < 0) {
+            memset (&s, 0, sizeof (s));
+            return -1;
+        }
+        return len;
+    } else
+#endif
+        return strlen (str);
+}
+
+int
 is_printable (int c)
 {
+#ifdef UTF8
+    if (SLsmg_Is_Unicode)
+        return iswprint (c);
+#endif
     c &= 0xff;
 
 #ifdef HAVE_CHARSET
@@ -217,25 +245,90 @@ char *
 name_trunc (const char *txt, int trunc_len)
 {
     static char x[MC_MAXPATHLEN + MC_MAXPATHLEN];
-    int txt_len;
+    int txt_len, first, skip;
     char *p;
+    const char *str;
 
     if (trunc_len > sizeof (x) - 1) {
         trunc_len = sizeof (x) - 1;
     }
-    txt_len = strlen (txt);
-    if (txt_len <= trunc_len) {
-        strcpy (x, txt);
-    } else {
-        int y = (trunc_len / 2) + (trunc_len % 2);
-        strncpy (x, txt, y);
-        strncpy (x + y, txt + txt_len - (trunc_len / 2), trunc_len / 2);
-        x[y] = '~';
-    }
-    x[trunc_len] = 0;
-    for (p = x; *p; p++)
-        if (!is_printable (*p))
-            *p = '?';
+    txt_len = mbstrlen (txt);
+    first = 0;
+    skip = 0;
+    if (txt_len > trunc_len) {
+        first = trunc_len / 2;
+        skip = txt_len - trunc_len + 1;
+    }
+
+#ifdef UTF8
+    if (SLsmg_Is_Unicode) {
+        mbstate_t s;
+        int mbmax;
+
+        str = txt;
+        memset (&s, 0, sizeof (s));
+        mbmax = MB_CUR_MAX;
+        p = x;
+        while (p < x + sizeof (x) - 1 && trunc_len) {
+            wchar_t wc;
+            int len;
+
+            len = mbrtowc (&wc, str, mbmax, &s);
+            if (!len)
+                break;
+            if (len < 0) {
+                memset (&s, 0, sizeof (s));
+                *p = '?';
+                len = 1;
+                str++;
+            } else if (!is_printable (wc)) {
+                *p = '?';
+                str += len;
+                len = 1;
+            } else if (p >= x + sizeof (x) - len)
+                break;
+            else {
+                memcpy (p, str, len);
+                str += len;
+            }
+            if (first) {
+                --trunc_len;
+                --first;
+                p += len;
+                if (!first && p < x + sizeof (x) - 1 && trunc_len) {
+                    *p++ = '~';
+                    --trunc_len;
+                }
+            } else if (skip)
+                --skip;
+            else {
+                --trunc_len;
+                p += len;
+            }
+        }
+    } else
+#endif
+    {
+        str = txt;
+        p = x;
+        while (p < x + sizeof (x) - 1) {
+            if (*str == '\0')
+                break;
+            else if (!is_printable (*str))
+                *p++ = '?';
+            else
+                *p++ = *str;
+            ++str;
+            if (first) {
+                --first;
+                if (!first) {
+                    *p++ = '~';
+                    str += skip;
+                }
+            }
+        }
+    }
+    *p = '\0';
     return x;
 }
 
@@ -664,12 +757,14 @@ short-month-n
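
The middle truncation that name_trunc performs in the patch (keep trunc_len / 2 characters, emit '~', then keep the tail) can be shown in isolation. The sketch below is not the patch's code: it is a simplified, locale-independent variant that assumes valid UTF-8, counts characters rather than screen columns, and writes into a caller-supplied buffer instead of a static one. The name utf8_name_trunc and its helpers are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Number of characters in a string assumed to be valid UTF-8. */
static size_t
utf8_chars (const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (((unsigned char) *s & 0xc0) != 0x80)
            n++;
    return n;
}

/* Byte offset of character number k (0-based), or of the terminating
 * NUL if the string has k characters or fewer. */
static size_t
utf8_offset (const char *s, size_t k)
{
    size_t i = 0;

    while (s[i] != '\0') {
        if (((unsigned char) s[i] & 0xc0) != 0x80) {
            if (k == 0)
                break;
            k--;
        }
        i++;
    }
    return i;
}

/* Copy txt into out using at most max_chars characters. Longer names
 * keep max_chars / 2 leading characters, a '~', and the trailing rest.
 * Returns 0 on success, -1 if out is too small. */
int
utf8_name_trunc (const char *txt, size_t max_chars, char *out, size_t out_size)
{
    size_t n = utf8_chars (txt);
    size_t head, tail, head_bytes, tail_start, tail_bytes;

    if (n <= max_chars) {
        size_t len = strlen (txt);

        if (len + 1 > out_size)
            return -1;
        memcpy (out, txt, len + 1);
        return 0;
    }
    if (max_chars == 0) {
        if (out_size < 1)
            return -1;
        out[0] = '\0';
        return 0;
    }

    head = max_chars / 2;               /* characters kept before '~' */
    tail = max_chars - head - 1;        /* characters kept after '~' */
    head_bytes = utf8_offset (txt, head);
    tail_start = utf8_offset (txt, n - tail);
    tail_bytes = strlen (txt + tail_start);

    if (head_bytes + 1 + tail_bytes + 1 > out_size)
        return -1;
    memcpy (out, txt, head_bytes);
    out[head_bytes] = '~';
    memcpy (out + head_bytes + 1, txt + tail_start, tail_bytes + 1);
    return 0;
}
```

Because the cut points are always character boundaries, the result never contains half of a multibyte sequence. Unlike the patch, this sketch does not replace unprintable or undecodable bytes with '?'.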
Re: [Debian BTS] ru_RU.UTF-8 locale
Jakub Jelinek wrote:
> On Mon, Feb 24, 2003 at 11:49:14AM +0100, Adam Byrtek / alpha wrote:
>
> > Hi, I know there are several people here who use the Russian locale.
> > Could you please try to reproduce this bug report or tell me whether I
> > can close it? Maybe this guy just doesn't know how to configure
> > UTF-8 terminal properly? Unfortunately I can't contact him...
>
> mc doesn't work in UTF-8 locales.
> A few days ago I hacked mc up so that at least the things I use often in mc
> sort-of work with UTF-8, you can find the patch in
> ftp://people.redhat.com/jakub/mc/
> But view is not done at all and there is still a lot of places which need
> changing.
> The first thing to decide is what all locales mc wants to support.
> E.g. supporting just ASCII compatible charsets (like UTF-8) is easier
> than supporting ASCII incompatible ones.
>
> 	Jakub

Can you upload UTF8 related patches there?

--
Regards,
Andrew V. Samoilov
Re: [Debian BTS] ru_RU.UTF-8 locale
On Mon, Feb 24, 2003 at 11:49:14AM +0100, Adam Byrtek / alpha wrote:
> Hi, I know there are several people here who use the Russian locale.
> Could you please try to reproduce this bug report or tell me whether I
> can close it? Maybe this guy just doesn't know how to configure
> UTF-8 terminal properly? Unfortunately I can't contact him...

mc doesn't work in UTF-8 locales.
A few days ago I hacked mc up so that at least the things I use often in
mc sort-of work with UTF-8, you can find the patch in
ftp://people.redhat.com/jakub/mc/
But view is not done at all and there is still a lot of places which need
changing.
The first thing to decide is what all locales mc wants to support.
E.g. supporting just ASCII compatible charsets (like UTF-8) is easier
than supporting ASCII incompatible ones.

	Jakub