Re: [Debian BTS] ru_RU.UTF-8 locale

2003-02-25 Thread Jakub Jelinek
On Tue, Feb 25, 2003 at 11:13:54AM -0500, Pavel Roskin wrote:
> > On Mon, Feb 24, 2003 at 11:49:14AM +0100, Adam Byrtek / alpha wrote:
> > > Hi, I know there are several people here who use the Russian locale.
> > > Could you please try to reproduce this bug report or tell me whether I
> > > can close it? Maybe this guy just doesn't know how to configure a
> > > UTF-8 terminal properly? Unfortunately I can't contact him...
> 
> The problem is that gettext returns strings in UTF-8, and they are passed
> to the screen library (S-Lang or ncurses), which is supposed to show those
> strings correctly.  Part of the problem is the need to measure the actual

It is not just about gettext strings, but about filenames too.

> I also don't feel it's such a good idea to use locale to figure out the
> properties of the terminal.  The locale is meant to define locale-specific
> preferences of the user, not the properties of any software.

Well, the locale tells you what charset gettext strings are in, which
characters are printable, etc. Running on a UTF-8 terminal with a non-UTF-8
locale is a bad idea, as is running on a non-UTF-8 terminal with a UTF-8 locale.
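As a rough illustration of how a program can ask the locale for its charset, here is a minimal sketch using the POSIX nl_langinfo (CODESET) call. The helper names codeset_is_utf8 and locale_is_utf8 are made up for this example and are not part of mc or of the patch discussed in this thread.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <strings.h>

/* Return 1 if a codeset name denotes UTF-8 ("UTF-8" or "utf8", any case).
   Hypothetical helper, not an mc function. */
int
codeset_is_utf8 (const char *codeset)
{
    assert (codeset != NULL);
    return strcasecmp (codeset, "UTF-8") == 0
        || strcasecmp (codeset, "UTF8") == 0;
}

/* Ask the C library which charset the user's locale uses.
   setlocale (LC_CTYPE, "") adopts LC_ALL/LC_CTYPE/LANG from the
   environment; if that fails we stay in the "C" locale. */
int
locale_is_utf8 (void)
{
    if (setlocale (LC_CTYPE, "") == NULL)
        return 0;               /* unknown locale: assume not UTF-8 */
    return codeset_is_utf8 (nl_langinfo (CODESET));
}
```

This only reports what the locale claims. Whether the terminal really interprets its input as UTF-8 is a separate question, which is the mismatch described above.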

The mc changes needed to support UTF-8 are at least:
a) stop assuming that strlen () gives both the length of a string in bytes
   and its width on the screen
b) when truncating/etc. strings intended for display, the visible length
   has to be taken into account, and the code must ensure that a multibyte
   character is never split
c) view/edit should be able to iconv from the selected data charset to the
   display charset
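Point b) can be sketched roughly as follows. This is an illustration only, not code from the patch: utf8_copy_chars is a hypothetical name, the input is assumed to be valid UTF-8, and "character" here means code point, so wide or combining characters would still need separate width handling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy at most max_chars characters of the UTF-8 string src into dst
   (dst_size bytes, always NUL-terminated) without ever splitting a
   multibyte sequence.  Returns the number of characters copied. */
size_t
utf8_copy_chars (char *dst, size_t dst_size, const char *src, size_t max_chars)
{
    size_t out = 0, chars = 0;

    assert (dst != NULL && src != NULL && dst_size > 0);
    while (*src != '\0' && chars < max_chars) {
        size_t len = 1;

        /* Extend over the continuation bytes (10xxxxxx) of this character.
           The terminating NUL is not a continuation byte, so this stops
           at the end of the string. */
        while (((unsigned char) src[len] & 0xc0) == 0x80)
            len++;
        if (out + len >= dst_size)
            break;              /* whole character does not fit: stop here */
        memcpy (dst + out, src, len);
        out += len;
        src += len;
        chars++;
    }
    dst[out] = '\0';
    return chars;
}
```

The point of the sketch is that the loop advances by whole characters, so both the character limit and the buffer limit are checked at character boundaries rather than at arbitrary byte offsets.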

When dealing with strings returned by gettext, mbstrlen could be made much
faster by assuming all strings are valid UTF-8 in a UTF-8 locale:
basically, loop over the string and count only the bytes for which
(char & 0xc0) != 0x80. Unfortunately, that assumption does not necessarily
hold for filenames and file content.
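The counting loop described above could look roughly like this. It is a sketch, not the mbstrlen from the patch: utf8_strlen_fast is a hypothetical name, and the result is only meaningful when the input really is valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Count characters in a string that is assumed to be valid UTF-8.
   Every character has exactly one byte that is not a continuation
   byte (10xxxxxx), so counting those bytes counts the characters. */
size_t
utf8_strlen_fast (const char *str)
{
    size_t n = 0;

    assert (str != NULL);
    for (; *str != '\0'; str++)
        if (((unsigned char) *str & 0xc0) != 0x80)
            n++;
    return n;
}
```

For untrusted input such as filenames, a stray continuation byte is silently skipped and a truncated sequence still counts as one character, which is why the shortcut is only safe for strings known to be well-formed.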

> Maybe it's better to use ncurses instead of S-Lang for the build with
> UTF-8 support?  ncurses has a longer history of supporting Unicode.
> Also, it is developed by Thomas Dickey, who also maintains xterm and
> terminfo.  If there are any issues with the standards (like the one I just
> mentioned), he is the person who can do something.

I don't know, haven't ever looked at ncurses UTF-8 support.
Ideally we should support both S-Lang and ncurses with
both non-UTF-8 and UTF-8.

> > mc doesn't work in UTF-8 locales. A few days ago I hacked mc up so that at
> > least the things I use often in mc sort-of work with UTF-8, you can find
> > the patch in ftp://people.redhat.com/jakub/mc/
> 
> First of all, to override the check for UTF-8 S-Lang, use
> --with-screen=slang instead of hacking configure.

If I remember correctly, that did not work: even when I specified it,
configure overrode it.

> It would be nice if you commented your patches.  I may consider applying
> some of them.  I really don't understand why /bin/rm is better than rm.

Most of them aren't mine, I just forward-ported them from an older mc rpm.

> > But view is not done at all and there are still a lot of places which
> > need changing. The first thing to decide is which locales mc wants to
> > support. E.g. supporting just ASCII compatible charsets (like UTF-8) is
> > easier than supporting ASCII incompatible ones.
> 
> I think we can limit ourselves to the ASCII compatible charsets for now.

This simplifies things.

Jakub
___
Mc-devel mailing list
[EMAIL PROTECTED]
http://mail.gnome.org/mailman/listinfo/mc-devel


Re: [Debian BTS] ru_RU.UTF-8 locale

2003-02-25 Thread Pavel Roskin
Hello, Jakub!

> On Mon, Feb 24, 2003 at 11:49:14AM +0100, Adam Byrtek / alpha wrote:
> > Hi, I know there are several people here who use the Russian locale.
> > Could you please try to reproduce this bug report or tell me whether I
> > can close it? Maybe this guy just doesn't know how to configure a
> > UTF-8 terminal properly? Unfortunately I can't contact him...

The problem is that gettext returns strings in UTF-8, and they are passed
to the screen library (S-Lang or ncurses), which is supposed to show those
strings correctly.  Part of the problem is the need to measure the actual
length of the string as displayed on the screen.  It's not trivial for
UTF-8 because one symbol may be represented by one or more bytes (two for
Cyrillic letters).  Not to mention that some symbols (e.g. Chinese) can be
twice as wide as Latin characters.
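To make the difference between bytes, characters and screen columns concrete, here is a rough sketch. Everything in it is an assumption made for illustration: the names utf8_decode, is_wide and display_width are hypothetical, and the table of double-width ranges is a coarse subset. Real code would more likely rely on the C library's wcwidth () in a UTF-8 locale, which also handles zero-width combining characters that this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Decode one UTF-8 sequence at s into *cp; return its length in bytes,
   or 0 if s does not start a well-formed sequence.  Overlong forms and
   surrogates are not rejected in this sketch. */
static size_t
utf8_decode (const unsigned char *s, unsigned long *cp)
{
    size_t len, i;

    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xe0) == 0xc0) { *cp = s[0] & 0x1f; len = 2; }
    else if ((s[0] & 0xf0) == 0xe0) { *cp = s[0] & 0x0f; len = 3; }
    else if ((s[0] & 0xf8) == 0xf0) { *cp = s[0] & 0x07; len = 4; }
    else return 0;
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xc0) != 0x80)
            return 0;           /* also catches a string that ends early */
        *cp = (*cp << 6) | (s[i] & 0x3f);
    }
    return len;
}

/* Coarse, incomplete list of ranges usually shown two columns wide. */
static int
is_wide (unsigned long cp)
{
    return (cp >= 0x1100 && cp <= 0x115f)     /* Hangul Jamo */
        || (cp >= 0x2e80 && cp <= 0xa4cf)     /* CJK radicals .. Yi */
        || (cp >= 0xac00 && cp <= 0xd7a3)     /* Hangul syllables */
        || (cp >= 0xf900 && cp <= 0xfaff)     /* CJK compatibility ideographs */
        || (cp >= 0xff00 && cp <= 0xff60);    /* fullwidth forms */
}

/* Columns needed for a UTF-8 string; each malformed byte counts as one
   column, as if it were displayed as '?'. */
size_t
display_width (const char *str)
{
    const unsigned char *s = (const unsigned char *) str;
    size_t cols = 0;

    assert (str != NULL);
    while (*s != '\0') {
        unsigned long cp;
        size_t len = utf8_decode (s, &cp);

        if (len == 0) {
            cols++;
            s++;
            continue;
        }
        cols += is_wide (cp) ? 2 : 1;
        s += len;
    }
    return cols;
}
```

With this sketch, a two-letter Cyrillic string occupies four bytes but two columns, while two Chinese ideographs occupy six bytes and four columns, so neither the byte count nor the character count is the right number for screen layout.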

I also don't feel it's such a good idea to use locale to figure out the
properties of the terminal.  The locale is meant to define locale-specific
preferences of the user, not the properties of any software.

Maybe it's better to use ncurses instead of S-Lang for the build with
UTF-8 support?  ncurses has a longer history of supporting Unicode.
Also, it is developed by Thomas Dickey, who also maintains xterm and
terminfo.  If there are any issues with the standards (like the one I just
mentioned), he is the person who can do something.

S-Lang is basically a clone of ncurses when it comes to the library part
(the S-Lang language is not a clone).  It's easier to change S-Lang to
match ncurses than to change S-Lang and try to change ncurses to match.

> mc doesn't work in UTF-8 locales. A few days ago I hacked mc up so that at
> least the things I use often in mc sort-of work with UTF-8, you can find
> the patch in ftp://people.redhat.com/jakub/mc/

First of all, to override the check for UTF-8 S-Lang, use
--with-screen=slang instead of hacking configure.

It would be nice if you commented your patches.  I may consider applying
some of them.  I really don't understand why /bin/rm is better than rm.

> But view is not done at all and there are still a lot of places which
> need changing. The first thing to decide is which locales mc wants to
> support. E.g. supporting just ASCII compatible charsets (like UTF-8) is
> easier than supporting ASCII incompatible ones.

I think we can limit ourselves to the ASCII compatible charsets for now.

-- 
Regards,
Pavel Roskin


Re: [Debian BTS] ru_RU.UTF-8 locale

2003-02-24 Thread Jakub Jelinek
On Mon, Feb 24, 2003 at 04:07:57PM +0200, Andrew V. Samoilov wrote:
> Jakub Jelinek wrote:
> > On Mon, Feb 24, 2003 at 11:49:14AM +0100, Adam Byrtek / alpha wrote:
> > 
> >>Hi, I know there are several people here who use the Russian locale.
> >>Could you please try to reproduce this bug report or tell me whether I
> >>can close it? Maybe this guy just doesn't know how to configure a
> >>UTF-8 terminal properly? Unfortunately I can't contact him...
> > 
> > 
> > mc doesn't work in UTF-8 locales.
> > A few days ago I hacked mc up so that at least the things I use often in mc
> > sort-of work with UTF-8, you can find the patch in
> > ftp://people.redhat.com/jakub/mc/
> > But view is not done at all and there are still a lot of places which need
> > changing.
> > The first thing to decide is which locales mc wants to support.
> > E.g. supporting just ASCII compatible charsets (like UTF-8) is easier
> > than supporting ASCII incompatible ones.
> > 
> > Jakub
> 
> Can you upload UTF8 related patches there?

This is the UTF-8 patch; it assumes a UTF-8ized slang (AFAIK the original
UTF-8 patch we use in slang is from Debian, then we have a linedrawing patch,
and I had to fix two places in slang so that linedrawing worked even
in, say, the cs_CZ locale or some other non-latin1 non-UTF-8 locale).
MB_CUR_MAX == 1 assumptions are in just about every file in mc/src :(.

--- mc-4.6.0/src/util.c.jj  2003-01-28 17:58:23.0 -0500
+++ mc-4.6.0/src/util.c 2003-02-21 08:36:36.0 -0500
@@ -35,6 +35,7 @@
 #include 
 #include 
 
+#include "tty.h"
 #include "global.h"
 #include "profile.h"
 #include "main.h"  /* mc_home */
@@ -47,6 +48,10 @@
 #include "charsets.h"
 #endif
 
+#ifdef UTF8
+#include 
+#endif
+
 static const char app_text [] = "Midnight-Commander";
 int easy_patterns = 1;
 
@@ -73,8 +78,31 @@ is_8bit_printable (unsigned char c)
 }
 
 int
+mbstrlen (const char *str)
+{
+#ifdef UTF8
+    if (SLsmg_Is_Unicode) {
+        static mbstate_t s;
+        int len;
+
+        len = mbsrtowcs (NULL, &str, -1, &s);
+        if (len < 0) {
+            memset (&s, 0, sizeof (s));
+            return -1;
+        }
+        return len;
+    } else
+#endif
+        return strlen (str);
+}
+
+int
 is_printable (int c)
 {
+#ifdef UTF8
+    if (SLsmg_Is_Unicode)
+        return iswprint (c);
+#endif
     c &= 0xff;
 
 #ifdef HAVE_CHARSET
@@ -217,25 +245,90 @@ char *
 name_trunc (const char *txt, int trunc_len)
 {
     static char x[MC_MAXPATHLEN + MC_MAXPATHLEN];
-    int txt_len;
+    int txt_len, first, skip;
     char *p;
+    const char *str;
 
     if (trunc_len > sizeof (x) - 1) {
         trunc_len = sizeof (x) - 1;
     }
-    txt_len = strlen (txt);
-    if (txt_len <= trunc_len) {
-        strcpy (x, txt);
-    } else {
-        int y = (trunc_len / 2) + (trunc_len % 2);
-        strncpy (x, txt, y);
-        strncpy (x + y, txt + txt_len - (trunc_len / 2), trunc_len / 2);
-        x[y] = '~';
-    }
-    x[trunc_len] = 0;
-    for (p = x; *p; p++)
-        if (!is_printable (*p))
-            *p = '?';
+    txt_len = mbstrlen (txt);
+    first = 0;
+    skip = 0;
+    if (txt_len > trunc_len) {
+        first = trunc_len / 2;
+        skip = txt_len - trunc_len + 1;
+    }
+
+#ifdef UTF8
+    if (SLsmg_Is_Unicode) {
+        mbstate_t s;
+        int mbmax;
+
+        str = txt;
+        memset (&s, 0, sizeof (s));
+        mbmax = MB_CUR_MAX;
+        p = x;
+        while (p < x + sizeof (x) - 1 && trunc_len) {
+            wchar_t wc;
+            int len;
+
+            len = mbrtowc (&wc, str, mbmax, &s);
+            if (!len)
+                break;
+            if (len < 0) {
+                memset (&s, 0, sizeof (s));
+                *p = '?';
+                len = 1;
+                str++;
+            } else if (!is_printable (wc)) {
+                *p = '?';
+                str += len;
+                len = 1;
+            } else if (p >= x + sizeof (x) - len)
+                break;
+            else {
+                memcpy (p, str, len);
+                str += len;
+            }
+            if (first) {
+                --trunc_len;
+                --first;
+                p += len;
+                if (!first && p < x + sizeof (x) - 1 && trunc_len) {
+                    *p++ = '~';
+                    --trunc_len;
+                }
+            } else if (skip)
+                --skip;
+            else {
+                --trunc_len;
+                p += len;
+            }
+        }
+    } else
+#endif
+    {
+        str = txt;
+        p = x;
+        while (p < x + sizeof (x) - 1) {
+            if (*str == '\0')
+                break;
+            else if (!is_printable (*str))
+                *p++ = '?';
+            else
+                *p++ = *str;
+            ++str;
+            if (first) {
+                --first;
+                if (!first) {
+                    *p++ = '~';
+                    str += skip;
+                }
+            }
+        }
+    }
+    *p = '\0';
     return x;
 }
 
@@ -664,12 +757,14 @@ short-month-n

Re: [Debian BTS] ru_RU.UTF-8 locale

2003-02-24 Thread Andrew V. Samoilov
Jakub Jelinek wrote:
> On Mon, Feb 24, 2003 at 11:49:14AM +0100, Adam Byrtek / alpha wrote:
>
>> Hi, I know there are several people here who use the Russian locale.
>> Could you please try to reproduce this bug report or tell me whether I
>> can close it? Maybe this guy just doesn't know how to configure a
>> UTF-8 terminal properly? Unfortunately I can't contact him...
>
> mc doesn't work in UTF-8 locales.
> A few days ago I hacked mc up so that at least the things I use often in mc
> sort-of work with UTF-8, you can find the patch in
> ftp://people.redhat.com/jakub/mc/
> But view is not done at all and there are still a lot of places which need
> changing.
> The first thing to decide is which locales mc wants to support.
> E.g. supporting just ASCII compatible charsets (like UTF-8) is easier
> than supporting ASCII incompatible ones.
>
> Jakub

Can you upload UTF8 related patches there?

--
Regards,
Andrew V. Samoilov




Re: [Debian BTS] ru_RU.UTF-8 locale

2003-02-24 Thread Jakub Jelinek
On Mon, Feb 24, 2003 at 11:49:14AM +0100, Adam Byrtek / alpha wrote:
> Hi, I know there are several people here who use the Russian locale.
> Could you please try to reproduce this bug report or tell me whether I
> can close it? Maybe this guy just doesn't know how to configure a
> UTF-8 terminal properly? Unfortunately I can't contact him...

mc doesn't work in UTF-8 locales.
A few days ago I hacked mc up so that at least the things I use often in mc
sort-of work with UTF-8, you can find the patch in
ftp://people.redhat.com/jakub/mc/
But view is not done at all and there are still a lot of places which need
changing.
The first thing to decide is which locales mc wants to support.
E.g. supporting just ASCII compatible charsets (like UTF-8) is easier
than supporting ASCII incompatible ones.

Jakub