Re: X11R7.5 and C.UTF-8

2009-12-04 Thread Thomas Dickey

On Thu, 3 Dec 2009, Eric Blake wrote:

> Thomas Dickey dickey at his.com writes:
>
>>> This means that characters 0..127 have to be treated as ASCII, but
>
> No, it means that portable characters and control characters must be < 128.
> ASCII meets this characteristic, but so does EBCDIC, as well as UTF-8.  The C
> locale also implies that you can manipulate bytes >= 128 in the naive manner,
> so long as you don't care about characters embedded in those bytes.  And what
> do you know - ASCII, EBCDIC, and UTF-8 all meet this property, too.
>
>>> beyond that an implementation can do what it wants. And on Cygwin 1.7,
>>> plain C actually does imply UTF-8, which happily is
>>> backward-compatible with ASCII.
>>
>> That's an interpretation that so far hasn't been blessed by the standards
>> people.  Any discussion of this topic should mention that, as a caveat.
>
> Actually, the standards people HAVE spoken - and they agreed with our
> interpretation.  POSIX was INTENTIONALLY written with the intent that a UTF-8
> encoding is valid for the C locale, for the same reason that it was written
> that an EBCDIC encoding is valid for the C locale.  These emails from the
> Austin Group (the folks that write POSIX) are telling:
>
> https://www.opengroup.org/sophocles/show_mail.tpl?CALLER=show_archive.tpl&source=L&listname=austin-group-l&id=12982

This is basically your email on the matter.

> https://www.opengroup.org/sophocles/show_mail.tpl?CALLER=show_archive.tpl&source=L&listname=austin-group-l&id=13012
>
> But they also admitted that there is still more work needed in POSIX to make
> this intent clearly codified (for example, that control characters must be
> single bytes < 128).

But they have not actually agreed with you yet.

--
Thomas E. Dickey
http://invisible-island.net
ftp://invisible-island.net

--
Unsubscribe info:  http://cygwin.com/ml/#unsubscribe-simple
Problem reports:   http://cygwin.com/problems.html
Documentation: http://x.cygwin.com/docs/
FAQ:   http://x.cygwin.com/docs/faq/



Re: X11R7.5 and C.UTF-8

2009-12-03 Thread Corinna Vinschen
On Dec  3 07:48, Andy Koppe wrote:
> 2009/12/3 Linda Walsh:
>> C.UTF_8 doesn't exist.
>
> Well, guess what: it does in Cygwin 1.7, and it's the default locale.

Not exactly.  The default locale is C.UTF-8.  You can also use C.UTF8
or C.utf-8 or C.utf8, but not C.UTF_8 or C.utf_8.
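
To make the accepted spellings concrete, here is a minimal sketch of a matcher that accepts the four forms listed above and rejects the underscore forms. It is an illustration only: `is_utf8_charset` and `is_c_utf8_locale` are hypothetical names, not Cygwin or newlib functions, and accepting any mix of letter case is an assumption that goes beyond the four spellings given.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if cs spells the charset as "UTF-8",
   "UTF8", "utf-8" or "utf8" (letter case ignored, hyphen optional).
   An underscore is not accepted, so "UTF_8" is rejected.  */
static bool
is_utf8_charset (const char *cs)
{
  const char *want = "utf";
  size_t i;

  /* Compare the first three letters case-insensitively.  A short
     string hits its terminating NUL here and fails the comparison.  */
  for (i = 0; i < 3; i++)
    if (tolower ((unsigned char) cs[i]) != want[i])
      return false;
  cs += 3;
  if (*cs == '-')
    cs++;
  return cs[0] == '8' && cs[1] == '\0';
}

/* Hypothetical helper: true if name is "C." followed by a UTF-8
   charset spelling accepted by is_utf8_charset().  */
static bool
is_c_utf8_locale (const char *name)
{
  return strncmp (name, "C.", 2) == 0 && is_utf8_charset (name + 2);
}
```

Calling `is_c_utf8_locale ("C.utf8")` yields true, while `is_c_utf8_locale ("C.UTF_8")` yields false because the underscore is neither a hyphen nor the digit 8.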


Corinna

-- 
Corinna Vinschen  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader  cygwin AT cygwin DOT com
Red Hat




Re: X11R7.5 and C.UTF-8

2009-12-03 Thread Thomas Dickey

On Thu, 3 Dec 2009, Andy Koppe wrote:

> 2009/12/3 Linda Walsh:
>> C.UTF_8 doesn't exist.

...

>> You can't have C and UTF-8, because C means no encoding (default).
>> UTF-8 IS an encoding, so they are mutually exclusive.
>
> From http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html,
> §7.2:
>
>   The tables in Locale Definition describe the characteristics and
>   behavior of the POSIX locale for data consisting entirely of
>   characters from the portable character set and the control character
>   set. For other characters, the behavior is unspecified.
>
> This means that characters 0..127 have to be treated as ASCII, but
> beyond that an implementation can do what it wants. And on Cygwin 1.7,
> plain C actually does imply UTF-8, which happily is
> backward-compatible with ASCII.


That's an interpretation that so far hasn't been blessed by the standards
people.  Any discussion of this topic should mention that, as a caveat.

ymmv

--
Thomas E. Dickey
http://invisible-island.net
ftp://invisible-island.net

Re: X11R7.5 and C.UTF-8

2009-12-03 Thread Andy Koppe
2009/12/3 Thomas Dickey:
>> From
>> http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html,
>> §7.2:
>>
>>   The tables in Locale Definition describe the characteristics and
>>   behavior of the POSIX locale for data consisting entirely of
>>   characters from the portable character set and the control character
>>   set. For other characters, the behavior is unspecified.
>>
>> This means that characters 0..127 have to be treated as ASCII, but
>> beyond that an implementation can do what it wants. And on Cygwin 1.7,
>> plain C actually does imply UTF-8, which happily is
>> backward-compatible with ASCII.
>
> That's an interpretation that so far hasn't been blessed by the standards
> people.  Any discussion of this topic should mention that, as a caveat.

Fair point. It also means that apps are entitled to assume that C
supports no more than ASCII, which is why Cygwin 1.7's default locale
is C.UTF-8. A default locale setting based on the user's language
selection would be better, but we don't have that (yet?).

Andy




Re: X11R7.5 and C.UTF-8

2009-12-03 Thread Corinna Vinschen
On Dec  3 13:16, Andy Koppe wrote:
> 2009/12/3 Thomas Dickey:
>>> From
>>> http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html,
>>> §7.2:
>>>
>>>   The tables in Locale Definition describe the characteristics and
>>>   behavior of the POSIX locale for data consisting entirely of
>>>   characters from the portable character set and the control character
>>>   set. For other characters, the behavior is unspecified.
>>>
>>> This means that characters 0..127 have to be treated as ASCII, but
>>> beyond that an implementation can do what it wants. And on Cygwin 1.7,
>>> plain C actually does imply UTF-8, which happily is
>>> backward-compatible with ASCII.
>>
>> That's an interpretation that so far hasn't been blessed by the standards
>> people.  Any discussion of this topic should mention that, as a caveat.
>
> Fair point. It also means that apps are entitled to assume that C
> supports no more than ASCII, which is why Cygwin 1.7's default locale
> is C.UTF-8. A default locale setting based on the user's language
> selection would be better, but we don't have that (yet?).

Try the attached.  Note:  It has a hidden --testloop option...


Corinna

-- 
Corinna Vinschen  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader  cygwin AT cygwin DOT com
Red Hat
#define WINVER 0x0600
#include <stdio.h>
#include <stdlib.h>   /* exit */
#include <stdbool.h>  /* bool, needed when built as C */
#include <windows.h>
#include <getopt.h>

#define VERSION  "1.0"

extern char *__progname;

void version () __attribute__ ((noreturn));
void usage (FILE *, int) __attribute__ ((noreturn));

void
version ()
{
  printf ("%s (Cygwin) %s\n", __progname, VERSION);
  exit (0);
}

void
usage (FILE * stream, int status)
{
  fprintf (stream, "\n\
Usage: %s [-suU] [-l LCID]\n\
\n\
Return POSIX LANG identifier corresponding to a locale, default is the\n\
system default locale\n\
Possible options are:\n\
\n\
  -s, --system      return LANG for the system's default locale\n\
  -u, --user        return LANG for the current user's default locale\n\
  -l, --lcid LCID   return LANG for the LCID given as argument\n\
  -U, --UTF-8       always attach .UTF-8 to LANG\n\
  -h, --help        this text\n\
  -V, --version     print the version of %s and exit\n",
	   __progname, __progname);
  exit (status);
}

struct option longopts[] = {
  {"system", no_argument, NULL, 's'},
  {"user", no_argument, NULL, 'u'},
  {"lcid", required_argument, NULL, 'l'},
  {"UTF-8", no_argument, NULL, 'U'},
  {"help", no_argument, NULL, 'h'},
  {"version", no_argument, NULL, 'V'},
  {"testloop", no_argument, NULL, 'T'},
  {0, no_argument, NULL, 0}
};
char *opts = "dsul:UhV";

int
getlocale (LCID lcid, bool utf, bool test)
{
  UINT codepage;
  char iso639[10];
  char iso3166[10];

  /* Fetch the default ANSI codepage (as a number) and the ISO 639 and
     ISO 3166 names for the locale.  */
  if (!GetLocaleInfo (lcid, LOCALE_IDEFAULTANSICODEPAGE | LOCALE_RETURN_NUMBER,
		      (char *) &codepage, sizeof codepage)
      || !GetLocaleInfo (lcid, LOCALE_SISO639LANGNAME, iso639, 10)
      || !GetLocaleInfo (lcid, LOCALE_SISO3166CTRYNAME, iso3166, 10))
    {
      if (!test)
	fprintf (stderr, "%s: Nonexistent locale\n", __progname);
      return 2;
    }
  /* A codepage of 0 selects the ".UTF-8" suffix below.  */
  if (utf)
    codepage = 0;
  if (test)
    {
      char cty[256];
      char lang[256];
      GetLocaleInfo (lcid, LOCALE_SENGCOUNTRY, cty, 256);
      GetLocaleInfo (lcid, LOCALE_SENGLANGUAGE, lang, 256);
      printf ("0x%04x=\"%s_%s\", %s (%s)\n", (unsigned) lcid, iso639, iso3166,
	      lang, cty);
    }
  else
    printf ("LANG=\"%s_%s%s\"\n", iso639, iso3166, codepage ? "" : ".UTF-8");
  return 0;
}

#define d(X)	{X, #X}
struct dl {
  LCTYPE t;
  const char *s;
} dlist[] = {
  d(LOCALE_SLONGDATE),
  d(LOCALE_SSHORTDATE),
  d(LOCALE_STIMEFORMAT),
  d(LOCALE_SYEARMONTH),
  d(LOCALE_S1159),
  d(LOCALE_S2359),
  d(LOCALE_SDAYNAME1),
  d(LOCALE_SDAYNAME2),
  d(LOCALE_SDAYNAME3),
  d(LOCALE_SDAYNAME4),
  d(LOCALE_SDAYNAME5),
  d(LOCALE_SDAYNAME6),
  d(LOCALE_SDAYNAME7),
  d(LOCALE_SABBREVDAYNAME1),
  d(LOCALE_SABBREVDAYNAME2),
  d(LOCALE_SABBREVDAYNAME3),
  d(LOCALE_SABBREVDAYNAME4),
  d(LOCALE_SABBREVDAYNAME5),
  d(LOCALE_SABBREVDAYNAME6),
  d(LOCALE_SABBREVDAYNAME7),
  d(LOCALE_SMONTHNAME1),
  d(LOCALE_SMONTHNAME2),
  d(LOCALE_SMONTHNAME3),
  d(LOCALE_SMONTHNAME4),
  d(LOCALE_SMONTHNAME5),
  d(LOCALE_SMONTHNAME6),
  d(LOCALE_SMONTHNAME7),
  d(LOCALE_SMONTHNAME8),
  d(LOCALE_SMONTHNAME9),
  d(LOCALE_SMONTHNAME10),
  d(LOCALE_SMONTHNAME11),
  d(LOCALE_SMONTHNAME12),
  d(LOCALE_SMONTHNAME13),
  d(LOCALE_SABBREVMONTHNAME1),
  d(LOCALE_SABBREVMONTHNAME2),
  d(LOCALE_SABBREVMONTHNAME3),
  d(LOCALE_SABBREVMONTHNAME4),
  d(LOCALE_SABBREVMONTHNAME5),
  d(LOCALE_SABBREVMONTHNAME6),
  d(LOCALE_SABBREVMONTHNAME7),
  d(LOCALE_SABBREVMONTHNAME8),
  d(LOCALE_SABBREVMONTHNAME9),
  d(LOCALE_SABBREVMONTHNAME10),
  d(LOCALE_SABBREVMONTHNAME11),
  d(LOCALE_SABBREVMONTHNAME12),
  d(LOCALE_SABBREVMONTHNAME13),
  { 0, NULL }
};

int main (int argc, char **argv)
{
  int opt;
  LCID lcid = LOCALE_SYSTEM_DEFAULT;
  bool utf = false;
  bool test = false;
  bool dates = false;

  while ((opt = getopt_long (argc, argv, opts, longopts, NULL)) != EOF)
    switch (opt)
      {

Re: X11R7.5 and C.UTF-8

2009-12-03 Thread Eric Blake
Thomas Dickey dickey at his.com writes:

>> This means that characters 0..127 have to be treated as ASCII, but

No, it means that portable characters and control characters must be < 128.
ASCII meets this characteristic, but so does EBCDIC, as well as UTF-8.  The C
locale also implies that you can manipulate bytes >= 128 in the naive manner,
so long as you don't care about characters embedded in those bytes.  And what
do you know - ASCII, EBCDIC, and UTF-8 all meet this property, too.
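
As an illustration of the byte-level property for UTF-8, the sketch below counts bytes with a value of 128 or more. In UTF-8 every byte of a multibyte character has its high bit set, so a byte below 128 always stands for itself and plain text such as `LANG=C` contains no high bytes at all. The helper name `count_high_bytes` is made up for this example.

```c
#include <stddef.h>

/* Count the bytes in buf[0..len) whose value is >= 128.  */
static size_t
count_high_bytes (const unsigned char *buf, size_t len)
{
  size_t i, n = 0;

  for (i = 0; i < len; i++)
    if (buf[i] >= 128)
      n++;
  return n;
}
```

For example, U+00E9 is encoded in UTF-8 as the two bytes 0xC3 0xA9, so the five-byte string `"caf\xC3\xA9"` gives a count of 2, and the three ASCII bytes before it are untouched.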

>> beyond that an implementation can do what it wants. And on Cygwin 1.7,
>> plain C actually does imply UTF-8, which happily is
>> backward-compatible with ASCII.
>
> That's an interpretation that so far hasn't been blessed by the standards
> people.  Any discussion of this topic should mention that, as a caveat.

Actually, the standards people HAVE spoken - and they agreed with our
interpretation.  POSIX was INTENTIONALLY written with the intent that a UTF-8
encoding is valid for the C locale, for the same reason that it was written
that an EBCDIC encoding is valid for the C locale.  These emails from the
Austin Group (the folks that write POSIX) are telling:

https://www.opengroup.org/sophocles/show_mail.tpl?CALLER=show_archive.tpl&source=L&listname=austin-group-l&id=12982

https://www.opengroup.org/sophocles/show_mail.tpl?CALLER=show_archive.tpl&source=L&listname=austin-group-l&id=13012

But they also admitted that there is still more work needed in POSIX to make
this intent clearly codified (for example, that control characters must be
single bytes < 128).

-- 
Eric Blake







Re: X11R7.5 and C.UTF-8

2009-12-02 Thread Linda Walsh

Ken Brown wrote:

> On 10/28/2009 6:07 PM, Andy Koppe wrote:
>
>> 2009/10/28 Ken Brown:
>>> Maybe my terminology is wrong.  But if you start mintty with no .minttyrc
>>> and with LANG unset, mintty will set LANG=C.UTF-8.
>>
>> Yep. That's primarily for emacs' benefit, which parses the locale env
>> variables itself instead of using setlocale(LC_CTYPE, ""), thereby
>> missing out on Cygwin's default locale.
>
> Andy,
>
> I've sent a report about this to the emacs-devel list
> (http://lists.gnu.org/archive/html/emacs-devel/2009-11/threads.html#01216).
> But I don't have a good understanding of locale issues.  Could you take
> a look and see if what I said is accurate or if more should be said?


C.UTF_8 doesn't exist.

mintty is broken.

Might want to try 'Console' instead of using mintty.  Not perfect either,
but fewer compatibility problems that I've noticed.


Examples of valid LANG values: 

  C, ca_FR, en_US, fr_FR, it_IT, nl_NL, wa...@euro 



You can't have C and UTF-8, because C means no encoding (default).
UTF-8 IS an encoding, so they are mutually exclusive.  I don't
know under what circumstances C might imply UTF-8.  If the definition
of C changes?  It might be easier than changing c (as used in physics).

My understanding of locale issues is also limited and subject to change or
re-education...

:-)




Re: X11R7.5 and C.UTF-8

2009-12-02 Thread Charles Wilson
Linda Walsh wrote:
> C.UTF_8 doesn't exist.

You're wrong. Please read the whole of this thread -- and the last two
months' worth of cygwin-developers.

> mintty is broken.

No, it isn't.  It just doesn't work the way *you* expect it to.

> Might want to try 'Console' instead of using mintty.  Not perfect either,
> but fewer compatibility problems that I've noticed.
>
> Examples of valid LANG values:
>   C, ca_FR, en_US, fr_FR, it_IT, nl_NL, wa...@euro
>
> You can't have C and UTF-8, because C means no encoding (default).

No, it doesn't.  C means POSIX and is defined here:
http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html
Note how all the glyphs are defined in terms of character NAMES, not
hexadecimal values?  That's because C, all by itself, just doesn't
SPECIFY any encoding.  You're still allowed to HAVE one -- in fact, you
ALWAYS have one.

On most systems, that has historically been the plain ASCII 7-bit
encoding; many others used the EBCDIC encoding and were not considered
in violation of the POSIX C locale specification.  Now, many systems
are starting to use the UTF-8 encoding by default, even in the C locale.

C/POSIX locale (without an additional .ENCODING suffix) is
encoding-AGNOSTIC, that's all.  So, you're allowed to add an .ENCODING
suffix to force a specific encoding if you like, without violating
POSIX.  (And your system is also allowed, in that case, to IGNORE that
.ENCODING suffix, and still be Posix-compliant IIUC, so it's rather a
hole in the spec IMO).
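
To show where the optional .ENCODING suffix sits in a locale name, here is a rough sketch that splits a string of the conventional form `language[_territory][.codeset][@modifier]` into its parts. The struct and function names are invented for this example, and the parsing is deliberately naive; it is not how Cygwin, newlib or X actually interpret locale names.

```c
#include <string.h>

/* Hypothetical container for the four parts of a locale name.  */
struct locale_parts
{
  char language[32];
  char territory[32];
  char codeset[32];
  char modifier[32];
};

/* Copy the bytes in [start, end) into dst, truncating to fit, and
   always NUL-terminate.  */
static void
copy_span (char *dst, size_t dstsize, const char *start, const char *end)
{
  size_t n = (size_t) (end - start);

  if (n >= dstsize)
    n = dstsize - 1;
  memcpy (dst, start, n);
  dst[n] = '\0';
}

/* Split name as language[_territory][.codeset][@modifier].
   Missing parts come back as empty strings.  */
static void
split_locale (const char *name, struct locale_parts *p)
{
  const char *end = name + strlen (name);
  const char *at = strchr (name, '@');
  const char *mod_start = at ? at : end;
  /* Look for '.' only before the modifier, and for '_' only before
     the codeset, so later parts cannot confuse earlier ones.  */
  const char *dot = memchr (name, '.', (size_t) (mod_start - name));
  const char *cs_start = dot ? dot : mod_start;
  const char *us = memchr (name, '_', (size_t) (cs_start - name));
  const char *lang_end = us ? us : cs_start;

  copy_span (p->language, sizeof p->language, name, lang_end);
  copy_span (p->territory, sizeof p->territory,
	     us ? us + 1 : lang_end, cs_start);
  copy_span (p->codeset, sizeof p->codeset,
	     dot ? dot + 1 : cs_start, mod_start);
  copy_span (p->modifier, sizeof p->modifier, at ? at + 1 : end, end);
}
```

With this sketch, `C.UTF-8` splits into language `C` and codeset `UTF-8`, while plain `C` leaves the codeset empty, which matches the point that the bare name says nothing about the encoding.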

> UTF-8 IS an encoding, so they are mutually exclusive.  I don't
> know under what circumstances C might imply UTF-8.

Whenever the platform decides to use UTF-8 as its default encoding,
which is perfectly acceptable according to Posix.  Cygwin-1.7 has
decided to do that.  So, on cygwin-1.7, C implies .UTF-8.  X11R7.5
doesn't yet know that, without outside help (e.g. explicitly setting
$LANG to C.UTF-8 by default, so that XWin knows about the new
default behavior).

>  If the definition
> of C changes?  It might be easier than changing c (as used in physics).
>
> My understanding of locale issues is also limited and subject to change or
> re-education...

Uhm, yeah.

--
Chuck




Re: X11R7.5 and C.UTF-8

2009-12-02 Thread Andy Koppe
2009/12/3 Linda Walsh:
> C.UTF_8 doesn't exist.

Well, guess what: it does in Cygwin 1.7, and it's the default locale.
And it's also in the next Debian:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=522776.

Cygwin 1.7 also supports C.ISO-8859-1, C.CP1252, ...


> Might want to try 'Console' instead of using mintty.  Not perfect either, but
> fewer compatibility problems that I've noticed.

Care to provide examples, so they can be fixed? Or are you just bitter
about having to tick a box to switch backspace to ^H?

'Console' is better for native Windows programs, because, well, it's a
console, whereas mintty is more suited for Unix programs, because it's
an xterm-compatible tty.


> You can't have C and UTF-8, because C means no encoding (default).
> UTF-8 IS an encoding, so they are mutually exclusive.

From http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html,
§7.2:

The tables in Locale Definition describe the characteristics and
behavior of the POSIX locale for data consisting entirely of
characters from the portable character set and the control character
set. For other characters, the behavior is unspecified.

This means that characters 0..127 have to be treated as ASCII, but
beyond that an implementation can do what it wants. And on Cygwin 1.7,
plain C actually does imply UTF-8, which happily is
backward-compatible with ASCII.

Not that that is much to do with C.UTF-8, which is a separate locale
in any case. The meaning of locale strings is up to the OS, e.g. with
the Windows C runtime you get stuff like English_United States.1252.
And 'C.charset' on Cygwin is intended to mean the semantics of the
C locale, but with the specified charset.

However, since the 'C.charset' format is unlikely to be recognised
by remote systems, it's recommended to set a real locale such as
'en_US.UTF-8'.


> I don't
> know under what circumstances C might imply UTF-8.  If the definition
> of C changes?  It might be easier than changing c (as used in physics).

How droll.

Andy




Re: X11R7.5 and C.UTF-8

2009-11-28 Thread Ken Brown

On 10/28/2009 6:07 PM, Andy Koppe wrote:

> 2009/10/28 Ken Brown:
>> Maybe my terminology is wrong.  But if you start mintty with no .minttyrc
>> and with LANG unset, mintty will set LANG=C.UTF-8.
>
> Yep. That's primarily for emacs' benefit, which parses the locale env
> variables itself instead of using setlocale(LC_CTYPE, ""), thereby
> missing out on Cygwin's default locale.


Andy,

I've sent a report about this to the emacs-devel list 
(http://lists.gnu.org/archive/html/emacs-devel/2009-11/threads.html#01216). 
 But I don't have a good understanding of locale issues.  Could you 
take a look and see if what I said is accurate or if more should be said?


Thanks.

Ken




Re: X11R7.5 and C.UTF-8

2009-11-28 Thread Andy Koppe
2009/11/28 Ken Brown:
> On 10/28/2009 6:07 PM, Andy Koppe wrote:
>
>> 2009/10/28 Ken Brown:
>>> Maybe my terminology is wrong.  But if you start mintty with no .minttyrc
>>> and with LANG unset, mintty will set LANG=C.UTF-8.
>>
>> Yep. That's primarily for emacs' benefit, which parses the locale env
>> variables itself instead of using setlocale(LC_CTYPE, ""), thereby
>> missing out on Cygwin's default locale.
>
> Andy,
>
> I've sent a report about this to the emacs-devel list
> (http://lists.gnu.org/archive/html/emacs-devel/2009-11/threads.html#01216).
> But I don't have a good understanding of locale issues.  Could you take a
> look and see if what I said is accurate or if more should be said?

Thanks Ken, I think you've got that all correct, including pointing
the finger at mule-cmds.el as the suspect. I'll keep an eye on that
thread.

One more thing that might be worth mentioning is
'nl_langinfo(CODESET)' for enquiring about the character encoding.
(It's actually being used in a couple of places in the emacs sources
already, in fns.c and w32proc.c, but I don't know what significance
those files have.)
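
For readers unfamiliar with that call, a minimal sketch of the approach looks like this: first adopt the locale from the environment with `setlocale(LC_CTYPE, "")`, then ask the C library which character encoding that locale uses. The wrapper name `current_codeset` is made up for this example, and the exact string returned varies between systems.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/* Adopt the locale from the environment (LC_ALL, LC_CTYPE, LANG) and
   return the name of its character encoding as reported by the C
   library.  If setlocale fails, the previous locale stays in effect
   and its codeset is reported instead.  */
static const char *
current_codeset (void)
{
  if (setlocale (LC_CTYPE, "") == NULL)
    fprintf (stderr, "setlocale failed; keeping the previous locale\n");
  return nl_langinfo (CODESET);
}
```

Because the library does the work, a program written this way picks up the system's default encoding without parsing the locale variables itself.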

Andy




Re: X11R7.5 and C.UTF-8

2009-11-28 Thread Ken Brown

On 11/28/2009 8:34 AM, Andy Koppe wrote:

> 2009/11/28 Ken Brown:
>> On 10/28/2009 6:07 PM, Andy Koppe wrote:
>>
>>> 2009/10/28 Ken Brown:
>>>> Maybe my terminology is wrong.  But if you start mintty with no .minttyrc
>>>> and with LANG unset, mintty will set LANG=C.UTF-8.
>>>
>>> Yep. That's primarily for emacs' benefit, which parses the locale env
>>> variables itself instead of using setlocale(LC_CTYPE, ""), thereby
>>> missing out on Cygwin's default locale.
>>
>> Andy,
>>
>> I've sent a report about this to the emacs-devel list
>> (http://lists.gnu.org/archive/html/emacs-devel/2009-11/threads.html#01216).
>> But I don't have a good understanding of locale issues.  Could you take a
>> look and see if what I said is accurate or if more should be said?
>
> Thanks Ken, I think you've got that all correct, including pointing
> the finger at mule-cmds.el as the suspect. I'll keep an eye on that
> thread.
>
> One more thing that might be worth mentioning is
> 'nl_langinfo(CODESET)' for enquiring about the character encoding.
> (It's actually being used in a couple of places in the emacs sources
> already, in fns.c and w32proc.c, but I don't know what significance
> those files have.)


w32proc.c doesn't get compiled in the Cygwin build, but fns.c does.  The 
call to nl_langinfo(CODESET) is in the definition of the locale-info 
function, which provides a way for emacs to determine the CODESET.  I've 
passed this on to the emacs-devel list.  Thanks for the help.


Ken




Re: X11R7.5 and C.UTF-8

2009-11-03 Thread Jon TURNEY

On 29/10/2009 20:20, Andy Koppe wrote:

> 2009/10/29 Jon TURNEY:
>> I've put a patch in bugzilla [1] which can be applied to
>> /usr/share/X11/locale to temporarily repair this problem.
>>
>> This needs to be looked at more deeply, though, as I'm not sure I've fully
>> understood what that locale data is being used for, or specified C.UTF-8
>> correctly.
>>
>> [1] http://sourceware.org/bugzilla/show_bug.cgi?id=10870
>
> I think the patch makes plenty of sense in mapping C.UTF-8 to
> en_US.UTF-8, because most other UTF-8 locales are also mapped to
> en_US.UTF-8, i.e. from X's perspective they're not actually
> language-specific.


On second look, this patch doesn't seem to be quite right, as it makes the 
en_US.UTF-8 compose sequences available in C.UTF-8 (which is not the case in 
the C locale).



> More generally, there's the issue that Cygwin allows any combination
> of language and charset, whereas X has a fixed list of permitted
> combinations. Cygwin also supports many charsets that aren't supported
> by X (and vice versa). In particular, X only supports a few of the
> Windows/DOS codepages. But I guess unsupported locales will just have
> to be a case of "don't do that"?


Yes.

I would say, though, that treating a false return from XSupportsLocale() as a
fatal error, as the Xserver currently does, is wrong unless the application
has very specific requirements.






Re: X11R7.5 and C.UTF-8

2009-11-03 Thread Andy Koppe
2009/11/3 Jon TURNEY:
> On second look, this patch doesn't seem to be quite right, as it makes the
> en_US.UTF-8 compose sequences available in C.UTF-8 (which is not the case in
> the C locale).

I think that's ok. The compose sequences don't make sense in an ASCII
locale, since ASCII doesn't contain composed characters. Yet they can
be very useful in a UTF-8 locale, so it would be a shame to remove
them. Also, the en_US.UTF-8 compose sequences aren't actually
English-specific, since the vast majority of non-English UTF-8 locales
use the same sequences.

Andy




Re: X11R7.5 and C.UTF-8

2009-10-29 Thread Jon TURNEY

On 29/10/2009 00:07, Andy Koppe wrote:

> 2009/10/28 Jon TURNEY:
>> On 28/10/2009 14:22, Ken Brown wrote:
>>> X11R7.5 doesn't like the (default) locale C.UTF-8.  If I start the
>>> server with 'LANG=C.UTF-8 /usr/bin/startxwin.bat', the server exits
>>> immediately, and the log has complaints about the locale. If I instead
>>> use 'LANG=en_US.UTF-8', there's no problem. I've attached both logs and
>>> cygcheck output.
>>
>> Thanks for the bug report.
>>
>> I'm afraid I'm not immediately able to reproduce this, though, using the
>> command you give.
>
> You might have LC_ALL or LC_CTYPE set, which would override LANG. Or
> perhaps startxwin.bat overrides things somewhere along the way?
>
> To avoid all that, you could try invoking Xwin directly with LC_ALL
> set, which is top dog among locale variables.
>
>    LC_ALL=C.UTF-8 xwin -multiwindow
>
> It fails with en.UTF-8 too (which also is a legal Cygwin locale), but
> it works with en_US.UTF-8.
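
The precedence described above can be sketched as follows for the LC_CTYPE category: LC_ALL wins if it is set and non-empty, then LC_CTYPE, then LANG. This is only an illustration of the lookup order; the function name is invented, and real programs normally let `setlocale` perform this lookup rather than reading the variables themselves.

```c
#include <stdlib.h>

/* Return the value of the first of LC_ALL, LC_CTYPE and LANG that is
   set to a non-empty string, or NULL if none is.  In the NULL case the
   implementation's default locale applies.  */
static const char *
effective_ctype_locale (void)
{
  static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };
  size_t i;

  for (i = 0; i < sizeof vars / sizeof vars[0]; i++)
    {
      const char *v = getenv (vars[i]);

      if (v != NULL && v[0] != '\0')
	return v;
    }
  return NULL;
}
```

With `LANG=en_US.UTF-8` and `LC_ALL=C.UTF-8` both in the environment, the function returns `C.UTF-8`, which is why setting LC_ALL on the command line sidesteps whatever LANG or LC_CTYPE happen to contain.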


Nope, I don't have LC_ALL or LC_CTYPE set

This is pretty curious, since all XSupportsLocale() should be doing 
effectively is checking if setlocale (LC_ALL, NULL) returns a name it understands.


Perhaps you can try the attached small test program.

I haven't been following the discussion about C.UTF-8 closely, but curiously,
for me at least, this test program shows that setlocale(LC_ALL, "") fails with
LANG=C.UTF-8 (so that doesn't actually seem to be a valid locale, although if
it's the default it probably doesn't make much difference), but this means
that a subsequent setlocale(LC_ALL, NULL) just returns C.
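
The behaviour described here follows from how `setlocale` is specified: a failed call returns a null pointer and leaves the program's locale unchanged, and a program starts in the C locale. The sketch below captures that pattern. It is not the attached Xlocale.c, which is not reproduced in this archive, and the helper name `try_locale` is hypothetical.

```c
#include <locale.h>
#include <stddef.h>

/* Try to switch every category to name ("" means take it from the
   environment).  Store 1 in *ok on success and 0 on failure, then
   return the name of the locale now in effect.  After a failure that
   is still the previous locale.  */
static const char *
try_locale (const char *name, int *ok)
{
  *ok = setlocale (LC_ALL, name) != NULL;
  return setlocale (LC_ALL, NULL);
}
```

In a fresh process, a call such as `try_locale ("", &ok)` on a system that does not recognise the value of LANG would set `ok` to 0 and return `C`, which is the combination of results reported above.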


Possibly C.UTF-8 needs adding to /usr/share/X11/locale/locale.alias and 
locale.dir.


In any case, it's probably also a bug that the Xserver considers
XSupportsLocale() failure a critical error, rather than continuing with a
warning, but I'd like to get to the bottom of this first...



>> The significant change is probably that libX11 is no longer built with
>> X_LOCALE (so that libX11 uses the native locale support rather than its
>> own).
>> Exactly why this would cause a problem, I don't know.
>
> Hmm, that sounds like it should have improved matters if anything.


Indeed :-)



Xlocale.c
Description: application/itunes-itlp

Re: X11R7.5 and C.UTF-8

2009-10-29 Thread Corinna Vinschen
On Oct 29 13:42, Jon TURNEY wrote:
> I haven't been following the discussion about C.UTF-8 closely, but
> curiously, for me at least, this test program shows that
> setlocale(LC_ALL, "") fails with LANG=C.UTF-8 (so that doesn't
> actually seem to be a valid locale, although if it's the default it
> probably doesn't make much difference), but this means that a
> subsequent setlocale(LC_ALL, NULL) just returns C

What version of Cygwin 1.7 are you using?  The change to newlib that
allows C.UTF-8 to be specified as a locale is from 2009-09-29, so Cygwin
1.7.0-62 from 2009-10-03 accepts this locale.

The change that makes C.UTF-8 Cygwin's default locale is from
2009-10-09, so it is only in Cygwin from CVS, or in developer
snapshots from after that date.


Corinna

-- 
Corinna Vinschen  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader  cygwin AT cygwin DOT com
Red Hat




Re: X11R7.5 and C.UTF-8

2009-10-29 Thread Ken Brown

On 10/29/2009 9:42 AM, Jon TURNEY wrote:

> On 29/10/2009 00:07, Andy Koppe wrote:
>
>> 2009/10/28 Jon TURNEY:
>>> On 28/10/2009 14:22, Ken Brown wrote:
>>>> X11R7.5 doesn't like the (default) locale C.UTF-8.  If I start the
>>>> server with 'LANG=C.UTF-8 /usr/bin/startxwin.bat', the server exits
>>>> immediately, and the log has complaints about the locale. If I instead
>>>> use 'LANG=en_US.UTF-8', there's no problem. I've attached both logs and
>>>> cygcheck output.
>>>
>>> Thanks for the bug report.
>>>
>>> I'm afraid I'm not immediately able to reproduce this, though, using the
>>> command you give.
>>
>> You might have LC_ALL or LC_CTYPE set, which would override LANG. Or
>> perhaps startxwin.bat overrides things somewhere along the way?
>>
>> To avoid all that, you could try invoking Xwin directly with LC_ALL
>> set, which is top dog among locale variables.
>>
>>    LC_ALL=C.UTF-8 xwin -multiwindow
>>
>> It fails with en.UTF-8 too (which also is a legal Cygwin locale), but
>> it works with en_US.UTF-8.
>
> Nope, I don't have LC_ALL or LC_CTYPE set
>
> This is pretty curious, since all XSupportsLocale() should be doing
> effectively is checking if setlocale (LC_ALL, NULL) returns a name it
> understands.
>
> Perhaps you can try the attached small test program.


$ LANG=C.UTF-8 ./Xlocale.exe
Setting locale from LANG succeeded
Locale is C.UTF-8
XSupportsLocale returned false

$ LANG=en_US.UTF-8 ./Xlocale.exe
Setting locale from LANG succeeded
Locale is en_US.UTF-8
XSupportsLocale returned true

$ unset LANG

$ ./Xlocale.exe
Setting locale from LANG succeeded
Locale is C
XSupportsLocale returned true

$ uname -a
CYGWIN_NT-5.1 markov 1.7.0(0.214/5/3) 2009-10-03 14:33 i686 Cygwin

Ken





Re: X11R7.5 and C.UTF-8

2009-10-29 Thread Jon TURNEY

On 29/10/2009 13:56, Corinna Vinschen wrote:

> On Oct 29 13:42, Jon TURNEY wrote:
>> I haven't been following the discussion about C.UTF-8 closely, but
>> curiously, for me at least, this test program shows that
>> setlocale(LC_ALL, "") fails with LANG=C.UTF-8 (so that doesn't
>> actually seem to be a valid locale, although if it's the default it
>> probably doesn't make much difference), but this means that a
>> subsequent setlocale(LC_ALL, NULL) just returns C
>
> What version of Cygwin 1.7 are you using?  The change to newlib that
> allows C.UTF-8 to be specified as a locale is from 2009-09-29, so Cygwin
> 1.7.0-62 from 2009-10-03 accepts this locale.
>
> The change that makes C.UTF-8 Cygwin's default locale is from
> 2009-10-09, so it is only in Cygwin from CVS, or in developer
> snapshots from after that date.


Thanks for the clarification.

j...@byron ~
$ cygcheck -c cygwin
Cygwin Package Information
Package              Version        Status
cygwin               1.7.0-62       OK

j...@byron ~
$ uname -a
CYGWIN_NT-5.1 byron 1.7.0(0.212/5/3) 2009-09-11 01:25 i686 Cygwin

Oops!





Re: X11R7.5 and C.UTF-8

2009-10-29 Thread Jon TURNEY

On 29/10/2009 14:37, Ken Brown wrote:

On 10/29/2009 9:42 AM, Jon TURNEY wrote:

On 29/10/2009 00:07, Andy Koppe wrote:

2009/10/28 Jon TURNEY:

On 28/10/2009 14:22, Ken Brown wrote:


X11R7.5 doesn't like the (default) locale C.UTF-8. If I start the
server with 'LANG=C.UTF-8 /usr/bin/startxwin.bat', the server exits
immediately, and the log has complaints about the locale. If I instead
use 'LANG=en_US.UTF-8', there's no problem. I've attached both logs
and
cygcheck output.


Thanks for the bug report.

I'm afraid I'm not immediately able to reproduce this, though, using
the
command you give.


You might have LC_ALL or LC_CTYPE set, which would override LANG. Or
perhaps startxwin.bat overrides things somewhere along the way?

To avoid all that, you could try invoking Xwin directly with LC_ALL
set, which is top dog among locale variables.

LC_ALL=C.UTF-8 xwin -multiwindow

It fails with en.UTF-8 too (which also is a legal Cygwin locale), but
it works with en_US.UTF-8.


Nope, I don't have LC_ALL or LC_CTYPE set

This is pretty curious, since all XSupportsLocale() should be doing
effectively is checking if setlocale (LC_ALL, NULL) returns a name it
understands.

Perhaps you can try the attached small test program.
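
The attachment itself is not part of this archive. As a rough sketch only,
a check along the same lines might look like the following in Python. It
assumes libX11 can be located through ctypes, which may not hold on every
system; check_locale is a made-up name and this is not the attached
program. The three printed lines follow the wording seen in the transcripts.

```python
import ctypes
import ctypes.util
import locale


def check_locale():
    """Return (setlocale_ok, locale_name, x_supports_locale).

    x_supports_locale is None when libX11 could not be loaded.
    """
    try:
        # Adopt whatever LC_ALL / LC_* / LANG request.
        locale.setlocale(locale.LC_ALL, "")
        ok = True
    except locale.Error:
        # The C library rejected the requested name.
        ok = False

    # Query only, the equivalent of setlocale(LC_ALL, NULL) in C.
    name = locale.setlocale(locale.LC_ALL, None)

    x_ok = None
    path = ctypes.util.find_library("X11")  # assumption: libX11 is findable
    if path:
        try:
            libx11 = ctypes.CDLL(path)
            libx11.XSupportsLocale.restype = ctypes.c_int
            x_ok = bool(libx11.XSupportsLocale())
        except (OSError, AttributeError):
            x_ok = None
    return ok, name, x_ok


if __name__ == "__main__":
    ok, name, x_ok = check_locale()
    print("Setting locale from LANG", "succeeded" if ok else "failed")
    print("Locale is", name)
    print("XSupportsLocale returned", x_ok)
```

Because both setlocale calls and XSupportsLocale() run in the same process,
Xlib sees whatever locale the C library ended up with.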


$ LANG=C.UTF-8 ./Xlocale.exe
Setting locale from LANG succeeded
Locale is C.UTF-8
XSupportsLocale returned false

$ LANG=en_US.UTF-8 ./Xlocale.exe
Setting locale from LANG succeeded
Locale is en_US.UTF-8
XSupportsLocale returned true

$ unset LANG

$ ./Xlocale.exe
Setting locale from LANG succeeded
Locale is C
XSupportsLocale returned true

$ uname -a
CYGWIN_NT-5.1 markov 1.7.0(0.214/5/3) 2009-10-03 14:33 i686 Cygwin


I suppose I should show you mine, then

$ LANG=C.UTF-8 ./Xlocale
Setting locale from LANG failed
Locale is C
XSupportsLocale returned true

$ LANG=en_US.UTF-8 ./Xlocale
Setting locale from LANG succeeded
Locale is en_US.UTF-8
XSupportsLocale returned true

$ unset LANG

$  ./Xlocale
Setting locale from LANG succeeded
Locale is C
XSupportsLocale returned true

$ uname -a
CYGWIN_NT-5.1 byron 1.7.0(0.212/5/3) 2009-09-11 01:25 i686 Cygwin

Okay, well this makes sense now :-(

Appropriate data needs to exist in /usr/share/X11/locale for the C.UTF-8 
locale, but it doesn't at the moment. Let me see if I can find it :-)





Re: X11R7.5 and C.UTF-8

2009-10-29 Thread Jon TURNEY

On 29/10/2009 15:01, Jon TURNEY wrote:

On 29/10/2009 14:37, Ken Brown wrote:

$ LANG=C.UTF-8 ./Xlocale.exe
Setting locale from LANG succeeded
Locale is C.UTF-8
XSupportsLocale returned false


Okay, well this makes sense now :-(

Appropriate data needs to exist in /usr/share/X11/locale for the C.UTF-8
locale, but it doesn't at the moment. Let me see if I can find it :-)


I've put a patch in bugzilla [1] which can be applied to /usr/share/X11/locale 
to temporarily repair this problem.


This needs to be looked at more deeply, though, as I'm not sure I've fully 
understood what that locale data is being used for, or specified C.UTF-8 
correctly.


[1] http://sourceware.org/bugzilla/show_bug.cgi?id=10870




Re: X11R7.5 and C.UTF-8

2009-10-29 Thread Andy Koppe
2009/10/29 Jon TURNEY:
 I've put a patch in bugzilla [1] which can be applied to
 /usr/share/X11/locale to temporarily repair this problem.

 This needs to be looked at more deeply, though, as I'm not sure I've fully
 understood what that locale data is being used for, or specified C.UTF-8
 correctly.

 [1] http://sourceware.org/bugzilla/show_bug.cgi?id=10870

I think the patch makes plenty of sense in mapping C.UTF-8 to
en_US.UTF-8, because most other UTF-8 locales are also mapped to
en_US.UTF-8, i.e. from X's perspective they're not actually
language-specific.
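
To make the mapping idea concrete, here is a toy model in Python. The
table entries are illustrative assumptions, not the contents of the real
files under /usr/share/X11/locale or of the bugzilla patch, and the
function names are invented for the example.

```python
# Illustrative table only: locale name -> X locale data set.
# These entries are assumptions made for the sketch.
X_LOCALE_DATA = {
    "C": "C",
    "en_US.UTF-8": "en_US.UTF-8",
    "de_DE.UTF-8": "en_US.UTF-8",
    "fr_FR.UTF-8": "en_US.UTF-8",
}


def x_locale_data(name, table=X_LOCALE_DATA):
    """Return the data set X would load for `name`, or None if unknown."""
    return table.get(name)


def with_c_utf8(table):
    """Return a copy of the table with C.UTF-8 pointed at en_US.UTF-8."""
    patched = dict(table)
    patched["C.UTF-8"] = "en_US.UTF-8"
    return patched


if __name__ == "__main__":
    print(x_locale_data("C.UTF-8"))
    print(x_locale_data("C.UTF-8", with_c_utf8(X_LOCALE_DATA)))
```

In this model a name with no entry (C.UTF-8 or en.UTF-8 here) is simply
unsupported, while an added entry makes it behave like the other UTF-8
locales that share the en_US.UTF-8 data.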

More generally, there's the issue that Cygwin allows any combination
of language and charset, whereas X has a fixed list of permitted
combinations. Cygwin also supports many charsets that aren't supported
by X (and vice versa). In particular, X only supports a few of the
Windows/DOS codepages. But I guess unsupported locales will just have
to be a case of "don't do that"?

Andy




Re: X11R7.5 and C.UTF-8

2009-10-28 Thread Thomas Dickey

On Wed, 28 Oct 2009, Ken Brown wrote:


X11R7.5 doesn't like the (default) locale C.UTF-8.  If I start the server


technically speaking, there's no such locale as C.UTF-8,
so I'd not expect portable code to accept it (C and UTF-8 are
mutually exclusive).

with 'LANG=C.UTF-8 /usr/bin/startxwin.bat', the server exits immediately, and 
the log has complaints about the locale.  If I instead use 
'LANG=en_US.UTF-8', there's no problem.  I've attached both logs and cygcheck 
output.


--
Thomas E. Dickey
http://invisible-island.net
ftp://invisible-island.net




Re: X11R7.5 and C.UTF-8

2009-10-28 Thread Ken Brown



On 10/28/2009 5:23 PM, Thomas Dickey wrote:

On Wed, 28 Oct 2009, Ken Brown wrote:


X11R7.5 doesn't like the (default) locale C.UTF-8.  If I start the server


technically speaking, there's no such locale as C.UTF-8,
so I'd not expect portable code to accept it (C and UTF-8 are
mutually exclusive).


Maybe my terminology is wrong.  But if you start mintty with no 
.minttyrc and with LANG unset, mintty will set LANG=C.UTF-8.  Trying to 
then start the X server via startxwin.bat or startxwin.sh leads to the 
error I reported.  The error did not occur in X11R7.4.


There's been a lot of discussion in the various cygwin lists leading to 
the decision that C.UTF-8 should be the default.


Ken




Re: X11R7.5 and C.UTF-8

2009-10-28 Thread Andy Koppe
2009/10/28 Thomas Dickey:
 X11R7.5 doesn't like the (default) locale C.UTF-8.  If I start the server

 technically speaking, there's no such locale as C.UTF-8,
 so I'd not expect portable code to accept it (C and UTF-8 are
 mutually exclusive).

Technically speaking, portable code should make no assumption
whatsoever about the locale string. The meaning of that string is up
to the OS, and portable code should be using POSIX interfaces such as
the multibyte conversion functions or nl_langinfo to get at its
meaning.
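
As a small illustration of asking the system rather than parsing the
string, here is a sketch in Python, whose locale module wraps setlocale
and nl_langinfo. It assumes a platform where nl_langinfo is available;
current_codeset is a made-up helper name.

```python
import locale


def current_codeset():
    """Adopt the environment's locale, then ask the C library for its charset."""
    try:
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # The requested name was rejected; the previous locale stays in effect.
        pass
    # The answer comes from the C library, not from the locale name's spelling.
    return locale.nl_langinfo(locale.CODESET)


if __name__ == "__main__":
    print("Charset in use:", current_codeset())
```

The program never looks at the text after the dot in LANG, so it does not
care whether the locale is called C.UTF-8, en_US.UTF-8 or something else.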

C.UTF-8 is a language-neutral locale with a UTF-8 charset. It is
also being introduced by Debian:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=522776.

Xwin 1.6.x had no problem with C.UTF-8.

Andy




Re: X11R7.5 and C.UTF-8

2009-10-28 Thread Andy Koppe
2009/10/28 Ken Brown:
 Maybe my terminology is wrong.  But if you start mintty with no .minttyrc
 and with LANG unset, mintty will set LANG=C.UTF-8.

Yep. That's primarily for emacs' benefit, which parses the locale env
variables itself instead of using setlocale(LC_CTYPE, ""), thereby
missing out on Cygwin's default locale.

(http://www.opengroup.org/onlinepubs/007908799/xbd/envvar.html says
that if LC_ALL, LC_CTYPE, and LANG are all either unset or empty, the
implementation-dependent default locale shall be used. For Cygwin 1.7,
the default locale uses UTF-8 and not ASCII as assumed by emacs. It
works correctly in vim.)
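
The lookup order can be sketched as a few lines of Python. This is only a
model of the rule for the LC_CTYPE category, not code from Cygwin or
emacs; effective_ctype_locale and its default parameter are invented for
the example.

```python
import os


def effective_ctype_locale(env, default="C"):
    """Return the locale name governing LC_CTYPE for the given environment.

    `default` stands in for the implementation-dependent default locale.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:  # unset and empty are treated alike
            return value
    return default


if __name__ == "__main__":
    print(effective_ctype_locale(os.environ))
```

A program that stops at the three variables and assumes ASCII when they
are empty skips the last line of the function, which is where an
implementation's own default comes in.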

Andy




Re: X11R7.5 and C.UTF-8

2009-10-28 Thread Charles Wilson
Thomas Dickey wrote:
 On Wed, 28 Oct 2009, Ken Brown wrote:
 
 X11R7.5 doesn't like the (default) locale C.UTF-8.  If I start the server
 
 technically speaking, there's no such locale as C.UTF-8,
 so I'd not expect portable code to accept it (C and UTF-8 are
 mutually exclusive).

No, actually they are not.  The C or POSIX locale is defined
entirely in terms of character values -- not hexadecimal equivalents.
That is, the set alpha shall contain 'a', 'b'... etc.

The standard actually doesn't require that an implementation specify the
encoding in which those character values are represented at all. You
can, if you want, use 'HEX_CHAR', 'OCTAL_CHAR', and 'DECIMAL_CHAR'
representations -- which implicitly require a specific encoding -- but
the standard defines the 'C' locale entirely in terms of CHAR and
CHARSYMBOL, which are encoding-agnostic.

http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html#tag_07_03
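
A quick way to see the difference between a character and its encoded
value is to encode the same character under several charsets. The sketch
below uses Python's codec names; cp037 is one EBCDIC code page, picked
here purely as an example, and encodings_of is a made-up helper.

```python
def encodings_of(ch):
    """Encode one character under ASCII, UTF-8 and an EBCDIC code page."""
    return {codec: ch.encode(codec) for codec in ("ascii", "utf-8", "cp037")}


if __name__ == "__main__":
    for codec, raw in encodings_of("a").items():
        print(codec, raw.hex())
```

For 'a', ASCII and UTF-8 give the same single byte while the EBCDIC code
page gives a different one, yet it is the same member of the alpha class
in all three.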

Personally, I think it's a hole in the standard that it doesn't actually
talk about "the POSIX locale with encoding Y" -- but then, they don't
want to show preference between ASCII and EBCDIC, so UTF-8 sneaks in there.

--
Chuck




Re: X11R7.5 and C.UTF-8

2009-10-28 Thread Andy Koppe
 Xwin 1.6.x had no problem with C.UTF-8.

Actually it's libX11 that makes the difference: Xwin 1.7.1 is fine
after downgrading libX11 from 1.3.2-1 to 1.2.2-2.

Andy




Re: X11R7.5 and C.UTF-8

2009-10-28 Thread Jon TURNEY

On 28/10/2009 14:22, Ken Brown wrote:

X11R7.5 doesn't like the (default) locale C.UTF-8.  If I start the
server with 'LANG=C.UTF-8 /usr/bin/startxwin.bat', the server exits
immediately, and the log has complaints about the locale. If I instead
use 'LANG=en_US.UTF-8', there's no problem. I've attached both logs and
cygcheck output.


Thanks for the bug report.

I'm afraid I'm not immediately able to reproduce this, though, using the 
command you give.


On 28/10/2009 21:49, Andy Koppe wrote:
 Xwin 1.6.x had no problem with C.UTF-8.

The significant change is probably that libX11 is no longer built with 
X_LOCALE (so that libX11 uses the native locale support rather than its own).

Exactly why this would cause a problem, I don't know.




Re: X11R7.5 and C.UTF-8

2009-10-28 Thread Andy Koppe
2009/10/28 Jon TURNEY:
 On 28/10/2009 14:22, Ken Brown wrote:

 X11R7.5 doesn't like the (default) locale C.UTF-8.  If I start the
 server with 'LANG=C.UTF-8 /usr/bin/startxwin.bat', the server exits
 immediately, and the log has complaints about the locale. If I instead
 use 'LANG=en_US.UTF-8', there's no problem. I've attached both logs and
 cygcheck output.

 Thanks for the bug report.

 I'm afraid I'm not immediately able to reproduce this, though, using the
 command you give.

You might have LC_ALL or LC_CTYPE set, which would override LANG. Or
perhaps startxwin.bat overrides things somewhere along the way?

To avoid all that, you could try invoking Xwin directly with LC_ALL
set, which is top dog among locale variables.

  LC_ALL=C.UTF-8 xwin -multiwindow

It fails with en.UTF-8 too (which also is a legal Cygwin locale), but
it works with en_US.UTF-8.


 The significant change is probably that libX11 is no longer built with
 X_LOCALE (so that libX11 uses the native locale support rather than its
 own).
 Exactly why this would cause a problem, I don't know.

Hmm, that sounds like it should have improved matters if anything.

Andy
