Re: Unicode: endpoint of evolution of encodings? (was Re: gcc and utf-8 source)

2004-11-17 Thread Antoine Leca
srintuar wrote: > FWIW, I'd assert that "j" in Spanish is not the same thing as > "j" in English (and that one is easily proved), apart from them being > represented with the same *glyph*. You picked (certainly involuntarily) a very instructive example. I am living in Spain, so I feel qualified to

Re: Unicode: endpoint of evolution of encodings? (was Re: gcc and utf-8 source)

2004-11-16 Thread Christopher Fynn
srintuar wrote: This may be more of a practical issue: for some scripts such as Korean, representing every possible character and partial character could require a very large amount of codespace. We only have the precomposed characters now for compatibility with platforms that simply don't support c

Re: Unicode: endpoint of evolution of encodings? (was Re: gcc and utf-8 source)

2004-11-16 Thread srintuar
Danilo Segan wrote: New policies such as "no more precomposed glyphs" also indicate that we're talking about glyph repository, not about character repository (i.e. "no more precomposed glyphs, since you can get those glyphs by combining existing glyphs", even though they may have entirely d

Unicode: endpoint of evolution of encodings? (was Re: gcc and utf-8 source)

2004-11-16 Thread Danilo Segan
Hi, Today at 13:44, srintuar wrote: > As for Serbian, I don't think that really has much to do with Unicode > itself. You could apply a special folding algorithm when doing > searches in a Serbian context, but I don't think you would want to make > the script ambiguous. I'd rather make script amb

Re: gcc and utf-8 source

2004-11-16 Thread Antoine Leca
On Tuesday, November 16th, 2004 12:44Z srintuar wrote: > > Unlike the famous Gates quote, it is reasonable to state that certain > things represent ending points. For example, a 64-bit time counter for > seconds will probably be enough. Perhaps what is not reasonable here is to stop a 1-seco

Re: gcc and utf-8 source

2004-11-16 Thread srintuar
Danilo Segan wrote: srintuar wrote: 1) For printf("%s\n", "Schöne Grüße"); ... Being that UTF-8 is sort of an endpoint in the evolution of encodings, I also consider option 1 to be perfectly valid. I would be careful with such statements. We don't know what the successor of

Re: gcc and utf-8 source

2004-11-15 Thread Damjan
> I have to use LANG=hu_HU instead... I can't understand why the hu_HU.UTF-8 > locale consumes much more disk space than the hu_HU locale, That's easy to see why damjan:/usr/lib/locale$ d -lh mk_MK.utf8/LC_COLLATE mk_MK/LC_COLLATE -rw-r--r-- 80 root root 862K 2004-05-25 06:34 mk_MK.utf8/LC_COLLA

Re: gcc and utf-8 source

2004-11-15 Thread Egmont Koblinger
On Mon, Nov 15, 2004 at 04:58:11PM +0100, Danilo Segan wrote: > > > I agree with you, and though I haven't thoroughly read the manpage, I'm > > pretty sure that gcc does this. gcc is, as far as I see, the one and only > > gnu project that is maintained correctly and the developers know where > > t

Re: gcc and utf-8 source

2004-11-15 Thread Markus Kuhn
Egmont Koblinger wrote on 2004-11-12 17:45 UTC: > I was reading Markus's page and found the example: > printf("%ls\n", L"Schöne Grüße"); > and noticed that gcc always interprets the source code according to Latin-1. > > Then I googled a bit and found this reported to the gcc folks by Markus: > h

Re: gcc and utf-8 source

2004-11-15 Thread Danilo Segan
Today at 13:03, Bruno Haible wrote: > srintuar wrote: >> > 1) For printf("%s\n", "Schöne Grüße"); >> ... >> Being that UTF-8 is sort of an endpoint in the evolution of encodings, >> I also consider option 1 to be perfectly valid. > > I would be careful with such statements. We don't know what

Re: gcc and utf-8 source

2004-11-15 Thread Danilo Segan
Today at 13:09, Egmont Koblinger wrote: > I agree with you, and though I haven't thoroughly read the manpage, I'm > pretty sure that gcc does this. gcc is, as far as I see, the one and only > gnu project that is maintained correctly and the developers know where > they're going, they have systemat

Re: gcc and utf-8 source

2004-11-15 Thread Egmont Koblinger
On Mon, Nov 15, 2004 at 11:22:56AM +0100, [EMAIL PROTECTED] wrote: > I think the descriptions of the options you mention are a little > obscure about their scope. > Together with your following comments (which aren't completely clear > either, if I may say that), I seriously hope that any such

Re: gcc and utf-8 source

2004-11-15 Thread Bruno Haible
srintuar wrote: > > 1) For printf("%s\n", "Schöne Grüße"); > ... > Being that UTF-8 is sort of an endpoint in the evolution of encodings, > I also consider option 1 to be perfectly valid. I would be careful with such statements. We don't know what the successor of UTF-8 might look like, nor wh

Re: gcc and utf-8 source

2004-11-12 Thread srintuar
Bruno Haible wrote: ... 1) For printf("%s\n", "Schöne Grüße"); ... 2) For printf("%ls\n", L"Schöne Grüße"); ... OTOH, if you limit yourself to Linux systems and don't want your programs to be portable or internationalized, you can now use option 2. Being that UTF-8 is sort of an endpoint in

Re: gcc and utf-8 source

2004-11-12 Thread Egmont Koblinger
On Fri, Nov 12, 2004 at 03:00:43PM -0500, Edward H. Trager wrote: Hi, > void écrire(const char *myCString); // Function name has Latin-1 chars *in > UTF-8 encoding* No... I only want to use non-ascii (perhaps latin-2 but preferably utf-8) characters inside string constants, or maybe for comme

Re: gcc and utf-8 source

2004-11-12 Thread Edward H. Trager
On Friday 2004.11.12 20:09:49 +0100, Egmont Koblinger wrote: > On Fri, Nov 12, 2004 at 07:43:16PM +0100, Bruno Haible wrote: > > Hi, > > > gcc-3.4's documentation contains the following: > > > > `-fexec-charset=CHARSET' > > Gee, these are really there in gcc 3.4, but not yet in 3.3. Seems it'

Re: gcc and utf-8 source

2004-11-12 Thread Edward H. Trager
Hi, Egmont, The example from Markus' page that you show actually shows "source code" written using ASCII but with a C-style static string in UTF-8. There is no problem with this code! However, if you try to write some code like this: void écrire(const char *myCString); // Function name has La

Re: gcc and utf-8 source

2004-11-12 Thread Egmont Koblinger
On Fri, Nov 12, 2004 at 07:43:16PM +0100, Bruno Haible wrote: Hi, > gcc-3.4's documentation contains the following: > > `-fexec-charset=CHARSET' Gee, these are really there in gcc 3.4, but not yet in 3.3. Seems it's time for an upgrade :-) > The portable solution is to use gettext: > > p

Re: gcc and utf-8 source

2004-11-12 Thread Bruno Haible
Egmont Koblinger wrote: > I was reading Markus's page and found the example: > printf("%ls\n", L"Schöne Grüße"); > and noticed that gcc always interprets the source code according to > Latin-1. gcc-3.4's documentation contains the following: `-fexec-charset=CHARSET' Set the execution chara

gcc and utf-8 source

2004-11-12 Thread Egmont Koblinger
Hi, I was reading Markus's page and found the example: printf("%ls\n", L"Schöne Grüße"); and noticed that gcc always interprets the source code according to Latin-1. Then I googled a bit and found this reported to the gcc folks by Markus: http://sources.redhat.com/ml/libc-alpha/2000-09/msg00337
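
For readers following the thread, here is a minimal sketch of what "option 1" (a narrow string literal carrying UTF-8 bytes) amounts to. It is not taken from any of the posts: the names GREETING and utf8_code_points are hypothetical, and the greeting is spelled with explicit byte escapes so that the result does not depend on how the compiler interprets the source file's encoding, which is the very question the thread is about.

```c
#include <stddef.h>

/* "Schöne Grüße" as explicit UTF-8 bytes. The final "e" is a separate
 * literal because a hex escape is greedy: "\x9fe" would be read as one
 * (out-of-range) escape instead of 0x9F followed by 'e'. */
static const char GREETING[] = "Sch\xc3\xb6ne Gr\xc3\xbc\xc3\x9f" "e";

/* Count code points in a NUL-terminated UTF-8 string by skipping
 * continuation bytes (those of the form 10xxxxxx). Assumes the input
 * is well-formed UTF-8; no validation is attempted. */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

Called from a program of your own, printf("%s\n", GREETING) writes those bytes unchanged, so the text displays correctly on a UTF-8 terminal whatever charset the compiler assumed for the source. strlen(GREETING) counts bytes, while utf8_code_points(GREETING) counts characters; the two differ because ö, ü and ß each take two bytes in UTF-8.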