RE: UTF-16 vs UTF-32 (was IBM AIX 5 and GB18030)

2002-11-15 Thread Carl W. Brown
Doug, > > However, 16 bit characters were a hard enough sell in the good old > > days. If we had started out withug 2bit characters we would still be > > dreaming about Unicode. > > I think Carl meant "with 32-bit characters." I don't know what kind of > word "withug" is (Old English?), but I li

Every character code in the world

2002-11-15 Thread John Cowan
This is not a proposal to change standards in any respect. It's just a thought-out (well, somewhat) approach for people who have to represent character codes as opposed to characters, and have 32 bits to play with. The intent is to represent all the codes of all the registered character sets, pre

Re: IBM AIX 5 and GB18030

2002-11-15 Thread Markus Scherer
Michael Yau wrote: Markus, >The standard does _not_ require to _process_ internally in GB18030. It is sufficient to have a converter and to process in Unicode, which does contain all of >the characters. Just curious, do you have this in writing from the China standards body? I don't personal

Re: IBM AIX 5 and GB18030

2002-11-15 Thread Markus Scherer
Jane, you are right, I over-simplified. I tried to make the point that you need not _process_ text in GB18030 but that Unicode processing and conversion to/from GB18030 fulfills the requirement to be able to read and write GB18030 text. Yes, you need to have font support for all the characters t
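The approach described here (process in Unicode, convert only at the boundary) can be sketched as follows. This is a minimal illustration, not anything taken from the message: it assumes Python's built-in "gb18030" codec, and the function name process_gb18030 and the upper-casing step are hypothetical stand-ins for real text processing.

```python
def process_gb18030(data: bytes) -> bytes:
    """Read GB18030 bytes, do the work on Unicode text, write GB18030 back."""
    # Convert at the input boundary: GB18030 bytes -> Unicode string.
    text = data.decode("gb18030")
    # All processing happens on Unicode; upper-casing is only a placeholder.
    processed = text.upper()
    # Convert at the output boundary: Unicode string -> GB18030 bytes.
    return processed.encode("gb18030")


if __name__ == "__main__":
    sample = "abc\u4e2d".encode("gb18030")
    print(process_gb18030(sample).decode("gb18030"))
    # A supplementary-plane character occupies four bytes in GB18030.
    print(len("\U00010000".encode("gb18030")))
```

Font support, which the message also mentions, is a separate requirement and is not covered by a converter of this kind.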

RE: UTF-16 vs UTF-32 (was IBM AIX 5 and GB18030)

2002-11-15 Thread John McConnell
My experience is that the UCS-2 to UTF-16 conversion can be much easier than the SBCS to DBCS conversion, depending on how your original code is organized. In the case of Windows, much of the text processing was already done by modules (e.g. Uniscribe, NLS) that processed text elements rather th
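For readers unfamiliar with what changes between UCS-2 and UTF-16, the sketch below shows the one new case code has to handle: a character outside the BMP becomes two 16-bit code units (a surrogate pair). The function names are hypothetical and chosen only for illustration; the arithmetic is the standard UTF-16 encoding rule.

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into UTF-16 high and low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = cp - 0x10000              # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


def utf16_code_units(text: str) -> int:
    """Count 16-bit code units, which is what UCS-2-era code treated as characters."""
    return len(text.encode("utf-16-le")) // 2


if __name__ == "__main__":
    high, low = to_surrogate_pair(0x10400)
    print(f"{high:04X} {low:04X}")
    print(utf16_code_units("a\U00010400"), len("a\U00010400"))
```

Code that already works on text elements rather than on individual code units, as the message describes, is largely unaffected by the difference between those two counts.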

mixed-script writing systems

2002-11-15 Thread Peter_Constable
One of the Unicode design principles is unification: "unify across languages, but not across scripts". As a result, the "A" used in all Latin-based writing systems is the same character, but that character is different from the "A" used in Cyrillic- or Greek-based writing systems. There are a very
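The unification principle quoted above can be checked directly against the character database. The sketch below uses Python's unicodedata module; taking the first word of the character name as a script label is an assumption made for brevity (the module does not expose the Script property), and the helper names are hypothetical.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and formal Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def script_guess(ch: str) -> str:
    """Rough script label: first word of the character name (an approximation)."""
    return unicodedata.name(ch).split()[0]


if __name__ == "__main__":
    # Three look-alike capitals: Latin A, Cyrillic A, Greek Alpha.
    for ch in ("A", "\u0410", "\u0391"):
        print(describe(ch), "->", script_guess(ch))
```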

Re: Every character code in the world

2002-11-15 Thread David Starner
On Fri, Nov 15, 2002 at 11:38:48AM -0500, John Cowan wrote: > This is not a proposal to change standards in any respect. It's just a > thought-out (well, somewhat) approach for people who have to represent > character codes as opposed to characters, and have 32 bits to play with. Have you looked

Re: Every character code in the world

2002-11-15 Thread John Cowan
David Starner scripsit: > Have you looked at the way Emacs 21 handles this? It's got something > similar going on. I confess I remain in blissful ignorance of Emacs and all its works. Do you have a pointer to this particular part of it? -- Her he asked if O'Hare Doctor tidings sent from far

RE: Entering Plane 1 characters in XP

2002-11-15 Thread John McConnell
There are multiple registry keys that can cause usp10.dll to load. So usp10.dll may be loading even though you've deleted the LanguagePack key (not recommended, btw). Also, an application can load usp10.dll, independently of what the OS does. I suspect that's what you are seeing on Win98. There

Re: mixed-script writing systems

2002-11-15 Thread John Cowan
[EMAIL PROTECTED] scripsit: > So, the question is this: Should we say that this writing system is > completely Latin (keeping the norm that orthographic writing systems use a > single script) and apply the principle of unification -- across languages > but not across scripts -- to imply that we ne

Re: Every character code in the world

2002-11-15 Thread David Starner
On Fri, Nov 15, 2002 at 01:11:39PM -0500, John Cowan wrote: > David Starner scripsit: > > > Have you looked at the way Emacs 21 handles this? It's got something > > similar going on. > > I confess I remain in blissful ignorance of Emacs and all its works. Do > you have a pointer to this particul

Re: mixed-script writing systems

2002-11-15 Thread Dean Snyder
[EMAIL PROTECTED] wrote at 11:17 AM on Friday, November 15, 2002: >So, the question is this: Should we say that this writing system is >completely Latin (keeping the norm that orthographic writing systems use a >single script) and apply the principle of unification -- across languages >but not acr

Re: mixed-script writing systems

2002-11-15 Thread Jim Allan
Peter Constable posted on Wakhi: So, the question is this: Should we say that this writing system is completely Latin (keeping the norm that orthographic writing systems use a single script) and apply the principle of unification -- across languages but not across scripts -- to imply that we need

Re: mixed-script writing systems

2002-11-15 Thread Kenneth Whistler
> So, the question is this: Should we say that this writing system is > completely Latin (keeping the norm that orthographic writing systems use a > single script) and apply the principle of unification -- across languages > but not across scripts -- to imply that we need to encode new characters,

Re: mixed-script writing systems

2002-11-15 Thread John Cowan
Dean Snyder scripsit: > Group A writes the logically ordered graphemic sequence *acme* as "acme"; > group B as "emca". This fact requires separate encoding, because bidi-ness is a noncontextual property of a Unicode character. > Group A pronounces the graphemic sequence "acme" as /acme/; group B
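The point that directionality belongs to the character itself, not to its context, can be seen by looking up the Bidi_Class property. This is a small sketch using Python's unicodedata.bidirectional; the helper name bidi_classes is hypothetical.

```python
import unicodedata


def bidi_classes(text: str) -> list[str]:
    """Return the Bidi_Class of each character, independent of its neighbours."""
    return [unicodedata.bidirectional(ch) for ch in text]


if __name__ == "__main__":
    # Latin a, Hebrew alef, Arabic alef: the class is fixed per character.
    print(bidi_classes("a\u05d0\u0627"))
```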

Re: mixed-script writing systems

2002-11-15 Thread Peter_Constable
On 11/15/2002 02:59:11 PM Jim Allan wrote: >Yet I note the schwa used in the sample does not match the other vowel >letters in style or width, apparently here borrowed from a different font. Definitely an eclectic font (and, unfortunately, illegal -- I won't mention the face name or the owners,

Re: mixed-script writing systems

2002-11-15 Thread Peter_Constable
On 11/15/2002 12:22:15 PM John Cowan wrote: >> So, the question is this: Should we say that this writing system is >> completely Latin (keeping the norm that orthographic writing systems use a >> single script) and apply the principle of unification -- across languages >> but not across scripts -