Re: Sanskrit Transliteration Characters

2001-02-20 Thread Otto Stolz
On 2001-02-20 at 03:47 UCT, Krishna Desikachary wrote: There is an internationally accepted set of extra chars that are included in Roman (Latin) script to transcribe Sanskrit texts in Roman script. Is there a list of these characters available online? If so, where (URL)? If not
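Assuming the scheme being asked about is IAST (International Alphabet of Sanskrit Transliteration), a short Python sketch can list its extra lowercase letters with their Unicode code points and names. The letter selection is an assumption for illustration, not an authoritative IAST table, and uppercase forms are omitted.

```python
import unicodedata

# Lowercase IAST letters outside basic ASCII (assumed selection):
# ā ī ū ṛ ṝ ḷ ḹ ṅ ñ ṭ ḍ ṇ ś ṣ ṃ ḥ
IAST_EXTRA = (
    "\u0101\u012B\u016B\u1E5B\u1E5D\u1E37\u1E39\u1E45"
    "\u00F1\u1E6D\u1E0D\u1E47\u015B\u1E63\u1E43\u1E25"
)


def describe(chars: str) -> list[str]:
    """Return one 'U+XXXX NAME' line per character."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in chars]


if __name__ == "__main__":
    for line in describe(IAST_EXTRA):
        print(line)
```

All of these exist as precomposed characters, so NFC text stores each as one code point; NFD splits them into a base letter plus combining marks (ṝ becomes r + U+0323 + U+0304).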

Re: Sanskrit Transliteration Characters

2001-02-20 Thread Otto Stolz
On 2001-02-20 at 9:18 UCT, Valeriy E. Ushakov wrote: That's why I made and posted the CSX mapping. There is a LOT of old CSX-encoded material. With this mapping I can use existing software (like the mentioned Perl module) to convert it to Unicode and use Emacs to view/edit it. This

Re: Sanskrit Transliteration Characters

2001-02-20 Thread Valeriy E. Ushakov
On Tue, Feb 20, 2001 at 12:32:01 +, Otto Stolz wrote: That's why I made and posted the CSX mapping. There is a LOT of old CSX-encoded material. With this mapping I can use existing software (like the mentioned Perl module) to convert it to Unicode and use Emacs to view/edit it.

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in

2001-02-20 Thread JÖRG KNAPPEN
Doug Ewell wrote: A few days ago I said there was a "widespread belief" that Unicode is a 16-bit-only character set that ends at U+FFFF. A corollary is that the supplementary characters ranging from U+10000 to U+10FFFF are either little-known or perceived to belong to ISO/IEC 10646 only,
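Characters above U+FFFF are ordinary Unicode characters; in the 16-bit form (UTF-16) each one is written as a pair of surrogate code units. A minimal Python sketch of that arithmetic follows; the function names are illustrative only.

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into a UTF-16 high/low surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a supplementary code point")
    offset = cp - 0x10000               # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Recombine a surrogate pair into the code point it encodes."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# U+1D11E MUSICAL SYMBOL G CLEF, added in Unicode 3.1
high, low = to_surrogates(0x1D11E)
print(f"{high:04X} {low:04X}")  # D834 DD1E
```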

Re: Sanskrit Transliteration Characters

2001-02-20 Thread Antoine Leca
Otto Stolz wrote: On 2001-02-20 at 03:47 UCT, Krishna Desikachary wrote: There is an internationally accepted set of extra chars that are included in Roman (Latin) script to transcribe Sanskrit texts in Roman script. Is there a list of these characters available online?

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-20 Thread Antoine Leca
Marco Cimarosti wrote: Doug Ewell wrote: "A 16-bit character encoding standard [...] By contrast, 8-bit ASCII [...] These two statements are regularly found together, but it is the second one that makes me despair. If nearly half a century was not enough time for people to learn

RE: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-20 Thread Peter_Constable
On 02/20/2001 03:34:28 AM Marco Cimarosti wrote: How about considering UTF-32 as the default Unicode form, in order to be able to provide a short answer of this kind: "Unicode is now a 32-bit character encoding standard, although only about one million codes actually exist, and there
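The trade-off behind Marco's suggestion is easier to see by counting code units. This sketch uses Python's built-in codecs to count how many units the same character needs in UTF-8, UTF-16 and UTF-32; only UTF-32 uses exactly one unit per character.

```python
def code_units(text: str) -> dict[str, int]:
    """Count code units needed for `text` in each Unicode encoding form."""
    return {
        "UTF-8": len(text.encode("utf-8")),            # 8-bit units
        "UTF-16": len(text.encode("utf-16-le")) // 2,  # 16-bit units, no BOM
        "UTF-32": len(text.encode("utf-32-le")) // 4,  # 32-bit units, no BOM
    }


for ch in ("A", "\u00E9", "\u0939", "\U0001D11E"):
    print(f"U+{ord(ch):04X}", code_units(ch))
```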

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in Unicode)

2001-02-20 Thread P. T. Rourke
The error may arise from a misunderstanding of the reference on the first page of chapter 1 of the book to a 16-bit form and an 8-bit form and to "using a 16-bit encoding." It's also hard to get one's head wrapped around the idea that Unicode isn't just an encoding until one does extensive

Re: Implementing Complex Unicode Scripts

2001-02-20 Thread Brendan Murray/DUB/Lotus
"Charlie Jolly" [EMAIL PROTECTED] wrote: Does anybody know if there is a chart or table showing what OS's, Applications, Programming Languages support Unicode and in particular what scripts? You'll find some of this on http://www.unicode.org/unicode/onlinedat/products.html. Should an open

Re: Implementing Complex Unicode Scripts

2001-02-20 Thread Peter_Constable
On 02/20/2001 06:21:09 AM "Charlie Jolly" wrote: Should an open source script processing engine be part of the standard? As I understand it if you want to develop Unicode solutions for complex scripts then you either have to do it yourself or rely upon Uniscribe or ATSUI. Whether or not the

Re: Implementing Complex Unicode Scripts

2001-02-20 Thread John H. Jenkins
At 4:21 AM -0800 2/20/01, Charlie Jolly wrote: Do fonts have to tie themselves to a script engine? Will an OpenType font for, let's say, Hindi, such as MS Mangal, work on an Apple OS or Linux? Or is this font tied to Uniscribe? If this is correct then shouldn't there be a better solution? Both OT and

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in Unicode)

2001-02-20 Thread DougEwell2
In a message dated 2001-02-20 06:18:34 Pacific Standard Time, [EMAIL PROTECTED] writes: With the Unicode-related functions in Prague growing too large, I moved them into a new library called 'Babylon'. It will provide all the functionality defined in the Unicode standard (it is not

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in

2001-02-20 Thread DougEwell2
In a message dated 2001-02-20 04:21:49 Pacific Standard Time, [EMAIL PROTECTED] writes: A little out of date, but correctly describing the state of the art in 1991, before the merger. Agreed, but the example was from Windows 2000. It should at least be current through Unicode 2.1. Even

Re: Implementing Complex Unicode Scripts

2001-02-20 Thread John Hudson
At 04:21 AM 2/20/2001 -0800, Charlie Jolly wrote: Do fonts have to tie themselves to a script engine? Will an OpenType font for, let's say, Hindi, such as MS Mangal, work on an Apple OS or Linux? Or is this font tied to Uniscribe? If this is correct then shouldn't there be a better solution? Mangal

[OT] terminology: BMP, UCS 3 questions

2001-02-20 Thread Elaine Keown
Ok, I was wrong: On pp 969-71 of the 3.0 book, it *does* mention the BMP and UCS. They're not in the index, that's what fooled me. 1. I take it that these 2 terms are more popular with the 10646 folks? 2. P. 971, what "additional semantics" are being alluded to here? 3. 971, what
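For reference: the BMP (Basic Multilingual Plane) is plane 0 of the codespace, U+0000..U+FFFF, and the codespace has 17 planes of 65,536 code points each, ending at U+10FFFF. A tiny sketch of the plane arithmetic; the function names are hypothetical.

```python
def plane(cp: int) -> int:
    """Return the plane number (0-16) of a code point."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"{cp:#x} is outside the Unicode codespace")
    return cp >> 16  # each plane holds 0x10000 code points


def in_bmp(cp: int) -> bool:
    """True if the code point lies in the Basic Multilingual Plane."""
    return plane(cp) == 0


print(plane(0x0041), plane(0x1D11E), plane(0x10FFFF))  # 0 1 16
```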

RE: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-20 Thread Cathy Wissink
The people who are responsible for this text have been made aware of the problem. This will be updated for Windows XP. Cathy -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] Sent: Tuesday, February 20, 2001 8:04 AM To: Unicode List Subject: Re: Perception that

8-bit ASCII

2001-02-20 Thread Hart, Edwin F.
I am unsure if "8-bit ASCII" is a well-defined term. "ASCII" implies X3.4-1986 and the 7-bit ASCII code. It was my intention for ISO/IEC 8859-1 to be the 8-bit ASCII standard. When the US adopted ISO 8859-1 as a US standard (ANSI/ISO 8859-1), as editor I asked ANSI to add "(8-bit ASCII)" to
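The point that ISO 8859-1 extends 7-bit ASCII can be checked mechanically. In Python's `latin-1` codec, bytes 0x00-0x7F decode exactly as ASCII does, and every byte value maps to the Unicode code point with the same number. A short sketch using those codecs as the reference; the function names are illustrative.

```python
def latin1_extends_ascii() -> bool:
    """Check the ASCII/ISO 8859-1 relationship using Python's codecs."""
    low = bytes(range(0x80))
    every = bytes(range(0x100))
    # Bytes 0x00-0x7F decode identically as ASCII and as ISO 8859-1.
    same_low_half = low.decode("ascii") == low.decode("latin-1")
    # Every Latin-1 byte maps to the Unicode code point with the same value.
    identity = [ord(c) for c in every.decode("latin-1")] == list(range(0x100))
    return same_low_half and identity


def is_7bit_ascii(data: bytes) -> bool:
    """True only if every byte is in the 7-bit ASCII range."""
    return all(b < 0x80 for b in data)


print(latin1_extends_ascii())                        # True
print(is_7bit_ascii("caf\u00e9".encode("latin-1")))  # False: 0xE9 is above 0x7F
```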

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in

2001-02-20 Thread Antoine Leca
Doug Ewell wrote: In a message dated 2001-02-20 04:21:49 Pacific Standard Time, Funnily, the message I got is stamped 03:36:27 PST... [EMAIL PROTECTED] writes: A nit to pick: It's the latin alphabet, not roman. Roman is a kind of typeface, contrasting to sans serif aka grotesque.

Re: Implementing Complex Unicode Scripts

2001-02-20 Thread Charlie Jolly
Thanks for the comments thus far. They have helped clarify a lot of ambiguities. As for AAT, could Apple not supply template fonts so that font designers can concentrate on the glyphs. I.e. replace master glyphs with their own. Charlie Jolly

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in

2001-02-20 Thread Antoine Leca
[EMAIL PROTECTED] wrote: On 02/19/2001 08:05:49 PM David Starner wrote: It will provide all the functionality defined in the Unicode standard (it is not Unicode but ISO 10646 compliant as it uses 32-bit-wide characters internally) and is written in C++. Eh? Unicode has no aversion to

Re: Implementing Complex Unicode Scripts

2001-02-20 Thread Antoine Leca
Charlie Jolly wrote: Fonts: Do fonts have to tie themselves to a script engine? Yes. Font technologies do not allow things like Nagari or Sinhala rendering to operate by themselves; they need some assistance from the underlying platform. This is the current state of the art; one may hope it

Re: Implementing Complex Unicode Scripts

2001-02-20 Thread Peter_Constable
On 02/20/2001 10:19:37 AM John Hudson wrote: The Apple AAT and SIL Graphite approach work a little differently. I'm not familiar enough with Graphite to know how they handle stuff like character reordering, or how difficult it is to achieve such things in their Graphite Description Language,
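As a toy illustration of the character reordering a shaping engine must perform: DEVANAGARI VOWEL SIGN I (U+093F) is stored after its consonant (cluster) but drawn to the left of it. The sketch below moves it in front of the preceding consonant cluster to produce display order. It is deliberately simplified (assumption: only consonants U+0915..U+0939 and virama-joined clusters are handled; nukta, reph and everything else real engines do are ignored) and does not describe how Uniscribe, AAT or Graphite are implemented.

```python
VIRAMA = "\u094D"
VOWEL_SIGN_I = "\u093F"


def is_consonant(ch: str) -> bool:
    """Devanagari consonants KA..HA (nukta forms deliberately ignored)."""
    return "\u0915" <= ch <= "\u0939"


def visual_order(text: str) -> str:
    """Move each VOWEL SIGN I before its consonant cluster (display order)."""
    out: list[str] = []
    for ch in text:
        if ch == VOWEL_SIGN_I and out and is_consonant(out[-1]):
            start = len(out) - 1
            # Walk back over consonant + virama pairs: C (virama C)*
            while start >= 2 and out[start - 1] == VIRAMA and is_consonant(out[start - 2]):
                start -= 2
            out.insert(start, ch)
        else:
            out.append(ch)
    return "".join(out)


# "ki" and "ski": logical order in, display order out
for word in ("\u0915\u093F", "\u0938\u094D\u0915\u093F"):
    print([f"U+{ord(c):04X}" for c in visual_order(word)])
```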

Re: Perception that Unicode is 16-bit (was: Re: Surrogate

2001-02-20 Thread John Hudson
At 09:05 AM 2/20/2001 -0800, Antoine Leca wrote: (In French, sans serif is normally named "antique" Which must be very confusing to Germans and others who use 'antiqua' to distinguish seriffed humanist types from blackletter. John Hudson Tiro Typeworks | Vancouver, BC | All

Re: Implementing Complex Unicode Scripts

2001-02-20 Thread John Hudson
At 09:17 AM 2/20/2001 -0800, Charlie Jolly wrote: As for AAT, could Apple not supply template fonts so that font designers can concentrate on the glyphs. I.e. replace master glyphs with their own. The folk at Apple are certainly aware that they have a problem with AAT's current level of

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in Unicode)

2001-02-20 Thread Tobias Hunger
On Tuesday 20 February 2001 17:03, you wrote: In a message dated 2001-02-20 06:18:34 Pacific Standard Time, into a new library called 'Babylon'. It will provide all the functionality defined in the Unicode standard (it is not Unicode but

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in

2001-02-20 Thread Peter_Constable
On 02/20/2001 10:03:35 AM DougEwell2 wrote: A nit to pick: It's the latin alphabet, not roman. Roman is a kind of typeface, contrasting to sans serif aka grotesque. True. I have also heard "roman" used to mean the opposite of italic. An alphabet is a type of writing system, something

Re: Implementing Complex Unicode Scripts

2001-02-20 Thread Peter_Constable
On 02/20/2001 11:19:48 AM Antoine Leca wrote: Yes. Font technologies do not allow things like Nagari or Sinhala rendering to operate by themselves; they need some assistance from the underlying platform. This is the current state of the art; one may hope it will change in the future. I don't

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in Unicode)

2001-02-20 Thread William Overington
The following statements have been made by participants in this thread. 1. A few days ago I said there was a "widespread belief" that Unicode is a 16-bit-only character set that ends at U+FFFF. A corollary is that the supplementary characters ranging from U+10000 to U+10FFFF are either

Re: Implementing Complex Unicode Scripts

2001-02-20 Thread John H. Jenkins
At 9:17 AM -0800 2/20/01, Charlie Jolly wrote: Thanks for the comments thus far. They have helped clarify a lot of ambiguities. As for AAT, could Apple not supply template fonts so that font designers can concentrate on the glyphs. I.e. replace master glyphs with their own. It's on our to-do

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in

2001-02-20 Thread Peter_Constable
On 02/20/2001 12:33:04 PM John Hudson wrote: The only thing that I insist on is that we maintain the distinction between Roman and roman. Which is? I wonder though, Peter, about your suggestion that '"Latin script" is less acceptable since "Latin" suggests something constrained to the

Re: Perception that Unicode is 16-bit (was: Re: Surrogate

2001-02-20 Thread Michael (michka) Kaplan
From: "John Hudson" [EMAIL PROTECTED] I wonder though, Peter, about your suggestion that '"Latin script" is less acceptable since "Latin" suggests something constrained to the language Latin'. Couldn't the same thing be said about 'Arabic script'? I think everyone here can likely agree that

RE: Implementing Complex Unicode Scripts

2001-02-20 Thread Apurva Joshi
Re: "Uniscribe is just an implementation of these specifications, and I hope sincerely Microsoft will not hide some "features" into USP10.DLL in order to kill any concurrence." The process of adding new feature support to Uniscribe is not unlike adding newer "features / capabilities" to other

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in

2001-02-20 Thread John Cowan
[EMAIL PROTECTED] wrote: Even 8-bit ASCII is a correct term meaning ISO-8859-1. I would question that. Understandable, yes, but not really correct. No, it *is* correct. ANSI X.3 (which has a new name these days) in fact did define an 8-bit American Standard Code for Information

Re: Perception that Unicode is 16-bit (was: Re: Surrogate

2001-02-20 Thread John Hudson
At 09:09 AM 2/20/2001 -0800, [EMAIL PROTECTED] wrote: If we are talking about the full collection of characters that are historically related to the Latin alphabet, however, i.e. the entire script, then I would need to see better argumentation and references than this to convince me that it's

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in Unicode)

2001-02-20 Thread Peter_Constable
On 02/20/2001 11:18:40 AM Tobias Hunger wrote: Looks like David was quoting me. I am working on Babylon and wanted to make clear that it is not Unicode-conformant, as its API uses 32-bit-wide characters, which violates clause 1 of Section 3.1. This is something that UTC should clean up because C1

Re: New BMP characters (was Re: [very OT] Documentation: beyond 65

2001-02-20 Thread Kenneth Whistler
Marco asked: Kenneth Whistler wrote: No. Unicode 3.1 has already been approved, and is in the last stages of publication. After that, Unicode 3.2 will appear, adding over 1000 more characters to the BMP. Can you anticipate what these new BMP characters will be? Entire scripts or just

Re: 8-bit ASCII

2001-02-20 Thread David Gallardo
No, the 8-bit ANSI standard (ANSI/ISO 8859-1-1987) does not include "ASCII" as part of its title. It is listed by ANSI as "8-Bit Single Byte Coded Graphic Character Sets - Part 1: Latin Alphabet No. 1" So, no, there is no such thing as 8-bit ASCII, though Latin 1 is frequently referred to as

RE: collations: Czech vs. Croat vs. Slovak

2001-02-20 Thread Cathy Wissink
From what I remember about these collations, Czech and Slovak are very similar, if not identical; Croat(ian) is very different than the other two (it also has compressions/ligatures that sort as unique letters). Cathy -Original Message- From: Tex Texin [mailto:[EMAIL PROTECTED]] Sent:
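Letters written with two characters that sort as a single letter are what collation engines call contractions. A minimal primary-level sketch for Croatian, assuming the standard 30-letter alphabet and lowercase NFC input; it ignores case and secondary differences, and does not handle words where the same character pair spans two letters rather than forming a digraph. The names are illustrative, not from any library.

```python
# Croatian alphabet order; dž, lj and nj are single letters.
# a b c č ć d dž đ e f g h i j k l lj m n nj o p r s š t u v z ž
CROATIAN_ALPHABET = [
    "a", "b", "c", "\u010D", "\u0107", "d", "d\u017E", "\u0111", "e", "f",
    "g", "h", "i", "j", "k", "l", "lj", "m", "n", "nj",
    "o", "p", "r", "s", "\u0161", "t", "u", "v", "z", "\u017E",
]
RANK = {letter: i for i, letter in enumerate(CROATIAN_ALPHABET)}
DIGRAPHS = {"d\u017E", "lj", "nj"}


def sort_key(word: str) -> list[int]:
    """Primary sort key treating Croatian digraphs as single letters."""
    w = word.lower()
    key: list[int] = []
    i = 0
    while i < len(w):
        if w[i:i + 2] in DIGRAPHS:  # contraction: two characters, one letter
            key.append(RANK[w[i:i + 2]])
            i += 2
        else:
            # Characters outside the alphabet sort after it, by code point.
            key.append(RANK.get(w[i], len(CROATIAN_ALPHABET) + ord(w[i])))
            i += 1
    return key


words = ["njiva", "ljubav", "luka", "nova", "lopta"]
print(sorted(words, key=sort_key))  # ['lopta', 'luka', 'ljubav', 'nova', 'njiva']
print(sorted(words))                # ['ljubav', 'lopta', 'luka', 'njiva', 'nova']
```

The second line shows plain code point order, which puts "lj" and "nj" words in the wrong place.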

Re: collations: Czech vs. Croat vs. Slovak

2001-02-20 Thread Tex Texin
Cathy, thanks. Yes, I remember now I was down this path before. thanks tex Cathy Wissink wrote: From what I remember about these collations, Czech and Slovak are very similar, if not identical; Croat(ian) is very different than the other two (it also has compressions/ligatures that sort as

Re: collations: Czech vs. Croat vs. Slovak

2001-02-20 Thread G. Adam Stanislav
At 12:07 20-02-2001 -0800, Tex Texin wrote: Hi, I am updating my information on Slovak collation. See http://www.whizkidtech.net/ISO-8859-2/sk.html . Then email me with any questions you might have. Adam --- Whiz Kid Technomagic - brand name computers for less. See

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in Unicode)

2001-02-20 Thread Kenneth Whistler
Paul Keinänen said: [86-M8] Motion: Amend Unicode 3.1 to change the Chapter 3, C1 conformance clause to read "A process shall interpret Unicode code units (values) in accordance with the Unicode transformation format used." (passed) While this wording makes it possible to handle any 32 bit
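Under the amended C1 wording, a process with a 32-bit-wide API (the Babylon case discussed in this thread) can conform by interpreting those values as UTF-32 code units. A sketch of what that interpretation entails, assuming the current definition of UTF-32, in which a well-formed code unit is a Unicode scalar value: 0..0x10FFFF, excluding the surrogate range D800..DFFF. The function name is hypothetical.

```python
def decode_utf32_units(units: list[int]) -> str:
    """Interpret 32-bit code units as UTF-32, rejecting ill-formed values."""
    chars = []
    for i, u in enumerate(units):
        if not 0 <= u <= 0x10FFFF:
            raise ValueError(f"unit {i}: {u:#x} is beyond U+10FFFF")
        if 0xD800 <= u <= 0xDFFF:
            raise ValueError(f"unit {i}: {u:#x} is a surrogate, not a scalar value")
        chars.append(chr(u))
    return "".join(chars)


print(decode_utf32_units([0x48, 0x1D11E]) == "H\U0001D11E")  # True
```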

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in

2001-02-20 Thread Roozbeh Pournader
On Tue, 20 Feb 2001 [EMAIL PROTECTED] wrote: Even 8-bit ASCII is a correct term meaning ISO-8859-1. I would question that. Understandable, yes, but not really correct. In the computer culture I grew up, 8-bit ASCII meant CP437. Every author called the CP437 table that was available at
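Why "8-bit ASCII" is ambiguous shows up as soon as the upper half is decoded: CP437 and ISO 8859-1 agree on 0x00-0x7F but assign different characters above it. A small check with Python's standard `cp437` and `latin-1` codecs:

```python
data = bytes([0x41, 0x82, 0xE9])

as_cp437 = data.decode("cp437")     # IBM PC code page 437
as_latin1 = data.decode("latin-1")  # ISO/IEC 8859-1

for b, c1, c2 in zip(data, as_cp437, as_latin1):
    print(f"0x{b:02X}: cp437 U+{ord(c1):04X}  latin-1 U+{ord(c2):04X}")
```

Byte 0xE9 is é in Latin-1 but Θ in CP437, where é sits at 0x82.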

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in

2001-02-20 Thread DougEwell2
In a message dated 2001-02-20 09:53:50 Pacific Standard Time, [EMAIL PROTECTED] writes: An alphabet is a type of writing system, something that is implemented for a particular language. Certainly Latin is the name of a language while Roman is not, and so "Latin alphabet" is correct while

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in

2001-02-20 Thread DougEwell2
I wrote: Even 8-bit ASCII is a correct term meaning ISO-8859-1. I would question that. Understandable, yes, but not really correct. [EMAIL PROTECTED] wrote: No, it *is* correct. ANSI X.3 (which has a new name these days) in fact did define an 8-bit American Standard Code for

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in

2001-02-20 Thread Tex Texin
128 wrongs don't make a right... ;-) I see books and documents all the time that refer to writing out ASCII files when they really mean plaintext. Usually they don't know which code page they are generating. ASCII is a very ambiguous term these days... tex Roozbeh Pournader wrote: On Tue, 20