MS Core Fonts and Azeri

2001-09-17 Thread Herman Ranes
I have observed that MS Core Fonts include glyphs for U+018f (Ə) LATIN CAPITAL LETTER SCHWA and for U+0259 (ə) LATIN SMALL LETTER SCHWA. Obviously, they were included for use in Azeri text. Is it considered correct to use IPA characters like U+0259 in non-IPA contexts? On the other hand,
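For reference, here is a minimal Python sketch (illustrative only, not from the thread) that asks the standard `unicodedata` module about the two characters Herman Ranes mentions. It only shows that U+018F and U+0259 form an ordinary upper/lower case pair in the Unicode Character Database. It does not by itself settle whether using them outside IPA is "correct". The helper name `describe` is hypothetical.

```python
import unicodedata


def describe(ch: str) -> tuple:
    """Return (code point label, general category, character name) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))


CAPITAL_SCHWA = "\u018f"  # Ə
SMALL_SCHWA = "\u0259"    # ə

# UnicodeData.txt lists each as the simple case mapping of the other,
# so case conversion and case-insensitive matching of Azeri text work.
assert CAPITAL_SCHWA.lower() == SMALL_SCHWA
assert SMALL_SCHWA.upper() == CAPITAL_SCHWA

if __name__ == "__main__":
    for ch in (CAPITAL_SCHWA, SMALL_SCHWA):
        print(*describe(ch))
```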

RE: PDUTR #26 posted

2001-09-17 Thread Marco Cimarosti
Julie Doll Allen wrote: Proposed Draft Unicode Technical Report #26: Compatibility Encoding Scheme for UTF-16: 8-Bit (CESU-8) is now available at: http://www.unicode.org/unicode/reports/tr26/ Does renaming UTF-8S to CESU-8 fix all the issues that were discussed on this mailing list at the
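The following minimal Python sketch illustrates what the CESU-8 name describes. It assumes the scheme as outlined in the draft: the text is first split into UTF-16 code units, and each code unit, surrogates included, is then written with the ordinary UTF-8 byte patterns. `encode_cesu8` is a hypothetical name, not an API from the report. For BMP text the output is identical to UTF-8. A supplementary character becomes two 3-byte sequences (six bytes) instead of one 4-byte sequence.

```python
def encode_cesu8(text: str) -> bytes:
    """Sketch of CESU-8 encoding: UTF-8 byte patterns applied to UTF-16 code units."""
    utf16 = text.encode("utf-16-be")  # big-endian, no BOM
    out = bytearray()
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "big")
        # "surrogatepass" lets Python write a lone surrogate as a 3-byte sequence.
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


if __name__ == "__main__":
    sample = "A\u00e9\U00010400"  # 'A', 'é', U+10400 DESERET CAPITAL LETTER LONG I
    print(encode_cesu8(sample).hex(" "))    # 41 c3 a9 ed a0 81 ed b0 80
    print(sample.encode("utf-8").hex(" "))  # 41 c3 a9 f0 90 90 80
```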

Re: MS Core Fonts and Azeri

2001-09-17 Thread James Kass
Herman Ranes wrote: I have observed that MS Core Fonts include glyphs for U+018f (Ə) LATIN CAPITAL LETTER SCHWA and for U+0259 (ə) LATIN SMALL LETTER SCHWA. Obviously, they were included for use in Azeri text. Is it considered correct to use IPA characters like U+0259 in non-IPA

Re: PDUTR #26 posted

2001-09-17 Thread Michael (michka) Kaplan
From: Marco Cimarosti [EMAIL PROTECTED] Does renaming UTF-8S to CESU-8 fix all the issues that were discussed on this mailing list at the beginning of last spring? In my opinion (and the opinion of some others), no. But they do represent the *attempt* to answer them. Specifically: - How

Re: FW: Unicode and the UTF8 encoding in HTML

2001-09-17 Thread Otto Stolz
Hello Unicoders, this should finally go to the FAQ. Hello JG, on Thursday, September 13, 2001 1:23 AM, James Gardner wrote: I have a Microsoft Active Server Page which is saved as an ANSI file. Note that older Microsoft documentation abused the term ANSI for Microsoft's proprietary CP
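Otto Stolz's point is that an "ANSI" file is really stored in a Windows code page. Labelling such a page as UTF-8 therefore only works if the bytes are re-encoded first (or if the real charset is declared instead). The minimal Python sketch below assumes that the page was saved in Windows-1252, the usual "ANSI" code page on Western-European Windows systems. The code page named in the message itself is cut off above, and `ansi_to_utf8` is a hypothetical helper.

```python
def ansi_to_utf8(data: bytes, codepage: str = "cp1252") -> bytes:
    """Re-encode text saved in a Windows "ANSI" code page as UTF-8.

    cp1252 is an assumed default; pass the code page the file was actually saved in.
    """
    return data.decode(codepage).encode("utf-8")


if __name__ == "__main__":
    # In cp1252, byte 0xE9 is 'é' and byte 0x80 is the euro sign (U+20AC).
    print(ansi_to_utf8(b"caf\xe9 \x80"))
```

The euro sign is a useful test: at 0x80, ISO-8859-1 has a C1 control, whereas cp1252 has U+20AC.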

Re: PDUTR #26 posted

2001-09-17 Thread Michael (michka) Kaplan
From: [EMAIL PROTECTED] If Michka is referring to non-compliant CESU-8 parsers, I really wouldn't care much because CESU-8 is supposed to live in its own little private world. But if people start compromising their UTF-8 parsers to accommodate CESU-8 adaptively, it would be a great blow to

RE: PDUTR #26 posted

2001-09-17 Thread Carl W. Brown
MichKa, Actually, once it's in IANA then it is legal in XML and other places, and *everyone* will have to support it, whether they want to or not. What is supposedly private will become quite public. IANA, after all, does not have charsets that they register for people to not use and none of

Iranian standard draft for information interchange

2001-09-17 Thread Roozbeh Pournader
We, in the FarsiWeb Project, have recently finished a standard draft titled Information Technology -- Persian Information Interchange and Display, Based on Unicode, which will be submitted to Iranian standards body (ISIRI) for approval as a national standard. For those interested, it is

Re: PDUTR #26 posted

2001-09-17 Thread Michael (michka) Kaplan
From: Mark Davis [EMAIL PROTECTED] - A significant reason for CESU-8 garnering enough support was that its introduction allows the definition of UTF-8 itself to be tightened, to formally exclude the 3-byte surrogates both in reading and writing. I do not see this as a valid argument at all
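The tightening Mark Davis refers to means that a conforming UTF-8 reader refuses 3-byte sequences that encode surrogate code points (lead byte ED followed by A0..BF). Python 3's built-in strict `utf-8` codec already behaves this way, so a minimal check can rely on it. `is_strict_utf8` is a hypothetical helper name, used only for illustration.

```python
def is_strict_utf8(data: bytes) -> bool:
    """True if data is well-formed UTF-8 in the tightened sense.

    Python 3's strict "utf-8" codec rejects encoded surrogates
    (ED followed by A0..BF), overlong forms and the bytes F5..FF.
    """
    try:
        data.decode("utf-8")  # default errors="strict"
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(is_strict_utf8(b"\xf0\x90\x90\x80"))              # UTF-8 form of U+10400
    print(is_strict_utf8(b"\xed\xa0\x81\xed\xb0\x80"))      # CESU-8 form of U+10400
```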

Re: PDUTR #26 posted

2001-09-17 Thread Michael (michka) Kaplan
From: Carl W. Brown [EMAIL PROTECTED] In actuality it would be difficult for IANA to deny a character set for any official character set so the decision is actually up to the Unicode committee. I concur. I don't believe that the idea of registering CESU-8 with IANA came from the Unicode

Re: PDUTR #26 posted

2001-09-17 Thread David Starner
On Mon, Sep 17, 2001 at 08:45:59AM -0700, Michael (michka) Kaplan wrote: Actually, once it's in IANA then it is legal in XML and other places, and *everyone* will have to support it, whether they want to or not. What is I think that's a little excessive. UTF-1, NATS-DANO, GOST_19768-74 and SCSU

CESU-8: to document or not

2001-09-17 Thread Addison Phillips [wM]
Folks, I've been following this thread for a while and it seems that I can make a small contribution. Several comments have been made about why we should NOT document this and give it some kind of official imprimatur. I agree that it will generate more confusion and may be used in unforeseen

RE: PDUTR #26 posted

2001-09-17 Thread Carl W. Brown
MichKa, Also, Toby was not attempting to be deceitful, AFAIK. The original proposal he submitted (still called UTF-8S) was not in any way contradictory but many people objected to various issues within it and the way many things were presented. The current proposal was a very rushed

RE: PDUTR #26 posted

2001-09-17 Thread Carl W. Brown
Doug, But if people start compromising their UTF-8 parsers to accommodate CESU-8 adaptively, it would be a great blow to UTF-8. It would essentially undo all the tightening-up that was accomplished by the Corrigendum, and it would revive all the old Bruce Schneier-style skepticism about

Re: PDUTR #26 posted

2001-09-17 Thread Michael (michka) Kaplan
From: John Cowan [EMAIL PROTECTED] False. IANA's registry is merely de facto: what they register is not in fact encodings, but *names* of encodings. The charset name ISO646-DE is legal as an XML encoding, but it would astonish me if any extant XML parser supports it. (This is one of

Re: PDUTR #26 posted

2001-09-17 Thread Michael (michka) Kaplan
From: Carl W. Brown [EMAIL PROTECTED] It would seem to be that if you either have to change the UTF-8 code to support CESU-8 or change the UTF-16 compare logic then changing the UTF-16 logic to do code point order compares is a much more containable change with a much lower processing
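Carl Brown's alternative is to keep UTF-16 and change the comparison so that it sorts in code point order. The minimal Python sketch below uses a widely used fix-up technique (not necessarily the one he had in mind). Code units are compared as usual, and only at the first difference are the values remapped so that surrogates D800..DFFF sort above E000..FFFF. The function names are hypothetical.

```python
def utf16_units(text: str) -> list:
    """Split a string into its UTF-16 code units."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def _fixup(unit: int) -> int:
    """Remap a code unit so that surrogates sort above U+E000..U+FFFF."""
    if unit >= 0xE000:
        return unit - 0x800   # E000..FFFF -> D800..F7FF
    if unit >= 0xD800:
        return unit + 0x2000  # D800..DFFF -> F800..FFFF
    return unit


def compare_code_point_order(a: list, b: list) -> int:
    """Compare two well-formed UTF-16 code unit sequences in code point order (-1, 0, 1)."""
    for x, y in zip(a, b):
        if x != y:
            # Only the first differing unit decides, so the fix-up is applied there alone.
            return -1 if _fixup(x) < _fixup(y) else 1
    return (len(a) > len(b)) - (len(a) < len(b))
```

In plain code unit order, U+FF61 (FF61) sorts after U+10000 (D800 DC00). With the fix-up, it sorts before it, as it does in UTF-8 and in code point order.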

RE: PDUTR #26 posted

2001-09-17 Thread Carl W. Brown
Mark, - Just because it is in IANA does *not* mean that everyone will support it. There are many encodings in IANA supported by very few people. Nor does it mean that it is intended for widespread public use. The IANA registry is also used as a general purpose registry, even for encodings

Re: PDUTR #26 posted

2001-09-17 Thread Markus Scherer
One technical nit: The document says: "2.1 c. The bit pattern 11110xxx is illegal in any CESU-8 byte, ..." In fact, this should say "The bit patterns 1111xxxx are illegal in ..." The changes are subtle: one '0' replaced by 'x' - you want to forbid all bytes >=0xf0 (f0..ff), not just f0..f7. (The
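Markus Scherer's point can be checked with a bit mask. The pattern 11110xxx (mask 0xF8, value 0xF0) matches only f0..f7, while 1111xxxx (mask 0xF0, value 0xF0) matches all of f0..ff. This minimal Python sketch covers only that lead-byte rule. Both function names are hypothetical, and a full CESU-8 validator would also need to check continuation bytes and overlong forms.

```python
def first_forbidden_cesu8_byte(data: bytes) -> int:
    """Index of the first byte matching 1111xxxx (0xF0..0xFF), or -1 if there is none."""
    for i, b in enumerate(data):
        if (b & 0xF0) == 0xF0:  # mask 1111xxxx: catches f0..ff
            return i
    return -1


def matches_narrow_pattern(b: int) -> bool:
    """The narrower pattern 11110xxx, which catches only f0..f7."""
    return (b & 0xF8) == 0xF0
```

For example, the byte 0xF8 slips past the narrow test but is caught by the wider one.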

RE: CESU-8: to document or not

2001-09-17 Thread Carl W. Brown
Addison, By providing a documented, standard way to refer to legacy versions of these products and their encodings, I can more readily rely on having a well-documented range of protocols and procedures for converting and validating data exchanged with these systems. The argument that

Re: PDUTR #26 posted

2001-09-17 Thread David Hopwood
Mark Davis wrote: A few notes: - IANA is a registry. I believe the only legitimate grounds that they have for denying a registration is that it is incompletely specified or has a misleading name. No, that's not correct. The relevant document defining

Re: PDUTR #26 posted

2001-09-17 Thread David Hopwood
Carl W. Brown wrote: Doug, But if people start compromising their UTF-8 parsers to accommodate CESU-8 adaptively, it would be a great blow to UTF-8. It would essentially undo all the tightening-up that was accomplished by the Corrigendum, and it

Re: CESU-8: to document or not

2001-09-17 Thread DougEwell2
In a message dated 2001-09-17 13:06:16 Pacific Daylight Time, [EMAIL PROTECTED] writes: I agree that there is a world of software out there that does not support Unicode 3.1 yet. Toby has a legitimate problem. It is the proposed solution that bothers me. For now I suspect that living

Re: PDUTR #26 posted

2001-09-17 Thread David Starner
On Sun, Sep 16, 2001 at 09:28:34PM +0100, David Hopwood wrote: It doesn't reopen that specific type of security hole, because irregular UTF-8 sequences (as defined by Unicode 3.1) can only decode to characters above 0x, and those characters are unlikely to be special for any application
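David Hopwood's argument, as quoted by David Starner, is that a surrogate pair written as two 3-byte sequences can only decode to a supplementary character. The minimal Python sketch below, a hypothetical `decode_cesu8`, shows this for a decoder kept separate from UTF-8 rather than folded into it. It illustrates the behaviour under that assumption and is not a reference implementation.

```python
def decode_cesu8(data: bytes) -> str:
    """Sketch of a separate CESU-8 decoder (not a relaxed UTF-8 decoder).

    Three-byte surrogate sequences are decoded to surrogate code units and
    then paired, so a well-formed pair always yields a character above U+FFFF.
    """
    if any((b & 0xF0) == 0xF0 for b in data):
        raise ValueError("bytes f0..ff are not allowed in CESU-8")
    # Step 1: UTF-8 byte patterns -> code units, letting surrogates through.
    units = data.decode("utf-8", "surrogatepass")
    # Step 2: pair the surrogates via UTF-16. An unpaired surrogate raises
    # UnicodeDecodeError here, because the strict UTF-16 decoder rejects it.
    return units.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


if __name__ == "__main__":
    ch = decode_cesu8(b"\xed\xa0\x81\xed\xb0\x80")
    print(hex(ord(ch)))
```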