I have observed that MS Core Fonts include glyphs for U+018f (Ə) LATIN
CAPITAL LETTER SCHWA and for U+0259 (ə) LATIN SMALL LETTER SCHWA.
Obviously, they were included for use in Azeri text.
Is it considered correct to use IPA characters like U+0259 in non-IPA
contexts?
On the other hand,
Julie Doll Allen wrote:
Proposed Draft Unicode Technical Report #26: Compatibility Encoding
Scheme for UTF-16: 8-Bit (CESU-8) is now available at:
http://www.unicode.org/unicode/reports/tr26/
Does renaming UTF-8S to CESU-8 fix all the issues that were discussed on
this mailing list at the
Herman Ranes wrote:
I have observed that MS Core Fonts include glyphs for
U+018f (Ə) LATIN CAPITAL LETTER SCHWA and for
U+0259 (ə) LATIN SMALL LETTER SCHWA. Obviously,
they were included for use in Azeri text.
Is it considered correct to use IPA characters like U+0259
in non-IPA
From: Marco Cimarosti [EMAIL PROTECTED]
Does renaming UTF-8S to CESU-8 fix all the issues that were
discussed on this mailing list at the beginning of last spring?
In my opinion (and the opinion of some others), no. But they do represent
the *attempt* to answer them.
Specifically:
- How
Hello Unicoders,
this should finally go to the FAQ.
Hello JG,
on Thursday, September 13, 2001 1:23 AM, James Gardner
wrote:
I have a Microsoft Active Server Page which is saved
as an ANSI file.
Note that older Microsoft documentation abused the term
ANSI for Microsoft's proprietary CP
From: [EMAIL PROTECTED]
If Michka is referring to non-compliant CESU-8 parsers, I really
wouldn't care much because CESU-8 is supposed to live in its
own little private world. But if people start compromising their
UTF-8 parsers to accommodate CESU-8 adaptively, it would
be a great blow to
MichKa,
Actually, once it's in IANA then it is legal in XML and other places, and
*everyone* will have to support it, whether they want to or not. What is
supposedly private will become quite public. IANA, after all, does not have
charsets that they register for people to not use and none of
We, in the FarsiWeb Project, have recently finished a standard draft
titled Information Technology -- Persian Information Interchange and
Display, Based on Unicode, which will be submitted to Iranian standards
body (ISIRI) for approval as a national standard.
For those interested, it is
From: Mark Davis [EMAIL PROTECTED]
- A significant reason for CESU-8 garnering enough support was that its
introduction allows the definition of UTF-8 itself to be tightened, to
formally exclude the 3-byte surrogates both in reading and writing.
I do not see this as a valid argument at all
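To make the tightening concrete: a strict UTF-8 decoder rejects the 3-byte encodings of surrogate code units, which is exactly the form CESU-8 uses for supplementary characters. The following is only a minimal Python sketch, not code from this thread. The helper names `cesu8_encode` and `is_strict_utf8` are hypothetical, and it assumes Python 3's built-in strict `utf-8` codec and its `surrogatepass` error handler.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text as CESU-8 (illustrative helper, not a library API).

    Each UTF-16 code unit, including each half of a surrogate pair,
    is written as its own UTF-8-style sequence of at most 3 bytes.
    """
    utf16 = text.encode("utf-16-be")
    out = bytearray()
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "big")
        # 'surrogatepass' lets a lone surrogate through the UTF-8 encoder.
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


def is_strict_utf8(data: bytes) -> bool:
    """Return True if data decodes under Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


sample = "\U00010400"                      # a supplementary character
print(sample.encode("utf-8").hex())        # 4-byte UTF-8 form
print(cesu8_encode(sample).hex())          # 6-byte CESU-8 form
print(is_strict_utf8(cesu8_encode(sample)))
```

For characters in the Basic Multilingual Plane the two encodings produce identical bytes; they differ only for supplementary characters.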
From: Carl W. Brown [EMAIL PROTECTED]
In actuality it would be difficult for IANA to deny a character set for any
official character set so the decision is actually up to the Unicode
committee.
I concur.
I don't believe that the idea of registering CESU-8 with IANA came from the
Unicode
On Mon, Sep 17, 2001 at 08:45:59AM -0700, Michael (michka) Kaplan wrote:
Actually, once it's in IANA then it is legal in XML and other places, and
*everyone* will have to support it, whether they want to or not. What is
I think that's a little excessive. UTF-1, NATS-DANO, GOST_19768-74 and
SCSU
Folks,
I've been following this thread for a while and it seems that I can make a small
contribution.
Several comments have been made about why we should NOT document this and give it some
kind of official imprimatur. I agree that it will generate more confusion and may be
used in unforeseen
MichKa,
Also, Toby was not attempting to be deceitful, AFAIK. The original proposal
he submitted (still called UTF-8S) was not in any way contradictory but many
people objected to various issues within it and the way many things were
presented. The current proposal was a very rushed
Doug,
But if people start compromising their UTF-8 parsers to accommodate CESU-8
adaptively, it would be a great blow to UTF-8. It would essentially undo
all the tightening-up that was accomplished by the Corrigendum, and it would
revive all the old Bruce Schneier-style skepticism about
From: John Cowan [EMAIL PROTECTED]
False.
IANA's registry is merely de facto: what they register is not in fact
encodings, but *names* of encodings. The charset name ISO646-DE is
legal as an XML encoding, but it would astonish me if any extant
XML parser supports it. (This is one of
From: Carl W. Brown [EMAIL PROTECTED]
It would seem to me that if you have to either change the UTF-8 code to
support CESU-8 or change the UTF-16 compare logic, then changing the UTF-16
logic to do code point order compares is a much more containable change with
a much lower processing
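A code point order compare over UTF-16 can be sketched with a small fix-up applied to each code unit before comparison. This is a minimal Python sketch of one commonly described technique, not necessarily the one the poster had in mind; the function names are hypothetical. Code units in E000..FFFF are shifted down below the surrogates, and surrogates are shifted up above them, so that plain code unit comparison yields code point order.

```python
def utf16_units(text: str) -> list:
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-be", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def fixup(unit: int) -> int:
    """Remap one code unit so that surrogates sort after E000..FFFF."""
    if unit >= 0xE000:
        return unit - 0x0800   # E000..FFFF -> D800..F7FF
    if unit >= 0xD800:
        return unit + 0x2000   # D800..DFFF -> F800..FFFF
    return unit                # 0000..D7FF unchanged


def code_point_order_key(text: str) -> list:
    """Sort key giving code point order from UTF-16 code units."""
    return [fixup(u) for u in utf16_units(text)]


bmp = "\uFF5E"          # BMP character near the top of the range
supp = "\U00010400"     # supplementary character (surrogate pair D801 DC00)

# Raw UTF-16 code unit order puts the supplementary character first ...
print(utf16_units(supp) < utf16_units(bmp))
# ... while the fixed-up key restores code point order.
print(code_point_order_key(bmp) < code_point_order_key(supp))
```

The fix-up touches only the comparison routine, which is the sense in which the change stays contained: stored data and the UTF-8 code are left alone.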
Mark,
- Just because it is in IANA does *not* mean that everyone will support it.
There are many encodings in IANA supported by very few people. Nor does it
mean that it is intended for widespread public use. The IANA registry is
also used as a general purpose registry, even for encodings
One technical nit:
The document says:
2.1 c. The bit pattern 11110xxx is illegal in any CESU-8 byte, ...
In fact, this should say "The bit patterns 1111xxxx are illegal in ..."
The changes are subtle: one '0' replaced by 'x' - you want to forbid all bytes >=0xf0
(f0..ff), not just f0..f7.
(The
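The difference between the two masks is easy to check mechanically. Since CESU-8 never uses sequences longer than three bytes, no byte from f0 to ff should appear in it. The following is a minimal Python sketch with hypothetical function names, not text from the report: the wide test covers f0..ff, while the narrow test covers only f0..f7.

```python
def is_illegal_wide(byte: int) -> bool:
    """True for any byte matching 1111xxxx, i.e. f0..ff."""
    return (byte & 0xF0) == 0xF0


def is_illegal_narrow(byte: int) -> bool:
    """True only for bytes matching 11110xxx, i.e. f0..f7."""
    return (byte & 0xF8) == 0xF0


def has_illegal_cesu8_byte(data: bytes) -> bool:
    """True if data contains any byte that cannot occur in CESU-8."""
    return any(is_illegal_wide(b) for b in data)


# Bytes caught by the wide mask but missed by the narrow one: f8..ff.
missed = [b for b in range(256) if is_illegal_wide(b) and not is_illegal_narrow(b)]
print([hex(b) for b in missed])
```

A 4-byte UTF-8 sequence such as f0 90 90 80 is flagged by `has_illegal_cesu8_byte`, whereas the 6-byte surrogate form ed a0 81 ed b0 80 passes.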
Addison,
By providing a documented, standard way to refer to legacy
versions of these products and their encodings, I can more
readily rely on having a well-documented range of protocols and
procedures for converting and validating data exchanged with
these systems. The argument that
Mark Davis wrote:
A few notes:
- IANA is a registry. I believe the only legitimate grounds that they have
for denying a registration is that it is incompletely specified or has a
misleading name.
No, that's not correct. The relevant document defining
Carl W. Brown wrote:
Doug,
But if people start compromising their UTF-8 parsers to accommodate
CESU-8 adaptively, it would be a great blow to UTF-8. It would
essentially undo all the tightening-up that was accomplished by the
Corrigendum, and it
In a message dated 2001-09-17 13:06:16 Pacific Daylight Time,
[EMAIL PROTECTED] writes:
I agree that there is a world of software out there that does not support
Unicode 3.1 yet. Toby has a legitimate problem. It is the proposed solution
that bothers me. For now I suspect that living
On Sun, Sep 16, 2001 at 09:28:34PM +0100, David Hopwood wrote:
It doesn't reopen that specific type of security hole, because irregular
UTF-8 sequences (as defined by Unicode 3.1) can only decode to characters above
0xFFFF, and those characters are unlikely to be special for any application