Mark Davis [EMAIL PROTECTED] wrote:
- when one of the BOM-allowing UTFs starts with a BOM, you know the
encoding*, and you strip off the BOM when you get the content.
*assuming that no UTF-16 file has U+ as the first character.
In the real world, this is a pretty good assumption --
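
[Editorial sketch, not part of the original message.] The rule quoted above (if the data starts with a BOM, you know the encoding, and you strip the BOM before handing on the content) might look roughly like this in Python. The function name `sniff_bom` and the UTF-8 fallback are assumptions made for illustration only. As the thread notes, the check is a heuristic: a UTF-16LE text whose first real character is U+0000 would be mistaken for UTF-32LE here.

```python
import codecs

# Longer signatures are listed first: the UTF-32LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(data: bytes, default: str = "utf-8"):
    """Return (encoding, text) with any leading BOM removed.

    `default` is a hypothetical fallback used when no BOM is present.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            # The BOM identified the encoding; decode only what follows it.
            return name, data[len(bom):].decode(name)
    return default, data.decode(default)
```

Calling `sniff_bom(codecs.BOM_UTF16_LE + "hi".encode("utf-16-le"))` returns the detected encoding name together with the text minus its BOM.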
Hello, Doug!
I)
AT http://www.unicode.org/unicode/uni2book/ch03.pdf
AT
1.
AT - A single abstract character may correspond to more than one code
AT value -
for example, U+00C5 ... LATIN CAPITAL LETTER A WITH RING and
U+212B ... ANGSTROM SIGN
2.
AT - Multiple code values may be
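
[Editorial sketch, not part of the original message.] The U+00C5 / U+212B example quoted above can be checked with Python's standard `unicodedata` module. The helper name `canonically_equivalent` is an assumption for illustration; it simply compares the NFD forms of two strings.

```python
import unicodedata

ANGSTROM_SIGN = "\u212B"   # ANGSTROM SIGN
A_WITH_RING = "\u00C5"     # LATIN CAPITAL LETTER A WITH RING ABOVE


def canonically_equivalent(a: str, b: str) -> bool:
    """True if the two strings have the same canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Different code points, same abstract character after normalization.
print(ANGSTROM_SIGN == A_WITH_RING)
print(canonically_equivalent(ANGSTROM_SIGN, A_WITH_RING))
```

The two code points compare unequal as raw strings, but normalizing both sides first makes the comparison succeed.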
Doug Ewell wrote:
As Shlomi points out, Microsoft products do not treat UTF-7
specially, except that IE recognizes the UTF-7 BOM and sets its encoding
accordingly (but this is true for any UTF-7 sequence, not just the BOM;
try loading a text file containing only the 11 ASCII characters
It is a pretty good assumption; but if BOMs are used on smaller fields
the probability goes up. And to be perfectly reliable, you can't
assume it.
That is one reason that the WORD JOINER was encoded, so that
eventually we can use FEFF solely as a BOM.
Mark
—
Γνῶθι σαυτόν — Θαλῆς
[For
I thought some of the choices in the following were amusing:
http://m-w.com/cgi-bin/dictionary/?va=Unicode
Mark
—
Γνῶθι σαυτόν — Θαλῆς
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]
http://www.macchiato.com
Mark Davis [EMAIL PROTECTED] wrote:
- when one of the BOM-allowing UTFs starts with a BOM, you know the
encoding*, and you strip off the BOM when you get the content.
*assuming that no UTF-16 file has U+ as the first character.
In the real world, this is a pretty good assumption
On 2002-apr-09, Shlomi Tal and Doug Ewell discussed on this list a UTF-7 signature
byte sequence of +/v8- (which was news to me).
(Subject MS/Unix BOM FAQ again (small fix))
I meditated some over this -
+/v8 is the encoding of U+FEFF as the first code point in a text. So far, so good.
The '-'
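
[Editorial sketch, not part of the original message.] The claim that +/v8 encodes U+FEFF can be reproduced by hand: UTF-7 writes the UTF-16BE bytes of a run as base64 without padding, after a '+' shift character. The helper name `utf7_shift_sequence` is an assumption for illustration, and it ignores the rule that some ASCII characters are written directly.

```python
import base64


def utf7_shift_sequence(text: str) -> str:
    """Return '+' plus the unpadded base64 of the UTF-16BE bytes of `text`.

    This covers only the shifted (base64) part of UTF-7, with no closing '-'.
    """
    raw = text.encode("utf-16-be")
    b64 = base64.b64encode(raw).decode("ascii").rstrip("=")
    return "+" + b64


# U+FEFF alone is the bytes FE FF, which base64 turns into "/v8".
print(utf7_shift_sequence("\ufeff"))

# Python's own UTF-7 codec reads the sequence back as U+FEFF.
print(b"+/v8-".decode("utf-7") == "\ufeff")
```

Because the third base64 character carries two bits borrowed from whatever follows in the same run, the visible signature can change when another shifted character comes directly after U+FEFF, which seems to be why the closing '-' matters in this discussion.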
Shlomi Tal wrote:
UTF-7, it shocked me how Greek Sokrates and S o k r a t e s (with
spaces between each Greek letter in the latter) would have different
encodings for the same Unicode characters.
That is not unusual for stateful encodings.
It's the same with BOCU-1 (not in this particular
Given a Unicode encoding value U+ (or whatever for non-BMP), how can
I find out the version of the Unicode standard in which this character
first appeared?
- Frank
Frank asked:
From [EMAIL PROTECTED] Thu Apr 11 12:12:33 2002
Date: Thu, 11 Apr 2002 14:58:48 EDT
Given a Unicode encoding value U+ (or whatever for non-BMP), how can
I find out the version of the Unicode standard in which this character
first appeared?
At last, a question for which we
Dear Doug Ewell, William Overington, James E. Agenbroad, and Maurice
Bauhahn,
Thank you all for the reply.
May I assume u+0b85 as official?
Some explanations for the need for a visible a.
In Tamil,
a/
dependent ai and au have ligatures. In fact, au and ou at present
utilise the same ligature.
Avarangal wrote:
Dear Doug Ewell, William Overington, James E. Agenbroad, and Maurice
Bauhahn,
Thank you all for the reply.
May I assume u+0b85 as official?
Whoa, hang on here! Official WHAT? u+0b85 is definitely in Unicode:
U+0B85 TAMIL LETTER A
It is _NOT_ an inherent a vowel.
see:
http://www.columbia.edu/kermit/utf8.html
which has an interesting new entry: Vietnamese Nôm, the first entry
containing non-BMP characters (probably will not be entirely visible to
most people)
Can *anyone* see it properly? Last I checked no browser could read UTF-8
beyond the BMP,
- Original Message -
From: Tom Gewecke [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: 11 April 2002 22:56
Subject: Re: Vietnamese Nom Text
see:
http://www.columbia.edu/kermit/utf8.html
which has an interesting new entry: Vietnamese Nôm, the first entry
containing non-BMP
It seems that I have to make a font containing any characters that I want to
propose for inclusion.
Do the characters have to be encoded to the correct code points, or can they
be encoded to just about any code point?
Is there some free font program out there that can be used for this purpose?
ICU 2.1 will have an API for this, uchar.h/u_charAge().
markus
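
[Editorial sketch, not part of the original message.] Without ICU, one way to answer Frank's question is to look the code point up in the Unicode Character Database file DerivedAge.txt, which lists ranges with the version in which they were first assigned. The sketch below is not ICU's `u_charAge()`; the function names `parse_derived_age` and `age_of` are assumptions, and `SAMPLE` is a tiny illustrative excerpt in that file's layout, not the real file.

```python
from typing import List, Optional, Tuple

# Illustrative excerpt only; the real DerivedAge.txt is much longer.
SAMPLE = """\
# <range or code point> ; <version> # comment
0B85..0B8A    ; 1.1 #   [6] TAMIL LETTER A..TAMIL LETTER UU
20AC          ; 2.1 #       EURO SIGN
2060..2063    ; 3.2 #   [4] WORD JOINER..INVISIBLE SEPARATOR
"""


def parse_derived_age(text: str) -> List[Tuple[int, int, str]]:
    """Parse DerivedAge-style lines into (first, last, version) tuples."""
    table = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        rng, version = (part.strip() for part in line.split(";", 1))
        first, _, last = rng.partition("..")
        table.append((int(first, 16), int(last or first, 16), version))
    return table


def age_of(cp: int, table: List[Tuple[int, int, str]]) -> Optional[str]:
    """Return the version string for `cp`, or None if it is not listed."""
    for first, last, version in table:
        if first <= cp <= last:
            return version
    return None
```

With the excerpt above, `age_of(0x0B85, parse_derived_age(SAMPLE))` looks up TAMIL LETTER A; a code point missing from the excerpt yields `None`.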
Kenneth Whistler wrote:
Frank asked:
Given a Unicode encoding value U+ (or whatever for non-BMP), how can
I find out the version of the Unicode standard in which this character
first appeared?
Mark:
A suggestion: On slide 5, I would be inclined not to differentiate
surrogates from non-characters. That only confuses people, I think,
regarding the relationships between codepoints and the various encoding
forms. Even if they are formally still distinguished in the Std, I contend
that
From: "Stefan Persson" [EMAIL PROTECTED]
To: "Unicode-listan" [EMAIL PROTECTED]
Subject: Concerning proposals
Date: Thu, 11 Apr 2002 23:57:55 +0200
It seems that I have to make a font containing any characters that I want
to
propose for inclusion.
Oy gevalt. So I can't propose anything.
From [EMAIL PROTECTED] Thu Apr 11 13:45:37 2002
X-Originating-IP: [62.30.112.2]
To: [EMAIL PROTECTED]
Subject: Re: Inherent a
Sinnathurai Srivas wrote:
May I assume u+0b85 as official?
No.
That is U+0B85 TAMIL LETTER A -- just the ordinary, standalone
letter /a/.
You are, of course, free
Stefan asked:
It seems that I have to make a font containing any characters that I want to
propose for inclusion.
Or provide a font already made by someone else containing them, or get
someone else who has the relevant tools to produce it.
This is a barrier erected for three reasons:
1.
Juuitchan donned sackcloth and ashes and wailed:
It seems that I have to make a font containing any characters that I want
to
propose for inclusion.
Oy gevalt. So I can't propose anything. Fabulous. Just fabulous.
Well, get serious.
The Unicode Standard is serious business. (Even if
At 15:49 4/11/2002, Kenneth Whistler wrote:
Is there some free font program out there that can be used for this
purpose?
I'll let somebody else on the list who knows about font tools answer
that one.
I'm not aware of any free tools that I would trust to do the job. The
cheapest option is
Stefan == Stefan Persson [EMAIL PROTECTED] writes:
Stefan Is there some free font program out there that can be used for
Stefan this purpose?
There is pfaedit at:
http://pfaedit.sf.net/
and for bdf bitmap fonts xmbdfed at:
http://crl.nmsu.edu/~mleisher/xmbdfed.html
Pfaedit's
pfaedit's a free font editor for Unix. Or one could write out a
PostScript font by hand - it's not completely unreasonable, especially
if you're doing something like a few math characters.
--
David Starner - [EMAIL PROTECTED]
It's not a habit; it's cool; I feel alive.
If you don't have it
This is a barrier erected for three reasons:
1. If a proposed character can't pass the font test -- i.e., nobody can
come up with a usable font that contains it -- then it may be of
rather marginal usefulness, since apparently people *aren't* using
it.
Of course, historical
This thread seems just about ended, and I don't want to be the person
to revive it, but there have been numerous related topics in the past
six months, and nothing in them answers the question that has been
nagging me.
The question is
Considering the difficulty of actually getting access to
Markus Scherer [EMAIL PROTECTED] wrote:
On 2002-apr-09, Shlomi Tal and Doug Ewell discussed on this list
a UTF-7 signature byte sequence of +/v8- (which was news to me).
I don't remember ever reading a recommendation, or even a suggestion, to
use +/v8- as a signature for UTF-7. But that