Re: MS/Unix BOM FAQ again (small fix)

2002-04-11 Thread Doug Ewell
Mark Davis [EMAIL PROTECTED] wrote: - when one of the BOM-allowing UTFs starts with a BOM, you know the encoding*, and you strip off the BOM when you get the content. *assuming that no UTF-16 file has U+ as the first character. In the real world, this is a pretty good assumption --

Re[2]: Discrepancy in ch03.pdf?

2002-04-11 Thread Anton Tagunov
Hello, Doug! I) AT http://www.unicode.org/unicode/uni2book/ch03.pdf AT 1. AT - A single abstract character may correspond to more than one code AT value - for example, U+00C5 ... LATIN CAPITAL LETTER A WITH RING and U+212B ... ANGSTROM SIGN 2. AT - Multiple code values may be
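
The first case quoted above (one abstract character, two code values) can be checked with Python's standard unicodedata module. This is a minimal sketch added for illustration, not something from the original message:

```python
import unicodedata

ring = "\u00C5"      # LATIN CAPITAL LETTER A WITH RING ABOVE
angstrom = "\u212B"  # ANGSTROM SIGN

# The two code points differ, so the raw strings compare unequal.
distinct = ring != angstrom

# ANGSTROM SIGN has a singleton canonical decomposition to U+00C5,
# so NFC normalization maps both to the same string.
same_after_nfc = unicodedata.normalize("NFC", angstrom) == ring

print(distinct, same_after_nfc)
```

Comparing normalized forms rather than raw code points is the usual way to treat the two as the same character.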

Re: MS/Unix BOM FAQ again (small fix)

2002-04-11 Thread Otto Stolz
Doug Ewell wrote: As Shlomi points out, Microsoft products do not treat UTF-7 specially, except that IE recognizes the UTF-7 BOM and sets its encoding accordingly (but this is true for any UTF-7 sequence, not just the BOM; try loading a text file containing only the 11 ASCII characters

Re: MS/Unix BOM FAQ again (small fix)

2002-04-11 Thread Mark Davis
It is a pretty good assumption; but if BOMs are used on smaller fields the probability goes up. And to be perfectly reliable, you can't assume it. That is one reason that the WORD JOINER was encoded, so that eventually we can use FEFF solely as a BOM. Mark — Γνῶθι σαυτόν — Θαλῆς [For
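
The "detect the BOM, then strip it" step discussed in this thread might look roughly like the sketch below. The function name sniff_bom is hypothetical, and the code deliberately inherits the caveat raised above: it cannot tell a signature from a leading U+FEFF that is real content, and a UTF-16LE BOM followed by U+0000 is byte-identical to the UTF-32LE BOM.

```python
import codecs

# Longer signatures first: the UTF-32LE BOM begins with the UTF-16LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(data: bytes):
    """Return (encoding, content_without_bom), or (None, data) if no BOM."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, data[len(bom):]
    return None, data


encoding, body = sniff_bom(b"\xff\xfeh\x00i\x00")
print(encoding, body.decode(encoding))
```

Returning None when no signature is present leaves the fallback (a declared charset, a default, or heuristics) to the caller.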

OT: Definitions of Unicode

2002-04-11 Thread Mark Davis
I thought some of the choices in the following were amusing: http://m-w.com/cgi-bin/dictionary/?va=Unicode Mark — Γνῶθι σαυτόν — Θαλῆς [For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr] http://www.macchiato.com

RE: MS/Unix BOM FAQ again (small fix)

2002-04-11 Thread jarkko . hietaniemi
Mark Davis [EMAIL PROTECTED] wrote: - when one of the BOM-allowing UTFs starts with a BOM, you know the encoding*, and you strip off the BOM when you get the content. *assuming that no UTF-16 file has U+ as the first character. In the real world, this is a pretty good assumption

UTF-7 signature

2002-04-11 Thread Markus Scherer
On 2002-apr-09, Shlomi Tal and Doug Ewell discussed on this list a UTF-7 signature byte sequence of +/v8- (which was news to me). (Subject MS/Unix BOM FAQ again (small fix)) I meditated some over this - +/v8 is the encoding of U+FEFF as the first code point in a text. So far, so good. The '-'
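
The claim that +/v8 encodes U+FEFF as the first code point can be reproduced by hand. The sketch below, added for illustration, builds the sequence from the UTF-16BE bytes and checks it against Python's built-in utf-7 codec:

```python
import base64

# UTF-7 shifts into modified base64 with '+' and encodes UTF-16BE code units.
utf16be = "\uFEFF".encode("utf-16-be")        # the two bytes FE FF
b64 = base64.b64encode(utf16be).rstrip(b"=")  # modified base64 drops '=' padding
signature = b"+" + b64 + b"-"                 # '-' explicitly ends the base64 run

decoded = signature.decode("utf-7")
print(signature, decoded == "\uFEFF")
```

The trailing '-' is only there because nothing follows. If more non-ASCII text followed directly, its bits would be packed into the same base64 run and the fourth byte would change.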

Re: UTF-7 signature

2002-04-11 Thread Markus Scherer
Shlomi Tal wrote: UTF-7, it shocked me how Greek Sokrates and S o k r a t e s (with spaces between each Greek letter in the latter) would have different encodings for the same Unicode characters. That is not unusual for stateful encodings. It's the same with BOCU-1 (not in this particular
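
A small sketch of the statefulness described here, using Python's built-in utf-7 codec. The two Greek letters are an arbitrary choice made for this illustration:

```python
# Two letters encoded in one base64 run share bits across the run.
together = "Σω".encode("utf-7")

# Encoding each letter separately starts a fresh run for each one,
# which is also what an intervening space does inside a single string.
apart = "Σ".encode("utf-7") + "ω".encode("utf-7")

bytes_differ = together != apart
same_text = together.decode("utf-7") == apart.decode("utf-7") == "Σω"

print(together, apart, bytes_differ, same_text)
```

Both byte strings decode to the same two characters, so byte-level comparison or searching of UTF-7 data is unreliable without decoding first.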

When was U+xxxx added?

2002-04-11 Thread Frank da Cruz
Given a Unicode encoding value U+ (or whatever for non-BMP), how can I find out the version of the Unicode standard in which this character first appeared? - Frank

Re: When was U+xxxx added?

2002-04-11 Thread Kenneth Whistler
Frank asked: From [EMAIL PROTECTED] Thu Apr 11 12:12:33 2002 Date: Thu, 11 Apr 2002 14:58:48 EDT Given a Unicode encoding value U+ (or whatever for non-BMP), how can I find out the version of the Unicode standard in which this character first appeared? At last, a question for which we

Re: Inherent a

2002-04-11 Thread Avarangal
Dear Doug Ewell, William Overington, James E. Agenbroad, and Maurice Bauhahn, Thank you all for the reply. May I assume u+0b85 as official? Some explanations for the need for a visible a. In Tamil, a/ dependent ai, and au has ligatures. In fact, au and ou at present utilise the same ligature.

Re: Inherent a

2002-04-11 Thread Rick McGowan
Avarangal wrote: Dear Doug Ewell, William Overington, James E. Agenbroad, and Maurice Bauhahn, Thank you all for the reply. May I assume u+0b85 as official? Whoa, hang on here! Official WHAT? u+0b85 is definitely in Unicode: U+0B85 TAMIL LETTER A It is _NOT_ an inherent a vowel.
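
The character name cited in the reply can be looked up in the Unicode data bundled with Python. A minimal check, added for illustration:

```python
import unicodedata

cp = "\u0B85"
name = unicodedata.name(cp)          # formal character name
category = unicodedata.category(cp)  # general category

print(f"U+{ord(cp):04X} {name} ({category})")
```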

Re: Vietnamese Nom Text

2002-04-11 Thread Tom Gewecke
see: http://www.columbia.edu/kermit/utf8.html which has an interesting new entry: Vietnamese Nôm, the first entry containing non-BMP characters (probably will not be entirely visible to most people) Can *anyone* see it properly? Last I checked no browser could read UTF-8 beyond the BMP,
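
For context on what "UTF-8 beyond the BMP" involves: a supplementary-plane character takes four bytes in UTF-8 and a surrogate pair in UTF-16. The sketch below uses U+20000, a plane 2 code point picked only for illustration (it is not taken from the page mentioned above):

```python
ch = "\U00020000"  # first code point of CJK Unified Ideographs Extension B

utf8 = ch.encode("utf-8")       # four bytes, lead byte in the F0..F4 range
utf16 = ch.encode("utf-16-be")  # a high surrogate followed by a low surrogate

print(utf8.hex(" "), utf16.hex(" "))
```

A decoder that only handles one- to three-byte UTF-8 sequences will fail on the F0 lead byte, which would match the symptom described.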

Re: Vietnamese Nom Text

2002-04-11 Thread Stefan Persson
- Original Message - From: Tom Gewecke [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: den 11 april 2002 22:56 Subject: Re: Vietnamese Nom Text see: http://www.columbia.edu/kermit/utf8.html which has an interesting new entry: Vietnamese Nôm, the first entry containing non-BMP

Concerning proposals

2002-04-11 Thread Stefan Persson
It seems that I have to make a font containing any characters that I want to propose for inclusion. Do the characters have to be encoded to the correct code points, or can they be encoded to just about any code point? Is there some free font program out there that can be used for this purpose?

Re: When was U+xxxx added?

2002-04-11 Thread Markus Scherer
ICU 2.1 will have an API for this, uchar.h/u_charAge(). markus Kenneth Whistler wrote: Frank asked: Given a Unicode encoding value U+ (or whatever for non-BMP), how can I find out the version of the Unicode standard in which this character first appeared?
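
Outside ICU, the same lookup can be sketched against a range table in the style of the Unicode Character Database file DerivedAge.txt. This is not the ICU API: the names load_ages and char_age are hypothetical, and the three sample lines are an illustrative excerpt in that file's format, not the full data.

```python
# Illustrative excerpt only; a real lookup would read the complete data file.
SAMPLE = """\
0B85..0B8A    ; 1.1
20AC          ; 2.1
2060..2063    ; 3.2
"""


def load_ages(text):
    """Parse 'START[..END] ; VERSION' lines into (start, end, version) tuples."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        span, version = (part.strip() for part in line.split(";"))
        start, _, end = span.partition("..")
        ranges.append((int(start, 16), int(end or start, 16), version))
    return ranges


def char_age(cp, ranges):
    """Return the version string for code point cp, or None if not listed."""
    for start, end, version in ranges:
        if start <= cp <= end:
            return version
    return None


ages = load_ages(SAMPLE)
print(char_age(0x20AC, ages))
```

A linear scan is fine for a sketch. With the full table, sorting the ranges and using bisect would be the more natural choice.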

Re: Unicode Myths

2002-04-11 Thread Peter_Constable
Mark: A suggestion: On slide 5, I would be inclined not to differentiate surrogates from non-characters. That only confuses people, I think, regarding the relationships between codepoints and the various encoding forms. Even if they are formally still distinguished in the Std, I contend that

Re: Concerning proposals

2002-04-11 Thread ろ〇〇〇〇 ろ〇〇〇
From: "Stefan Persson" [EMAIL PROTECTED] To: "Unicode-listan" [EMAIL PROTECTED] Subject: Concerning proposals Date: Thu, 11 Apr 2002 23:57:55 +0200 It seems that I have to make a font containing any characters that I want to propose for inclusion. Oy gevalt. So I can't propose anything.

Re: Inherent a

2002-04-11 Thread Kenneth Whistler
From [EMAIL PROTECTED] Thu Apr 11 13:45:37 2002 X-Originating-IP: [62.30.112.2] To: [EMAIL PROTECTED] Subject: Re: Inherent a Sinnathurai Srivas wrote: May I assume u+0b85 as official? No. That is U+0B85 TAMIL LETTER A -- just the ordinary, standalone letter /a/. You are, of course, free

Re: Concerning proposals

2002-04-11 Thread Kenneth Whistler
Stefan asked: It seems that I have to make a font containing any characters that I want to propose for inclusion. Or provide a font already made by someone else containing them, or get someone else who has the relevant tools to produce it. This is a barrier erected for three reasons: 1.

Re: Concerning proposals

2002-04-11 Thread Kenneth Whistler
Juuitchan donned sackcloth and ashes and wailed: It seems that I have to make a font containing any characters that I want to propose for inclusion. Oy gevalt. So I can't propose anything. Fabulous. Just fabulous. Well, get serious. The Unicode Standard is serious business. (Even if

Re: Concerning proposals

2002-04-11 Thread John Hudson
At 15:49 4/11/2002, Kenneth Whistler wrote: Is there some free font program out there that can be used for this purpose? I'll let somebody else on the list who knows about font tools answer that one. I'm not aware of any free tools that I would trust to do the job. The cheapest option is

Re: Concerning proposals

2002-04-11 Thread James H. Cloos Jr.
Stefan == Stefan Persson [EMAIL PROTECTED] writes: Stefan Is there some free font program out there that can be used for Stefan this purpose? There is pfaedit at: http://pfaedit.sf.net/ and for bdf bitmap fonts xmbdfed at: http://crl.nmsu.edu/~mleisher/xmbdfed.html Pfaedit's

Re: Concerning proposals

2002-04-11 Thread David Starner
pfaedit's a free font editor for Unix. Or one could write out a PostScript font by hand - it's not completely unreasonable, especially if you're doing something like a few math characters. -- David Starner - [EMAIL PROTECTED] It's not a habit; it's cool; I feel alive. If you don't have it

Re: Concerning proposals

2002-04-11 Thread ろ〇〇〇〇 ろ〇〇〇
This is a barrier erected for three reasons: 1. If a proposed character can't pass the font test -- i.e., nobody can come up with a usable font that contains it -- then it may be of rather marginal usefulness, since apparently people *aren't* using it. Of course, historical

Re: MS/Unix BOM FAQ again (small fix)

2002-04-11 Thread George W Gerrity
This thread seems just about ended, and I don't want to be the person to revive it, but there have been numerous related topics in the past six months, and nothing in them answers the question that has been nagging me. The question is Considering the difficulty of actually getting access to

Re: UTF-7 signature

2002-04-11 Thread Doug Ewell
Markus Scherer [EMAIL PROTECTED] wrote: On 2002-apr-09, Shlomi Tal and Doug Ewell discussed on this list a UTF-7 signature byte sequence of +/v8- (which was news to me). I don't remember ever reading a recommendation, or even a suggestion, to use +/v8- as a signature for UTF-7. But that