RE: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

2003-03-02 Thread Kent Karlsson
Michael (michka) Kaplan: ... then the conversion will simply strip the errant characters. Note that either solution meets the needs of refusal to interpret the errant sequences. Simply stripping the errant byte sequences means that they are each interpreted as the empty string of characters.

Impossible combinations?

2003-03-02 Thread Kevin Brown
I'm working on a Latin-based font that's got a large number of kerning pairs already defined and I'm trying to pare this list of pairs down to the bare minimum. There seem to be many pairs which are unlikely ever to be used. These pairs all involve a lowercase on the left with an uppercase on

Some of Andy's assertions

2003-03-02 Thread Michael Everson
1. The sequence 'Vowel+Virama+Ya...' is illogical to scholars of Bengali and indeed Indic languages in general. I refuted this yesterday by indication that this usage is an innovation. 2. Such sequences are not semantically equivalent to the intended ... sentence fragment. 3. There are no other

Re: Please see my latest proposal

2003-03-02 Thread Michael Everson
Andy, Your BENGALI LETTER OPEN O can be encoded already with the sequence U+0985 U+09CD U+09AF. Your BENGALI LETTER CENTRAL E can be encoded already with the sequence U+098F U+09CD U+09AF. There is no need to bring the Bengali code block in line with the Devanagari block. -- Michael Everson
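
For readers who want to check which characters the two sequences above contain, here is a minimal Python sketch. It uses the standard `unicodedata` module. The `describe` helper and the dictionary labels are hypothetical names introduced for illustration and are not part of the original message.

```python
import unicodedata

# Code point sequences quoted in the message; the keys are only labels
# for the two proposed letters (illustrative, not encoded character names).
sequences = {
    "proposed OPEN O": "\u0985\u09CD\u09AF",
    "proposed CENTRAL E": "\u098F\u09CD\u09AF",
}

def describe(seq):
    """Return 'U+XXXX NAME' for each code point in seq."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in seq]

for label, seq in sequences.items():
    print(label)
    for line in describe(seq):
        print("   ", line)
```

Both sequences end in VIRAMA followed by YA, which is the 'Vowel+Virama+Ya' pattern under discussion in this thread.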

Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

2003-03-02 Thread Mark Davis
I agree with Kent that it is somewhat less robust to simply remove ill-formed sequences, since it removes any indication that the data was corrupted. Either better to signal an error, or insert some other indication like a REPLACEMENT CHARACTER or SUB at that point. (And in my reading, C12a does
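
The three options discussed in this thread (signal an error, strip the ill-formed bytes, or insert a REPLACEMENT CHARACTER) can be compared with a short Python sketch. It relies on the standard `errors` argument of `bytes.decode`. The sample bytes and the `decode_or_none` helper are assumptions made for illustration; this is not the API the posters were discussing.

```python
# 0xFF can never occur in well-formed UTF-8, so this input is ill-formed.
data = b"ab\xffcd"

def decode_or_none(raw):
    """Signal an error: return None when raw is not well-formed UTF-8."""
    try:
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return None

# Strip: the ill-formed byte vanishes and no trace of the corruption remains.
stripped = data.decode("utf-8", errors="ignore")

# Replace: U+FFFD REPLACEMENT CHARACTER marks where the corruption was.
replaced = data.decode("utf-8", errors="replace")

print(decode_or_none(data))   # None
print(repr(stripped))         # 'abcd'
print(repr(replaced))         # 'ab\ufffdcd'
```

The stripped result is indistinguishable from text that was never corrupted, which is the loss of robustness described above.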

Re: Impossible combinations?

2003-03-02 Thread Roozbeh Pournader
On Sun, 2 Mar 2003, Kevin Brown wrote: Does anyone know of a Latin-based language in which it is possible to have a lowercase immediately followed by an uppercase in the SAME word? That happens in many common names, like McGowan. It will also be used in tech terms that need to avoid space for
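
One way to see which lowercase-then-uppercase kerning pairs actually occur is to scan sample text for them. The following is a minimal Python sketch, assuming a plain-text word list or corpus is available. The `mixed_case_pairs` function and its sample input are hypothetical, not something proposed in the thread.

```python
def mixed_case_pairs(text):
    """Return the sorted set of adjacent (lowercase, uppercase) letter pairs.

    Spaces and punctuation are neither lowercase nor uppercase, so only
    pairs inside a single word are reported.
    """
    pairs = set()
    for left, right in zip(text, text[1:]):
        if left.islower() and right.isupper():
            pairs.add(left + right)
    return sorted(pairs)

print(mixed_case_pairs("McGowan"))  # ['cG']
```

Because `str.islower` and `str.isupper` follow Unicode case properties, accented Latin letters are handled as well as ASCII ones.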

Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

2003-03-02 Thread Michael (michka) Kaplan
From: Mark Davis [EMAIL PROTECTED] I agree with Kent that it is somewhat less robust to simply remove ill-formed sequences, since it removes any indication that the data was corrupted. Nice that the API gives one the option to choose, huh? ;-) The notion of continuing (even if one is

Re: Impossible combinations?

2003-03-02 Thread Michael Everson
At 21:01 +0330 2003-03-02, Roozbeh Pournader wrote: That happens in many common names, like McGowan. Noble names, Roozbeh. ;-) -- Michael Everson * * Everson Typography * * http://www.evertype.com

Re: Impossible combinations?

2003-03-02 Thread Kenneth Whistler
On Sun, 2 Mar 2003, Kevin Brown wrote: Does anyone know of a Latin-based language in which it is possible to have a lowercase immediately followed by an uppercase in the SAME word? In addition to the examples pointed out by Roozbeh and Michael, this pattern is growing increasingly common

Re: Impossible combinations?

2003-03-02 Thread John Hudson
At 04:11 AM 3/2/2003, Kevin Brown wrote: I'm working on a Latin-based font that's got a large number of kerning pairs already defined and I'm trying to pare this list of pairs down to the bare minimum. There seem to be many pairs which are unlikely ever to be used. These pairs all involve a

Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

2003-03-02 Thread Asmus Freytag
At 07:21 AM 3/2/03 -0800, Mark Davis wrote: C12a When a process interprets a code unit sequence which purports to be in a Unicode character encoding form, it shall treat ill-formed code unit sequences as an error condition, and shall not interpret such sequences as