Michael (michka) Kaplan:
...
then the conversion will simply strip the errant characters. Note that
either solution meets the requirement of refusing to interpret the errant
sequences.
Simply stripping the errant byte sequences means that they are
each interpreted as the empty string of characters.
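The choice discussed here (strip the errant bytes, mark them, or signal an error) can be sketched in Python. This is only an illustration under stated assumptions: the thread does not say which API is meant, so Python's built-in `bytes.decode` error handlers stand in for it, and the sample byte string is made up.

```python
# Assumption: Python's bytes.decode stands in for the conversion API
# discussed in the thread; the sample data below is hypothetical.

# 0xC0 0xAF is an overlong form and is never valid in UTF-8.
data = b"ab\xc0\xafcd"

# Option 1: strip the errant bytes, i.e. treat each as the empty string.
stripped = data.decode("utf-8", errors="ignore")

# Option 2: leave a visible marker (U+FFFD REPLACEMENT CHARACTER)
# where the errant bytes were.
replaced = data.decode("utf-8", errors="replace")

# Option 3: treat the ill-formed sequence as an error condition.
try:
    data.decode("utf-8", errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(repr(stripped))
print(repr(replaced))
print(strict_failed)
```

Only the second and third options leave any trace that the input was damaged; the first silently yields well-formed but shorter text.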
I'm working on a Latin-based font that's got a large number of kerning
pairs already defined and I'm trying to pare this list of pairs down to
the bare minimum. There seem to be many pairs which are unlikely ever to
be used. These pairs all involve a lowercase on the left with an
uppercase on
1. The sequence 'Vowel+Virama+Ya...' is illogical to scholars of
Bengali and indeed Indic languages in general.
I refuted this yesterday by indicating that this usage is an innovation.
2. Such sequences are not semantically equivalent to the intended
... sentence fragment.
3. There are no other
Andy,
Your BENGALI LETTER OPEN O can be encoded already with the sequence
U+0985 U+09CD U+09AF.
Your BENGALI LETTER CENTRAL E can be encoded already with the
sequence U+098F U+09CD U+09AF.
There is no need to bring the Bengali code block in line with the
Devanagari block.
--
Michael Everson
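As a small aid to reading the code points in the message above, the two sequences can be spelled out with Python's standard `unicodedata` module. The variable and function names are illustrative only and are not part of any proposal in the thread.

```python
import unicodedata

# The two sequences quoted in the message above.
open_o = "\u0985\u09CD\u09AF"     # proposed BENGALI LETTER OPEN O
central_e = "\u098F\u09CD\u09AF"  # proposed BENGALI LETTER CENTRAL E


def names(sequence):
    """Return the Unicode character name of each code point in sequence."""
    return [unicodedata.name(ch) for ch in sequence]


for label, seq in (("open o", open_o), ("central e", central_e)):
    print(label, names(seq))
```

Both sequences consist of an independent vowel letter followed by the virama and the letter YA, which is the 'Vowel+Virama+Ya' pattern debated elsewhere in the thread.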
I agree with Kent that it is somewhat less robust to simply remove
ill-formed sequences, since it removes any indication that the data was
corrupted. It is better either to signal an error or to insert some other
indication, like a REPLACEMENT CHARACTER or SUB, at that point. (And in my reading, C12a
does
On Sun, 2 Mar 2003, Kevin Brown wrote:
Does anyone know of a Latin-based language in which it is possible to
have a lowercase immediately followed by an uppercase in the SAME word?
That happens in many common names, like McGowan. It will also be used in
tech terms that need to avoid space for
From: Mark Davis [EMAIL PROTECTED]
I agree with Kent that it is somewhat less robust to simply remove
ill-formed sequences, since it removes any indication that the data
was corrupted.
Nice that the API gives one the option to choose, huh? ;-)
The notion of continuing (even if one is
At 21:01 +0330 2003-03-02, Roozbeh Pournader wrote:
That happens in many common names, like McGowan.
Noble names, Roozbeh. ;-)
--
Michael Everson * * Everson Typography * * http://www.evertype.com
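One way to test whether a lowercase-then-uppercase kerning pair is ever needed is to scan a word list for such pairs, as with McGowan. The sketch below is hypothetical: the function name and the sample words are invented for illustration, and a real check would need a corpus for the languages the font targets.

```python
def lower_upper_pairs(words):
    """Collect every (lowercase, uppercase) letter pair that occurs
    adjacently inside a word."""
    pairs = set()
    for word in words:
        # Walk over each pair of neighbouring characters in the word.
        for left, right in zip(word, word[1:]):
            if left.islower() and right.isupper():
                pairs.add((left, right))
    return pairs


# Hypothetical sample; only "McGowan" contains such a pair.
sample = ["McGowan", "kerning", "Unicode"]
found = lower_upper_pairs(sample)
print(sorted(found))
```

Pairs that never turn up in a representative corpus are candidates for removal from the kerning table; pairs that do turn up, like `c` followed by `G`, are not.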
On Sun, 2 Mar 2003, Kevin Brown wrote:
Does anyone know of a Latin-based language in which it is possible to
have a lowercase immediately followed by an uppercase in the SAME word?
In addition to the examples pointed out by Roozbeh and Michael,
this pattern is growing increasingly common
At 04:11 AM 3/2/2003, Kevin Brown wrote:
I'm working on a Latin-based font that's got a large number of kerning
pairs already defined and I'm trying to pare this list of pairs down to
the bare minimum. There seem to be many pairs which are unlikely ever to
be used. These pairs all involve a
At 07:21 AM 3/2/03 -0800, Mark Davis wrote:
C12a When a process interprets a code unit sequence which
purports to be in a Unicode character encoding form, it
shall treat ill-formed code unit sequences as an error
condition, and shall not interpret such sequences as