On 31/07/2003 15:02, Ted Hopp wrote:

On Thursday, July 31, 2003 4:56 PM, John Cowan wrote:


Unicode allows any combining character to be attached to any base
character whatsoever. However, putting a dagesh into a DEVANAGARI KA, or
placing a circumflex over an ARABIC MEEM, is pretty certain to cause bad
rendering, and may screw up other text processes such as syllabication.



From Unicode 3.2, Chapter 8 [regarding shin and sin dot]:
"The two dots are mutually exclusive. The base letter shin can also have
dagesh, a vowel, and other diacritics. Use of the two dots with any other
base character is an error."

Sometimes, doing something that's allowed can still be an error.


Presumably we have to distinguish between what is a spelling (or similar) error in a particular language and what is an illegal Unicode sequence. The sentence quoted from Chapter 8 probably means something closer to a spelling error.

We mustn't forget that unusual combinations are sometimes meaningful. For example, there are languages which use Hebrew base characters with Arabic vowel points. We mustn't make these illegal sequences in Unicode without very good reason.
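To make the distinction concrete, here is a rough Python sketch of how a checker might treat the shin/sin dot rule as an advisory, spelling-level warning rather than as grounds for rejecting the text. The function name shin_sin_dot_warnings and the warning wording are my own inventions for illustration, and treating every character whose general category does not start with "M" as a base is a simplifying assumption, not anything the standard prescribes.

```python
import unicodedata

SHIN = "\u05E9"      # HEBREW LETTER SHIN
SHIN_DOT = "\u05C1"  # HEBREW POINT SHIN DOT
SIN_DOT = "\u05C2"   # HEBREW POINT SIN DOT


def shin_sin_dot_warnings(text):
    """Return (index, message) pairs for questionable shin/sin dot usage.

    Hypothetical helper: it only warns and never rejects, since any
    combining mark after any base is still a valid Unicode sequence.
    """
    warnings = []
    base = None        # most recent non-mark character (assumed to be the base)
    dots_seen = set()  # which of the two dots the current base already carries
    for i, ch in enumerate(text):
        if not unicodedata.category(ch).startswith("M"):
            # Simplification: anything that is not a mark starts a new base.
            base = ch
            dots_seen = set()
            continue
        if ch in (SHIN_DOT, SIN_DOT):
            if base != SHIN:
                warnings.append((i, "shin/sin dot on a base other than shin"))
            dots_seen.add(ch)
            if len(dots_seen) == 2:
                warnings.append((i, "shin dot and sin dot on the same base"))
    return warnings


if __name__ == "__main__":
    samples = {
        "shin + dagesh + shin dot": "\u05E9\u05BC\u05C1",
        "DEVANAGARI KA + shin dot": "\u0915\u05C1",
        "shin + both dots": "\u05E9\u05C1\u05C2",
    }
    for label, s in samples.items():
        print(label, shin_sin_dot_warnings(s))
```

Because the result is only a list of warnings, a caller working with an orthography that deliberately mixes scripts can simply ignore them; nothing about the sequence itself is treated as illegal.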


-- Peter Kirk [EMAIL PROTECTED] http://web.onetel.net.uk/~peterkirk/




