On Thursday, July 31, 2003 4:56 PM, John Cowan wrote:
Presumably we have to distinguish between what is a spelling (or similar) error in any particular language and what is an illegal Unicode sequence. Probably this sentence really means something more like a spelling error.
Unicode allows any combining character to be attached to any base character
whatsoever. However, putting a dagesh into a DEVANAGARI KA, or placing a
circumflex over an ARABIC MEEM, is pretty certain to cause bad rendering, and
may screw up other text processes such as syllabication.
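To make the point above concrete, here is a rough sketch of a lint that flags a combining mark whose script appears to differ from that of its base character. Python's standard `unicodedata` module does not expose the Script property, so the sketch assumes the first word of the character name (HEBREW, DEVANAGARI, ARABIC) is a good enough stand-in. The function names `script_word` and `mixed_script_marks` are made up for illustration.

```python
import unicodedata

def script_word(ch):
    """First word of the character name, a rough stand-in for its script."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def mixed_script_marks(text):
    """Yield (index, base, mark) when a mark's name prefix differs from its base's."""
    base = None
    for i, ch in enumerate(text):
        # Anything that is not a mark (general category M*) starts a new base.
        if not unicodedata.category(ch).startswith("M"):
            base = ch
            continue
        if base is None:
            continue  # a mark with no preceding base; nothing to compare
        mark_word = script_word(ch)
        # Generic "COMBINING ..." marks belong to no single script, so skip them.
        if mark_word != "COMBINING" and mark_word != script_word(base):
            yield i, base, ch

# DEVANAGARI LETTER KA followed by HEBREW POINT DAGESH OR MAPIQ is reported.
for index, base, mark in mixed_script_marks("\u0915\u05BC"):
    print(index, unicodedata.name(base), "+", unicodedata.name(mark))
```

Two limits of this heuristic are worth stating. A circumflex over an ARABIC MEEM is not reported, because U+0302 is a generic combining mark. And a Hebrew base letter with an Arabic vowel point is reported even where that combination is intentional, so the result is at most a warning, never a verdict that the sequence is illegal.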
From Unicode 3.2, Chapter 8 [regarding shin and sin dot]: "The two dots are mutually exclusive. The base letter shin can also have dagesh, a vowel, and other diacritics. Use of the two dots with any other base character is an error."
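The rule quoted above can be checked mechanically. The following is a minimal sketch, assuming that a "base" is simply the most recent character that is not a mark (general category M*). The function name `shin_dot_problems` is hypothetical, and the check is a usage lint rather than a test of whether the sequence is legal Unicode.

```python
import unicodedata

SHIN = "\u05E9"      # HEBREW LETTER SHIN
SHIN_DOT = "\u05C1"  # HEBREW POINT SHIN DOT
SIN_DOT = "\u05C2"   # HEBREW POINT SIN DOT

def shin_dot_problems(text):
    """Return (index, message) pairs for uses of the two dots that break the quoted rule."""
    problems = []
    base = None
    dots = set()  # which of the two dots the current base already carries
    for i, ch in enumerate(text):
        # A non-mark character starts a new base and resets the dot bookkeeping.
        if not unicodedata.category(ch).startswith("M"):
            base, dots = ch, set()
            continue
        if ch in (SHIN_DOT, SIN_DOT):
            if base != SHIN:
                problems.append((i, "dot on a base other than shin"))
            elif dots and ch not in dots:
                problems.append((i, "both shin dot and sin dot on one base"))
            dots.add(ch)
    return problems

# Shin with dagesh and shin dot passes; bet with shin dot is reported.
print(shin_dot_problems("\u05E9\u05BC\u05C1"))
print(shin_dot_problems("\u05D1\u05C1"))
```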
Sometimes, doing something that's allowed can still be an error.
We mustn't forget that unusual combinations are sometimes meaningful. For example, there are languages which use Hebrew base characters with Arabic vowel points. We mustn't make these illegal sequences in Unicode without very good reason.
-- Peter Kirk [EMAIL PROTECTED] http://web.onetel.net.uk/~peterkirk/

