From: "Peter R. Mueller-Roemer" <[EMAIL PROTECTED]>
For a fixed length of combining character sequence (base + 3 combining marks is the most I have seen graphically distinguishable) the repertoire is still finite.

I do think that you are underestimating the repertoire. Unicode does NOT define an upper bound on the length of combining sequences, nor on the length of default grapheme clusters (which can be composed of multiple combining sequences, for example in the Hangul or Tibetan scripts). Your estimate also ignores various layouts found in Asian texts, and the particular structures of historic texts, which can stack many "diacritics" on top of the single base letter that starts a combining sequence. The model of some of these scripts (for example Hebrew) implies the juxtaposition of up to 13 or 15 levels of diacritics on the same base letter!
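
To make the point about sequence length concrete, here is a minimal Python sketch. It assumes a simplified definition of a combining sequence (one character followed by any run of characters whose general category is Mn, Mc or Me) and is not the full default grapheme cluster algorithm. The helper name combining_sequences and the sample string are made up for the illustration.

```python
import unicodedata

def combining_sequences(text):
    """Split text into runs of one starting character plus any following marks.

    Simplified assumption: a character continues the current sequence when its
    general category starts with "M" (Mn, Mc, Me). Anything else starts a new one.
    """
    sequences = []
    for ch in text:
        if sequences and unicodedata.category(ch).startswith("M"):
            sequences[-1] += ch
        else:
            sequences.append(ch)
    return sequences

# Hebrew bet + dagesh + qamats + meteg, then Latin "a" carrying 20 acute accents.
sample = "\u05D1\u05BC\u05B8\u05BD" + "a" + "\u0301" * 20
seqs = combining_sequences(sample)
lengths = [len(s) for s in seqs]
print(lengths)
```

Nothing in the character data stops the second sequence from growing: raising 20 to any other count still yields a single sequence, which is why a count based on "base + 3 marks" cannot serve as a bound.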


In practice, it's impossible to enumerate all existing combinations (and to ensure that each will be assigned a unique code within a reasonably limited code space), and that's why Unicode uses a simpler model based on more basic but combinable code points: it frees Unicode from having to encode all of them. (Encoding them all is already a difficult task for the Han script, which could have been encoded with combining sequences if the algorithms needed to create the necessary layout had not required so many complex rules and so many exceptions...)
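
As a small illustration of the combinable model, the following Python sketch uses the standard unicodedata module to show that a precomposed letter is canonically equivalent to a base letter plus combining marks, and that a combination with no precomposed code point is still representable. The helper name decompose and the choice of sample characters are assumptions made for the example.

```python
import unicodedata

def decompose(text):
    """Return the code points of text after canonical decomposition (NFD)."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", text)]

# U+1EBF LATIN SMALL LETTER E WITH CIRCUMFLEX AND ACUTE has its own code point,
# yet it decomposes into a base letter and two combining marks.
parts = decompose("\u1EBF")
print(parts)

# "q" with circumflex and acute has no precomposed code point, so composing
# normalization (NFC) leaves it as three code points.
unencoded = unicodedata.normalize("NFC", "q\u0302\u0301")
print(len(unencoded))
```

The second case is the one that matters here: it is written with existing code points, and no new character had to be assigned for it.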




