Re: script complexity, was Re: OpenType vs TrueType

Philippe Verdy Sun, 05 Dec 2004 08:32:43 -0800

Richard Cook <rscook at socrates dot berkeley dot edu> wrote:

Script complexity is not so easily quantified. Has anyone tried to
sort scripts by complexity? In terms of the present discussion, Han
would be viewed as a simple script, and yet it is "simple" only in
terms of the script model in which ideographs are the smallest unit.
In a stroke-based Han script model, Han is at least as complex as any.

If Han had not been encoded with an ideograph-based model, we might have needed far fewer code points. However, the main immediate problem would have been that the layout of composite radicals and strokes within the ideographic square is very complex, highly contextual, and in fact too variable across dialects and script forms to allow a layout algorithm to be designed and standardized.

One could at least have standardized a Han strokes-to-square layout system, but it would have required a huge dictionary, with many dialect-specific sections to handle the variant forms and placements of the component strokes. In addition, the "square" model is not imperative in Han: there are various writing styles in which the usual square model is much relaxed, or simply not observed in actual documents.

To model such variations in a stroke-based model, it would have been necessary to encode:
- the strokes themselves (all of them, not just the radicals!)
- stroke variants
- descriptive composition pseudo-characters (like the existing IDCs in Unicode)
- dialectal composition rules

and then to create a very complex specification describing each ideograph according to this model, so that a renderer could redraw the ideographs from such composed grapheme clusters.

The second problem is that the GB* and Big5 encodings already existed as widely used standards, whereas there was no concrete and interoperable solution for representing Han characters with such composed sequences.
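To make the IDC point concrete, here is a minimal Python sketch of how the existing Ideographic Description Characters describe an ideograph in prefix notation. The helper names (is_idc, parse_ids) are hypothetical, and the sketch assumes only the original twelve IDCs (U+2FF0..U+2FFB). It also shows that such a sequence is only a description: normalization does not turn it into the encoded ideograph, and nothing here performs layout.

```python
import unicodedata

# Assumption: only the original twelve IDCs, U+2FF0..U+2FFB, are handled.
# U+2FF2 and U+2FF3 take three operands; the others take two.
TERNARY_IDCS = {"\u2FF2", "\u2FF3"}


def is_idc(ch):
    """True if ch is one of the twelve IDCs assumed above."""
    return "\u2FF0" <= ch <= "\u2FFB"


def parse_ids(seq, pos=0):
    """Parse one prefix-notation description starting at pos.

    Returns (tree, next_pos). A tree is either a single character or a
    tuple (idc, operand, operand[, operand]).
    """
    ch = seq[pos]
    pos += 1
    if not is_idc(ch):
        return ch, pos
    operands = []
    for _ in range(3 if ch in TERNARY_IDCS else 2):
        node, pos = parse_ids(seq, pos)
        operands.append(node)
    return (ch, *operands), pos


# U+2FF0 (left to right) + U+5973 + U+5B50 describes the ideograph U+597D.
tree, end = parse_ids("\u2FF0\u5973\u5B50")
print(unicodedata.name(tree[0]), [hex(ord(c)) for c in tree[1:]], end)

# The description is not canonically equivalent to the encoded ideograph:
print(unicodedata.normalize("NFC", "\u2FF0\u5973\u5B50") == "\u597D")
```

The parser only recovers the nesting of components; the relative size and position of each component inside the square is exactly the part that would still need the large, dialect-dependent specification discussed above.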

This modeling was possible for Hangul, but with a simplification: the encoded "jamos" sometimes represent several "strokes" (these are considered letters, since they have a clear phonetic value, but are sometimes grouped within the same "jamo" to simplify the design of the Hangul layout system, notably for the double-consonant "SSANG*" jamos). But a simpler system of jamos was still possible (for example, it would have been easy to model the double-consonant jamos as two successive simpler jamos, and then update the Hangul syllable model accordingly).
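As an illustration of the Hangul model as it was actually encoded, the following Python sketch uses the standard arithmetic for precomposed syllables (base U+AC00, 21 vowels, 28 trailing slots) together with the stdlib unicodedata module. The function name compose_syllable is hypothetical. It shows that the double consonant SSANGKIYEOK is a single leading jamo, and that two successive KIYEOK jamos are not treated as equivalent to it under the current syllable model.

```python
import unicodedata

# Constants of the Hangul syllable arithmetic in the Unicode Standard.
S_BASE = 0xAC00   # first precomposed syllable
V_COUNT = 21      # number of vowel jamos
T_COUNT = 28      # number of trailing consonants, plus one for "none"


def compose_syllable(l_index, v_index, t_index=0):
    """Build a precomposed syllable from jamo indices (hypothetical helper)."""
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


# Leading index 1 is SSANGKIYEOK (U+1101); vowel index 0 is A (U+1161).
syllable = compose_syllable(1, 0)
jamos = unicodedata.normalize("NFD", syllable)
print(hex(ord(syllable)), [unicodedata.name(j) for j in jamos])

# Two plain KIYEOK jamos (U+1100 U+1100) followed by A do not compose to
# the same syllable: only the second KIYEOK combines with the vowel.
doubled = unicodedata.normalize("NFC", "\u1100\u1100\u1161")
print([hex(ord(c)) for c in doubled], doubled == syllable)
```

Under the simpler jamo system suggested above, the second sequence would have been the only spelling of that syllable, and the syllable model would have had to accept two leading consonants in a row.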
