On Dec 24, 2003, at 5:18 PM, Philippe Verdy wrote:


It all depends on how you define characters. Most ideographs are composed,
but Unicode and the CJK unification working groups have so far failed to
give a coherent definition of how these characters actually compose, so we
are still witnessing an ever-growing number of compound ideographs,
created every day by Han users.



Huh? Where on earth are you getting this stuff?


First of all, while people *are* still making up new ideographs, it's not a terribly common thing. The issue we've got to deal with at this point is *not* new ideographs, but old ones which are coming to light as the 2000+ years of written documents using the script are culled.

Secondly, there are excellent models for how to represent ideographs by decomposing them. The IDS model found in Unicode is one of the weaker ones but is fine for describing the overall structure. The CDL model under development is another, rather better one.
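
To make the IDS point concrete, here is a minimal sketch of how an Ideographic Description Sequence describes overall structure. It assumes only the twelve description characters U+2FF0 through U+2FFB, written in prefix order, with U+2FF2 and U+2FF3 taking three operands and the rest taking two. The names `IDC_ARITY` and `parse_ids` are made up for illustration and are not part of any standard API, and the sketch ignores the length limits the standard places on a sequence.

```python
# Illustrative sketch only: IDC_ARITY and parse_ids are hypothetical names.
# Arity of each Ideographic Description Character (U+2FF0..U+2FFB).
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2FF2"] = 3  # left / middle / right
IDC_ARITY["\u2FF3"] = 3  # top / middle / bottom


def parse_ids(text):
    """Parse a prefix-notation IDS into a nested tuple tree."""
    def parse(pos):
        if pos >= len(text):
            raise ValueError("IDS ended before all operands were supplied")
        ch = text[pos]
        pos += 1
        arity = IDC_ARITY.get(ch)
        if arity is None:
            return ch, pos  # an ordinary component, used as a leaf
        parts = []
        for _ in range(arity):
            part, pos = parse(pos)
            parts.append(part)
        return (ch, *parts), pos

    tree, end = parse(0)
    if end != len(text):
        raise ValueError("trailing characters after a complete IDS")
    return tree


# U+2FF0 (left-right) applied to U+5973 and U+5B50 describes U+597D.
print(parse_ids("\u2FF0\u5973\u5B50"))
```

A nested sequence works the same way: U+2FF1 followed by U+2FF0, U+6728, U+76EE and then U+5FC3 parses to a top-bottom node whose upper part is itself a left-right node. The tree records only layout, not how the pieces are actually drawn, which is why IDS is adequate for overall structure and not for much more.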

Finally, there has *never* been a serious effort to encode ideographs by breaking them down into pieces. Even though it's recognized that ideographs are usually formed as compounds in well-defined ways, the results are not thought of by the users of the script as anything but fundamental units. The ideographs are also seen as being made up of a small number of basic stroke types, a fact which is frequently used by font designers, but nobody wants to *encode* them using this system.

========
John H. Jenkins
[EMAIL PROTECTED]
http://homepage.mac.com/jhjenkins/



