On Tue, 26 Mar 2002, Doug Ewell wrote:

> I'm surprised nobody took Dan the Silly Man to task on this one.
>
> English enjoys new words on-the-fly.
>
> What a pity Kanji on-the-fly is a taboo, at least on Unicode ;)
I think these were meant as rhetorical questions, but I'll bite, particularly #3...

> Can you name a character encoding standard, anywhere in the world,
> invented by anybody -- government, industry consortium, private company,
> individual, kwijibo, ANYBODY -- that can do better in this regard than
> Unicode?

Besides the giant 70K+ repertoire, which reduces the likelihood of an unavailable character, there's always the PUA option. Some competing encodings in the Han character area don't even have that (i.e., a "gaiji" area), and instead force one to submit such characters for registration.

> Can you name a font technology that will support the display of these
> "invented-on-the-fly" Kanji?
>
> For that matter, can you invent a Kanji on the fly that cannot be
> represented (perhaps in a rather cumbersome way) with Ideographic
> Description Characters?

Yes, it's possible but uncommon. Unlike some other character description schemes, IDSs can only form characters by composition. For example, there's no way to strip away everything except the right half of U+8BD1 (yi4 'to translate') and then use that right half as a component in describing another character (true as of Unicode 2.1; I haven't checked later versions). Such a component would need to be separately encoded to participate in an IDS. Sometimes such components are not independent characters at all, or they are rare independent characters that have been overlooked for encoding.

In this particular example, when U+776A occurs as a component of a character in unsimplified Chinese, the simplification rules convert it into the component mentioned above in the simplified form (standing alone, U+776A is unchanged in simplified Chinese). Find all the characters containing U+776A as a component and create their simplified forms by applying the rule; that will generate plenty of characters that IDSs can't represent.
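To make the "composition only" point concrete, here is a minimal Python sketch, assuming only the original twelve IDCs U+2FF0..U+2FFB (U+2FF2 and U+2FF3 take three operands, the rest take two). An IDS is a prefix expression whose leaves must each be a single encoded character, so "U+8BD1 minus its left half" cannot be written unless that half has its own code point. The function names (parse_ids, leaves, chars_containing) and the three-entry table are hypothetical illustrations, not a real IDS database.

```python
# Hypothetical helpers; only the IDC code points and their arities are Unicode facts.
IDC_FIRST, IDC_LAST = 0x2FF0, 0x2FFB      # the twelve original IDCs
TERNARY_IDCS = {"\u2FF2", "\u2FF3"}       # left-middle-right, above-middle-below


def arity(ch: str) -> int:
    """Number of operands an IDC takes; 0 means ch is an ordinary component."""
    if ch in TERNARY_IDCS:
        return 3
    if IDC_FIRST <= ord(ch) <= IDC_LAST:
        return 2
    return 0


def parse_ids(seq: str):
    """Parse a prefix-notation IDS into a nested tuple (idc, operand, ...).

    Every leaf must be one encoded code point: there is no operator for
    deleting, rotating, or ligaturing parts of a component.
    """
    pos = 0

    def parse():
        nonlocal pos
        if pos >= len(seq):
            raise ValueError("IDS ends before all operands are supplied")
        ch = seq[pos]
        pos += 1
        n = arity(ch)
        if n == 0:
            return ch  # leaf: a single encoded character
        return (ch,) + tuple(parse() for _ in range(n))

    tree = parse()
    if pos != len(seq):
        raise ValueError("extra characters after a complete IDS")
    return tree


def leaves(tree):
    """Yield every leaf component of a parsed IDS tree."""
    if isinstance(tree, str):
        yield tree
    else:
        for operand in tree[1:]:
            yield from leaves(operand)


def chars_containing(table, component):
    """Return, sorted, the characters in `table` whose IDS uses `component`."""
    return sorted(ch for ch, ids in table.items()
                  if component in leaves(parse_ids(ids)))


# Toy decomposition table (illustrative only, not a real IDS database).
TOY_TABLE = {
    "\u8B6F": "\u2FF0\u8A00\u776A",  # U+8B6F = U+8A00 left of U+776A
    "\u6FA4": "\u2FF0\u6C35\u776A",  # U+6FA4 = U+6C35 left of U+776A
    "\u597D": "\u2FF0\u5973\u5B50",  # U+597D = U+5973 left of U+5B50
}

if __name__ == "__main__":
    print(parse_ids("\u2FF0\u8A00\u776A"))
    print([f"U+{ord(c):04X}" for c in chars_containing(TOY_TABLE, "\u776A")])
    # → ['U+6FA4', 'U+8B6F']
```

Run over a real decomposition table, chars_containing would give the list of traditional characters that the U+776A exercise starts from; their simplified counterparts are exactly the ones such a parser has no leaf for.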
Another case is a character for 'Marxism': it is U+9A6C with the final stroke removed and replaced with U+4E49 (again, this example was checked only against Unicode 2.1).

There is also an almost negligible number of cases such as U+4E52 and U+4E53 (used to write ping1pang1 'ping pong') or U+5187 (used to write Cantonese mou5 'to not have', among other words), which are created by deleting a single stroke from U+5175 and U+6709, respectively. A number of Vietnamese chữ Nôm characters are also created this way. This operates below the level of the components that IDSs work with, so it is really not a flaw of IDSs.

IDSs, unlike some other description schemes, also don't handle rotation. There is an almost negligible number of cases where a character (or a component) is formed by rotating another by 180 degrees, e.g., U+20114, which is U+4E88 rotated 180 degrees. However, this is so rare that a rotation IDC, if one existed, would not be productive.

IDSs also don't handle "ligaturing", e.g., U+21155 (xi3 'double happiness'), which originated as two U+559C side by side. Distinguish this from U+56CD, which has the same meaning as U+21155 but where no ligaturing takes place.

Nor do IDSs handle guwen 'ancient characters', i.e., characters in pre-modern form that have been converted to modern form. For example, U+20A30 is a tortured character that is really the modernized zhuan 'seal' form of U+5973 (nuu3 'woman; female'); IDSs might handle it, but clumsily. Others, such as U+20066, are simply impossible with IDSs. However, characters of this type are unlikely to be created nowadays, except as modernizations of ancient forms.

Despite these counterexamples, IDSs do handle the majority of unencoded Han characters, most of which are of the "left to right" or "above to below" variety with respect to the IDCs used.

Thomas Chan
[EMAIL PROTECTED]

