Andy White wrote:

> And I today see that the precomposed character '0B71 ORIYA LETTER WA'
> has been added to the UCS4.0 charts
> http://www.unicode.org/charts/PDF/U40-0B00.pdf
> This is clearly a composition of ORIYA LETTER O and ORIYA LETTER LETTER
> VA (BA).
People on the list today are playing a little fast and loose with the terminology of "precomposed" and "composition". In the Unicode Standard, a character is not precomposed or composite unless it has a formal decomposition mapping defined in the Unicode Character Database (namely in UnicodeData.txt).

While ORIYA LETTER WA is graphically constructed of the form for the ORIYA LETTER O and the bottom half of PA (not BA), it doesn't fit the pattern one would expect for consonant conjuncts (C+C, not V+C). It also isn't given a formal decomposition in UnicodeData.txt, because even though it is graphically complex, it otherwise fits into the pattern of the regular consonant letters for Indic scripts (as an alternate for VA). Note that the new ORIYA LETTER VA is also graphically complex (a dotted BA), but it is not given a decomposition either.

For that matter, you could look to existing Oriya characters such as U+0B06 ORIYA LETTER AA and claim it is just a graphic combination of U+0B05 ORIYA LETTER A and U+0B3E ORIYA VOWEL SIGN AA. But such decompositions are *also* not used in the standard. So ORIYA LETTER AA is an *atomic* character in Unicode, despite the fact that it is graphically complex (and analyzable into parts).

If anyone wants a pointless exercise in simplification for the benefit of complexity sometime, try working on the Yi syllabary charts (U+A000..U+A48C) and pull these graphically complex forms apart into all of their duplicated constituent parts.

The mere fact that such forms are graphically complex and have identifiable parts is not, however, what establishes their status as atomic versus composite characters in the Unicode Standard.

--Ken
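For anyone who wants to check the decomposition data directly, here is a minimal sketch using Python's standard unicodedata module, which exposes the decomposition mapping field from UnicodeData.txt. The helper name has_decomposition is only illustrative, and the results reflect whatever version of the Unicode Character Database the local Python build ships with. U+0B48 and U+0B5C are included as contrast cases: Oriya characters that do carry a formal decomposition.

```python
import unicodedata


def has_decomposition(ch: str) -> bool:
    """Return True if the character has a decomposition mapping in the UCD.

    Illustrative helper: an empty string from unicodedata.decomposition()
    means UnicodeData.txt defines no decomposition mapping for the character.
    """
    return unicodedata.decomposition(ch) != ""


# Characters discussed above, plus two Oriya characters that do decompose.
samples = [
    "\u0B71",  # ORIYA LETTER WA
    "\u0B35",  # ORIYA LETTER VA
    "\u0B06",  # ORIYA LETTER AA
    "\u0B48",  # ORIYA VOWEL SIGN AI (contrast case)
    "\u0B5C",  # ORIYA LETTER RRA (contrast case)
]

for ch in samples:
    mapping = unicodedata.decomposition(ch) or "(no decomposition mapping)"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {mapping}")
```

A character for which this reports no mapping is atomic in the sense described above, however complex its glyph may look.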

