Антон Тагунов <[EMAIL PROTECTED]> wrote regarding Definition D5:
> Every time I read the following passage in
> http://www.unicode.org/unicode/uni2book/ch03.pdf
> I get confused:
>
> - A single abstract character may correspond to more then one code
> value - ...
> - Multiple code values may be required to represent a single abstract
> character.

I don't see a discrepancy between these two statements, at least not one that the phrase "more than one code value sequence" would clarify.

> For example, a byte is the code unit in SJIS:...
> ideographs require two code values

I do think the text here is unclear about "code values" and "code units." It says they are the same thing, and then uses both terms interchangeably, which is a bit confusing for a standard.

To me, a more useful distinction is the one in Technical Report #17, "Character Encoding Model" <http://www.unicode.org/unicode/reports/tr17/> between "code point" and "code unit." A code point is something like U+0410 for CYRILLIC CAPITAL LETTER A. Code units are the two bytes 0xD0 0x90 required to express that code point in UTF-8, or the single 32-bit word 0x00000410 required to express it in UTF-32.

Incorporating the concepts from UTR #17 into the main text is one place where the "language tightening" project for Unicode 4.0 should really pay off.

-Doug Ewell
Fullerton, California
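P.S. For anyone who wants to see the code point / code unit distinction concretely, here is a rough sketch in Python. It is only an illustration, not anything taken from the standard or UTR #17; the helper name `code_units` is made up for this example, and it covers only the two encoding forms mentioned above.

```python
import struct

def code_units(ch, encoding_form):
    """Return the list of code units for one character (hypothetical helper)."""
    if encoding_form == "utf-8":
        # UTF-8 code units are 8 bits wide, so each byte is one unit.
        return list(ch.encode("utf-8"))
    if encoding_form == "utf-32":
        # UTF-32 code units are 32 bits wide; decode big-endian 4-byte words.
        data = ch.encode("utf-32-be")
        return [unit for (unit,) in struct.iter_unpack(">I", data)]
    raise ValueError("unsupported encoding form: " + encoding_form)

ch = "\u0410"  # CYRILLIC CAPITAL LETTER A

# One code point, regardless of encoding form.
print("code point: U+%04X" % ord(ch))

# Two 8-bit code units in UTF-8, one 32-bit code unit in UTF-32.
print("UTF-8 :", ["0x%02X" % u for u in code_units(ch, "utf-8")])
print("UTF-32:", ["0x%08X" % u for u in code_units(ch, "utf-32")])
```

The code point stays the same in both cases; only the number and width of the code units change with the encoding form.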

