Hello, Doug!

I)

AT> http://www.unicode.org/unicode/uni2book/ch03.pdf
AT>
1.
AT> - A single abstract character may correspond to more than one code
AT>   value -
      for example, U+00C5 ... LATIN CAPITAL LETTER A WITH RING and
      U+212B ...  ANGSTROM SIGN
2.
AT> - Multiple code values may be required to represent a single abstract
AT>   character.

DE> I don't see a discrepancy between these two statements, at least not one
DE> that the phrase "more than one code value sequence" would clarify.

Yes, _this_ is the fragment that looks confusing to me.

2. says that a single abstract character may need more than one
   code value to be encoded.
   Okay, this is about surrogate pairs.

1. speaks about a single abstract character mapping to two
  _scalar values_
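
To illustrate what I mean by "two scalar values", here is a minimal
Python sketch. It is only my own illustration, not something from
ch03.pdf; the variable names are made up, and it assumes that "same
abstract character" can be approximated by canonical equivalence under
NFC normalization.

```python
import unicodedata

a_ring = "\u00C5"      # LATIN CAPITAL LETTER A WITH RING
angstrom = "\u212B"    # ANGSTROM SIGN

# Two different scalar values ...
different_scalars = ord(a_ring) != ord(angstrom)

# ... that are canonically equivalent: NFC maps both of them to U+00C5.
same_after_nfc = (unicodedata.normalize("NFC", a_ring)
                  == unicodedata.normalize("NFC", angstrom))

print(different_scalars, same_after_nfc)
```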

But then it should have said "A single abstract character may
correspond to more than one SEQUENCE of 1 to 2 code values"!!

Imagine an abstract character corresponds to two scalar values
over 0xFFFF. Then it corresponds to two PAIRS OF CODE VALUES, not to
two CODE VALUES
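
A small Python sketch of the point, assuming that "code value" means a
16-bit UTF-16 code unit. The helper name utf16_code_units is
hypothetical, and U+10000 and U+10400 are just arbitrary scalar values
above 0xFFFF picked for illustration; I am not claiming they are the
same abstract character.

```python
def utf16_code_units(scalar_value):
    """Return the list of 16-bit UTF-16 code units for one scalar value."""
    data = chr(scalar_value).encode("utf-16-be")
    # Split the big-endian byte string into 2-byte units.
    return [int.from_bytes(data[i:i + 2], "big")
            for i in range(0, len(data), 2)]

# Scalar values up to 0xFFFF take one code unit each;
# scalar values above 0xFFFF take a surrogate PAIR each.
for sv in (0x00C5, 0x212B, 0x10000, 0x10400):
    units = utf16_code_units(sv)
    shown = " ".join(f"{u:04X}" for u in units)
    print(f"U+{sv:04X} -> {len(units)} code unit(s): {shown}")
```

So two scalar values above 0xFFFF come out as two pairs, that is, four
code values in total.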

Doug?

---

II)

AT>   For example, a byte is the code unit in SJIS:...
AT>   ideographs require two code values

DE> I do think the text here is unclear about "code values" and "code
DE> units."

Doug, I did not mean to go that far :-)

DE> <http://www.unicode.org/unicode/reports/tr17/> between "code point" and
DE> "code unit."

Thanks for the link!

DE>   A code point ... U+0410
DE>   Code units are the two bytes 0xD0 0x90 required to express
DE>   that code point in UTF-8, or the single 32-bit word 0x00000410 required
DE>   to express it in UTF-32.
DE> Incorporating the concepts from UTR #17 into the main text is one place
DE> where the "language tightening" project for Unicode 4.0 should really
DE> pay off.
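
The U+0410 example is easy to check. A minimal Python sketch (my own
illustration; the variable names are made up, and I use the big-endian
codecs only to avoid a byte order mark in the output):

```python
code_point = 0x0410                     # the code point U+0410
ch = chr(code_point)

# UTF-8: the code units are bytes.
utf8_units = list(ch.encode("utf-8"))

# UTF-16: the code units are 16-bit words.
utf16_bytes = ch.encode("utf-16-be")
utf16_units = [int.from_bytes(utf16_bytes[i:i + 2], "big")
               for i in range(0, len(utf16_bytes), 2)]

# UTF-32: the code unit is a single 32-bit word.
utf32_unit = int.from_bytes(ch.encode("utf-32-be"), "big")

print("UTF-8 :", " ".join(f"0x{b:02X}" for b in utf8_units))
print("UTF-16:", " ".join(f"0x{u:04X}" for u in utf16_units))
print("UTF-32:", f"0x{utf32_unit:08X}")
```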

It looks to me like both concepts are already in ch03.pdf:
  A code value is also referred to as a code unit in the information
  industry

  A Unicode scalar value is also referred to as a code position or a
  code point in the information industry

Sure, "language tightening" will be good, but this was not the part
of ch03.pdf that got me confused. I personally am quite content with the
- code value, code unit
- code point, scalar value, code position
definitions :-)


- Anton


