On Sun, Jan 6, 2013 at 12:34 PM, Mark Davis ☕ <[email protected]> wrote:
> [...]
> What you write, and that the UTFs have historical artifacts in their design,
> makes sense to me. (There are many, many discussions of this in the Unicode
> email archives if you have more questions.)

Okay. I am fine with ending this thread. *But ...* I do want to rephrase what
baffled me just now. After sleeping on this, it's clearer what the issue was:
most Unicode discourse talks about code points, with the implication (pretty
much everywhere) that we're encoding *code points* in the encoding forms.
Maybe I've just read this into the discourse, but if Unicode discussions used
the expression "scalar value" more, there would be no potential for such a
misunderstanding.

(1) Any expression containing "surrogate" *should* be relevant only to UTF-16.

(2) The notion of "code point" covers the scalar values *plus* the surrogate
values U+D800..U+DFFF.

(3) The expression "code point" is, for the most part, used in an encoding
form–independent context.

(4) So it's very confusing to ever write a surrogate value (say, D813_hex) in
"U+" notation. Surrogate values are UTF-16-internal code unit values; nobody
should be thinking about them outside of UTF-16.

Now the terminology is a jumble.

Stephan
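(For what it's worth, the distinction shows up concretely in Python, whose
str type happens to allow lone surrogate code points but whose UTF encoders
accept only scalar values. A small sketch, just to illustrate the point
above:)

```python
# U+0041 is a scalar value, so it round-trips through any encoding form.
assert "\u0041".encode("utf-8") == b"A"

# U+D813 is a code point but *not* a scalar value (it is a surrogate),
# so the UTF-8 encoder rejects it outright.
try:
    "\ud813".encode("utf-8")
except UnicodeEncodeError:
    print("lone surrogate: a code point, but not an encodable scalar value")
```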

