On Sun, Jan 6, 2013 at 12:34 PM, Mark Davis ☕ <[email protected]> wrote:
> [...]
> What you write, and that the UTFs have historical artifacts in their design,
> makes sense to me. (There are many, many discussions of this in the Unicode
> email archives if you have more questions.)

Okay. I am fine with ending this thread. *But ...* I do want to rephrase what
baffled me just now. After sleeping on this, it's clearer what the issue was:
most Unicode discourse talks about code points, with the implication (pretty
much everywhere) that we're encoding *code points* in the encoding forms.
Maybe I've just read this into the discourse, but if Unicode discussions used
the expression "scalar value" more, there would be no potential for such a
misunderstanding.

(1) Any expression containing "surrogate" *should* be relevant only to UTF-16.

(2) The notion of "code point" covers the scalar values *plus* the surrogate
values U+D800..U+DFFF.

(3) The expression "code point" is, for the most part, used in an encoding
form–independent context.

(4) So it's very confusing to ever write a surrogate value (say, D813_hex) in
"U+" notation. Surrogate values are UTF-16-internal code unit values; nobody
should be thinking about them outside of UTF-16.

Now the terminology is a jumble.

Stephan
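(For what it's worth, the distinction shows up concretely in Python, whose
str type happens to allow lone surrogate code points but whose UTF encoders
accept only scalar values. A small sketch, just to illustrate the point
above:)

```python
# U+0041 is a scalar value, so it round-trips through any encoding form.
assert "\u0041".encode("utf-8") == b"A"

# U+D813 is a code point but *not* a scalar value (it is a surrogate),
# so the UTF-8 encoder rejects it outright.
try:
    "\ud813".encode("utf-8")
except UnicodeEncodeError:
    print("lone surrogate: a code point, but not an encodable scalar value")
```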

