On 9/17/2013 5:27 PM, Asmus Freytag wrote:
On 9/17/2013 2:55 PM, Stephan Stiller wrote:
[AF:]
It is the wording in your posts that adds to the confusion.
My fundamental point is, has been, and continues to be that whenever people use the more general word "code point" instead of the more appropriate "scalar value", that will "add to the confusion".
Not necessarily.
The understanding of the authorities is that encoding forms deal with code units and not code points; that's what you and Doug write. That follows from /parts/ of the Glossary. But I acknowledged that a couple of emails ago.

Now explain to me, in clear wording:

 * Why are surrogate code points "for use by UTF-16"? That's what the
   Glossary says too (entry "Surrogate Code Point").
 * How exactly are high-surrogate and low-surrogate code points
   "designated for surrogate code units in the UTF-16 character
   encoding form"? It's in TUS (§3.2, C1).
 * How exactly are high-surrogate and low-surrogate code points
   "designated only for that use" (TUS, §3.8, D74), if they aren't
   actually "used" by any encoding form such as UTF-16?
 * What's "use" supposed to mean anyways, if there's no "use" going on?
 * Which parts of (TUS ∪ Glossary) should people omit to get to a
   consistent interpretation of it? Wait, we've answered that. Good,
   now: How are they supposed to come to the right conclusion?

If the Glossary isn't being inconsistent, it's unclear and misleading, in parts. The Glossary or TUS (the book) shouldn't be connecting surrogate code points with surrogate code units, because the latter aren't even computed from the former in any meaningful way. The resulting pair components fit numerically into the "surrogate" range (by historical growth of Unicode, as has been stated), but there's no semantic connection, and there can't even be one: there's nothing in the hole of the scalar value range to connect anything to.
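
To make the arithmetic concrete, here is a minimal Python sketch of how UTF-16 code units are derived, assuming the usual definition of the encoding form (subtract 0x10000, split the remaining 20 bits into two 10-bit halves). The function name utf16_code_units is made up for illustration and is not taken from TUS or the Glossary. The input is always a scalar value; a surrogate code point is never an input, and the function rejects one.

```python
def utf16_code_units(scalar_value: int) -> list:
    """Encode one Unicode scalar value as a list of UTF-16 code units.

    Illustrative sketch only; the name and error message are assumptions.
    """
    # Scalar values are all code points except the surrogate range,
    # which is the "hole" at D800..DFFF.
    in_range = 0 <= scalar_value <= 0x10FFFF
    in_hole = 0xD800 <= scalar_value <= 0xDFFF
    if not in_range or in_hole:
        raise ValueError("not a Unicode scalar value")

    # BMP scalar values map to a single code unit of the same numeric value.
    if scalar_value < 0x10000:
        return [scalar_value]

    # Supplementary scalar values: the pair is computed from the scalar
    # value itself, not from any surrogate code point.
    offset = scalar_value - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return [high, low]


units = utf16_code_units(0x1F600)
print([hex(u) for u in units])  # ['0xd83d', '0xde00']
```

The two resulting code units land numerically in D800..DFFF, but at no step does the computation read or "use" a code point from that range.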

Stephan
