[AF:]
It is the wording in your posts that adds to the confusion.
My fundamental point is, has been, and continues to be that whenever people use the more general term "code point" where the more precise "scalar value" is meant, that in itself "adds to the confusion". If you make the presupposition <http://en.wikipedia.org/wiki/Presupposition> that your sequence of "code points" or "scalar values" contains no surrogate values, then, yes, this will be
[DE:] truly a distinction without a difference
but if you're using these terms without an explicitly stated presupposition, then one will assume that when you say "code point" you do (surprise, surprise) actually mean "code point", which /according to the official definitions/ includes "surrogate code points". I mentioned this a while ago in a question about ICU, and KenW replied that the real world contains bad data. I also think that this
[DE:] it is very unlikely that Twitter and others are storing and interchanging 
loose surrogates
is incorrect. I'm not sure whether the Twitter hack I linked to made use of /loose/ surrogates, but it was based on encoding and storing surrogates.
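
To make concrete what that "bad data" looks like, here's a minimal Python 3 sketch (the error text and the 'surrogatepass' handler are CPython specifics; the definitions are Unicode's). A str position holds a code point, surrogate code points included, which is exactly the gap between "code point" and "scalar value":

    # A lone high surrogate: a perfectly valid *code point* (U+D800),
    # but not a Unicode *scalar value*.
    lone = chr(0xD800)
    print(len(lone))                 # 1 -- stored without complaint

    # The encoding forms are defined over scalar values only, so any
    # attempt to serialize it fails:
    try:
        lone.encode('utf-8')
    except UnicodeEncodeError as err:
        print(err)                   # "... surrogates not allowed"

    # Real-world "bad data" can still be smuggled through on request:
    blob = lone.encode('utf-8', errors='surrogatepass')
    print(blob.hex())                # 'eda080'

That last line is, I'd say, the shape of the problem: surrogate values encoded and stored as if they were scalar values.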


[AF:]
[some paragraphs terminating in:]
Some people writing end user materials may have shown terminological muddle

Sorry to say, but that's apparently how Twitter misconstrued it. The alternative to accepting their interpretation of "code point" (which is rather un-crazy, though your email minimizes how widespread such interpretations, or "mis"construals, are online) is to say that Twitter has been /blatantly/ wrong for a long time in their official attempt to clarify the distinguishing feature of their product, after having had the product out for even longer.

From time to time I encounter products that appear to handle Unicode but whose string handling gets deeply confused once you enter or paste anything beyond the BMP; you can blame this on confusing "code point" with "code unit" instead, but if the first term didn't exist (because it shouldn't), there would be no confusion.
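
A minimal illustration of that confusion, again in Python 3 (the character is arbitrary; any code point beyond the BMP will do):

    s = '\U0001F600'                         # one code point beyond the BMP
    print(len(s))                            # 1 code point
    print(len(s.encode('utf-16-le')) // 2)   # 2 UTF-16 code units
    print(len(s.encode('utf-8')))            # 4 UTF-8 code units (bytes)

A product that reports the length as 2, or lets you split the string between those two halves, is counting code units while talking about code points.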


This qualification
[AF:] by those who have the requisite technical background
of this statement
[AF:] to insinuate that the definitions are widely confused
of course makes it true. As long as "high-surrogate code point" and "low-surrogate code point" aren't officially deprecated, confusion will persist. They should be deprecated, because, /as you say/:
[AF:] Once you add the UTF-prefix, you are, by force, speaking of code units.
So the high-low distinction for "surrogate" code points is misleading, and the "surrogate" attribute shouldn't attach to "code point" at all: as I wrote in a much earlier thread, and as people know, surrogates are UTF-16-specific.
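
The arithmetic makes the point: the high/low split exists solely to map a scalar value onto two 16-bit code units. A sketch (the helper name is mine):

    def utf16_surrogate_pair(scalar):
        # Map a supplementary-plane scalar value (U+10000..U+10FFFF)
        # onto a UTF-16 high/low surrogate pair of 16-bit code units.
        assert 0x10000 <= scalar <= 0x10FFFF
        v = scalar - 0x10000
        high = 0xD800 + (v >> 10)      # "high surrogate": top 10 bits
        low = 0xDC00 + (v & 0x3FF)     # "low surrogate": bottom 10 bits
        return high, low

    print([hex(u) for u in utf16_surrogate_pair(0x1F600)])
    # ['0xd83d', '0xde00'] -- meaningful only as UTF-16 code units

Outside of UTF-16 those two values have no job to do, so labeling them "code points" at all invites exactly the confusion above.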


Stephan
