On 9/16/2013 8:31 AM, Stephan Stiller wrote:
On 9/16/2013 7:48 AM, Stephan Stiller wrote:
or count code points corresponding to code units because, well, you
can match them up
= "or count code points corresponding to UTF-16 code units"; those
happen to be BMP code points.
Twitter has been claiming since /at least/ April 2012 that they're
counting "code points" ("counts the number of codepoints" in their
article). (I know it goes back further, but I'm too lazy to trace
things.) André observed just in October 2012 that they were actually
counting UTF-16 code points (though it is more accurate to call them UTF-16
code units, which all match up with BMP code points, which is what I
think Doug meant, but it's a terminological detail, and this confusion
It is the wording in your posts that adds to the confusion.
There is not, and never has been, such a thing as a UTF-16 "code point".
Once you add the UTF-prefix, you are, by force, speaking of code units.
At best there is the concept of a "code point encoded in UTF-16", but at
that point the result is no longer a fixed width entity, but, in the
general case, a sequence.
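
To make the "sequence" point concrete, here is a minimal Python sketch of how a single code point maps to UTF-16 code units. The function name utf16_code_units is made up for this illustration; the arithmetic is the standard surrogate-pair calculation. A BMP code point yields one code unit, a supplementary code point yields two.

```python
from typing import List


def utf16_code_units(code_point: int) -> List[int]:
    """Return the UTF-16 code unit sequence for one Unicode scalar value.

    Hypothetical helper for illustration only.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        # Surrogate code points are not scalar values, so they have
        # no UTF-16 encoded form of their own.
        raise ValueError("surrogate code points cannot be encoded")
    if code_point <= 0xFFFF:
        # BMP: the code unit has the same numeric value as the code point.
        return [code_point]
    # Supplementary planes: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]
```

For example, utf16_code_units(0x41) gives a one-element list, while utf16_code_units(0x1F600) gives the pair 0xD83D, 0xDE00.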
Some people writing end-user materials may have shown terminological
muddle, but that's no reason to repeat that here in your own statements
or to insinuate that the definitions are widely confused by those who
have the requisite technical background.
A./
actually turns out to be part of the problem). You are relegating
scalar values to lower status (factually wrong; see everywhere in the
glossary). Now what on earth do they mean by "codepoint" [spelled as
such]?
If you really want, you can say that Twitter wasn't confusing code
points [typecast from UTF-16 code units, in my worldview] with scalar
values but instead code points [in the "scalar value" sense] with code
units, but that's terminological sophistry. Under either view they
didn't know what they were doing when handling "code points", however
defined or interpreted.
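
As a rough illustration of why the distinction matters for a length limit, the following Python sketch counts the same string both ways. It is not a description of how Twitter implements anything; the function names are invented for this example. The two counts agree for BMP-only text and diverge as soon as a supplementary character appears.

```python
def count_code_points(text: str) -> int:
    """Number of code points (Python 3 strings are indexed by code point)."""
    return len(text)


def count_utf16_code_units(text: str) -> int:
    """Number of 16-bit code units in the UTF-16 encoded form of the text."""
    # "utf-16-le" emits no byte order mark, so every 2 bytes is one code unit.
    return len(text.encode("utf-16-le")) // 2


bmp_only = "caf\u00e9"           # four BMP characters
with_astral = "a\U0001F600"      # one BMP character plus U+1F600

counts_bmp = (count_code_points(bmp_only), count_utf16_code_units(bmp_only))
counts_astral = (count_code_points(with_astral), count_utf16_code_units(with_astral))
```

Here counts_bmp is (4, 4), whereas counts_astral is (2, 3), because U+1F600 needs a surrogate pair in UTF-16.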
Stephan