On 9/16/2013 8:31 AM, Stephan Stiller wrote:
On 9/16/2013 7:48 AM, Stephan Stiller wrote:
or count code points corresponding to code units because, well, you can match them up
= "or count code points corresponding to UTF-16 code units"; those happen to be BMP code points.

Twitter has been claiming since /at least/ April 2012 that they're counting "code points" ("counts the number of codepoints" in their article). (I know it goes back further, but I'm too lazy to trace things.) André observed only in October 2012 that they were actually counting UTF-16 code points (though it is more accurate to call them UTF-16 code units, which all match up with BMP code points, which is what I think Doug meant, but it's a terminological detail, and this confusion

It is the wording in your posts that adds to the confusion.

There is not, and never has been, such a thing as a UTF-16 "code point". Once you add the UTF prefix, you are, by force, speaking of code units.

At best there is the concept of a "code point encoded in UTF-16", but at that point the result is no longer a fixed-width entity but, in the general case, a sequence.
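To make that concrete, here is a minimal Python sketch. The function names are illustrative only and are not taken from any library or from Twitter's code. A BMP code point encodes as a single UTF-16 code unit with the same numeric value. A supplementary code point encodes as a two-unit surrogate pair, so a count of code units and a count of code points diverge as soon as such characters occur.

```python
from typing import List


def utf16_code_units(cp: int) -> List[int]:
    """Return the UTF-16 code unit sequence encoding one Unicode scalar value."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is not a Unicode scalar value")
    if cp <= 0xFFFF:
        # BMP: a single code unit with the same numeric value as the code point
        return [cp]
    v = cp - 0x10000  # 20-bit offset into the supplementary planes
    # Surrogate pair: the high surrogate carries the top 10 bits, the low one the bottom 10
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]


def count_code_points(s: str) -> int:
    # A Python str is a sequence of code points, so len() counts code points
    return len(s)


def count_utf16_code_units(s: str) -> int:
    # Sum the length of each code point's UTF-16 encoding
    return sum(len(utf16_code_units(ord(ch))) for ch in s)


text = "caf\u00e9 \U0001F600"  # "café", a space, and GRINNING FACE (U+1F600)
print(count_code_points(text))       # 6
print(count_utf16_code_units(text))  # 7
print([f"{u:04X}" for u in utf16_code_units(0x1F600)])  # ['D83D', 'DE00']
```

For well-formed text, `len(s.encode("utf-16-le")) // 2` gives the same code-unit count using only the standard codec.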

Some people writing end user materials may have shown terminological muddle, but that's no reason to repeat that here in your own statements or to insinuate that the definitions are widely confused by those who have the requisite technical background.

A./

actually turns out to be part of the problem). You are relegating scalar values to lower status (factually wrong; see the glossary throughout). Now what on earth do they mean by "codepoint" [spelled as such]?

If you really want, you can say that Twitter wasn't confusing code points [typecast from UTF-16 code units, in my worldview] with scalar values but instead code points [in the "scalar value" sense] with code units, but that's terminological sophistry. Under either view they didn't know what they were doing when handling "code points", however defined or interpreted.

Stephan

