On 9/16/2013 7:48 AM, Stephan Stiller wrote:
or count code points corresponding to code units because, well, you can match them up
= "or count code points corresponding to UTF-16 code units"; those happen to be BMP code points.

Twitter has been claiming since /at least/ April 2012 that they're counting "code points" ("counts the number of codepoints" in their article). (I know it goes back further, but I'm too lazy to trace things.) André observed just in October 2012 that they were actually counting UTF-16 code points. (It is more accurate to call them UTF-16 code units, which all match up with BMP code points; I think that is what Doug meant. It's a terminological detail, but this confusion actually turns out to be part of the problem.) You are relegating scalar values to lower status (factually wrong; see everywhere in the glossary). Now what on earth do they mean by "codepoint" [spelled as such]?

If you really want, you can say that Twitter wasn't confusing code points [typecast from UTF-16 code units, in my worldview] with scalar values but instead code points [in the "scalar value" sense] with code units, but that's terminological sophistry. Under either view they didn't know what they were doing when handling "code points", however defined or interpreted.
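
To make the distinction concrete, here is a minimal Python sketch of the two counts. The function names and the sample string are my own illustration; this is not Twitter's code and says nothing about how their counter is actually implemented. It only shows that the two counts agree for BMP-only text and diverge as soon as a supplementary-plane character appears.

```python
def count_utf16_code_units(text: str) -> int:
    # Encode without a BOM; every UTF-16 code unit occupies exactly 2 bytes.
    return len(text.encode("utf-16-le")) // 2


def count_code_points(text: str) -> int:
    # A Python 3 str is a sequence of code points, so len() counts them.
    # For well-formed text (no lone surrogates) this is also the number
    # of scalar values.
    return len(text)


bmp_only = "abc"            # every character is in the BMP
mixed = "a\U0001F600"       # one BMP character plus U+1F600

print(count_utf16_code_units(bmp_only), count_code_points(bmp_only))
print(count_utf16_code_units(mixed), count_code_points(mixed))
```

For the BMP-only string both counts are 3. For the mixed string the code unit count is 3 (U+1F600 needs a surrogate pair) while the code point count is 2.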

Stephan
