You haven't been following the thread, have you. When you "count code points" you can: either count the original code "points", which is the same as counting scalar values, /because that's what an encoding form encodes/; or count the code points corresponding to code units because, well, you can match them up. The latter interpretation seemed to derive from terminological imprecision at first, but my concern and suspicion turned out to be spot-on about what Twitter did historically.
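The two counting interpretations above can be contrasted concretely. A minimal sketch (not from the thread; the sample string is my own) showing that counting code points/scalar values and counting UTF-16 code units diverge for characters outside the BMP:

```python
# Python strings are sequences of code points (scalar values),
# so len() gives the code-point count directly.
s = "a\U0001F600"  # "a" plus an emoji outside the BMP

code_points = len(s)
# Each UTF-16 code unit is 2 bytes, so divide the encoded length by 2.
utf16_units = len(s.encode("utf-16-le")) // 2

print(code_points)  # 2
print(utf16_units)  # 3 -- the emoji takes a surrogate pair in UTF-16
```

A service counting UTF-16 code units (as Twitter reportedly once did) would charge the emoji as two "characters", while a code-point count charges it as one.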

On 9/16/2013 7:19 AM, Philippe Verdy wrote:
2013/9/16 Stephan Stiller <[email protected]>

> That's exactly what happens when people confuse "code point" with "scalar value" ;-) Hmm, whom might we blame? :-)

Actually you never count scalar values. You are confusing them with code units. Twitter was originally counting UTF-16 code units, but now counts code points.

Scalar values are unrelated; they are properties assigned to code points, so that all code points have a scalar value, but the reverse is true only within the valid range 0 to 0x1FFFFF. Scalar values are only used if you need to perform arithmetic to compute code points from others. This generally does not work well within the UCS, except in a few very small ranges (such as decimal digits). The scalar value is also needed to convert from one standard UTF to another.
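The decimal-digit case mentioned above is the classic example where code-point arithmetic does work, because each decimal-digit block is contiguous. A hedged sketch (my own illustration, not from the message):

```python
# ASCII digits occupy the contiguous range U+0030..U+0039, so the
# numeric value falls out of simple code-point subtraction.
def ascii_digit_value(ch: str) -> int:
    return ord(ch) - ord('0')

# The same trick works inside other contiguous digit blocks, e.g.
# Arabic-Indic digits U+0660..U+0669.
def arabic_indic_digit_value(ch: str) -> int:
    return ord(ch) - ord('\u0660')

print(ascii_digit_value('7'))          # 7
print(arabic_indic_digit_value('\u0665'))  # 5
```

Outside such small contiguous ranges, arithmetic on code points has no meaning, which is the point being made.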
