2013/9/19 Asmus Freytag <[email protected]>
> The legacy difference was the existence of UCS-2 in parallel with UTF-16.

Correct. But UCS-2 is still not extinct, even if it is no longer used for exchanging interoperable plain text.
UCS-2 remains widely used for storing arbitrary data in "strings", without any of the restrictions that must apply to UTF-16. Most UTF-16 libraries are in fact more generic UCS-2 libraries that can process either pure UTF-16 or arbitrary UCS-2. These libraries are still conforming processes if, when given any compliant UTF input data, they always produce compliant UTF output data. Applications may still use some API to determine the compliance of the input data, but they are not required to assert this compliance every time.

And the only place where the "scalar value property" matters is during conversions between standard UTFs. Internally, when handling text or even when enumerating each code point, the scalar value property never matters if another binary value (e.g. a pointer or reference to an object containing the code point properties) can be used instead to facilitate character handling or re-encoding between the various UTFs.

The binary value may also contain additional state variables or flags, such as the scalar value of the previous or next code point in the text stream, an end-of-file indicator, a positional index, or the current state of an output encoder/compressor (e.g. for SCSU). This extra information is just like private fields in an object instance (in OO programming), some dirty flags (for objects that need to be preserved if swapped out), or parity/CRC bits; the scalar value is just a public field, or an exposed computed property...
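
To make this concrete, here is a rough Python sketch of both ideas: a compliance check that an application may call on 16-bit data when it cares, and an enumerator that hands back a small record per code point instead of a bare scalar value. The names `is_well_formed_utf16`, `CodePointCursor` and `enumerate_code_points` are hypothetical, invented for this illustration, and do not come from any real library. The enumerator deliberately accepts arbitrary UCS-2 and simply flags unpaired surrogates rather than rejecting them.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Sequence


def _is_high(u: int) -> bool:
    return 0xD800 <= u <= 0xDBFF


def _is_low(u: int) -> bool:
    return 0xDC00 <= u <= 0xDFFF


def is_well_formed_utf16(units: Sequence[int]) -> bool:
    """Return True if every surrogate code unit is part of a valid pair."""
    i = 0
    while i < len(units):
        u = units[i]
        if _is_high(u):
            # A high surrogate must be followed by a low surrogate.
            if i + 1 >= len(units) or not _is_low(units[i + 1]):
                return False
            i += 2
        elif _is_low(u):
            # A low surrogate with no preceding high surrogate.
            return False
        else:
            i += 1
    return True


@dataclass
class CodePointCursor:
    """Hypothetical per-code-point record; only `value` is the 'public' part."""
    value: int               # code point, or the raw unit for a lone surrogate
    index: int               # position of the first code unit in the input
    is_scalar: bool          # False when `value` is an unpaired surrogate
    previous: Optional[int]  # value of the preceding item, if any


def enumerate_code_points(units: Sequence[int]) -> Iterator[CodePointCursor]:
    """Walk arbitrary 16-bit data, pairing surrogates where possible."""
    i = 0
    prev: Optional[int] = None
    n = len(units)
    while i < n:
        u = units[i]
        if _is_high(u) and i + 1 < n and _is_low(units[i + 1]):
            value = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            width = 2
        else:
            value = u
            width = 1
        yield CodePointCursor(value, i, not (0xD800 <= value <= 0xDFFF), prev)
        prev = value
        i += width
```

A caller that needs strict UTF-16 checks `is_well_formed_utf16` once at the boundary; everything in between can work on the cursor records and read `value` only when it actually converts to another UTF.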

