2013/9/17 Stephan Stiller <[email protected]>

> [AF:] Once you add the UTF-prefix, you are, by force, speaking of code
> units.
>
> So the high-low distinction for "surrogate" code points is misleading, and
> the "surrogate" attribute for "code point" shouldn't be there, because, as
> I've in fact written in a much earlier thread and as people know,
> surrogates are UTF-16-specific.
>
There's really no chance that code points will be deprecated, not even the "surrogates", because they will persist for backward compatibility. (The UCS is already full of "characters" that are present only for compatibility with legacy encodings, even when those encodings were not approved standards/norms but only industry standards.)

My opinion is that these surrogates are already widely perceived in lots of applications as if they were "characters". It's difficult (but not completely impossible) to interoperate with these applications, even if their use does not fully conform to UTF-16. I will describe those applications as using not really UTF-16, but simply a less restricted "UCS-2" encoding.

Unpaired surrogates are a reality today. We cannot convert them compliantly to a standard UTF (but the standard UTFs are not the only encodings). It's still possible to use them in encoded texts, even though those texts are then not strictly interoperable with all standard UTFs. And it is possible precisely because the standard UTFs do not fully use their encoding space. If an application uses these unallocated spaces, it is doing so under a private-use rule, outside of the UCS standards. But let's remember that the UCS standards are definitely NOT mandatory anywhere.

So we'll continue to live with code points that can be surrogates (not interoperable between distinct standard UTFs), or even "non-characters" (which can interoperate, but only within the standard UTFs). They are extensions, and as ambiguous as most C0 and C1 controls.
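To make the "less restricted UCS-2" idea concrete, here is a minimal sketch in Python. The helper names `decode_lax_utf16` and `is_strict_utf8_encodable` are hypothetical, made up for illustration: the decoder combines well-formed surrogate pairs into supplementary code points but passes unpaired surrogates through as if they were characters, where a strict UTF-16 decoder would reject them. It then shows that such text cannot be encoded as well-formed UTF-8, and that Python's `surrogatepass` error handler writes the surrogate's bit pattern anyway, using exactly the part of the encoding space that standard UTF-8 leaves unused.

```python
def decode_lax_utf16(units):
    """Decode 16-bit code units into code points (hypothetical lax decoder).

    Well-formed high/low surrogate pairs become supplementary code points;
    unpaired surrogates are kept as-is instead of raising an error.
    """
    code_points = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            low = units[i + 1]
            code_points.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # BMP character, or an unpaired high/low surrogate passed through
            code_points.append(u)
            i += 1
    return code_points


def is_strict_utf8_encodable(text):
    """Return True if text can be encoded as well-formed UTF-8 (no lone surrogates)."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # "A", a valid pair for U+1F600, an unpaired high surrogate, "B"
    units = [0x0041, 0xD83D, 0xDE00, 0xD800, 0x0042]
    cps = decode_lax_utf16(units)
    print([hex(cp) for cp in cps])  # ['0x41', '0x1f600', '0xd800', '0x42']
    text = "".join(chr(cp) for cp in cps)
    print(is_strict_utf8_encodable(text))  # False
    # "surrogatepass" emits ED A0 80 for U+D800: bytes outside well-formed UTF-8
    print(text.encode("utf-8", "surrogatepass").hex(" "))  # 41 f0 9f 98 80 ed a0 80 42
```

The lax decoder round-trips the application's data, but the resulting byte sequence is only meaningful to software that accepts the same extension, which is the interoperability limit described above.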

