Re: Code point vs. scalar value

Philippe Verdy Tue, 17 Sep 2013 20:24:16 -0700

2013/9/17 Stephan Stiller <[email protected]>

>  [AF:] Once you add the UTF-prefix, you are, by force, speaking of code
> units.
>
> So the high-low distinction for "surrogate" code points is misleading, and
> the "surrogate" attribute for "code point" shouldn't be there, because, as
> I've in fact written in a much earlier thread and as people know,
> surrogates are UTF-16-specific.
>


There's really no chance that code points will be deprecated, and not even
the "surrogates", because they will persist for backward compatibility (the
UCS is already full of "characters" that are present only for compatibility
with legacy encodings, even if those were not approved standards/norms but
only industry standards).

My opinion is that these surrogates are already widely perceived in lots of
applications as if they were "characters". It's difficult (but not
completely impossible) to interoperate with these applications, even if
their use does not fully conform to UTF-16. I would say that those
applications are not really using UTF-16, but simply a less restricted
"UCS-2" encoding.
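To illustrate the difference between the two views, here is a minimal Python
sketch. The helper name utf16_code_units is made up for this example; it
lists the 16-bit code units that a "UCS-2"-style application would treat as
individual "characters".

```python
def utf16_code_units(text):
    """Return the 16-bit code units of text, the way a UCS-2-style
    application sees them (surrogates are passed through as-is)."""
    data = text.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]

s = "a\U0001D11E"  # 'a' followed by U+1D11E MUSICAL SYMBOL G CLEF
units = utf16_code_units(s)

print(len(s))                           # 2 code points
print([f"{u:04X}" for u in units])      # ['0061', 'D834', 'DD1E']
```

The string has two code points but three code units; an application that
counts or indexes by code unit will report the two surrogates as separate
"characters".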

Unpaired surrogates are a reality today, even if we cannot convert them
compliantly to a standard UTF (but standard UTFs are not the only encodings).
It's still possible to use them in encoded texts, though such texts are not
strictly interoperable with all standard UTFs. This is possible exactly
because standard UTFs do not fully use their encoding space. If an
application uses these unallocated spaces, it is doing so under a private
use rule, outside of the UCS standards.
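As a rough illustration of that unused encoding space, the following Python
sketch uses the "surrogatepass" error handler, which is a Python-specific
extension and not part of any standard UTF. The helper names
encodes_strictly and decodes_strictly are invented for this example.

```python
lone = "\ud800"  # an unpaired high surrogate

def encodes_strictly(text, encoding="utf-8"):
    """True if text can be encoded with the strict (standard) codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

def decodes_strictly(data, encoding="utf-8"):
    """True if data is well-formed for the strict (standard) codec."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

# Non-standard: put the surrogate into the 3-byte pattern that
# standard UTF-8 leaves unused.
relaxed = lone.encode("utf-8", "surrogatepass")

print(encodes_strictly(lone))       # False
print(relaxed.hex())                # eda080
print(decodes_strictly(relaxed))    # False
print(relaxed.decode("utf-8", "surrogatepass") == lone)  # True
```

The bytes ED A0 80 fit the UTF-8 bit pattern but are ill-formed in standard
UTF-8, so only a receiver that applies the same private convention can read
them back.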

But let's remember that UCS standards are definitely NOT mandatory
anywhere. So we'll continue to live with code points that can be surrogates
(not interoperable between distinct standard UTFs), or even
"non-characters" (which may interoperate only within standard UTFs). They
are extensions, and as ambiguous as most C0 and C1 controls.
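For reference, the distinction between these kinds of code points can be
sketched in Python as below. The function names classify and round_trips are
hypothetical; the ranges are the usual ones (surrogates U+D800..U+DFFF,
noncharacters U+FDD0..U+FDEF plus the last two code points of each plane).

```python
def classify(cp):
    """Classify a code point as surrogate, noncharacter, or other."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a code point")
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"
    # U+FDD0..U+FDEF, and U+nFFFE / U+nFFFF in every plane
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "noncharacter"
    return "other scalar value"

def round_trips(cp, encoding="utf-8"):
    """True if the code point survives a strict encode/decode cycle."""
    try:
        ch = chr(cp)
        return ch.encode(encoding).decode(encoding) == ch
    except UnicodeError:
        return False

for cp in (0x0041, 0xDC00, 0xFFFE, 0x10FFFF):
    print(f"U+{cp:04X}", classify(cp), round_trips(cp))
```

In this sketch the surrogate fails the strict round trip, while the
noncharacters pass it, which matches the point that noncharacters can
interoperate within standard UTFs and surrogates cannot.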
