2011/8/25 Richard Wordingham <[email protected]>:
> It will only happen when the need becomes obvious, which may be never,
> or may be 30 years hence. It's even conceivable that UTF-16 will
> drop out of use.

"Conceivable", but extremely unlikely: UTF-16 will remain in use in a great many very common cases, even if it can only support a subset of the new encoding.

[begin side note]

This situation is similar to that of the UCS-2 subset and of the ISO 10646 "implementation levels", which have been withdrawn and are no longer meaningful as a condition for conformance. Conforming applications today *must* behave in a way that effectively respects the fact that surrogate pairs cannot be broken or reordered; support for isolated surrogates, or for custom encodings that depend on pairing rules other than the standard one (a high surrogate followed by a low surrogate), is not conforming.

This does not mean that applications have to assign distinctive semantics to surrogates, or have to "support" non-BMP characters by recognizing their distinctive properties. As long as runs of surrogates are handled in such a way that they are never reordered or recombined into arbitrary sequences, these applications satisfy the conformance requirement without having to claim a higher "implementation level".

So a UCS-2-only application can continue to blindly treat surrogates *as if* they were unbreakable strings of symbols with strong LTR directionality and unknown glyphs (or all the same ".notdef" glyph), or *as if* they were unassigned (but valid) code points in the BMP, all with the same default property values, provided that the values of the individual code units are all preserved. Alternatively, a UCS-2 application may still replace all of those surrogate code units with one and the same value associated with a non-ignorable character, such as 0xFFFD or 0x003F; or it may suppress all of them, knowing that this destroys information; or it may throw a fatal exception for all of them. These are some of the worst cases in which UCS-2-only behavior is still conforming.

[end side note]

This does not mean that the existing UTFs will be the favored encodings in the future (we cannot say that even about UTF-8 or UTF-32).
It is simply impossible to predict now which of the three standard UTFs (or their standard byte-order variants) will fall out of use, or whether any of them will: for now there is absolutely no sign that this will ever happen. Instead, we still see a very high (and still accelerating) adoption rate for these UTFs, notably UTF-8.
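
To make the side note above a little more concrete, here is a minimal Python sketch of two of the "UCS-2 only" behaviors it describes: keeping each run of surrogate code units together as one opaque, unreorderable segment, and replacing every surrogate code unit with the same value (0xFFFD here). All function names are hypothetical and chosen only for illustration; this is one possible reading of the behaviors described, not a reference implementation of any conformance clause.

```python
# Sketch only: text is modelled as a list of 16-bit code units (ints).
# All helper names below are made up for this illustration.

REPLACEMENT = 0xFFFD  # the side note also mentions 0x003F as an option


def is_surrogate(unit):
    """True for any surrogate code unit (high or low)."""
    return 0xD800 <= unit <= 0xDFFF


def to_utf16_units(text):
    """Encode a Python str into a list of UTF-16 code units."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def split_units(units):
    """Split code units into segments. Each run of surrogates stays
    together as one opaque segment; every other unit stands alone."""
    segments = []
    i = 0
    while i < len(units):
        j = i + 1
        if is_surrogate(units[i]):
            while j < len(units) and is_surrogate(units[j]):
                j += 1
        segments.append(units[i:j])
        i = j
    return segments


def reverse_preserving_surrogates(units):
    """Reverse the order of segments without breaking or reordering
    the code units inside any run of surrogates."""
    out = []
    for segment in reversed(split_units(units)):
        out.extend(segment)
    return out


def replace_surrogates(units, replacement=REPLACEMENT):
    """Destructive fallback: replace every surrogate code unit with
    one and the same replacement value."""
    return [replacement if is_surrogate(u) else u for u in units]
```

For example, for the string "a", U+1D11E, "b", the code units are 0x0061, 0xD834, 0xDD1E, 0x0062. Reversing with `reverse_preserving_surrogates` moves the pair as a whole and keeps 0xD834 before 0xDD1E, whereas a naive unit-by-unit reversal would produce a low surrogate followed by a high one, which is exactly the kind of reordering the side note rules out.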

