On 4/10/2025 4:58 PM, Mark E. Shoulson via Unicode wrote:
> Characters that look the same and act different really *are* bad news, and that bad news should be considered also. And other objections also make sense. It just feels like there's some reflex opposition to any change.
Yes, there's some reflex opposition to certain changes. Some of that even got codified in stability policies.
There are two main reasons for adopting a stability policy. One is to make sure that existing sequences of code points can't be reanalyzed differently in a later version, so that their interpretation, display, and so on stay constant. The other is to limit change so that version upgrades are more predictable, including by limiting the possibility of new values for some properties.
"Characters that look the same and act differently" break predictability of interpretation. Not because implementations can't handle these code points predictably based on their behavior, but because users can't predictably enter them. For some, editing the surrounding text can change how they are interpreted, which is something nobody can predict when the text is first entered.
And if there is no way to tell whether users entered the code point they intended, it's not possible to make text behave in a way that users can predict. Users can't be sure they entered the correct code point if its behavior can't be observed at the time the text is entered.
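To make the point concrete, here is a minimal sketch in Python. It takes three capital letters that are drawn identically in many fonts and shows that they are distinct code points with distinct lowercase mappings. The choice of letters and the helper name `describe` are mine, purely for illustration; nothing here is specific to any particular proposal.

```python
import unicodedata

# Three capital letters that look the same in many fonts (illustrative choice).
LOOKALIKES = ["\u0041", "\u0391", "\u0410"]


def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point, character name, code point of the lowercase form)."""
    lower = ch.lower()
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), f"U+{ord(lower):04X}")


for ch in LOOKALIKES:
    # The glyphs match on screen; the names and lowercase forms do not.
    print(describe(ch))
```

A user looking at the rendered text has no way to tell which of the three was entered until something like a case conversion or a search exposes the difference.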
Invisible characters are a special case of the foregoing. Unless they visibly affect the text at the point of entry, the user cannot detect or verify them. A "show hidden characters" mode can assist, but isn't always available.
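For illustration, a "show hidden characters" mode could be approximated along the following lines. This is only a sketch under a simplifying assumption: it treats every character with General_Category Cf (format) as hidden. That is not a complete definition of "invisible", since some format characters do have a visible effect and some invisible characters are in other categories. The function name `show_hidden` is hypothetical.

```python
import unicodedata


def show_hidden(text: str) -> str:
    """Replace format characters (General_Category Cf) with a visible tag."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Cf":
            # Assumption: Cf stands in for "invisible" in this sketch.
            out.append(f"<U+{ord(ch):04X}>")
        else:
            out.append(ch)
    return "".join(out)


# A ZERO WIDTH SPACE between "ab" and "c" becomes visible.
print(show_hidden("ab\u200bc"))
```

Without such a mode, the two strings "abc" and "ab", ZERO WIDTH SPACE, "c" are indistinguishable to the person typing them.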
There are some situations where there are overriding reasons for either of these types of characters, but time and again, they cause unanticipated problems. Experience predicts that "I don't see anything wrong with this solution" is almost a guarantee that undiscovered problems exist.
A./