> there is never any excuse for software to create unpaired surrogates, or any other sort of invalid code unit sequences
First off, it depends on when one is encountered. Unpaired surrogates are invalid in UTF-16, but are permitted in a Unicode 16-bit string.

But more fundamentally, there may not be "excuses" for such software, but it happens anyway. Pretending it doesn't makes for unhappy customers. For example, you don't want to throw an exception when one is encountered if that could cause an app to fail.

So the point is to handle the situation as gracefully, consistently, and safely as possible. And "safely" is key: pretending that an unpaired surrogate doesn't exist is logically equivalent to deleting it, and can cause security problems (see UTR #36, Unicode Security Considerations).

Mark

On Mon, Oct 19, 2015 at 10:07 AM, Doug Ewell <[email protected]> wrote:

> This discussion was originally about how to handle unpaired surrogates,
> as if that were a normal use case.
>
> Regardless of what encoding model is used to handle characters under the
> hood, and regardless of how the Delete key should work with actual
> characters or clusters, there is never any excuse for software to create
> unpaired surrogates, or any other sort of invalid code unit sequences.
> That is like having an image editor that deletes every 128th byte from a
> JPEG file, and then worrying about how to display the file.
>
> --
> Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸
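
To make the "graceful, consistent, safe" handling in the reply concrete, here is a minimal sketch in Python. It assumes one possible policy: each unpaired surrogate in a sequence of 16-bit code units is replaced with U+FFFD, so it is neither silently deleted nor turned into an exception. The function name `sanitize_utf16` and the list-of-integers representation are illustrative assumptions, not something taken from the thread.

```python
# Hypothetical helper: the name and the replacement policy are assumptions.
REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def sanitize_utf16(units):
    """Return a copy of `units` (16-bit code units as ints) in which every
    unpaired surrogate is replaced by U+FFFD. Well-formed surrogate pairs
    and all other code units are passed through unchanged."""
    out = []
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: well formed only if a low surrogate follows.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                out.extend((u, units[i + 1]))
                i += 2
                continue
            out.append(REPLACEMENT)
        elif 0xDC00 <= u <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            out.append(REPLACEMENT)
        else:
            out.append(u)
        i += 1
    return out
```

Because every ill-formed code unit is replaced one for one, the output has the same length as the input, and text on either side of the bad unit is never joined together the way it would be if the unit were deleted.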

