Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

Hans Åberg via Unicode Thu, 18 May 2017 01:36:09 -0700

> On 16 May 2017, at 15:21, Richard Wordingham via Unicode 
> <[email protected]> wrote:
> 
> On Tue, 16 May 2017 14:44:44 +0200
> Hans Åberg via Unicode <[email protected]> wrote:
> 
>>> On 15 May 2017, at 12:21, Henri Sivonen via Unicode
>>> <[email protected]> wrote:  
>> ...
>>> I think Unicode should not adopt the proposed change.  
>> 
>> It would be useful, for use with filesystems, to have Unicode
>> codepoint markers that indicate how UTF-8, including non-valid
>> sequences, is translated into UTF-32 in a way that the original octet
>> sequence can be restored.
> 
> Escape sequences for the inappropriate bytes is the natural technique.
> Your problem is smoothly transitioning so that the escape character is
> always escaped when it means itself. Strictly, it can't be done.
> 
> Of course, some sequences of escaped characters should be prohibited.
> Checking could be fiddly.


One could write the offending bytes as \xnn escape codes, using \& as in Haskell to
terminate an escape where needed, and translating '\' into "\\". The result is then a
C-style escaped string, not plain text.
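A minimal sketch of that scheme in Python follows. It assumes Python 3's built-in "surrogateescape" error handler to find the bytes that are not valid UTF-8. The function names escape_bytes and unescape_text are hypothetical. As in Haskell, \x is read with any number of hex digits, so the sketch emits \& only when the next character is itself a hex digit.

```python
HEX = "0123456789abcdefABCDEF"


def escape_bytes(data: bytes) -> str:
    # Turn possibly ill-formed UTF-8 into printable text (hypothetical helper).
    # "surrogateescape" maps each undecodable byte b (0x80-0xFF) to the lone
    # surrogate U+DC00 + b. Python's UTF-8 decoder never yields surrogates
    # otherwise, so each one marks exactly one bad byte.
    text = data.decode("utf-8", errors="surrogateescape")
    out = []
    pending_escape = False  # True if the last thing emitted was a \xNN escape
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            out.append(f"\\x{cp - 0xDC00:02x}")
            pending_escape = True
            continue
        if pending_escape and ch in HEX:
            # Haskell-style empty escape: stops a reader from treating
            # this digit as part of the preceding \x escape.
            out.append("\\&")
        out.append("\\\\" if ch == "\\" else ch)  # '\' becomes "\\"
        pending_escape = False
    return "".join(out)


def unescape_text(s: str) -> bytes:
    # Inverse of escape_bytes: recover the original octet sequence.
    out = bytearray()
    i = 0
    while i < len(s):
        ch = s[i]
        if ch != "\\":
            out += ch.encode("utf-8")  # ordinary character: its UTF-8 bytes
            i += 1
            continue
        nxt = s[i + 1 : i + 2]
        if nxt == "\\":
            out.append(0x5C)  # escaped backslash
            i += 2
        elif nxt == "&":
            i += 2  # empty escape: contributes no bytes
        elif nxt == "x":
            j = i + 2
            while j < len(s) and s[j] in HEX:
                j += 1  # read hex digits greedily, as Haskell does
            value = int(s[i + 2 : j], 16)  # ValueError if there are no digits
            if value > 0xFF:
                raise ValueError(f"escape \\x{s[i + 2 : j]} is not a single byte")
            out.append(value)
            i = j
        else:
            raise ValueError(f"unknown escape at index {i}")
    return bytes(out)


raw = b"caf\xc3\xa9 \xff1\\x \xed\xa0\x80"
print(escape_bytes(raw))
assert unescape_text(escape_bytes(raw)) == raw
```

The round trip bytes -> text -> bytes is exact, but the text form is not canonical: the escapes \xc3\xa9 and the character é both map back to the bytes C3 A9. Those are the kind of escaped sequences that would have to be prohibited to make the mapping one-to-one.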
