> On 16 May 2017, at 15:21, Richard Wordingham via Unicode
> <unicode@unicode.org> wrote:
>
> On Tue, 16 May 2017 14:44:44 +0200
> Hans Åberg via Unicode <unicode@unicode.org> wrote:
>
>>> On 15 May 2017, at 12:21, Henri Sivonen via Unicode
>>> <unicode@unicode.org> wrote:
>> ...
>>> I think Unicode should not adopt the proposed change.
>>
>> It would be useful, for use with filesystems, to have Unicode
>> codepoint markers that indicate how UTF-8, including non-valid
>> sequences, is translated into UTF-32 in a way that the original octet
>> sequence can be restored.
>
> Escape sequences for the inappropriate bytes is the natural technique.
> Your problem is smoothly transitioning so that the escape character is
> always escaped when it means itself. Strictly, it can't be done.
>
> Of course, some sequences of escaped characters should be prohibited.
> Checking could be fiddly.

One could write the offending bytes using \xnn escape codes, each escape sequence terminated by \& as in Haskell, and translate a literal '\' into "\\". The result is then a C-style escaped string, not plain text.
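
To make that concrete, here is a minimal sketch in Python of one way it could work. The function names escape_bytes and unescape_bytes are made up for illustration, and two details are my own assumptions: exactly two hex digits per escape, and Python's "surrogateescape" error handler used only as a convenient way to find the bytes that are not valid UTF-8.

```python
import re

# Assumed escape syntax: backslash, 'x', exactly two hex digits, then the
# Haskell-style terminator \& (two characters: backslash, ampersand).
_HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})\\&")


def escape_bytes(raw: bytes) -> str:
    """Turn an arbitrary octet sequence into an escaped, valid Unicode string."""
    out = []
    # "surrogateescape" maps each undecodable byte 0xNN to the lone
    # surrogate U+DCNN, which valid UTF-8 can never produce.
    for ch in raw.decode("utf-8", errors="surrogateescape"):
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            # An invalid byte: write it as \xNN\&
            out.append("\\x%02X\\&" % (cp - 0xDC00))
        elif ch == "\\":
            # The escape character itself is doubled
            out.append("\\\\")
        else:
            out.append(ch)
    return "".join(out)


def unescape_bytes(text: str) -> bytes:
    """Restore the original octet sequence from an escaped string."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] != "\\":
            out += text[i].encode("utf-8")
            i += 1
        elif text.startswith("\\\\", i):
            out.append(0x5C)  # a literal backslash
            i += 2
        else:
            m = _HEX_ESCAPE.match(text, i)
            if m is None:
                # A prohibited or malformed escape sequence
                raise ValueError("bad escape at index %d" % i)
            out.append(int(m.group(1), 16))
            i = m.end()
    return bytes(out)
```

With this, unescape_bytes(escape_bytes(b)) == b for any byte string b, and the check for prohibited sequences reduces to rejecting any backslash that starts neither "\\" nor a complete \xnn\& escape. With a fixed two-digit form the \& terminator is not strictly needed for parsing; it is kept here only to follow the convention described above.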