> On 11 Sep 2018, at 23:48, Richard Wordingham via Unicode <[email protected]> wrote:
>
> On Tue, 11 Sep 2018 21:10:03 +0200
> Hans Åberg via Unicode <[email protected]> wrote:
>
>> Indeed, before UTF-8, in the 1990s, I recall some Russians using
>> LaTeX files with sections in different Cyrillic and Latin encodings,
>> changing the editor encoding while typing.
>
> Rather like some of the old Unicode list archives, which are just
> concatenations of a month's emails, with all sorts of 8-bit encodings
> and stretches of base64.

It might be useful to represent non-UTF-8 bytes as Unicode code points. One way would be to use an escape code point to signal that the high bit was set, followed by the byte value with its high bit cleared, that is, mapped into the ASCII range. For example, U+0080 could serve as the escape: it is a C1 control code that appears to have no function in use, though I could not verify this.
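
To make the idea concrete, here is a minimal Python sketch of such an escaping scheme. It is only an illustration under assumptions: U+0080 is taken as the escape code point, and the names escape_high_bit, decode_with_escapes and encode_with_escapes are hypothetical, not part of any existing library. Note that text which legitimately contains U+0080 followed by an ASCII character would not round-trip, so a real design would also need to escape the escape itself.

```python
import codecs

ESCAPE = "\u0080"  # assumed escape code point, as suggested above


def _escape_invalid(exc):
    """Error handler: replace each undecodable byte by ESCAPE + (byte & 0x7F)."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]  # the bytes the UTF-8 decoder rejected
    return "".join(ESCAPE + chr(b & 0x7F) for b in bad), exc.end


# Register the handler under a hypothetical name for use with bytes.decode().
codecs.register_error("escape_high_bit", _escape_invalid)


def decode_with_escapes(data: bytes) -> str:
    """Decode as UTF-8, keeping invalid bytes as escaped code point pairs."""
    return data.decode("utf-8", errors="escape_high_bit")


def encode_with_escapes(text: str) -> bytes:
    """Inverse of decode_with_escapes: restore the original bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == ESCAPE and i + 1 < len(text) and ord(text[i + 1]) < 0x80:
            out.append(ord(text[i + 1]) | 0x80)  # set the high bit again
            i += 2
        else:
            out += text[i].encode("utf-8")
            i += 1
    return bytes(out)


# Valid UTF-8 "Å", then a Latin-1 "Å" (0xC5), then a KOI8-R "я" (0xD1).
mixed = "Å".encode("utf-8") + b" \xc5 " + "я".encode("koi8-r")
escaped = decode_with_escapes(mixed)
restored = encode_with_escapes(escaped)
```

With this, the mixed-encoding byte string survives a trip through a Unicode string unchanged, while the valid UTF-8 parts remain readable as ordinary text.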

