Re: Unicode String Models

Hans Åberg via Unicode Tue, 11 Sep 2018 15:16:45 -0700


> On 11 Sep 2018, at 23:48, Richard Wordingham via Unicode 
> <[email protected]> wrote:
> 
> On Tue, 11 Sep 2018 21:10:03 +0200
> Hans Åberg via Unicode <[email protected]> wrote:
> 
>> Indeed, before UTF-8, in the 1990s, I recall some Russians using
>> LaTeX files with sections in different Cyrillic and Latin encodings,
>> changing the editor encoding while typing.
> 
> Rather like some of the old Unicode list archives, which are just
> concatenations of a month's emails, with all sorts of 8-bit encodings
> and stretches of base64.


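It might be useful to represent non-UTF-8 bytes as Unicode code points. One way
might be to use a code point to indicate that the high bit is set, followed by
the byte value with its high bit cleared, that is, mapped into the ASCII range.
Since every byte below 0x80 is valid UTF-8 on its own, an undecodable byte
always has its high bit set, so nothing is lost. For example, U+0080, a C1
control code point, looks like it is not in use, though I could not verify this.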
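A minimal sketch of the scheme in Python, assuming U+0080 as the escape code
point. The handler name "highbit-escape" and the functions decode/encode are
hypothetical names for illustration; the hook they use is the standard
codecs.register_error mechanism.

```python
import codecs

# Assumed escape marker: U+0080, as suggested above.
ESCAPE = "\x80"


def _escape_bytes(exc):
    """Decode error handler: turn each undecodable byte into ESCAPE + (byte & 0x7F)."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # Undecodable bytes are always >= 0x80 (bytes below 0x80 are valid UTF-8
    # by themselves), so clearing the high bit loses no information.
    return "".join(ESCAPE + chr(b & 0x7F) for b in bad), exc.end


# "highbit-escape" is a hypothetical handler name chosen for this sketch.
codecs.register_error("highbit-escape", _escape_bytes)


def decode(data: bytes) -> str:
    """Decode UTF-8, escaping any bytes that are not valid UTF-8."""
    return data.decode("utf-8", errors="highbit-escape")


def encode(text: str) -> bytes:
    """Inverse of decode: restore escaped raw bytes, UTF-8 encode everything else."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == ESCAPE and i + 1 < len(text):
            # Set the high bit again to recover the original byte.
            out.append(ord(text[i + 1]) | 0x80)
            i += 2
        else:
            out += ch.encode("utf-8")
            i += 1
    return bytes(out)
```

This assumes the valid UTF-8 stretches never contain a literal U+0080 (bytes
C2 80), since that would be read back as an escape. Python's own
surrogateescape handler (PEP 383) avoids that collision by mapping such bytes
to lone surrogates U+DC80..U+DCFF instead.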
