> On 11 Sep 2018, at 19:21, Eli Zaretskii <e...@gnu.org> wrote:
>
>> From: Hans Åberg <haber...@telia.com>
>> Date: Tue, 11 Sep 2018 19:13:28 +0200
>> Cc: Henri Sivonen <hsivo...@hsivonen.fi>,
>>  unicode@unicode.org
>>
>>> In Emacs, each raw byte belonging
>>> to a byte sequence which is invalid under UTF-8 is represented as a
>>> special multibyte sequence. IOW, Emacs's internal representation
>>> extends UTF-8 with multibyte sequences it uses to represent raw bytes.
>>> This allows mixing stray bytes and valid text in the same buffer,
>>> without risking lossy conversions (such as those one gets under model
>>> 2 above).
>>
>> Can you give a reference detailing this format?
>
> There's no formal description as English text, if that's what you
> meant. The comments, macros and functions in the files
> src/character.[ch] in the Emacs source tree tell most of that story,
> albeit indirectly, and some additional info can be found in the
> section "Text Representation" of the Emacs Lisp Reference manual.

OK. If one encounters a file with mixed encodings, it is good to be able to view its contents and then convert it, as I see one can do in Emacs.
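
For comparison, the same view-then-convert workflow can be sketched outside Emacs. The Python snippet below is not Emacs's internal format. It uses Python's standard "surrogateescape" error handler (PEP 383), which maps each undecodable byte to a lone surrogate in U+DC80..U+DCFF, so stray bytes and valid UTF-8 text can sit in one string and be written back unchanged. The helper names are made up for illustration, and treating the stray bytes as Latin-1 is only an assumption about the sample data.

```python
def decode_lossless(data: bytes) -> str:
    """Decode UTF-8, keeping each invalid byte as a lone surrogate (PEP 383)."""
    return data.decode("utf-8", errors="surrogateescape")


def encode_lossless(text: str) -> bytes:
    """Inverse of decode_lossless: escaped surrogates become the original bytes."""
    return text.encode("utf-8", errors="surrogateescape")


def raw_byte_positions(text: str) -> list[tuple[int, int]]:
    """Return (string index, original byte value) for every escaped raw byte."""
    return [
        (i, ord(ch) - 0xDC00)
        for i, ch in enumerate(text)
        if 0xDC80 <= ord(ch) <= 0xDCFF
    ]


def reinterpret_raw_bytes(text: str, encoding: str = "latin-1") -> str:
    """Convert escaped raw bytes by decoding each one in a single-byte encoding.

    Assumption: the stray bytes really are in `encoding`; nothing in the
    data itself proves that, so this step needs a human decision.
    """
    return "".join(
        bytes([ord(ch) - 0xDC00]).decode(encoding)
        if 0xDC80 <= ord(ch) <= 0xDCFF
        else ch
        for ch in text
    )


# A "file" with mixed encodings: the same name once in UTF-8, once in Latin-1.
mixed = "Åberg ".encode("utf-8") + "Åberg".encode("latin-1")

text = decode_lossless(mixed)          # viewable: valid text plus one escaped byte
assert encode_lossless(text) == mixed  # saving it back loses nothing

print(raw_byte_positions(text))        # where the stray bytes are
print(reinterpret_raw_bytes(text))     # after converting them as Latin-1
```

The point of the design is the same as in the quoted description: decoding never throws information away, so the decision about what the stray bytes mean can be made later, after looking at the text.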