Re: Can a single text document use multiple character encodings?

Asmus Freytag Wed, 28 Aug 2013 17:26:47 -0700

On 8/28/2013 1:00 PM, Stephan Stiller wrote:

For Web formats (HTML, etc.), the answer is "no".

The obvious follow-up to the list: It'd be interesting to know where
the answer is "yes".


People will occasionally mention ISO/IEC 2022, which can be thought of
as a meta-encoding or encoding template or encoding constructor, but
in the normal case a sensible position is that a document making use
of multiple encodings is no longer plaintext ("single text document"
in this thread's subject line). And – yes – "plaintext" is a fuzzy
notion around the edges, as others have successfully argued in the past.

The original question was about combining UTF-8 and UTF-16 in the same document. As plain text is usually represented as an array of code units, the choice of two different code unit sizes (8-bit and 16-bit) is particularly unlikely to be supported, with the possible exception of some richly structured data formats which (if they existed) might handle such situations. None are generally known that fit the "single text document" qualification.
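To make the code-unit point concrete, here is a small Python sketch (the helper name `code_units` is hypothetical, for illustration only) that encodes the same four characters as UTF-8 and as UTF-16LE and lists the resulting code units. The text becomes ten 8-bit units in one form and five 16-bit units in the other (the character outside the BMP needs a surrogate pair), so a single array of code units cannot sensibly be read both ways.

```python
def code_units(text: str) -> tuple[list[int], list[int]]:
    """Return (UTF-8 code units, UTF-16 code units) for text.

    Hypothetical helper for illustration; uses little-endian UTF-16 without a BOM.
    """
    utf8_units = list(text.encode("utf-8"))  # each element is one 8-bit code unit
    raw16 = text.encode("utf-16-le")
    # Group the UTF-16 byte stream into 16-bit code units.
    utf16_units = [
        int.from_bytes(raw16[i:i + 2], "little") for i in range(0, len(raw16), 2)
    ]
    return utf8_units, utf16_units


sample = "A\u00e9\u20ac\U0001F600"  # A, e-acute, euro sign, an emoji outside the BMP
u8, u16 = code_units(sample)
print(len(u8), [hex(u) for u in u8])
print(len(u16), [hex(u) for u in u16])
```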

ISO 2022 allows switching among character sets in mid-stream, but as far as I remember (I haven't had to think about this since Unicode came around), the code unit is still a byte, except that sometimes pairs of bytes are used. As I remember, ISO 2022 was still far from widely supported in the late '80s, and practically not at all in the fast-growing PC sector.
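The byte-oriented, mode-switching behaviour described above can be observed with Python's built-in `iso2022_jp` codec, which implements ISO-2022-JP, one particular profile of ISO 2022 (chosen here only because it is widely available, not as representative of every ISO 2022 use). Escape sequences switch between ASCII and JIS X 0208, the code unit stays a 7-bit byte, and each JIS X 0208 character takes a pair of bytes. The helper name `show_iso2022_jp` is hypothetical.

```python
def show_iso2022_jp(text: str) -> bytes:
    """Encode text as ISO-2022-JP and print the byte stream (illustrative helper)."""
    encoded = text.encode("iso2022_jp")
    print(encoded)
    return encoded


# "A" is plain ASCII; U+3042 (HIRAGANA LETTER A) requires switching to JIS X 0208.
data = show_iso2022_jp("A\u3042")

# ESC $ B designates JIS X 0208 (two bytes per character);
# ESC ( B switches back to ASCII at the end of the stream.
assert b"\x1b$B" in data and data.endswith(b"\x1b(B")
# Every code unit is still a single byte, and all of them are below 0x80.
assert all(b < 0x80 for b in data)
# Decoding recovers the original text.
assert data.decode("iso2022_jp") == "A\u3042"
```

The state (which set is currently designated) lives in the byte stream itself, so a reader has to scan from the start, or from a known reset point, to interpret any given byte.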

The reason for that, the Unicode advocates think, is that it's just too unwieldy.

As for mixing UTF-8 and UTF-16, the conversion is lossless and so trivial that most people would just convert the data into one or the other of these formats and not bother to have both. So in the unlikely case that a format existed for a "single document" where you could do that, it would seem even less likely that it was used for the example given in the question.
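As a rough illustration of how little work that conversion takes, the following Python sketch (function names hypothetical) converts between UTF-8 and UTF-16LE by decoding to a Python `str` and re-encoding, then checks the round trip. "Lossless" here assumes well-formed input: a UTF-16 stream containing an unpaired surrogate has no valid UTF-8 form, and Python's strict codecs raise an error rather than silently altering it.

```python
def utf8_to_utf16(data: bytes) -> bytes:
    """Convert well-formed UTF-8 bytes to UTF-16LE bytes (no BOM)."""
    return data.decode("utf-8").encode("utf-16-le")


def utf16_to_utf8(data: bytes) -> bytes:
    """Convert well-formed UTF-16LE bytes (no BOM) to UTF-8 bytes."""
    return data.decode("utf-16-le").encode("utf-8")


original = "plain text \u00e9\u20ac\U0001F600".encode("utf-8")
as_utf16 = utf8_to_utf16(original)
back = utf16_to_utf8(as_utf16)
print(back == original)  # prints True: the round trip reproduces the original bytes

# A lone high surrogate (0xD800) is not well-formed UTF-16; strict decoding rejects it.
try:
    utf16_to_utf8(b"\x00\xd8")
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)
```

Little-endian without a BOM is an arbitrary choice for the sketch; the point is only that nothing in the text is lost going either way.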

A./
