On 8/28/2013 1:00 PM, Stephan Stiller wrote:
For Web formats (HTML, etc.), the answer is "no".
The obvious follow-up to the list: It'd be interesting to know where
the answer is "yes".

People will occasionally mention ISO/IEC 2022, which can be thought of
as a meta-encoding or encoding template or encoding constructor, but
in the normal case a sensible position is that a document making use
of multiple encodings is no longer plaintext ("single text document"
in this thread's subject line). And – yes – "plaintext" is a fuzzy
notion around the edges, as others have successfully argued in the past.

The original question was about combining UTF-8 and UTF-16 in the same document. Plain text is usually represented as an array of code units, so mixing two different code unit sizes (8-bit and 16-bit) is particularly unlikely to be supported, with the potential exception of some richly structured data formats which (if they existed) might handle such situations. None that fit the "single text document" qualification are generally known.
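To make the code unit point concrete, here is a minimal Python sketch. The sample string and the helper names (utf8_code_units, utf16_code_units, decodes_as_utf8) are my own, chosen only for illustration. It shows the same text as 8-bit and as 16-bit code units, and then what happens when the two forms are simply concatenated into one byte stream.

```python
# Illustrative sample: A, e-acute, euro sign, musical G clef (outside the BMP).
text = "A\u00e9\u20ac\U0001D11E"


def utf8_code_units(s: str) -> list[int]:
    """Return the 8-bit code units of s in UTF-8."""
    return list(s.encode("utf-8"))


def utf16_code_units(s: str) -> list[int]:
    """Return the 16-bit code units of s in UTF-16 (little-endian, no BOM)."""
    raw = s.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def decodes_as_utf8(data: bytes) -> bool:
    """Report whether data is a well-formed UTF-8 byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The same character, once as UTF-8 and once as UTF-16, glued together.
mixed = "\u00e9".encode("utf-8") + "\u00e9".encode("utf-16-le")

print(len(utf8_code_units(text)), len(utf16_code_units(text)))
print(decodes_as_utf8(mixed))
print(mixed.decode("utf-16-le") == "\u00e9\u00e9")
```

The two code unit arrays have different lengths and different element sizes, and the glued stream is neither valid UTF-8 nor the intended text when read as UTF-16. A reader would need some out-of-band structure to know where one form ends and the other begins, which is exactly what takes it out of plain text.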

ISO 2022 allows switching among character sets in mid-stream, but as far as I remember (I haven't had to think about this since Unicode came around) the code unit is still a byte, except that sometimes pairs of bytes are used. As I remember, ISO 2022 was still far from widely supported in the late 1980s, and practically not at all in the fast-growing PC sector.
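For anyone who wants to see the mid-stream switching, here is a small sketch using ISO-2022-JP, which is one profile built on the ISO 2022 mechanism and not the full standard. It assumes Python's standard iso2022_jp codec; the sample string and the names TO_JIS_X_0208, TO_ASCII and is_seven_bit are mine.

```python
ESC = b"\x1b"
TO_JIS_X_0208 = ESC + b"$B"  # escape sequence: switch to the two-byte JIS X 0208 set
TO_ASCII = ESC + b"(B"       # escape sequence: switch back to ASCII

# Illustrative sample: "A", two kanji (U+65E5, U+672C), then "B".
sample = "A\u65e5\u672cB"
encoded = sample.encode("iso2022_jp")


def is_seven_bit(data: bytes) -> bool:
    """True if every byte in data has its high bit clear."""
    return all(b < 0x80 for b in data)


print(encoded)
print(TO_JIS_X_0208 in encoded, TO_ASCII in encoded)
print(is_seven_bit(encoded))
print(encoded.decode("iso2022_jp") == sample)
```

The stream stays a sequence of bytes throughout. The escape sequences only change how the following bytes are interpreted, and each kanji takes a pair of bytes while the two-byte set is active.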

The reason for that, Unicode advocates would say, is that ISO 2022 is just too unwieldy.

As for mixing UTF-8 and UTF-16, the conversion is lossless and so trivial that most people would just convert the data into one or the other of these formats and not bother to have both. So in the unlikely case that a format existed for a "single document" where you could do that, it would seem even less likely that it would have been used for the example given in the question.
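Just to show how little is involved, here is a sketch of the round trip in Python. The function names are mine, and it assumes well-formed input and little-endian UTF-16 without a BOM.

```python
def utf8_to_utf16(data: bytes) -> bytes:
    """Convert well-formed UTF-8 bytes to UTF-16 (little-endian, no BOM)."""
    return data.decode("utf-8").encode("utf-16-le")


def utf16_to_utf8(data: bytes) -> bytes:
    """Convert well-formed UTF-16 (little-endian, no BOM) bytes to UTF-8."""
    return data.decode("utf-16-le").encode("utf-8")


# Illustrative sample covering one-, two-, three- and four-byte UTF-8 sequences.
original = "A\u00e9\u20ac\U0001D11E".encode("utf-8")
round_trip = utf16_to_utf8(utf8_to_utf16(original))

print(round_trip == original)
```

Both forms encode the same sequence of code points, so nothing is lost in either direction as long as the input is well formed. Ill-formed input (a lone surrogate, a truncated sequence) raises an error with these default settings.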

A./
