On Fri, 30 Aug 2013 22:23:14 +0200 Steffen "Daode" Nurpmeso <[email protected]> wrote:
> Hello character plus experts,
> I'm wondering whether there are any multibyte character sets known
> which use the numerical values of ASCII control characters that
> are vital to Unix/POSIX (plus) as part of multibyte sequences?
> In particular U+000A and U+000D?

Infamously, UTF-16, as implied by Doug's mention of SCSU. If you count fixed-length (>1 byte) character sets as multibyte, you can add UCS-2 and UTF-32.

UTF-16 does have the property that every character occupies a multiple of 2 bytes, so it is well behaved in this respect if one knows to work with aligned pairs of bytes rather than single bytes, and if one knows the endianness. Also, at present, U+0A00 and U+0D00 are unassigned.

Note that the old belief that U+FFFE would not occur externally to an application has been decreed a fallacy, so an apparent U+FEFF or U+FFFE at the start of a file from an external source only indicates the endianness if one knows that the file is encoded in the UTF-16 encoding scheme, as opposed to the UTF-16LE or UTF-16BE encoding scheme.

For UTF-32, reversing the bytes of a C0 control character would yield an invalid byte sequence.

Richard.
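The points above can be checked with a small Python sketch. It is only an illustration, not anything from the thread: the function names are made up for this example, and it assumes Python's built-in "utf-16-be" and "utf-32-be" codecs.

```python
def control_bytes_in_utf16(text, encoding="utf-16-be"):
    """List non-control characters whose UTF-16 bytes contain 0x0A or 0x0D."""
    hits = []
    for ch in text:
        if ch in "\n\r":
            continue  # genuine LF/CR are not the interesting case
        encoded = ch.encode(encoding)
        if 0x0A in encoded or 0x0D in encoded:
            hits.append((ch, encoded))
    return hits


def split_lines_utf16(data, byteorder="big"):
    """Split on U+000A by comparing aligned 2-byte code units only."""
    newline = (0x0A).to_bytes(2, byteorder)
    lines, start = [], 0
    for i in range(0, len(data) - 1, 2):
        if data[i:i + 2] == newline:
            lines.append(data[start:i])
            start = i + 2
    lines.append(data[start:])
    return lines


def swapped_utf32_decodes(ch):
    """Report whether the byte-reversed UTF-32 form of ch is still decodable."""
    swapped = ch.encode("utf-32-be")[::-1]
    try:
        swapped.decode("utf-32-be")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    sample = "\u010aa\n"  # U+010A encodes as bytes 01 0A in UTF-16BE
    raw = sample.encode("utf-16-be")
    print(control_bytes_in_utf16(sample))
    print(raw.split(b"\n"))         # bytewise split cuts U+010A in half
    print(split_lines_utf16(raw))   # aligned split keeps it intact
    print(swapped_utf32_decodes("\n"))
```

A bytewise split on 0x0A breaks the code unit for U+010A apart, whereas the aligned comparison only matches a real U+000A. The last call shows the UTF-32 case: the reversed bytes of U+000A give a value beyond U+10FFFF, which the decoder rejects.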

