About UTS#6: SCSU (A Standard Compression Scheme for Unicode). http://www.unicode.org/reports/tr6/tr6-3.5.html
I know that this is not part of the SCSU standard, but section 10 of the reference, about private extensions of SCSU, seems to overlook some other well-known transfer encoding syntaxes that allow SCSU content to be transported within streams where the use of control bytes (such as the null byte) is restricted. One well-known method is to apply a "COBS" (Consistent Overhead Byte Stuffing) encoding; see the reference and implementation details in http://www.acm.org/sigcomm/sigcomm97/papers/p062.pdf. It is MUCH better than the method proposed in section 10.1, which uses "DLE escaping", and it is generic enough to allow escaping ANY byte value (not only the 0x00 byte):
(1) With the default profile (which only avoids the null byte value), COBS removes every occurrence of the null byte while adding, in the worst case, no more than 1 byte for every 254 source bytes; when null bytes occur at least once every 254 bytes, the whole stream costs only 1 additional byte.
(2) With an extended COBS profile, where N byte values must be avoided in the encoded stream, the worst case adds only 1 byte for every (255-N) source bytes, and again as little as 1 additional byte for the whole stream in favorable cases. This can therefore restrict the output stream to avoid ALL control bytes that are undesirable during transport, notably the null byte and all the C0 control bytes used by SCSU as "tags" (together, bytes 0x00-0x1F except TAB=0x09, LF=0x0A and CR=0x0D), or even all C1 control bytes (0x80-0x9F, notably NEL=0x85).
(3) A COBS profile that avoids all C0 control bytes except CR, LF and TAB (N = 29) would, by the same formula, cost no more than 1 additional byte for every 226 bytes of SCSU-encoded data, i.e. less than +0.5% of the transported data size; also avoiding the 32 C1 control bytes (N = 61) would cost at most 1 byte for every 194 bytes, about +0.52%. Either is still much better than the +100% worst case of the transport syntaxes suggested in 10.1!
(4) COBS can also be used to restrict the output to the 7-bit range, making SCSU plus a COBS transfer encoding syntax with such a profile suitable for email, and still much better than UTF-7 for Asian languages or multilingual documents, which benefit greatly from SCSU compression. A COBS profile can also handle repeated byte values in the SCSU-compressed stream (the case discussed in section 10.2 of UTS#6).
COBS also works much better than other well-known Transfer Encoding Syntaxes such as Base64 or Quoted-Printable, which are often used for email but behave poorly with Asian languages: Base64 always adds at least 33%, and Quoted-Printable turns every byte above 0x7F into three bytes, so these TES can completely cancel the compression gained by SCSU.
Implementing COBS is also very straightforward and costs very little CPU: in the default profile (which avoids null bytes), the encoder needs an internal buffer of at most 254 bytes, which is very reasonable and easy to implement in low-cost hardware too. Because of these properties, there is no need to modify the standard SCSU algorithm: one just applies COBS encoding directly to the output of the SCSU compressor. COBS therefore appears to be a better solution than those suggested in sections 10.1 and 10.2 of TR6.
Since COBS profiles are set up outside SCSU, SCSU implementations do not need such extensions at all. I would suggest that TR6 remove section 10 and instead move it into an annex showing how a transfer encoding syntax can solve the problems it describes: the solutions in sections 10.1 and 10.2 are definitely not the best ones when good compression of Unicode is needed, because their worst cases double the size of the output stream.
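As an illustration of the default profile in item (1), here is a minimal Python sketch of a COBS encoder and decoder, written from the general algorithm in the paper cited above rather than copied from it. The names cobs_encode and cobs_decode are hypothetical, the extended N-value and 7-bit profiles are not implemented, and the SCSU compressor whose output it would wrap is assumed, not shown.

```python
# Default COBS profile (avoids only 0x00), after Cheshire & Baker, SIGCOMM 97.
# Hypothetical sketch: the extended N-value and 7-bit profiles are not implemented.


def cobs_encode(data: bytes) -> bytes:
    """Encode data so that the output never contains the byte 0x00."""
    out = bytearray()
    block = bytearray()       # pending run of non-zero bytes, at most 254 long
    ended_on_full = False     # True right after a 0xFF block was emitted
    for byte in data:
        ended_on_full = False
        if byte == 0:
            # Code n: n-1 data bytes follow, then an implied 0x00.
            out.append(len(block) + 1)
            out += block
            block.clear()
        else:
            block.append(byte)
            if len(block) == 254:
                # Code 0xFF: 254 data bytes follow, with no implied 0x00.
                out.append(0xFF)
                out += block
                block.clear()
                ended_on_full = True
    if not ended_on_full:
        # Final block: the decoder never appends a zero after the last block.
        out.append(len(block) + 1)
        out += block
    return bytes(out)


def cobs_decode(encoded: bytes) -> bytes:
    """Invert cobs_encode; raises ValueError on malformed input."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        code = encoded[i]
        if code == 0:
            raise ValueError("0x00 cannot appear in COBS-encoded data")
        end = i + code        # data bytes are encoded[i+1 : i+code]
        if end > len(encoded):
            raise ValueError("truncated COBS block")
        out += encoded[i + 1:end]
        i = end
        # Codes below 0xFF imply a zero, except after the last block.
        if code < 0xFF and i < len(encoded):
            out.append(0)
    return bytes(out)


if __name__ == "__main__":
    no_nulls = b"A" * 1000                    # no null bytes: COBS worst case
    print(len(cobs_encode(no_nulls)))         # 1004 = 1000 + ceil(1000/254)
    print(len(cobs_encode(bytes(1000))))      # 1001: only 1 extra byte
```

The encoder only has to hold one pending block of up to 254 bytes before it can write that block's code byte, which is where the 254-byte buffer mentioned above comes from.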
Another option would be to add a section 10.3 that references COBS as a better transfer encoding syntax and states that the existing 10.1 and 10.2 solutions are better modeled as simple transfer encoding syntaxes too, entirely outside the scope of SCSU itself. SCSU does not need such extensions in its core, where they would create interoperability problems now that it is a Unicode Technical Standard, to be implemented notably in XML or HTML parsers.
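To show what modeling such escaping as a separate layer could look like, here is a hedged Python sketch of a generic DLE-style byte stuffing applied to an already SCSU-compressed byte string. The escape rule (DLE = 0x10 followed by the original byte XOR 0x40) and the names dle_escape and dle_unescape are hypothetical and may not match the exact rules of section 10.1; the point is only that such a layer needs no change to SCSU, and that its worst case doubles the output size.

```python
# Hypothetical generic DLE-style stuffing, modeled as a transfer encoding
# layer applied after SCSU; not necessarily the exact rules of TR6 section 10.1.
DLE = 0x10   # escape byte (ASCII "Data Link Escape"), chosen for illustration
FLIP = 0x40  # hypothetical mask applied to each escaped byte


def dle_escape(data: bytes) -> bytes:
    """Replace 0x00 and DLE itself by DLE + (byte ^ FLIP); output has no 0x00."""
    out = bytearray()
    for byte in data:
        if byte in (0x00, DLE):
            out += bytes((DLE, byte ^ FLIP))  # 2 output bytes for 1 input byte
        else:
            out.append(byte)
    return bytes(out)


def dle_unescape(data: bytes) -> bytes:
    """Invert dle_escape: the byte after each DLE is XORed back with FLIP."""
    out = bytearray()
    it = iter(data)
    for byte in it:
        if byte == DLE:
            escaped = next(it, None)
            if escaped is None:
                raise ValueError("dangling DLE at end of stream")
            out.append(escaped ^ FLIP)
        else:
            out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    worst = bytes(1000)                # 1000 null bytes: every byte is escaped
    print(len(dle_escape(worst)))      # 2000, i.e. +100%
```

Because the SCSU bytes are only post-processed, this layer could be swapped for COBS (or removed) without touching the SCSU compressor or decompressor.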

