RE: Re: MCW encoding of Hebrew (was RE: Response to Everson Ph and why Jun 7? fervor)
From: Mike Ayers
> Another such code is VISCII for Vietnamese.
To be exact: VISCII does not claim to be ASCII. It claims to be a separate 8-bit encoding that includes the US-ASCII printable character set, but it is not compatible with ASCII, because it replaces some C0 controls with Latin letters, which breaks the ISO 646 conformance model.

So the MCW representation of Hebrew letters, which uses 7-bit codes so that it can pass through systems that can only safely transport or store ASCII, is a charset under the IANA definition: the association of a character repertoire (or Unicode subset), an encoding that assigns a unique numeric code to each character, and a serialization syntax that maps these codes into byte streams (here, simply the identity function).

Whether or not it is registered with IANA as a "charset" usable for interchange (for example in MIME content types) does not change its status: this MCW encoding (like VISCII) is definitely *NOT* ASCII (i.e. ISO 646-US), and it does not comply with the ISO 646 encoding rules, which *require* the invariant subset to be mapped to Basic Latin letters, digits, and punctuation, with no other interpretation!

One proof is the encoding of alef as a left parenthesis: it breaks the pairing of parentheses, prevents the use of parentheses in Hebrew text, and rules out putting negative numbers in parentheses. The encoding also gives wrong results when case mapping is legitimately applied to the text as if it were ASCII, which breaks case-insensitive searches.

Any MCW-encoded text exposed as ASCII will face many interoperability problems *unless* it is correctly tagged with a charset other than ASCII. The fact that the encoding is private should not be an obstacle. For example, MCW-encoded text could be transported with the following MIME content type:

    Content-Type: text/plain; charset=x-MCW

and the following transfer encoding:

    Content-Transfer-Encoding: 7bit

or with other transforms (Base64, Quoted-Printable...) or compression (deflate...).
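To make the two points above concrete (a private charset is just a repertoire, a code assignment, and a byte serialization; and the same bytes misread as ASCII misbehave), here is a minimal Python sketch. The mapping table is a hypothetical four-letter subset, NOT the real MCW table: only "alef is encoded as a left parenthesis" comes from the post, and the tet/tav pair differing only by ASCII case is an assumption chosen to show what ASCII case mapping does. The names `MCW_TO_UNICODE`, `encode_mcw`, `decode_mcw` and `parens_balanced` are illustrative.

```python
# Hypothetical subset of an MCW-like 7-bit transliteration. This table is an
# illustrative assumption, NOT the actual MCW mapping: it only mirrors the
# post's claim that alef uses "(" and assumes two letters differ only by case.
MCW_TO_UNICODE = {
    0x28: "\u05D0",  # "(" -> HEBREW LETTER ALEF (as stated in the post)
    0x42: "\u05D1",  # "B" -> HEBREW LETTER BET (assumed)
    0x54: "\u05D8",  # "T" -> HEBREW LETTER TET (assumed)
    0x74: "\u05EA",  # "t" -> HEBREW LETTER TAV (assumed)
    0x20: " ",       # space maps to itself
}
# Coded character set in the other direction: character -> numeric code.
UNICODE_TO_MCW = {ch: code for code, ch in MCW_TO_UNICODE.items()}


def decode_mcw(data: bytes) -> str:
    """Serialization is the identity: each byte is one numeric code."""
    return "".join(MCW_TO_UNICODE[b] for b in data)


def encode_mcw(text: str) -> bytes:
    """Character -> numeric code -> single byte."""
    return bytes(UNICODE_TO_MCW[ch] for ch in text)


def parens_balanced(s: str) -> bool:
    """A naive ASCII-level check that '(' and ')' are properly paired."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


hebrew = "\u05D0\u05D1 \u05D8 \u05EA"  # alef bet, tet, tav
wire = encode_mcw(hebrew)              # b"(B T t"
as_ascii = wire.decode("ascii")        # valid ASCII, so nothing complains

print(decode_mcw(wire) == hebrew)      # True: the private charset round-trips
print(parens_balanced(as_ascii))       # False: alef looks like an unclosed "("
# ASCII case mapping merges two distinct (assumed) letters:
print(decode_mcw(as_ascii.upper().encode("ascii")) == hebrew)  # False
# A case-insensitive search for tet ("T") also hits tav ("t"):
print(as_ascii.lower().count("t"))     # 2, although the text has one tet
```

The bytes are perfectly valid ASCII, which is exactly the problem: any ASCII-aware tool will process them with ASCII semantics unless the charset label tells it otherwise.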
Email offers more than enough options to transport private encodings safely, without claiming that they are ASCII when they are not.
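As a sketch of such tagging, the following uses the legacy `email.message.Message` class from Python's standard library to label 7-bit private-charset bytes as `text/plain; charset="x-MCW"` and send them either as `7bit` or as `base64`. The payload and the helper name `tag_private_charset` are hypothetical. Deflate is not shown because it is not one of the Content-Transfer-Encoding values defined by RFC 2045 (7bit, 8bit, binary, quoted-printable, base64).

```python
import base64
from email.message import Message

# Hypothetical MCW-encoded payload (7-bit bytes only), for illustration.
mcw_bytes = b"(B T t"


def tag_private_charset(payload: bytes, cte: str = "7bit") -> Message:
    """Wrap private-charset bytes in a MIME part that names the charset."""
    msg = Message()
    msg["MIME-Version"] = "1.0"
    msg["Content-Type"] = 'text/plain; charset="x-MCW"'
    if cte == "7bit":
        # Identity transfer encoding: every byte must be below 0x80.
        # (RFC 2045 7bit data also forbids NUL and bare CR/LF; not checked.)
        if any(b >= 0x80 for b in payload):
            raise ValueError("payload is not 7-bit clean")
        body = payload.decode("ascii")
    elif cte == "base64":
        body = base64.b64encode(payload).decode("ascii")
    else:
        raise ValueError(f"unsupported transfer encoding: {cte}")
    msg["Content-Transfer-Encoding"] = cte
    msg.set_payload(body)
    return msg


print(tag_private_charset(mcw_bytes).as_string())
print(tag_private_charset(mcw_bytes, "base64").as_string())
```

Either way the receiver sees an explicit non-ASCII charset label, so a generic mail agent should not silently apply ASCII semantics to the text; only software that knows `x-MCW` needs to interpret it.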

