RE: Unicode forms for internal storage - BOCU-1 speed

From: Mike Ayers
> The author called it "UTF-9". Therefore we call it the same thing so anyone
> knows what we're talking about. It may not be ideal, but it's intelligible.
> Why should anyone assume that something is an international standard just
> because its name starts with "UTF-"?
You can't assume that everybody knows what is being discussed when they find a reference to a name starting with "UTF-". The first questions that come up will be why Unicode does not document it, and where it can be found.

I don't object to proposals to define new "UTF-*" forms, but these should remain proposals for an otherwise distinctly named encoding form, with a name chosen by the proposal's author outside the "UTF-*" naming space.

Did Jerome Abela or Mark Crispin provide a reference name/symbol for their encoding? They could simply have used their initials to reference it and said, for example in the case of Mark Crispin's encoding form:

  "MC-UTF-9" is a Unicode-conforming encoding form used to represent any valid Unicode string with 9-bit code units. It is proposed as a candidate future encoding form that may be referenced later, if approved by an official Unicode reference document or in an IETF/ISO/IEC 10646 published RFC, by the name "UTF-9". Until then, this encoding form should never be referenced by the informal acronym "UTF-9".

  "MC-UTF-9" then designates only the encoding form specified by Mark Crispin in this document, and this name as well as the term "UTF-9" should not be used for any other proposed 9-bit encoding forms, except if approved by official Unicode or ISO/IEC 10646 publications.

Such wording makes sense and avoids confusion later, notably when several candidate encodings are being studied. It also allows multiple encodings to survive and interoperate.

So let's not approve here the informal, abusive use of non-standardized "UTF-" encoding schemes or forms... Unicode should ask IANA to reject such registrations, which some MIME implementations need, by reserving this prefix for itself (or for the IETF, if it wants to publish RFCs related to ISO standards) for future use.

I have seen several other informal proposals for "UTF-*" forms/schemes. All this is just confusing, and their authors should come up with their own names for reference. What do you think of this idea?

