Folks,
I've been following this thread for a while and it seems that I can make a small
contribution.
Several comments have been made about why we should NOT document this and give it some
kind of official imprimatur. I agree that it will generate more confusion and may be
used in unforeseen ways by unwary people who don't take time to read the documentation.
However: the comments about this encoding being confined to the Evil Doers Who
Practice It are faulty. Here at webMethods we have something like 90 product
"adapters": pieces of software that talk to a specific application. As a result, I am
aware of the vast range of variation in character set and encoding support available
to product designers. One problem that we are approaching is that the changes to UTF-8
(to prohibit non-shortest-form) *are* changes and that the products I work on do not
have the option of rejecting "malformed" data. Adapters must accept the way in which
Oracle or PeopleSoft has implemented its system (for example) and deal with it
correctly, with a minimum loss of data.
By providing a documented, standard way to refer to legacy versions of these products
and their encodings, I can more readily rely on having a well-documented range of
protocols and procedures for converting and validating data exchanged with these
systems. The argument that these products "merely support an older version of the
Unicode standard" is specious, because the older versions merely made the six-byte
form permissible by way of omission (the six-byte form was *never* the preferred
form). The older versions say nothing about mixing the two forms, for example. Whether
we dignify this encoding with a name or not, someone needs to fully document the rules
and provide a stable basis for supporting this usage.
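
To make the "mixing the two forms" point concrete, here is a minimal Python sketch of what a lenient adapter-side decoder might look like. It is an illustration under stated assumptions, not a description of any vendor's implementation: the names decode_lenient and classify are hypothetical, and the approach relies on Python's standard "surrogatepass" error handler rather than on anything defined in the UTR. Note that it accepts the six-byte (surrogate pair) form alongside the four-byte form, but still rejects non-shortest-form sequences and unpaired surrogates.

```python
import re

# Hypothetical helpers for illustration only.
# A six-byte sequence is a high surrogate (ED A0..AF xx) followed by
# a low surrogate (ED B0..BF xx), each encoded as three bytes.
_SIX_BYTE_PAIR = re.compile(rb"\xed[\xa0-\xaf][\x80-\xbf]\xed[\xb0-\xbf][\x80-\xbf]")
# Lead bytes F0..F4 start a four-byte UTF-8 sequence.
_FOUR_BYTE_LEAD = re.compile(rb"[\xf0-\xf4]")


def decode_lenient(data: bytes) -> str:
    """Decode bytes that may use four-byte UTF-8, six-byte pairs, or both."""
    # "surrogatepass" lets the UTF-8 codec accept three-byte encoded surrogates.
    text = data.decode("utf-8", "surrogatepass")
    # Round-tripping through UTF-16 joins each high/low pair into one code
    # point; an unpaired surrogate raises UnicodeDecodeError here.
    return text.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


def classify(data: bytes) -> str:
    """Report which supplementary-character form(s) the input uses."""
    six = bool(_SIX_BYTE_PAIR.search(data))
    four = bool(_FOUR_BYTE_LEAD.search(data))
    if six and four:
        return "mixed"
    if six:
        return "six-byte"
    if four:
        return "four-byte"
    return "bmp-only"


if __name__ == "__main__":
    sample = b"a\xf0\x9f\x98\x80\xed\xa0\xbd\xed\xb8\x80"  # U+1F600 in both forms
    print(classify(sample), len(decode_lenient(sample)))
```

Keeping classification separate from decoding is a deliberate choice in this sketch: an adapter could then log or flag "mixed" input for review while still converting it without data loss.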
For what it's worth, I thank Toby for braving the heat to produce this document. As a
practical matter, I don't support the creation of new CESU-8 systems and will be
grappling for a place on the walls to throw hot oil down on the barbarians who propose
them, but for supporting our existing legacies (which cannot merely be extinguished
"in the next release"), I think the effort is valuable. And the wording of the UTR
seemed restrictive enough to me, at least, to be able to support the UTR (since it
provides me the ammunition to oppose its adoption in practice).
Best Regards,
Addison
Addison P. Phillips
Globalization Architect / Manager, Globalization Engineering
webMethods, Inc. 432 Lakeside Drive, Sunnyvale, CA
+1 408.962.5487 (phone) +1 408.210.3659 (mobile)
-------------------------------------------------
Internationalization is an architecture. It is not a feature.
webMethods--THE Software Integration Company