Folks,

I've been following this thread for a while and it seems that I can make a small
contribution.

Several comments have been made about why we should NOT document this and give it some 
kind of official imprimatur. I agree that it will generate more confusion and may be 
used in unforeseen ways by unwary people who don't take time to read the documentation.

However: the comments about this encoding being confined to the Evil Doers Who
Practice It are faulty. Here at webMethods we have something like 90 product
"adapters": pieces of software that talk to a specific application. As a result, I am 
aware of the vast range of variation in character set and encoding support available 
to product designers. One problem that we are approaching is that the changes to UTF-8 
(to prohibit non-shortest-form) *are* changes and that the products I work on do not 
have the option of rejecting "malformed" data. Adapters must accept the way in which
Oracle or PeopleSoft have implemented their systems (for example) and deal with it
correctly, with minimal loss of data.

By providing a documented, standard way to refer to legacy versions of these products 
and their encodings, I can more readily rely on having a well-documented range of 
protocols and procedures for converting and validating data exchanged with these 
systems. The argument that these products "merely support an older version of the 
Unicode standard" is specious, because the older versions merely made the six-byte
form permissible by way of omission (the six-byte form was *never* the preferred
form). The older versions say nothing about mixing the two forms, for example. Whether 
we dignify this encoding with a name or not, someone needs to fully document the rules 
and provide a stable basis for supporting this usage. 
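
To make the two forms concrete, here is a minimal Python sketch of what an adapter
faces. It is an illustration only, not taken from the UTR or from any product: the
function names encode_six_byte and decode_lenient are hypothetical, and the lenient
decoder relies on Python's "surrogatepass" error handler, which is one possible way to
accept both forms (including a mix of the two in one stream).

```python
def encode_six_byte(text: str) -> bytes:
    """Encode text so that each supplementary character becomes two
    three-byte sequences, one per UTF-16 surrogate (the six-byte form).
    Assumes the input string contains no unpaired surrogates."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            v = cp - 0x10000
            high = 0xD800 | (v >> 10)    # high (lead) surrogate
            low = 0xDC00 | (v & 0x3FF)   # low (trail) surrogate
            for unit in (high, low):
                out += bytes([0xE0 | (unit >> 12),
                              0x80 | ((unit >> 6) & 0x3F),
                              0x80 | (unit & 0x3F)])
        else:
            out += ch.encode("utf-8")
    return bytes(out)


def decode_lenient(data: bytes) -> str:
    """Accept four-byte UTF-8, the six-byte form, or a mix of both.
    Step 1 lets three-byte surrogate sequences through as lone surrogates;
    step 2 round-trips via UTF-16 so that adjacent surrogates are paired.
    An unpaired surrogate still raises UnicodeDecodeError in step 2."""
    with_surrogates = data.decode("utf-8", "surrogatepass")
    utf16 = with_surrogates.encode("utf-16-le", "surrogatepass")
    return utf16.decode("utf-16-le")


sample = "A\U00010000"                 # 'A' followed by U+10000
four_byte = sample.encode("utf-8")     # b'A\xf0\x90\x80\x80'
six_byte = encode_six_byte(sample)     # b'A\xed\xa0\x80\xed\xb0\x80'
mixed = four_byte + six_byte           # both forms in one stream
```

A strict UTF-8 decoder (six_byte.decode("utf-8") in Python) rejects the six-byte
input outright, which is exactly the option an adapter does not have. What the
lenient decoder should do with a mixed stream is the kind of rule that needs to be
written down somewhere; the sketch simply accepts it.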

For what it's worth, I thank Toby for braving the heat to produce this document. As a 
practical matter, I don't support the creation of new CESU-8 systems and will be 
grappling for a place on the walls to throw hot oil down on the barbarians who propose 
them, but for supporting our existing legacies (which cannot merely be extinguished 
"in the next release"), I think the effort is valuable. And the wording of the UTR 
seemed restrictive enough to me, at least, to be able to support the UTR (since it 
provides me the ammunition to oppose its adoption in practice).

Best Regards,

Addison

Addison P. Phillips
Globalization Architect / Manager, Globalization Engineering
webMethods, Inc.  432 Lakeside Drive, Sunnyvale, CA
+1 408.962.5487 (phone)  +1 408.210.3659 (mobile)
-------------------------------------------------
Internationalization is an architecture. It is not a feature. 
webMethods--THE Software Integration Company

