Martin Kochanski wrote:

> >From: Tom Gewecke <[EMAIL PROTECTED]>
[...]
> > I constantly run into browser, mail, and text editing software
> > with encoding menus that list, as two separate items, Unicode
> > and UTF-8, as if Unicode and UTF-16 were identical and as if
> > UTF-8 were not Unicode.
>
> As a poor software maker, I suppose I ought to defend other
> software makers. EVERYONE KNOWS that Unicode and UTF-16 are
> the same thing. It is, unfortunately, irrelevant that in this
> case (as in so many others) "what everyone knows" happens to
> be untrue. We exist to conform to the user's expectations,
> not to educate him; still less to confuse him by replacing a
> nice simple word (Unicode) with indigestible code letters and
> digits (UTF-16BE or whatever).

The terms that actually get into common usage follow unpredictable routes. I guess that if users commonly say "Save the file as Unicode" as opposed to "Save it in UTF-8", the labels on the menu should reflect this language, or the user would not know what to do.

At best, the localization could use a label such as "Unicode (UTF-8)" to reinforce the concept that UTF-8 is Unicode as well. But it could hardly use "Unicode (UTF-16BE)" for the *default* UTF, because the user would ask "Where is *plain* 'Unicode'?"

> That said, has anyone a suggestion for names of available
> output formats (as presented to an end user) that would not
> confuse the user but would satisfy the purist?

Before trying to answer this question, we should perhaps consider whether the user needs all these details, and all of them at the same hierarchical level.

An ideal interface should probably automatically and silently select Unicode (and its default UTF) whenever one or more of the characters in a document are not representable in the local encoding.

Only if the user selects a menu like "Manual encoding settings" should she be presented with a choice like "International (Unicode)", as opposed to "Western (ISO 8859-1)", "Chinese, simplified (GB 2312-80)", and so on. All entries should have a generic descriptive label together with a precise geek-friendly label in parentheses.

When the user selects "International (Unicode)", he should be allowed to enter an "Advanced settings" menu which, for this encoding, allows choosing between "8 bit ASCII-compatible (UTF-8)", "16 bit with surrogates support (UTF-16)", "flat 32 bit (UTF-32)". Selecting "16 bit ... (UTF-16)" shows extra choices like "Big-endian" vs. "Little-endian".

Such an interface would *teach* the user the exact relationship between the various choices. But, of course, this requires time and resources for a complete redesign of the encoding menu.
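
To make the idea a bit more concrete, here is a rough Python sketch of the two pieces described above: the silent fallback to Unicode and the nested menu. It is only an illustration, not a description of any existing product. The names `choose_encoding` and `ENCODING_MENU` are made up for this example, and taking ISO 8859-1 as the local encoding and UTF-8 as the "default UTF" are assumptions.

```python
def choose_encoding(text, local_encoding="iso-8859-1", unicode_default="utf-8"):
    """Pick the local encoding if it can represent every character in
    `text`; otherwise silently fall back to the Unicode default.
    (Hypothetical helper; both defaults are assumptions.)"""
    try:
        text.encode(local_encoding)
    except UnicodeEncodeError:
        return unicode_default
    return local_encoding


# Hypothetical nested menu: a descriptive label first, the precise
# label in parentheses. Leaves are Python codec names; nested dicts
# are sub-menus shown only when the user digs deeper.
ENCODING_MENU = {
    "International (Unicode)": {
        "8 bit ASCII-compatible (UTF-8)": "utf-8",
        "16 bit with surrogates support (UTF-16)": {
            "Big-endian": "utf-16-be",
            "Little-endian": "utf-16-le",
        },
        "flat 32 bit (UTF-32)": "utf-32",
    },
    "Western (ISO 8859-1)": "iso-8859-1",
    "Chinese, simplified (GB 2312-80)": "gb2312",
}
```

With these assumptions, `choose_encoding("café")` stays with the local encoding, while a document containing, say, a euro sign or Chinese text is saved as UTF-8 without the user ever seeing the menu.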
If the developers are just given the time to throw in a flat list with all the options, we can't blame them for the result...

_ Marco

