RE: UTF-16 is not Unicode

Marco Cimarosti Tue, 12 Feb 2002 03:00:45 -0800

Martin Kochanski wrote:
> >From: Tom Gewecke <[EMAIL PROTECTED]>
[...]
> > I constantly run into browser, mail, and text editing software
> > with encoding menus that list, as two separate items, Unicode
> > and UTF-8, as if Unicode and UTF-16 were identical and as if
> >  UTF-8 were not Unicode.
>
> As a poor software maker, I suppose I ought to defend other 
> software makers. EVERYONE KNOWS that Unicode and UTF-16 are 
> the same thing. It is, unfortunately, irrelevant that in this 
> case (as in so many others) "what everyone knows" happens to 
> be untrue. We exist to conform to the user's expectations, 
> not to educate him; still less to confuse him by replacing a 
> nice simple word (Unicode) with indigestible code letters and 
> digits (UTF-16BE or whatever).


The terms that actually get into common usage follow unpredictable routes.
I guess that if users commonly say "Save the file as Unicode" as opposed to
"Save it in UTF-8", the labels on the menu should reflect this language, or
the user would not know what to do.

At best, the localization could use a label such as "Unicode (UTF-8)" to
reinforce the concept that UTF-8 is Unicode as well. But it could hardly use
"Unicode (UTF-16BE)" for the *default* UTF, because the user would ask
"Where is *plain* 'Unicode'?"

> That said, has anyone a suggestion for names of available 
> output formats (as presented to an end user) that would not 
> confuse the user but would satisfy the purist?

Before trying to answer this question, we should perhaps consider whether
the user needs all these details, and all of them at the same hierarchical
level.

An ideal interface should probably automatically and silently select Unicode
(and its default UTF) whenever one or more of the characters in a document
are not representable in the local encoding.
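
As a rough illustration of that automatic selection, here is a minimal
Python sketch. The function name, the ISO 8859-1 local encoding and the
choice of UTF-8 as "default UTF" are all assumptions made for the example,
not something any particular application does:

```python
def choose_encoding(text, local_encoding="iso-8859-1", default_utf="utf-8"):
    """Pick the encoding to save `text` in (hypothetical helper).

    Keep the local encoding when every character fits in it; otherwise
    fall back silently to the default Unicode encoding form.
    """
    try:
        text.encode(local_encoding)
    except UnicodeEncodeError:
        # At least one character is not representable locally.
        return default_utf
    return local_encoding


if __name__ == "__main__":
    for sample in ("caffè", "caffè 中文"):
        print(repr(sample), "->", choose_encoding(sample))
```

The user never sees a menu in this case: a document that fits the local
encoding is saved as before, and only one that does not is switched to
Unicode.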

Only if the user selects a menu like "Manual encoding settings" should she
be presented with a choice like "International (Unicode)", as opposed to
"Western (ISO 8859-1)", "Chinese, simplified (GB 2312-80)", and so on. All
entries should have a generic descriptive label together with a precise
geek-friendly label in parentheses.

When the user selects "International (Unicode)", she should be allowed to
enter an "Advanced settings" menu which, for this encoding, allows choosing
between "8 bit ASCII-compatible (UTF-8)", "16 bit with surrogates support
(UTF-16)", "flat 32 bit (UTF-32)". Selecting "16 bit ... (UTF-16)" shows
extra choices like "Big-endian" vs. "Little-endian".
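
The hierarchy described above could be modelled as a nested table, sketched
here in Python. The structure and the names ENCODING_MENU and flatten are
hypothetical; the leaf values are Python codec names that I am assuming
correspond to each label (Python's "utf-32" codec, for instance, writes a
byte order mark, which the menu text does not specify):

```python
# Hypothetical menu tree: descriptive label first, precise label in
# parentheses. A dict is a submenu; a string is the codec to use.
ENCODING_MENU = {
    "International (Unicode)": {
        "8 bit ASCII-compatible (UTF-8)": "utf-8",
        "16 bit with surrogates support (UTF-16)": {
            "Big-endian": "utf-16-be",
            "Little-endian": "utf-16-le",
        },
        "flat 32 bit (UTF-32)": "utf-32",
    },
    "Western (ISO 8859-1)": "iso-8859-1",
    "Chinese, simplified (GB 2312-80)": "gb2312",
}


def flatten(menu, path=()):
    """Yield (path, codec) for every leaf of the menu tree."""
    for label, entry in menu.items():
        if isinstance(entry, dict):
            # Submenu: descend, remembering how we got here.
            yield from flatten(entry, path + (label,))
        else:
            yield path + (label,), entry


if __name__ == "__main__":
    for path, codec in flatten(ENCODING_MENU):
        print(" > ".join(path), "->", codec)
```

Each level of nesting is one more step the user has to take on purpose, so
the byte-order choice is only reachable by someone who already picked
Unicode and then UTF-16.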

Such an interface would *teach* the user the exact relationship between the
various choices. But, of course, this requires time and resources for a
complete redesign of the encoding menu. If the developers are just given the
time to throw in a flat list with all the options, we can't blame them for
the result...

_ Marco
