Re: Unicode encoding forms in web development

Markus Scherer Tue, 20 Mar 2001 09:13:00 -0800
For HTML, use UTF-8. For XML, use UTF-8 or UTF-16.
US-ASCII and ISO 8859-1 are also acceptable, either if your actual character needs fit 
within their repertoires or if you write everything else as numeric character references.
If you know both the sender and the receiver and you have a low-bandwidth application, 
consider SCSU (the Standard Compression Scheme for Unicode).
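To illustrate the numeric-character-reference option: a minimal Python sketch (assuming Python 3.9+; the helper name to_ncr is hypothetical) that keeps the bytes in US-ASCII or ISO 8859-1 and escapes every character outside that repertoire. It relies only on the standard "xmlcharrefreplace" error handler and html.unescape.

```python
import html

def to_ncr(text: str, charset: str = "us-ascii") -> bytes:
    """Encode text in a limited charset; characters outside its
    repertoire become decimal numeric character references."""
    return text.encode(charset, errors="xmlcharrefreplace")

sample = "Grüße, 10 €"
ascii_bytes = to_ncr(sample)                  # ü, ß and € become references
latin1_bytes = to_ncr(sample, "iso-8859-1")   # only € becomes a reference
print(ascii_bytes)
print(latin1_bytes)

# Nothing is lost: unescaping the references restores the original text.
assert html.unescape(ascii_bytes.decode("ascii")) == sample
```

The price is size: each escaped character costs several bytes, which is why this only pays off when such characters are rare.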

See also the recommendations in the Unicode FAQ.

See more specific comments below.

markus

Michel Paul wrote:
> 1- W3C recognized the benefits of the Unicode character
> set by enforcing it in HTML and XML. BUT they did
> not enforce the Unicode encoding forms. Any character
> encoding form can be used.

Right, but:
For anything other than Unicode, US-ASCII, or ISO 8859-1, you will need character 
conversion tables. Such tables are poorly standardized, are a maintenance nightmare, and 
use a lot of space. There is always a risk of losing text, either because the table is 
not exactly the same in the sender and receiver processes, or because the encoding model 
differs and the conversion does not translate between the two models.
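The table-mismatch problem can be reproduced with Python's standard codecs, which ship two different Shift-JIS tables: "shift_jis" (following the JIS X 0208 mapping) and "cp932" (Microsoft's variant). A minimal sketch, assuming Python 3.9+, decoding the same two bytes with each:

```python
# The same Shift-JIS byte pair decoded with two real Python codecs.
# 'shift_jis' follows the JIS X 0208 mapping; 'cp932' follows
# Microsoft's table. They disagree for some byte pairs.
RAW = b"\x81\x60"  # the JIS X 0208 "wave dash" position

as_jis = RAW.decode("shift_jis")
as_ms = RAW.decode("cp932")

for name, ch in (("shift_jis", as_jis), ("cp932", as_ms)):
    print(f"{name:9} -> U+{ord(ch):04X}")
# shift_jis -> U+301C   (WAVE DASH)
# cp932     -> U+FF5E   (FULLWIDTH TILDE)
```

A sender and receiver that each say "Shift-JIS" but use different tables end up with different characters, with no error reported.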

> 2- Since there is more than one Unicode encoding form,
> its declaration/identification (charset, BOM, ...) is
> still compulsory. Then why not use any other
> character encoding form?

See above. Conversion among the Unicode encoding forms, and between them and 
US-ASCII/ISO 8859-1, is simple, fast, and algorithmic (no tables needed).
Which of the dozen or so Shift-JIS tables in the industry are you using?
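To show what "algorithmic" means here: a minimal Python sketch (assuming Python 3.9+ and well-formed input; all function names are hypothetical) that converts UTF-16 code units and ISO 8859-1 bytes to UTF-8 using only bit arithmetic. The ISO 8859-1 case works because each byte value equals its Unicode code point.

```python
def utf16_to_code_points(units: list[int]) -> list[int]:
    """Decode 16-bit code units (assumed well-formed) to code points."""
    out, i = [], 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:  # high surrogate: combine with the low one
            low = units[i + 1]
            out.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            out.append(u)
            i += 1
    return out

def code_point_to_utf8(cp: int) -> bytes:
    """Encode one scalar value as 1 to 4 UTF-8 bytes."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])

def utf16_to_utf8(units: list[int]) -> bytes:
    return b"".join(code_point_to_utf8(cp) for cp in utf16_to_code_points(units))

def latin1_to_utf8(data: bytes) -> bytes:
    # ISO 8859-1 byte values are identical to code points U+0000..U+00FF.
    return b"".join(code_point_to_utf8(b) for b in data)

units = [0x00E9, 0x20AC, 0xD834, 0xDD1E]  # "é€𝄞" as UTF-16 code units
assert utf16_to_utf8(units) == "é€𝄞".encode("utf-8")
assert latin1_to_utf8(b"Gr\xfc\xdfe") == "Grüße".encode("utf-8")
```

Compare that with Shift-JIS, where every converter has to carry (and agree on) a table of several thousand entries.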

> 3- Authoring and development tools have better
> support for "local" character encoding forms (non-
> Unicode ones). That is why the vast majority of web
> pages do not use UTF-8, -16, or -32.

_Useful_ authoring tools will at least support UTF-8 and UTF-16.