Hello Stuart Somer, you wrote:
> I find many recommendations not to use unicode characters for entities
> like em dashes trademark symbols because there is poor browser support.

According to HTML 4, <http://www.w3.org/TR/html401/charset.html#h-5.3>,
you may use any NCR (numeric character reference), or any entity,
regardless of the encoding
<http://www.w3.org/TR/html401/charset.html#h-5.2.2>.

In theory, the Document Character Set is always the Universal Character
Set (UCS, aka Unicode), <http://www.w3.org/TR/html401/charset.html#h-5.1>;
the encoding chosen is just the vehicle to transfer the characters from
the server to the client. Characters contained in the chosen encoding may
be given in their respective binary representation, while any character
may be given as an NCR. A browser should be capable of displaying all
Unicode characters, provided suitable fonts are locally available.

In contrast to this theory, Netscape 4.7 displays only characters that
are in the encoding chosen -- with a notable exception: if the encoding
is ISO-8859-1, all CP 1252 characters can be displayed (at least on
Windows systems; I have not extensively tested Netscape on other OSes).
Cf. <http://czyborra.com/charsets/iso8859.html#ISO-8859-1> and
<http://czyborra.com/charsets/codepages.html#CP1252> for these character
sets.

Netscape 6.2, Internet Explorer 6.0, and Opera 6.0 comply with the HTML 4
character model, as outlined above.

Hence my recommendation:

- When your user community has Netscape 6.2, Internet Explorer 6.0, or
  Opera 6.0, use any convenient encoding, and insert characters beyond
  the chosen encoding as either NCRs or entities.

- When a notable fraction of your user community uses older browsers,
  particularly Netscape 4.7:

  - For characters contained in CP 1252, such as em-dash, trademark
    symbol, and smart quotes, choose ISO-8859-1 encoding, and use NCRs
    for the characters not in ISO-8859-1 (but in CP 1252).

  - If you need characters beyond CP 1252, choose UTF-8 encoding;
    depending on your editor (and other authoring tools), you may prefer
    to enter all characters directly, or to enter the characters beyond
    ASCII as entities or NCRs.

In any case, it would be wise to

- stay within the WGL4.0 Character Set, cf.
  <http://www.microsoft.com/typography/otspec/WGL4.htm>, as there are
  suitable fonts freely available,

- test your WWW pages with all browsers popular in your user community.

> Do you know of a chart for browser support of
> unicode by browser version.

The most comprehensive discussion I've seen is
<http://www.hclrss.demon.co.uk/unicode/browsers.html>.

Best wishes,
  Otto Stolz
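
P.S. As an illustration only: the NCR part of the recommendation above can be sketched in Python, assuming your authoring tools let you post-process the page text. The function names to_ncr and beyond_cp1252 are hypothetical helpers, not part of any browser or HTML tool; the only library feature relied on is the standard "xmlcharrefreplace" error handler, which writes every character the target encoding cannot hold as a decimal NCR.

```python
def to_ncr(text, encoding="iso-8859-1"):
    """Encode text for the chosen encoding.

    Characters the encoding cannot represent are written as decimal
    NCRs (&#NNNN;) using their Unicode code points.
    """
    return text.encode(encoding, errors="xmlcharrefreplace")


def beyond_cp1252(text):
    """Return, in order of appearance, the characters CP 1252 lacks.

    A non-empty result suggests the page is a candidate for UTF-8
    rather than ISO-8859-1 when older browsers must be supported.
    """
    missing = []
    for ch in text:
        try:
            ch.encode("cp1252")
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


if __name__ == "__main__":
    sample = "Acme\u2122 \u2014 caf\u00e9"   # trademark sign, em dash, e-acute
    print(to_ncr(sample))                    # e-acute stays a single byte
    print(beyond_cp1252("\u2014\u0416"))     # em dash is in CP 1252, Zhe is not
```

With ISO-8859-1 as the target, the trademark sign and the em dash come out as &#8482; and &#8212;, while the e-acute is transferred as an ordinary byte. Whether a given browser then displays those references is exactly what still needs testing.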

