Michael Everson recently pointed out that the Unicode home page seems to
begin with the character U+FEFF (ZWNBSP/BOM), encoded as UTF-8.  Presumably
this is an artifact of the program used to make the page, although I
haven't noticed it on any other pages on the site.

I had a look at the BOM FAQ and wonder whether any list members could
confirm my understanding of the proper use of a BOM at the start of web pages:

--The only case where a BOM should be used is when the byte order is not
specified by the encoding/charset listed in the HTML, i.e. plain UTF-16 or
UTF-32.  For all others, including the BE and LE varieties of those two, it
should not be used.

--If the page is marked UTF-16 and has no BOM it will be interpreted as
UTF-16BE.

--U+FEFF can appear (presumably by accident) at the beginning of any web
page, but aside from the two cases above where it is necessary (UTF-16 and
UTF-32), it is a ZWNBSP and not a BOM.  (As Michael pointed out, Mac IE
5.2.2 displays a Euro symbol.)
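
To make the three points above concrete, here is a minimal Python sketch of
the rules as I understand them. It is not a statement of what any browser
actually does. The function name resolve_byte_order and its return shape are
my own invention, and the big-endian default for unmarked UTF-16 is simply
the assumption from the second point.

```python
import codecs

# Hypothetical helper for illustration; only the codecs.BOM_* constants
# come from the standard library.
BOMS = {
    "utf-16": (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE),
    "utf-32": (codecs.BOM_UTF32_BE, codecs.BOM_UTF32_LE),
}

def resolve_byte_order(declared, data):
    """Return (encoding_label, bom_length) for a declared charset label."""
    label = declared.lower()
    if label not in BOMS:
        # Byte order is already fixed (UTF-16BE/LE, UTF-32BE/LE) or does not
        # apply (UTF-8 etc.): a leading U+FEFF is treated as ZWNBSP, so
        # nothing is stripped.
        return label, 0
    big_endian_bom, little_endian_bom = BOMS[label]
    if data.startswith(big_endian_bom):
        return label + "-be", len(big_endian_bom)
    if data.startswith(little_endian_bom):
        return label + "-le", len(little_endian_bom)
    # No BOM present: assume big-endian, per the second point above.
    return label + "-be", 0
```

For example, resolve_byte_order("UTF-16", b"\xff\xfeA\x00") gives
("utf-16-le", 2), so the caller would skip two bytes and decode the rest as
little-endian.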

Suppose a page has no charset/encoding specified in the markup.  Does the
presence of U+FEFF mean the page should be presumed to be UTF-16?  Some of
my browsers behave this way.
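
One thing that may bear on this question: U+FEFF serializes to different
bytes in each encoding form (EF BB BF in UTF-8, FE FF or FF FE in UTF-16),
so the leading bytes themselves hint at which encoding is in use. The sketch
below shows how such sniffing could work in Python. The name sniff_bom and
the signature table are hypothetical, and I am not claiming this is the
algorithm any particular browser uses.

```python
import codecs

# Longer signatures come first: the UTF-32LE BOM (FF FE 00 00) begins with
# the same two bytes as the UTF-16LE BOM (FF FE). Note that FF FE 00 00 is
# genuinely ambiguous; it could also be UTF-16LE text starting with U+0000.
SIGNATURES = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

def sniff_bom(data):
    """Guess an encoding from a leading U+FEFF, or return None if absent."""
    for signature, name in SIGNATURES:
        if data.startswith(signature):
            return name
    return None
```

Under this sketch, a page beginning with EF BB BF (as the Unicode home page
apparently does) would be guessed as UTF-8 rather than UTF-16.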





