Dean Roddey asks:

> What is this UTF-8 BOM stuff? I've never heard of such a thing. Given the
> form of UTF-8, why would it need a BOM? Its a multi-byte encoding, so there
> are no components of it larger than a byte.

That was pretty much my first reaction also.  Checking with the ICU folks,
though, it turns out that UTF-8 allows a BOM, and, if one is found, it
should be ignored.  It doesn't affect the data that follows in any way,
except to confirm that the encoding really is UTF-8 and not ASCII or
Latin-1 or whatever.  The UTF-8 BOM is three bytes (EF BB BF), and is
nothing more than the UTF-16 BOM character, U+FEFF, as it appears when
encoded as UTF-8.
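
For illustration only, here is a minimal Python sketch of the check described
above: look for the three BOM bytes at the start of the input and skip them if
they are there.  The function name strip_utf8_bom is made up for this example
and is not part of any parser's API; the only library facility used is the
standard codecs.BOM_UTF8 constant.

```python
import codecs
from typing import Tuple


def strip_utf8_bom(data: bytes) -> Tuple[bytes, bool]:
    """Return (payload, had_bom).

    Hypothetical helper: if the input starts with the UTF-8 BOM,
    drop those three bytes; otherwise return the input unchanged.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):], True
    return data, False


# The UTF-8 BOM is simply U+FEFF encoded as UTF-8.
assert "\ufeff".encode("utf-8") == codecs.BOM_UTF8

# A document that starts with a BOM, followed by an XML declaration.
raw = codecs.BOM_UTF8 + b"<?xml version='1.0' encoding='UTF-8'?><doc/>"
payload, had_bom = strip_utf8_bom(raw)

# The bytes after the BOM are untouched, so they decode as ordinary UTF-8.
text = payload.decode("utf-8")
```

The bytes that follow the BOM are passed through as-is, which matches the point
that the BOM does not affect the data that follows.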

Pretty silly, especially since we already have an encoding declaration to
tell us what the encoding is.  But it seems that Microsoft is generating
UTF-8 encoded XML with a BOM, and we need to be able to swallow it.


Andy Heninger
IBM XML Technology Group, Cupertino, CA
[EMAIL PROTECTED]



