internal XMLReader.cpp

Andy Heninger 1 Aug 2000 06:54:46 -0000


Dean Roddey asks:


> What is this UTF-8 BOM stuff? I've never heard of such a thing. Given the
> form of UTF-8, why would it need a BOM? It's a multi-byte encoding, so there
> are no components of it larger than a byte.

That was pretty much my first reaction also.  Checking with the ICU folks,
though, it turns out that UTF-8 allows a BOM, and, if one is found, it
should be ignored.  It doesn't affect the data that follows in any way,
except to confirm that the encoding is really UTF-8 and not ASCII or
Latin-1 or whatever.  The UTF-8 BOM is three bytes (EF BB BF), and is
nothing more than the UTF-16 BOM character (U+FEFF) as it appears when
encoded as UTF-8.
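
For illustration only, here is a minimal sketch of the check described
above.  It is not the code in XMLReader.cpp; the names utf8BomLength and
encodeThreeByteUtf8 are hypothetical and exist only for this example.  The
first function reports how many leading bytes to swallow, and the second
shows that encoding U+FEFF as UTF-8 yields exactly those three bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Xerces-C): returns the number of leading
// bytes to skip, which is 3 if the buffer starts with the UTF-8 BOM
// (EF BB BF) and 0 otherwise.
std::size_t utf8BomLength(const unsigned char* buf, std::size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return 3;
    return 0;
}

// Hypothetical helper: encodes a code point in the range U+0800..U+FFFF
// as its three-byte UTF-8 form.  No range or surrogate checking is done;
// this is an illustration, not a general-purpose encoder.
std::string encodeThreeByteUtf8(unsigned int cp)
{
    std::string out;
    out += static_cast<char>(0xE0 | (cp >> 12));          // 1110xxxx
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));  // 10xxxxxx
    out += static_cast<char>(0x80 | (cp & 0x3F));         // 10xxxxxx
    return out;
}
```

A reader would call utf8BomLength on the first bytes of the input and
advance past the returned count before parsing; the data that follows is
interpreted exactly as it would be without the BOM.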

Pretty silly, especially since we already have an encoding declaration to
tell us what the encoding is.  But it seems that Microsoft is generating
UTF-8 encoded XML with a BOM, and we need to be able to swallow it.


Andy Heninger
IBM XML Technology Group, Cupertino, CA
[EMAIL PROTECTED]

Re: cvs commit: xml-xerces/c/src/internal XMLReader.cpp
