There were cases where XML files started with a BOM and the specified encoding was UTF-8. In those situations the parser was unable to recognize the files.
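To make the problem concrete: such a file starts with the three bytes 0xEF 0xBB 0xBF before the `<?xml` of the declaration. Here is a minimal, standalone sketch of detecting that prefix on a raw byte buffer. The helper name `utf8BomLength` is hypothetical and not part of Xerces-C. Unlike the committed code, the sketch also accepts a buffer that holds exactly three bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper for illustration only (not Xerces-C API).
// Returns the number of leading bytes of `buf` that form a UTF-8 BOM:
// 3 if the buffer starts with 0xEF 0xBB 0xBF, otherwise 0.
std::size_t utf8BomLength(const unsigned char* buf, std::size_t avail)
{
    // Precondition: a null buffer is only acceptable when it is empty.
    assert(buf != nullptr || avail == 0);

    static const unsigned char kUtf8Bom[] = { 0xEF, 0xBB, 0xBF };

    if (avail >= sizeof(kUtf8Bom) &&
        std::memcmp(buf, kUtf8Bom, sizeof(kUtf8Bom)) == 0)
    {
        return sizeof(kUtf8Bom);
    }
    return 0;
}
```

A caller would advance its read index by the returned count before looking for the XML declaration, so the BOM bytes never reach the character buffer.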
This change causes the UTF-8 BOM to be completely ignored for any ASCII family encoding. Andy H had a valid question though - should the BOM override the XML encoding declaration, or should the declaration override the BOM, or should it be an error if they conflict? Right now the encoding declaration overrides the BOM.

Arundhati

Dean Roddey wrote:

> What is this UTF-8 BOM stuff? I've never heard of such a thing. Given the
> form of UTF-8, why would it need a BOM? It's a multi-byte encoding, so there
> are no components of it larger than a byte.
>
> --------------------------
> Dean Roddey
> The CIDLib C++ Frameworks
> Charmed Quark Software
> [EMAIL PROTECTED]
> http://www.charmedquark.com
>
> "You young, and you gotcha health. Whatchoo wanna job fer?"
>
> ----- Original Message -----
> From: <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Monday, July 31, 2000 12:00 PM
> Subject: cvs commit: xml-xerces/c/src/internal XMLReader.cpp
>
>
> > aruna1      00/07/31 12:00:50
> >
> >   Modified:    c/src/internal XMLReader.cpp
> >   Log:
> >   Fixed BOM in UTF-8 files
> >
> >   Revision  Changes    Path
> >   1.20      +15 -2     xml-xerces/c/src/internal/XMLReader.cpp
> >
> >   Index: XMLReader.cpp
> >   ===================================================================
> >   RCS file: /home/cvs/xml-xerces/c/src/internal/XMLReader.cpp,v
> >   retrieving revision 1.19
> >   retrieving revision 1.20
> >   diff -u -r1.19 -r1.20
> >   --- XMLReader.cpp 2000/07/25 22:33:05 1.19
> >   +++ XMLReader.cpp 2000/07/31 19:00:48 1.20
> >   @@ -55,7 +55,7 @@
> >      */
> >
> >     /*
> >   -  * $Id: XMLReader.cpp,v 1.19 2000/07/25 22:33:05 aruna1 Exp $
> >   +  * $Id: XMLReader.cpp,v 1.20 2000/07/31 19:00:48 aruna1 Exp $
> >      */
> >
> >
> >     // ---------------------------------------------------------------------------
> >   @@ -1331,11 +1331,24 @@
> >                break;
> >            }
> >
> >   -        case XMLRecognizer::US_ASCII :
> >            case XMLRecognizer::UTF_8 :
> >            {
> >   +            // If there's a utf-8 BOM (0xEF 0xBB 0xBF), skip past it.
> >   +            //   Don't move to char buf - no one wants to see it.
> >   +            //   Note: this causes any encoding= declaration to override
> >   +            //         the BOM's attempt to say that the encoding is utf-8.
> >   +
> >                // Look at the raw buffer as short chars
> >                const char* asChars = (const char*)fRawByteBuf;
> >   +
> >   +            if (fRawBytesAvail > XMLRecognizer::fgUTF8BOMLen &&
> >   +                XMLString::compareNString(  asChars
> >   +                                            , XMLRecognizer::fgUTF8BOM
> >   +                                            , XMLRecognizer::fgUTF8BOMLen) == 0)
> >   +            {
> >   +                fRawBufIndex += XMLRecognizer::fgUTF8BOMLen;
> >   +                asChars += XMLRecognizer::fgUTF8BOMLen;
> >   +            }
> >
> >                //
> >                // First check that there are enough bytes to even see the

--
Arundhati Bhowmick
IBM -- XML Technology Group (Silicon Valley)
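
Andy H's question about precedence can be made concrete with a small sketch of the three options. Everything here is hypothetical and for illustration only: the `BomPolicy` enum and `resolveEncoding` function do not exist in Xerces-C, and the sketch assumes the declared encoding has already been read as a string. `DeclarationWins` corresponds to the behavior described in this mail.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical policy names; not part of Xerces-C.
enum class BomPolicy { DeclarationWins, BomWins, ConflictIsError };

// Case-insensitive comparison for ASCII encoding names.
static bool equalsIgnoreCase(const std::string& a, const std::string& b)
{
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    }
    return true;
}

// Chooses the encoding name to decode with.
//   sawUtf8Bom: true if the input started with 0xEF 0xBB 0xBF
//   declared:   value of encoding="..." in the XML declaration, or empty
std::string resolveEncoding(bool sawUtf8Bom,
                            const std::string& declared,
                            BomPolicy policy)
{
    // With no declaration, UTF-8 is the default either way.
    if (declared.empty())
        return "UTF-8";

    // No BOM, or BOM and declaration agree: use the declaration as written.
    if (!sawUtf8Bom || equalsIgnoreCase(declared, "UTF-8"))
        return declared;

    // BOM says UTF-8 but the declaration names something else.
    switch (policy)
    {
        case BomPolicy::DeclarationWins: return declared;
        case BomPolicy::BomWins:         return "UTF-8";
        case BomPolicy::ConflictIsError:
            throw std::runtime_error("UTF-8 BOM conflicts with encoding declaration");
    }
    return declared; // not reached; keeps compilers quiet
}
```

For example, a file with a UTF-8 BOM and `encoding="ISO-8859-1"` resolves to ISO-8859-1 under `DeclarationWins`, to UTF-8 under `BomWins`, and throws under `ConflictIsError`.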