Hello all,

 

I’m using Xerces 2.7 and I’m trying to parse the following snippet from my XML file:

 

<title>Junk Mail - just how &#8220;heavy&#8221; a problem is it?</title>

 

The XML declaration (with its encoding) at the top of the file is:

 

<?xml version="1.0" encoding="UTF-8"?>

 

When I parse this, walk the DOM, and extract the contents of the title node, I get back:

 

Junk Mail - just how “heavy” a problem is it?

where each special character comes back as three values: decimal 30,128,100 and 30,128,99.
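
For reference, here is a minimal sketch (plain C++, not Xerces code) of what UTF-8 encoding produces for code points 8220 and 8221. The helper names encodeUtf8 and asSignedBytes are hypothetical, made up for this illustration. If the values I listed above are really signed chars whose minus signs got lost (that is only an assumption), they line up with -30,-128,-100 and -30,-128,-99, which would suggest I am looking at the UTF-8 bytes of the already-resolved quote characters.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: encode one Unicode code point (up to U+10FFFF) as UTF-8.
std::string encodeUtf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: view each byte as a signed 8-bit value,
// which is how a debugger typically displays a signed char.
std::vector<int> asSignedBytes(const std::string& s) {
    std::vector<int> values;
    for (char c : s) {
        values.push_back(static_cast<signed char>(c));
    }
    return values;
}
```

Calling asSignedBytes(encodeUtf8(8220)) gives the three values for the opening quote, and asSignedBytes(encodeUtf8(8221)) gives those for the closing quote.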

 

Why is Xerces interpreting the &#xxxx; character references and, more importantly, how do I stop it?

 

Here is my Xerces setup code:

 

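      // Non-validating DOM parser with namespace and schema processing turned off.
      m_parser = new XercesDOMParser();
      m_parser->setValidationScheme( XercesDOMParser::Val_Never );
      m_parser->setDoNamespaces( false );
      m_parser->setDoSchema( false );

      // HandlerBase supplies a default ErrorHandler implementation.
      m_errorHandler = static_cast<ErrorHandler*>( new HandlerBase() );
      m_parser->setErrorHandler( m_errorHandler );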

 

Hope someone can help, thanks a lot!!

 

Graeme Ing

 
