DO NOT REPLY [Bug 7594] New: - Junk characters will cause the Xerces XML parser to fail

bugzilla Thu, 28 Mar 2002 15:20:11 -0800

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=7594>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.


http://nagoya.apache.org/bugzilla/show_bug.cgi?id=7594

Junk characters will cause the Xerces XML parser to fail

           Summary: Junk characters will cause the Xerces XML parser to fail
           Product: Xerces-C++
           Version: 1.6.0
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: Build
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]
                CC: [EMAIL PROTECTED]


Consider these two documents:

Document one has the bad character:

"<?xml version="1.0" ?> 
<ControlInfo>
        <ProjectName>c:\ricktest\PalmDesktop.dcprj</ProjectName>
        <SceneName>GWMain</SceneName>
        <ControlName>ListView1</ControlName>
        <ControlType>8</ControlType>
        <ActionType>getall</ActionType>
        <Data>
                <I1>I1
                        <D1>SMALLCAPINVESTOR.COM 
&lt;[EMAIL PROTECTED]&gt;��03/06/02 09:22PM�</D1>
                </I1>
        </Data>
        <Parameters>ron</Parameters>
        <ExpectedScene>GWMain</ExpectedScene>
</ControlInfo>"

Document two has the bad character removed:

"<?xml version="1.0" ?> 
<ControlInfo>
        <ProjectName>c:\ricktest\PalmDesktop.dcprj</ProjectName>
        <SceneName>GWMain</SceneName>
        <ControlName>ListView1</ControlName>
        <ControlType>8</ControlType>
        <ActionType>getall</ActionType>
        <Data>
                <I1>I1
                        <D1>SMALLCAPINVESTOR.COM 
&lt;[EMAIL PROTECTED]&gt;03/06/02 09:22PM</D1>
                </I1>
        </Data>
        <Parameters>ron</Parameters>
        <ExpectedScene>GWMain</ExpectedScene>
</ControlInfo>"

The Xerces DLL throws an exception while parsing document one but not document 
two. The only reason I bring this up as an issue is that Microsoft's XML 
parser used in IE 5.0+ handles the character... at least it parses the XML and 
displays a 'black square' for the invalid character.

Is this how Xerces is supposed to handle invalid text?  Is the text actually 
invalid? Is there a setting in the parser that will allow these invalid 
characters to be ignored?
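
On the question of whether the text is actually invalid: an XML 1.0 document 
with no encoding declaration and no byte order mark is read as UTF-8, so bytes 
that are not well-formed UTF-8, or that decode to a code point outside the 
XML 1.0 Char range, are a fatal error for a conforming parser. The sketch below 
is one way to check a buffer for that before handing it to the parser. It does 
not use Xerces at all. The function name firstInvalidXmlByte is made up for 
illustration, and the exact junk bytes in document one are not known from this 
report.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Xerces: returns the byte offset of the
// first byte sequence that is either malformed UTF-8 or decodes to a code
// point outside the XML 1.0 Char production. Returns std::string::npos if
// the whole buffer is acceptable.
std::size_t firstInvalidXmlByte(const std::string& bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        std::size_t len = 0;
        unsigned long cp = 0;

        // Classify the lead byte and pick up its payload bits.
        if (b0 < 0x80)                     { len = 1; cp = b0; }
        else if (b0 >= 0xC2 && b0 <= 0xDF) { len = 2; cp = b0 & 0x1F; }
        else if (b0 >= 0xE0 && b0 <= 0xEF) { len = 3; cp = b0 & 0x0F; }
        else if (b0 >= 0xF0 && b0 <= 0xF4) { len = 4; cp = b0 & 0x07; }
        else return i;  // stray continuation byte or illegal lead byte

        if (i + len > n) return i;  // sequence truncated at end of buffer

        // Every following byte must look like 10xxxxxx.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return i;
            cp = (cp << 6) | (c & 0x3F);
        }

        // Reject overlong encodings and values past U+10FFFF.
        if (len == 3 && cp < 0x800) return i;
        if (len == 4 && (cp < 0x10000 || cp > 0x10FFFF)) return i;

        // XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] |
        //               [#xE000-#xFFFD] | [#x10000-#x10FFFF]
        const bool isXmlChar =
            cp == 0x9 || cp == 0xA || cp == 0xD ||
            (cp >= 0x20 && cp <= 0xD7FF) ||
            (cp >= 0xE000 && cp <= 0xFFFD) ||
            (cp >= 0x10000 && cp <= 0x10FFFF);
        if (!isXmlChar) return i;

        i += len;
    }
    return std::string::npos;
}
```

If this reports an offset for document one, the options are to declare the 
real encoding in the XML declaration (for example encoding="ISO-8859-1", if 
that is what the producing application writes) or to strip or replace the 
offending bytes before parsing.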

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
