Possible encoding related bug

Sasa Bojanic 20 Aug 2003 09:10:34 -0000

Hi,

I think there is an encoding-related bug in Xerces 2.5.

When I use the DOM parser to parse a document that contains characters outside the character set of its declared encoding (e.g. the character ä in a document whose encoding is declared as "us-ascii"), the parser crashes.

Here is the code snippet:

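      // Requires: import org.apache.xerces.parsers.DOMParser;
      try {
         DOMParser parser = new DOMParser();
         // toParse is a String holding the path (system ID) of the document
         parser.parse(toParse);
      } catch (Exception ex) {
         // Print the full stack trace so the failing reader is visible
         ex.printStackTrace();
      }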

* "toParse" is the path to the following document:

<?xml version="1.0" encoding="us-ascii"?>
<Package Id="pkg1">

    <PackageHeader>
        <XPDLVersion>1.0</XPDLVersion>
        <Vendor>Together</Vendor>
        <Created>2003-08-20 10:00:49</Created>
    </PackageHeader>
</Package>

The parser crashes because of the ä character, and I get the following stack trace:

java.io.IOException: Byte "228" is not a member of the (7-bit) ASCII character set.
        at org.apache.xerces.impl.io.ASCIIReader.read(Unknown Source)
        at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
        at org.apache.xerces.impl.XML11EntityScanner.skipSpaces(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentScannerImpl$PrologDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
        at XML.main(XML.java:25)
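
To confirm which byte the ASCII reader is objecting to, independently of Xerces, a minimal sketch along the following lines could be used. It relies only on the standard java.nio.charset API. The class name AsciiCheck and its two helper methods are hypothetical names chosen for illustration; they are not part of Xerces, and the sample bytes assume the file was saved in a single-byte encoding where ä is byte 228.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only (not part of Xerces).
class AsciiCheck {

    // Returns the index of the first byte outside 7-bit ASCII, or -1 if none.
    static int firstNonAscii(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if ((data[i] & 0xFF) > 0x7F) {
                return i;
            }
        }
        return -1;
    }

    // Decodes strictly as US-ASCII; returns false if any byte is rejected.
    static boolean isStrictAscii(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.US_ASCII.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed sample: "<a>" followed by byte 228, as in the stack trace.
        byte[] sample = { '<', 'a', '>', (byte) 228 };
        int index = firstNonAscii(sample);
        if (index >= 0) {
            System.out.println("Non-ASCII byte " + (sample[index] & 0xFF)
                    + " at offset " + index);
        }
        System.out.println("Strict US-ASCII: " + isStrictAscii(sample));
    }
}
```

Running such a check on the raw bytes of the file would show whether it really contains byte 228 while declaring "us-ascii" in the XML declaration.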

With Xerces 2.4, the same document parses without any problem!

Regards,

Sasa.
