Hi,
I think there is an encoding-related bug in Xerces 2.5.
When using the DOM parser to parse a document that contains a character
outside the character set of the declared document encoding (e.g. the
character ä in a document whose encoding is declared as "us-ascii"), the
parser fails with an exception.
Here is the code snippet:

    try
    {
        DOMParser parser = new DOMParser();
        parser.parse(toParse);
    }
    catch (Exception ex)
    {
        ex.printStackTrace();
    }

* "toParse" is the path to the following document:

    <?xml version="1.0" encoding="us-ascii"?>
    <Package Id="pkg1">
        <!-- ä -->
        <PackageHeader>
            <XPDLVersion>1.0</XPDLVersion>
            <Vendor>Together</Vendor>
            <Created>2003-08-20 10:00:49</Created>
        </PackageHeader>
    </Package>

The parser fails because of the ä character, and I get the following
stack trace:

    java.io.IOException: Byte "228" is not a member of the (7-bit) ASCII character set.
        at org.apache.xerces.impl.io.ASCIIReader.read(Unknown Source)
        at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
        at org.apache.xerces.impl.XML11EntityScanner.skipSpaces(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentScannerImpl$PrologDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
        at XML.main(XML.java:25)

When I use Xerces 2.4, everything goes fine!
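
The byte named in the exception can also be checked outside Xerces. The
following is only a minimal sketch using the JDK's java.nio.charset classes;
the class name AsciiCheck and the method isValidAscii are made up for
illustration and are not part of Xerces. It shows that byte 228 cannot be
decoded as US-ASCII, which matches the message from ASCIIReader, but it says
nothing about why 2.4 and 2.5 behave differently.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper, not part of Xerces: strict US-ASCII validation.
class AsciiCheck {

    // Returns true only if every byte decodes as 7-bit US-ASCII.
    static boolean isValidAscii(byte[] bytes) {
        CharsetDecoder decoder = Charset.forName("US-ASCII").newDecoder()
                // Report bad bytes instead of silently replacing them.
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The comment from the test document, with the offending byte 228.
        byte[] comment = {'<', '!', '-', '-', ' ', (byte) 228, ' ', '-', '-', '>'};
        System.out.println(isValidAscii(comment));           // prints false

        // The same comment with a plain ASCII letter instead.
        byte[] plain = {'<', '!', '-', '-', ' ', 'a', ' ', '-', '-', '>'};
        System.out.println(isValidAscii(plain));             // prints true
    }
}
```

So the document is not valid for its declared encoding, even though the
offending byte sits inside a comment.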
Regards,
Sasa.
- Possible encoding related bug Sasa Bojanic
- Re: Possible encoding related bug Michael Glavassevich
