Hi Sander,
Thanks for looking into this problem. I think it has
something to do with the way Xerces creates the Reader object when the
InputSource.setEncoding() method gets called, because if I don't call
InputSource.setEncoding(), everything works fine. Also, if I
create the InputStreamReader with the encoding outside Xerces and then
create the InputSource with that Reader, everything works fine as well (see
attached Java file). So this kind of makes me think it's a Xerces problem, but
I could be wrong.
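
For readers without the attachment, the Reader-based workaround described above might look roughly like the sketch below. It is not the attached SimpleSaxParser.java: the class name ReaderInputSourceSketch, the method name, and the in-memory sample document are hypothetical stand-ins, and it assumes only the standard JAXP SAX API. The point is that the InputStreamReader is built by the caller, so the parser receives characters and never has to construct its own Reader from the byte stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name; a sketch of the workaround, not the attached file.
class ReaderInputSourceSketch {

    // Decodes the UTF-16 bytes with our own InputStreamReader and hands the
    // resulting Reader to the parser. Returns the name of the root element.
    static String rootElementViaReader(byte[] utf16Xml) {
        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(utf16Xml), StandardCharsets.UTF_16);
        InputSource source = new InputSource(reader); // no setEncoding() call

        final String[] root = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = qName; // remember only the first element seen
                }
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        } catch (IOException | ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("parse failed", e);
        }
        return root[0];
    }

    public static void main(String[] args) {
        // Stand-in for test.xml: a tiny document encoded as UTF-16 with a BOM.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><doc><item/></doc>";
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_16);
        System.out.println(rootElementViaReader(bytes));
    }
}
```

To compare against the failing path, one would instead pass the raw InputStream to InputSource and call setEncoding("UTF-16") on it; whether that reproduces the error presumably depends on the Xerces and JVM versions in use.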
thanks,
Benson.
-----Original Message-----
From: Sander Bos [mailto:[EMAIL PROTECTED]
Sent: Monday, September 29, 2003 2:44 AM
To: [EMAIL PROTECTED]
Subject: RE: UTF-16 encoding problem

Dear Benson,

> I am not sure if I am doing something wrong, or if it's a JVM or Xerces
> problem. I am getting a "java.lang.InternalError" while parsing a UTF-16
> XML file if I use InputSource.setEncoding("UTF-16"). I attached my sample
> file and a simple SAX parser class. I know I don't have to call the
> setEncoding() function, since the parser will detect the encoding itself,
> but it shouldn't be a problem even if I set it.
>
> BTW, this problem happens with Xerces 2.4.0 and 2.5.0 on JVM 1.4.0_01
> and 1.3.1.

I don't have an answer for you, apart from that I could reproduce your problem (also with JDK 1.4.1_02), that I don't think you are doing anything wrong, but that I do not know what goes wrong. I found it interesting that you could get an internal error so easily with so many different JDKs, so I looked at it a bit but could not figure it out.

For others that may be interested: since the bug came from InputStreamReader.read, I made a small test where I tried to set up a stream just like Xerces does, so

InputStream is = new FileInputStream(fname);
// Copied from XMLEntityManager
RewindableInputStream ris = new RewindableInputStream(is);
InputStreamReader reader = new InputStreamReader(ris, "UTF-16");
char cbuf[] = new char[1024];
while ((reader.read(cbuf, 11, 11)) != -1) {
}
but for the different values of the two '11's I tried it for, I could not cause the same crash. I don't think the rewindable stream is reset anywhere for UTF-16.

(I did find it kind of weird that an XML11EntityScanner is used (see stack trace), where the document is of version 1.0, but maybe that is the default?)

Kind regards,

--Sander.

> Here is the stack trace:
>
> D:\work\source\xml>java SimpleSaxParser test.xml UTF-16
> afile=D:\work\source\xml\test.xml, encoding=UTF-16
> Exception in thread "main" java.lang.InternalError: Converter malfunction (Unicode) -- please submit a bug report via http://java.sun.com/cgi-bin/bugreport.cgi
>     at sun.nio.cs.StreamDecoder$ConverterSD.malfunction(StreamDecoder.java:232)
>     at sun.nio.cs.StreamDecoder$ConverterSD.convertInto(StreamDecoder.java:248)
>     at sun.nio.cs.StreamDecoder$ConverterSD.implRead(StreamDecoder.java:294)
>     at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:179)
>     at java.io.InputStreamReader.read(InputStreamReader.java:167)
>     at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
>     at org.apache.xerces.impl.XML11EntityScanner.skipString(Unknown Source)
>     at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
>     at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>     at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
>     at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>     at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>     at javax.xml.parsers.SAXParser.parse(SAXParser.java:345)
>     at SimpleSaxParser.parse(SimpleSaxParser.java:25)
>     at SimpleSaxParser.main(SimpleSaxParser.java:46)
>
> thanks,
> Benson.
SimpleSaxParser.java
