Hi, we parse Swedish XML files and use the encoding "ISO-8859-1" successfully.
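
For illustration, here is a minimal sketch of that kind of setup. It assumes the document names its encoding in the XML declaration and is handed to the parser as a byte stream (not a Reader), so the parser can choose the decoder itself. It uses the standard JAXP DocumentBuilder rather than the Xerces DOMParser class from the message below, and the class name, element name and sample text are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class Latin1ParseSketch {

    // Parses a byte stream; the parser reads the encoding
    // from the XML declaration at the start of the document.
    static Document parse(InputStream in) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        InputSource source = new InputSource(in); // byte stream, not a Reader
        return builder.parse(source);
    }

    public static void main(String[] args) throws Exception {
        // Sample document (assumption): one element with a Swedish character.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                   + "<stad>G\u00f6teborg</stad>";

        // Encode the text with the same charset the declaration names.
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);

        Document doc = parse(new ByteArrayInputStream(bytes));
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

If a Reader is passed instead, the bytes have already been decoded before the parser sees them, and according to the SAX documentation InputSource.setEncoding() has no effect in that case.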
 
Johan
-----Original Message-----
From: jean-gui [mailto:[EMAIL PROTECTED]
Sent: den 20 mars 2000 20:02
To: [EMAIL PROTECTED]
Subject: Problem parsing non-english XML files.

Hi,
 
I am new to the mailing list. I am trying to parse XML document files with the DOMParser. I am using release 1.03.
My code looks like this:
 
        InputSource source = new InputSource(in);
        DOMParser parser = new DOMParser();
        parser.parse(source);
        doc = parser.getDocument();
 
"in" is an InputStream...
 
My file includes Swedish characters.
When I try to read and parse my XML files with the preceding code, I get the following error:
 
sun.io.ByteToCharUTF-16
    at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:1155)
    at lm.fsd.XercesImplementation.stream2DOM(XercesImplementation.java:195)
    at lm.fsd.DOMinatorTest.main(DOMinatorTest.java:149)
    at symantec.tools.debug.MainThread.run(Agent.java:48)
 
Without the Swedish characters, the parsing works well. How can I set my encoding (UTF-16, for example) even if my stream is a character stream? It seems that the DOMParser doesn't extract, by itself, the encoding that is written in the XML document ...
I have tried InputSource.setEncoding() but it doesn't change anything...
 
I suppose that there is a way to handle this type of encoding problem. I tried to find the solution in the mailing list.
I haven't found any answers that correspond completely to my issue. Maybe I missed it, or maybe you have an idea and can help me.
 
If so ... any advice is welcome...
 
Thank you in advance
Jean-Guillaume LALANNE
Application developer - LARGEMEDIUM AB
 
