Actually, I tried the original test example with a UTF-8 encoded file that starts with the (optional) 3-byte BOM, and I got the following error, but only when the setEncoding method was used:
org.xml.sax.SAXParseException: Content is not allowed in prolog.
    at org.apache.xerces.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1172)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:345)
    at test.SimpleSaxParser.parse(SimpleSaxParser.java:43)
    at test.SimpleSaxParser.main(SimpleSaxParser.java:64)
Perhaps the 'setEncoding' method causes BOM handling to be skipped altogether?
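If that guess is right, one possible workaround is to consume the BOM yourself before the parser ever sees the stream. The following is only a minimal sketch under that assumption; the class name BomSkipper and the method skipUtf8Bom are made up for illustration and are not part of Xerces or JAXP.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper, not part of any parser API.
class BomSkipper {
    /**
     * Returns a stream positioned just past a leading UTF-8 BOM (EF BB BF).
     * If no BOM is present, the bytes read while checking are pushed back,
     * so the caller sees the stream from its first byte.
     */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until
        // three bytes have arrived or the stream ends.
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            pin.unread(head, 0, n); // not a BOM: put the bytes back
        }
        return pin;
    }
}
```

The returned stream could then be handed to InputSource.setByteStream in place of the raw FileInputStream, with setEncoding("UTF-8") left as it is.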
David
Ravi Varanasi <[EMAIL PROTECTED]>
09/30/2003 01:04 PM
To: [EMAIL PROTECTED]
cc:
Subject: RE: UTF-16 encoding problem
Hi,
I do not see anything wrong in your code. I have the following piece
of code in my program, and it has been working fine for the last couple
of months. I am using UTF-8, though. I do NOT think changing the encoding
will make any difference, provided your file really is in the encoding
passed to the setEncoding method.
What I suspect is that your input file is not UTF-16 encoded, but you are
setting the stream encoding to UTF-16. Check that and let us know if it
fixes the problem. I use the UniPad UTF editor to check the encoding.
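try (InputStream in = new FileInputStream(new File(inputFile))) {
    InputSource ipSource = new InputSource();
    // The declared encoding must match the file's actual encoding.
    ipSource.setEncoding("UTF-8");
    ipSource.setByteStream(in);
    parser.parse(ipSource);
    return true;
} catch (SAXParseException e) {
    // The parser rejected the document.
    e.printStackTrace();
    return false;
} catch (Exception e) {
    // I/O failures and any other SAX errors.
    e.printStackTrace();
    return false;
}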
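As an alternative to opening the file in an editor, the first few bytes can be inspected for a byte order mark. This is only a rough sketch: the class name BomSniffer and the method detectBom are invented for illustration, a file without a BOM simply yields null (which says nothing about its real encoding), and UTF-32 BOMs are not distinguished from UTF-16LE.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration only.
class BomSniffer {
    /**
     * Returns "UTF-8", "UTF-16BE" or "UTF-16LE" if the stream starts with
     * the corresponding byte order mark, or null if no such BOM is found.
     * The bytes examined are consumed, so reopen the file before parsing.
     */
    static String detectBom(InputStream in) throws IOException {
        byte[] head = new byte[3];
        int n = 0;
        // Loop because read() may deliver fewer bytes than requested.
        while (n < 3) {
            int r = in.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        int b0 = n > 0 ? head[0] & 0xFF : -1;
        int b1 = n > 1 ? head[1] & 0xFF : -1;
        int b2 = n > 2 ? head[2] & 0xFF : -1;
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) {
            return "UTF-8";
        }
        if (b0 == 0xFE && b1 == 0xFF) {
            return "UTF-16BE";
        }
        if (b0 == 0xFF && b1 == 0xFE) {
            return "UTF-16LE";
        }
        return null;
    }
}
```

Calling detectBom on a fresh FileInputStream for the input file and comparing the result with the name passed to setEncoding would show a UTF-8 versus UTF-16 mismatch for files that carry a BOM.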
Thanks,
Ravi Varanasi
408 517 7675 (Work)
408 394 3273 (Mobile)
