Actually, I tried the original test example as a UTF-8 file with the (optional) 3-byte BOM at the beginning, and got the following error, but only when the setEncoding method was used:

org.xml.sax.SAXParseException: Content is not allowed in prolog.
        at org.apache.xerces.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1172)
        at javax.xml.parsers.SAXParser.parse(SAXParser.java:345)
        at test.SimpleSaxParser.parse(SimpleSaxParser.java:43)
        at test.SimpleSaxParser.main(SimpleSaxParser.java:64)


Perhaps the setEncoding method causes BOM handling to be skipped altogether?
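
If that guess is right, one possible workaround is to strip the BOM before the stream reaches the parser. The following is only a sketch under that assumption, not something taken from the test example: the class name BomSkipper and the method skipUtf8Bom are made up for illustration, and it handles the UTF-8 BOM (EF BB BF) only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper, not part of Xerces or SAX.
final class BomSkipper {

    /** Returns a stream positioned just after a leading UTF-8 BOM, if there is one. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // A single read() may return fewer bytes than requested, so loop.
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r < 0) {
                break; // end of stream before three bytes were available
            }
            n += r;
        }
        boolean bom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!bom && n > 0) {
            pin.unread(head, 0, n); // no BOM: push the bytes back untouched
        }
        return pin;
    }
}
```

The returned stream would then be handed to InputSource.setByteStream in place of the raw FileInputStream.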

David





Ravi Varanasi <[EMAIL PROTECTED]>
09/30/2003 01:04 PM
Please respond to xerces-j-user

        To:        [EMAIL PROTECTED]
        cc:
        Subject:        RE: UTF-16 encoding problem

Hi,
      I do not see anything wrong in your code. I have the following piece
of code in my program, and it has been working perfectly well for the last
couple of months. I am using UTF-8, though. I do NOT think the change in
encoding will make any difference, provided your file really is in the
encoding you pass to the setEncoding method.

What I suspect is that your input file is not UTF-16 encoded, yet you are
setting the stream encoding to UTF-16. Check that and let us know if it
fixes the problem. I use the UniPad editor to check the encoding.

   // Declare the byte stream's encoding explicitly, then parse.
   try (InputStream in = new FileInputStream(new File(inputFile))) {
     InputSource ipSource = new InputSource();
     ipSource.setEncoding("UTF-8");  // must match the file's real encoding
     ipSource.setByteStream(in);
     parser.parse(ipSource);
     return true;
   } catch (Exception e) {
     // Covers SAXParseException as well as I/O errors.
     e.printStackTrace();
     return false;
   }
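
For checking the encoding without a dedicated editor, the first few bytes of the file can be inspected for a byte order mark. This is a rough sketch only; the class BomSniffer and its detect method are hypothetical names. A file without a BOM can still be UTF-8 or UTF-16, so a null result just means "undetermined".

```java
// Hypothetical helper for illustration; not part of Xerces or SAX.
final class BomSniffer {

    /**
     * Guesses the encoding from a byte order mark at the start of the data.
     * Returns null when no UTF-8 or UTF-16 BOM is found.
     * Note: FF FE 00 00 (the UTF-32LE BOM) is reported as UTF-16LE here.
     */
    static String detect(byte[] head) {
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (head.length >= 2) {
            int b0 = head[0] & 0xFF;
            int b1 = head[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) {
                return "UTF-16BE";
            }
            if (b0 == 0xFF && b1 == 0xFE) {
                return "UTF-16LE";
            }
        }
        return null;
    }
}
```

Reading the first three bytes of the input file into an array and passing them to detect shows whether the value given to setEncoding is at least consistent with the BOM.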



Thanks,

Ravi Varanasi

408 517 7675 (Work)
408 394 3273 (Mobile)



