Re: UTF-8 Encoding

Michael Glavassevich 31 Mar 2003 06:10:08 -0000

Hi Shekhar,

Setting the encoding on the input source allows the parser to skip encoding
auto-detection. However, once it reads the encoding from the XML declaration,
it will create a new character reader if the previous encoding (either
auto-detected or supplied by the user) is different from the one specified
in the document, so you don't gain anything by doing this.


You should try changing the encoding declared in the actual document, for
example:
<?xml version="1.0" encoding="iso-8859-1"?>

If you absolutely cannot alter your input document, you can try setting
your own character reader on the input source. This will force the parser
to use your reader. If you have an InputStream to the document, you can
easily get a reader for ISO-8859-1 using an InputStreamReader.
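
A minimal sketch of that last suggestion, assuming the parser is reached
through the standard JAXP DocumentBuilder API. The class name
Latin1ReaderExample and the helper parseAsLatin1 are made up for
illustration, and the sample bytes in main stand in for the client's
document:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class; not part of Xerces.
class Latin1ReaderExample {

    // Decodes the bytes as ISO-8859-1 ourselves and hands the parser a
    // character stream, so the parser does no byte decoding of its own.
    static Document parseAsLatin1(InputStream in) throws Exception {
        Reader reader = new InputStreamReader(in, "ISO-8859-1");
        InputSource source = new InputSource(reader);
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(source);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in document: declares UTF-8 but contains the single byte 0xB6.
        byte[] xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><doc>\u00B6</doc>"
                .getBytes("ISO-8859-1");
        Document doc = parseAsLatin1(new ByteArrayInputStream(xml));
        String text = doc.getDocumentElement().getTextContent();
        // Print the code point of the first character of the element text.
        System.out.println(Integer.toHexString(text.charAt(0)));
    }
}
```

Because the input source carries a character stream rather than a byte
stream, the encoding named in the XML declaration should no longer affect
how the bytes are decoded. The declaration in the file still says UTF-8, so
this only works around the mislabelled document; it does not fix it.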

At 11:08 AM 31/03/2003 +0530, you wrote:
> is still giving the same error. I just thought I should clarify that I
> have an XML document given to me by a client that I need to parse. The
> XML document has its encoding set to UTF-8 <>. I need to parse this
> document, but the character with hex value B6 present in the XML document
> is not being accepted by the parser. I need to override the encoding set
> in the XML document through the code, but setting the InputSource
> encoding to ISO-8859-1 is not doing the trick.
>
> Thanks
> Shekhar
>
> ----- Original Message -----
> From: Ragunath Marudhachalam
> To: [EMAIL PROTECTED]
> Sent: Friday, March 28, 2003 7:48 PM
> Subject: RE: UTF-8 Encoding
>
> yes.
>
>     OutputFormat format = new OutputFormat( Document ); //Serialize DOM
>     format.setEncoding("ISO-8859-1");
>
> This will set the encoding to ISO-8859-1 instead of UTF-8. UTF-8 is the
> default encoding that is set when you create a document without
> specifying any encoding. If you are not using serialization, then try
> setting the encoding to the InputSource.
>
> Ragu
> CircuitVision
>
> -----Original Message-----
> From: Shekhar Karani [mailto:[EMAIL PROTECTED]
> Sent: Friday, March 28, 2003 9:16 AM
> To: [EMAIL PROTECTED]
> Subject: Re: UTF-8 Encoding
>
> Doing that in my code will override the XML document encoding?
>
> Shekhar
>
> ----- Original Message -----
> From: Ragunath Marudhachalam
> To: [EMAIL PROTECTED]
> Sent: Friday, March 28, 2003 7:33 PM
> Subject: RE: UTF-8 Encoding
>
> set the encoding to "ISO-8859-1"
>
> Ragu
> CircuitVision
>
> -----Original Message-----
> From: Shekhar Karani [mailto:[EMAIL PROTECTED]
> Sent: Friday, March 28, 2003 6:27 AM
> To: [EMAIL PROTECTED]
> Subject: UTF-8 Encoding
>
> Hi
>
> I am using Xerces 2.2.1 to parse XML documents. One of the XML documents
> has a hex character B6. This character is being treated as an invalid
> UTF-8 character by the parser. The parser gives the error "Invalid byte 1
> of UTF-8 byte stream". However, the editor XML SPY version 5 accepts this
> character.
>
> Please let me know what I need to do in my code to accept this character.
>
> The archives on the mailing list are not accessible, hence I am not sure
> if this question is present there.
>
> Thanks
> Shekhar

-----------------------------
Michael Glavassevich
[EMAIL PROTECTED]
4B Computer Engineering
University of Waterloo
