Re: [castor-user] Invalid Unicode characters in XML

xsltuser Wed, 12 Mar 2008 03:42:40 -0700

Rewriting XML element:

<tax-summary-desc> & amp;#165; & amp;#x589e; & amp;#x503c; & amp;#x7a0e;
& amp;#x6458; & amp;#x8981;</tax-summary-desc>
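
For what it's worth, the doubled escaping in that element is what a generic XML writer produces when the Java string itself contains the reference as plain text: the leading '&' is an ordinary character at that point and gets escaped. A minimal sketch of that behaviour follows. It uses the JDK's StAX writer rather than Castor, and the class name AmpersandEscapeDemo is made up for illustration, so treat it as an analogy and not as a description of Castor's internals.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Illustrative only: JDK StAX writer, not Castor. Shows generic XML text escaping.
final class AmpersandEscapeDemo {

    // Serializes one <tax-summary-desc> element whose text content is the given string.
    static String toXml(String text) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement("tax-summary-desc");
            xml.writeCharacters(text); // escapes '&', '<' and '>' found in the text
            xml.writeEndElement();
            xml.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The string holds the reference as plain text, so its '&' is escaped:
        // prints <tax-summary-desc>&amp;#x589e;</tax-summary-desc>
        System.out.println(toXml("&#x589e;"));

        // The string holds the actual character, so there is no '&' to escape.
        System.out.println(toXml("\u589e"));
    }
}
```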




xsltuser wrote:
> 
> I don't have a JUnit test readily available, but I can provide a brief
> overview of what we are doing:
> 
> 1. We provide the Chinese text in the resource bundle as XML numeric
> character references, for instance:
> 
> printVatSummary.vatSummary=& #165; & #x589e; & #x503c; & #x7a0e; & #x6458; & #x8981;
> 
> (Extra spaces have been introduced above for better readability. The
> actual code does not have these spaces.)
> 
> 2. The application populates the DataPOJO with properties read from the
> resource bundle.
> 
> DATA POJO:
> 
> - taxSummaryDesc=& #165; & #x589e; & #x503c; & #x7a0e; & #x6458; & #x8981;
> 
> (Extra spaces have been introduced above for better readability. The
> actual code does not have these spaces.)
> 
> 
> 3. The DataPOJO is then marshalled into XML using the Castor APIs. The
> <tax-summary-desc> element below contains invalid Unicode characters. If
> we transform this XML to HTML by applying stylesheets, we get a
> UTFDataFormatException as follows:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <tax-summary-desc>&amp;#165;&amp;#x589e;&amp;#x503c;&amp;#x7a0e;&amp;#x6458;&amp;#x8981;</tax-summary-desc>
> 
> error::java.io.UTFDataFormatException: Invalid byte 1 of 1-byte UTF-8
> sequence.
> 
> We'll need to configure Castor to render the special characters as they
> are.
> 
> 
> 
> 
> Werner Guttmann wrote:
>> 
>> I am not 100% sure whether this is a bug or not, but would like to look 
>> into this further. You wouldn't be able to supply us with e.g. a JUnit 
>> test that highlights the problem?
>> 
>> Werner
>> 
>> xsltuser wrote:
>>> Dear All,
>>> 
>>> I'm using Castor's Marshaller API to marshal a Java object to XML. The
>>> input object has some Unicode characters of the form & # x 1234;
>>> (without spaces) representing Chinese characters. The output XML after
>>> marshalling displays & amp ; # x 1234; (without spaces) in place of
>>> & # x 1234; (without spaces), as a result of which the Chinese
>>> characters are not rendered correctly when the XML is transformed into
>>> HTML using XSLT. I'm using UTF-8 encoding.
>>> 
>>> Any help is greatly appreciated.
>>> 
>>> Thanks.
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe from this list, please visit:
>> 
>>     http://xircles.codehaus.org/manage_email
>> 
>> 
>> 
>> 
> 
> 
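
A possible application-side workaround, sketched under assumptions: rather than configuring Castor, the numeric character references could be decoded into real characters before the value is set on the DataPOJO, so that the marshaller receives Chinese characters and not literal ampersands. The class CharRefDecoder and its decode method are hypothetical helpers, not part of Castor. Whether this fits the application, and how Castor then writes those characters for a given output encoding, would still need checking.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Castor API): turns XML numeric character
// references such as &#165; or &#x589e; into the characters they denote.
final class CharRefDecoder {

    // Group 1 captures hexadecimal digits, group 2 captures decimal digits.
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    static String decode(String text) {
        Matcher m = NUMERIC_REF.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());
            int codePoint = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
            if (Character.isValidCodePoint(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                out.append(m.group()); // leave out-of-range references untouched
            }
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // Value as it appears in the resource bundle (without the extra spaces).
        String raw = "&#165;&#x589e;&#x503c;&#x7a0e;&#x6458;&#x8981;";
        // The decoded string would be what gets set on the POJO before marshalling.
        System.out.println(decode(raw));
    }
}
```

Separately, java.util.Properties already understands backslash-u escapes in .properties files, so storing the text in that form in the resource bundle may avoid the decoding step altogether.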

-- 
View this message in context: 
http://www.nabble.com/Invalid-Unicode-characters-in-XML-tp15951719p16001446.html
Sent from the Castor - User mailing list archive at Nabble.com.

