I don't have a JUnit test readily available, but I can give a brief overview
of what we are doing:

1. We provide XML numeric character references for the Chinese text in the
resource bundle, for instance

printVatSummary.vatSummary=& #165;  & #x589e;  & #x503c;  & #x7a0e;  & #x6458;  & #x8981;

(Extra spaces have been introduced above for better readability. Actual code
does not have these spaces. )
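For comparison, Java property files can also carry non-ASCII text as backslash-u escapes, which java.util.Properties decodes into real characters while loading. The sketch below is only an illustration under that assumption: the class name BundleEscapes and the inline bundle text are hypothetical, and the key is the one shown above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class BundleEscapes {
    // Hypothetical bundle content: the same key as above, written with
    // Java backslash-u escapes instead of XML character references.
    static final String BUNDLE_TEXT =
        "printVatSummary.vatSummary=\\u00A5\\u589E\\u503C\\u7A0E\\u6458\\u8981\n";

    // Loads the inline bundle text and returns the value for the given key.
    static String load(String key) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(BUNDLE_TEXT));
        return props.getProperty(key);
    }

    public static void main(String[] args) throws IOException {
        String value = load("printVatSummary.vatSummary");
        // The loaded value holds six real characters, with no '&' left
        // for an XML marshaller to escape.
        System.out.println(value.length()); // prints 6
    }
}
```

With the value stored this way, the string that reaches the POJO already contains the Chinese characters themselves.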

2. The application builds the DataPOJO by reading properties from the resource
bundle.

DATA POJO:

- taxSummaryDesc=& #165;  & #x589e;  & #x503c;  & #x7a0e;  & #x6458;  & #x8981;

(Extra spaces have been introduced above for better readability. Actual code
does not have these spaces. )


3. The DataPOJO is then marshalled into XML using Castor's API. The
<tax-summary-desc> element below no longer contains valid character
references: every & has been escaped to &amp;. If we transform this XML to
HTML by applying stylesheets, we get a UTFDataFormatException as follows:

<?xml version="1.0" encoding="UTF-8"?>
<tax-summary-desc>&amp;#165;&amp;#x589e;&amp;#x503c;&amp;#x7a0e;&amp;#x6458;&amp;#x8981;</tax-summary-desc>

error::java.io.UTFDataFormatException: Invalid byte 1 of 1-byte UTF-8
sequence.
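The escaping in the element above is what an XML serializer generally does with a literal ampersand in text content. The sketch below shows the same effect using only the JDK's DOM and Transformer classes, not Castor, so it is an analogy rather than a description of Castor's internals. The class name EscapeDemo is hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class EscapeDemo {
    // Wraps the given text in a <tax-summary-desc> element and serializes it.
    static String serialize(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element el = doc.createElement("tax-summary-desc");
        el.setTextContent(text);
        doc.appendChild(el);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Text that merely looks like a character reference: the '&' is
        // data, so the serializer escapes it and the output contains
        // &amp;#x589e;
        System.out.println(serialize("&#x589e;"));

        // Text that holds the real character: there is no '&' to escape.
        System.out.println(serialize("\u589e"));
    }
}
```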

We need to configure Castor to output these character references as they are,
without escaping the ampersand.
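One possible alternative to changing the marshaller's behaviour, sketched here as an assumption rather than a confirmed Castor feature, is to decode the character references into real characters before the value is set on the POJO. The helper below uses only the JDK. The class name CharRefDecoder and the method decode are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharRefDecoder {
    // Matches decimal (&#165;) and hexadecimal (&#x589e;) character references.
    private static final Pattern CHAR_REF =
        Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    // Replaces each numeric character reference with the character it names.
    // Throws IllegalArgumentException if a reference is not a valid code
    // point (NumberFormatException, a subclass, if the number overflows int).
    static String decode(String input) {
        Matcher m = CHAR_REF.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = (m.group(1) != null)
                ? Integer.parseInt(m.group(1), 16)
                : Integer.parseInt(m.group(2), 10);
            String replacement = new String(Character.toChars(codePoint));
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "&#165;&#x589e;&#x503c;&#x7a0e;&#x6458;&#x8981;";
        String decoded = decode(raw);
        System.out.println(decoded.length()); // prints 6
    }
}
```

The decoded string contains no ampersands, so a marshaller writing UTF-8 has nothing to escape, and the characters themselves end up in the XML.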




Werner Guttmann wrote:
> 
> I am not 100% sure whether this is a bug or not, but would like to look 
> into this further. You wouldn't be able to supply us with e.g. a JUnit 
> test that highlights the problem ?
> 
> Werner
> 
> xsltuser wrote:
>> Dear All,
>> 
>> I'm using Castor's Marshaller API to marshal a Java object to XML. The
>> input object has some Unicode characters of the form & # x 1234; (without
>> spaces) representing Chinese characters. The output XML after marshalling
>> displays & amp ; # x 1234; (without spaces) in place of & # x 1234;
>> (without spaces), as a result of which the Chinese characters are not
>> rendered correctly when the XML is transformed into HTML using XSLT. I'm
>> using UTF-8 encoding.
>> 
>> Any help is greatly appreciated.
>> 
>> Thanks.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe from this list, please visit:
> 
>     http://xircles.codehaus.org/manage_email
> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Invalid-Unicode-characters-in-XML-tp15951719p16001443.html
Sent from the Castor - User mailing list archive at Nabble.com.

