Re: [castor-user] Invalid Unicode characters in XML

Werner Guttmann Fri, 14 Mar 2008 02:34:23 -0700

Still, as time here is limited, can I kindly ask you to come up with a Java (JUnit test) program that shows *exactly* what you are doing?

Please do understand that our time (speaking as a committer here) is limited, as we are not being paid for this job. The more clearly you structure the problem description, the easier it is for us to look at the issue. I am sure that it would have taken equal time to actually write the JUnit test case .. ;-).



Regards
Werner

xsltuser wrote:

I don't have a JUnit readily available but I can provide the brief overview
of what we are doing:

1. We are providing xml compliant unicode characters for chinese text in the
resource bundle, for instance

printVatSummary.vatSummary=& #165;   & #x589e; & #x503c;  & #x7a0e;  &
#x6458;  & #x8981;

(Extra spaces have been introduced above for better readability. Actual code
does not have these spaces. )

2. Application generates the DataPOJO reading properties from the resource
bundle.

DATA POJO:

- taxSummaryDesc=& #165;   & #x589e; & #x503c;  & #x7a0e;  & #x6458;  &
#x8981;

(Extra spaces have been introduced above for better readability. Actual code
does not have these spaces. )

3. DataPOJO is next marshalled into XML using Castor APIs. The
<tax-summary-desc> element below contains invalid Unicode characters. If we
transform this XML to HTML by applying stylesheets, we get a
UTFDataFormatException as follows:

<?xml version="1.0" encoding="UTF-8"?>
<tax-summary-desc>&amp;#165;&amp;#x589e;&amp;#x503c;&amp;#x7a0e;&amp;#x6458;&amp;#x8981;</tax-summary-desc>

error::java.io.UTFDataFormatException: Invalid byte 1 of 1-byte UTF-8
sequence.

We need to configure Castor to render the special characters as they are.
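
The escaping in step 3 can be reproduced without Castor. The following is a minimal sketch that uses only the JDK's StAX writer (an assumption made so that it runs without extra libraries; it is not the Castor code path). The class name EscapeDemo and the method toXml are hypothetical. It shows that a string holding the literal text of a character reference gets its ampersand escaped, as any XML serializer must do for text content, while a string holding the real character is not affected by ampersand escaping.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical demo class; stands in for the marshalling step, not for Castor itself.
public class EscapeDemo {

    // Writes the given text as the content of a <tax-summary-desc> element.
    static String toXml(String text) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter writer =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement("tax-summary-desc");
            writer.writeCharacters(text); // text content: '&' is escaped here
            writer.writeEndElement();
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Literal reference text, as read from the resource bundle in step 1.
        System.out.println(toXml("&#x589e;"));
        // The actual character U+589E, written here as a Java escape.
        System.out.println(toXml("\u589e"));
    }
}
```

If the first call shows the same &amp;#x589e; pattern as the output above, the escaping is ordinary serializer behaviour for a literal ampersand rather than something specific to the POJO.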




Werner Guttmann wrote:

I am not 100% sure whether this is a bug or not, but would like to look into this further. You wouldn't be able to supply us with e.g. a JUnit test that highlights the problem?


Werner

xsltuser wrote:

Dear All,

I'm using Castor's Marshaller API to marshal a Java object to XML. The input
object has some Unicode characters of the form & # x 1234; (without spaces)
representing Chinese characters. After marshalling, the output XML contains
& amp ; # x 1234; (without spaces) in place of & # x 1234; (without spaces),
so the Chinese characters are not rendered correctly when the XML is
transformed into HTML using XSLT. I'm using UTF-8 encoding.
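
One possible workaround, sketched under the assumption that the strings can be preprocessed before they are set on the object: decode the numeric character references into real characters, so that the marshaller receives text rather than markup. The class CharRefDecoder and its decode method are hypothetical helpers, not part of Castor.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; turns "&#165;" or "&#x589e;" into the characters they name.
public class CharRefDecoder {

    // Group 1: hexadecimal digits, group 2: decimal digits.
    private static final Pattern REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    static String decode(String input) {
        Matcher m = REF.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int codePoint = (m.group(1) != null)
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2));
            // Leave the reference untouched if it names no valid code point.
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "&#165;&#x589e;&#x503c;&#x7a0e;&#x6458;&#x8981;";
        System.out.println(decode(raw));
    }
}
```

Alternatively, a .properties file can carry the characters as \uXXXX escapes (for example \u589e), which java.util.Properties decodes when the file is loaded, so no character references are needed in the bundle at all.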

Any help is greatly appreciated.

Thanks.


---------------------------------------------------------------------
To unsubscribe from this list, please visit:

    http://xircles.codehaus.org/manage_email


