On 02/21/2012 05:15 PM, Thilo Goetz wrote:
On 21/02/12 16:15, Jens Grivolla wrote:
On 02/21/2012 04:08 PM, Thilo Goetz wrote:
On 21/02/12 15:59, Jens Grivolla wrote:
It appears that InlineXMLCasConsumer depends on the system locale for
some internal transformations. The output is written as UTF-8
(outStream.write(xmlAnnotations.getBytes("UTF-8"));), but when it is used on a
machine with an ASCII locale, all accented characters get corrupted.

I suspect that it has to do with the XMLSerializer working on a
ByteArrayOutputStream, but haven't been able to track it down yet.

Have you checked that it's really the writing end where things
get corrupted, and not the reading end?  Just a thought...

Yes, we have an XmiWriterCasConsumer in parallel that works fine.

Ah yes, eyeballing the source gives:

       // return XML string
       return new String(byteArrayOutputStream.toByteArray());

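This is in CasToInlineXml.java.  The String(byte[]) constructor decodes
with the platform default charset, which would explain the locale
dependence.  I stopped after I found this; maybe there's more.
Jira, patch, you know the drill :-)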
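
For illustration, here is a minimal sketch of why that line would break on an ASCII locale. It is not the actual UIMA code: the class and method names are made up, and it assumes the serializer wrote UTF-8 bytes into the ByteArrayOutputStream. The charset is passed in explicitly so that US-ASCII can stand in for the platform default of the affected machine.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not taken from CasToInlineXml.java.
public class CharsetDecodeSketch {

    // Stand-in for the serializer: writes the text as UTF-8 bytes into a
    // ByteArrayOutputStream (assumption: the real serializer emits UTF-8).
    static byte[] serialize(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    // Decodes with an explicit charset. new String(bytes) without a charset
    // behaves like this method called with Charset.defaultCharset().
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "cafe" with an accented e
        byte[] bytes = serialize(original);

        // Simulates an ASCII default: the two UTF-8 bytes of the accented
        // character are not valid ASCII and get replaced.
        String broken = decode(bytes, StandardCharsets.US_ASCII);

        // Decoding with the charset that was used for writing round-trips.
        String fixed = decode(bytes, StandardCharsets.UTF_8);

        System.out.println("ASCII decode matches original: " + broken.equals(original));
        System.out.println("UTF-8 decode matches original: " + fixed.equals(original));
    }
}
```

If that assumption holds, a fix along the lines of new String(byteArrayOutputStream.toByteArray(), "UTF-8") should make the result independent of the locale, though other places in the code may need the same treatment.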

https://issues.apache.org/jira/browse/UIMA-2376
