[ https://issues.apache.org/jira/browse/UIMA-387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493294 ]
Thilo Goetz commented on UIMA-387:
----------------------------------
The 0 char btw is not even a valid Unicode character, not just illegal in XML.
However, as long as Java allows users to create such characters, and Unicode
editors don't complain either, we'd better support it.
> XMI Serializer can write invalid control characters
> ---------------------------------------------------
>
> Key: UIMA-387
> URL: https://issues.apache.org/jira/browse/UIMA-387
> Project: UIMA
> Issue Type: Bug
> Components: Core Java Framework
> Affects Versions: 2.1
> Reporter: Adam Lally
> Fix For: 2.2
>
>
> On 5/1/07, Leo Ferres <[EMAIL PROTECTED]> wrote:
> > Hello,
> >
> > While trying to open an XMI file after processing in XML view, an
> > error pops up telling me that there is an invalid XML character.
> > The error comes from the SAX parser. Below is the stack trace. Thanks
> > very much for your help,
> >
> Most control characters are not allowed in XML 1.0, even if they are
> escaped as character references (&#xxx;). If your input document
> contains such characters, the XMI CAS serializer writes them to the
> output XMI document, making it unreadable.
> I checked that if you edit the XMI document and change the first line to:
> <?xml version="1.1" encoding="UTF-8"?>
> The problem goes away, because XML version 1.1 does allow escaped
> control characters.
> So one possibility for us to fix this in UIMA is to have the XMI CAS
> Serializer generate an XML version 1.1 declaration by default. (I think
> we considered that before and decided not to for some reason, maybe we
> were worried that other applications might not be able to consume XML
> 1.1? I can't remember. :)
> Another possibility would be to have the XMI serializer automatically
> replace these characters with spaces. The XCAS (not XMI) serializer
> does that, but only for the document text, not for feature values. We
> could also serialize the XMI using XML version 1.1, which allows
> escaped control characters (but still not the 0x00 character).
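
The replace-with-spaces option described above could look roughly like the following. This is only a sketch and not the actual UIMA serializer code: the class and method names (XmlCharSanitizer, isValidXml10Char, replaceInvalidXml10Chars) are made up for illustration. The one thing taken from a specification is the set of allowed code points, which follows the Char production of XML 1.0. Every invalid code point occupies exactly one Java char, so replacing it with a single space keeps the string length unchanged, and any offsets into the text should still line up.

```java
// Hypothetical helper, not part of UIMA: replaces characters that XML 1.0
// does not allow with spaces, for both document text and feature values.
final class XmlCharSanitizer {

    // XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isValidXml10Char(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    // Returns a copy of the input in which every code point that is invalid
    // in XML 1.0 (including 0x00 and unpaired surrogates) becomes a space.
    static String replaceInvalidXml10Chars(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            int codePoint = input.codePointAt(i);
            if (isValidXml10Char(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                // Invalid code points are always a single Java char wide,
                // so one space preserves the original length.
                out.append(' ');
            }
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }
}
```

A serializer would call something like XmlCharSanitizer.replaceInvalidXml10Chars(value) on each string just before writing it. The trade-off is that the original control characters are lost on a round trip, which the XML 1.1 approach avoids for everything except 0x00.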