Burn Lewis wrote:
XML 1.0 does not accept all Unicode characters. The legal ones are:

#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

So if you wish to serialize a CAS to a file or to a remote service, you'll have to avoid the 29 illegal (but useless?) low-value ones, i.e. the control characters below #x20 other than tab, line feed and carriage return. UIMA could replace or escape them, but both have possibly undesirable side effects (lost information and non-standard XML). At the least, this restriction should be documented.
It is:
http://incubator.apache.org/uima/downloads/releaseDocs/2.2.1-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.xmi_emf.xml_character_issues
If your mail program doesn't like the URL, it's section 8.3.1 in the UIMA Tutorial and Developers' Guides.

--Thilo
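
For anyone who wants to check text before it goes into a CAS that will be serialized, the character ranges quoted above can be written as a small Java test. This is only a sketch: the class Xml10Chars and its methods are hypothetical names made up for illustration, not part of the UIMA API, and the check covers XML 1.0 only.

```java
// Hypothetical helper for illustration; not part of the UIMA API.
final class Xml10Chars {
    private Xml10Chars() {}

    /** True if the code point falls in one of the XML 1.0 ranges quoted above. */
    static boolean isLegal(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Returns the index (in UTF-16 code units) of the first character that
     * XML 1.0 does not accept, or -1 if the whole string is acceptable.
     * Unpaired surrogates are reported as illegal.
     */
    static int firstIllegalIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isLegal(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    /** Counts the code points below #x20 that XML 1.0 rejects. */
    static int countIllegalBelowSpace() {
        int n = 0;
        for (int cp = 0; cp < 0x20; cp++) {
            if (!isLegal(cp)) {
                n++;
            }
        }
        return n;
    }
}
```

Calling Xml10Chars.firstIllegalIndex(text) before storing the text lets the caller decide what to do with an offending character (drop it, replace it, or reject the document) rather than finding out at serialization time. Which of those is right depends on the application, for the reasons Burn gives above.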
