Also, see the comments here: https://issues.apache.org/jira/browse/UIMA-387
On 10/21/2011 1:58 PM, Charles Bearden wrote:
> I created a simple UIMA-AS pipeline comprising a collection reader and an
> aggregate AE, which I ran simply like so:
>
> runRemoteAsyncAE.sh tcp://localhost:61616 CollectionReader \
> -d <deployment descriptor> \
> -c <collection reader descriptor> \
>
> Evidently, the content I wish to process has some non-XML characters in it,
> because a certain bit of data raises an exception, the heart of which appears
> to be:
>
> Caused by: org.xml.sax.SAXParseException: Trying to serialize non-XML 1.0
> character: , 0x19
>
> The complete exception is here:
> <http://pastebin.com/rMPyAhqP>
>
> The point in my code at which the exception enters the picture
> (NoteLinesFromDBReader.java:139) is the point in the .getNext() method where I
> get the next CAS:
> jcas = aCAS.getJCas();
>
> I don't run into this problem when I use the old-fashioned CPE, so my thinking
> is that the CAS from the CR is being serialized before being put into the
> queue. Is the expectation in UIMA AS that I sanitize text artifacts of non-XML
> characters before the CR gets them? Or am I doing something else wrong
> perhaps?
>
> Thanks for your help,
> Chuck
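If sanitizing in the collection reader turns out to be the way to go, here is a minimal sketch of what that could look like. It is not taken from UIMA itself: the class name XmlCharSanitizer and its methods are made up for illustration. The only thing it relies on is the Char production of the XML 1.0 specification (#x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF). Anything outside those ranges, such as the 0x19 in the exception above, is replaced with a single space so that character offsets stay the same as in the original text.

```java
/**
 * Hypothetical helper (not part of UIMA): replaces characters that are
 * not legal in XML 1.0 so that text can be serialized as XML.
 */
public final class XmlCharSanitizer {

    private XmlCharSanitizer() {}

    /** True if the code point matches the XML 1.0 "Char" production. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of text with every non-XML-1.0 character replaced by a space. */
    public static String sanitize(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isXml10Char(cp)) {
                out.appendCodePoint(cp);
            } else {
                // Illegal code points (control characters, unpaired surrogates,
                // U+FFFE, U+FFFF) always occupy one Java char, so a single
                // space keeps the string length and all offsets unchanged.
                out.append(' ');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Assuming the offending text reaches the CAS as document text, the reader's getNext() could call something like jcas.setDocumentText(XmlCharSanitizer.sanitize(rawText)), where rawText stands for whatever string the reader pulled from its source. Whether this is what UIMA AS expects of a collection reader is a separate question; the sketch only shows how the characters could be filtered.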
