We are getting an odd error while trying to process large datasets using 
UIMA-AS 2.3.1.  There is an exception thrown by the XmiCasSerializer in the 
Client when it is in the process of serializing a CAS to be sent to a remote 
service.  The exception is as follows:

org.apache.uima.resource.ResourceProcessException
      at org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.sendCAS(BaseUIMAAsynchronousEngineCommon_impl.java:854)
      at org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.sendCAS(BaseUIMAAsynchronousEngineCommon_impl.java:885)
      at org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.process(BaseUIMAAsynchronousEngineCommon_impl.java:734)
      at gov.va.vinci.flap.Client.run(Client.java:181)
      at gov.va.vinci.density.DensityClient.main(DensityClient.java:137)
Caused by: org.xml.sax.SAXParseException: Trying to serialize non-XML 1.0 character: _, 0x1a
      at org.apache.uima.util.XMLSerializer$CharacterValidatingContentHandler.checkForInvalidXmlChars(XMLSerializer.java:254)
      at org.apache.uima.util.XMLSerializer$CharacterValidatingContentHandler.startElement(XMLSerializer.java:174)
      at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.startElement(XmiCasSerializer.java:1003)
      at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.encodeFS(XmiCasSerializer.java:755)
      at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.encodeIndexed(XmiCasSerializer.java:700)
      at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.serialize(XmiCasSerializer.java:268)
      at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.access$700(XmiCasSerializer.java:108)
      at org.apache.uima.cas.impl.XmiCasSerializer.serialize(XmiCasSerializer.java:1539)
      at org.apache.uima.aae.UimaSerializer.serializeCasToXmi(UimaSerializer.java:136)
      at org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.serializeCAS(BaseUIMAAsynchronousEngineCommon_impl.java:260)
      at org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.sendCAS(BaseUIMAAsynchronousEngineCommon_impl.java:779)
      ... 4 more

It happens at apparently random points while processing the corpus. The
exception is never actually thrown to my code; it is simply written to stderr.
The serializer also never seems to return, which means the
UimaAsynchronousEngine.process() method never returns and the client simply
hangs until it is manually terminated.  To try to resolve this issue I have
implemented text filters on the incoming CAS data to remove anything outside
the ASCII-8 range.  I have also tried switching the server and client to binary
serialization, but that causes the XmiCasSerializer in my UimaAsBaseListener
object to return errors when it attempts to serialize the CAS objects received
in the entityProcessingComplete event.
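
One thing worth noting about the filter: the character reported in the trace, 0x1a, is an ASCII control character (SUB), so a filter that only rejects characters outside the ASCII range would let it through. A filter keyed to the XML 1.0 character rules may be closer to what the serializer checks. The following is a minimal sketch under that assumption; the class and method names (XmlCharFilter, isValidXml10, sanitize) are made up for illustration and are not part of UIMA.

```java
/** Illustrative helper (not part of UIMA): filters text against the XML 1.0 Char production. */
final class XmlCharFilter {
    private XmlCharFilter() {}

    /** Returns true if the code point is permitted in an XML 1.0 document. */
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Replaces every code point that XML 1.0 forbids with spaces.
     * One space is written per UTF-16 unit, so the string length
     * (and therefore any annotation offsets) stays the same.
     */
    static String sanitize(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            int units = Character.charCount(cp);
            if (isValidXml10(cp)) {
                out.appendCodePoint(cp);
            } else {
                for (int k = 0; k < units; k++) {
                    out.append(' ');
                }
            }
            i += units;
        }
        return out.toString();
    }
}
```

The sanitized string would be what gets passed to the CAS as document text. Because the check in the trace fires from startElement, it may also be necessary to apply the same filter to any string feature values that annotators set, not only to the document text.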

Any suggestions from the UIMA masters?  How can I debug further so that I can 
find out A: Where is this illegal character coming from and B: How can I 
prevent it from happening?
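
For question A, one possible approach is to scan each string just before the CAS is sent and log where any character outside the XML 1.0 range sits. The sketch below uses only the JDK; the class name InvalidXmlCharLocator and the label argument are hypothetical, and which strings to feed it (for example the result of CAS.getDocumentText() or individual string feature values) is an assumption about where the character might be hiding.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative helper (not part of UIMA): reports positions of characters XML 1.0 forbids. */
final class InvalidXmlCharLocator {
    private InvalidXmlCharLocator() {}

    /** Same rule as the XML 1.0 Char production. */
    private static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Returns one message per invalid code point found in value.
     * The label identifies the source of the string, e.g. a document id
     * or a feature name, so the log line can be traced back to its origin.
     */
    static List<String> findInvalid(String label, String value) {
        List<String> hits = new ArrayList<>();
        if (value == null) {
            return hits;
        }
        int i = 0;
        while (i < value.length()) {
            int cp = value.codePointAt(i);
            if (!isValidXml10(cp)) {
                hits.add(String.format("%s: offset %d: 0x%x", label, i, cp));
            }
            i += Character.charCount(cp);
        }
        return hits;
    }
}
```

Calling findInvalid("doc-17", text) on a string containing the character from the trace at offset 1 would return a single entry, "doc-17: offset 1: 0x1a". An empty list for the document text would suggest the character is being introduced by a feature value rather than by the source document.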

Thanks,

Thomas Ginter
801-448-7676



