To support some advanced users of UIMA, we have been working on an alternative, general-purpose scalability mechanism for UIMA analytics. Our goals were to provide a standards-based capability that is considerably more flexible and powerful than the UIMA collection processing manager, with less software complexity. To this end, we have developed an architecture based on asynchronous messaging that conforms to the JMS standard, and from it built a small scalability extension for Apache UIMA, which we call UIMA JMS.
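As a rough illustration of why asynchronous messaging lends itself to scaling out, the sketch below has several replicated workers pull documents from one shared queue. It is not UIMA JMS code: an in-memory `BlockingQueue` stands in for a JMS queue, and the class name, the `analyze` method, and the `STOP` marker are all hypothetical names chosen for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class ReplicatedConsumersSketch {
    // Hypothetical end-of-input marker; a real document with this text would also stop a worker.
    static final String STOP = "__STOP__";

    // Stand-in for an annotator: any per-document analysis could go here.
    static String analyze(String doc) {
        return doc.toUpperCase();
    }

    // Sends every document to a shared queue and lets `replicas` workers consume it concurrently.
    static Map<String, String> process(List<String> docs, int replicas) throws InterruptedException {
        BlockingQueue<String> input = new LinkedBlockingQueue<>(); // assumption: plays the role of a JMS queue
        Map<String, String> results = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(replicas);

        for (int i = 0; i < replicas; i++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        String doc = input.take();   // blocks until a message is available
                        if (doc.equals(STOP)) {
                            break;                   // one STOP marker ends one worker
                        }
                        results.put(doc, analyze(doc));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        for (String doc : docs) {
            input.put(doc);                          // the sender does not wait for any result
        }
        for (int i = 0; i < replicas; i++) {
            input.put(STOP);
        }

        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, String> out = process(List.of("doc one", "doc two", "doc three"), 2);
        System.out.println(out.size() + " documents analyzed");
    }
}
```

The sender only enqueues work and never addresses a particular worker, so changing the number of replicas requires no change on the sending side. With real JMS middleware the queue would live in a broker, which would allow the workers to run in separate processes or on separate machines.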
The extension is built on JMS and allows alternative JMS middleware implementations to be incorporated; our initial implementation uses Apache ActiveMQ as the messaging middleware. The primary end-user interface to UIMA JMS is a new descriptor, the UIMA deployment descriptor. This descriptor references standard UIMA component descriptors and adds the configuration information needed to specify which annotators are to be replicated, where they will be deployed, how many threads to run concurrently, how error conditions are to be handled, and several other details. We would like to explore donating this extension to the UIMA project, if this is acceptable to the community, and would appreciate any comments or feedback.
