Hi Chuck,

Great questions. The main thing that makes UIMA AS somewhat hard to understand is that, although it is advertised as a scale-out framework, it lacks life cycle management for processes. So far it has focused on the details of interconnecting UIMA-compliant components in multi-threaded and multi-process configurations, and on error handling.
On Mon, Aug 15, 2011 at 5:29 PM, Charles Bearden <[email protected]> wrote:
> We have used UIMA as a CPE to run several fairly simple pipelines, including
> some using cTAKES components [1]. UIMA AS is billed as "the next generation
> scalability replacement for the Collection Processing Manager (CPM)", and
> I'm trying to wrap my head around it by using it for some of the tasks we
> did previously with CPEs and the CPM.
>
> Neither the Getting Started [2] nor the UIMA AS user manual [3] cover the
> practicalities of deploying asynchronous pipelines, so I'm relying on the
> README that comes with uima-as-2.3.1-bin.tar.gz. If there is a better
> document to work from, please let me know :-) UIMA is my first exposure to a
> Big Java Framework, so my knowledge & intuitions about it are not deep.
>
> It looks to me as if there are two basic patterns:
> (1) start the broker ('startBroker.sh'), and then
> (2) use 'runRemoteAsyncAE.sh' to both connect the CR with the queue via the
> '-c' argument and to deploy the AS AEs via the '-d' flag; or
>
> (1) start the broker ('startBroker.sh');
> (2) deploy one or more instances of the AS AE with 'deployAsyncService.sh',
> and then
> (3) use 'runRemoteAsyncAE.sh' to both connect the CR with the queue via the
> '-c' argument.
>
> Do I have this right?

The first pattern is basically a "getting started" example; the second is typical of larger deployments. RunRemoteAsyncAE.java is sample application code and also a useful tool for exercising services. UIMA_Service.java, the program called by deployAsyncService.sh, is likewise both a useful tool and sample code for deploying services; for example, it can easily be adapted to run inside a servlet container.
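Because UIMA AS does not manage process life cycles, in the second pattern you launch (and later monitor) each service process yourself. A minimal sketch of doing that from Java with the standard ProcessBuilder API is below. It assumes deployAsyncService.sh takes the deployment descriptor path as its argument, as the README examples appear to; the class name, method names, and file paths are hypothetical placeholders.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: starts several copies of one UIMA AS service, each in
 * its own JVM, by running deployAsyncService.sh once per instance.
 * Monitoring and restarting these processes is left to the caller.
 */
final class ServiceLauncher {

    /** Builds one command line per service instance. */
    static List<List<String>> buildCommands(String script, String descriptor, int instances) {
        if (instances < 1) {
            throw new IllegalArgumentException("instances must be >= 1");
        }
        List<List<String>> commands = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            // Assumption: the script accepts the deployment descriptor path as its only argument.
            commands.add(List.of(script, descriptor));
        }
        return commands;
    }

    /** Starts each command as a separate OS process, appending its output to its own log file. */
    static List<Process> launchAll(List<List<String>> commands, File logDir) throws IOException {
        List<Process> processes = new ArrayList<>();
        for (int i = 0; i < commands.size(); i++) {
            ProcessBuilder pb = new ProcessBuilder(commands.get(i));
            pb.redirectErrorStream(true); // merge stderr into stdout
            pb.redirectOutput(ProcessBuilder.Redirect.appendTo(new File(logDir, "service-" + i + ".log")));
            processes.add(pb.start());
        }
        return processes;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical paths; substitute your UIMA AS install and deployment descriptor.
        String script = "bin/deployAsyncService.sh";
        String descriptor = "deploy/MyNonThreadSafeService.xml";
        List<List<String>> commands = buildCommands(script, descriptor, 4);
        commands.forEach(c -> System.out.println(String.join(" ", c)));
        // To actually start the services (the broker must already be running):
        // List<Process> procs = launchAll(commands, new File("logs"));
        // for (Process p : procs) p.waitFor();
    }
}
```

Keeping command construction separate from launching makes the plan easy to inspect before anything is started; a real deployment would also need something that notices a dead process and restarts it.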
>
> One challenge we face is that some essential third-party components are not
> thread-safe, and so it looks to me as if I'll have to scale out instances of
> those components by deploying them in their own JVMs and not by means of a
> single deployment with
>
> <scaleout numberOfInstances="20"/>
>
> in the deployment descriptor.

Right, non-thread-safe components are simply scaled out as multiple processes, all pulling from the same queue. Multi-threaded scaling matters most for vertical scale-out of analytics that share large in-memory objects.

Eddie
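The "multiple processes pulling from the same queue" approach is the competing-consumers pattern. The sketch below simulates it inside one JVM so it can run standalone: a BlockingQueue stands in for the broker's service input queue, and each thread stands in for a separately deployed service process. Each consumer owns its own instance of a hypothetical non-thread-safe annotator, so no instance is ever touched by two threads. All names here are illustrative assumptions, not UIMA AS APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

/** In-JVM simulation of several service instances competing for work on one shared queue. */
final class CompetingConsumers {

    /** Hypothetical stand-in for a third-party component with unsynchronized mutable state. */
    static final class NonThreadSafeAnnotator {
        private final StringBuilder scratch = new StringBuilder(); // reused buffer, unsafe to share

        String process(String doc) {
            scratch.setLength(0);
            scratch.append(doc.toUpperCase());
            return scratch.toString();
        }
    }

    private static final String POISON = "__END__"; // stop signal for one consumer

    /** Runs the given number of consumers over the documents and returns doc -> result. */
    static Map<String, String> run(List<String> docs, int instances) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(docs);
        for (int i = 0; i < instances; i++) {
            queue.add(POISON); // queued after all documents, one per consumer
        }
        Map<String, String> results = new ConcurrentHashMap<>();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            NonThreadSafeAnnotator annotator = new NonThreadSafeAnnotator(); // one per "process"
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        String doc = queue.take(); // whichever consumer is free takes the next item
                        if (doc.equals(POISON)) {
                            return;
                        }
                        results.put(doc, annotator.process(doc));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for consumers", e);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("note one", "note two", "note three", "note four");
        Map<String, String> out = run(docs, 3);
        System.out.println(out.size() + " documents processed");
    }
}
```

The trade-off matches the point above: separate consumers never share annotator state, but in a real multi-process deployment each JVM also loads its own copy of any large model, whereas multi-threaded scale-out within one JVM lets thread-safe instances share such in-memory objects.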
