Yosi Mass wrote:
> Hi,
>
> I would like to suggest a scale-out of UIMA by enabling it to run in a P2P
> environment.
>
> From my understanding, the CPE is a 1st generation scaleout, and it can run
> a distributed pipeline using vinci/soap but the machines involved in the
> pipeline are predefined in the UIMA descriptors.
>
> The 2nd generation scaleout is called UIMA-AS (AS = Asynchronous Scaleout),
> and is based on some Java and web standards, such as JMS (Java Messaging
> Service). It is now officially released on Apache UIMA. This allows users
> to selectively choose which parts of their pipeline to run in this mode,
> which in turn allows scaling out individual parts of the pipeline, as
> needed. Again there is no dynamic discovery of resources after startup.
>

Hmm, I think this may not be quite accurate. In UIMA-AS, connections are
made using a JMS infrastructure, such as ActiveMQ. Each service has an
associated "address" in the network space, made up of a Broker URL and Port.
The actual service implementation is done by 1 or more servers that
register themselves with the Broker URL and Port. During a run, servers
can be dynamically added or removed; this changes the "capacity" of the
service. Of course, if all of the servers for a particular service are
removed, then the service "fails".

But maybe what is meant is, rather, the ability of the system to
recognize when a service becomes available, rather than merely changing
its capacity. For instance, in the UIMA-AS case, this could mean several
kinds of things:

  1) allowing a service to be configured with 0 servers available at
     startup

  2) having the flow controller "know" more explicitly about service
     "availability", for instance, the number of "servers" there might
     be for a particular service. Here, the idea would be that a flow
     controller could dynamically decide, based on the service levels
     of the different steps in the pipeline, how to "route" a CAS for
     a particular aggregate.

Are these the kinds of function that are desired?

> I would like to suggest a 3rd generation scaleout using a fully
> decentralized P2P network. Assume that each peer can publish its
> capabilities (namely which annotators it can run) and its current
> availability, then we may extend UIMA/UIMA-AS pipeline to discover an
> available and capable peer for running an annotator and thus achieve better
> load balancing and thus better performance than previous generations.
>

The "publication" would need to include the type system of the
annotators, and some notion of which annotators would ever "want" to be
run together in a pipeline, because a key part of the UIMA design is the
"merging" of type systems to allow interoperability among the parts.

Is there a "reservation" idea here too?
For instance, in an open environment, where there are lots of clients,
services, and servers for those services, a particular client might want
to reserve some amount of processing capability for itself (not
necessarily all of the capability).

Finally, I wonder -- are there systems / infrastructure / middleware
already out there that do this kind of thing that we could perhaps
easily adapt / adopt for these purposes?

-Marshall

> What people on the list think about this?
>
> Thanks, Yosi
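
P.S. To make the "availability-aware routing" idea a bit more concrete,
here is a rough sketch, in Python only for brevity. It is not UIMA or
UIMA-AS code: ServiceInfo, Registry, and route are hypothetical names,
the registry is a plain in-memory dict standing in for whatever discovery
mechanism would really be used, and "queued CASes per server" is just an
assumed load signal. It shows a service being published with 0 servers,
servers joining and leaving during a run, and a router picking the
least-loaded service that can run a given annotator.

```python
from dataclasses import dataclass


@dataclass
class ServiceInfo:
    """Hypothetical record a service might publish; not a UIMA-AS type."""
    name: str
    annotators: frozenset  # names of the annotators this service can run
    servers: int = 0       # servers currently registered for the service
    queued: int = 0        # CASes waiting; an assumed, crude load signal


class Registry:
    """Toy in-memory stand-in for a real discovery mechanism."""

    def __init__(self):
        self._services = {}

    def publish(self, info):
        # A service may be published with 0 servers (nothing up yet).
        self._services[info.name] = info

    def server_joined(self, name):
        self._services[name].servers += 1

    def server_left(self, name):
        svc = self._services[name]
        svc.servers = max(0, svc.servers - 1)

    def route(self, annotator):
        """Return the least-loaded capable service, or None if none is up."""
        candidates = [
            s for s in self._services.values()
            if annotator in s.annotators and s.servers > 0
        ]
        if not candidates:
            return None
        # Fewest queued CASes per server wins; name breaks ties.
        return min(candidates, key=lambda s: (s.queued / s.servers, s.name))


reg = Registry()
reg.publish(ServiceInfo("peerA", frozenset({"tokenizer", "ner"}),
                        servers=2, queued=6))
reg.publish(ServiceInfo("peerB", frozenset({"ner"})))  # 0 servers at startup

first = reg.route("ner")    # only peerA has servers running
reg.server_joined("peerB")  # peerB becomes available during the run
second = reg.route("ner")   # peerB now wins: 0 queued vs. 3 per server
```

A real version would also have to publish each annotator's type system
so that type systems could be merged before routing, and would need some
way to express reservations; neither is modelled here.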
