See below for some additions, hopefully helpful :-)

Eddie Epstein wrote:
> Hi Michael,
>
> On 8/22/07, Michael Baessler <[EMAIL PROTECTED]> wrote:
>   
>> 1) Do you have any experience with the memory footprint when using UIMA
>> AS? It seems to me that when deploying a larger system with multiple AEs
>> a lot of queues are used. Each AS aggregate uses a queue for the
>> delegates. How is the performance with all these queues? Do you have any
>> measurements?
>>     
>
> This is a great question. When an aggregate is deployed with
> asynchronous delegates, each process call goes through a queue. For
> colocated delegates the message is just a reference to the in-memory
> CAS, so there is no serialization overhead. Moreover, a colocated
> broker in the same JVM is used for communication between colocated
> components, and ActiveMQ has optimized the producer/consumer paths to
> a colocated broker. But even with these optimizations the overhead is
> undesirable for some configurations.
>
> I will get some specific overhead times for calling colocated
> delegates in the next couple of days. For remote delegates, the
> overhead is basically determined by XmiCas serialization steps and is
> a function of CAS content.
>
> Note that by default an aggregate is deployed as an AS primitive, that
> is, as a single threaded component, so there is no performance
> degradation for processing within the aggregate. 
An AS primitive can have a <numberOfInstances> element, in which case
UIMA AS will replicate it and run each instance as a single-threaded
component.
> An aggregate is only
> deployed asynchronously if required, i.e. one of the delegates is a
> remote service, a colocated delegate is to be replicated, special
> error handling is desired for a delegate, or it is simply desired to
> run delegates in separate threads for concurrency.
>
>   
>> 2) Collection Process complete - The documentation says: "If a component
>> is replicated, only one of the instances will receive the
>> collectionProcessComplete call". I think replicated means there is more
>> than one instance of the same component. So why does only one of the
>> components receive that call? Is it by design that only one receives
>> it, and that we don't know which one? I think it is the same as with
>> a CAS, right?
>>     
>
> That's right. If it is required to have all CASes go through the same
> instance of an analytic then it should not be replicated.
>
>   
Part of the reason for having only one instance receive the
collectionProcessComplete call is that the model supports multiple
instances (deployed on separate machines, for example) listening to the
same queue. These instances are independent, and can come and go
during the processing of the collection. For instance, one might
crash, and another might be started up after some time to take up the
slack. When we send a collectionProcessComplete call, we do not know
how many instances are listening to the queue at that moment.
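
To make the shared-queue point concrete, here is a minimal sketch in
plain Java. It does not use UIMA AS or ActiveMQ at all: a
LinkedBlockingQueue stands in for the service's input queue, and the
class and method names (SharedQueueSketch, deliverOnce) are made up for
this illustration. It only shows the general property of competing
consumers on one queue: a single message is taken by exactly one of
them, and the sender never learns how many were listening.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not UIMA AS code.
class SharedQueueSketch {

    /**
     * Starts `instances` consumer threads on one shared queue, sends a
     * single message, and returns how many consumers received it.
     */
    static int deliverOnce(int instances, String message) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        AtomicInteger received = new AtomicInteger();
        List<Thread> consumers = new ArrayList<>();

        for (int i = 0; i < instances; i++) {
            Thread t = new Thread(() -> {
                try {
                    // Each instance waits briefly for a message, then gives up.
                    String m = queue.poll(200, TimeUnit.MILLISECONDS);
                    if (m != null) {
                        received.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            consumers.add(t);
            t.start();
        }

        // The sender has no view of how many consumers are attached.
        queue.add(message);

        for (Thread t : consumers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return received.get();
    }

    public static void main(String[] args) {
        // Three replicated "instances", one collectionProcessComplete message.
        System.out.println(deliverOnce(3, "collectionProcessComplete")); // prints 1
    }
}
```

Which of the three threads gets the message varies from run to run,
which mirrors the answer above: you cannot rely on a particular
instance seeing the call.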
>> 3) When the system processes a document without any CasMultiplier the
>> process call for this document blocks until the result is created and
>> returned?  So in the system only one CAS is created and used.
>>     
>
> Not sure what you mean by "the system" here. An AS primitive will
> process only one CAS at a time; 
-- unless it has "multiple instances" specified, but even then, each
instance only processes one CAS at a time --
> an AS aggregate can process more than
> one CAS at a time, based on the number of delegates and the size of
> the caspool specified at the top level of the service
>   
and the number of instances of its AS primitives.
>   
>> If the system has also CasMultiplier components the CasPool size for a
>> CasMultiplier component can limit the CASes that can be used/created at
>> the same time. But how does this work if the system collects the
>> documents itself? Is the call blocked until all the documents are
>> processed?
>>     
>
> If an AS aggregate has a CasMultiplier, additional CASes can be put
> into play concurrently, limited by the size of the CasMultiplier's
> caspool. The design relies on the proper choice of caspool sizes to
> enable the desired level of concurrent processing. The caspools also
> limit the number of requests that can build up in any input queue,
> avoiding queue overflows that are otherwise possible in asynchronous
> messaging systems.
>
>   
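
The way a fixed-size pool caps concurrency can be sketched in plain
Java. This is an analogy under stated assumptions, not the UIMA AS
implementation: a java.util.concurrent.Semaphore stands in for the
caspool, a permit stands in for a CAS, and the names CasPoolSketch and
maxInFlight are invented for the example. However many worker threads
or pending documents there are, the number in flight never exceeds the
pool size.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not UIMA AS code.
class CasPoolSketch {

    /**
     * Processes `documents` items with `workers` threads while a pool of
     * `poolSize` permits stands in for the CAS pool. Returns the highest
     * number of items that were in flight at the same time.
     */
    static int maxInFlight(int poolSize, int workers, int documents) {
        Semaphore casPool = new Semaphore(poolSize);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService executor = Executors.newFixedThreadPool(workers);

        for (int i = 0; i < documents; i++) {
            executor.submit(() -> {
                try {
                    casPool.acquire(); // blocks while the pool is empty
                    try {
                        int now = inFlight.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(5); // stand-in for analysis work
                    } finally {
                        inFlight.decrementAndGet();
                        casPool.release(); // return the "CAS" to the pool
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        executor.shutdown();
        try {
            executor.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // Eight workers and twenty documents, but only two "CASes" in the pool.
        System.out.println(maxInFlight(2, 8, 20) <= 2); // prints true
    }
}
```

The same back-pressure is what keeps the input queues bounded: a
producer that cannot obtain a CAS cannot enqueue another request.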
>> 4) The error handling seems to be similar to that in the CPM, with some
>> additional new features (real retry). Is there some reuse of the old code?
>>     
>
> No code reuse, only reuse of error handling concepts.
>
>   
>> 5) UIMA AS does not have a StatusListener that can/must be implemented
>> to get some information about the system. How are the results reported
>> in the good case? I understand that in an error case, the error with
>> some additional background information is returned.
>>     
>
> The custom flow controller is the key to application customization.
> User code there can register the state of all CASes processed, or
> route CASes to specific annotators designed to register such things.
>   

Marshall
