Jorn, I've checked the code and see that a log message is confusing in the
scenario you've described. The message says something like this:

Controller:Test Aggregate TAE Stopping Collocated Delegate Cas Multiplier:TestMultiplier

followed by something like this:

>>>> Cas Multiplier:Simple Text Segmenter Stopped Generating CASes from Input CAS:-61dfae6a:1267b2ab2ab:-7fe8

The first message is wrong and confusing. What is happening in your scenario
is the following:

1) CM receives input CAS C1
2) The CM generates a new CAS C2
3) C2 is sent to Delegate Service D1
4) D1 throws exception on C2
5) Aggregate receives the exception from D1 and sends a STOP message to the
CM, asking it to stop producing new CASes from C1. This message is not sent
to stop the CM itself!
6) CM returns C1 to the Aggregate
7) Aggregate determines that C1 has been marked as failed (because of the
C2 failure)
8) Aggregate returns C1, marked as failed, to the client
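
To make the sequence above easier to follow, here is a minimal Java sketch
that replays the eight steps. It is a simulation only and does not use the
UIMA AS API: the Cas, Multiplier, and Result classes, the delegateProcess()
method, and the point at which C1 gets marked as failed are all assumptions
made for illustration. The only thing it is meant to show is that the stop
request ends generation from C1 while the CM itself keeps running.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation only: none of these types are real UIMA AS classes.
public class StopRequestSketch {

    /** Stand-in for a CAS: an id, the input CAS it came from, and a failed flag. */
    static final class Cas {
        final String id;
        final Cas parent;   // null for an input CAS
        boolean failed;
        Cas(String id, Cas parent) { this.id = id; this.parent = parent; }
    }

    /** Stand-in for the CAS multiplier (CM). */
    static final class Multiplier {
        boolean running = true;     // never cleared: the stop request does not stop the CM
        boolean generating = true;  // cleared when asked to stop producing from the input CAS
        Cas generate(Cas input, String id) { return new Cas(id, input); }
        void stopGeneratingFrom(Cas input) { generating = false; }
    }

    /** What the client would observe, plus the event trace. */
    static final class Result {
        final List<String> events = new ArrayList<>();
        boolean cmRunning;
        boolean c1Failed;
    }

    /** Delegate D1 always fails in this sketch. */
    static void delegateProcess(Cas cas) {
        throw new IllegalStateException("D1 failed on " + cas.id);
    }

    static Result run() {
        Result r = new Result();
        Multiplier cm = new Multiplier();
        Cas c1 = new Cas("C1", null);
        r.events.add("1) CM receives input CAS C1");
        Cas c2 = cm.generate(c1, "C2");
        r.events.add("2) CM generates new CAS C2");
        r.events.add("3) C2 is sent to delegate D1");
        try {
            delegateProcess(c2);
        } catch (IllegalStateException e) {
            r.events.add("4) D1 throws exception on C2");
            // Assumption: the parent is marked failed when the child's failure arrives.
            c2.parent.failed = true;
            cm.stopGeneratingFrom(c1);
            r.events.add("5) Aggregate asks CM to stop generating new CASes from C1");
        }
        r.events.add("6) CM returns C1 to the aggregate");
        if (c1.failed) {
            r.events.add("7) Aggregate sees that C1 is marked as failed");
            r.events.add("8) Aggregate returns failed C1 to the client");
        }
        r.cmRunning = cm.running;
        r.c1Failed = c1.failed;
        return r;
    }

    public static void main(String[] args) {
        Result r = run();
        r.events.forEach(System.out::println);
        System.out.println("CM still running: " + r.cmRunning);
    }
}
```

Running main() prints the eight steps in order and then reports that the
stand-in CM is still running, which is the distinction the current log
message obscures.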

The confusion comes from the message that the aggregate logs: "Stopping
Collocated Delegate Cas Multiplier:".
It should say something like "Sending Request to CM to Stop Generating New
CASes from CAS id: xxx".

Can you confirm that the client receives C1 with an exception?

JC



On Fri, Jan 29, 2010 at 11:30 AM, Jörn Kottmann <[email protected]> wrote:

> Jaroslaw Cwiklik wrote:
>
>> Jorn, I need more information to determine the cause of the problem. Can
>> you clarify your deployment? I understand you have worker nodes running
>> UIMA AS services. These services connect to a remote Web Service. I
>> presume that this connection is made by your code in the AE. Is this
>> correct?
>>
>> You pasted log messages that indicate that a service is stopping. Are there
>> any exceptions in the log? Also, can you increase the log level to see if
>> there is more revealing information? Another idea is to attach JConsole to
>> the UIMA AS service that is not processing messages and look at the JVM
>> threads to see if there is a hang somewhere. Also, check the UIMA JMX
>> MBeans. There should be MBeans for the CMs and their CasPools. Are the
>> pools empty, indicating that CASes are stuck somewhere?
>>
>>
> It happened again.
>
> This time I checked the process with JMX. The CAS pools were all empty,
> and 11 CASes were waiting in the input queue of an AE. Does that mean that
> the AE.process method blocks?
>
> Can my hang be explained by blocking AE.process methods? Is there a
> default timeout for that case?
>
> Actually, since I last reported the hang issue it has happened two more
> times, today and a few days ago. I have a core dump of the process from a
> few days ago. I checked the stack traces and it looked like the CM at the
> beginning of the pipe was waiting in next() -> getEmptyCAS(), but it locked
> because the CAS pools were empty (?).
> All other threads were waiting in framework code, and not in my
> implementation code.
>
> The stopping message in the log file always occurs when an exception is
> thrown from an AE which is behind a CM. I opened a Jira issue to
> demonstrate that.
>
> Jörn
>
