Jaroslaw Cwiklik wrote:
Jorn, are there any messages in the service log that say that the client is
in the DoNotProcess List indicating a connection failure to the reply queue?
No, but there are two log messages (damn, it's still at INFO log level,
will change that) from around the time the system failed:
1/29/10 3:09:40 AM - 16:
org.apache.uima.aae.handler.input.ProcessRequestHandler_impl.handleProcessRequestFromRemoteClient:
INFO: Duplicate Request With Cas Reference Id: 2d0e21bb:12664a22384:7eac
Received. Ignoring Duplicate.
1/29/10 3:36:45 AM - 23:
org.apache.uima.adapter.jms.activemq.JmsOutputChannel$ConnectionTimer.startTimer:
INFO: Inactivity Timer Expired. Thread: Controller:SearchEngine:Reply
TimerThread-:ID:dkcphlinh1master-49059-1264408732600-0:0:1:14749726442764009
Controller: SearchEngine Timeout Value: 1,800,000 Endpoint Name:
ID:dkcphlinh1master-49059-1264408732600-0:0:1
(last message in the log file)
The Duplicate Request message has also appeared a few times in the log
file on other days.
If this is not the case, can you confirm that there are messages in the
service queue indicating that the service is hung somewhere?
Yes there are messages in the service queue.
Maybe the hang from a few days ago and the one today have different
causes.
I suspect that the process method in my last AE, the one after the CM,
blocked.
At the end of the AE.process method it calls two web service methods,
the first to save the analysis results, and the second to mark the
article as processed.
In our database we could see that the first was called and the second
was never called, but there was also no exception in the log files,
which might indicate that the process method simply blocked at the end.
The only ways to leave the process method are to throw an exception or
to finish the call, and we saw neither.
I hope I can provide you with more data when it happens again, so we can
find the cause.
I will set timeouts on my web service calls, to make sure a blocked
process method is not
the reason for the hang. Maybe there is also a timeout I can set for the
process method.
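
A minimal sketch of what bounding such a call could look like, assuming
nothing about the web service client in use. TimedCall and
callWithTimeout are hypothetical names, not part of UIMA or any web
service library. The call runs on a worker thread and the caller stops
waiting after a deadline. If the client exposes its own connect and read
timeouts, setting those is still preferable, because interrupting a
thread does not always unblock socket I/O.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of UIMA.
final class TimedCall {

    // Runs 'call' on a worker thread and waits at most timeoutMillis for it.
    // Throws TimeoutException if the call has not finished by then.
    static <T> T callWithTimeout(Callable<T> call, long timeoutMillis) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(call);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Ask the worker to stop; this only interrupts it, which may not
            // unblock a thread that is stuck in socket I/O.
            future.cancel(true);
            throw e;
        } catch (ExecutionException e) {
            // Rethrow the exception thrown by the call itself when possible.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a web service call that returns quickly.
        String result = callWithTimeout(() -> "saved", 1000);
        System.out.println(result);

        // Stand-in for a web service call that blocks.
        try {
            String never = callWithTimeout(() -> {
                Thread.sleep(5000);
                return "never";
            }, 200);
            System.out.println(never);
        } catch (TimeoutException e) {
            System.out.println("timed out");
        }
    }
}
```

Inside AE.process, a TimeoutException from such a wrapper could be
rethrown as an exception from the process method, so that a blocked call
shows up in the log instead of hanging silently.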
I have a core dump from the hang a few days ago, but I am not sure how
I can find out where all the CASes are ... in the stack traces I can
see that it's trying to get an empty CAS in the CM.next() method, but it
seems that it's waiting for new CASes to become available.
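
For the next hang, a thread dump of the live JVM (for example with
jstack) may be easier to read than a core dump. The sketch below shows
the same idea from inside the process, using only the standard
Thread.getAllStackTraces() call. ThreadDumpSketch and dumpThreads are
hypothetical names, and the name filter is an assumption about how the
interesting threads are named. It would show whether a thread is sitting
in AE.process or waiting in CM.next(), but not where the CASes are.

```java
import java.util.Map;

// Hypothetical helper, not part of UIMA.
final class ThreadDumpSketch {

    // Returns the name, state and stack trace of every live thread whose
    // name contains nameFilter (an empty filter matches all threads).
    static String dumpThreads(String nameFilter) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            if (!thread.getName().contains(nameFilter)) {
                continue;
            }
            out.append(thread.getName())
               .append(" [").append(thread.getState()).append("]\n");
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Print all threads; a WAITING or BLOCKED state on the thread that
        // runs the AE would support the blocked-process-method theory.
        System.out.print(dumpThreads(""));
    }
}
```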
Jörn