Jaroslaw Cwiklik wrote:
Jorn, are there any messages in the service log that say that the client is
in the DoNotProcess List indicating a connection failure to the reply queue?
No, there are two log messages (damn, it's still at INFO log level, will change that)
at around the time the system failed:

1/29/10 3:09:40 AM - 16: org.apache.uima.aae.handler.input.ProcessRequestHandler_impl.handleProcessRequestFromRemoteClient: INFO: Duplicate Request With Cas Reference Id: 2d0e21bb:12664a22384:7eac Received. Ignoring Duplicate.

1/29/10 3:36:45 AM - 23: org.apache.uima.adapter.jms.activemq.JmsOutputChannel$ConnectionTimer.startTimer: INFO: Inactivity Timer Expired. Thread: Controller:SearchEngine:Reply TimerThread-:ID:dkcphlinh1master-49059-1264408732600-0:0:1:14749726442764009 Controller: SearchEngine Timeout Value: 1,800,000 Endpoint Name: ID:dkcphlinh1master-49059-1264408732600-0:0:1
(last message in the log file)

The Duplicate Request message has also appeared a few times in the log file on other days.
If this is not the case, can you confirm that there are messages in the
service queue indicating that the service is hung somewhere.
Yes, there are messages in the service queue.

Maybe the hang from a few days ago and the one today
have different causes.

I suspect that the process method in my last AE (the one after the CM) blocked. At the end of the AE.process method it calls two web service methods, the first
to save analysis results, and the second to mark the article as processed.

In our database we could see that the first method was called and the second was never called,
but there was also no exception in the log files, which might indicate that
the process method just blocked at the end. There is no way to leave the
process method without either throwing an exception or finishing the call, and
we saw neither.

I hope I can provide you with more data when it happens again, so we can find the cause. I will set timeouts on my web service calls, to make sure a blocked process method is not the reason for the hang. Maybe there is also a timeout I can set for the process method.
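For the web service timeouts, something along these lines might work as a generic guard. It is only a minimal sketch: the TimedCall class and callWithTimeout method are names I made up for illustration, not part of UIMA or of any web service library, and it assumes the blocking call can be wrapped in a Callable. If the web service client exposes its own connect/read timeouts, setting those directly is probably the better first step.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (not a UIMA API): runs a blocking call on a worker
// thread and gives up waiting after a fixed time.
public class TimedCall {

    // Daemon threads, so a call that never returns cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "timed-call");
        t.setDaemon(true);
        return t;
    });

    public static <T> T callWithTimeout(Callable<T> call, long timeoutMillis) throws Exception {
        Future<T> future = POOL.submit(call);
        try {
            // Wait at most timeoutMillis for the call to finish.
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupt the worker; this only helps if the call reacts to interruption.
            future.cancel(true);
            throw e;
        } catch (ExecutionException e) {
            // Rethrow the original failure of the wrapped call where possible.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        }
    }
}
```

Inside process() the two web service calls would then each be wrapped, e.g. TimedCall.callWithTimeout(() -> client.markProcessed(id), 60000), where client and markProcessed stand in for my own code. A TimeoutException would then show up in the log instead of a silent hang. Note that the worker thread may still be stuck in the background if the underlying call ignores interruption, so this makes the hang visible rather than curing it.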

From the hang a few days ago I have a core dump, but I am not sure how I can find out where all the CASes are ... in the stack traces I can see that it is trying to get an empty CAS in the CM.next() method, but it seems that it is waiting for new CASes to
become available.

Jörn
