I've run into a situation several times where an asynchronous message exchange hangs when it is sent to a service running on a clustered ServiceMix instance that has unexpectedly lost its network connection or has crashed. When a ServiceMix instance shuts down normally, it broadcasts service deregistrations to all other clustered ServiceMix instances. If a clustered ServiceMix instance crashes or loses connectivity to the rest of the cluster, the other instances have no way of knowing that they can't route messages to its services anymore. If a message is routed to a service on that unreachable ServiceMix instance, the message exchange seems to hang in limbo. The service that sent it is never notified that the message exchange can't be delivered. You could of course, after a certain period of time, assume that the message exchange isn't going to be returned, but I don't think that will free up the thread being consumed by the waiting message exchange.
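For what it's worth, the "give up after a certain period of time" workaround could look roughly like the sketch below. It is plain Java and does not use any ServiceMix API: PendingExchangeTracker, register, complete, and shutdown are names made up for illustration, and the exchange id is assumed to be just a string. It only tells the sender that no reply arrived in time; it does nothing to release whatever ServiceMix itself may still be holding for the exchange.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of ServiceMix: remembers exchanges that were
// sent asynchronously and runs a callback if no reply shows up in time.
class PendingExchangeTracker {
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
    private final Set<String> pending = ConcurrentHashMap.newKeySet();

    // Call right after sending. Exchange ids are assumed to be unique.
    void register(String exchangeId, long timeoutMillis, Runnable onTimeout) {
        pending.add(exchangeId);
        timer.schedule(() -> {
            // Only fire if the reply has not already claimed this exchange.
            if (pending.remove(exchangeId)) {
                onTimeout.run();
            }
        }, timeoutMillis, TimeUnit.MILLISECONDS);
    }

    // Call when a reply arrives. Returns false if the exchange was unknown
    // or had already timed out, so a late reply can be ignored.
    boolean complete(String exchangeId) {
        return pending.remove(exchangeId);
    }

    // Stops the timer thread; outstanding timeouts are dropped.
    void shutdown() {
        timer.shutdownNow();
    }
}
```

The sender would call register() after each asynchronous send and complete() from wherever replies are handled. Since the reply and the timeout both go through pending.remove(), only one of them ever wins for a given exchange.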
It doesn't appear that ServiceMix has a default way of detecting that a clustered service is no longer routable, so a message exchange sent to it ends up hanging in limbo. Or do MessageExchanges have timeouts that are just set to a very long duration by default? Any help on how to handle this situation properly would be appreciated. It would be nice if there were a way for ServiceMix instances to detect when an ActiveMQ NetworkConnector has lost its connection to another ServiceMix instance and then deregister that instance's services so that messages can't be routed to them anymore.

Ryan
