[ https://issues.apache.org/activemq/browse/AMQ-443?page=all ]

Hiram Chirino resolved AMQ-443:
-------------------------------
Fix Version: 4.0
 Resolution: Fixed

4.0 has implemented a more robust keepalive solution. KeepAlive packets are only sent when the transport has been idle. Also, while the transport is performing a blocking operation, it is not considered idle.

> ReliableTransport / KeepAlive algorithm does not work properly.
> ---------------------------------------------------------------
>
>          Key: AMQ-443
>          URL: https://issues.apache.org/activemq/browse/AMQ-443
>      Project: ActiveMQ
>         Type: Bug
>   Components: Transport, Broker
>     Versions: 3.2, 3.2.1
>  Environment: Solaris 8 / 10. JDK 1.5
>     Reporter: Kevin Yaussy
>      Fix For: 4.0
>  Attachments: KeepAliveDaemon.java, ReliableTransportChannel.java
>
>
> The current implementation of KeepAliveDaemon.java will sometimes force
> disconnections on well-behaved connections. The problem may arise if there
> is a connection which goes away, and the KeepAlive send to that channel
> blocks while attempting to reconnect. If this reconnection takes a while,
> then other channels that were responding fine may get their connections
> broken. This happens due to the following code in KeepAliveDaemon.java:
>     if ((channel.getLastReceiptTimestamp() + channel.getKeepAliveTimeout() * 2) < System.currentTimeMillis()) {
> or
>     } else if ((channel.getLastReceiptTimestamp() + channel.getKeepAliveTimeout()) < System.currentTimeMillis()) {
> The fact that the receipt timestamp is checked against
> System.currentTimeMillis() causes the code to break otherwise good
> connections. If a KeepAlive send (in examineChannel) for a broken channel
> takes longer than some good channel's KeepAliveTimeout, then the good
> connection gets broken.
> This can, in turn, cause some pretty bad behavior in the Broker. While
> testing and diagnosing this problem, I could get some brokers in a network of
> brokers stuck. The sequence of events during recovery, which gets interrupted
> due to closing the connections, would sometimes lead to the broker hanging
> while waiting for a receipt, such as during an addConsumer (which eventually
> calls syncSendWithReceipt).
> I have redone the logic in KeepAliveDaemon.java (which required a small
> change to ReliableTransportChannel as well). This now seems to work.
> I'm a bit concerned about the blocking calls, though. This may be a
> different issue / bug. I thought it looked like there was a mechanism to
> cancel outstanding receipt waiters, but every once in a while that
> mechanism would not get called. This results in the broker basically getting
> stuck, and it never really recovers.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   https://issues.apache.org/activemq/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
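
For illustration only, the behaviour described in the resolution note (keepalives sent only after the transport has been idle, and a transport in a blocking operation never counted as idle) might be sketched as follows. This is not the ActiveMQ 4.0 code and not the attached KeepAliveDaemon.java. The class IdleKeepAliveMonitor, its methods, and the Action enum are hypothetical names, and the 1x / 2x timeout thresholds are an assumption carried over from the checks quoted in the report.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-transport idle tracker; an assumption, not the actual ActiveMQ code. */
final class IdleKeepAliveMonitor {

    enum Action { NONE, SEND_KEEPALIVE, DISCONNECT }

    private final long keepAliveTimeoutMillis;
    private final AtomicLong lastActivityMillis;
    private final AtomicInteger blockingOperations = new AtomicInteger();

    IdleKeepAliveMonitor(long keepAliveTimeoutMillis, long nowMillis) {
        this.keepAliveTimeoutMillis = keepAliveTimeoutMillis;
        this.lastActivityMillis = new AtomicLong(nowMillis);
    }

    /** Record any read or write seen on the transport. */
    void onActivity(long nowMillis) {
        lastActivityMillis.set(nowMillis);
    }

    /** Call before an operation that may block, such as a reconnect or a synchronous send. */
    void beginBlockingOperation() {
        blockingOperations.incrementAndGet();
    }

    /** Call when the blocking operation returns; the idle clock restarts from this point. */
    void endBlockingOperation(long nowMillis) {
        // Refresh the timestamp before clearing the flag so that a concurrent
        // check() never sees "not blocked" together with a stale timestamp.
        lastActivityMillis.set(nowMillis);
        blockingOperations.decrementAndGet();
    }

    /** Decide what the keepalive thread should do for this transport at nowMillis. */
    Action check(long nowMillis) {
        if (blockingOperations.get() > 0) {
            return Action.NONE; // a blocked transport is not considered idle
        }
        long idleMillis = nowMillis - lastActivityMillis.get();
        if (idleMillis > 2 * keepAliveTimeoutMillis) {
            return Action.DISCONNECT;
        }
        if (idleMillis > keepAliveTimeoutMillis) {
            return Action.SEND_KEEPALIVE;
        }
        return Action.NONE;
    }
}
```

Passing the current time in as a parameter, rather than calling System.currentTimeMillis() inside check(), keeps the decision independent of how long the keepalive thread spent on some other channel, and makes the logic easy to exercise with fixed timestamps.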