[ https://issues.apache.org/activemq/browse/AMQ-443?page=all ]
     
Hiram Chirino resolved AMQ-443:
-------------------------------

    Fix Version: 4.0
     Resolution: Fixed

4.0 has implemented a more robust keepalive solution.  KeepAlive packets are 
only sent when the transport has been idle.  Also, while the transport is 
performing a blocking operation it is not considered idle.
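
A minimal sketch of the idle-based policy described above, in Java. The class IdleKeepAlivePolicy and its methods are hypothetical names chosen for illustration, not the actual ActiveMQ 4.0 classes. Time is passed in explicitly so the behavior is easy to test, and the sketch assumes that finishing a blocking operation counts as activity, which restarts the idle timer.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of an idle-based keepalive decision; not ActiveMQ's API.
public class IdleKeepAlivePolicy {
    private final long idleThresholdMillis;
    // Timestamp of the last read or write seen on the transport.
    private final AtomicLong lastActivityMillis;
    // Number of blocking operations (e.g. a write or a reconnect) in progress.
    private final AtomicInteger blockingOps = new AtomicInteger();

    public IdleKeepAlivePolicy(long idleThresholdMillis, long nowMillis) {
        this.idleThresholdMillis = idleThresholdMillis;
        this.lastActivityMillis = new AtomicLong(nowMillis);
    }

    // Called whenever a command is read from or written to the transport.
    public void recordActivity(long nowMillis) {
        lastActivityMillis.set(nowMillis);
    }

    public void beginBlockingOperation() {
        blockingOps.incrementAndGet();
    }

    // Assumption: completing a blocking operation counts as activity.
    public void endBlockingOperation(long nowMillis) {
        blockingOps.decrementAndGet();
        recordActivity(nowMillis);
    }

    // A keepalive is sent only if nothing happened for the whole threshold
    // and no blocking operation is currently in flight.
    public boolean shouldSendKeepAlive(long nowMillis) {
        if (blockingOps.get() > 0) {
            return false;
        }
        return nowMillis - lastActivityMillis.get() >= idleThresholdMillis;
    }

    public static void main(String[] args) {
        IdleKeepAlivePolicy policy = new IdleKeepAlivePolicy(1000, 0);
        System.out.println(policy.shouldSendKeepAlive(500));  // false: active recently
        System.out.println(policy.shouldSendKeepAlive(1500)); // true: idle for 1500 ms
        policy.beginBlockingOperation();
        System.out.println(policy.shouldSendKeepAlive(5000)); // false: blocking op in flight
        policy.endBlockingOperation(5000);
        System.out.println(policy.shouldSendKeepAlive(5500)); // false: activity at 5000
    }
}
```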


> ReliableTransport / KeepAlive algorithm does not work properly.
> ---------------------------------------------------------------
>
>          Key: AMQ-443
>          URL: https://issues.apache.org/activemq/browse/AMQ-443
>      Project: ActiveMQ
>         Type: Bug
>
>   Components: Transport, Broker
>     Versions: 3.2, 3.2.1
>  Environment: Solaris 8 / 10.  JDK 1.5
>     Reporter: Kevin Yaussy
>      Fix For: 4.0
>  Attachments: KeepAliveDaemon.java, ReliableTransportChannel.java
>
>
> The current implementation of KeepAliveDaemon.java will sometimes force 
> disconnections on well-behaved connections.  The problem may arise if a 
> connection goes away and the KeepAlive send to that channel blocks while 
> attempting to reconnect.  If this reconnection takes a while, other channels 
> that were responding fine may get their connections broken.  This happens 
> due to the following code in KeepAliveDaemon.java:
>     if ((channel.getLastReceiptTimestamp() + channel.getKeepAliveTimeout() * 2) < System.currentTimeMillis()) {
> or
>     } else if ((channel.getLastReceiptTimestamp() + channel.getKeepAliveTimeout()) < System.currentTimeMillis()) {
> Checking the receipt timestamp against System.currentTimeMillis() breaks 
> otherwise good connections.  The daemon examines the channels one after 
> another, so time spent blocked in a KeepAlive send (in examineChannel) for a 
> broken channel is counted against every channel examined after it.  If that 
> send takes longer than some good channel's KeepAliveTimeout, the good 
> connection gets broken.
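
The failure can be reproduced in a small simulation with a fake clock. SimChannel, naiveSweep and probeSweep below are hypothetical and are not the KeepAliveDaemon code: naiveSweep follows the two thresholds quoted above, while probeSweep is one possible alternative (not necessarily the reporter's fix) that only closes a channel which failed to answer a keepalive actually sent to it.

```java
import java.util.Arrays;
import java.util.List;

// Simulation only: all names are hypothetical and the clock is fake.
public class KeepAliveSweepDemo {

    static final class SimChannel {
        final long keepAliveTimeout;
        final long sendDelay;     // how long a keepalive send blocks (e.g. reconnecting)
        long lastReceipt;         // last time anything arrived from the peer
        long lastProbeSent = -1;  // when the last keepalive send completed, -1 if never
        boolean closed;

        SimChannel(long keepAliveTimeout, long sendDelay, long lastReceipt) {
            this.keepAliveTimeout = keepAliveTimeout;
            this.sendDelay = sendDelay;
            this.lastReceipt = lastReceipt;
        }
    }

    long now; // simulated System.currentTimeMillis()

    KeepAliveSweepDemo(long now) {
        this.now = now;
    }

    // Mirrors the quoted thresholds: each channel is judged against the current
    // time, which earlier blocking sends in the same sweep have pushed forward.
    void naiveSweep(List<SimChannel> channels) {
        for (SimChannel c : channels) {
            if (c.closed) continue;
            if (c.lastReceipt + c.keepAliveTimeout * 2 < now) {
                c.closed = true;                  // declared dead
            } else if (c.lastReceipt + c.keepAliveTimeout < now) {
                now += c.sendDelay;               // keepalive send blocks the sweep
            }
        }
    }

    // Alternative: close only a channel whose outstanding probe went unanswered,
    // so time the sweep spent blocked on other channels is not held against it.
    void probeSweep(List<SimChannel> channels) {
        for (SimChannel c : channels) {
            if (c.closed) continue;
            boolean probeOutstanding = c.lastProbeSent >= 0 && c.lastReceipt < c.lastProbeSent;
            if (probeOutstanding) {
                if (now - c.lastProbeSent > c.keepAliveTimeout) {
                    c.closed = true;
                }
            } else if (now - c.lastReceipt > c.keepAliveTimeout) {
                now += c.sendDelay;               // keepalive send blocks the sweep
                c.lastProbeSent = now;
            }
        }
    }

    public static void main(String[] args) {
        // Broken peer: the send blocks 5 s while reconnecting. Good peer: last heard at t=1400.
        SimChannel broken = new SimChannel(1000, 5000, 0);
        SimChannel good = new SimChannel(1000, 0, 1400);
        new KeepAliveSweepDemo(1500).naiveSweep(Arrays.asList(broken, good));
        System.out.println("naive: good closed = " + good.closed); // true

        SimChannel broken2 = new SimChannel(1000, 5000, 0);
        SimChannel good2 = new SimChannel(1000, 0, 1400);
        new KeepAliveSweepDemo(1500).probeSweep(Arrays.asList(broken2, good2));
        System.out.println("probe: good closed = " + good2.closed); // false
    }
}
```

In the naive sweep the good channel is closed because the 5 second blocking send to the broken channel is charged against it. The probe-based sweep avoids that, but it still sends sequentially; moving the sends off the sweep thread would also keep one slow channel from delaying keepalives to the others.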
> This can, in turn, cause some pretty bad behavior in the Broker.  While 
> testing and diagnosing this problem, I saw some brokers in a network of 
> brokers get stuck.  When the recovery sequence was interrupted by the 
> connections being closed, the broker would sometimes hang waiting for a 
> receipt, for example during an addConsumer (which eventually calls 
> syncSendWithReceipt).
> I have redone the logic in KeepAliveDaemon.java (which required a small 
> change to ReliableTransportChannel as well).  This now seems to work.
> I'm still a bit concerned about the blocking calls, though; this may be a 
> separate issue / bug.  There appears to be a mechanism to cancel 
> outstanding receipt waiters, but every once in a while that mechanism is 
> not called.  When that happens the broker basically gets stuck and never 
> really recovers.
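
For the receipt waiters, one defensive pattern is to bound every wait and to fail all pending waiters when the connection closes, so a missed cancellation cannot hang the broker indefinitely. ReceiptTracker and its methods are hypothetical illustrations built on java.util.concurrent.CompletableFuture (Java 8+, newer than the JDK 1.5 in the report), not ActiveMQ's syncSendWithReceipt implementation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of receipt waiters that cannot block forever.
public class ReceiptTracker {
    private final ConcurrentHashMap<Integer, CompletableFuture<Void>> pending =
            new ConcurrentHashMap<>();

    // Register a waiter before sending the command that expects a receipt.
    public CompletableFuture<Void> register(int commandId) {
        CompletableFuture<Void> waiter = new CompletableFuture<>();
        pending.put(commandId, waiter);
        return waiter;
    }

    // Called by the transport's reader when a receipt arrives.
    public void onReceipt(int commandId) {
        CompletableFuture<Void> waiter = pending.remove(commandId);
        if (waiter != null) {
            waiter.complete(null);
        }
    }

    // Called when the connection closes or breaks: fail every pending waiter.
    public void cancelAll(Throwable cause) {
        for (Integer id : pending.keySet()) {
            CompletableFuture<Void> waiter = pending.remove(id);
            if (waiter != null) {
                waiter.completeExceptionally(cause);
            }
        }
    }

    // Bounded wait: the caller is released even if cancelAll is never reached.
    public void awaitReceipt(int commandId, CompletableFuture<Void> waiter, long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        try {
            waiter.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } finally {
            pending.remove(commandId, waiter); // drop the entry on timeout too
        }
    }

    public int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) throws Exception {
        ReceiptTracker tracker = new ReceiptTracker();
        CompletableFuture<Void> first = tracker.register(1);
        tracker.onReceipt(1);
        tracker.awaitReceipt(1, first, 100); // returns normally
        try {
            tracker.awaitReceipt(2, tracker.register(2), 50);
        } catch (TimeoutException e) {
            System.out.println("command 2: timed out, waiter released");
        }
        CompletableFuture<Void> third = tracker.register(3);
        tracker.cancelAll(new IllegalStateException("connection closed"));
        try {
            tracker.awaitReceipt(3, third, 1000);
        } catch (ExecutionException e) {
            System.out.println("command 3: " + e.getCause().getMessage());
        }
    }
}
```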

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   https://issues.apache.org/activemq/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
