On 25.9.2011 14:31, Radim Kolar wrote:
On 25.9.2011 9:29, Philippe wrote:
I have this happening on 0.8.x. It looks to me as though this happens when the node is under heavy load, such as unthrottled compactions or a huge GC.
I have this problem too. Node-down detection must be improved: either increase the timeouts a bit or make more attempts before deciding that a node is down. If a node is under load (especially if there is swap activity), it is often marked unavailable.
Also, an algorithm like the one the BGP routing protocol uses to prevent route flapping needs to be implemented. It should guard against cases like this:

INFO [GossipTasks:1] 2011-09-25 14:56:36,544 Gossiper.java (line 695) InetAddress /216.17.99.40 is now dead.
INFO [GossipStage:1] 2011-09-25 14:56:36,641 Gossiper.java (line 681) InetAddress /216.17.99.40 is now UP
INFO [GossipTasks:1] 2011-09-25 14:56:37,823 Gossiper.java (line 695) InetAddress /216.17.99.40 is now dead.
INFO [GossipStage:1] 2011-09-25 14:56:37,971 Gossiper.java (line 681) InetAddress /216.17.99.40 is now UP

Route flap protection works like this: announce the first state change to the peer immediately. If the state changes again in less than 30 seconds, hold that next change back and announce it only once the 30 seconds (for example) have passed. If the route keeps flapping up/down, increase the reporting interval to 60 seconds, and so on.
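
To make the idea concrete, here is a minimal sketch of such a damper in Python. It is not Cassandra code and not the exact BGP algorithm. The class name FlapDamper, the methods observe and poll, the 240-second cap, and the doubling schedule are all assumptions for illustration; only the 30-second and 60-second intervals come from the description above.

```python
class FlapDamper:
    """Hypothetical per-node damper for up/down announcements.

    Times are plain seconds supplied by the caller, so the sketch has no
    dependency on a real clock or on any gossip implementation.
    """

    def __init__(self, base_hold=30.0, max_hold=240.0):
        self.base_hold = base_hold      # 30 s, as in the example above
        self.max_hold = max_hold        # assumed upper bound on the interval
        self.hold = base_hold           # current reporting interval
        self.reported = None            # last state announced to peers
        self.last_report = float("-inf")
        self.pending = None             # newest state that is being held back

    def observe(self, state, now):
        """Feed a raw state change; return the state to announce, or None."""
        if state == self.reported:
            # The node flapped back to what peers already believe.
            self.pending = None
            return None
        if self.pending is None and now - self.last_report >= self.hold:
            # The node was stable for a full interval: announce at once
            # and fall back to the base interval.
            self.hold = self.base_hold
            return self._announce(state, now)
        # Changed again inside the interval: hold the change back.
        self.pending = state
        return self.poll(now)

    def poll(self, now):
        """Call periodically; releases a held-back change once it is due."""
        if self.pending is None or now - self.last_report < self.hold:
            return None
        # The node kept flapping, so lengthen the interval (30 -> 60 -> ...).
        self.hold = min(self.hold * 2, self.max_hold)
        return self._announce(self.pending, now)

    def _announce(self, state, now):
        self.reported, self.last_report, self.pending = state, now, None
        return state
```

Fed with the four log lines above, this sketch would announce the first "dead" immediately, suppress the following UP/dead/UP changes, and announce "UP" 30 seconds after the first report if the node is still up by then. After that the interval becomes 60 seconds until the node has stayed quiet for a full interval.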
