On 25.9.2011 14:31, Radim Kolar wrote:
On 25.9.2011 9:29, Philippe wrote:
I see this happening on 0.8.x. It looks to me like it happens when
the node is under heavy load, such as unthrottled compactions or a
huge GC.
I have this problem too. Node-down detection must be improved: either
increase the timeouts a bit or make more tries before deciding. If a
node is under load (especially if there is swap activity), it is often
marked unavailable.
There also needs to be an algorithm like the one used in the BGP
routing protocol to prevent route flap. It should guard against cases
like this:
INFO [GossipTasks:1] 2011-09-25 14:56:36,544 Gossiper.java (line 695)
InetAddress /216.17.99.40 is now dead.
INFO [GossipStage:1] 2011-09-25 14:56:36,641 Gossiper.java (line 681)
InetAddress /216.17.99.40 is now UP
INFO [GossipTasks:1] 2011-09-25 14:56:37,823 Gossiper.java (line 695)
InetAddress /216.17.99.40 is now dead.
INFO [GossipStage:1] 2011-09-25 14:56:37,971 Gossiper.java (line 681)
InetAddress /216.17.99.40 is now UP
Route flap protection works like this: announce the first state change
to peers immediately, then hold the next change for, say, 30 seconds if
the state changes again in less than 30 seconds. If the route keeps
flapping up and down, increase the report delay to 60 seconds, and so on.
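
To make the idea concrete, here is a rough sketch of such a hold-down
timer in Python. It is only an illustration of the scheme described
above, not Cassandra code and not the full BGP damping algorithm (which
uses a decaying penalty value). The class name FlapDamper, the update()
method, and the 30 second base and 480 second cap are all assumptions
made up for this example.

```python
class FlapDamper:
    """Hold-down timer for up/down reports about one node (hypothetical helper)."""

    def __init__(self, base_hold=30.0, max_hold=480.0):
        self.base_hold = base_hold  # assumed initial quiet period, in seconds
        self.max_hold = max_hold    # assumed upper bound for the quiet period
        self.hold = base_hold       # current quiet period
        self.reported = None        # last state announced to peers
        self.last_seen = None       # last raw state passed to update()
        self.window_end = 0.0       # announcements are held until this time
        self.flapped = False        # did the state change inside the window?

    def _announce(self, state, now):
        # Announce a state and open a new quiet window after it.
        self.reported = state
        self.window_end = now + self.hold
        self.flapped = False
        return state

    def update(self, state, now):
        """Feed the currently observed state; return a state to announce, or None."""
        changed = state != self.last_seen
        self.last_seen = state

        if self.reported is None:
            # First state change: announce immediately.
            return self._announce(state, now)

        if now < self.window_end:
            # Inside the quiet window: remember the flap, announce nothing.
            self.flapped = self.flapped or changed
            return None

        if self.flapped:
            # The node kept flapping during the window: double the next window.
            self.hold = min(self.hold * 2, self.max_hold)
            if state != self.reported:
                return self._announce(state, now)
            self.window_end = now + self.hold
            self.flapped = False
            return None

        # A full window passed without any change: back to the base delay.
        self.hold = self.base_hold
        if state != self.reported:
            return self._announce(state, now)
        return None
```

Fed with the four log lines above (dead, UP, dead, UP within about 1.5
seconds), this would announce only the first "dead" right away and then
stay silent until the 30 second window ends, at which point it reports
whatever state the node is in and doubles the next window to 60 seconds.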