Hi Don,

> Everything works fine under load, with JBoss happily hitting
> controller A, which in turn updates the database backend on A. Then I
> unplug the ethernet cable on server B, and everything hangs. JBoss
> stops, controller A stops, logging nothing. Controller B logs a
> warning that controller A has left the cluster. I wait for five
> minutes, nothing happens except a transaction timeout on the JBoss
> server. After ten minutes I plug the ethernet back in, and controller
> A logs this:
>
> 14:21:05,390 INFO continuent.hedera.gms
> Member(address=/10.0.0.60:49573, uid=10.0.0.60:49573) failed in
>
> and comes back to life, as does JBoss. (However, the cluster remains
> broken -- neither controller sees the other any more.)

That's the expected behavior. After a cluster split, Sequoia waits 120 seconds for the lost node to rejoin. If the node does not rejoin within those 120 seconds, the cluster is split. Try waiting just 60 seconds before plugging the cable back in; you should see the cluster working normally again.

There is no "pinging" logic in Sequoia that checks whether the lost node is working again (after the 120 seconds). Such logic is very helpful, but you have to implement it yourself on top of the Sequoia API.

hth
Stefan
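
For illustration, the kind of "pinging" logic described above could look roughly like the sketch below. It only shows the polling part: a plain TCP connect is used as the liveness check, and what happens once the node is back is left to a callback. The names is_reachable, wait_for_peer and on_back are hypothetical and not part of Sequoia; the actual rejoin step would have to be written against the Sequoia API and is not shown here.

```python
import socket
import time
from typing import Callable


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def wait_for_peer(
    probe: Callable[[], bool],
    on_back: Callable[[], None],
    interval: float = 10.0,
    max_attempts: int = 0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll probe() until it succeeds, then call on_back() exactly once.

    max_attempts=0 means poll forever. Returns True if the peer came back,
    False if max_attempts probes failed.
    """
    attempt = 0
    while max_attempts == 0 or attempt < max_attempts:
        attempt += 1
        if probe():
            # Hypothetical hook: the place to trigger a rejoin or alert.
            on_back()
            return True
        sleep(interval)
    return False
```

A watchdog process would call wait_for_peer with a probe such as lambda: is_reachable(host, port), where host and port are the other controller's address, and an on_back function that performs whatever recovery your setup needs. Passing the probe and sleep functions as parameters keeps the loop easy to test without a real network.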
_______________________________________________ Sequoia mailing list [email protected] https://forge.continuent.org/mailman/listinfo/sequoia
