Sequoia appears not to handle network failure gracefully. My configuration:
- two MS Windows servers, A (10.0.0.61) and B (10.0.0.60).
- JBoss 4.0.5
- Sequoia 2.10.9 using the default Appia configuration.
- MySQL 5.0.41
- Server A running JBoss, a Sequoia controller, and a MySQL backend.
- Server B running a Sequoia controller and a MySQL backend.
- Controllers A and B are (the only) members of a cluster called "mySequoia", as confirmed on each machine using "show controllers".
- JBoss is configured to use only controller A, via "<connection-url>jdbc:sequoia://A/mySequoia</connection-url>".
- B's backend is disabled.

Everything works fine under load, with JBoss happily hitting controller A, which in turn updates the database backend on A. Then I unplug the Ethernet cable on server B, and everything hangs. JBoss stops and controller A stops, logging nothing. Controller B logs a warning that controller A has left the cluster. I wait for five minutes, and nothing happens except a transaction timeout on the JBoss server. After ten minutes I plug the Ethernet cable back in, and controller A logs this:

14:21:05,390 INFO continuent.hedera.gms Member(address=/10.0.0.60:49573, uid=10.0.0.60:49573) failed in Group(gid=mySequoia)
14:21:05,390 WARN controller.virtualdatabase.mySequoia Controller Member(address=/10.0.0.60:49573, uid=10.0.0.60:49573) has left the cluster.
14:21:05,390 INFO controller.virtualdatabase.mySequoia 1 requests were waiting responses from Member(address=/10.0.0.60:49573, uid=10.0.0.60:49573)
14:21:05,390 WARN controller.RequestManager.mySequoia 1 controller(s) died during execution of request 844424930133025
14:21:05,390 WARN controller.RequestManager.mySequoia Controller Member(address=/10.0.0.60:49573, uid=10.0.0.60:49573) is suspected of failure.
14:21:06,906 INFO controller.requestmanager.cleanup Waiting 120000ms for client of controller 281474976710656 to failover
14:23:06,906 INFO controller.requestmanager.cleanup Cleanup for controller 281474976710656 failure is completed.

and comes back to life, as does JBoss. (However, the cluster remains broken -- neither controller sees the other any more.)
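
For anyone reading the timestamps, here is a rough sketch that works out the gaps between the three distinct times in the log above. The names EVENTS, parse and gaps_ms are only mine for illustration, and the event labels are my own reading of the log lines, not anything Sequoia defines.

```python
from datetime import datetime, timedelta

# Timestamps copied from the controller A log above (HH:MM:SS,mmm).
# The labels are an informal reading of the log lines, not Sequoia terms.
EVENTS = [
    ("member failure reported", "14:21:05,390"),
    ("failover wait started", "14:21:06,906"),
    ("cleanup completed", "14:23:06,906"),
]


def parse(ts):
    """Parse a log4j-style time of day such as '14:21:05,390'."""
    return datetime.strptime(ts, "%H:%M:%S,%f")


def gaps_ms(events):
    """Return the gap in whole milliseconds between consecutive events."""
    times = [parse(ts) for _, ts in events]
    one_ms = timedelta(milliseconds=1)
    # Integer division of timedeltas avoids floating-point rounding.
    return [(later - earlier) // one_ms for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for (label, _), gap in zip(EVENTS[1:], gaps_ms(EVENTS)):
        print(f"{gap} ms until: {label}")
```

The second gap matches the 120000ms failover wait announced in the log, so once controller A finally noticed the failure it recovered on schedule. The real delay was the ten minutes before the failure was noticed at all.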

Originally I saw this problem with both controllers active and enabled, with JBoss configured to round-robin between them. I suspected a cluster communications bug, so I simplified the deployment to this single active controller to see what would happen. What's going on here? Is it a controller bug, an Appia bug, maybe a misconfiguration, or what?
_______________________________________________
Sequoia mailing list
[email protected]
https://forge.continuent.org/mailman/listinfo/sequoia
