Sequoia appears not to handle network failure gracefully. My configuration:
- two MS Windows servers, A (10.0.0.61) and B (10.0.0.60).
- JBoss 4.0.5
- Sequoia 2.10.9 using Appia default configuration.
- MySQL 5.0.41
- Server A running JBoss, Sequoia controller, MySQL backend.
- Server B running Sequoia controller, MySQL backend.
- Controller A and B are (the only) members of a cluster called
"mySequoia", as confirmed on each machine using "show controllers".
- JBoss is configured to use only controller A, via
"<connection-url>jdbc:sequoia://A/mySequoia</connection-url>".
- B's backend is disabled.
Everything works fine under load, with JBoss happily hitting controller
A, which in turn updates the database backend on A. Then I unplug the
ethernet cable on server B, and everything hangs. JBoss stops,
controller A stops, logging nothing. Controller B logs a warning that
controller A has left the cluster. I wait for five minutes, nothing
happens except a transaction timeout on the JBoss server. After ten
minutes I plug the ethernet back in, and controller A logs this:
14:21:05,390 INFO continuent.hedera.gms Member(address=/10.0.0.60:49573, uid=10.0.0.60:49573) failed in Group(gid=mySequoia)
14:21:05,390 WARN controller.virtualdatabase.mySequoia Controller Member(address=/10.0.0.60:49573, uid=10.0.0.60:49573) has left the cluster.
14:21:05,390 INFO controller.virtualdatabase.mySequoia 1 requests were waiting responses from Member(address=/10.0.0.60:49573, uid=10.0.0.60:49573)
14:21:05,390 WARN controller.RequestManager.mySequoia 1 controller(s) died during execution of request 844424930133025
14:21:05,390 WARN controller.RequestManager.mySequoia Controller Member(address=/10.0.0.60:49573, uid=10.0.0.60:49573) is suspected of failure.
14:21:06,906 INFO controller.requestmanager.cleanup Waiting 120000ms for client of controller 281474976710656 to failover
14:23:06,906 INFO controller.requestmanager.cleanup Cleanup for controller 281474976710656 failure is completed.
and comes back to life, as does JBoss. (However, the cluster remains
broken -- neither controller sees the other any more.)
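For what it's worth, the two cleanup entries in the log are exactly 120000 ms apart, which matches the failover wait the controller announces, so controller A only recovers once that wait has run its course after the cable is plugged back in. Here is a small Python sketch for checking gaps between log timestamps. The helper name elapsed_ms is my own and not part of Sequoia, and it assumes both timestamps fall on the same day:

```python
from datetime import datetime, timedelta

# Timestamp layout used in the controller log, e.g. "14:21:06,906".
LOG_TIME_FORMAT = "%H:%M:%S,%f"


def elapsed_ms(start: str, end: str) -> int:
    """Return whole milliseconds between two same-day log timestamps."""
    delta = datetime.strptime(end, LOG_TIME_FORMAT) - datetime.strptime(
        start, LOG_TIME_FORMAT
    )
    # Integer division by a timedelta avoids float rounding errors.
    return delta // timedelta(milliseconds=1)


if __name__ == "__main__":
    # Gap between "Waiting 120000ms ..." and "Cleanup ... is completed."
    print(elapsed_ms("14:21:06,906", "14:23:06,906"))
```

The same helper can be pointed at any other pair of entries, for example the failure detection at 14:21:05,390 and the start of the cleanup wait at 14:21:06,906.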
Originally I saw this problem with both controllers active and enabled,
with JBoss configured to round-robin them. I suspected a cluster
communications bug, so I simplified the deployment to this single active
controller to see what would happen.
What's going on here? Is it a controller bug, an Appia bug, maybe a
misconfiguration, or what?