[Sequoia] controller hangs on broken network

Don Isenor Fri, 31 Aug 2007 14:58:23 -0700

Sequoia appears not to handle network failure gracefully. My configuration:
- two MS Windows servers, A (10.0.0.61) and B (10.0.0.60).
- JBoss 4.0.5
- Sequoia 2.10.9 using Appia default configuration.
- MySql 5.0.41
- Server A running JBoss, Sequoia controller, MySql backend.
- Server B running Sequoia controller, MySql backend.

- Controllers A and B are the only members of a cluster called "mySequoia", as confirmed on each machine using "show controllers".
- JBoss is configured to use only controller A, via "<connection-url>jdbc:sequoia://A/mySequoia</connection-url>".

- B's backend is disabled.

Everything works fine under load, with JBoss happily hitting controller A, which in turn updates the database backend on A. Then I unplug the ethernet cable on server B, and everything hangs. JBoss stops and controller A stops, logging nothing. Controller B logs a warning that controller A has left the cluster. I wait for five minutes; nothing happens except a transaction timeout on the JBoss server. After ten minutes I plug the ethernet cable back in, and controller A logs this:

14:21:05,390 INFO continuent.hedera.gms Member(address=/10.0.0.60:49573, uid=10.0.0.60:49573) failed in Group(gid=mySequoia)
14:21:05,390 WARN controller.virtualdatabase.mySequoia Controller Member(address=/10.0.0.60:49573, uid=10.0.0.60:49573) has left the cluster.
14:21:05,390 INFO controller.virtualdatabase.mySequoia 1 requests were waiting responses from Member(address=/10.0.0.60:49573, uid=10.0.0.60:49573)
14:21:05,390 WARN controller.RequestManager.mySequoia 1 controller(s) died during execution of request 844424930133025
14:21:05,390 WARN controller.RequestManager.mySequoia Controller Member(address=/10.0.0.60:49573, uid=10.0.0.60:49573) is suspected of failure.
14:21:06,906 INFO controller.requestmanager.cleanup Waiting 120000ms for client of controller 281474976710656 to failover
14:23:06,906 INFO controller.requestmanager.cleanup Cleanup for controller 281474976710656 failure is completed.
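
As a sanity check on the last two log entries, here is a minimal Python sketch that computes the gap between the "Waiting 120000ms" entry and the "Cleanup ... is completed" entry. The helper names parse_timestamp and elapsed_ms are my own and not part of Sequoia. The sketch reads only the leading timestamps and assumes both entries were logged on the same day.

```python
from datetime import datetime

# The two cleanup entries copied from the controller A log above.
LOG_LINES = [
    "14:21:06,906 INFO controller.requestmanager.cleanup "
    "Waiting 120000ms for client of controller 281474976710656 to failover",
    "14:23:06,906 INFO controller.requestmanager.cleanup "
    "Cleanup for controller 281474976710656 failure is completed.",
]


def parse_timestamp(line):
    """Parse the leading HH:MM:SS,mmm timestamp of a log line (hypothetical helper)."""
    stamp = line.split(" ", 1)[0]
    # %f accepts 1 to 6 digits, so "906" is read as 906 milliseconds.
    return datetime.strptime(stamp, "%H:%M:%S,%f")


def elapsed_ms(first_line, second_line):
    """Return the whole milliseconds between two same-day log lines."""
    delta = parse_timestamp(second_line) - parse_timestamp(first_line)
    return int(delta.total_seconds() * 1000)


if __name__ == "__main__":
    print(elapsed_ms(*LOG_LINES))
```

The gap equals the logged 120000ms wait, so the two minutes between these entries appear to be the client failover wait itself rather than extra hang time.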

and comes back to life, as does JBoss. (However, the cluster remains broken: neither controller sees the other any more.)

Originally I saw this problem with both controllers active and enabled, with JBoss configured to round-robin them. I suspected a cluster communications bug, so I simplified the deployment to this single active controller to see what would happen.

What's going on here? Is it a controller bug, an Appia bug, maybe a misconfiguration, or what?

_______________________________________________
Sequoia mailing list
[email protected]
https://forge.continuent.org/mailman/listinfo/sequoia
