Re: [Sequoia] controller hangs on broken network

Don Isenor Tue, 04 Sep 2007 10:58:23 -0700

Right you are, Emmanuel, it was a TCP timeout thing. However, it's not a Windows issue, it's a JVM issue. The TCP timeout is a JVM setting and the default is infinite. When I set it to ten seconds (by adding -Dsun.net.client.defaultConnectTimeout=10000 -Dsun.net.client.defaultReadTimeout=10000 to bin/controller.bat), the problem was fixed: after ten seconds Sequoia recovered and resumed transactions.
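The JDK's networking-properties documentation describes these two properties as default connect and read timeouts, in milliseconds, for the protocol handlers used by java.net.URLConnection. Code that opens java.net.Socket connections directly can bound the same two waits per socket with Socket.connect(address, timeout) and Socket.setSoTimeout. The sketch below shows both; the class and method names are hypothetical, and it does not reproduce anything from Sequoia's own code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical illustration of bounded TCP waits; not Sequoia code. */
public class TimeoutDemo {

    static final String CONNECT_KEY = "sun.net.client.defaultConnectTimeout";
    static final String READ_KEY = "sun.net.client.defaultReadTimeout";

    /**
     * Programmatic equivalent of the two -D flags. The JDK may read these
     * properties only once, so passing them on the command line (before any
     * networking class is loaded) is the more reliable option.
     */
    static void applyJvmDefaults(int millis) {
        System.setProperty(CONNECT_KEY, Integer.toString(millis));
        System.setProperty(READ_KEY, Integer.toString(millis));
    }

    /** Opens a plain socket with an explicit connect timeout and read timeout. */
    static Socket openBounded(String host, int port, int millis) throws IOException {
        Socket socket = new Socket();
        socket.connect(new InetSocketAddress(host, port), millis); // give up if no connection in time
        socket.setSoTimeout(millis);                               // give up if a read blocks too long
        return socket;
    }

    /** True if reading from a peer that never sends gives up instead of hanging. */
    static boolean readTimesOut(int millis) {
        try (ServerSocket server = new ServerSocket(0); // local peer on an ephemeral port
             Socket client = openBounded("127.0.0.1", server.getLocalPort(), millis);
             Socket peer = server.accept()) {           // accepted, but never writes anything
            client.getInputStream().read();
            return false;
        } catch (SocketTimeoutException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        applyJvmDefaults(10000);
        System.out.println(READ_KEY + "=" + System.getProperty(READ_KEY));
        System.out.println("read timed out: " + readTimesOut(500));
    }
}
```

Without setSoTimeout, the read on the silent peer would block indefinitely, which is the same "hang forever" behaviour described above.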

It would be helpful if bin/controller.bat set these timeouts by default; the current default behaviour (hanging Sequoia forever) is not good.

One question though... if group communication uses UDP and is not susceptible to the TCP timeout problem, then why did controller A hang when I disconnected controller B? Controller A's backend database is local.


Emmanuel Cecchet wrote:

Hi Don,
What you are describing are just TCP timeouts. When you unplug your cable, all processes have to wait for the kernel to time out their TCP connections. You might want to tune those TCP timeout settings (I don't know how to do that in Windows, but there are probably many resources on the web for that). Note that the group communication uses a UDP-based heartbeat and therefore does not suffer from the TCP timeout problem.
Hope this helps,
Emmanuel
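The point about the UDP heartbeat is that a peer is suspected when heartbeats stop arriving within a bounded window, rather than when a kernel-level TCP timeout finally fires. The following is a minimal, hypothetical sketch of that idea (the class name, the "alive" payload and the thresholds are all assumptions); it is not Appia's actual failure-detection protocol.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

/** Hypothetical UDP heartbeat failure detector; NOT Appia's protocol. */
public class HeartbeatSketch {

    private final long suspectAfterMillis;
    private long lastHeardMillis;

    HeartbeatSketch(long suspectAfterMillis, long nowMillis) {
        this.suspectAfterMillis = suspectAfterMillis;
        this.lastHeardMillis = nowMillis;
    }

    /** Records that a heartbeat arrived at the given time. */
    void heartbeatReceived(long nowMillis) {
        lastHeardMillis = nowMillis;
    }

    /** The peer is suspected once no heartbeat has arrived for suspectAfterMillis. */
    boolean isSuspected(long nowMillis) {
        return nowMillis - lastHeardMillis > suspectAfterMillis;
    }

    /** Waits at most waitMillis for one datagram; false means the wait expired. */
    static boolean awaitHeartbeat(DatagramSocket socket, int waitMillis) {
        byte[] buf = new byte[64];
        DatagramPacket packet = new DatagramPacket(buf, buf.length);
        try {
            socket.setSoTimeout(waitMillis);
            socket.receive(packet);
            return true;
        } catch (SocketTimeoutException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Sends one heartbeat datagram to a port on the loopback interface. */
    static void sendHeartbeat(DatagramSocket from, int toPort) {
        byte[] msg = "alive".getBytes(StandardCharsets.US_ASCII);
        try {
            from.send(new DatagramPacket(msg, msg.length,
                    InetAddress.getLoopbackAddress(), toPort));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        try (DatagramSocket receiver = new DatagramSocket(0, InetAddress.getLoopbackAddress());
             DatagramSocket sender = new DatagramSocket()) {
            System.out.println("silent peer heard: " + awaitHeartbeat(receiver, 200));
            sendHeartbeat(sender, receiver.getLocalPort());
            System.out.println("live peer heard: " + awaitHeartbeat(receiver, 1000));
        }
        HeartbeatSketch detector = new HeartbeatSketch(3000, 0);
        System.out.println(detector.isSuspected(2000) + " " + detector.isSuspected(3500));
    }
}
```

Because every receive is bounded by setSoTimeout, an unplugged peer is noticed after a few missed heartbeats. In the report quoted below, controller B did log within that kind of window that controller A had left.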
Sequoia appears not to handle network failure gracefully. My configuration:
- two MS Windows servers, A (10.0.0.61) and B (10.0.0.60).
- JBoss 4.0.5
- Sequoia 2.10.9 using Appia default configuration.
- MySql 5.0.41
- Server A running JBoss, Sequoia controller, MySql backend.
- Server B running Sequoia controller, MySql backend.
- Controllers A and B are (the only) members of a cluster called "mySequoia", as confirmed on each machine using "show controllers".
- JBoss is configured to use only controller A, via "<connection-url>jdbc:sequoia://A/mySequoia</connection-url>".
- B's backend is disabled.
Everything works fine under load, with JBoss happily hitting controller A, which in turn updates the database backend on A. Then I unplug the Ethernet cable on server B, and everything hangs. JBoss stops, controller A stops, logging nothing. Controller B logs a warning that controller A has left the cluster. I wait for five minutes; nothing happens except a transaction timeout on the JBoss server. After ten minutes I plug the Ethernet cable back in, and controller A logs this:
14:21:05,390 INFO continuent.hedera.gms Member(address=/10.0.0.60:49573, uid=10.0.0.60:49573) failed in Group(gid=mySequoia)
14:21:05,390 WARN controller.virtualdatabase.mySequoia Controller Member(address=/10.0.0.60:49573, uid=10.0.0.60:49573) has left the cluster.
14:21:05,390 INFO controller.virtualdatabase.mySequoia 1 requests were waiting responses from Member(address=/10.0.0.60:49573, uid=10.0.0.60:49573)
14:21:05,390 WARN controller.RequestManager.mySequoia 1 controller(s) died during execution of request 844424930133025
14:21:05,390 WARN controller.RequestManager.mySequoia Controller Member(address=/10.0.0.60:49573, uid=10.0.0.60:49573) is suspected of failure.
14:21:06,906 INFO controller.requestmanager.cleanup Waiting 120000ms for client of controller 281474976710656 to failover
14:23:06,906 INFO controller.requestmanager.cleanup Cleanup for controller 281474976710656 failure is completed.
and comes back to life, as does JBoss. (However, the cluster remains broken: neither controller sees the other any more.)
Originally I saw this problem with both controllers active and enabled, with JBoss configured to round-robin them. I suspected a cluster communications bug, so I simplified the deployment to this single active controller to see what would happen. What's going on here? Is it a controller bug, an Appia bug, maybe a misconfiguration, or what?
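The timestamps in the log excerpt can be checked directly: the gap between the "Waiting 120000ms" line and the "Cleanup ... is completed" line is exactly the announced 120000 ms. A small sketch of that arithmetic follows; the class name and the parsing pattern are assumptions chosen to match the timestamps as printed above.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: elapsed time between two timestamps from the log excerpt. */
public class LogGap {

    // Matches timestamps such as "14:21:06,906" (comma before the milliseconds).
    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    static Duration between(String first, String second) {
        LocalTime a = LocalTime.parse(first, STAMP);
        LocalTime b = LocalTime.parse(second, STAMP);
        return Duration.between(a, b);
    }

    public static void main(String[] args) {
        // "Waiting 120000ms ..." at 14:21:06,906; "Cleanup ... is completed" at 14:23:06,906.
        System.out.println(between("14:21:06,906", "14:23:06,906").toMillis()); // 120000
        // First line of the excerpt to the "Waiting 120000ms" line.
        System.out.println(between("14:21:05,390", "14:21:06,906").toMillis()); // 1516
    }
}
```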
_______________________________________________
Sequoia mailing list
[email protected]
https://forge.continuent.org/mailman/listinfo/sequoia
