Waseem,
We are using a collocated controller configuration in a RAIDb-1 configuration. We have set up two controllers, one on each of the two database servers, with one database backend each. We have not enabled controller replication (therefore the two controllers do not know about each other); instead we are using MySQL replication to replicate data between the two database servers. We have configured one of the web servers to use Sequoia and load balance requests between database servers 1 and 2 (using the roundRobin algorithm). The other three web servers are configured to send read/write requests to just database server 1.
I think that you have a major design problem here. Transactions happening through Sequoia will not be properly ordered with transactions that are going directly to your server 1.
Problem 1:
I configured our production environment as described above during a low-usage period. All started well; however, a couple of hours later I got the error below.
14:02:36,901 ERROR controller.loadbalancer.RAIDb1 write request 153210 failed: Backend dbXXXX - BackendWorkerThread for backend 'dbXXXX' with RAIDb level:1 failed (Lock wait timeout exceeded; try restarting transaction)
That seems logical. This is a lock wait timeout raised by InnoDB, most likely the symptom of a deadlock. You are fooling Sequoia's locking by accessing the database directly behind its back. Sequoia's locking logic will therefore mis-schedule queries and potentially introduce deadlocks.
Problem 2:
About 10 minutes after problem 1 occurred, the web server that was using Sequoia to load balance requests "locked up" unexpectedly. We were not able to ping or telnet to it. It was just not responding. Eventually we had to reboot the machine to bring it back up. I turned off Sequoia and so far no problems have occurred on this machine. There was no useful information in the logs (including the Sequoia logs) to indicate what happened. Do you have any thoughts here? We have not seen this machine do this before and therefore suspect that it must have something to do with the usage of Sequoia; however, we have no proof or idea of what caused it. (Note: the controllers are not running on the web servers; they are running on the database servers.)
There is certainly a reasonable explanation for that. A query that was not supposed to block in Sequoia will remain indefinitely blocked because a concurrent transaction (not going through Sequoia) is holding the required locks. Note that I am not sure how messy this can become with MySQL replication in the mix.
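
To make the mechanism concrete, here is a toy model in Python. It is only a sketch and not Sequoia's actual scheduler: the class names, the one-exclusive-lock-per-table rule, and the transaction names are all assumptions made for illustration. It shows a middleware that only knows about the locks it granted itself, so a lock taken directly on the backend is invisible to it.

```python
class Backend:
    """Toy database backend with one exclusive lock per table (an assumption)."""

    def __init__(self):
        self.locks = {}  # table name -> transaction currently holding the lock

    def try_lock(self, table, txn):
        # Take the lock if it is free; succeed only if txn ends up as the holder.
        holder = self.locks.setdefault(table, txn)
        return holder == txn


class Scheduler:
    """Toy middleware: it only tracks the locks that it granted itself."""

    def __init__(self, backend):
        self.backend = backend
        self.granted = {}  # table name -> transaction the scheduler let through

    def submit_write(self, table, txn):
        # The scheduler orders writes using its own bookkeeping only.
        if self.granted.get(table, txn) != txn:
            return "queued by scheduler"
        self.granted[table] = txn
        # The scheduler believes the table is free, so it dispatches the write.
        if self.backend.try_lock(table, txn):
            return "executed"
        # A client that bypassed the scheduler already holds the lock.
        return "blocked in backend"


backend = Backend()
scheduler = Scheduler(backend)

# A web server writes directly to the backend, bypassing the middleware.
backend.try_lock("orders", "direct-web2")

print(scheduler.submit_write("orders", "seq-1"))     # blocked in backend
print(scheduler.submit_write("orders", "seq-2"))     # queued by scheduler
print(scheduler.submit_write("customers", "seq-3"))  # executed
```

In this model the write from seq-1 was cleared by the middleware yet still blocks in the backend, and every later write to the same table queues up behind it for as long as the direct client keeps its lock.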

To solve the problem, all traffic going to a database replicated with Sequoia should go through Sequoia. You are not allowed to access backends directly once replication is started. If you want to use Sequoia with MySQL replication, you should look at the ParallelDB load balancer instead of RAIDb-1.
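
As a quick sanity check before enabling replication, you could audit the data source URL of every client. The Python sketch below is illustrative only: the host names, the database name, and the web server names are hypothetical, and the check simply assumes that a URL going through Sequoia starts with the `jdbc:sequoia:` prefix.

```python
SEQUOIA_PREFIX = "jdbc:sequoia:"  # assumed prefix of URLs that go through Sequoia


def bypasses_sequoia(jdbc_url):
    """Return True if the URL reaches a backend without going through Sequoia."""
    return not jdbc_url.strip().lower().startswith(SEQUOIA_PREFIX)


def audit(datasources):
    """Return the sorted names of clients whose URL bypasses Sequoia."""
    return sorted(name for name, url in datasources.items() if bypasses_sequoia(url))


# Hypothetical configuration mirroring the setup described above:
# one web server goes through Sequoia, three talk to database server 1 directly.
datasources = {
    "web1": "jdbc:sequoia://dbserver1,dbserver2/appdb",
    "web2": "jdbc:mysql://dbserver1/appdb",
    "web3": "jdbc:mysql://dbserver1/appdb",
    "web4": "jdbc:mysql://dbserver1/appdb",
}

offenders = audit(datasources)
print(offenders)  # ['web2', 'web3', 'web4']
```

An empty list is what you want to see: any name it reports is a client that can still take locks behind Sequoia's back.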

Continuent provides professional services if you need help in designing the proper architecture for your needs.

Hope this helps,
Emmanuel

--
Emmanuel Cecchet
Chief Scientific Officer, Continuent

Blog: http://emanux.blogspot.com/
Open source: http://www.continuent.org
Corporate: http://www.continuent.com
Skype: emmanuel_cecchet
Cell: +33 687 342 685