Waseem,
We are using a collocated controller configuration with RAIDb-1. We
have set up two controllers, one on each of the two database servers,
with one database backend each. We have not enabled controller
replication (so the two controllers do not know about each other);
instead we are using MySQL replication to replicate data between the
two database servers. We have configured one of the web servers to use
Sequoia and load balance requests between database servers 1 and 2
(using the roundRobin algorithm). The other three web servers are
configured to send read/write requests to just database server 1.
I think that you have a major design problem here. Transactions
happening through Sequoia will not be properly ordered with transactions
that are going directly to your server 1.
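To illustrate the ordering problem, here is a toy model in plain Python.
It is not Sequoia code, and the two writes and the starting value are
made up for the example. If two writes that do not commute reach the two
servers in different orders, the servers end up with different data:

```python
# Toy model: each server's state is a single integer.
def apply_writes(initial, writes):
    """Apply the writes in the given order and return the final value."""
    value = initial
    for write in writes:
        value = write(value)
    return value

def add_ten(x):   # stands in for a write sent through Sequoia
    return x + 10

def double(x):    # stands in for a write sent directly to server 1
    return x * 2

# Server 1 sees the direct write first; server 2 receives it later.
server1 = apply_writes(5, [double, add_ten])   # (5 * 2) + 10
server2 = apply_writes(5, [add_ten, double])   # (5 + 10) * 2

print(server1, server2)   # the two values differ
```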
Problem 1:
I configured our production environment as described above during a low
usage period. All started well; however, a couple of hours later I got
the errors below.
14:02:36,901 ERROR controller.loadbalancer.RAIDb1 write request 153210
failed:
Backend dbXXXX - BackendWorkerThread for backend 'dbXXXX' with RAIDb
level:1 failed (Lock wait timeout exceeded; try restarting transaction)
That seems logical. This is a lock wait timeout reported by InnoDB,
which points to a deadlock. You are fooling Sequoia's locking by
accessing the database directly behind its back. Sequoia's locking logic
will therefore mis-schedule queries and potentially introduce deadlocks.
Problem 2:
About 10 minutes after problem 1 occurred, the web server which was
using Sequoia to load balance requests "locked up" unexpectedly. We
were not able to ping or telnet to it or anything. It was just not
responding. Eventually we had to reboot the machine to bring it back
up. I turned off Sequoia and so far no problems with this machine have
occurred. There was no useful information in the logs (including the
Sequoia logs) to indicate what happened. Do you have any thoughts here?
We have not seen this machine do this before and therefore suspect that
it must have something to do with the usage of Sequoia; however, we have
no proof or idea what caused it. (Note: the controllers are not running
on the web servers, they are running on the database servers.)
There is certainly a reasonable explanation for that. A query that was
not supposed to block in Sequoia will remain indefinitely blocked
because a concurrent transaction (not going through Sequoia) is holding
the required locks. Note that I am not sure how messy this can become
with MySQL replication in the mix.
To solve the problem, all traffic going to a database replicated with
Sequoia should go through Sequoia. You are not allowed to access
backends directly once replication is started.
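As a rough way to check this rule, the sketch below scans an inventory
of client connection URLs and lists the clients that still point at a
backend directly. Everything in it is an assumption for illustration:
the host names, ports and database names are invented, and the
"jdbc:sequoia://" prefix is assumed to be the URL scheme your Sequoia
driver uses, so adjust it to your own setup.

```python
# Assumed URL scheme for connections that go through the Sequoia driver.
SEQUOIA_PREFIX = "jdbc:sequoia://"

def clients_bypassing_sequoia(db_urls):
    """Return the names of clients whose URL does not go through Sequoia."""
    return sorted(
        name for name, url in db_urls.items()
        if not url.startswith(SEQUOIA_PREFIX)
    )

# Hypothetical inventory matching the setup described above:
# one web server uses Sequoia, three talk to database server 1 directly.
db_urls = {
    "web1": "jdbc:sequoia://db1:25322,db2:25322/myvdb",
    "web2": "jdbc:mysql://db1:3306/mydb",
    "web3": "jdbc:mysql://db1:3306/mydb",
    "web4": "jdbc:mysql://db1:3306/mydb",
}

bypassing = clients_bypassing_sequoia(db_urls)
print(bypassing)
```

The list should be empty before replication through Sequoia is started.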
If you want to use Sequoia with MySQL replication, you should look at
the ParallelDB load balancer instead of RAIDb-1.
Continuent provides professional services if you need help in designing
the proper architecture for your needs.
Hope this helps,
Emmanuel
--
Emmanuel Cecchet
Chief Scientific Officer, Continuent
Blog: http://emanux.blogspot.com/
Open source: http://www.continuent.org
Corporate: http://www.continuent.com
Skype: emmanuel_cecchet
Cell: +33 687 342 685
_______________________________________________
Sequoia mailing list
https://forge.continuent.org/mailman/listinfo/sequoia