Re: [Sequoia] Lock wait timeout exceeded problem and machine " locking up".

Emmanuel Cecchet Fri, 05 Jan 2007 10:33:05 -0800

Waseem,

We are using a collocated controller configuration in a RAIDb-1 configuration. We have set up two controllers, one on each of the two database servers, with one database backend each. We have not enabled controller replication (therefore the two controllers do not know about each other); instead we are using MySQL replication to replicate data between the two database servers. We have configured one of the web servers to use Sequoia and load balance requests between database servers 1 and 2 (using the roundRobin algorithm). The other three web servers are configured to send read/write requests to just database server 1.

I think that you have a major design problem here. Transactions happening through Sequoia will not be properly ordered with respect to transactions that are going directly to your server 1.

Problem 1:
I configured our production environment as described above during a low usage period. All started well; however, a couple of hours later, I got the errors below.

14:02:36,901 ERROR controller.loadbalancer.RAIDb1 write request 153210 failed:
Backend dbXXXX - BackendWorkerThread for backend 'dbXXXX' with RAIDb level:1 failed (Lock wait timeout exceeded; try restarting transaction)

That seems logical. This is a deadlock surfacing as an InnoDB lock wait timeout. You are fooling Sequoia's locking by accessing the database directly behind its back. As a result, the locking logic will mis-schedule queries in Sequoia and potentially introduce deadlocks.
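To make the mechanism concrete, here is a minimal toy model in Python. It is not Sequoia code: the classes, table name, and transaction ids are all hypothetical, and the "locks" are plain dictionary entries rather than real InnoDB locks. It only illustrates that a scheduler which tracks the locks it granted itself cannot see a lock taken by a connection that bypasses it.

```python
class Database:
    """Toy backend: one exclusive lock per table, owned by a transaction id."""

    def __init__(self):
        self.locks = {}  # table name -> owning transaction id

    def try_lock(self, table, owner):
        # Grant the lock if it is free, or if this owner already holds it.
        return self.locks.setdefault(table, owner) == owner


class MiddlewareScheduler:
    """Toy stand-in for a middleware that orders writes with its own lock table."""

    def __init__(self, db):
        self.db = db
        self.known_locks = {}  # only locks granted through the middleware

    def write(self, table, owner):
        # The scheduler can only consult the locks it handed out itself.
        if self.known_locks.setdefault(table, owner) != owner:
            return "queued by middleware"
        # The scheduler believes the write is safe; the database may disagree.
        if self.db.try_lock(table, owner):
            return "executed"
        return "blocked in database"


# Case 1: a client locks the table directly, behind the middleware's back.
db = Database()
scheduler = MiddlewareScheduler(db)
db.try_lock("orders", "direct-tx")
bypass_result = scheduler.write("orders", "sequoia-tx")

# Case 2: both writers go through the middleware, which orders them itself.
db2 = Database()
scheduler2 = MiddlewareScheduler(db2)
ordered_results = [
    scheduler2.write("orders", "tx-a"),
    scheduler2.write("orders", "tx-b"),
]
```

In the first case the write reaches the database and waits there, which in a real InnoDB backend would end in a lock wait timeout. In the second case the conflict is caught by the middleware before it ever reaches the backend.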

Problem 2:
About 10 minutes after problem 1 occurred, the web server which was using Sequoia to load balance requests "locked up" unexpectedly. We were not able to ping or telnet to it or anything. It was just not responding. Eventually we had to reboot the machine to bring it back up. I turned off Sequoia and so far no problems with this machine have occurred. There was no useful information in the logs (including the Sequoia logs) to indicate what happened. Do you have any thoughts here? We have not seen this machine do this before and therefore suspect that it must have something to do with the usage of Sequoia; however, we have no proof or idea of what caused it. (Note: the controllers are not running on the web servers, they are running on the database servers.)

There is certainly a reasonable explanation for that. A query that was not supposed to block in Sequoia will remain indefinitely blocked because a concurrent transaction (not going through Sequoia) is holding the required locks. Note that I am not sure how messy this can become with MySQL replication in the mix.

To solve the problem, all traffic going to a database replicated with Sequoia should go through Sequoia. You are not allowed to access the backends directly once replication is started.

If you want to use Sequoia with MySQL replication, you should look at the ParallelDB load balancer instead of RAIDb-1.
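As a quick way to check a deployment against this rule, the sketch below audits a map of clients to the hosts they connect to and lists every client that reaches something other than a controller. The host names (web1 to web4, controller1, controller2, db1) are hypothetical labels modelled on the setup described in this thread; they are not taken from any real configuration file.

```python
def find_bypassing_clients(client_targets, controllers):
    """Return the sorted names of clients that talk to anything but a controller."""
    allowed = set(controllers)
    return sorted(
        client
        for client, targets in client_targets.items()
        if not set(targets) <= allowed
    )


# Hypothetical deployment: one web server uses the controllers,
# the other three write straight to database server 1.
controllers = ["controller1", "controller2"]
client_targets = {
    "web1": ["controller1", "controller2"],
    "web2": ["db1"],
    "web3": ["db1"],
    "web4": ["db1"],
}

bypassing = find_bypassing_clients(client_targets, controllers)
```

An empty result is the goal: any client listed here is writing behind the middleware's back.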

Continuent provides professional services if you need help in designing the proper architecture for your needs.


Hope this helps,
Emmanuel

--
Emmanuel Cecchet
Chief Scientific Officer, Continuent

Blog: http://emanux.blogspot.com/
Open source: http://www.continuent.org
Corporate: http://www.continuent.com
Skype: emmanuel_cecchet
Cell: +33 687 342 685


