[ https://forge.continuent.org/jira/browse/SEQUOIA-1034?page=comments#action_14495 ]
Robert Hodges commented on SEQUOIA-1034:
----------------------------------------

As a result of the problems with non-determinism described by this bug, we are rethinking how to manage backend task queues. This may ultimately result in a long-term fix involving a 'pluggable' ordering mechanism so that we can support different ways of ordering requests, ranging from very simple (e.g., dispatch to a single thread from the total order queue) to the existing mechanism with conflicting and non-conflicting queues. The rest of this comment contains a summary of the core problem and general approaches to solving it.

Sequoia needs updates to execute deterministically on each backend. You might call this the "state machine assumption": given that each backend starts in the same state and that updates are delivered in total order, each backend will be in the same state after those updates are applied. Sequoia's update algorithm is based on table locks, which provide hints about which tables are updated by which requests. Using locking, we can permit requests to execute in parallel, which is critical to ensure adequate performance. The current BackendTaskQueues implementation provides a framework of queues to help schedule requests that are ordered through table locks.

In general, table locking for each request works quite well and provides deterministic execution without excessive complexity. However, a fundamental problem arises when applications use database transactions: distributed deadlocks may arise between the controller and the underlying backends.

Let's assume that two transactions T1 and T2 update the same row in table A, here denoted by "UPDATE(A)". If we hold locks at the request level in Sequoia, the following problem occurs:

T1: BEGIN
T2: BEGIN
T1: UPDATE(A) <-- Gets lock in Sequoia and row lock in DB, updates, releases Sequoia lock, continues to hold row lock in DB
T2: UPDATE(A) <-- Gets lock in Sequoia but is blocked from execution in DB by T1's row lock
T1: UPDATE(A) <-- Blocked in Sequoia awaiting the table lock now held by T2. (Stuck!)

This problem would not happen when going directly to the database: we would just wait for T1 to complete and then T2 would execute. With Sequoia, however, we have a deadlock due to write locks in the database. It can also occur quite easily with shared locks, which PostgreSQL takes when doing updates in the presence of foreign keys. In MySQL InnoDB we don't get a deadlock; instead, InnoDB gives up trying to acquire the lock after a timeout, which results in the infamous "Lock wait timeout exceeded" error. At this point the transaction typically fails unless the application is smart enough to resubmit.

Here's a survey of solutions to the distributed deadlock problem and notes on how they are currently applied in Sequoia.

Solution #1. Don't use transactions.
If you don't use transactions there is no problem, because database locks are released as each statement completes. The solution is deterministic and free of deadlocks. It works by default for MySQL MyISAM. The downside is that not all applications can do this.

Solution #2. Single-thread transactions.
If somebody starts a transaction that does a write, the transaction must be enqueued. It may only run after each preceding transaction completes. This solution is deterministic and deadlock-free, since transactions can never become interleaved and create a conflict. The downside is that it is very slow, since it requires transactions to be fully ordered; a sketch of what that means in practice follows below.
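To make the cost of full ordering concrete, here is a minimal sketch of what single-threading write transactions amounts to. This is not Sequoia code; the class and method names are hypothetical, and a real implementation would hook into the request scheduler rather than stand alone.

import java.util.concurrent.Semaphore;

/**
 * Hypothetical sketch of solution #2: write transactions are fully
 * serialized, so no two of them can interleave and conflict on the backends.
 */
public class SerializedWriteTransactions {
    // Fair semaphore: queued transactions are admitted in arrival order,
    // preserving the total order in which the writes were delivered.
    private final Semaphore writePermit = new Semaphore(1, true);

    /**
     * Invoked when a transaction issues its first write; blocks until every
     * preceding write transaction has committed or rolled back.
     */
    public void beginWrite() throws InterruptedException {
        writePermit.acquire();
    }

    /** Invoked on commit or rollback of a write transaction. */
    public void endWrite() {
        writePermit.release();
    }
}

Every writing transaction waits at beginWrite(), so throughput collapses to one write transaction at a time even when transactions touch completely unrelated tables; that is the price of full ordering mentioned above.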
Solution #3. Store locks on transactions, not requests, in Sequoia.
When a transaction cannot acquire locks, it must wait. This is the behavior of the 'enforceTableLocking' VDB option in Sequoia. It is deterministic but creates deadlocks in the middleware, since transactions may commonly acquire table locks in reverse order. Another problem we have found is that it is quite a bit slower, since it tends to move you back toward the single-threaded transaction processing described in #2.

Solution #4. Execute using request locks and deal with blocking when it occurs.
There are several variations on this approach, presented below.

4.0 Let the Chips Fall...
This is the null hypothesis: do nothing. It works only if there are no interleaved requests across transactions. This is the default behavior of Sequoia.

4.1 Timeouts on Request Locks
After a timeout, give up the lock in the middleware and allow other requests to proceed. The problem with timeouts is that they are non-deterministic, since you might reorder updates when an update is simply slow rather than truly blocked.

4.2 Rollback of Blocking Transactions
Roll back preceding transactions when a request blocks. Basically you would find and roll back any transaction that might be blocking (e.g., any transaction that is not complete and may have previously held the lock). This seems difficult to implement and would not work well when there are many locks, since you would get a lot of blocked transactions. It would potentially lead to large numbers of aborts.

4.3 Transfer Request Locks After Lock Timeout
On MySQL InnoDB, requests that cannot obtain a lock within a certain period of time are rejected. This does not abort the transaction but creates an error that is reflected back through JDBC. At that point you would reorder the failed request so that it waits until preceding transactions complete. The blocking seems to be deterministic for InnoDB, and lock wait timeouts do not cancel the transaction, so you can retry easily. PostgreSQL, on the other hand, like a number of other databases, does not support lock wait timeouts, and errors of any kind cause the transaction to abort, so you cannot just resubmit; instead, you would wait forever.

4.4 Look for Locks After Requests Block
This is a variation of #4.1. If an update has held a request lock longer than a certain period of time, we look in the database to see whether it is waiting on locks. If it is in fact waiting on a lock, we simply grant the lock to the next request in the queue. This seems tricky, but both MySQL/InnoDB and PostgreSQL provide mechanisms to look for blocked transactions dynamically. In the case of MySQL, you can parse the output of 'SHOW ENGINE INNODB STATUS' to find blocked requests. In PostgreSQL you can submit queries on the pg_locks table. The advantage of this approach over 4.1 is that it should be deterministic, provided database lock acquisition is deterministic, which we assure using table locks.
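To make the probe in 4.4 concrete, here is a minimal sketch, assuming PostgreSQL and JDBC, of how blocked backends could be listed from pg_locks. This is not Sequoia code: the class name, connection URL, and credentials are placeholders, and the exact set of pg_locks columns varies somewhat across PostgreSQL versions.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/**
 * Hypothetical sketch of the lock probe needed by approach 4.4 on
 * PostgreSQL: list database backends that are currently waiting on a lock.
 * On MySQL/InnoDB the equivalent information has to be parsed out of
 * SHOW ENGINE INNODB STATUS instead.
 */
public class BlockedTransactionProbe {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://localhost/test", "user", "password");
             Statement stmt = conn.createStatement();
             // pg_locks rows with granted = false belong to backends that
             // are blocked waiting for a lock held by another transaction.
             ResultSet rs = stmt.executeQuery(
                 "SELECT pid, locktype, mode, relation::regclass AS rel "
                 + "FROM pg_locks WHERE NOT granted")) {
            while (rs.next()) {
                System.out.printf("backend %d waiting for %s lock (%s) on %s%n",
                    rs.getInt("pid"), rs.getString("mode"),
                    rs.getString("locktype"), rs.getString("rel"));
            }
        }
    }
}

A request whose backend connection shows up as waiting here would be a candidate for having its Sequoia lock granted to the next request in the queue.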
Other approaches and comments on what is presented here are most welcome. I will make additional posts as we gather more information. For now we are looking mostly at MySQL and PostgreSQL, but of course other databases are of interest as well.

> Possible race condition leading to concurrent requests being executed in a
> different order on backends
> ------------------------------------------------------------------------------------------------------
>
>          Key: SEQUOIA-1034
>          URL: https://forge.continuent.org/jira/browse/SEQUOIA-1034
>      Project: Sequoia
>         Type: Bug
>   Components: Core
>     Versions: Sequoia 2.10.9
>     Reporter: Stephane Giron
>     Assignee: Stephane Giron
>     Priority: Critical
>      Fix For: sequoia 2.10.10
>
>
> This can lead to backend inconsistencies.
> As an example, this is the behavior that was observed:
>
> CREATE TABLE `table1` (
>   `db_id` bigint(20) NOT NULL auto_increment,
>   `col1` varchar(255) default NULL,
>   `creation_date` datetime default NULL,
>   `request_date` varchar(255) default NULL,
>   `col2` varchar(255) default NULL,
>   `version` varchar(255) default NULL,
>   PRIMARY KEY (`db_id`),
>   UNIQUE KEY `col1` (`col1`,`col2`)
> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
>
> Transaction 1
> begin
> insert into table1 (version) values('a1');
> insert into table1 (version) values('c1');
> commit
>
> Transaction 2
> begin
> insert into table1 (version) values('b')
> commit
>
> T1 begin
> T2 begin
> T1 insert 1
> T2 insert 1
> T1 insert 2
> commits
>
> The following result is what I got in the backends:
>
> On one controller:
> | 8  | NULL | NULL | NULL | NULL | a1 |
> | 9  | NULL | NULL | NULL | NULL | c1 |
> | 10 | NULL | NULL | NULL | NULL | b  |
>
> On the other:
> | 8  | NULL | NULL | NULL | NULL | a1 |
> | 9  | NULL | NULL | NULL | NULL | b  |
> | 10 | NULL | NULL | NULL | NULL | c1 |
>
> This can happen if the statement "T2 insert 1", which will be posted in the
> conflicting queue, can be processed on one controller (because there is
> nothing else at this time in the non-conflicting queue), whereas on the other
> controller the "T1 insert 2" statement arrives in the non-conflicting queue
> before "T2 insert 1" starts executing.
>
> This behavior can be observed essentially because of the auto-incremented
> key. However, other types of statements may run out of order as well.
