[ https://forge.continuent.org/jira/browse/SEQUOIA-1034?page=comments#action_14495 ]

Robert Hodges commented on SEQUOIA-1034:
----------------------------------------

As a result of the problems with non-determinism described by this bug, we are 
indulging in a rethink of how to manage backend task queues.  This may 
ultimately result in a long-term fix involving a 'pluggable' ordering mechanism 
so that we can support different ways of ordering requests ranging from very 
simple (e.g., dispatch to a single thread from the total order queue) to the 
existing mechanism with conflicting and non-conflicting queues.   The rest of 
this comment contains a summary of the core problem and general approaches to 
solving it.  

Sequoia needs updates to execute deterministically on each backend.  You might 
call this the "state machine assumption"--given that each backend is in the 
same state to start with and that updates are delivered in total order, each 
backend will have the same state after those updates are applied.   

The update algorithm of Sequoia is based on table locks, which provide hints 
about which tables are updated by which requests.  Using locking, we can 
permit requests to execute in parallel, which is critical to ensure adequate 
performance.  The current BackendTaskQueues implementation provides a 
framework of queues to help schedule requests that are ordered through table 
locks. 

In general table locking for each request works quite well and provides 
deterministic execution without excessive complexity.  However, a fundamental 
problem arises when applications use database transactions, namely that 
distributed deadlocks may arise between the controller and underlying backends. 
  Let's assume that two transactions T1 and T2 update the same row in table 
A, here denoted by "UPDATE(A)".  If we hold locks at the request level in 
Sequoia, the following problem occurs: 

T1: BEGIN
T2: BEGIN
T1: UPDATE(A)   <-- Gets lock in Sequoia and row lock in DB, updates, releases 
Sequoia lock, continues to hold row lock in DB
T2: UPDATE(A)   <-- Gets lock in Sequoia but is blocked from execution in DB by 
T1's row lock. 
T1: UPDATE(A)   <-- Blocked in Sequoia awaiting the lock now held by T2. 
(Stuck!) 

This problem would not happen when going directly to the database, as we would 
just wait for T1 to complete and then T2 would execute.  However, with Sequoia 
we have a deadlock due to write locks in the database.  It can also occur 
quite easily with shared locks, which PostgreSQL takes when doing updates in 
the presence of foreign keys.  In MySQL InnoDB we don't get a deadlock.  
Instead, InnoDB gives up trying to acquire the lock after a timeout, which 
results in the infamous "Lock wait timeout exceeded" error.  At this point 
the transaction typically fails unless the application is smart enough to 
resubmit. 
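To make the interleaving above concrete, here is a toy model of the two lock 
levels in Python.  The Lock class and all names are illustrative stand-ins, 
not Sequoia code; it only records owners and waiters rather than really 
blocking threads: 

```python
# Minimal model of the distributed deadlock (hypothetical names, not
# real Sequoia classes).  Each lock records its current owner; a
# transaction that requests a held lock is recorded as a waiter.

class Lock:
    def __init__(self, name):
        self.name = name
        self.owner = None
        self.waiters = []

    def acquire(self, txn):
        if self.owner is None:
            self.owner = txn
            return True
        self.waiters.append(txn)   # txn blocks behind the owner
        return False

    def release(self, txn):
        assert self.owner == txn
        self.owner = self.waiters.pop(0) if self.waiters else None

sequoia_table_lock = Lock("sequoia:A")   # held per request
db_row_lock = Lock("db:A:row1")          # held until commit

# T1: UPDATE(A) -- takes both locks, releases only the Sequoia lock
sequoia_table_lock.acquire("T1")
db_row_lock.acquire("T1")
sequoia_table_lock.release("T1")         # row lock still held by T1

# T2: UPDATE(A) -- gets the Sequoia lock, blocks on the DB row lock
sequoia_table_lock.acquire("T2")
t2_running = db_row_lock.acquire("T2")   # False: blocked behind T1

# T1: second UPDATE(A) -- blocks on the Sequoia lock now held by T2
t1_running = sequoia_table_lock.acquire("T1")  # False: blocked behind T2

# Wait-for cycle: T1 -> T2 (Sequoia lock), T2 -> T1 (row lock)
print(t1_running, t2_running)            # False False: distributed deadlock
```

Neither party can see the whole cycle: the database only knows about the row 
lock, and the middleware only knows about its own table lock. 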
 

Here's a survey of the solutions to the distributed deadlock problem and notes 
on how they are currently applied in Sequoia.  

Solution #1.  Don't use transactions.  

If you don't use transactions there is no problem, because database locks 
release as the statement completes.  The solution is deterministic and free of 
deadlocks.   It works by default for MySQL MyISAM.  The down side is that not 
all applications can do this. 

Solution #2.  Single thread transactions.  

If somebody starts a transaction that does a write, the transaction must be 
enqueued.  It may only run after each preceding transaction completes.  This 
solution is deterministic and deadlock-free, since transactions can never 
become interleaved and create a conflict.  The downside is that this is very 
slow, since it requires transactions to be fully ordered. 
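As a sketch, this fully ordered scheme amounts to a single worker draining a 
FIFO of write transactions.  This is illustrative Python, not the Sequoia 
scheduler: 

```python
import queue
import threading

# Write transactions are enqueued in total order and executed by a
# single worker, so no two transactions ever interleave.
txn_queue = queue.Queue()
applied = []

def worker():
    while True:
        txn = txn_queue.get()
        if txn is None:            # shutdown sentinel
            break
        for stmt in txn:           # run the whole transaction to completion
            applied.append(stmt)

t = threading.Thread(target=worker)
t.start()
txn_queue.put(["T1:UPDATE(A)", "T1:UPDATE(A)", "T1:COMMIT"])
txn_queue.put(["T2:UPDATE(A)", "T2:COMMIT"])
txn_queue.put(None)
t.join()
print(applied)   # T1's statements always precede T2's
```

The result is deterministic on every backend, but throughput collapses to one 
writing transaction at a time. 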

Solution #3.  Store locks on transactions, not requests, in Sequoia

When a transaction cannot acquire locks, it must wait.  This is the behavior of 
the 'enforceTableLocking' VDB option in Sequoia.  It is deterministic but 
creates deadlocks in middleware since transactions may commonly acquire table 
locks in reverse order.   Another problem we have found is that it is quite a 
bit slower since it tends to move you back toward the single-threaded 
transaction processing described in #2.  
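The reverse-order acquisition can be illustrated with a toy wait-for check.  
The tables A and B and the transactions T1 and T2 here are hypothetical, 
chosen only to show the cycle: 

```python
# Toy illustration of the middleware-level deadlock with
# transaction-scoped table locks: T1 holds A and wants B, while T2
# holds B and wants A -- the locks were acquired in reverse order.

held = {"A": "T1", "B": "T2"}     # table locks held until commit
wants = {"T1": "B", "T2": "A"}    # next table lock each txn requests

def waits_for(txn):
    return held[wants[txn]]       # owner of the lock txn is waiting on

cycle = waits_for("T1") == "T2" and waits_for("T2") == "T1"
print(cycle)    # True: a wait-for cycle, i.e. a middleware deadlock
```

With request-scoped locks this cycle cannot form in the middleware, because 
each lock is released as soon as the request completes. 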

Solution #4.  Execute using request locks and deal with blocking when it occurs

There are several variations on this approach, which are presented below.  

4.0 Let the Chips Fall...

This is the null hypothesis--do nothing.  It works only if there are no 
interleaved requests across transactions.  This is the default behavior of 
Sequoia. 

4.1 Timeouts on Request Locks

After a time-out give up the lock in middleware and allow other requests to 
proceed.  The problem with time-outs is that they are non-deterministic, since 
you might re-order updates if an update is simply slow rather than truly 
blocked.   

4.2  Rollback of Blocking Transactions

Roll back preceding transactions when a request blocks.  Basically you would 
find and roll back transactions that might be blocking (e.g., any transaction 
that is not complete and may have previously held the lock).  This seems 
difficult to implement and would not work in every case if there were very many 
locks, since you would get a lot of blocked transactions.  It would potentially 
lead to large numbers of aborts.  

4.3 Transfer Request Locks After Lock Timeout

On MySQL InnoDB, requests that cannot obtain a lock within a certain period of 
time are rejected.  This does not abort the transaction but creates an error 
that is reflected back from JDBC.  At this point you would reorder the failed 
request so that it waits until preceding transactions complete.  The blocking 
seems to be deterministic for InnoDB; lock wait timeouts also do not cancel 
the transaction, so you can retry easily.  PostgreSQL and other databases, on 
the other hand, do not support lock timeouts, and errors of any kind cause 
the transaction to abort, so you cannot just resubmit.  Instead, you would 
wait forever. 
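A sketch of the re-ordering step might look as follows.  It assumes the 
driver surfaces InnoDB's lock wait timeout as MySQL error 1205; the dispatch 
function, executor, and retry queue are hypothetical, not Sequoia APIs: 

```python
# Hypothetical sketch: on InnoDB's "Lock wait timeout exceeded"
# (MySQL error 1205), the failed request is re-enqueued to run after
# the preceding transactions complete, rather than aborting.

MYSQL_LOCK_WAIT_TIMEOUT = 1205

class LockWaitTimeout(Exception):
    errno = MYSQL_LOCK_WAIT_TIMEOUT

def dispatch(request, execute, retry_queue):
    try:
        return execute(request)
    except LockWaitTimeout:
        # InnoDB did not abort the transaction, so the request can
        # simply be deferred behind the preceding transactions.
        retry_queue.append(request)
        return None

# Toy executor: fails once with a lock wait timeout, then succeeds.
attempts = {"n": 0}
def execute(request):
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise LockWaitTimeout()
    return "ok"

retries = []
first = dispatch("T2:UPDATE(A)", execute, retries)    # deferred
second = dispatch(retries.pop(0), execute, retries)   # retried later
print(first, second)   # None ok
```

The whole approach hinges on the timeout error being non-fatal to the 
transaction, which is exactly what PostgreSQL does not give you. 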

4.4 Look for Locks after Requests Block

This is a variation of #4.1.  If an update is holding a request lock longer 
than a certain period of time, we look in the database to see if it is 
waiting on locks.  If it is in fact waiting on a lock, we would simply grant 
the lock to the next request in the queue.  This seems tricky, but both 
MySQL/InnoDB and PostgreSQL provide mechanisms to look for blocked 
transactions dynamically.  In the case of MySQL, you can parse the output of 
'SHOW ENGINE INNODB STATUS' to find blocked requests.  In PostgreSQL you can 
submit queries on the pg_locks table.  The advantage of this approach over 
4.1 is that it should be deterministic provided database lock acquisition is 
deterministic, which we assure using table locks. 
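For the PostgreSQL side, the check could be built on rows from something like 
"SELECT pid, relation, granted FROM pg_locks": a row with granted = false 
marks a backend waiting on a lock.  A minimal sketch of the detection step, 
with hard-coded rows standing in for a live query: 

```python
# Hypothetical sketch of the blocked-transaction check for PostgreSQL.
# Rows are (pid, relation, granted) tuples as a pg_locks query might
# return them; granted = False means the backend is waiting on a lock,
# so the middleware may hand the request lock to the next waiter.

def blocked_pids(pg_locks_rows):
    return sorted({pid for pid, _rel, granted in pg_locks_rows
                   if not granted})

rows = [
    (101, "table1", True),    # T1 holds its row lock
    (102, "table1", False),   # T2 is waiting inside the database
]
print(blocked_pids(rows))     # [102]
```

The MySQL side would be similar in shape but would have to parse the text 
output of 'SHOW ENGINE INNODB STATUS' instead of querying a system table. 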

Other approaches and comments on what is presented here are most welcome.   I 
will make additional posts as we gather more information.  For now we are 
looking mostly at MySQL and PostgreSQL but of course other databases are of 
interest as well. 

> Possible race condition leading to concurrent requests being executed in a 
> different order on backends
> ------------------------------------------------------------------------------------------------------
>
>          Key: SEQUOIA-1034
>          URL: https://forge.continuent.org/jira/browse/SEQUOIA-1034
>      Project: Sequoia
>         Type: Bug

>   Components: Core
>     Versions: Sequoia 2.10.9
>     Reporter: Stephane Giron
>     Assignee: Stephane Giron
>     Priority: Critical
>      Fix For: sequoia 2.10.10

>
>
> This can lead to backends inconsistencies.
> As an example, this is the behavior that was observed :
> CREATE TABLE `table1` ( 
>   `db_id` bigint(20) NOT NULL auto_increment, 
>   `col1` varchar(255) default NULL, 
>   `creation_date` datetime default NULL, 
>   `request_date` varchar(255) default NULL, 
>   `col2` varchar(255) default NULL, 
>   `version` varchar(255) default NULL, 
>   PRIMARY KEY (`db_id`), 
>   UNIQUE KEY `col1` (`col1`,`col2`) 
> ) ENGINE=InnoDB DEFAULT CHARSET=latin1; 
> Transaction 1 
> begin 
> insert into table1 (version) values('a1'); 
> insert into table1 (version) values('c1'); 
> commit 
> Transaction 2 
> begin 
> insert into table1 (version) values('b') 
> commit 
> T1 begin 
> T2 begin 
> T1 insert 1 
> T2 insert 1 
> T1 insert 2 
> commits 
> The following result is what I got in the backends : 
> On one controller : 
> | 8 | NULL | NULL | NULL | NULL | a1 | 
> | 9 | NULL | NULL | NULL | NULL | c1 | 
> | 10 | NULL | NULL | NULL | NULL | b | 
> On the other : 
> | 8 | NULL | NULL | NULL | NULL | a1 | 
> | 9 | NULL | NULL | NULL | NULL | b | 
> | 10 | NULL | NULL | NULL | NULL | c1 | 
> This can happen if the statement "T2 insert 1", which will be posted in the 
> conflicting queue, can be processed on one controller (because there is 
> nothing else at this time in the non conflicting queue), whereas on the other 
> controller the "T1 insert 2" statement arrives in the non conflicting queue 
> before "T2 insert 1" starts executing.
> This behavior can be observed essentially because of the auto-incremented 
> key.  However, other types of statements may run out of order as well.
