RE: [Sequoia] Deadlock when replaying recovery log

Andrew Lawrenson Thu, 03 Jul 2008 01:00:20 -0700

Emanuel,

        with regards to the auto-commit, pseudo-code for what's going on is 
basically, all in 1 thread:


1) Connection conn1 = getConnection();
2) Connection conn2 = getAutoCommitConnection(); // this connection will 
auto-commit
3) begin transaction t1 on conn1
4) perform an update on conn1   // this is part of t1
5) perform an update on conn2           // this is outside of t1
6) commit t1 on conn1

What I appear to see is that:
scheduler.suspendNewTransactions() would block #3

If RequestManager.suspendActivity() is called between #3 & #5:

scheduler.suspendNewWrites() will block #5
scheduler.waitForSuspendedTransactionsToComplete() will wait for #6

However, because this thread is blocked in #5, this thread will never reach #6, 
so waitForSuspendedTransactionsToComplete will eventually time out.

Thanks again,

Andy

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Emmanuel Cecchet
Sent: 02 July 2008 20:17
To: Sequoia general mailing list
Subject: Re: [Sequoia] Deadlock when replaying recovery log

Andrew,

>         the suspension of write activity is done from 
> RequestManager.suspendActivity(), which performs the following:
>
> 1) scheduler.suspendNewTransactions();
> 2) scheduler.suspendNewWrites();
> 3) scheduler.waitForSuspendedTransactionsToComplete();
> 4) scheduler.waitForSuspendedWritesToComplete();
>
> suspendNewTransactions is blocking any new transaction from starting, but 
> from trying to look through the code, I'm guessing that this only applies to 
> "proper" transactions, as opposed to any writes done via an auto-commit 
> connection - and these writes are blocked by the suspendNewWrites.
>
> Thus I'm assuming that in my scenario, where my app is executing auto-commit 
> inserts within the scope of a large transaction, the suspendNewWrites is 
> blocking the auto-commit inserts, which is then preventing the main 
> transaction from completing (so waitForSuspendedTransactionsToComplete 
> doesn't complete).
>
I don't get how an autocommit insert is going to block the main transaction. If 
the long transaction was already holding the lock, the autocommit would be 
blocked. If the long transaction was not holding the lock, the autocommit would 
proceed and finish. When writes are suspended, they are suspended before 
acquiring the locks so I don't really see how the deadlock can happen.
Could you describe a scenario that would show how such deadlock could happen?
> How, if I re-order the above logic into:
>
> 1) scheduler.suspendNewTransactions();
> 2) scheduler.waitForSuspendedTransactionsToComplete();
> 3) scheduler.suspendNewWrites();
> 4) scheduler.waitForSuspendedWritesToComplete();
>
> What I'm hoping will happen is that sequoia will block new transactions, wait 
> for existing ones to finish, then block any write activity outside of 
> transactions & wait for any existing ones of these to finish.
>
> I've tried this and have been unable to reproduce my problem below.
> However, I'm not 100% certain that this will have any other unanticipated 
> side-effects, or is otherwise unsuitable.
>
> What do you think?
>
I think the reason for blocking writes before is to prevent lazy transactions 
starts from blocking on one controller and not another one.
I have to double check that story as your solution would be easier to debug. 
Having an example showing how the current implementation fails would help 
motivating the refactoring.

Thanks again for your detailed feedback, Emmanuel


>
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of
> Andrew Lawrenson
> Sent: 27 June 2008 16:43
> To: Sequoia general mailing list
> Subject: RE: [Sequoia] Deadlock when replaying recovery log
>
> Emanuel,
>
>         I'm currently working around this issue by serializing my 
> transactions (so that each transaction is executed in serial).
>
> However, when I do this, I'm encountering another issue at the end of 
> recovery, if I'm recovering while activity is taking place.
> The exact circumstances here is that I have two backends, and I'm backing one 
> up while the system is in use.
>
> When the backend has been backed up, it needs to replay through the recovery 
> log to bring itself back up to date.
> It's managing to replay the bulk of the log up to the current point of
> execution (so far, so good)
>
> What sequoia then appears to be doing is suspending write activity, then 
> recovering through the last log entries - which makes sense.
> I believe it's suspending write activity by blocking new transactions
> & new writes, then waiting for existing transactions & writes to end
> (i.e. via RequestManager suspendActivity() )
>
> What I'm repeatedly seeing, however, are problems such as this:
>
> 2008-06-27 16:19:44,730 WARN
> org.continuent.sequoia.controller.recoverylog.RecoverThread - Recovery
> log entry marked as  still executing : StatementExecuteUpdateTask from
> transaction 5764 (delete from disclosure_variables where disc_id = 11
> 490)
> 2008-06-27 16:19:59,730 WARN
> org.continuent.sequoia.controller.scheduler - Waited for 15 secs but 1
> transactions still
> open: [6923]
> 2008-06-27 16:19:59,730 WARN
> org.continuent.sequoia.controller.scheduler - Will wait for 285 secs
> more and attempt to a bort them
> 2008-06-27 16:20:29,730 WARN
> org.continuent.sequoia.controller.scheduler - Waited for 45 secs but 1
> transactions still
> open: [6923]
> 2008-06-27 16:20:29,730 WARN
> org.continuent.sequoia.controller.scheduler - Will wait for 255 secs
> more and attempt to a bort them
>
> This continues until 300 seconds have passed, then it kills the transaction:
>
> 2008-06-27 16:24:44,998 WARN
> org.continuent.sequoia.controller.scheduler - Timeout reached (300
> secs), aborting remaini ng active transactions
> 2008-06-27 16:24:44,998 WARN
> org.continuent.sequoia.controller.scheduler - Aborting transaction
> 6923
>
>
> This problem appears to be caused by the type of work I'm performing.  When 
> database work is performed by my application, it is generally done as part of 
> a transaction, but some database activity is also performed via auto-commit 
> statements outside of the transaction, as they shouldn't be rolled-back if 
> problems occur (e.g. audit logs, marking jobs as failed, etc).
>
> When I look at the transaction which is marked as still being open - 6923 in 
> the above example - it has stopped partway through the work being perfomed, 
> just before executing an auto-commit statement.
>
> I assume that the auto-commit statement is being blocked by sequoia 
> suspending activity (either as a write or a new transaction) - but this means 
> that the work will never be completed and the main transaction committed - 
> hence the observed problem.
>
> After the transaction has been aborted, I see a rollback for this transaction 
> in the recovery log, followed by the auto-commit statement with a exec_time 
> of just over 300 seconds...
>
>
> Any suggestions?
>
> Many thanks again,
>
>         Andrew Lawrenson
>
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of
> Andrew Lawrenson
> Sent: 27 June 2008 16:01
> To: Sequoia general mailing list
> Subject: RE: [Sequoia] Deadlock when replaying recovery log
>
> Emanuel,
>
>         I've tried this again with distibution turned on, and the same 
> ordering occurs, e.g:
>
> Log_id 13625: Transaction_id = 12750, SqlStmt = begin Log_id 13626: 
> Transaction_id = 12750, SqlStmt = update users set lastlogin = {ts 
> '2008-06-27 15:42:15.0'} where id = ?
> Log_id 13627: Transaction_id = 12754, SqlStmt = begin Log_id 13628: 
> Transaction_id = 12754, SqlStmt = update users set lastlogin = {ts 
> '2008-06-27 15:42:15.0'} where id = ?
> Log_id 13629: Transaction_id = 12758, SqlStmt = begin Log_id 13630: 
> Transaction_id = 12758, SqlStmt = update users set lastlogin = {ts 
> '2008-06-27 15:42:15.0'} where id = ?
> Log_id 13631: Transaction_id = 12762, SqlStmt = begin Log_id 13632: 
> Transaction_id = 12762, SqlStmt = update users set lastlogin = {ts 
> '2008-06-27 15:42:15.0'} where id = ?
> Log_id 13633: Transaction_id = 12750, SqlStmt = commit Log_id 13634: 
> Transaction_id = 12754, SqlStmt = commit Log_id 13635: Transaction_id = 
> 12758, SqlStmt = commit ...
> Log_id 13638: Transaction_id = 12762, SqlStmt = commit
>
> Here's some log output, this is from a very large log logging virtualdatabase 
> & recoverylog at a DEBUG level, but I think contains what you want - if you 
> need more, let me know:
>
> 2008-06-27 15:48:35,765 DEBUG
> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
> handleMessageSingleThreaded (class
> org.continuent.sequoia.controller.virtualdatabase.protocol.Distributed
> StatementExecuteUpdate): W 55395 17813 update users set lastlogin =
> {ts '2008-06-27 15:48:35.0'} where id = ?/<!%I|613|!%>
> 2008-06-27 15:48:35,765 DEBUG
> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
> 192.168.0.109:6954: Scheduling distributedRequest 55395 from
> Member(address=/192.168.0.109:3127, uid=192.168.0.109:3127)
> 2008-06-27 15:48:35,765 DEBUG
> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
> handleMessageMultiThreaded (class
> org.continuent.sequoia.controller.virtualdatabase.protocol.Distributed
> StatementExecuteUpdate): W 55395 17813 update users set lastlogin =
> {ts '2008-06-27 15:48:35.0'} where id = ?/<!%I|613|!%>
> 2008-06-27 15:48:35,765 DEBUG
> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
> 192.168.0.109:6954: Executing distributedRequest 55395 from
> Member(address=/192.168.0.109:3127, uid=192.168.0.109:3127)
> 2008-06-27 15:48:35,765 DEBUG
> org.continuent.sequoia.controller.recoverylog - Logging request 20221
> 2008-06-27 15:48:35,765 DEBUG
> org.continuent.sequoia.controller.recoverylog - Releasing recovery log
> connection: [EMAIL PROTECTED] (XID
> = 53657), (SESSIONID = 4287), (DATABASE = HA), (DRDAID = null)
> 2008-06-27 15:48:35,765 DEBUG
> org.continuent.sequoia.controller.recoverylog - Logging in recovery
> log connection
> 2008-06-27 15:48:35,765 DEBUG
> org.continuent.sequoia.controller.recoverylog -
> 2057c01b-011a-ca77-dce2-ffffd8827a3c
> 2008-06-27 15:48:35,765 DEBUG
> org.continuent.sequoia.controller.recoverylog - Logging request 20222
> 2008-06-27 15:48:35,765 DEBUG
> org.continuent.sequoia.controller.recoverylog -
> 2057c01b-011a-ca77-dce2-ffffd8827a3c
> 2008-06-27 15:48:35,781 DEBUG
> org.continuent.sequoia.controller.recoverylog - Logging request 20222
> 2008-06-27 15:48:35,781 DEBUG
> org.continuent.sequoia.controller.recoverylog - Releasing recovery log
> connection: [EMAIL PROTECTED] (XID
> = 53660), (SESSIONID = 4288), (DATABASE = HA), (DRDAID = null)
> 2008-06-27 15:48:35,781 DEBUG
> org.continuent.sequoia.controller.recoverylog - Logging in recovery
> log connection
> 2008-06-27 15:48:35,781 DEBUG
> org.continuent.sequoia.controller.recoverylog -
> c065801d-011a-ca77-dce2-ffffd8827a3c
> 2008-06-27 15:48:35,796 DEBUG
> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
> handleMessageSingleThreaded (class
> org.continuent.sequoia.controller.virtualdatabase.protocol.Distributed
> StatementExecuteUpdate): W 55397 17814 update users set lastlogin =
> {ts '2008-06-27 15:48:35.0'} where id = ?/<!%I|613|!%>
> 2008-06-27 15:48:35,796 DEBUG
> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
> 192.168.0.109:6954: Scheduling distributedRequest 55397 from
> Member(address=/192.168.0.109:3127, uid=192.168.0.109:3127)
> 2008-06-27 15:48:35,796 DEBUG
> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
> handleMessageMultiThreaded (class
> org.continuent.sequoia.controller.virtualdatabase.protocol.Distributed
> StatementExecuteUpdate): W 55397 17814 update users set lastlogin =
> {ts '2008-06-27 15:48:35.0'} where id = ?/<!%I|613|!%>
> 2008-06-27 15:48:35,796 DEBUG
> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
> 192.168.0.109:6954: Executing distributedRequest 55397 from
> Member(address=/192.168.0.109:3127, uid=192.168.0.109:3127)
> 2008-06-27 15:48:35,796 DEBUG
> org.continuent.sequoia.controller.recoverylog - Logging request 20223
> 2008-06-27 15:48:35,796 DEBUG
> org.continuent.sequoia.controller.recoverylog - Releasing recovery log
> connection: [EMAIL PROTECTED] (XID
> = 53662), (SESSIONID = 4289), (DATABASE = HA), (DRDAID = null)
> 2008-06-27 15:48:35,796 DEBUG
> org.continuent.sequoia.controller.recoverylog - Logging in recovery
> log connection
> 2008-06-27 15:48:35,796 DEBUG
> org.continuent.sequoia.controller.recoverylog -
> 2057c01b-011a-ca77-dce2-ffffd8827a3c
> 2008-06-27 15:48:35,796 DEBUG
> org.continuent.sequoia.controller.recoverylog - Logging request 20224
> 2008-06-27 15:48:35,796 DEBUG
> org.continuent.sequoia.controller.recoverylog -
> 2057c01b-011a-ca77-dce2-ffffd8827a3c
> 2008-06-27 15:48:35,796 DEBUG
> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
> handleMessageSingleThreaded (class
> org.continuent.sequoia.controller.virtualdatabase.protocol.Distributed
> StatementExecuteUpdate): W 55411 17821 update users set lastlogin =
> {ts '2008-06-27 15:48:35.0'} where id = ?/<!%I|613|!%>
> 2008-06-27 15:48:35,796 DEBUG
> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
> 192.168.0.109:6954: Scheduling distributedRequest 55411 from
> Member(address=/192.168.0.109:3127, uid=192.168.0.109:3127)
> 2008-06-27 15:48:35,796 DEBUG
> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
> handleMessageMultiThreaded (class
> org.continuent.sequoia.controller.virtualdatabase.protocol.Distributed
> StatementExecuteUpdate): W 55411 17821 update users set lastlogin =
> {ts '2008-06-27 15:48:35.0'} where id = ?/<!%I|613|!%>
> 2008-06-27 15:48:35,796 DEBUG
> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
> 192.168.0.109:6954: Executing distributedRequest 55411 from
> Member(address=/192.168.0.109:3127, uid=192.168.0.109:3127)
> 2008-06-27 15:48:35,796 DEBUG
> org.continuent.sequoia.controller.recoverylog - Logging request 20225
> 2008-06-27 15:48:35,796 DEBUG
> org.continuent.sequoia.controller.recoverylog -
> 2057c01b-011a-ca77-dce2-ffffd8827a3c
> 2008-06-27 15:48:35,796 DEBUG
> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
> handleMessageSingleThreaded (class
> org.continuent.sequoia.controller.virtualdatabase.protocol.Distributed
> StatementExecuteUpdate): W 55413 17822 update users set lastlogin =
> {ts '2008-06-27 15:48:35.0'} where id = ?/<!%I|613|!%>
> 2008-06-27 15:48:35,796 DEBUG
> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
> 192.168.0.109:6954: Scheduling distributedRequest 55413 from
> Member(address=/192.168.0.109:3127, uid=192.168.0.109:3127)
> 2008-06-27 15:48:35,796 DEBUG
> org.continuent.sequoia.controller.virtualdatabase.VirtualDatabaseWorke
> rThread.AndyCluster - Commit command
> 2008-06-27 15:48:35,812 DEBUG
> org.continuent.sequoia.controller.recoverylog - Logging request 20226
> 2008-06-27 15:48:35,812 DEBUG
> org.continuent.sequoia.controller.recoverylog - Releasing recovery log
> connection: [EMAIL PROTECTED] (XID
> = 53666), (SESSIONID = 4290), (DATABASE = HA), (DRDAID = null)
> 2008-06-27 15:48:35,812 DEBUG
> org.continuent.sequoia.controller.recoverylog - Logging in recovery
> log connection
> 2008-06-27 15:48:35,812 DEBUG
> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
> handleMessageMultiThreaded (class
> org.continuent.sequoia.controller.virtualdatabase.protocol.Distributed
> StatementExecuteUpdate): W 55413 17822 update users set lastlogin =
> {ts '2008-06-27 15:48:35.0'} where id = ?/<!%I|613|!%>
> 2008-06-27 15:48:35,812 DEBUG
> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
> 192.168.0.109:6954: Executing distributedRequest 55413 from
> Member(address=/192.168.0.109:3127, uid=192.168.0.109:3127)
> 2008-06-27 15:48:35,812 DEBUG
> org.continuent.sequoia.controller.recoverylog -
> 2057c01b-011a-ca77-dce2-ffffd8827a3c
> 2008-06-27 15:48:35,812 DEBUG
> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
> handleMessageSingleThreaded (class
> org.continuent.sequoia.controller.virtualdatabase.protocol.Distributed
> Commit): Commit transaction 17813
> 2008-06-27 15:48:35,812 DEBUG
> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
> handleMessageMultiThreaded (class
> org.continuent.sequoia.controller.virtualdatabase.protocol.Distributed
> Commit): Commit transaction 17813
> 2008-06-27 15:48:35,812 DEBUG
> org.continuent.sequoia.controller.recoverylog - Logging request 20227
> 2008-06-27 15:48:35,812 DEBUG
> org.continuent.sequoia.controller.recoverylog -
> 2057c01b-011a-ca77-dce2-ffffd8827a3c
> 2008-06-27 15:48:35,812 DEBUG
> org.continuent.sequoia.controller.recoverylog - Logging request 20228
> 2008-06-27 15:48:35,812 DEBUG
> org.continuent.sequoia.controller.recoverylog -
> 2057c01b-011a-ca77-dce2-ffffd8827a3c
> 2008-06-27 15:48:35,812 DEBUG
> org.continuent.sequoia.controller.recoverylog - Logging request 20229
> 2008-06-27 15:48:35,812 DEBUG
> org.continuent.sequoia.controller.recoverylog -
> 2057c01b-011a-ca77-dce2-ffffd8827a3c
> 2008-06-27 15:48:35,812 DEBUG
> org.continuent.sequoia.controller.virtualdatabase.VirtualDatabaseWorke
> rThread.AndyCluster - Commit command
> 2008-06-27 15:48:35,812 DEBUG
> org.continuent.sequoia.controller.recoverylog - Logging request 20229
> 2008-06-27 15:48:35,812 DEBUG
> org.continuent.sequoia.controller.recoverylog -
> c065801d-011a-ca77-dce2-ffffd8827a3c
> 2008-06-27 15:48:35,812 DEBUG
> org.continuent.sequoia.controller.virtualdatabase.VirtualDatabaseWorke
> rThread.AndyCluster - StatementExecuteQuery command
> 2008-06-27 15:48:35,812 DEBUG
> org.continuent.sequoia.controller.virtualdatabase.VirtualDatabaseWorke
> rThread.AndyCluster - StatementExecuteUpdateWithKeys command
> 2008-06-27 15:48:35,828 DEBUG
> org.continuent.sequoia.controller.recoverylog - Logging request 20224
> 2008-06-27 15:48:35,828 DEBUG
> org.continuent.sequoia.controller.recoverylog - Releasing recovery log
> connection: [EMAIL PROTECTED] (XID
> = 53672), (SESSIONID = 4291), (DATABASE = HA), (DRDAID = null)
> 2008-06-27 15:48:35,828 DEBUG
> org.continuent.sequoia.controller.recoverylog - Logging in recovery
> log connection
> 2008-06-27 15:48:35,828 DEBUG
> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
> handleMessageSingleThreaded (class
> org.continuent.sequoia.controller.virtualdatabase.protocol.Distributed
> Commit): Commit transaction 17814
> 2008-06-27 15:48:35,828 DEBUG
> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
> handleMessageMultiThreaded (class
> org.continuent.sequoia.controller.virtualdatabase.protocol.Distributed
> Commit): Commit transaction 17814
> 2008-06-27 15:48:35,828 DEBUG
> org.continuent.sequoia.controller.recoverylog -
> c065801d-011a-ca77-dce2-ffffd8827a3c
> 2008-06-27 15:48:35,828 DEBUG
> org.continuent.sequoia.controller.recoverylog - Logging request 20230
> 2008-06-27 15:48:35,828 DEBUG
> org.continuent.sequoia.controller.recoverylog -
> 2057c01b-011a-ca77-dce2-ffffd8827a3c
>
>
> Thanks again for the help,
>
> Andy
>
>
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of
> Emmanuel Cecchet
> Sent: 27 June 2008 13:55
> To: Sequoia general mailing list
> Subject: Re: [Sequoia] Deadlock when replaying recovery log
>
> Hi Andrew,
>
>>         this is with Sequoia 2.10.10, on a single non-distributed controller 
>> with 2 backends - both emebedded derby databases.
>>
>>
> This might be an issue related to the non-distributed code. Could you add a 
> <Distributed> element to your configuration file and have another run just to 
> check if the order in the recovery log changes.
> If the problem persists, it would be interesting to have a trace of the log 
> with virtualdatabase and recoverylog loggers set to DEBUG in log4j.properties 
> to see in what order events occur in your scenario.
>
> Thanks a lot for the feedback,
> Emmanuel
>
>>         a set of sample recovery log entries are:
>>
>> Log_id 103: Transaction_id = 146, RequestId = 0, SqlStmt = begin
>> Log_id 104: Transaction_id = 146, RequestId = 322, SqlStmt = update users 
>> set lastlogin = {ts '2008-06-26 17:27:37.0'} where id = ?
>> Log_id 105: Transaction_id = 147, RequestId = 0, SqlStmt = begin
>> Log_id 106: Transaction_id = 147, RequestId = 326, SqlStmt = update users 
>> set lastlogin = {ts '2008-06-26 17:27:37.0'} where id = ?
>> Log_id 107: Transaction_id = 148, RequestId = 0, SqlStmt = begin
>> Log_id 108: Transaction_id = 148, RequestId = 328, SqlStmt = update users 
>> set lastlogin = {ts '2008-06-26 17:27:37.0'} where id = ?
>> Log_id 109: Transaction_id = 149, RequestId = 0, SqlStmt = begin
>> Log_id 110: Transaction_id = 149, RequestId = 330, SqlStmt = update users 
>> set lastlogin = {ts '2008-06-26 17:27:37.0'} where id = ?
>> <a few more of these>
>> Log_id 123: Transaction_id = 146, RequestId = 0, SqlStmt = commit
>> Log_id 124: Transaction_id = 147, RequestId = 0, SqlStmt = commit
>> Log_id 125: Transaction_id = 148, RequestId = 0, SqlStmt = commit
>> Log_id 126: Transaction_id = 149, RequestId = 0, SqlStmt = commit <a
>> few more commits>
>>
>> Many thanks for any help,
>>
>>         Andy
>>
>> -----Original Message-----
>> From: [EMAIL PROTECTED]
>> [mailto:[EMAIL PROTECTED] On Behalf Of
>> Emmanuel Cecchet
>> Sent: 26 June 2008 16:33
>> To: Sequoia general mailing list
>> Subject: Re: [Sequoia] Deadlock when replaying recovery log
>>
>> Hi Andrew,
>>
>>
>>
>>>     I'm having problems with replaying recovery logs, where the
>>> replay appears to be deadlocking itself.  I'm getting the following error:
>>>
>>> 2008-06-26 09:53:30,103 ERROR
>>> org.continuent.sequoia.controller.recoverylog.RecoverThread -
>>> Recovery process failed to replay transaction 1110: request
>>> StatementExecuteUpdateTask from transaction 1110 (update... )
>>> (Backend AndyCluster - BackendWorkerThread for backend 'DB-BACKUP'
>>> with RAIDb
>>> level:1 failed (A lock could not be obtained within the time requested.
>>>
>>>
>>> As an example, consider the following simplified scenario:
>>>
>>> I have a table 'test':
>>>
>>> create table test ( a int , b int )
>>> insert into test values ( 1, 1 )
>>>
>>> we then have two transactions, t1 & t2 each in different threads
>>> which do the following:
>>>
>>> 1) t1: update test set b = 2 where a = 1
>>> 2) t2: update test set b = 3 where a = 1
>>> 3) t1: commit
>>> 4) t2: commit
>>>
>>> normally, this will work fine, when t2 executes, it is blocked on a
>>> lock, but as soon as t1 commits, the lock is freed and t2 then commits.
>>>
>>> however, if you replay the transaction log, all the statements are
>>> executed sequentially, and what I think is happening is that the 2nd
>>> statement hangs and eventually has a lock timeout, as t1 will not be
>>> commited until after it.
>>> This means that you can't recover, as you can't replay the log.
>>>
>>> Is my understanding correct, and if so, is there anything I can do
>>> about this?
>>>
>>>
>> There is something wrong here because the recovery log stores queries in the 
>> execution order and not in their arrival order.
>> This means that the scenario you are describing should not be logged this 
>> way in the recovery log.
>>
>> Could you provide a dump of your recovery log content?
>> Which version of Sequoia are you using?
>> Are all your database backends running the same database engine?
>>
>> Thanks for your feedback,
>> Emmanuel
>>
>>
>>


--
Emmanuel Cecchet
FTO @ Frog Thinker
Open Source Development & Consulting
--
Web: http://www.frogthinker.org
email: [EMAIL PROTECTED]
Skype: emmanuel_cecchet

_______________________________________________
Sequoia mailing list
[email protected]
https://forge.continuent.org/mailman/listinfo/sequoia

_______________________________________________
Sequoia mailing list
[email protected]
https://forge.continuent.org/mailman/listinfo/sequoia

RE: [Sequoia] Deadlock when replaying recovery log

Reply via email to