RE: [Sequoia] Deadlock when replaying recovery log

Andrew Lawrenson Fri, 04 Jul 2008 01:20:42 -0700

Emanuel,

to try & clarify:


the six steps below are executed by a single application thread - so, for 
example, for #6 (the commit) to be executed, #5 (the auto-commit update) must 
have finished first.

now, my understanding is that if this thread has begun the transaction (#3), 
then RequestManager.suspendActivity() is called, the Request Manager will call:
scheduler.suspendNewTransactions()
scheduler.suspendNewWrites()
scheduler.waitForSuspendedTransactionsToComplete()

As the app then continues executing, the suspension of new writes will stop #5 
from completing.
But waitForSuspendedTransactionsToComplete will wait for the commit (#6) to be 
executed.
As the app is waiting for #5 to complete before it commits, and #5 is blocked, 
it will never reach #6, and never commit.
Thus, everything grinds to a halt, until it's all timed-out after 300 seconds.


Thanks again,

        Andy


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Emmanuel Cecchet
Sent: 03 July 2008 19:31
To: Sequoia general mailing list
Subject: Re: [Sequoia] Deadlock when replaying recovery log

Hi Andrew,

>         with regards to the auto-commit, pseudo-code for what's going on is 
> basically, all in 1 thread:
>
> 1) Connection conn1 = getConnection();
> 2) Connection conn2 = getAutoCommitConnection(); // this connection
> will auto-commit
> 3) begin transaction t1 on conn1
> 4) perform an update on conn1   // this is part of t1
> 5) perform an update on conn2           // this is outside of t1
> 6) commit t1 on conn1
>
> What I appear to see is that:
> scheduler.suspendNewTransactions() would block #3
>
> If RequestManager.suspendActivity() is called between #3 & #5:
>
> scheduler.suspendNewWrites() will block #5
> scheduler.waitForSuspendedTransactionsToComplete() will wait for #6
>
If #3 was blocked, the transaction is not started and 
waitForSuspendedTransactionsToComplete should not wait.  I have to double check 
that but that should be the goal.

> However, because this thread is blocked in #5, this thread will never reach 
> #6, so waitForSuspendedTransactionsToComplete will eventually time out.
>
The thing I don't understand is that 2 transactions should execute in 2 
separate thread so I don't see why #5 would be blocked unless this is 
application level synchronization.

I have to look into the code more closely but if you can enlighten me on the 
issues I raised, that would help a lot.
Thanks again,
Emmanuel

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of
> Emmanuel Cecchet
> Sent: 02 July 2008 20:17
> To: Sequoia general mailing list
> Subject: Re: [Sequoia] Deadlock when replaying recovery log
>
> Andrew,
>
>
>>         the suspension of write activity is done from 
>> RequestManager.suspendActivity(), which performs the following:
>>
>> 1) scheduler.suspendNewTransactions();
>> 2) scheduler.suspendNewWrites();
>> 3) scheduler.waitForSuspendedTransactionsToComplete();
>> 4) scheduler.waitForSuspendedWritesToComplete();
>>
>> suspendNewTransactions is blocking any new transaction from starting, but 
>> from trying to look through the code, I'm guessing that this only applies to 
>> "proper" transactions, as opposed to any writes done via an auto-commit 
>> connection - and these writes are blocked by the suspendNewWrites.
>>
>> Thus I'm assuming that in my scenario, where my app is executing auto-commit 
>> inserts within the scope of a large transaction, the suspendNewWrites is 
>> blocking the auto-commit inserts, which is then preventing the main 
>> transaction from completing (so waitForSuspendedTransactionsToComplete 
>> doesn't complete).
>>
>>
> I don't get how an autocommit insert is going to block the main transaction. 
> If the long transaction was already holding the lock, the autocommit would be 
> blocked. If the long transaction was not holding the lock, the autocommit 
> would proceed and finish. When writes are suspended, they are suspended 
> before acquiring the locks so I don't really see how the deadlock can happen.
> Could you describe a scenario that would show how such deadlock could happen?
>
>> How, if I re-order the above logic into:
>>
>> 1) scheduler.suspendNewTransactions();
>> 2) scheduler.waitForSuspendedTransactionsToComplete();
>> 3) scheduler.suspendNewWrites();
>> 4) scheduler.waitForSuspendedWritesToComplete();
>>
>> What I'm hoping will happen is that sequoia will block new transactions, 
>> wait for existing ones to finish, then block any write activity outside of 
>> transactions & wait for any existing ones of these to finish.
>>
>> I've tried this and have been unable to reproduce my problem below.
>> However, I'm not 100% certain that this will have any other unanticipated 
>> side-effects, or is otherwise unsuitable.
>>
>> What do you think?
>>
>>
> I think the reason for blocking writes before is to prevent lazy transactions 
> starts from blocking on one controller and not another one.
> I have to double check that story as your solution would be easier to debug. 
> Having an example showing how the current implementation fails would help 
> motivating the refactoring.
>
> Thanks again for your detailed feedback, Emmanuel
>
>
>
>> -----Original Message-----
>> From: [EMAIL PROTECTED]
>> [mailto:[EMAIL PROTECTED] On Behalf Of
>> Andrew Lawrenson
>> Sent: 27 June 2008 16:43
>> To: Sequoia general mailing list
>> Subject: RE: [Sequoia] Deadlock when replaying recovery log
>>
>> Emanuel,
>>
>>         I'm currently working around this issue by serializing my 
>> transactions (so that each transaction is executed in serial).
>>
>> However, when I do this, I'm encountering another issue at the end of 
>> recovery, if I'm recovering while activity is taking place.
>> The exact circumstances here is that I have two backends, and I'm backing 
>> one up while the system is in use.
>>
>> When the backend has been backed up, it needs to replay through the recovery 
>> log to bring itself back up to date.
>> It's managing to replay the bulk of the log up to the current point
>> of execution (so far, so good)
>>
>> What sequoia then appears to be doing is suspending write activity, then 
>> recovering through the last log entries - which makes sense.
>> I believe it's suspending write activity by blocking new transactions
>> & new writes, then waiting for existing transactions & writes to end
>> (i.e. via RequestManager suspendActivity() )
>>
>> What I'm repeatedly seeing, however, are problems such as this:
>>
>> 2008-06-27 16:19:44,730 WARN
>> org.continuent.sequoia.controller.recoverylog.RecoverThread -
>> Recovery log entry marked as  still executing :
>> StatementExecuteUpdateTask from transaction 5764 (delete from
>> disclosure_variables where disc_id = 11
>> 490)
>> 2008-06-27 16:19:59,730 WARN
>> org.continuent.sequoia.controller.scheduler - Waited for 15 secs but
>> 1 transactions still
>> open: [6923]
>> 2008-06-27 16:19:59,730 WARN
>> org.continuent.sequoia.controller.scheduler - Will wait for 285 secs
>> more and attempt to a bort them
>> 2008-06-27 16:20:29,730 WARN
>> org.continuent.sequoia.controller.scheduler - Waited for 45 secs but
>> 1 transactions still
>> open: [6923]
>> 2008-06-27 16:20:29,730 WARN
>> org.continuent.sequoia.controller.scheduler - Will wait for 255 secs
>> more and attempt to a bort them
>>
>> This continues until 300 seconds have passed, then it kills the transaction:
>>
>> 2008-06-27 16:24:44,998 WARN
>> org.continuent.sequoia.controller.scheduler - Timeout reached (300
>> secs), aborting remaini ng active transactions
>> 2008-06-27 16:24:44,998 WARN
>> org.continuent.sequoia.controller.scheduler - Aborting transaction
>> 6923
>>
>>
>> This problem appears to be caused by the type of work I'm performing.  When 
>> database work is performed by my application, it is generally done as part 
>> of a transaction, but some database activity is also performed via 
>> auto-commit statements outside of the transaction, as they shouldn't be 
>> rolled-back if problems occur (e.g. audit logs, marking jobs as failed, etc).
>>
>> When I look at the transaction which is marked as still being open - 6923 in 
>> the above example - it has stopped partway through the work being perfomed, 
>> just before executing an auto-commit statement.
>>
>> I assume that the auto-commit statement is being blocked by sequoia 
>> suspending activity (either as a write or a new transaction) - but this 
>> means that the work will never be completed and the main transaction 
>> committed - hence the observed problem.
>>
>> After the transaction has been aborted, I see a rollback for this 
>> transaction in the recovery log, followed by the auto-commit statement with 
>> a exec_time of just over 300 seconds...
>>
>>
>> Any suggestions?
>>
>> Many thanks again,
>>
>>         Andrew Lawrenson
>>
>>
>> -----Original Message-----
>> From: [EMAIL PROTECTED]
>> [mailto:[EMAIL PROTECTED] On Behalf Of
>> Andrew Lawrenson
>> Sent: 27 June 2008 16:01
>> To: Sequoia general mailing list
>> Subject: RE: [Sequoia] Deadlock when replaying recovery log
>>
>> Emanuel,
>>
>>         I've tried this again with distibution turned on, and the same 
>> ordering occurs, e.g:
>>
>> Log_id 13625: Transaction_id = 12750, SqlStmt = begin Log_id 13626: 
>> Transaction_id = 12750, SqlStmt = update users set lastlogin = {ts 
>> '2008-06-27 15:42:15.0'} where id = ?
>> Log_id 13627: Transaction_id = 12754, SqlStmt = begin Log_id 13628: 
>> Transaction_id = 12754, SqlStmt = update users set lastlogin = {ts 
>> '2008-06-27 15:42:15.0'} where id = ?
>> Log_id 13629: Transaction_id = 12758, SqlStmt = begin Log_id 13630: 
>> Transaction_id = 12758, SqlStmt = update users set lastlogin = {ts 
>> '2008-06-27 15:42:15.0'} where id = ?
>> Log_id 13631: Transaction_id = 12762, SqlStmt = begin Log_id 13632: 
>> Transaction_id = 12762, SqlStmt = update users set lastlogin = {ts 
>> '2008-06-27 15:42:15.0'} where id = ?
>> Log_id 13633: Transaction_id = 12750, SqlStmt = commit Log_id 13634: 
>> Transaction_id = 12754, SqlStmt = commit Log_id 13635: Transaction_id = 
>> 12758, SqlStmt = commit ...
>> Log_id 13638: Transaction_id = 12762, SqlStmt = commit
>>
>> Here's some log output, this is from a very large log logging 
>> virtualdatabase & recoverylog at a DEBUG level, but I think contains what 
>> you want - if you need more, let me know:
>>
>> 2008-06-27 15:48:35,765 DEBUG
>> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
>> handleMessageSingleThreaded (class
>> org.continuent.sequoia.controller.virtualdatabase.protocol.Distribute
>> d
>> StatementExecuteUpdate): W 55395 17813 update users set lastlogin =
>> {ts '2008-06-27 15:48:35.0'} where id = ?/<!%I|613|!%>
>> 2008-06-27 15:48:35,765 DEBUG
>> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
>> 192.168.0.109:6954: Scheduling distributedRequest 55395 from
>> Member(address=/192.168.0.109:3127, uid=192.168.0.109:3127)
>> 2008-06-27 15:48:35,765 DEBUG
>> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
>> handleMessageMultiThreaded (class
>> org.continuent.sequoia.controller.virtualdatabase.protocol.Distribute
>> d
>> StatementExecuteUpdate): W 55395 17813 update users set lastlogin =
>> {ts '2008-06-27 15:48:35.0'} where id = ?/<!%I|613|!%>
>> 2008-06-27 15:48:35,765 DEBUG
>> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
>> 192.168.0.109:6954: Executing distributedRequest 55395 from
>> Member(address=/192.168.0.109:3127, uid=192.168.0.109:3127)
>> 2008-06-27 15:48:35,765 DEBUG
>> org.continuent.sequoia.controller.recoverylog - Logging request 20221
>> 2008-06-27 15:48:35,765 DEBUG
>> org.continuent.sequoia.controller.recoverylog - Releasing recovery
>> log
>> connection: [EMAIL PROTECTED] (XID
>> = 53657), (SESSIONID = 4287), (DATABASE = HA), (DRDAID = null)
>> 2008-06-27 15:48:35,765 DEBUG
>> org.continuent.sequoia.controller.recoverylog - Logging in recovery
>> log connection
>> 2008-06-27 15:48:35,765 DEBUG
>> org.continuent.sequoia.controller.recoverylog -
>> 2057c01b-011a-ca77-dce2-ffffd8827a3c
>> 2008-06-27 15:48:35,765 DEBUG
>> org.continuent.sequoia.controller.recoverylog - Logging request 20222
>> 2008-06-27 15:48:35,765 DEBUG
>> org.continuent.sequoia.controller.recoverylog -
>> 2057c01b-011a-ca77-dce2-ffffd8827a3c
>> 2008-06-27 15:48:35,781 DEBUG
>> org.continuent.sequoia.controller.recoverylog - Logging request 20222
>> 2008-06-27 15:48:35,781 DEBUG
>> org.continuent.sequoia.controller.recoverylog - Releasing recovery
>> log
>> connection: [EMAIL PROTECTED]
>> (XID = 53660), (SESSIONID = 4288), (DATABASE = HA), (DRDAID = null)
>> 2008-06-27 15:48:35,781 DEBUG
>> org.continuent.sequoia.controller.recoverylog - Logging in recovery
>> log connection
>> 2008-06-27 15:48:35,781 DEBUG
>> org.continuent.sequoia.controller.recoverylog -
>> c065801d-011a-ca77-dce2-ffffd8827a3c
>> 2008-06-27 15:48:35,796 DEBUG
>> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
>> handleMessageSingleThreaded (class
>> org.continuent.sequoia.controller.virtualdatabase.protocol.Distribute
>> d
>> StatementExecuteUpdate): W 55397 17814 update users set lastlogin =
>> {ts '2008-06-27 15:48:35.0'} where id = ?/<!%I|613|!%>
>> 2008-06-27 15:48:35,796 DEBUG
>> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
>> 192.168.0.109:6954: Scheduling distributedRequest 55397 from
>> Member(address=/192.168.0.109:3127, uid=192.168.0.109:3127)
>> 2008-06-27 15:48:35,796 DEBUG
>> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
>> handleMessageMultiThreaded (class
>> org.continuent.sequoia.controller.virtualdatabase.protocol.Distribute
>> d
>> StatementExecuteUpdate): W 55397 17814 update users set lastlogin =
>> {ts '2008-06-27 15:48:35.0'} where id = ?/<!%I|613|!%>
>> 2008-06-27 15:48:35,796 DEBUG
>> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
>> 192.168.0.109:6954: Executing distributedRequest 55397 from
>> Member(address=/192.168.0.109:3127, uid=192.168.0.109:3127)
>> 2008-06-27 15:48:35,796 DEBUG
>> org.continuent.sequoia.controller.recoverylog - Logging request 20223
>> 2008-06-27 15:48:35,796 DEBUG
>> org.continuent.sequoia.controller.recoverylog - Releasing recovery
>> log
>> connection: [EMAIL PROTECTED]
>> (XID = 53662), (SESSIONID = 4289), (DATABASE = HA), (DRDAID = null)
>> 2008-06-27 15:48:35,796 DEBUG
>> org.continuent.sequoia.controller.recoverylog - Logging in recovery
>> log connection
>> 2008-06-27 15:48:35,796 DEBUG
>> org.continuent.sequoia.controller.recoverylog -
>> 2057c01b-011a-ca77-dce2-ffffd8827a3c
>> 2008-06-27 15:48:35,796 DEBUG
>> org.continuent.sequoia.controller.recoverylog - Logging request 20224
>> 2008-06-27 15:48:35,796 DEBUG
>> org.continuent.sequoia.controller.recoverylog -
>> 2057c01b-011a-ca77-dce2-ffffd8827a3c
>> 2008-06-27 15:48:35,796 DEBUG
>> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
>> handleMessageSingleThreaded (class
>> org.continuent.sequoia.controller.virtualdatabase.protocol.Distribute
>> d
>> StatementExecuteUpdate): W 55411 17821 update users set lastlogin =
>> {ts '2008-06-27 15:48:35.0'} where id = ?/<!%I|613|!%>
>> 2008-06-27 15:48:35,796 DEBUG
>> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
>> 192.168.0.109:6954: Scheduling distributedRequest 55411 from
>> Member(address=/192.168.0.109:3127, uid=192.168.0.109:3127)
>> 2008-06-27 15:48:35,796 DEBUG
>> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
>> handleMessageMultiThreaded (class
>> org.continuent.sequoia.controller.virtualdatabase.protocol.Distribute
>> d
>> StatementExecuteUpdate): W 55411 17821 update users set lastlogin =
>> {ts '2008-06-27 15:48:35.0'} where id = ?/<!%I|613|!%>
>> 2008-06-27 15:48:35,796 DEBUG
>> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
>> 192.168.0.109:6954: Executing distributedRequest 55411 from
>> Member(address=/192.168.0.109:3127, uid=192.168.0.109:3127)
>> 2008-06-27 15:48:35,796 DEBUG
>> org.continuent.sequoia.controller.recoverylog - Logging request 20225
>> 2008-06-27 15:48:35,796 DEBUG
>> org.continuent.sequoia.controller.recoverylog -
>> 2057c01b-011a-ca77-dce2-ffffd8827a3c
>> 2008-06-27 15:48:35,796 DEBUG
>> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
>> handleMessageSingleThreaded (class
>> org.continuent.sequoia.controller.virtualdatabase.protocol.Distribute
>> d
>> StatementExecuteUpdate): W 55413 17822 update users set lastlogin =
>> {ts '2008-06-27 15:48:35.0'} where id = ?/<!%I|613|!%>
>> 2008-06-27 15:48:35,796 DEBUG
>> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
>> 192.168.0.109:6954: Scheduling distributedRequest 55413 from
>> Member(address=/192.168.0.109:3127, uid=192.168.0.109:3127)
>> 2008-06-27 15:48:35,796 DEBUG
>> org.continuent.sequoia.controller.virtualdatabase.VirtualDatabaseWork
>> e
>> rThread.AndyCluster - Commit command
>> 2008-06-27 15:48:35,812 DEBUG
>> org.continuent.sequoia.controller.recoverylog - Logging request 20226
>> 2008-06-27 15:48:35,812 DEBUG
>> org.continuent.sequoia.controller.recoverylog - Releasing recovery
>> log
>> connection: [EMAIL PROTECTED] (XID
>> = 53666), (SESSIONID = 4290), (DATABASE = HA), (DRDAID = null)
>> 2008-06-27 15:48:35,812 DEBUG
>> org.continuent.sequoia.controller.recoverylog - Logging in recovery
>> log connection
>> 2008-06-27 15:48:35,812 DEBUG
>> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
>> handleMessageMultiThreaded (class
>> org.continuent.sequoia.controller.virtualdatabase.protocol.Distribute
>> d
>> StatementExecuteUpdate): W 55413 17822 update users set lastlogin =
>> {ts '2008-06-27 15:48:35.0'} where id = ?/<!%I|613|!%>
>> 2008-06-27 15:48:35,812 DEBUG
>> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
>> 192.168.0.109:6954: Executing distributedRequest 55413 from
>> Member(address=/192.168.0.109:3127, uid=192.168.0.109:3127)
>> 2008-06-27 15:48:35,812 DEBUG
>> org.continuent.sequoia.controller.recoverylog -
>> 2057c01b-011a-ca77-dce2-ffffd8827a3c
>> 2008-06-27 15:48:35,812 DEBUG
>> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
>> handleMessageSingleThreaded (class
>> org.continuent.sequoia.controller.virtualdatabase.protocol.Distribute
>> d
>> Commit): Commit transaction 17813
>> 2008-06-27 15:48:35,812 DEBUG
>> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
>> handleMessageMultiThreaded (class
>> org.continuent.sequoia.controller.virtualdatabase.protocol.Distribute
>> d
>> Commit): Commit transaction 17813
>> 2008-06-27 15:48:35,812 DEBUG
>> org.continuent.sequoia.controller.recoverylog - Logging request 20227
>> 2008-06-27 15:48:35,812 DEBUG
>> org.continuent.sequoia.controller.recoverylog -
>> 2057c01b-011a-ca77-dce2-ffffd8827a3c
>> 2008-06-27 15:48:35,812 DEBUG
>> org.continuent.sequoia.controller.recoverylog - Logging request 20228
>> 2008-06-27 15:48:35,812 DEBUG
>> org.continuent.sequoia.controller.recoverylog -
>> 2057c01b-011a-ca77-dce2-ffffd8827a3c
>> 2008-06-27 15:48:35,812 DEBUG
>> org.continuent.sequoia.controller.recoverylog - Logging request 20229
>> 2008-06-27 15:48:35,812 DEBUG
>> org.continuent.sequoia.controller.recoverylog -
>> 2057c01b-011a-ca77-dce2-ffffd8827a3c
>> 2008-06-27 15:48:35,812 DEBUG
>> org.continuent.sequoia.controller.virtualdatabase.VirtualDatabaseWork
>> e
>> rThread.AndyCluster - Commit command
>> 2008-06-27 15:48:35,812 DEBUG
>> org.continuent.sequoia.controller.recoverylog - Logging request 20229
>> 2008-06-27 15:48:35,812 DEBUG
>> org.continuent.sequoia.controller.recoverylog -
>> c065801d-011a-ca77-dce2-ffffd8827a3c
>> 2008-06-27 15:48:35,812 DEBUG
>> org.continuent.sequoia.controller.virtualdatabase.VirtualDatabaseWork
>> e rThread.AndyCluster - StatementExecuteQuery command
>> 2008-06-27 15:48:35,812 DEBUG
>> org.continuent.sequoia.controller.virtualdatabase.VirtualDatabaseWork
>> e rThread.AndyCluster - StatementExecuteUpdateWithKeys command
>> 2008-06-27 15:48:35,828 DEBUG
>> org.continuent.sequoia.controller.recoverylog - Logging request 20224
>> 2008-06-27 15:48:35,828 DEBUG
>> org.continuent.sequoia.controller.recoverylog - Releasing recovery
>> log
>> connection: [EMAIL PROTECTED]
>> (XID = 53672), (SESSIONID = 4291), (DATABASE = HA), (DRDAID = null)
>> 2008-06-27 15:48:35,828 DEBUG
>> org.continuent.sequoia.controller.recoverylog - Logging in recovery
>> log connection
>> 2008-06-27 15:48:35,828 DEBUG
>> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
>> handleMessageSingleThreaded (class
>> org.continuent.sequoia.controller.virtualdatabase.protocol.Distribute
>> d
>> Commit): Commit transaction 17814
>> 2008-06-27 15:48:35,828 DEBUG
>> org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
>> handleMessageMultiThreaded (class
>> org.continuent.sequoia.controller.virtualdatabase.protocol.Distribute
>> d
>> Commit): Commit transaction 17814
>> 2008-06-27 15:48:35,828 DEBUG
>> org.continuent.sequoia.controller.recoverylog -
>> c065801d-011a-ca77-dce2-ffffd8827a3c
>> 2008-06-27 15:48:35,828 DEBUG
>> org.continuent.sequoia.controller.recoverylog - Logging request 20230
>> 2008-06-27 15:48:35,828 DEBUG
>> org.continuent.sequoia.controller.recoverylog -
>> 2057c01b-011a-ca77-dce2-ffffd8827a3c
>>
>>
>> Thanks again for the help,
>>
>> Andy
>>
>>
>>
>> -----Original Message-----
>> From: [EMAIL PROTECTED]
>> [mailto:[EMAIL PROTECTED] On Behalf Of
>> Emmanuel Cecchet
>> Sent: 27 June 2008 13:55
>> To: Sequoia general mailing list
>> Subject: Re: [Sequoia] Deadlock when replaying recovery log
>>
>> Hi Andrew,
>>
>>
>>>         this is with Sequoia 2.10.10, on a single non-distributed 
>>> controller with 2 backends - both emebedded derby databases.
>>>
>>>
>>>
>> This might be an issue related to the non-distributed code. Could you add a 
>> <Distributed> element to your configuration file and have another run just 
>> to check if the order in the recovery log changes.
>> If the problem persists, it would be interesting to have a trace of the log 
>> with virtualdatabase and recoverylog loggers set to DEBUG in 
>> log4j.properties to see in what order events occur in your scenario.
>>
>> Thanks a lot for the feedback,
>> Emmanuel
>>
>>
>>>         a set of sample recovery log entries are:
>>>
>>> Log_id 103: Transaction_id = 146, RequestId = 0, SqlStmt = begin
>>> Log_id 104: Transaction_id = 146, RequestId = 322, SqlStmt = update users 
>>> set lastlogin = {ts '2008-06-26 17:27:37.0'} where id = ?
>>> Log_id 105: Transaction_id = 147, RequestId = 0, SqlStmt = begin
>>> Log_id 106: Transaction_id = 147, RequestId = 326, SqlStmt = update users 
>>> set lastlogin = {ts '2008-06-26 17:27:37.0'} where id = ?
>>> Log_id 107: Transaction_id = 148, RequestId = 0, SqlStmt = begin
>>> Log_id 108: Transaction_id = 148, RequestId = 328, SqlStmt = update users 
>>> set lastlogin = {ts '2008-06-26 17:27:37.0'} where id = ?
>>> Log_id 109: Transaction_id = 149, RequestId = 0, SqlStmt = begin
>>> Log_id 110: Transaction_id = 149, RequestId = 330, SqlStmt = update users 
>>> set lastlogin = {ts '2008-06-26 17:27:37.0'} where id = ?
>>> <a few more of these>
>>> Log_id 123: Transaction_id = 146, RequestId = 0, SqlStmt = commit
>>> Log_id 124: Transaction_id = 147, RequestId = 0, SqlStmt = commit
>>> Log_id 125: Transaction_id = 148, RequestId = 0, SqlStmt = commit
>>> Log_id 126: Transaction_id = 149, RequestId = 0, SqlStmt = commit <a
>>> few more commits>
>>>
>>> Many thanks for any help,
>>>
>>>         Andy
>>>
>>> -----Original Message-----
>>> From: [EMAIL PROTECTED]
>>> [mailto:[EMAIL PROTECTED] On Behalf Of
>>> Emmanuel Cecchet
>>> Sent: 26 June 2008 16:33
>>> To: Sequoia general mailing list
>>> Subject: Re: [Sequoia] Deadlock when replaying recovery log
>>>
>>> Hi Andrew,
>>>
>>>
>>>
>>>
>>>>     I'm having problems with replaying recovery logs, where the
>>>> replay appears to be deadlocking itself.  I'm getting the following error:
>>>>
>>>> 2008-06-26 09:53:30,103 ERROR
>>>> org.continuent.sequoia.controller.recoverylog.RecoverThread -
>>>> Recovery process failed to replay transaction 1110: request
>>>> StatementExecuteUpdateTask from transaction 1110 (update... )
>>>> (Backend AndyCluster - BackendWorkerThread for backend 'DB-BACKUP'
>>>> with RAIDb
>>>> level:1 failed (A lock could not be obtained within the time requested.
>>>>
>>>>
>>>> As an example, consider the following simplified scenario:
>>>>
>>>> I have a table 'test':
>>>>
>>>> create table test ( a int , b int ) insert into test values ( 1, 1
>>>> )
>>>>
>>>> we then have two transactions, t1 & t2 each in different threads
>>>> which do the following:
>>>>
>>>> 1) t1: update test set b = 2 where a = 1
>>>> 2) t2: update test set b = 3 where a = 1
>>>> 3) t1: commit
>>>> 4) t2: commit
>>>>
>>>> normally, this will work fine, when t2 executes, it is blocked on a
>>>> lock, but as soon as t1 commits, the lock is freed and t2 then commits.
>>>>
>>>> however, if you replay the transaction log, all the statements are
>>>> executed sequentially, and what I think is happening is that the
>>>> 2nd statement hangs and eventually has a lock timeout, as t1 will
>>>> not be commited until after it.
>>>> This means that you can't recover, as you can't replay the log.
>>>>
>>>> Is my understanding correct, and if so, is there anything I can do
>>>> about this?
>>>>
>>>>
>>>>
>>> There is something wrong here because the recovery log stores queries in 
>>> the execution order and not in their arrival order.
>>> This means that the scenario you are describing should not be logged this 
>>> way in the recovery log.
>>>
>>> Could you provide a dump of your recovery log content?
>>> Which version of Sequoia are you using?
>>> Are all your database backends running the same database engine?
>>>
>>> Thanks for your feedback,
>>> Emmanuel
>>>

--
Emmanuel Cecchet
FTO @ Frog Thinker
Open Source Development & Consulting
--
Web: http://www.frogthinker.org
email: [EMAIL PROTECTED]
Skype: emmanuel_cecchet

_______________________________________________
Sequoia mailing list
[email protected]
https://forge.continuent.org/mailman/listinfo/sequoia

_______________________________________________
Sequoia mailing list
[email protected]
https://forge.continuent.org/mailman/listinfo/sequoia

RE: [Sequoia] Deadlock when replaying recovery log

Reply via email to