RE: [Sequoia] Deadlock when replaying recovery log

Andrew Lawrenson Fri, 27 Jun 2008 08:52:26 -0700

Emanuel,

        I'm currently working around this issue by serializing my transactions 
(so that each transaction is executed in serial).


However, when I do this, I'm encountering another issue at the end of recovery, 
if I'm recovering while activity is taking place.
The exact circumstances here is that I have two backends, and I'm backing one 
up while the system is in use.

When the backend has been backed up, it needs to replay through the recovery 
log to bring itself back up to date.
It's managing to replay the bulk of the log up to the current point of 
execution (so far, so good)

What sequoia then appears to be doing is suspending write activity, then 
recovering through the last log entries - which makes sense.
I believe it's suspending write activity by blocking new transactions & new 
writes, then waiting for existing transactions & writes to end (i.e. via 
RequestManager suspendActivity() )

What I'm repeatedly seeing, however, are problems such as this:

2008-06-27 16:19:44,730 WARN  
org.continuent.sequoia.controller.recoverylog.RecoverThread - Recovery log 
entry marked as
 still executing : StatementExecuteUpdateTask from transaction 5764 (delete 
from disclosure_variables where disc_id = 11
490)
2008-06-27 16:19:59,730 WARN  org.continuent.sequoia.controller.scheduler - 
Waited for 15 secs but 1 transactions still
open: [6923]
2008-06-27 16:19:59,730 WARN  org.continuent.sequoia.controller.scheduler - 
Will wait for 285 secs more and attempt to a
bort them
2008-06-27 16:20:29,730 WARN  org.continuent.sequoia.controller.scheduler - 
Waited for 45 secs but 1 transactions still
open: [6923]
2008-06-27 16:20:29,730 WARN  org.continuent.sequoia.controller.scheduler - 
Will wait for 255 secs more and attempt to a
bort them

This continues until 300 seconds have passed, then it kills the transaction:

2008-06-27 16:24:44,998 WARN  org.continuent.sequoia.controller.scheduler - 
Timeout reached (300 secs), aborting remaini
ng active transactions
2008-06-27 16:24:44,998 WARN  org.continuent.sequoia.controller.scheduler - 
Aborting transaction 6923


This problem appears to be caused by the type of work I'm performing.  When 
database work is performed by my application, it is generally done as part of a 
transaction, but some database activity is also performed via auto-commit 
statements outside of the transaction, as they shouldn't be rolled-back if 
problems occur (e.g. audit logs, marking jobs as failed, etc).

When I look at the transaction which is marked as still being open - 6923 in 
the above example - it has stopped partway through the work being perfomed, 
just before executing an auto-commit statement.

I assume that the auto-commit statement is being blocked by sequoia suspending 
activity (either as a write or a new transaction) - but this means that the 
work will never be completed and the main transaction committed - hence the 
observed problem.

After the transaction has been aborted, I see a rollback for this transaction 
in the recovery log, followed by the auto-commit statement with a exec_time of 
just over 300 seconds...


Any suggestions?

Many thanks again,

        Andrew Lawrenson


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andrew Lawrenson
Sent: 27 June 2008 16:01
To: Sequoia general mailing list
Subject: RE: [Sequoia] Deadlock when replaying recovery log

Emanuel,

        I've tried this again with distibution turned on, and the same ordering 
occurs, e.g:

Log_id 13625: Transaction_id = 12750, SqlStmt = begin Log_id 13626: 
Transaction_id = 12750, SqlStmt = update users set lastlogin = {ts '2008-06-27 
15:42:15.0'} where id = ?
Log_id 13627: Transaction_id = 12754, SqlStmt = begin Log_id 13628: 
Transaction_id = 12754, SqlStmt = update users set lastlogin = {ts '2008-06-27 
15:42:15.0'} where id = ?
Log_id 13629: Transaction_id = 12758, SqlStmt = begin Log_id 13630: 
Transaction_id = 12758, SqlStmt = update users set lastlogin = {ts '2008-06-27 
15:42:15.0'} where id = ?
Log_id 13631: Transaction_id = 12762, SqlStmt = begin Log_id 13632: 
Transaction_id = 12762, SqlStmt = update users set lastlogin = {ts '2008-06-27 
15:42:15.0'} where id = ?
Log_id 13633: Transaction_id = 12750, SqlStmt = commit Log_id 13634: 
Transaction_id = 12754, SqlStmt = commit Log_id 13635: Transaction_id = 12758, 
SqlStmt = commit ...
Log_id 13638: Transaction_id = 12762, SqlStmt = commit

Here's some log output, this is from a very large log logging virtualdatabase & 
recoverylog at a DEBUG level, but I think contains what you want - if you need 
more, let me know:

2008-06-27 15:48:35,765 DEBUG 
org.continuent.sequoia.controller.virtualdatabase.AndyCluster - 
handleMessageSingleThreaded (class 
org.continuent.sequoia.controller.virtualdatabase.protocol.DistributedStatementExecuteUpdate):
 W 55395 17813 update users set lastlogin = {ts '2008-06-27 15:48:35.0'} where 
id = ?/<!%I|613|!%>
2008-06-27 15:48:35,765 DEBUG 
org.continuent.sequoia.controller.virtualdatabase.AndyCluster - 
192.168.0.109:6954: Scheduling distributedRequest 55395 from 
Member(address=/192.168.0.109:3127, uid=192.168.0.109:3127)
2008-06-27 15:48:35,765 DEBUG 
org.continuent.sequoia.controller.virtualdatabase.AndyCluster - 
handleMessageMultiThreaded (class 
org.continuent.sequoia.controller.virtualdatabase.protocol.DistributedStatementExecuteUpdate):
 W 55395 17813 update users set lastlogin = {ts '2008-06-27 15:48:35.0'} where 
id = ?/<!%I|613|!%>
2008-06-27 15:48:35,765 DEBUG 
org.continuent.sequoia.controller.virtualdatabase.AndyCluster - 
192.168.0.109:6954: Executing distributedRequest 55395 from 
Member(address=/192.168.0.109:3127, uid=192.168.0.109:3127)
2008-06-27 15:48:35,765 DEBUG org.continuent.sequoia.controller.recoverylog - 
Logging request 20221
2008-06-27 15:48:35,765 DEBUG org.continuent.sequoia.controller.recoverylog - 
Releasing recovery log connection: [EMAIL PROTECTED] (XID = 53657), (SESSIONID 
= 4287), (DATABASE = HA), (DRDAID = null)
2008-06-27 15:48:35,765 DEBUG org.continuent.sequoia.controller.recoverylog - 
Logging in recovery log connection
2008-06-27 15:48:35,765 DEBUG org.continuent.sequoia.controller.recoverylog - 
2057c01b-011a-ca77-dce2-ffffd8827a3c
2008-06-27 15:48:35,765 DEBUG org.continuent.sequoia.controller.recoverylog - 
Logging request 20222
2008-06-27 15:48:35,765 DEBUG org.continuent.sequoia.controller.recoverylog - 
2057c01b-011a-ca77-dce2-ffffd8827a3c
2008-06-27 15:48:35,781 DEBUG org.continuent.sequoia.controller.recoverylog - 
Logging request 20222
2008-06-27 15:48:35,781 DEBUG org.continuent.sequoia.controller.recoverylog - 
Releasing recovery log connection: [EMAIL PROTECTED] (XID = 53660), (SESSIONID 
= 4288), (DATABASE = HA), (DRDAID = null)
2008-06-27 15:48:35,781 DEBUG org.continuent.sequoia.controller.recoverylog - 
Logging in recovery log connection
2008-06-27 15:48:35,781 DEBUG org.continuent.sequoia.controller.recoverylog - 
c065801d-011a-ca77-dce2-ffffd8827a3c
2008-06-27 15:48:35,796 DEBUG 
org.continuent.sequoia.controller.virtualdatabase.AndyCluster - 
handleMessageSingleThreaded (class 
org.continuent.sequoia.controller.virtualdatabase.protocol.DistributedStatementExecuteUpdate):
 W 55397 17814 update users set lastlogin = {ts '2008-06-27 15:48:35.0'} where 
id = ?/<!%I|613|!%>
2008-06-27 15:48:35,796 DEBUG 
org.continuent.sequoia.controller.virtualdatabase.AndyCluster - 
192.168.0.109:6954: Scheduling distributedRequest 55397 from 
Member(address=/192.168.0.109:3127, uid=192.168.0.109:3127)
2008-06-27 15:48:35,796 DEBUG 
org.continuent.sequoia.controller.virtualdatabase.AndyCluster - 
handleMessageMultiThreaded (class 
org.continuent.sequoia.controller.virtualdatabase.protocol.DistributedStatementExecuteUpdate):
 W 55397 17814 update users set lastlogin = {ts '2008-06-27 15:48:35.0'} where 
id = ?/<!%I|613|!%>
2008-06-27 15:48:35,796 DEBUG 
org.continuent.sequoia.controller.virtualdatabase.AndyCluster - 
192.168.0.109:6954: Executing distributedRequest 55397 from 
Member(address=/192.168.0.109:3127, uid=192.168.0.109:3127)
2008-06-27 15:48:35,796 DEBUG org.continuent.sequoia.controller.recoverylog - 
Logging request 20223
2008-06-27 15:48:35,796 DEBUG org.continuent.sequoia.controller.recoverylog - 
Releasing recovery log connection: [EMAIL PROTECTED] (XID = 53662), (SESSIONID 
= 4289), (DATABASE = HA), (DRDAID = null)
2008-06-27 15:48:35,796 DEBUG org.continuent.sequoia.controller.recoverylog - 
Logging in recovery log connection
2008-06-27 15:48:35,796 DEBUG org.continuent.sequoia.controller.recoverylog - 
2057c01b-011a-ca77-dce2-ffffd8827a3c
2008-06-27 15:48:35,796 DEBUG org.continuent.sequoia.controller.recoverylog - 
Logging request 20224
2008-06-27 15:48:35,796 DEBUG org.continuent.sequoia.controller.recoverylog - 
2057c01b-011a-ca77-dce2-ffffd8827a3c
2008-06-27 15:48:35,796 DEBUG 
org.continuent.sequoia.controller.virtualdatabase.AndyCluster - 
handleMessageSingleThreaded (class 
org.continuent.sequoia.controller.virtualdatabase.protocol.DistributedStatementExecuteUpdate):
 W 55411 17821 update users set lastlogin = {ts '2008-06-27 15:48:35.0'} where 
id = ?/<!%I|613|!%>
2008-06-27 15:48:35,796 DEBUG 
org.continuent.sequoia.controller.virtualdatabase.AndyCluster - 
192.168.0.109:6954: Scheduling distributedRequest 55411 from 
Member(address=/192.168.0.109:3127, uid=192.168.0.109:3127)
2008-06-27 15:48:35,796 DEBUG 
org.continuent.sequoia.controller.virtualdatabase.AndyCluster - 
handleMessageMultiThreaded (class 
org.continuent.sequoia.controller.virtualdatabase.protocol.DistributedStatementExecuteUpdate):
 W 55411 17821 update users set lastlogin = {ts '2008-06-27 15:48:35.0'} where 
id = ?/<!%I|613|!%>
2008-06-27 15:48:35,796 DEBUG 
org.continuent.sequoia.controller.virtualdatabase.AndyCluster - 
192.168.0.109:6954: Executing distributedRequest 55411 from 
Member(address=/192.168.0.109:3127, uid=192.168.0.109:3127)
2008-06-27 15:48:35,796 DEBUG org.continuent.sequoia.controller.recoverylog - 
Logging request 20225
2008-06-27 15:48:35,796 DEBUG org.continuent.sequoia.controller.recoverylog - 
2057c01b-011a-ca77-dce2-ffffd8827a3c
2008-06-27 15:48:35,796 DEBUG 
org.continuent.sequoia.controller.virtualdatabase.AndyCluster - 
handleMessageSingleThreaded (class 
org.continuent.sequoia.controller.virtualdatabase.protocol.DistributedStatementExecuteUpdate):
 W 55413 17822 update users set lastlogin = {ts '2008-06-27 15:48:35.0'} where 
id = ?/<!%I|613|!%>
2008-06-27 15:48:35,796 DEBUG 
org.continuent.sequoia.controller.virtualdatabase.AndyCluster - 
192.168.0.109:6954: Scheduling distributedRequest 55413 from 
Member(address=/192.168.0.109:3127, uid=192.168.0.109:3127)
2008-06-27 15:48:35,796 DEBUG 
org.continuent.sequoia.controller.virtualdatabase.VirtualDatabaseWorkerThread.AndyCluster
 - Commit command
2008-06-27 15:48:35,812 DEBUG org.continuent.sequoia.controller.recoverylog - 
Logging request 20226
2008-06-27 15:48:35,812 DEBUG org.continuent.sequoia.controller.recoverylog - 
Releasing recovery log connection: [EMAIL PROTECTED] (XID = 53666), (SESSIONID 
= 4290), (DATABASE = HA), (DRDAID = null)
2008-06-27 15:48:35,812 DEBUG org.continuent.sequoia.controller.recoverylog - 
Logging in recovery log connection
2008-06-27 15:48:35,812 DEBUG 
org.continuent.sequoia.controller.virtualdatabase.AndyCluster - 
handleMessageMultiThreaded (class 
org.continuent.sequoia.controller.virtualdatabase.protocol.DistributedStatementExecuteUpdate):
 W 55413 17822 update users set lastlogin = {ts '2008-06-27 15:48:35.0'} where 
id = ?/<!%I|613|!%>
2008-06-27 15:48:35,812 DEBUG 
org.continuent.sequoia.controller.virtualdatabase.AndyCluster - 
192.168.0.109:6954: Executing distributedRequest 55413 from 
Member(address=/192.168.0.109:3127, uid=192.168.0.109:3127)
2008-06-27 15:48:35,812 DEBUG org.continuent.sequoia.controller.recoverylog - 
2057c01b-011a-ca77-dce2-ffffd8827a3c
2008-06-27 15:48:35,812 DEBUG 
org.continuent.sequoia.controller.virtualdatabase.AndyCluster - 
handleMessageSingleThreaded (class 
org.continuent.sequoia.controller.virtualdatabase.protocol.DistributedCommit): 
Commit transaction 17813
2008-06-27 15:48:35,812 DEBUG 
org.continuent.sequoia.controller.virtualdatabase.AndyCluster - 
handleMessageMultiThreaded (class 
org.continuent.sequoia.controller.virtualdatabase.protocol.DistributedCommit): 
Commit transaction 17813
2008-06-27 15:48:35,812 DEBUG org.continuent.sequoia.controller.recoverylog - 
Logging request 20227
2008-06-27 15:48:35,812 DEBUG org.continuent.sequoia.controller.recoverylog - 
2057c01b-011a-ca77-dce2-ffffd8827a3c
2008-06-27 15:48:35,812 DEBUG org.continuent.sequoia.controller.recoverylog - 
Logging request 20228
2008-06-27 15:48:35,812 DEBUG org.continuent.sequoia.controller.recoverylog - 
2057c01b-011a-ca77-dce2-ffffd8827a3c
2008-06-27 15:48:35,812 DEBUG org.continuent.sequoia.controller.recoverylog - 
Logging request 20229
2008-06-27 15:48:35,812 DEBUG org.continuent.sequoia.controller.recoverylog - 
2057c01b-011a-ca77-dce2-ffffd8827a3c
2008-06-27 15:48:35,812 DEBUG 
org.continuent.sequoia.controller.virtualdatabase.VirtualDatabaseWorkerThread.AndyCluster
 - Commit command
2008-06-27 15:48:35,812 DEBUG org.continuent.sequoia.controller.recoverylog - 
Logging request 20229
2008-06-27 15:48:35,812 DEBUG org.continuent.sequoia.controller.recoverylog - 
c065801d-011a-ca77-dce2-ffffd8827a3c
2008-06-27 15:48:35,812 DEBUG 
org.continuent.sequoia.controller.virtualdatabase.VirtualDatabaseWorkerThread.AndyCluster
 - StatementExecuteQuery command
2008-06-27 15:48:35,812 DEBUG 
org.continuent.sequoia.controller.virtualdatabase.VirtualDatabaseWorkerThread.AndyCluster
 - StatementExecuteUpdateWithKeys command
2008-06-27 15:48:35,828 DEBUG org.continuent.sequoia.controller.recoverylog - 
Logging request 20224
2008-06-27 15:48:35,828 DEBUG org.continuent.sequoia.controller.recoverylog - 
Releasing recovery log connection: [EMAIL PROTECTED] (XID = 53672), (SESSIONID 
= 4291), (DATABASE = HA), (DRDAID = null)
2008-06-27 15:48:35,828 DEBUG org.continuent.sequoia.controller.recoverylog - 
Logging in recovery log connection
2008-06-27 15:48:35,828 DEBUG 
org.continuent.sequoia.controller.virtualdatabase.AndyCluster - 
handleMessageSingleThreaded (class 
org.continuent.sequoia.controller.virtualdatabase.protocol.DistributedCommit): 
Commit transaction 17814
2008-06-27 15:48:35,828 DEBUG 
org.continuent.sequoia.controller.virtualdatabase.AndyCluster - 
handleMessageMultiThreaded (class 
org.continuent.sequoia.controller.virtualdatabase.protocol.DistributedCommit): 
Commit transaction 17814
2008-06-27 15:48:35,828 DEBUG org.continuent.sequoia.controller.recoverylog - 
c065801d-011a-ca77-dce2-ffffd8827a3c
2008-06-27 15:48:35,828 DEBUG org.continuent.sequoia.controller.recoverylog - 
Logging request 20230
2008-06-27 15:48:35,828 DEBUG org.continuent.sequoia.controller.recoverylog - 
2057c01b-011a-ca77-dce2-ffffd8827a3c


Thanks again for the help,

Andy



-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Emmanuel Cecchet
Sent: 27 June 2008 13:55
To: Sequoia general mailing list
Subject: Re: [Sequoia] Deadlock when replaying recovery log

Hi Andrew,
>         this is with Sequoia 2.10.10, on a single non-distributed controller 
> with 2 backends - both emebedded derby databases.
>
This might be an issue related to the non-distributed code. Could you add a 
<Distributed> element to your configuration file and have another run just to 
check if the order in the recovery log changes.
If the problem persists, it would be interesting to have a trace of the log 
with virtualdatabase and recoverylog loggers set to DEBUG in log4j.properties 
to see in what order events occur in your scenario.

Thanks a lot for the feedback,
Emmanuel
>         a set of sample recovery log entries are:
>
> Log_id 103: Transaction_id = 146, RequestId = 0, SqlStmt = begin
> Log_id 104: Transaction_id = 146, RequestId = 322, SqlStmt = update users set 
> lastlogin = {ts '2008-06-26 17:27:37.0'} where id = ?
> Log_id 105: Transaction_id = 147, RequestId = 0, SqlStmt = begin
> Log_id 106: Transaction_id = 147, RequestId = 326, SqlStmt = update users set 
> lastlogin = {ts '2008-06-26 17:27:37.0'} where id = ?
> Log_id 107: Transaction_id = 148, RequestId = 0, SqlStmt = begin
> Log_id 108: Transaction_id = 148, RequestId = 328, SqlStmt = update users set 
> lastlogin = {ts '2008-06-26 17:27:37.0'} where id = ?
> Log_id 109: Transaction_id = 149, RequestId = 0, SqlStmt = begin
> Log_id 110: Transaction_id = 149, RequestId = 330, SqlStmt = update users set 
> lastlogin = {ts '2008-06-26 17:27:37.0'} where id = ?
> <a few more of these>
> Log_id 123: Transaction_id = 146, RequestId = 0, SqlStmt = commit
> Log_id 124: Transaction_id = 147, RequestId = 0, SqlStmt = commit
> Log_id 125: Transaction_id = 148, RequestId = 0, SqlStmt = commit
> Log_id 126: Transaction_id = 149, RequestId = 0, SqlStmt = commit <a
> few more commits>
>
> Many thanks for any help,
>
>         Andy
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of
> Emmanuel Cecchet
> Sent: 26 June 2008 16:33
> To: Sequoia general mailing list
> Subject: Re: [Sequoia] Deadlock when replaying recovery log
>
> Hi Andrew,
>
>
>>     I'm having problems with replaying recovery logs, where the
>> replay appears to be deadlocking itself.  I'm getting the following error:
>>
>> 2008-06-26 09:53:30,103 ERROR
>> org.continuent.sequoia.controller.recoverylog.RecoverThread -
>> Recovery process failed to replay transaction 1110: request
>> StatementExecuteUpdateTask from transaction 1110 (update... )
>> (Backend AndyCluster - BackendWorkerThread for backend 'DB-BACKUP'
>> with RAIDb
>> level:1 failed (A lock could not be obtained within the time requested.
>>
>>
>> As an example, consider the following simplified scenario:
>>
>> I have a table 'test':
>>
>> create table test ( a int , b int )
>> insert into test values ( 1, 1 )
>>
>> we then have two transactions, t1 & t2 each in different threads
>> which do the following:
>>
>> 1) t1: update test set b = 2 where a = 1
>> 2) t2: update test set b = 3 where a = 1
>> 3) t1: commit
>> 4) t2: commit
>>
>> normally, this will work fine, when t2 executes, it is blocked on a
>> lock, but as soon as t1 commits, the lock is freed and t2 then commits.
>>
>> however, if you replay the transaction log, all the statements are
>> executed sequentially, and what I think is happening is that the 2nd
>> statement hangs and eventually has a lock timeout, as t1 will not be
>> commited until after it.
>> This means that you can't recover, as you can't replay the log.
>>
>> Is my understanding correct, and if so, is there anything I can do
>> about this?
>>
> There is something wrong here because the recovery log stores queries in the 
> execution order and not in their arrival order.
> This means that the scenario you are describing should not be logged this way 
> in the recovery log.
>
> Could you provide a dump of your recovery log content?
> Which version of Sequoia are you using?
> Are all your database backends running the same database engine?
>
> Thanks for your feedback,
> Emmanuel
>
>
--
Emmanuel Cecchet
FTO @ Frog Thinker
Open Source Development & Consulting
--
Web: http://www.frogthinker.org
email: [EMAIL PROTECTED]
Skype: emmanuel_cecchet

_______________________________________________
Sequoia mailing list
[email protected]
https://forge.continuent.org/mailman/listinfo/sequoia

_______________________________________________
Sequoia mailing list
[email protected]
https://forge.continuent.org/mailman/listinfo/sequoia

_______________________________________________
Sequoia mailing list
[email protected]
https://forge.continuent.org/mailman/listinfo/sequoia

RE: [Sequoia] Deadlock when replaying recovery log

Reply via email to