-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Andrew Lawrenson
Sent: 27 June 2008 16:43
To: Sequoia general mailing list
Subject: RE: [Sequoia] Deadlock when replaying recovery log
Emanuel,
I'm currently working around this issue by serializing my transactions
(so that each transaction is executed in serial).
However, when I do this, I'm encountering another issue at the end of recovery,
if I'm recovering while activity is taking place.
The exact circumstances here is that I have two backends, and I'm backing one
up while the system is in use.
When the backend has been backed up, it needs to replay through the recovery
log to bring itself back up to date.
It's managing to replay the bulk of the log up to the current point of
execution (so far, so good)
What sequoia then appears to be doing is suspending write activity, then
recovering through the last log entries - which makes sense.
I believe it's suspending write activity by blocking new transactions
& new writes, then waiting for existing transactions & writes to end
(i.e. via RequestManager suspendActivity() )
What I'm repeatedly seeing, however, are problems such as this:
2008-06-27 16:19:44,730 WARN
org.continuent.sequoia.controller.recoverylog.RecoverThread - Recovery
log entry marked as still executing : StatementExecuteUpdateTask from
transaction 5764 (delete from disclosure_variables where disc_id = 11
490)
2008-06-27 16:19:59,730 WARN
org.continuent.sequoia.controller.scheduler - Waited for 15 secs but 1
transactions still
open: [6923]
2008-06-27 16:19:59,730 WARN
org.continuent.sequoia.controller.scheduler - Will wait for 285 secs
more and attempt to a bort them
2008-06-27 16:20:29,730 WARN
org.continuent.sequoia.controller.scheduler - Waited for 45 secs but 1
transactions still
open: [6923]
2008-06-27 16:20:29,730 WARN
org.continuent.sequoia.controller.scheduler - Will wait for 255 secs
more and attempt to a bort them
This continues until 300 seconds have passed, then it kills the transaction:
2008-06-27 16:24:44,998 WARN
org.continuent.sequoia.controller.scheduler - Timeout reached (300
secs), aborting remaini ng active transactions
2008-06-27 16:24:44,998 WARN
org.continuent.sequoia.controller.scheduler - Aborting transaction
6923
This problem appears to be caused by the type of work I'm performing. When
database work is performed by my application, it is generally done as part of a
transaction, but some database activity is also performed via auto-commit
statements outside of the transaction, as they shouldn't be rolled-back if
problems occur (e.g. audit logs, marking jobs as failed, etc).
When I look at the transaction which is marked as still being open - 6923 in
the above example - it has stopped partway through the work being perfomed,
just before executing an auto-commit statement.
I assume that the auto-commit statement is being blocked by sequoia suspending
activity (either as a write or a new transaction) - but this means that the
work will never be completed and the main transaction committed - hence the
observed problem.
After the transaction has been aborted, I see a rollback for this transaction
in the recovery log, followed by the auto-commit statement with a exec_time of
just over 300 seconds...
Any suggestions?
Many thanks again,
Andrew Lawrenson
-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Andrew Lawrenson
Sent: 27 June 2008 16:01
To: Sequoia general mailing list
Subject: RE: [Sequoia] Deadlock when replaying recovery log
Emanuel,
I've tried this again with distibution turned on, and the same ordering
occurs, e.g:
Log_id 13625: Transaction_id = 12750, SqlStmt = begin Log_id 13626:
Transaction_id = 12750, SqlStmt = update users set lastlogin = {ts '2008-06-27
15:42:15.0'} where id = ?
Log_id 13627: Transaction_id = 12754, SqlStmt = begin Log_id 13628:
Transaction_id = 12754, SqlStmt = update users set lastlogin = {ts '2008-06-27
15:42:15.0'} where id = ?
Log_id 13629: Transaction_id = 12758, SqlStmt = begin Log_id 13630:
Transaction_id = 12758, SqlStmt = update users set lastlogin = {ts '2008-06-27
15:42:15.0'} where id = ?
Log_id 13631: Transaction_id = 12762, SqlStmt = begin Log_id 13632:
Transaction_id = 12762, SqlStmt = update users set lastlogin = {ts '2008-06-27
15:42:15.0'} where id = ?
Log_id 13633: Transaction_id = 12750, SqlStmt = commit Log_id 13634:
Transaction_id = 12754, SqlStmt = commit Log_id 13635: Transaction_id = 12758,
SqlStmt = commit ...
Log_id 13638: Transaction_id = 12762, SqlStmt = commit
Here's some log output, this is from a very large log logging virtualdatabase &
recoverylog at a DEBUG level, but I think contains what you want - if you need
more, let me know:
2008-06-27 15:48:35,765 DEBUG
org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
handleMessageSingleThreaded (class
org.continuent.sequoia.controller.virtualdatabase.protocol.Distributed
StatementExecuteUpdate): W 55395 17813 update users set lastlogin =
{ts '2008-06-27 15:48:35.0'} where id = ?/<!%I|613|!%>
2008-06-27 15:48:35,765 DEBUG
org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
192.168.0.109:6954: Scheduling distributedRequest 55395 from
Member(address=/192.168.0.109:3127, uid=192.168.0.109:3127)
2008-06-27 15:48:35,765 DEBUG
org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
handleMessageMultiThreaded (class
org.continuent.sequoia.controller.virtualdatabase.protocol.Distributed
StatementExecuteUpdate): W 55395 17813 update users set lastlogin =
{ts '2008-06-27 15:48:35.0'} where id = ?/<!%I|613|!%>
2008-06-27 15:48:35,765 DEBUG
org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
192.168.0.109:6954: Executing distributedRequest 55395 from
Member(address=/192.168.0.109:3127, uid=192.168.0.109:3127)
2008-06-27 15:48:35,765 DEBUG
org.continuent.sequoia.controller.recoverylog - Logging request 20221
2008-06-27 15:48:35,765 DEBUG
org.continuent.sequoia.controller.recoverylog - Releasing recovery log
connection: [EMAIL PROTECTED] (XID
= 53657), (SESSIONID = 4287), (DATABASE = HA), (DRDAID = null)
2008-06-27 15:48:35,765 DEBUG
org.continuent.sequoia.controller.recoverylog - Logging in recovery
log connection
2008-06-27 15:48:35,765 DEBUG
org.continuent.sequoia.controller.recoverylog -
2057c01b-011a-ca77-dce2-ffffd8827a3c
2008-06-27 15:48:35,765 DEBUG
org.continuent.sequoia.controller.recoverylog - Logging request 20222
2008-06-27 15:48:35,765 DEBUG
org.continuent.sequoia.controller.recoverylog -
2057c01b-011a-ca77-dce2-ffffd8827a3c
2008-06-27 15:48:35,781 DEBUG
org.continuent.sequoia.controller.recoverylog - Logging request 20222
2008-06-27 15:48:35,781 DEBUG
org.continuent.sequoia.controller.recoverylog - Releasing recovery log
connection: [EMAIL PROTECTED] (XID
= 53660), (SESSIONID = 4288), (DATABASE = HA), (DRDAID = null)
2008-06-27 15:48:35,781 DEBUG
org.continuent.sequoia.controller.recoverylog - Logging in recovery
log connection
2008-06-27 15:48:35,781 DEBUG
org.continuent.sequoia.controller.recoverylog -
c065801d-011a-ca77-dce2-ffffd8827a3c
2008-06-27 15:48:35,796 DEBUG
org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
handleMessageSingleThreaded (class
org.continuent.sequoia.controller.virtualdatabase.protocol.Distributed
StatementExecuteUpdate): W 55397 17814 update users set lastlogin =
{ts '2008-06-27 15:48:35.0'} where id = ?/<!%I|613|!%>
2008-06-27 15:48:35,796 DEBUG
org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
192.168.0.109:6954: Scheduling distributedRequest 55397 from
Member(address=/192.168.0.109:3127, uid=192.168.0.109:3127)
2008-06-27 15:48:35,796 DEBUG
org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
handleMessageMultiThreaded (class
org.continuent.sequoia.controller.virtualdatabase.protocol.Distributed
StatementExecuteUpdate): W 55397 17814 update users set lastlogin =
{ts '2008-06-27 15:48:35.0'} where id = ?/<!%I|613|!%>
2008-06-27 15:48:35,796 DEBUG
org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
192.168.0.109:6954: Executing distributedRequest 55397 from
Member(address=/192.168.0.109:3127, uid=192.168.0.109:3127)
2008-06-27 15:48:35,796 DEBUG
org.continuent.sequoia.controller.recoverylog - Logging request 20223
2008-06-27 15:48:35,796 DEBUG
org.continuent.sequoia.controller.recoverylog - Releasing recovery log
connection: [EMAIL PROTECTED] (XID
= 53662), (SESSIONID = 4289), (DATABASE = HA), (DRDAID = null)
2008-06-27 15:48:35,796 DEBUG
org.continuent.sequoia.controller.recoverylog - Logging in recovery
log connection
2008-06-27 15:48:35,796 DEBUG
org.continuent.sequoia.controller.recoverylog -
2057c01b-011a-ca77-dce2-ffffd8827a3c
2008-06-27 15:48:35,796 DEBUG
org.continuent.sequoia.controller.recoverylog - Logging request 20224
2008-06-27 15:48:35,796 DEBUG
org.continuent.sequoia.controller.recoverylog -
2057c01b-011a-ca77-dce2-ffffd8827a3c
2008-06-27 15:48:35,796 DEBUG
org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
handleMessageSingleThreaded (class
org.continuent.sequoia.controller.virtualdatabase.protocol.Distributed
StatementExecuteUpdate): W 55411 17821 update users set lastlogin =
{ts '2008-06-27 15:48:35.0'} where id = ?/<!%I|613|!%>
2008-06-27 15:48:35,796 DEBUG
org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
192.168.0.109:6954: Scheduling distributedRequest 55411 from
Member(address=/192.168.0.109:3127, uid=192.168.0.109:3127)
2008-06-27 15:48:35,796 DEBUG
org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
handleMessageMultiThreaded (class
org.continuent.sequoia.controller.virtualdatabase.protocol.Distributed
StatementExecuteUpdate): W 55411 17821 update users set lastlogin =
{ts '2008-06-27 15:48:35.0'} where id = ?/<!%I|613|!%>
2008-06-27 15:48:35,796 DEBUG
org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
192.168.0.109:6954: Executing distributedRequest 55411 from
Member(address=/192.168.0.109:3127, uid=192.168.0.109:3127)
2008-06-27 15:48:35,796 DEBUG
org.continuent.sequoia.controller.recoverylog - Logging request 20225
2008-06-27 15:48:35,796 DEBUG
org.continuent.sequoia.controller.recoverylog -
2057c01b-011a-ca77-dce2-ffffd8827a3c
2008-06-27 15:48:35,796 DEBUG
org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
handleMessageSingleThreaded (class
org.continuent.sequoia.controller.virtualdatabase.protocol.Distributed
StatementExecuteUpdate): W 55413 17822 update users set lastlogin =
{ts '2008-06-27 15:48:35.0'} where id = ?/<!%I|613|!%>
2008-06-27 15:48:35,796 DEBUG
org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
192.168.0.109:6954: Scheduling distributedRequest 55413 from
Member(address=/192.168.0.109:3127, uid=192.168.0.109:3127)
2008-06-27 15:48:35,796 DEBUG
org.continuent.sequoia.controller.virtualdatabase.VirtualDatabaseWorke
rThread.AndyCluster - Commit command
2008-06-27 15:48:35,812 DEBUG
org.continuent.sequoia.controller.recoverylog - Logging request 20226
2008-06-27 15:48:35,812 DEBUG
org.continuent.sequoia.controller.recoverylog - Releasing recovery log
connection: [EMAIL PROTECTED] (XID
= 53666), (SESSIONID = 4290), (DATABASE = HA), (DRDAID = null)
2008-06-27 15:48:35,812 DEBUG
org.continuent.sequoia.controller.recoverylog - Logging in recovery
log connection
2008-06-27 15:48:35,812 DEBUG
org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
handleMessageMultiThreaded (class
org.continuent.sequoia.controller.virtualdatabase.protocol.Distributed
StatementExecuteUpdate): W 55413 17822 update users set lastlogin =
{ts '2008-06-27 15:48:35.0'} where id = ?/<!%I|613|!%>
2008-06-27 15:48:35,812 DEBUG
org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
192.168.0.109:6954: Executing distributedRequest 55413 from
Member(address=/192.168.0.109:3127, uid=192.168.0.109:3127)
2008-06-27 15:48:35,812 DEBUG
org.continuent.sequoia.controller.recoverylog -
2057c01b-011a-ca77-dce2-ffffd8827a3c
2008-06-27 15:48:35,812 DEBUG
org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
handleMessageSingleThreaded (class
org.continuent.sequoia.controller.virtualdatabase.protocol.Distributed
Commit): Commit transaction 17813
2008-06-27 15:48:35,812 DEBUG
org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
handleMessageMultiThreaded (class
org.continuent.sequoia.controller.virtualdatabase.protocol.Distributed
Commit): Commit transaction 17813
2008-06-27 15:48:35,812 DEBUG
org.continuent.sequoia.controller.recoverylog - Logging request 20227
2008-06-27 15:48:35,812 DEBUG
org.continuent.sequoia.controller.recoverylog -
2057c01b-011a-ca77-dce2-ffffd8827a3c
2008-06-27 15:48:35,812 DEBUG
org.continuent.sequoia.controller.recoverylog - Logging request 20228
2008-06-27 15:48:35,812 DEBUG
org.continuent.sequoia.controller.recoverylog -
2057c01b-011a-ca77-dce2-ffffd8827a3c
2008-06-27 15:48:35,812 DEBUG
org.continuent.sequoia.controller.recoverylog - Logging request 20229
2008-06-27 15:48:35,812 DEBUG
org.continuent.sequoia.controller.recoverylog -
2057c01b-011a-ca77-dce2-ffffd8827a3c
2008-06-27 15:48:35,812 DEBUG
org.continuent.sequoia.controller.virtualdatabase.VirtualDatabaseWorke
rThread.AndyCluster - Commit command
2008-06-27 15:48:35,812 DEBUG
org.continuent.sequoia.controller.recoverylog - Logging request 20229
2008-06-27 15:48:35,812 DEBUG
org.continuent.sequoia.controller.recoverylog -
c065801d-011a-ca77-dce2-ffffd8827a3c
2008-06-27 15:48:35,812 DEBUG
org.continuent.sequoia.controller.virtualdatabase.VirtualDatabaseWorke
rThread.AndyCluster - StatementExecuteQuery command
2008-06-27 15:48:35,812 DEBUG
org.continuent.sequoia.controller.virtualdatabase.VirtualDatabaseWorke
rThread.AndyCluster - StatementExecuteUpdateWithKeys command
2008-06-27 15:48:35,828 DEBUG
org.continuent.sequoia.controller.recoverylog - Logging request 20224
2008-06-27 15:48:35,828 DEBUG
org.continuent.sequoia.controller.recoverylog - Releasing recovery log
connection: [EMAIL PROTECTED] (XID
= 53672), (SESSIONID = 4291), (DATABASE = HA), (DRDAID = null)
2008-06-27 15:48:35,828 DEBUG
org.continuent.sequoia.controller.recoverylog - Logging in recovery
log connection
2008-06-27 15:48:35,828 DEBUG
org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
handleMessageSingleThreaded (class
org.continuent.sequoia.controller.virtualdatabase.protocol.Distributed
Commit): Commit transaction 17814
2008-06-27 15:48:35,828 DEBUG
org.continuent.sequoia.controller.virtualdatabase.AndyCluster -
handleMessageMultiThreaded (class
org.continuent.sequoia.controller.virtualdatabase.protocol.Distributed
Commit): Commit transaction 17814
2008-06-27 15:48:35,828 DEBUG
org.continuent.sequoia.controller.recoverylog -
c065801d-011a-ca77-dce2-ffffd8827a3c
2008-06-27 15:48:35,828 DEBUG
org.continuent.sequoia.controller.recoverylog - Logging request 20230
2008-06-27 15:48:35,828 DEBUG
org.continuent.sequoia.controller.recoverylog -
2057c01b-011a-ca77-dce2-ffffd8827a3c
Thanks again for the help,
Andy
-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Emmanuel Cecchet
Sent: 27 June 2008 13:55
To: Sequoia general mailing list
Subject: Re: [Sequoia] Deadlock when replaying recovery log
Hi Andrew,
this is with Sequoia 2.10.10, on a single non-distributed controller
with 2 backends - both emebedded derby databases.
This might be an issue related to the non-distributed code. Could you add a
<Distributed> element to your configuration file and have another run just to
check if the order in the recovery log changes.
If the problem persists, it would be interesting to have a trace of the log
with virtualdatabase and recoverylog loggers set to DEBUG in log4j.properties
to see in what order events occur in your scenario.
Thanks a lot for the feedback,
Emmanuel
a set of sample recovery log entries are:
Log_id 103: Transaction_id = 146, RequestId = 0, SqlStmt = begin
Log_id 104: Transaction_id = 146, RequestId = 322, SqlStmt = update users set
lastlogin = {ts '2008-06-26 17:27:37.0'} where id = ?
Log_id 105: Transaction_id = 147, RequestId = 0, SqlStmt = begin
Log_id 106: Transaction_id = 147, RequestId = 326, SqlStmt = update users set
lastlogin = {ts '2008-06-26 17:27:37.0'} where id = ?
Log_id 107: Transaction_id = 148, RequestId = 0, SqlStmt = begin
Log_id 108: Transaction_id = 148, RequestId = 328, SqlStmt = update users set
lastlogin = {ts '2008-06-26 17:27:37.0'} where id = ?
Log_id 109: Transaction_id = 149, RequestId = 0, SqlStmt = begin
Log_id 110: Transaction_id = 149, RequestId = 330, SqlStmt = update users set
lastlogin = {ts '2008-06-26 17:27:37.0'} where id = ?
<a few more of these>
Log_id 123: Transaction_id = 146, RequestId = 0, SqlStmt = commit
Log_id 124: Transaction_id = 147, RequestId = 0, SqlStmt = commit
Log_id 125: Transaction_id = 148, RequestId = 0, SqlStmt = commit
Log_id 126: Transaction_id = 149, RequestId = 0, SqlStmt = commit <a
few more commits>
Many thanks for any help,
Andy
-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Emmanuel Cecchet
Sent: 26 June 2008 16:33
To: Sequoia general mailing list
Subject: Re: [Sequoia] Deadlock when replaying recovery log
Hi Andrew,
I'm having problems with replaying recovery logs, where the
replay appears to be deadlocking itself. I'm getting the following error:
2008-06-26 09:53:30,103 ERROR
org.continuent.sequoia.controller.recoverylog.RecoverThread -
Recovery process failed to replay transaction 1110: request
StatementExecuteUpdateTask from transaction 1110 (update... )
(Backend AndyCluster - BackendWorkerThread for backend 'DB-BACKUP'
with RAIDb
level:1 failed (A lock could not be obtained within the time requested.
As an example, consider the following simplified scenario:
I have a table 'test':
create table test ( a int , b int )
insert into test values ( 1, 1 )
we then have two transactions, t1 & t2 each in different threads
which do the following:
1) t1: update test set b = 2 where a = 1
2) t2: update test set b = 3 where a = 1
3) t1: commit
4) t2: commit
normally, this will work fine, when t2 executes, it is blocked on a
lock, but as soon as t1 commits, the lock is freed and t2 then commits.
however, if you replay the transaction log, all the statements are
executed sequentially, and what I think is happening is that the 2nd
statement hangs and eventually has a lock timeout, as t1 will not be
commited until after it.
This means that you can't recover, as you can't replay the log.
Is my understanding correct, and if so, is there anything I can do
about this?
There is something wrong here because the recovery log stores queries in the
execution order and not in their arrival order.
This means that the scenario you are describing should not be logged this way
in the recovery log.
Could you provide a dump of your recovery log content?
Which version of Sequoia are you using?
Are all your database backends running the same database engine?
Thanks for your feedback,
Emmanuel