I think this issue is related to SEQUOIA-955 - I was using a waitForCompletionPolicy of
"first". When set to "all", the problem disappears.
Looking in various places, I assume the current recommendation is to use "all" &
"first" is less reliable.
If this is correct, it would probably be worth changing the example configurations in
config/virtualdatabase to use "all" - they currently (2.10.9) use "first" -
which is where I copied my configuration from.
thanks,
Andrew Lawrenson
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andrew Lawrenson
Sent: 22 February 2008 14:50
To: Sequoia general mailing list
Subject: [Sequoia] RE: priority inversion?
As an update:
The same occurs when only using a single controller.
The problem occurs because the locks held in sequoia are conflicting with locks
held in the database, causing deadlock (which is undetected as both sides see
only half the locks).
I.e.
In the database:
Transaction A exclusively locks row R1.
Transaction B waits to lock row R1.
In Sequoia:
Sequoia thinks A is waiting for B, and appears not to commit transaction.
Eventually, transaction B fails with a lock timeout.
The full scenario is:
I'm executing two statements (from application logs):
2008-02-22 13:05:36,985 DEBUG DRS.DAO - about to execute ' update users set
failcount = 0 where ( id = 611 OR id = 100073 ) '
2008-02-22 13:05:36,995 DEBUG DRS.DAO - executed ' update users set failcount =
0 where ( id = 611 OR id = 100073 ) '
2008-02-22 13:05:37,033 DEBUG DRS.DAO - about to execute ' update users set
failcount = 0 where ( id = 611 OR id = 100074 ) '
2008-02-22 13:05:37,044 DEBUG DRS.DAO - executed ' update users set failcount =
0 where ( id = 611 OR id = 100074 ) '
The first one (100073) is Sequoia transaction #285 The second one (100074) is
Sequoia transaction #287
I have two backends - DRSPRI & DRSPRI-BACKUP (both apache derby databases).
Both queries perform as expected on DRSPRI (these are a mixture of sequoia &
derby logs):
for #285 on 1st back-end (DRSPRI):
2008-02-22 13:05:36,987 DEBUG
org.continuent.sequoia.controller.backend.DatabaseBackend.DRSCluster.DRSPRI -
Adding task StatementExecuteUpdateTask from transaction 285 (update users set
failcount = 0 where ( id = 611 OR id = 100073 )) to pending request queue
2008-02-22 13:05:36,987 DEBUG
org.continuent.sequoia.controller.backend.DatabaseBackend.DRSPRI - Executing
task StatementExecuteUpdateTask from transaction 285 (update users set
failcount = 0 where ( id = 611 OR id = 100073 ))
2008-02-22 13:05:36.987 GMT Thread[DRSCluster - BackendWorkerThread for backend
'DRSPRI' with RAIDb level:1,5,main] (XID = 81701), (SESSIONID = 22), (DATABASE
= DRSPRI), (DRDAID = null), Begin compiling prepared statement: update users
set failcount = 0 where ( id = 611 OR id = 100073 ) :End prepared statement
2008-02-22 13:05:36.989 GMT Thread[DRSCluster - BackendWorkerThread for backend
'DRSPRI' with RAIDb level:1,5,main] (XID = 81701), (SESSIONID = 22), (DATABASE
= DRSPRI), (DRDAID = null), End compiling prepared statement: update users set
failcount = 0 where ( id = 611 OR id = 100073 ) :End prepared statement
2008-02-22 13:05:36.992 GMT Thread[DRSCluster - BackendWorkerThread for backend
'DRSPRI' with RAIDb level:1,5,main] (XID = 81701), (SESSIONID = 22), (DATABASE
= DRSPRI), (DRDAID = null), Executing prepared statement: update users set
failcount = 0 where ( id = 611 OR id = 100073 ) :End prepared statement
2008-02-22 13:05:36,993 DEBUG
org.continuent.sequoia.controller.backend.DatabaseBackend.DRSPRI - Task
StatementExecuteUpdateTask from transaction 285 (update users set failcount = 0
where ( id = 611 OR id = 100073 )) completed
2008-02-22 13:05:36,997 DEBUG
org.continuent.sequoia.controller.backend.DatabaseBackend.DRSCluster.DRSPRI -
Adding task CommitTask (285) to pending request queue
2008-02-22 13:05:36,997 DEBUG
org.continuent.sequoia.controller.backend.DatabaseBackend.DRSPRI - Executing
task CommitTask (285)
2008-02-22 13:05:36.997 GMT Thread[DRSCluster - BackendWorkerThread for backend
'DRSPRI' with RAIDb level:1,5,main] (XID = 81701), (SESSIONID = 22), (DATABASE
= DRSPRI), (DRDAID = null), Committing
2008-02-22 13:05:37.007 GMT Thread[DRSCluster - BackendWorkerThread for backend
'DRSPRI' with RAIDb level:1,5,main] (XID = 81701), (SESSIONID = 22), (DATABASE
= DRSPRI), (DRDAID = null), Committing
2008-02-22 13:05:37,007 DEBUG
org.continuent.sequoia.controller.backend.DatabaseBackend.DRSPRI - Task
CommitTask (285) completed
for #287 on 1st back-end (DRSPRI):
2008-02-22 13:05:37,035 DEBUG
org.continuent.sequoia.controller.backend.DatabaseBackend.DRSCluster.DRSPRI -
Adding task StatementExecuteUpdateTask from transaction 287 (update users set
failcount = 0 where ( id = 611 OR id = 100074 )) to pending request queue
2008-02-22 13:05:37,036 DEBUG
org.continuent.sequoia.controller.backend.DatabaseBackend.DRSPRI - Executing
task StatementExecuteUpdateTask from transaction 287 (update users set
failcount = 0 where ( id = 611 OR id = 100074 ))
2008-02-22 13:05:37.036 GMT Thread[DRSCluster - BackendWorkerThread for backend
'DRSPRI' with RAIDb level:1,5,main] (XID = 81706), (SESSIONID = 22), (DATABASE
= DRSPRI), (DRDAID = null), Begin compiling prepared statement: update users
set failcount = 0 where ( id = 611 OR id = 100074 ) :End prepared statement
2008-02-22 13:05:37.038 GMT Thread[DRSCluster - BackendWorkerThread for backend
'DRSPRI' with RAIDb level:1,5,main] (XID = 81706), (SESSIONID = 22), (DATABASE
= DRSPRI), (DRDAID = null), End compiling prepared statement: update users set
failcount = 0 where ( id = 611 OR id = 100074 ) :End prepared statement
2008-02-22 13:05:37.041 GMT Thread[DRSCluster - BackendWorkerThread for backend
'DRSPRI' with RAIDb level:1,5,main] (XID = 81706), (SESSIONID = 22), (DATABASE
= DRSPRI), (DRDAID = null), Executing prepared statement: update users set
failcount = 0 where ( id = 611 OR id = 100074 ) :End prepared statement
2008-02-22 13:05:37,042 DEBUG
org.continuent.sequoia.controller.backend.DatabaseBackend.DRSPRI - Task
StatementExecuteUpdateTask from transaction 287 (update users set failcount = 0
where ( id = 611 OR id = 100074 )) completed
2008-02-22 13:05:37,045 DEBUG
org.continuent.sequoia.controller.backend.DatabaseBackend.DRSCluster.DRSPRI -
Adding task CommitTask (287) to pending request queue
2008-02-22 13:05:37,045 DEBUG
org.continuent.sequoia.controller.backend.DatabaseBackend.DRSPRI - Executing
task CommitTask (287)
2008-02-22 13:05:37.045 GMT Thread[DRSCluster - BackendWorkerThread for backend
'DRSPRI' with RAIDb level:1,5,main] (XID = 81706), (SESSIONID = 22), (DATABASE
= DRSPRI), (DRDAID = null), Committing
2008-02-22 13:05:37.054 GMT Thread[DRSCluster - BackendWorkerThread for backend
'DRSPRI' with RAIDb level:1,5,main] (XID = 81706), (SESSIONID = 22), (DATABASE
= DRSPRI), (DRDAID = null), Committing
2008-02-22 13:05:37,055 DEBUG
org.continuent.sequoia.controller.backend.DatabaseBackend.DRSPRI - Task
CommitTask (287) completed
However, problems occur on the second backend:
for #285 on 2nd back-end (DRSPRI-BACKUP):
2008-02-22 13:05:36,989 DEBUG
org.continuent.sequoia.controller.backend.DatabaseBackend.DRSCluster.DRSPRI-BACKUP
- Adding task StatementExecuteUpdateTask from transaction 285 (update users
set failcount = 0 where ( id = 611 OR id = 100073 )) to pending request queue
2008-02-22 13:05:37,091 DEBUG
org.continuent.sequoia.controller.backend.DatabaseBackend.DRSCluster.DRSPRI-BACKUP
- Priority inversion for already started request update users set failcount =
0 where ( id = 611 OR id = 100073 )
2008-02-22 13:05:37,091 DEBUG
org.continuent.sequoia.controller.backend.DatabaseBackend.DRSCluster.DRSPRI-BACKUP
- Moving StatementExecuteUpdateTask from transaction 285 (update users set
failcount = 0 where ( id = 611 OR id = 100073 )) to non conflicting queue
2008-02-22 13:05:37,091 DEBUG
org.continuent.sequoia.controller.backend.DatabaseBackend.DRSPRI-BACKUP -
Executing task StatementExecuteUpdateTask from transaction 285 (update users
set failcount = 0 where ( id = 611 OR id = 100073 ))
2008-02-22 13:05:37.091 GMT Thread[DRSCluster - BackendWorkerThread for backend
'DRSPRI-BACKUP' with RAIDb level:1,5,main] (XID = 80696), (SESSIONID = 23),
(DATABASE = DRSPRI-BACKUP), (DRDAID = null), Begin compiling prepared
statement: update users set failcount = 0 where ( id = 611 OR id = 100073 )
:End prepared statement
2008-02-22 13:05:37.094 GMT Thread[DRSCluster - BackendWorkerThread for backend
'DRSPRI-BACKUP' with RAIDb level:1,5,main] (XID = 80696), (SESSIONID = 23),
(DATABASE = DRSPRI-BACKUP), (DRDAID = null), End compiling prepared statement:
update users set failcount = 0 where ( id = 611 OR id = 100073 ) :End prepared
statement
2008-02-22 13:05:37.099 GMT Thread[DRSCluster - BackendWorkerThread for backend
'DRSPRI-BACKUP' with RAIDb level:1,5,main] (XID = 80696), (SESSIONID = 23),
(DATABASE = DRSPRI-BACKUP), (DRDAID = null), Executing prepared statement:
update users set failcount = 0 where ( id = 611 OR id = 100073 ) :End prepared
statement
...but, it hits a lock in the database created by 287 & waits for 1 minute
before failing
2008-02-22 13:06:37.137 GMT Thread[DRSCluster - BackendWorkerThread for backend
'DRSPRI-BACKUP' with RAIDb level:1,5,main] (XID = 80696), (SESSIONID = 23),
(DATABASE = DRSPRI-BACKUP), (DRDAID = null), Cleanup action starting
2008-02-22 13:06:37.138 GMT Thread[DRSCluster - BackendWorkerThread for backend
'DRSPRI-BACKUP' with RAIDb level:1,5,main] (XID = 80696), (SESSIONID = 23),
(DATABASE = DRSPRI-BACKUP), (DRDAID = null), Failed Statement is: update users
set failcount = 0 where ( id = 611 OR id = 100073 )
2008-02-22 13:06:37,155 WARN
org.continuent.sequoia.controller.backend.DatabaseBackend.DRSCluster.DRSPRI-BACKUP
- A worker thread was still processing task StatementExecuteUpdateTask from
transaction 285 (update users set failcount = 0 where ( id = 611 OR id = 100073
)), aborting the request execution.
2008-02-22 13:06:37,821 WARN
org.continuent.sequoia.controller.backend.DatabaseBackend.DRSCluster.DRSPRI-BACKUP
- Unable to remove task StatementExecuteUpdateTask from transaction 285
(update users set failcount = 0 where ( id = 611 OR id = 100073 )) from pending
request queue
2008-02-22 13:06:37,821 ERROR
org.continuent.sequoia.controller.backend.DatabaseBackend.DRSCluster.DRSPRI-BACKUP
- Failed to remove task StatementExecuteUpdateTask from transaction 285
(update users set failcount = 0 where ( id = 611 OR id = 100073 )) from []
2008-02-22 13:06:37,821 DEBUG
org.continuent.sequoia.controller.backend.DatabaseBackend.DRSPRI-BACKUP - Task
StatementExecuteUpdateTask from transaction 285 (update users set failcount = 0
where ( id = 611 OR id = 100073 )) completed
#287 on 2nd back-end (DRSPRI-BACKUP)
2008-02-22 13:05:37,038 DEBUG
org.continuent.sequoia.controller.backend.DatabaseBackend.DRSCluster.DRSPRI-BACKUP
- Adding task StatementExecuteUpdateTask from transaction 287 (update users
set failcount = 0 where ( id = 611 OR id = 100074 )) to pending request queue
2008-02-22 13:05:37,091 DEBUG
org.continuent.sequoia.controller.backend.DatabaseBackend.DRSCluster.DRSPRI-BACKUP
- Priority inversion for already started request update users set failcount =
0 where ( id = 611 OR id = 100074 )
2008-02-22 13:05:37,091 DEBUG
org.continuent.sequoia.controller.backend.DatabaseBackend.DRSCluster.DRSPRI-BACKUP
- Moving StatementExecuteUpdateTask from transaction 287 (update users set
failcount = 0 where ( id = 611 OR id = 100074 )) to non conflicting queue
2008-02-22 13:05:37,091 DEBUG
org.continuent.sequoia.controller.backend.DatabaseBackend.DRSPRI-BACKUP -
Executing task StatementExecuteUpdateTask from transaction 287 (update users
set failcount = 0 where ( id = 611 OR id = 100074 ))
2008-02-22 13:05:37.091 GMT Thread[DRSCluster - BackendWorkerThread for backend
'DRSPRI-BACKUP' with RAIDb level:1,5,main] (XID = 80697), (SESSIONID = 22),
(DATABASE = DRSPRI-BACKUP), (DRDAID = null), Begin compiling prepared
statement: update users set failcount = 0 where ( id = 611 OR id = 100074 )
:End prepared statement
2008-02-22 13:05:37.094 GMT Thread[DRSCluster - BackendWorkerThread for backend
'DRSPRI-BACKUP' with RAIDb level:1,5,main] (XID = 80697), (SESSIONID = 22),
(DATABASE = DRSPRI-BACKUP), (DRDAID = null), End compiling prepared statement:
update users set failcount = 0 where ( id = 611 OR id = 100074 ) :End prepared
statement
2008-02-22 13:05:37.098 GMT Thread[DRSCluster - BackendWorkerThread for backend
'DRSPRI-BACKUP' with RAIDb level:1,5,main] (XID = 80697), (SESSIONID = 22),
(DATABASE = DRSPRI-BACKUP), (DRDAID = null), Executing prepared statement:
update users set failcount = 0 where ( id = 611 OR id = 100074 ) :End prepared
statement
2008-02-22 13:05:37,100 DEBUG
org.continuent.sequoia.controller.backend.DatabaseBackend.DRSPRI-BACKUP - Task
StatementExecuteUpdateTask from transaction 287 (update users set failcount = 0
where ( id = 611 OR id = 100074 )) completed
then, 30 secs later:
2008-02-22 13:06:08,656 DEBUG org.continuent.sequoia.controller.loadbalancer -
287 waits for USERS locked by 285 (and others)
so, overall, the summary of the sequence is:
13:05:36,985 app executes #285: ' update users set failcount = 0 where ( id =
611 OR id = 100073 ) '
13:05:36.987 drspri executes query #285
13:05:36,993 drspri finishes query #285
13:05:36,995 app sees query #285 as finished.
??:??:??,??? app commits #285
13:05:36,997 drspri starts commit for #285
13:05:37,007 drspri finishes commit for #285
13:05:37,033 app executes #287: ' update users set failcount = 0 where ( id =
611 OR id = 100074 ) '
13:05:37.036 drspri executes query #287
13:05:37.042 drspri finishes query #287
13:05:37,044 app sees query #287 as finished.
??:??:??,??? app commits #287
13:05:37,045 drspri starts commit for #287
13:05:37,055 drspri finishes commit for #287
13:05:37.091 drspri-backup performs Priority Inversion on #285
13:05:37.091 drspri-backup performs Priority Inversion on #287
13:05:37.091 drspri-backup executes query #285
13:05:37.091 drspri-backup executes query #287 in database, #287 runs first,
gets exclusive lock on row.
13:05:37,100 drspri-backup finishes query #287
13:06:08,656 seqoia thinks #287 is waiting for #285, hasn't committed.
13:06:37.137 drspri-backup db fails query #285 with lock timeout
So... Does anyone know if this is me doing something stupid, or a sequoia
problem?
My RequestScheduler config is:
<RequestScheduler>
<RAIDb-1Scheduler level="passThrough"/>
</RequestScheduler>
And the docs say "The load balancer will just ensure that the writes are sent in the
same order to all backends." - but that doesn't seem to be the case here?
Many thanks,
Andrew Lawrenson
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andrew Lawrenson
Sent: 21 February 2008 17:21
To: [email protected]
Subject: [Sequoia] priority inversion?
Hi all,
I'm attempting to use sequoia (2.10.9) to cluster some Apache Derby
databases, which generally seems to work as expected.
However, there is a problem that seems to keep cropping up, and appears like it
might be related to Priority Inversion - and I'm not quite sure what that is.
The problem I'm seeing is that sometimes when two queries (which are sequential
& from the same client) both update the same row, they are hitting a locking
situation which wouldn't normally occur.
Normally, the requests from sequoia come through to derby looking like this
(these are Derby Logs, for transaction #40738):
2008-02-21 16:13:34.190 GMT Thread[DRSCluster - BackendWorkerThread for backend
'DRSPRI-BACKUP' with RAIDb level:1,5,main] (XID = 40738), (SESSIONID = 21),
(DATABASE = DRSPRI-BACKUP), (DRDAID = null), Executing prepared statement:
update users set failcount = 0 where id = ? :End prepared statement with 1
parameters begin parameter #1: 611 :end parameter
2008-02-21 16:13:34.193 GMT Thread[DRSCluster - BackendWorkerThread for backend
'DRSPRI-BACKUP' with RAIDb level:1,5,main] (XID = 40738), (SESSIONID = 21),
(DATABASE = DRSPRI-BACKUP), (DRDAID = null), Committing
2008-02-21 16:13:34.295 GMT Thread[DRSCluster - BackendWorkerThread for backend
'DRSPRI-BACKUP' with RAIDb level:1,5,main] (XID = 40738), (SESSIONID = 21),
(DATABASE = DRSPRI-BACKUP), (DRDAID = null), Committing
2008-02-21 16:13:34.295 GMT Thread[DRSCluster - BackendWorkerThread for backend
'DRSPRI-BACKUP' with RAIDb level:1,5,main] (XID = 40738), (SESSIONID = 21),
(DATABASE = DRSPRI-BACKUP), (DRDAID = null), Committing
However, every so often, all I get is the execution, without any commit:
2008-02-21 16:13:34.295 GMT Thread[DRSCluster - BackendWorkerThread for backend
'DRSPRI-BACKUP' with RAIDb level:1,5,main] (XID = 40739), (SESSIONID = 21),
(DATABASE = DRSPRI-BACKUP), (DRDAID = null), Executing prepared statement:
update users set failcount = 0 where id = ? :End prepared statement with 1
parameters begin parameter #1: 611 :end parameter
(these is the only activity visible in derby for transaction #40739)
but at the same time, Sequoia performs Priority Inversion on (presumably) this
query:
2008-02-21 16:13:34,295 DEBUG
org.continuent.sequoia.controller.backend.DatabaseBackend.DRSCluster.DRSPRI-BACKUP -
Priority inversion for already started request update users set failcount = 0 where
id = ?/<!%I|611|!%>
2008-02-21 16:13:34,295 DEBUG
org.continuent.sequoia.controller.backend.DatabaseBackend.DRSCluster.DRSPRI-BACKUP -
Moving StatementExecuteUpdateTask from transaction 4222124650660804 (update users set
failcount = 0 where id = ?/<!%I|611|!%>) to non conflicting queue
when this happens, it appears that the commit (which is issued by the client)
never gets presented to derby, so that a lock is now kept on the updated row,
and subsequent updates of this row fail with lock timeout errors (confirmed
from Derby lock diagnostics).
I've reproduced this several times, and every time there is a Priority
Inversion when the commit disappears. Without sequoia, I've had no locking
issues.
I've also just noticed that in all cases, the commit for the previous
transaction, the executing of the failing transaction and the priority
inversion are all timestamped to the same millisecond (although the previous
commit always comes through to derby before the excecution)
So, can anyone explain what Prority Inversion is, and where the problem might
be?
If it has any relevance, my setup has two or three distributed controllers
(using appia w/tcp sequencer), each of which has two derby backends. Although
there are distributed controllers, these failures are all local (e.g. the
failures occur in the backends attached to the controller which receives the
request).
many thanks in advance,
Andrew Lawrenson