RE: [Sequoia] waitForCompletionPolicy (was RE: priority inversion?)

Andrew Lawrenson Tue, 26 Feb 2008 09:46:06 -0800

Emmanuel,

        I've done this with both a single controller, with 2 databases 
attached, as well as two replicating controllers (each with two databases).
However, at the moment, when I've done it with a single controller, it is still 
setup to replicate - it's just that the other controller isn't started (if that 
makes any difference).


        thanks,

        Andrew Lawrenson

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Emmanuel Cecchet
Sent: 26 February 2008 17:33
To: Sequoia general mailing list
Subject: Re: [Sequoia] waitForCompletionPolicy (was RE: priority inversion?)

Andrew,

Thanks for the update. Could you confirm that your 2 databases are attached to 
the same controller?
Are you replicating controllers at all or do you have a single controller?

Thanks again for your feedback,
Emmanuel

> I think this issue is related to SEQUOIA-955 - I was using a 
> waitForCompletionPolicy of "first".  When set to "all", the problem 
> disappears.
>
> Looking in various places, I assume the current recommendation is to use 
> "all" & "first" is less reliable.
>
> If this is correct, it would probably be worth changing the example 
> configurations in config/virtualdatabase to use "all" - they currently 
> (2.10.9) use "first" - which is where I copied my configuration from.
>
>         thanks,
>
>         Andrew Lawrenson
>
>
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of
> Andrew Lawrenson
> Sent: 22 February 2008 14:50
> To: Sequoia general mailing list
> Subject: [Sequoia] RE: priority inversion?
>
> As an update:
>
> The same occurs when only using a single controller.
> The problem occurs because the locks held in sequoia are conflicting with 
> locks held in the database, causing deadlock (which is undetected as both 
> sides see only half the locks).
>
> I.e.
> In the database:
>    Transaction A exclusively locks row R1.
>    Transaction B waits to lock row R1.
>
> In Sequoia:
>    Sequoia thinks A is waiting for B, and appears not to commit transaction.  
> Eventually, transaction B fails with a lock timeout.
>
>
> The full scenario is:
>
> I'm executing two statements (from application logs):
>
> 2008-02-22 13:05:36,985 DEBUG DRS.DAO - about to execute ' update users set 
> failcount = 0 where ( id = 611 OR id = 100073 ) '
> 2008-02-22 13:05:36,995 DEBUG DRS.DAO - executed ' update users set failcount 
> = 0 where ( id = 611 OR id = 100073 ) '
> 2008-02-22 13:05:37,033 DEBUG DRS.DAO - about to execute ' update users set 
> failcount = 0 where ( id = 611 OR id = 100074 ) '
> 2008-02-22 13:05:37,044 DEBUG DRS.DAO - executed ' update users set failcount 
> = 0 where ( id = 611 OR id = 100074 ) '
>
> The first one (100073) is Sequoia transaction #285 The second one
> (100074) is Sequoia transaction #287
>
> I have two backends - DRSPRI & DRSPRI-BACKUP (both apache derby databases).
>
> Both queries perform as expected on DRSPRI (these are a mixture of sequoia & 
> derby logs):
>
> for #285 on 1st back-end (DRSPRI):
> 2008-02-22 13:05:36,987 DEBUG
> org.continuent.sequoia.controller.backend.DatabaseBackend.DRSCluster.D
> RSPRI - Adding task StatementExecuteUpdateTask from transaction 285
> (update users set failcount = 0 where ( id = 611 OR id = 100073 )) to
> pending request queue
> 2008-02-22 13:05:36,987 DEBUG
> org.continuent.sequoia.controller.backend.DatabaseBackend.DRSPRI -
> Executing task StatementExecuteUpdateTask from transaction 285 (update
> users set failcount = 0 where ( id = 611 OR id = 100073 ))
> 2008-02-22 13:05:36.987 GMT Thread[DRSCluster - BackendWorkerThread
> for backend 'DRSPRI' with RAIDb level:1,5,main] (XID = 81701),
> (SESSIONID = 22), (DATABASE = DRSPRI), (DRDAID = null), Begin
> compiling prepared statement: update users set failcount = 0 where (
> id = 611 OR id = 100073 ) :End prepared statement
> 2008-02-22 13:05:36.989 GMT Thread[DRSCluster - BackendWorkerThread
> for backend 'DRSPRI' with RAIDb level:1,5,main] (XID = 81701),
> (SESSIONID = 22), (DATABASE = DRSPRI), (DRDAID = null), End compiling
> prepared statement: update users set failcount = 0 where ( id = 611 OR
> id = 100073 ) :End prepared statement
> 2008-02-22 13:05:36.992 GMT Thread[DRSCluster - BackendWorkerThread
> for backend 'DRSPRI' with RAIDb level:1,5,main] (XID = 81701),
> (SESSIONID = 22), (DATABASE = DRSPRI), (DRDAID = null), Executing
> prepared statement: update users set failcount = 0 where ( id = 611 OR
> id = 100073 ) :End prepared statement
> 2008-02-22 13:05:36,993 DEBUG
> org.continuent.sequoia.controller.backend.DatabaseBackend.DRSPRI -
> Task StatementExecuteUpdateTask from transaction 285 (update users set
> failcount = 0 where ( id = 611 OR id = 100073 )) completed
> 2008-02-22 13:05:36,997 DEBUG
> org.continuent.sequoia.controller.backend.DatabaseBackend.DRSCluster.D
> RSPRI - Adding task CommitTask (285) to pending request queue
> 2008-02-22 13:05:36,997 DEBUG
> org.continuent.sequoia.controller.backend.DatabaseBackend.DRSPRI -
> Executing task CommitTask (285)
> 2008-02-22 13:05:36.997 GMT Thread[DRSCluster - BackendWorkerThread
> for backend 'DRSPRI' with RAIDb level:1,5,main] (XID = 81701),
> (SESSIONID = 22), (DATABASE = DRSPRI), (DRDAID = null), Committing
> 2008-02-22 13:05:37.007 GMT Thread[DRSCluster - BackendWorkerThread
> for backend 'DRSPRI' with RAIDb level:1,5,main] (XID = 81701),
> (SESSIONID = 22), (DATABASE = DRSPRI), (DRDAID = null), Committing
> 2008-02-22 13:05:37,007 DEBUG
> org.continuent.sequoia.controller.backend.DatabaseBackend.DRSPRI -
> Task CommitTask (285) completed
>
> for #287 on 1st back-end (DRSPRI):
> 2008-02-22 13:05:37,035 DEBUG
> org.continuent.sequoia.controller.backend.DatabaseBackend.DRSCluster.D
> RSPRI - Adding task StatementExecuteUpdateTask from transaction 287
> (update users set failcount = 0 where ( id = 611 OR id = 100074 )) to
> pending request queue
> 2008-02-22 13:05:37,036 DEBUG
> org.continuent.sequoia.controller.backend.DatabaseBackend.DRSPRI -
> Executing task StatementExecuteUpdateTask from transaction 287 (update
> users set failcount = 0 where ( id = 611 OR id = 100074 ))
> 2008-02-22 13:05:37.036 GMT Thread[DRSCluster - BackendWorkerThread
> for backend 'DRSPRI' with RAIDb level:1,5,main] (XID = 81706),
> (SESSIONID = 22), (DATABASE = DRSPRI), (DRDAID = null), Begin
> compiling prepared statement: update users set failcount = 0 where (
> id = 611 OR id = 100074 ) :End prepared statement
> 2008-02-22 13:05:37.038 GMT Thread[DRSCluster - BackendWorkerThread
> for backend 'DRSPRI' with RAIDb level:1,5,main] (XID = 81706),
> (SESSIONID = 22), (DATABASE = DRSPRI), (DRDAID = null), End compiling
> prepared statement: update users set failcount = 0 where ( id = 611 OR
> id = 100074 ) :End prepared statement
> 2008-02-22 13:05:37.041 GMT Thread[DRSCluster - BackendWorkerThread
> for backend 'DRSPRI' with RAIDb level:1,5,main] (XID = 81706),
> (SESSIONID = 22), (DATABASE = DRSPRI), (DRDAID = null), Executing
> prepared statement: update users set failcount = 0 where ( id = 611 OR
> id = 100074 ) :End prepared statement
> 2008-02-22 13:05:37,042 DEBUG
> org.continuent.sequoia.controller.backend.DatabaseBackend.DRSPRI -
> Task StatementExecuteUpdateTask from transaction 287 (update users set
> failcount = 0 where ( id = 611 OR id = 100074 )) completed
> 2008-02-22 13:05:37,045 DEBUG
> org.continuent.sequoia.controller.backend.DatabaseBackend.DRSCluster.D
> RSPRI - Adding task CommitTask (287) to pending request queue
> 2008-02-22 13:05:37,045 DEBUG
> org.continuent.sequoia.controller.backend.DatabaseBackend.DRSPRI -
> Executing task CommitTask (287)
> 2008-02-22 13:05:37.045 GMT Thread[DRSCluster - BackendWorkerThread
> for backend 'DRSPRI' with RAIDb level:1,5,main] (XID = 81706),
> (SESSIONID = 22), (DATABASE = DRSPRI), (DRDAID = null), Committing
> 2008-02-22 13:05:37.054 GMT Thread[DRSCluster - BackendWorkerThread
> for backend 'DRSPRI' with RAIDb level:1,5,main] (XID = 81706),
> (SESSIONID = 22), (DATABASE = DRSPRI), (DRDAID = null), Committing
> 2008-02-22 13:05:37,055 DEBUG
> org.continuent.sequoia.controller.backend.DatabaseBackend.DRSPRI -
> Task CommitTask (287) completed
>
> However, problems occur on the second backend:
>
> for #285 on 2nd back-end (DRSPRI-BACKUP):
> 2008-02-22 13:05:36,989 DEBUG
> org.continuent.sequoia.controller.backend.DatabaseBackend.DRSCluster.D
> RSPRI-BACKUP - Adding task StatementExecuteUpdateTask from transaction
> 285 (update users set failcount = 0 where ( id = 611 OR id = 100073 ))
> to pending request queue
> 2008-02-22 13:05:37,091 DEBUG
> org.continuent.sequoia.controller.backend.DatabaseBackend.DRSCluster.D
> RSPRI-BACKUP - Priority inversion for already started request update
> users set failcount = 0 where ( id = 611 OR id = 100073 )
> 2008-02-22 13:05:37,091 DEBUG
> org.continuent.sequoia.controller.backend.DatabaseBackend.DRSCluster.D
> RSPRI-BACKUP - Moving StatementExecuteUpdateTask from transaction 285
> (update users set failcount = 0 where ( id = 611 OR id = 100073 )) to
> non conflicting queue
> 2008-02-22 13:05:37,091 DEBUG
> org.continuent.sequoia.controller.backend.DatabaseBackend.DRSPRI-BACKU
> P - Executing task StatementExecuteUpdateTask from transaction 285
> (update users set failcount = 0 where ( id = 611 OR id = 100073 ))
> 2008-02-22 13:05:37.091 GMT Thread[DRSCluster - BackendWorkerThread
> for backend 'DRSPRI-BACKUP' with RAIDb level:1,5,main] (XID = 80696),
> (SESSIONID = 23), (DATABASE = DRSPRI-BACKUP), (DRDAID = null), Begin
> compiling prepared statement: update users set failcount = 0 where (
> id = 611 OR id = 100073 ) :End prepared statement
> 2008-02-22 13:05:37.094 GMT Thread[DRSCluster - BackendWorkerThread
> for backend 'DRSPRI-BACKUP' with RAIDb level:1,5,main] (XID = 80696),
> (SESSIONID = 23), (DATABASE = DRSPRI-BACKUP), (DRDAID = null), End
> compiling prepared statement: update users set failcount = 0 where (
> id = 611 OR id = 100073 ) :End prepared statement
> 2008-02-22 13:05:37.099 GMT Thread[DRSCluster - BackendWorkerThread
> for backend 'DRSPRI-BACKUP' with RAIDb level:1,5,main] (XID = 80696),
> (SESSIONID = 23), (DATABASE = DRSPRI-BACKUP), (DRDAID = null),
> Executing prepared statement: update users set failcount = 0 where (
> id = 611 OR id = 100073 ) :End prepared statement
>
> ...but, it hits a lock in the database created by 287 & waits for 1
> minute before failing
> 2008-02-22 13:06:37.137 GMT Thread[DRSCluster - BackendWorkerThread
> for backend 'DRSPRI-BACKUP' with RAIDb level:1,5,main] (XID = 80696),
> (SESSIONID = 23), (DATABASE = DRSPRI-BACKUP), (DRDAID = null), Cleanup
> action starting
> 2008-02-22 13:06:37.138 GMT Thread[DRSCluster - BackendWorkerThread
> for backend 'DRSPRI-BACKUP' with RAIDb level:1,5,main] (XID = 80696),
> (SESSIONID = 23), (DATABASE = DRSPRI-BACKUP), (DRDAID = null), Failed
> Statement is: update users set failcount = 0 where ( id = 611 OR id =
> 100073 )
> 2008-02-22 13:06:37,155 WARN  
> org.continuent.sequoia.controller.backend.DatabaseBackend.DRSCluster.DRSPRI-BACKUP
>  - A worker thread was still processing task StatementExecuteUpdateTask from 
> transaction 285 (update users set failcount = 0 where ( id = 611 OR id = 
> 100073 )), aborting the request execution.
> 2008-02-22 13:06:37,821 WARN
> org.continuent.sequoia.controller.backend.DatabaseBackend.DRSCluster.D
> RSPRI-BACKUP - Unable to remove task StatementExecuteUpdateTask from
> transaction 285 (update users set failcount = 0 where ( id = 611 OR id
> = 100073 )) from pending request queue
> 2008-02-22 13:06:37,821 ERROR
> org.continuent.sequoia.controller.backend.DatabaseBackend.DRSCluster.D
> RSPRI-BACKUP - Failed to remove task StatementExecuteUpdateTask from
> transaction 285 (update users set failcount = 0 where ( id = 611 OR id
> = 100073 )) from []
> 2008-02-22 13:06:37,821 DEBUG
> org.continuent.sequoia.controller.backend.DatabaseBackend.DRSPRI-BACKU
> P - Task StatementExecuteUpdateTask from transaction 285 (update users
> set failcount = 0 where ( id = 611 OR id = 100073 )) completed
>
> #287 on 2nd back-end (DRSPRI-BACKUP)
> 2008-02-22 13:05:37,038 DEBUG
> org.continuent.sequoia.controller.backend.DatabaseBackend.DRSCluster.D
> RSPRI-BACKUP - Adding task StatementExecuteUpdateTask from transaction
> 287 (update users set failcount = 0 where ( id = 611 OR id = 100074 ))
> to pending request queue
> 2008-02-22 13:05:37,091 DEBUG
> org.continuent.sequoia.controller.backend.DatabaseBackend.DRSCluster.D
> RSPRI-BACKUP - Priority inversion for already started request update
> users set failcount = 0 where ( id = 611 OR id = 100074 )
> 2008-02-22 13:05:37,091 DEBUG
> org.continuent.sequoia.controller.backend.DatabaseBackend.DRSCluster.D
> RSPRI-BACKUP - Moving StatementExecuteUpdateTask from transaction 287
> (update users set failcount = 0 where ( id = 611 OR id = 100074 )) to
> non conflicting queue
> 2008-02-22 13:05:37,091 DEBUG
> org.continuent.sequoia.controller.backend.DatabaseBackend.DRSPRI-BACKU
> P - Executing task StatementExecuteUpdateTask from transaction 287
> (update users set failcount = 0 where ( id = 611 OR id = 100074 ))
> 2008-02-22 13:05:37.091 GMT Thread[DRSCluster - BackendWorkerThread
> for backend 'DRSPRI-BACKUP' with RAIDb level:1,5,main] (XID = 80697),
> (SESSIONID = 22), (DATABASE = DRSPRI-BACKUP), (DRDAID = null), Begin
> compiling prepared statement: update users set failcount = 0 where (
> id = 611 OR id = 100074 ) :End prepared statement
> 2008-02-22 13:05:37.094 GMT Thread[DRSCluster - BackendWorkerThread
> for backend 'DRSPRI-BACKUP' with RAIDb level:1,5,main] (XID = 80697),
> (SESSIONID = 22), (DATABASE = DRSPRI-BACKUP), (DRDAID = null), End
> compiling prepared statement: update users set failcount = 0 where (
> id = 611 OR id = 100074 ) :End prepared statement
> 2008-02-22 13:05:37.098 GMT Thread[DRSCluster - BackendWorkerThread
> for backend 'DRSPRI-BACKUP' with RAIDb level:1,5,main] (XID = 80697),
> (SESSIONID = 22), (DATABASE = DRSPRI-BACKUP), (DRDAID = null),
> Executing prepared statement: update users set failcount = 0 where (
> id = 611 OR id = 100074 ) :End prepared statement
> 2008-02-22 13:05:37,100 DEBUG
> org.continuent.sequoia.controller.backend.DatabaseBackend.DRSPRI-BACKU
> P - Task StatementExecuteUpdateTask from transaction 287 (update users
> set failcount = 0 where ( id = 611 OR id = 100074 )) completed
>
> then, 30 secs later:
> 2008-02-22 13:06:08,656 DEBUG
> org.continuent.sequoia.controller.loadbalancer - 287 waits for USERS
> locked by 285 (and others)
>
>
>
> so, overall, the summary of the sequence is:
>
> 13:05:36,985 app executes #285: ' update users set failcount = 0 where ( id = 
> 611 OR id = 100073 ) '
> 13:05:36.987 drspri executes query #285
> 13:05:36,993 drspri finishes query #285
>
> 13:05:36,995 app sees query #285 as finished.
> ??:??:??,??? app commits #285
> 13:05:36,997 drspri starts commit for #285
> 13:05:37,007 drspri finishes commit for #285
>
> 13:05:37,033 app executes #287: ' update users set failcount = 0 where ( id = 
> 611 OR id = 100074 ) '
> 13:05:37.036 drspri executes query #287
> 13:05:37.042 drspri finishes query #287
>
> 13:05:37,044 app sees query #287 as finished.
> ??:??:??,??? app commits #287
> 13:05:37,045 drspri starts commit for #287
> 13:05:37,055 drspri finishes commit for #287
>
> 13:05:37.091 drspri-backup performs Priority Inversion on #285
> 13:05:37.091 drspri-backup performs Priority Inversion on #287
> 13:05:37.091 drspri-backup executes query #285
> 13:05:37.091 drspri-backup executes query #287 in database, #287 runs first, 
> gets exclusive lock on row.
> 13:05:37,100 drspri-backup finishes query #287
> 13:06:08,656 seqoia thinks #287 is waiting for #285, hasn't committed.
> 13:06:37.137 drspri-backup db fails query #285 with lock timeout
>
>
>
> So... Does anyone know if this is me doing something stupid, or a sequoia 
> problem?
>
> My RequestScheduler config is:
>
>       <RequestScheduler>
>          <RAIDb-1Scheduler level="passThrough"/>
>       </RequestScheduler>
>
> And the docs say "The load balancer will just ensure that the writes are sent 
> in the same order to all backends." - but that doesn't seem to be the case 
> here?
>
> Many thanks,
>
>         Andrew Lawrenson
>
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of
> Andrew Lawrenson
> Sent: 21 February 2008 17:21
> To: [email protected]
> Subject: [Sequoia] priority inversion?
>
> Hi all,
>
>     I'm attempting to use sequoia (2.10.9) to cluster some Apache Derby 
> databases, which generally seems to work as expected.
> However, there is a problem that seems to keep cropping up, and appears like 
> it might be related to Priority Inversion - and I'm not quite sure what that 
> is.
>
> The problem I'm seeing is that sometimes when two queries (which are 
> sequential & from the same client) both update the same row, they are hitting 
> a locking situation which wouldn't normally occur.
>
> Normally, the requests from sequoia come through to derby looking like this 
> (these are Derby Logs, for transaction #40738):
>
> 2008-02-21 16:13:34.190 GMT Thread[DRSCluster - BackendWorkerThread
> for backend 'DRSPRI-BACKUP' with RAIDb level:1,5,main] (XID = 40738),
> (SESSIONID = 21), (DATABASE = DRSPRI-BACKUP), (DRDAID = null),
> Executing prepared statement: update users set failcount = 0 where id
> = ? :End prepared statement with 1 parameters begin parameter #1: 611
> :end parameter
> 2008-02-21 16:13:34.193 GMT Thread[DRSCluster - BackendWorkerThread
> for backend 'DRSPRI-BACKUP' with RAIDb level:1,5,main] (XID = 40738),
> (SESSIONID = 21), (DATABASE = DRSPRI-BACKUP), (DRDAID = null),
> Committing
> 2008-02-21 16:13:34.295 GMT Thread[DRSCluster - BackendWorkerThread
> for backend 'DRSPRI-BACKUP' with RAIDb level:1,5,main] (XID = 40738),
> (SESSIONID = 21), (DATABASE = DRSPRI-BACKUP), (DRDAID = null),
> Committing
> 2008-02-21 16:13:34.295 GMT Thread[DRSCluster - BackendWorkerThread
> for backend 'DRSPRI-BACKUP' with RAIDb level:1,5,main] (XID = 40738),
> (SESSIONID = 21), (DATABASE = DRSPRI-BACKUP), (DRDAID = null),
> Committing
>
> However, every so often, all I get is the execution, without any commit:
> 2008-02-21 16:13:34.295 GMT Thread[DRSCluster - BackendWorkerThread
> for backend 'DRSPRI-BACKUP' with RAIDb level:1,5,main] (XID = 40739),
> (SESSIONID = 21), (DATABASE = DRSPRI-BACKUP), (DRDAID = null),
> Executing prepared statement: update users set failcount = 0 where id
> = ? :End prepared statement with 1 parameters begin parameter #1: 611
> :end parameter
>
> (these is the only activity visible in derby for transaction #40739)
>
> but at the same time, Sequoia performs Priority Inversion on (presumably) 
> this query:
>
> 2008-02-21 16:13:34,295 DEBUG
> org.continuent.sequoia.controller.backend.DatabaseBackend.DRSCluster.D
> RSPRI-BACKUP - Priority inversion for already started request update
> users set failcount = 0 where id = ?/<!%I|611|!%>
> 2008-02-21 16:13:34,295 DEBUG
> org.continuent.sequoia.controller.backend.DatabaseBackend.DRSCluster.D
> RSPRI-BACKUP - Moving StatementExecuteUpdateTask from transaction
> 4222124650660804 (update users set failcount = 0 where id =
> ?/<!%I|611|!%>) to non conflicting queue
>
> when this happens, it appears that the commit (which is issued by the client) 
> never gets presented to derby, so that a lock is now kept on the updated row, 
> and subsequent updates of this row fail with lock timeout errors (confirmed 
> from Derby lock diagnostics).
>
> I've reproduced this several times, and every time there is a Priority 
> Inversion when the commit disappears.  Without sequoia, I've had no locking 
> issues.
>
> I've also just noticed that in all cases, the commit for the previous
> transaction, the executing of the failing transaction and the priority
> inversion are all timestamped to the same millisecond (although the
> previous commit always comes through to derby before the excecution)
>
> So, can anyone explain what Prority Inversion is, and where the problem might 
> be?
>
> If it has any relevance, my setup has two or three distributed controllers 
> (using appia w/tcp sequencer), each of which has two derby backends.  
> Although there are distributed controllers, these failures are all local 
> (e.g. the failures occur in the backends attached to the controller which 
> receives the request).
>
>
> many thanks in advance,
>
>     Andrew Lawrenson
>
>
>

_______________________________________________
Sequoia mailing list
[email protected]
https://forge.continuent.org/mailman/listinfo/sequoia

_______________________________________________
Sequoia mailing list
[email protected]
https://forge.continuent.org/mailman/listinfo/sequoia

RE: [Sequoia] waitForCompletionPolicy (was RE: priority inversion?)

Reply via email to