Re: [Sequoia] Re: RecoverThread

Emmanuel Cecchet Tue, 18 Mar 2008 06:05:24 -0700

Hi Chris,

Excellent catch!

In the meantime, I recommend disabling driver connection pooling until the fix gets in (Reset is only used with driver connection pooling). To disable connection pooling, use:
'jdbc:sequoia://host/db?connectionPooling=false'
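As a minimal sketch of applying this workaround from client code, the helper below appends the `connectionPooling=false` option to a Sequoia JDBC URL before opening the connection. The class and method names are hypothetical, and the use of `&` to separate multiple URL options is an assumption; only the option itself comes from the URL above. Opening a connection also assumes the Sequoia driver is on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper, not part of Sequoia.
final class SequoiaUrls {
    private static final String NO_POOLING = "connectionPooling=false";

    // Returns the URL with driver connection pooling disabled.
    static String withPoolingDisabled(String baseUrl) {
        if (baseUrl.contains(NO_POOLING)) {
            return baseUrl; // already disabled, nothing to add
        }
        // Assumption: options after the first are separated by '&'.
        String separator = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + separator + NO_POOLING;
    }

    // Opens a connection through the standard JDBC DriverManager.
    static Connection open(String baseUrl, String user, String password)
            throws SQLException {
        return DriverManager.getConnection(
                withPoolingDisabled(baseUrl), user, password);
    }
}
```

For example, `SequoiaUrls.withPoolingDisabled("jdbc:sequoia://host/db")` yields the URL shown above.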


Thanks again for the fix,
Emmanuel

I found the issue and fixed it, although it wasn't where I expected. I'd like to call attention to the JIRA I just created (https://forge.continuent.org/jira/browse/SEQUOIA-1053), in case the bug is biting anyone else as hard as it bit me.
Symptoms:
  Backend gets stuck in infinite loop trying to recover. We saw this when finishing up a backup and also when disabling/enabling a backend.
  It's possible for entire VDB to get stuck in suspended state.
  Depending on whether you've updated to pre-release source code, you may see messages like "Recovery log entry marked as still executing".
Steps to reproduce:
  Two controllers, one VDB.  Each controller has one backend.
  Disable backend #2.
  Backend #1 performs a readonly transaction (connect, set readonly, begin, select, commit)
  Backend #1 performs a non-readonly transaction (connect, begin, update, commit)
  Try to enable backend #2
  recovery of backend #2 gets stuck spinning forever

Fix:
  Add "isReadOnly = false;" to the VirtualDatabaseWorkerThread reset() function. (see JIRA for explanation)
Depending on the order and timing of transactions, it's possible to get the entire VDB stuck. If those two transactions happen at the right point in the recovery process, backend #2 can be spinning while the VDB is suspended.
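To illustrate why a flag left uncleared by reset() matters when connections are pooled, here is a simplified model. It is not Sequoia's code: the class, its fields, and in particular the rule that read-only transactions skip the recovery log are assumptions made for illustration (the actual mechanism is explained in the JIRA). It replays the two transactions from the steps to reproduce on one reused worker.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a worker reused across pooled connections.
final class PooledWorkerModel {
    private boolean isReadOnly = false;
    private final boolean resetClearsReadOnly;
    private final List<String> recoveryLog = new ArrayList<>();

    PooledWorkerModel(boolean resetClearsReadOnly) {
        this.resetClearsReadOnly = resetClearsReadOnly;
    }

    void setReadOnly(boolean readOnly) {
        isReadOnly = readOnly;
    }

    // Assumption: commits of read-only transactions are not logged.
    void commit(String txName) {
        if (!isReadOnly) {
            recoveryLog.add("commit " + txName);
        }
    }

    // Called when the pooled connection is handed to the next client.
    void reset() {
        if (resetClearsReadOnly) {
            isReadOnly = false; // the one-line fix described above
        }
    }

    // Runs the read-only transaction, then the update, on one worker.
    static List<String> runScenario(boolean fixed) {
        PooledWorkerModel worker = new PooledWorkerModel(fixed);
        worker.setReadOnly(true);
        worker.commit("tx1-select");
        worker.reset();
        worker.commit("tx2-update"); // the client never set read-only here
        worker.reset();
        return worker.recoveryLog;
    }
}
```

In this model, `runScenario(false)` leaves the log empty because the update inherits the stale read-only flag, while `runScenario(true)` logs the update's commit.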
Hope this helps someone.
-Chris


On Feb 26, 2008, at 2:28 PM, Christopher Ekberg wrote:
We've been having sporadic problems in RecoverThread, both when finishing up a backup and when bringing up a new node. Sequoia 2.10.8 embedded, 1 vdb (RAIDb-1), 2 machines (host1, host2), lots of client traffic going on.
Here's an example of what we see happening:
we tell host1 to backup. (vdb.backupBackend(...))
this calls requestManager.backupBackend(...)
this calls requestManager.disableBackendWithCheckpoint
backup happens
requestManager.enableBackendFromCheckpoint(...) is called, which creates a RecoverThread
RecoverThread has everyone stop (requestManager.suspendActivity()) so host1 can catch up playing recovery log of statements coming in during backup
host1 gets stuck in infinite loop waiting for a task to complete. it never does, so RecoverThread never calls requestManager.resumeActivity() to wake up host2.
host2 is still paused so clients can't make requests.
host1 is stuck in an infinite loop in RecoverThread run() here (I've added logging, and the ability to break out of the loop I see in unreleased sequoia code):
     // Play the remaining writes that were pending and which have
     // been logged
     boolean replayedAllLog = false;
     do
     { // Loop until the whole recovery log has been replayed
         // Or stop if the activity is resumed by force
         try
         {
         logger.info("RecoverThread about to replay new recovery log tasks");
         logIdx = recover(logIdx, pendingRecoveryTasks);
         // The status update for the last request (probably a commit/rollback)
         // is not be there yet. Wait for it to be flushed to the log and
         // retry.
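The loop pattern in the excerpt (replay, wait for the last status update, retry, or stop if activity is resumed by force) can be sketched in a self-contained form. This is not the Sequoia implementation: the RecoveryLog interface, the forcedResume flag, and the maxIdlePasses bound are all hypothetical names introduced here to show how such a loop can be given a way out when no progress is being made.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of a replay loop with two exits besides completion.
final class ReplayLoopSketch {
    interface RecoveryLog {
        // Replays what it can and returns the next index to replay;
        // returns logIdx unchanged when nothing new could be replayed.
        long replayFrom(long logIdx);

        // True once every entry up to logIdx has been replayed.
        boolean fullyReplayed(long logIdx);
    }

    // Returns true if the whole log was replayed, false if the loop
    // gave up (forced resume, too many idle passes, or interruption).
    static boolean replay(RecoveryLog log, long startIdx,
                          AtomicBoolean forcedResume, int maxIdlePasses) {
        long logIdx = startIdx;
        int idlePasses = 0;
        while (!log.fullyReplayed(logIdx)) {
            if (forcedResume.get() || idlePasses >= maxIdlePasses) {
                return false; // break out instead of spinning forever
            }
            long next = log.replayFrom(logIdx);
            if (next == logIdx) {
                idlePasses++; // no progress: wait briefly, then retry
                try {
                    Thread.sleep(1);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            } else {
                idlePasses = 0;
            }
            logIdx = next;
        }
        return true;
    }
}
```

A caller that gets `false` back still has to resume activity on the other controllers, which is the step that never happens in the hang described here.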
Is it possible that host1 spins because it's waiting for a transaction to complete that is blocked until resumeActivity is called? i.e., connection is open in autocommit=off mode, statement is executed, host1 tells host2 to suspendActivity so commit statement is blocked, host1 wants transaction to be finished but it'll never get the commit. That would imply that the "stop new connections and wait for existing transactions to finish" I assume happens isn't happening properly, but it's probably something else.
Anyone else seen this? Is this a known issue? Is there a reliable workaround or fix?
-Chris
----
Chris Ekberg
Jackpot Rewards, Inc.
275 Grove Street, Suite 3-120
Newton, MA 02466-2274
617-795-2850, x. 2313
[EMAIL PROTECTED]
www.JackpotRewards.com
**Note that as of Feb. 20, my email address has changed. Please update your contact information for me.
_______________________________________________
Sequoia mailing list
[email protected]
https://forge.continuent.org/mailman/listinfo/sequoia



--
Emmanuel Cecchet - Research scientist
EPFL - LABOS/DSLAB - IN.N 317
Phone: +41-21-693-7558
