[Sequoia] Re: RecoverThread

Christopher Ekberg Fri, 14 Mar 2008 07:22:12 -0700

I found the issue and fixed it, although it wasn't where I expected. I'd like to call attention to the JIRA issue I just created (https://forge.continuent.org/jira/browse/SEQUOIA-1053), in case the bug is biting anyone else as hard as it bit me.


Symptoms:

  Backend gets stuck in an infinite loop trying to recover. We saw this when finishing up a backup and also when disabling/enabling a backend.

  It's possible for the entire VDB to get stuck in a suspended state.

  Depending on whether you've updated to pre-release source code, you may see messages like "Recovery log entry marked as still executing".


Steps to reproduce:
  Two controllers, one VDB. Each controller has one backend.
  Disable backend #2.
  Backend #1 performs a read-only transaction (connect, set readonly, begin, select, commit).
  Backend #1 performs a non-read-only transaction (connect, begin, update, commit).
  Try to enable backend #2.
  Recovery of backend #2 gets stuck spinning forever.

Fix:

Add "isReadOnly = false;" to the reset() function of VirtualDatabaseWorkerThread (see the JIRA issue for an explanation).
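A minimal, hypothetical model of why the missing line matters: a worker object that is reused across client connections carries its read-only flag from the first transaction into the second unless reset() clears it. WorkerModel, execute(), replay() and the table t are invented for illustration and are not Sequoia's VirtualDatabaseWorkerThread; what the stale flag actually breaks inside Sequoia is explained in the JIRA issue, not here.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of a pooled worker reused across client connections.
 * Not Sequoia's VirtualDatabaseWorkerThread; it keeps only the field
 * relevant to the fix.
 */
public class WorkerModel {
    private boolean isReadOnly = false;     // per-connection state that must not leak
    private final boolean fixApplied;       // true = reset() clears isReadOnly
    private final List<String> routed = new ArrayList<>();

    public WorkerModel(boolean fixApplied) {
        this.fixApplied = fixApplied;
    }

    /** Called before the worker is handed to the next client connection. */
    public void reset() {
        if (fixApplied) {
            isReadOnly = false;             // the one-line fix described above
        }
    }

    public void setReadOnly(boolean readOnly) {
        isReadOnly = readOnly;
    }

    /** Records how a statement is classified given the current flag. */
    public void execute(String sql) {
        routed.add((isReadOnly ? "READ-ONLY " : "WRITE ") + sql);
    }

    /** Runs the two transactions from the reproduction steps on one reused worker. */
    public static List<String> replay(boolean fixApplied) {
        WorkerModel w = new WorkerModel(fixApplied);
        // Transaction 1: connect, set readonly, begin, select, commit
        w.setReadOnly(true);
        w.execute("SELECT * FROM t");
        w.reset();                          // connection closed, worker reused
        // Transaction 2: connect, begin, update, commit (readonly never set)
        w.execute("UPDATE t SET x = 1");
        w.reset();
        return w.routed;
    }

    public static void main(String[] args) {
        // prints "without fix: [READ-ONLY SELECT * FROM t, READ-ONLY UPDATE t SET x = 1]"
        System.out.println("without fix: " + replay(false));
        // prints "with fix:    [READ-ONLY SELECT * FROM t, WRITE UPDATE t SET x = 1]"
        System.out.println("with fix:    " + replay(true));
    }
}
```

Without the reset, the second client's UPDATE is classified with the first client's read-only flag, even though that client never asked for read-only mode.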

Depending on the order and timing of transactions, it's possible to get the entire VDB stuck. If those two transactions happen at the right point in the recovery process, backend #2 can be spinning while the VDB is suspended.


Hope this helps someone.
-Chris


On Feb 26, 2008, at 2:28 PM, Christopher Ekberg wrote:

We've been having sporadic problems in RecoverThread, both when finishing up a backup and when bringing up a new node. Setup: Sequoia 2.10.8 embedded, 1 VDB (RAIDb-1), 2 machines (host1, host2), lots of client traffic going on.
Here's an example of what we see happening:
We tell host1 to back up (vdb.backupBackend(...)).
This calls requestManager.backupBackend(...).
This calls requestManager.disableBackendWithCheckpoint.
The backup happens.
requestManager.enableBackendFromCheckpoint(...) is called, which creates a RecoverThread.
RecoverThread has everyone stop (requestManager.suspendActivity()) so host1 can catch up by replaying the recovery log of statements that came in during the backup.
host1 gets stuck in an infinite loop waiting for a task to complete. It never does, so RecoverThread never calls requestManager.resumeActivity() to wake up host2.
host2 is still paused, so clients can't make requests.
host1 is stuck in an infinite loop in RecoverThread.run() here (I've added logging, plus the ability to break out of the loop that I see in unreleased Sequoia code):
    // Play the remaining writes that were pending and which have been logged
    boolean replayedAllLog = false;
    do
    { // Loop until the whole recovery log has been replayed
      // Or stop if the activity is resumed by force
      try
      {
        logger.info("RecoverThread about to replay new recovery log tasks");
        logIdx = recover(logIdx, pendingRecoveryTasks);
        // The status update for the last request (probably a commit/rollback)
        // may not be there yet. Wait for it to be flushed to the log and
        // retry.
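The "ability to break out of the loop" mentioned above can be approximated by bounding the retries and making sure activity is resumed even when recovery gives up. This is a sketch under assumptions, not Sequoia's RecoverThread: the RecoveryLog interface and its two methods, the suspend/resume callbacks, and the retry and sleep values are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of a recovery loop that cannot spin forever. */
public class BoundedRecovery {
    /** Hypothetical view of the recovery log; not Sequoia's RecoveryLog API. */
    public interface RecoveryLog {
        /** Replays entries starting at index and returns the next index to replay. */
        long replayFrom(long index);

        /** True while the last replayed request's status has not been flushed yet. */
        boolean lastEntryStillExecuting();
    }

    /**
     * Replays the log while activity is suspended, retrying at most maxRetries
     * times. resume always runs, even when recovery gives up.
     * Returns true only if the whole log was replayed.
     */
    public static boolean recover(RecoveryLog log, Runnable suspend, Runnable resume,
                                  int maxRetries, long sleepMillis) {
        suspend.run();
        try {
            long idx = 0;
            for (int attempt = 0; attempt <= maxRetries; attempt++) {
                idx = log.replayFrom(idx);
                if (!log.lastEntryStillExecuting()) {
                    return true;                        // whole log replayed
                }
                try {
                    Thread.sleep(sleepMillis);          // wait for the status update to be flushed
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt visible to callers
                    return false;
                }
            }
            return false;                               // give up instead of spinning forever
        } finally {
            resume.run();                               // never leave other controllers suspended
        }
    }

    /** Demo: a log whose last entry never completes, as in the stuck case in this thread. */
    public static boolean demoStuck(AtomicBoolean resumed) {
        RecoveryLog stuck = new RecoveryLog() {
            public long replayFrom(long index) { return index; }
            public boolean lastEntryStillExecuting() { return true; }
        };
        return recover(stuck, () -> { }, () -> resumed.set(true), 3, 10);
    }

    public static void main(String[] args) {
        AtomicBoolean resumed = new AtomicBoolean(false);
        boolean replayedAll = demoStuck(resumed);
        // prints "replayed all: false, resumed: true"
        System.out.println("replayed all: " + replayedAll + ", resumed: " + resumed.get());
    }
}
```

In this sketch, giving up leaves the backend unrecovered (it would have to stay disabled), but the other controllers are no longer held in the suspended state, which is what blocked client requests here.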
Is it possible that host1 spins because it's waiting for a transaction to complete that is blocked until resumeActivity() is called? I.e., a connection is open in autocommit=off mode, a statement is executed, host1 tells host2 to suspendActivity() so the commit statement is blocked, and host1 wants the transaction to finish but will never get the commit. That would imply that the "stop new connections and wait for existing transactions to finish" step I assume happens isn't working properly, but it's probably something else.
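The hypothesis above describes a circular wait: recovery waits for a commit that cannot arrive until recovery resumes activity. A small model of that cycle with java.util.concurrent latches shows that the wait never completes on its own, so only a timeout can detect it. The names are hypothetical, and this models the suspected interaction rather than Sequoia's actual code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of the suspected circular wait; not Sequoia code. */
public class SuspendDeadlockModel {
    /**
     * Returns true if "recovery" sees the commit within timeoutMillis.
     * The client's COMMIT waits for activity to resume, and activity is only
     * resumed after recovery sees the COMMIT, so this always returns false.
     */
    public static boolean recoverySeesCommit(long timeoutMillis) {
        CountDownLatch activityResumed = new CountDownLatch(1); // stands in for resumeActivity()
        CountDownLatch commitLogged = new CountDownLatch(1);    // COMMIT reached the recovery log

        Thread client = new Thread(() -> {
            try {
                activityResumed.await();    // COMMIT is held while the VDB is suspended
                commitLogged.countDown();   // only reachable after activity resumes
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        client.start();

        boolean sawCommit = false;
        try {
            // Recovery, with activity suspended, waits for the open transaction to finish.
            sawCommit = commitLogged.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            activityResumed.countDown();    // resume anyway so the client thread can exit
        }
        try {
            client.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sawCommit;
    }

    public static void main(String[] args) {
        // prints "recovery saw commit: false"
        System.out.println("recovery saw commit: " + recoverySeesCommit(200));
    }
}
```

In the model the cycle is broken only by the timeout. The follow-up at the top of this message reports that the real cause turned out to be the stale isReadOnly flag instead.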
Has anyone else seen this? Is this a known issue? Is there a reliable workaround or fix?
-Chris


----
Chris Ekberg
Jackpot Rewards, Inc.
275 Grove Street, Suite 3-120
Newton, MA 02466-2274
617-795-2850, x. 2313
[EMAIL PROTECTED]
www.JackpotRewards.com

**Note that as of Feb. 20, my email address has changed. Please update your contact information for me.


_______________________________________________
Sequoia mailing list
[email protected]
https://forge.continuent.org/mailman/listinfo/sequoia
