[Sequoia] RecoverThread

Christopher Ekberg Tue, 26 Feb 2008 11:34:31 -0800

We've been having sporadic problems in RecoverThread, both when finishing up a backup and when bringing up a new node. Sequoia 2.10.8 embedded, 1 vdb (RAIDb-1), 2 machines (host1, host2), lots of client traffic going on.


Here's an example of what we see happening:
we tell host1 to back up (vdb.backupBackend(...))
this calls requestManager.backupBackend(...)
this calls requestManager.disableBackendWithCheckpoint
backup happens

requestManager.enableBackendFromCheckpoint(...) is called, which creates a RecoverThread

RecoverThread has everyone stop (requestManager.suspendActivity()) so host1 can catch up by replaying the recovery log of statements that came in during the backup

host1 gets stuck in an infinite loop waiting for a task to complete. It never does, so RecoverThread never calls requestManager.resumeActivity() to wake up host2.


host2 is still paused so clients can't make requests.

host1 is stuck in an infinite loop in RecoverThread run() here (I've added logging, and the ability to break out of the loop that I see in unreleased Sequoia code):

      // Play the remaining writes that were pending and which have been logged
      boolean replayedAllLog = false;
      do
      { // Loop until the whole recovery log has been replayed
        // Or stop if the activity is resumed by force
        try
        {
          logger.info("RecoverThread about to replay new recovery log tasks");
          logIdx = recover(logIdx, pendingRecoveryTasks);
          // The status update for the last request (probably a
          // commit/rollback) may not be there yet. Wait for it to be
          // flushed to the log and retry.
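
For illustration only, here is a minimal sketch of the kind of guard described above: a replay loop that gives up when it is forced to resume or when the log index stops advancing, and that always resumes activity on the way out. This is not Sequoia's code. The names BoundedReplaySketch, Activity, recoverStep, maxStalledRounds and forceResume are all hypothetical stand-ins, and the "no progress" check is an assumption about how a stall could be detected.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.LongUnaryOperator;

final class BoundedReplaySketch {
    /** Hypothetical stand-in for the suspendActivity()/resumeActivity() pair. */
    interface Activity {
        void suspend();
        void resume();
    }

    /**
     * Replays from startIdx until targetIdx is reached.
     * recoverStep takes the current log index and returns the index reached
     * after one replay pass (hypothetical stand-in for recover(...)).
     * Returns true if the whole log was replayed, false if the loop gave up.
     */
    static boolean replay(LongUnaryOperator recoverStep,
                          long startIdx,
                          long targetIdx,
                          int maxStalledRounds,
                          AtomicBoolean forceResume,
                          Activity activity) {
        activity.suspend();
        try {
            long logIdx = startIdx;
            int stalled = 0;
            while (logIdx < targetIdx) {
                // Stop if someone forced a resume or replay made no progress
                // for too many consecutive rounds.
                if (forceResume.get() || stalled >= maxStalledRounds) {
                    return false;
                }
                long next = recoverStep.applyAsLong(logIdx);
                stalled = (next == logIdx) ? stalled + 1 : 0;
                logIdx = next;
            }
            return true;
        } finally {
            // Always wake the other controllers, even when replay failed.
            activity.resume();
        }
    }
}
```

The point of the finally block is that the other node is released whether or not replay succeeds, so a stuck recovery degrades into a failed enable instead of a cluster-wide pause.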

Is it possible that host1 spins because it's waiting for a transaction to complete that is blocked until resumeActivity is called? i.e., a connection is open in autocommit=off mode, a statement is executed, host1 tells host2 to suspendActivity so the commit statement is blocked, and host1 wants the transaction to be finished but it'll never get the commit. That would imply that the "stop new connections and wait for existing transactions to finish" step that I assume happens isn't happening properly, but it's probably something else.
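
The suspected cycle can be modelled in a few lines, independent of Sequoia. The sketch below is a toy model of the hypothesis only, not a reproduction of the bug: the latch names and the SuspendDeadlockSketch class are hypothetical, and a timeout stands in for the recovery loop's retry so that the example terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class SuspendDeadlockSketch {
    /**
     * Returns two flags: whether the commit was seen while activity was
     * still suspended, and whether it was seen after activity resumed.
     */
    static boolean[] simulate(long waitMillis) {
        // Opened when activity is resumed (models resumeActivity()).
        CountDownLatch resumed = new CountDownLatch(1);
        // Opened when the client's commit reaches the recovery log.
        CountDownLatch commitLogged = new CountDownLatch(1);

        Thread client = new Thread(() -> {
            try {
                resumed.await();          // commit is held back while suspended
                commitLogged.countDown(); // only now is the commit logged
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        client.setDaemon(true);
        client.start();

        try {
            // Recovery side: wait for the commit while activity is suspended.
            boolean whileSuspended =
                commitLogged.await(waitMillis, TimeUnit.MILLISECONDS);
            // Break the cycle by resuming, then wait for the commit again.
            resumed.countDown();
            boolean afterResume = commitLogged.await(5, TimeUnit.SECONDS);
            client.join(5000);
            return new boolean[] {whileSuspended, afterResume};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while simulating", e);
        }
    }
}
```

In this model the first wait can never succeed, because the only thing that would satisfy it is gated on the resume that the waiter itself is withholding. If the real code has the same shape, no amount of retrying inside the loop would help.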

Anyone else seen this? Is this a known issue? Is there a reliable workaround or fix?


-Chris


