Re: Mahalo and Txn Spec question

Robert Resendes Tue, 10 Jul 2007 07:03:24 -0700

Mark Brouwer wrote:

Robert Resendes wrote:

Mark Brouwer wrote:


However when I look into the code of Mahalo I can see when rejoining a
transaction with a different crash count that CrashCountException is
thrown, but I can't see where Mahalo forces the transaction to abort,
which based on my interpretation of the spec seems to be required. Is my
interpretation wrong or is there a bug in Mahalo (or the code I seem to
miss).

I think your interpretation and analysis are correct from a quick look
at the code. Do you want to file the bug/issue or should I?


Hi Robert,

I'm happy if you file this issue in JIRA.

OK.

Also I assume that when a transaction manager service drives the
transaction to abort a transaction manager should skip the transaction
participant for which the rejoin failed due to the CrashCountException?

The participant should assume an "abort" upon receiving
CrashCountException (although I don't see that explicitly stated
anywhere). Even if it doesn't receive the exception (e.g. due to
network issues), it can check (via getState()) on the status of any
outstanding transactions it's managing. So, in either case, the
participant should be able to figure out that it needs to drop out of
the transaction.

That said, trying to call abort() on the "inconsistent" participant
might short circuit the getState() call (above). So, for that case, it
could be seen as an optimization if you do make the abort call. [I'm
assuming that releasing resources (early) for a transaction is worth
the extra cost of a remote call, here, and that this scenario is not
the norm within your system.]
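A rough participant-side sketch of the two paths described above, in case it helps. Everything here is a hypothetical stand-in (the `Manager` interface, the `State` enum, the local `CrashCountException`, and `shouldDropTransaction`); it is not the real net.jini.core.transaction.server API and not Mahalo code, and the real getState() can also throw UnknownTransactionException, which is left out here.

```java
import java.rmi.RemoteException;

// Hypothetical stand-ins for illustration only; not the Jini/Mahalo types.
class RejoinSketch {
    enum State { ACTIVE, ABORTED }

    static class CrashCountException extends Exception {}

    // Assumed, simplified view of the transaction manager seen by a participant.
    interface Manager {
        void join(long txnId, long crashCount)
                throws CrashCountException, RemoteException;
        State getState(long txnId) throws RemoteException;
    }

    /**
     * Returns true if the participant can conclude that the transaction is
     * aborted and should drop its local state for it.
     */
    static boolean shouldDropTransaction(Manager mgr, long txnId, long crashCount) {
        try {
            mgr.join(txnId, crashCount);
            return false; // rejoin accepted, still part of the transaction
        } catch (CrashCountException e) {
            return true; // treat the exception itself as an implicit abort
        } catch (RemoteException e) {
            // The reply may have been lost, so ask for the transaction state.
            try {
                return mgr.getState(txnId) == State.ABORTED;
            } catch (RemoteException e2) {
                return false; // outcome unknown for now; check again later
            }
        }
    }
}
```

The last branch only reports "unknown"; how often and how long to keep asking is a separate question that this sketch does not try to answer.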
I doubt the above is completely clear to me so in my own words and
assuming Mahalo as implementation. Say we have 2 participants A & B that
joined transaction T. A crashes and recovers and it is aware of T having
joined although it lost its state related to the transaction and
therefore wants to notify the transaction manager service with a
different crash count that it must abort the transaction.

So A 'rejoins' the transaction manager service with a different crash
count. The transaction manager service throws CrashCountException and
(Mahalo doesn't do it due to a bug) should move the transaction into the
aborted state. At that point it will call abort on transaction
participant B, it skips the registered transaction participant A,
because that one is inconsistent (and the call is likely to fail) and
transaction participant A can infer from the CrashCountException that
the transaction has been aborted. If the exception gets lost due to a
RemoteException it can find out later with a call to getState().
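To make the A/B scenario concrete, here is a minimal manager-side sketch of the behavior described above: a rejoin with a different crash count moves the transaction to aborted, calls abort on the other participants, skips the inconsistent one, and then throws. All names (`CrashCountSketch`, `Manager`, `Participant`, `State`, `abortSkipping`) are hypothetical stand-ins, not the real Jini interfaces, and this is not how Mahalo is actually structured.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins; NOT the real net.jini.core.transaction.server types.
class CrashCountSketch {
    enum State { ACTIVE, ABORTED }

    static class CrashCountException extends Exception {}

    interface Participant {
        void abort(long txnId);
    }

    // Assumed manager for a single transaction T.
    static class Manager {
        private final long txnId;
        // Crash count recorded for each participant at its first join.
        private final Map<Participant, Long> crashCounts = new LinkedHashMap<>();
        private State state = State.ACTIVE;

        Manager(long txnId) { this.txnId = txnId; }

        State getState() { return state; }

        void join(Participant p, long crashCount) throws CrashCountException {
            if (state != State.ACTIVE) {
                throw new IllegalStateException("transaction is no longer active");
            }
            Long known = crashCounts.get(p);
            if (known != null && known != crashCount) {
                // Rejoin with a different crash count: abort T, then signal it.
                abortSkipping(p);
                throw new CrashCountException();
            }
            crashCounts.put(p, crashCount);
        }

        // Abort every registered participant except the inconsistent one.
        private void abortSkipping(Participant inconsistent) {
            state = State.ABORTED;
            for (Participant p : crashCounts.keySet()) {
                if (p != inconsistent) {
                    p.abort(txnId);
                }
            }
        }
    }
}
```

With A and B both joined at crash count 0, a later join by A with crash count 1 leaves the transaction in the ABORTED state, calls abort on B only, and throws CrashCountException back to A.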

I think that's the same idea I was trying to get across.


Another question I have with regard to Mahalo is what happens if it
receives an indefinite RemoteException (such as ConnectException). I
found some language with regard to retries (5?) but I couldn't find the
retry logic, intervals between retries, etc. Can you help me out here
Robert what happens?
--

For this specific case (i.e. abort on the participant) or in general? If
it's the latter, then it will be a long discussion.

Bob
