Mark Brouwer wrote:
Robert Resendes wrote:
Mark Brouwer wrote:

However, when I look into the Mahalo code I can see that rejoining a
transaction with a different crash count causes CrashCountException to
be thrown, but I can't see where Mahalo forces the transaction to abort,
which, based on my interpretation of the spec, seems to be required. Is
my interpretation wrong, or is there a bug in Mahalo (or am I missing
some code)?
I think your interpretation and analysis are correct from a quick look at the code. Do you want to file the bug/issue or should I?

Hi Robert,

I'm happy if you file this issue in JIRA.
OK.


Also, I assume that when a transaction manager service drives the
transaction to abort, it should skip the transaction participant whose
rejoin failed with the CrashCountException?

The participant should assume an "abort" upon receiving CrashCountException (although I don't see that explicitly stated anywhere). Even if it doesn't receive the exception (e.g. due to network issues), it can check (via getState()) on the status of any outstanding transactions it's managing. So, in either case, the participant should be able to figure out that it needs to drop out of the transaction.

That said, trying to call abort() on the "inconsistent" participant might short circuit the getState() call (above). So, for that case, it could be seen as an optimization if you do make the abort call. [I'm assuming that releasing resources (early) for a transaction is worth the extra cost of a remote call, here, and that this scenario is not the norm within your system.]
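
The getState() fallback described above can be sketched as follows. This is a minimal illustration, not Mahalo or Jini code: StateLookup, State and dropAborted are hypothetical names, and only the "aborted" outcome is modelled. A lookup that fails (for example with a remote error) leaves the outcome unknown, so the sketch keeps the transaction and leaves it to be checked again later.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative stand-ins only; these are not the Jini or Mahalo classes. */
public class ParticipantRecoverySketch {

    enum State { ACTIVE, ABORTED, COMMITTED }

    /** Hypothetical view of the manager: only the state lookup is modelled. */
    interface StateLookup {
        State getState(long txnId) throws Exception; // may fail, e.g. network trouble
    }

    /**
     * Returns the ids the participant should keep tracking. A transaction the
     * manager reports as ABORTED is dropped; anything else, including a failed
     * lookup, is kept so it can be checked again later.
     */
    static Set<Long> dropAborted(Set<Long> outstanding, StateLookup manager) {
        Set<Long> stillOpen = new HashSet<>();
        for (long id : outstanding) {
            try {
                if (manager.getState(id) != State.ABORTED) {
                    stillOpen.add(id);
                }
            } catch (Exception e) {
                // Outcome unknown: do not guess, keep the transaction for a later check.
                stillOpen.add(id);
            }
        }
        return stillOpen;
    }
}
```

A participant would run something like dropAborted after recovery, or periodically, and release the resources held for every id that is no longer in the returned set.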

I'm not sure the above is completely clear to me, so here it is in my
own words, assuming Mahalo as the implementation. Say we have two
participants, A and B, that joined transaction T. A crashes and
recovers. It knows it had joined T, but it lost its state related to the
transaction, so it wants to tell the transaction manager service, by
rejoining with a different crash count, that the transaction must be
aborted.

So A 'rejoins' the transaction manager service with a different crash
count. The transaction manager service throws CrashCountException and
should move the transaction into the aborted state (Mahalo doesn't do
this, due to a bug). At that point it calls abort on transaction
participant B and skips the registered transaction participant A,
because that one is inconsistent (and the call would likely fail).
Transaction participant A can infer from the CrashCountException that
the transaction has been aborted. If the exception gets lost due to a
RemoteException, A can find out later with a call to getState().
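
The A/B scenario above can be written down as a small in-memory sketch. It is not Mahalo's code: Manager, Participant, State and the local CrashCountException are hypothetical stand-ins, and the abort-on-mismatch step is the behaviour this thread argues the spec requires, not what Mahalo currently does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative stand-ins only; these are not the Jini or Mahalo classes. */
public class CrashCountAbortSketch {

    enum State { ACTIVE, ABORTED }

    /** Hypothetical checked exception playing the role of CrashCountException. */
    static class CrashCountException extends Exception {
        CrashCountException(String message) { super(message); }
    }

    /** Participant that simply records the abort calls it receives. */
    static class Participant {
        final String name;
        final List<Long> abortCalls = new ArrayList<>();
        Participant(String name) { this.name = name; }
        void abort(long txnId) { abortCalls.add(txnId); }
    }

    static class Manager {
        private long nextId = 1;
        private final Map<Long, State> states = new HashMap<>();
        // Per transaction: participant -> crash count it first joined with.
        private final Map<Long, Map<Participant, Long>> joined = new HashMap<>();

        long create() {
            long id = nextId++;
            states.put(id, State.ACTIVE);
            joined.put(id, new LinkedHashMap<>());
            return id;
        }

        void join(long txnId, Participant p, long crashCount) throws CrashCountException {
            Map<Participant, Long> parts = joined.get(txnId);
            if (parts == null) {
                throw new IllegalArgumentException("unknown transaction " + txnId);
            }
            Long known = parts.get(p);
            if (known == null) {
                parts.put(p, crashCount); // first join: remember the crash count
                return;
            }
            if (known.longValue() != crashCount) {
                // Assumed behaviour: a changed crash count aborts the transaction.
                states.put(txnId, State.ABORTED);
                for (Participant other : parts.keySet()) {
                    if (other != p) {
                        other.abort(txnId); // the inconsistent participant is skipped
                    }
                }
                throw new CrashCountException(p.name + " rejoined with crash count "
                        + crashCount + ", expected " + known);
            }
        }

        State getState(long txnId) { return states.get(txnId); }
    }

    public static void main(String[] args) {
        Manager mgr = new Manager();
        Participant a = new Participant("A");
        Participant b = new Participant("B");
        long t = mgr.create();
        try {
            mgr.join(t, a, 1L);
            mgr.join(t, b, 1L);
            mgr.join(t, a, 2L); // A has recovered and rejoins with a new crash count
        } catch (CrashCountException e) {
            System.out.println("A treats T as aborted: " + e.getMessage());
        }
        System.out.println("state=" + mgr.getState(t)
                + ", abort calls on B=" + b.abortCalls
                + ", abort calls on A=" + a.abortCalls);
    }
}
```

In this sketch the manager never calls abort on A; A learns the outcome from the exception, or from getState() if the exception never arrives.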

I think that's the same idea I was trying to get across.


Another question I have with regard to Mahalo is what happens if it
receives an indefinite RemoteException (such as ConnectException). I
found some language about retries (5?), but I couldn't find the retry
logic, the intervals between retries, etc. Robert, can you help me out
here: what happens in that case?
--
For this specific case (i.e. abort on the participant) or in general? If it's the latter, then it will be a long discussion.

Bob

