Alex,

Are you sure that this is a bug?

Take the case of three servers A, B and C with A being leader.

If transactions 1, 2 and 3 are committed, then a majority of the nodes,
including at least A, must have seen these transactions.  Moreover,
transactions cannot be committed on a node unless all previous transactions
have been seen on that node as well.  Thus, by symmetry, we need only
consider two cases: either B alone (besides A) committed these transactions,
or both B and C committed them.  Only the first case is problematic.
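
To make the case split concrete, here is a rough sketch in Python.  It is
not ZooKeeper code; the names (SERVERS, is_quorum, possible_ack_sets) are
made up for illustration, and it assumes only that a commit needs a strict
majority that includes the leader.

```python
from itertools import combinations

SERVERS = ["A", "B", "C"]   # hypothetical three-node ensemble
LEADER = "A"

def is_quorum(nodes, ensemble=SERVERS):
    # A quorum is a strict majority of the ensemble.
    return len(nodes) > len(ensemble) // 2

def possible_ack_sets(leader=LEADER, ensemble=SERVERS):
    # Every set of servers that contains the leader and forms a quorum.
    others = [s for s in ensemble if s != leader]
    result = []
    for k in range(len(others) + 1):
        for combo in combinations(others, k):
            nodes = frozenset((leader,) + combo)
            if is_quorum(nodes, ensemble):
                result.append(nodes)
    return result

ack_sets = possible_ack_sets()
```

This yields {A,B}, {A,C} and {A,B,C}.  The first two are the same case with
B and C swapped, which is the symmetry argument above.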

Now, assume further that transaction 4 has arrived at B and been forwarded
to A, but neither B nor C has committed it.

The situation now is that in this first epoch, A has seen 1-4, B has seen
1-3 and C has seen nothing.  At least two nodes know the current epoch
because we obviously have a quorum and we know that B knows the current
epoch because it has seen transactions from this epoch.  Thus the collection
of machines that know the current epoch can be A+B or A+B+C.

If all three nodes now die simultaneously and B and C come back up, the
question is what will happen.  We know that the two nodes will agree on the
epoch because at least B has the last epoch.  Node B will be elected leader
because it has seen later transactions than C.  C will now get the
transactions and we have a quorum in a new epoch.
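
Here is a small Python sketch of the election step as I am describing it.
It is an illustration, not the actual election code: Vote, elect and the
field names are hypothetical, and it assumes votes are ordered by epoch
first and by the last transaction *seen* second.

```python
from typing import NamedTuple

class Vote(NamedTuple):
    epoch: int       # last epoch this server knows about
    last_seen: int   # highest transaction id in its log (0 = empty log)
    server_id: str   # used only to break ties

def elect(votes):
    # NamedTuples compare field by field, so this picks the highest epoch,
    # then the highest transaction seen, then the largest server id.
    return max(votes)

def election_outcomes():
    # C may or may not have learned the first epoch before the crash.
    outcomes = []
    for c_epoch in (0, 1):
        b = Vote(epoch=1, last_seen=3, server_id="B")
        c = Vote(epoch=c_epoch, last_seen=0, server_id="C")
        outcomes.append(elect([b, c]).server_id)
    return outcomes

logs = {"B": [1, 2, 3], "C": []}
leader = elect([Vote(1, 3, "B"), Vote(1, 0, "C")]).server_id
logs["C"] = list(logs[leader])   # C catches up from the new leader
new_epoch = 2                    # B and C now form a quorum in a new epoch
```

Under that assumption B wins whether or not C knew the first epoch, and C
ends up with transactions 1-3.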

If A returns at this point, it will know about transactions 1, 2, 3 and 4.
Further, it will know that 1, 2, and 3 have been committed in the first
epoch and that 4 was proposed, but never committed.  As it joins, it will
find that a new epoch has started and will recognize B as leader.  B will
tell it to truncate the log by deleting 4, but 4 was never committed anyway.
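
The rejoin step can be sketched the same way.  Again this is only an
illustration under the assumptions above; truncate_to_leader and the
variable names are hypothetical, and a real follower would also fetch any
entries it is missing.

```python
def truncate_to_leader(follower_log, leader_log):
    # Keep the longest common prefix with the leader's log and
    # return (kept entries, dropped entries).
    n = 0
    for mine, theirs in zip(follower_log, leader_log):
        if mine != theirs:
            break
        n += 1
    return follower_log[:n], follower_log[n:]

a_log = [1, 2, 3, 4]   # A saw 1-4 in the first epoch
a_committed = 3        # A knows 1-3 were committed, 4 was only proposed
b_log = [1, 2, 3]      # B is leader in the new epoch

kept, dropped = truncate_to_leader(a_log, b_log)

# Any dropped transaction that A knew to be committed would be a real loss.
lost_commits = [t for t in dropped if t <= a_committed]
```

Here kept is 1-3, dropped is just 4, and lost_commits is empty, so nothing
committed is lost in this scenario.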

Where is the problem?

On Thu, Jul 21, 2011 at 1:11 PM, Alexander Shraer <[email protected]> wrote:

> The problem is in leader election - if the server doesn't reboot before
> running leader election (the usual case)  then only the transactions for
> which it received a commit count and it might not be elected leader, even if
> it has seen more transactions than the others. This may lead to transactions
> being dropped.
>
> I opened a JIRA for this.
>
