Hi,

In your case only A and E have committed the latest transaction; let's call 
it txid=1000. B, C, and D are down at this time and don't have the changes of 
txid=1000. 
Also, when B, C, and D are restarted, A and E are not available. The newly 
elected Leader has seen at most txid=999, so when A and E rejoin the quorum 
they will 'truncate' their logs by deleting txid=1000. As you said, the write 
operation will be lost in this case.
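
As a rough illustration of the sequence above, here is a toy Python model. It 
is not ZooKeeper code: the names last_txid, elect_leader and sync_with_leader 
are made up for this sketch, the election rule is simplified to "highest txid 
wins", and it assumes, as in your scenario, that B, C and D never logged 
txid=1000.

```python
# Toy model of the scenario in this thread; not ZooKeeper's implementation.
# Assumption: B, C and D never logged txid=1000 before going down.
last_txid = {"A": 1000, "B": 999, "C": 999, "D": 999, "E": 1000}


def elect_leader(alive):
    # Simplified rule: the live server with the highest txid wins.
    # Ties are resolved by name order here, purely for determinism.
    return max(sorted(alive), key=lambda s: last_txid[s])


def sync_with_leader(server, leader):
    # A rejoining server that is ahead of the leader truncates its log
    # back to the leader's latest txid.
    if last_txid[server] > last_txid[leader]:
        last_txid[server] = last_txid[leader]


# B, C and D restart while A and E are unavailable.
leader = elect_leader({"B", "C", "D"})

# A and E rejoin later and sync with the new leader.
for s in ("A", "E"):
    sync_with_leader(s, leader)

print(leader, last_txid)
```

After the run, every server in this model is at txid=999, which is the lost 
write you describe.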

I can see this is a tricky case of double or multiple failures, but I agree 
it can happen. 
My point is that a user who wants to maintain a reliable cluster should keep 
in mind that more failures than the tolerated number may lead to unexpected 
results like this.
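
To put a number on "the tolerated number of failures": with majority quorums, 
an ensemble of n servers needs more than half of them up. A small sketch of 
that arithmetic follows; the function names are just for illustration.

```python
def quorum_size(n):
    # Smallest number of servers that forms a strict majority of n.
    return n // 2 + 1


def tolerated_failures(n):
    # Servers that can be down while a majority is still available.
    return n - quorum_size(n)


for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

For the 5-server ensemble in this thread the quorum is 3 and the tolerated 
number of failures is 2, so losing B, C and D together is already beyond it.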


Best Regards,
Rakesh
-----Original Message-----
From: bit1...@163.com [mailto:bit1...@163.com] 
Sent: 05 January 2015 15:56
To: user@zookeeper.apache.org
Subject: Re: Question about the two-phrase commit

Could someone help with this question? Thanks.



bit1...@163.com
 
From: bit1...@163.com
Date: 2015-01-05 15:05
To: user@zookeeper.apache.org
Subject: Question about the two-phrase commit

Hi ZooKeepers,

I have a question about the two-phase commit in ZooKeeper. When a write 
operation happens:

1. The Leader proposes the change to all the followers (proposal/vote phase).
2. Followers ack the proposal and write the change to disk (but not 
   persisted yet?).
3. When the Leader receives acks from a majority of followers, the Leader 
   asks the followers to commit the change.
4. When each follower receives the commit request, it commits the change 
   (persisting the change forever?).

In the above process, something rare could happen:

a. Say there are 5 nodes in the quorum (1 leader E, 4 followers A, B, C, D).
b. The write operation is issued by a client that connects to follower A.
c. A commits the change and responds to the client that the write succeeded.
d. Assume that by the time the response from A gets back to the client, 
   telling it that the write was successful, the other followers (B, C, D) 
   haven't even received the commit request, and B, C, D go down without 
   getting a chance to commit the change.


Then shut down A and E. 
Restart B, C, D, making sure that they elect a leader, and start A later (A's 
latest transactions will be lost, because A will sync with the Leader).

When this is done, is the write operation performed earlier lost?

Is there anything I am missing in the above process? Thanks.





bit1...@163.com
