jiajunwang opened a new issue #1027:
URL: https://github.com/apache/helix/issues/1027


   There are two known cases that can temporarily leave a partition with two masters:
   - An unclean Helix controller leadership switch, which leads to two pipelines running in parallel.
   - Long ZooKeeper (ZK) propagation latency, which causes the controller leader to read out-of-date information. A rebalance computed from that stale view may then assign two masters.
   
   The rebalancer eventually removes the extra master, but its default choice of which master to demote is arbitrary, so the oldest master may be the one left over. Many applications require the master to hold the latest data, so this causes serious application issues.
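   To illustrate what a non-arbitrary recovery could look like, here is a minimal Java sketch (not Helix code) of a deterministic rule for a partition that ends up with two masters: keep the replica that was promoted to MASTER most recently and demote the others. It assumes the promotion time of each current master is known (the map argument and the class name `MasterReconciler` are hypothetical), and that the most recently promoted master holds the latest data, which the application itself must guarantee.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper, not part of the Helix API.
 * Input: instance name -> time (ms) at which that instance became MASTER
 * for one partition. Assumption: a later promotion implies newer data.
 */
final class MasterReconciler {
    private MasterReconciler() {}

    /** Returns the master to keep: latest promotion wins; ties go to the smaller name. */
    static String pickMasterToKeep(Map<String, Long> masterPromotionTimes) {
        if (masterPromotionTimes.isEmpty()) {
            throw new IllegalArgumentException("no masters to choose from");
        }
        String keep = null;
        long keepTime = Long.MIN_VALUE;
        for (Map.Entry<String, Long> e : masterPromotionTimes.entrySet()) {
            String instance = e.getKey();
            long t = e.getValue();
            // Deterministic tie-break so two controllers would make the same choice.
            if (keep == null || t > keepTime || (t == keepTime && instance.compareTo(keep) < 0)) {
                keep = instance;
                keepTime = t;
            }
        }
        return keep;
    }

    /** Returns every other master, sorted by name; these would be demoted. */
    static List<String> instancesToDemote(Map<String, Long> masterPromotionTimes) {
        String keep = pickMasterToKeep(masterPromotionTimes);
        List<String> demote = new ArrayList<>();
        for (String instance : masterPromotionTimes.keySet()) {
            if (!instance.equals(keep)) {
                demote.add(instance);
            }
        }
        Collections.sort(demote);
        return demote;
    }

    public static void main(String[] args) {
        Map<String, Long> masters = new HashMap<>();
        masters.put("host_1", 1_000L); // older master, e.g. left over from a stale pipeline
        masters.put("host_2", 5_000L); // promoted later, assumed to hold the latest data
        System.out.println("keep=" + pickMasterToKeep(masters)
            + " demote=" + instancesToDemote(masters));
    }
}
```

   Running `main` prints `keep=host_2 demote=[host_1]`. The tie-break on instance name is there only to make the choice deterministic, so that two controllers running in parallel would at least agree on which master survives.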
   
   We are trying to address three issues here:
   1. Unclean controller leadership switches.
   2. Out-of-date data reads caused by long ZK propagation latency.
   3. Graceful recovery from these abnormal states (given that other causes may also lead to them).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.