jiajunwang opened a new issue #1027: URL: https://github.com/apache/helix/issues/1027
There are 2 known cases that may lead a partition to have 2 masters temporarily:

- An unclean Helix controller leader switch, which leads to 2 pipelines running in parallel.
- Long ZK propagation latency, which causes the controller leader to see out-of-date information. The rebalance result computed from that information may then lead to 2 masters.

The rebalancer will eventually fix the additional master, but the default operations are arbitrary, so the oldest master may be the one left over. Many applications require the master to have the latest data, so this has caused serious application issues.

We are trying to address 3 issues here:

1. Unclean controller leadership switch.
2. Out-of-date data reads caused by long ZK propagation latency.
3. Graceful recovery from the abnormal states (given that there are more possible causes that lead to the problematic states).
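To make the third item concrete, here is a minimal sketch of a deterministic rule for deciding which master to keep when a partition reports more than one. It is not Helix code and not the proposed fix: the `Replica` type, the `promoted_at` field, and the "most recently promoted master wins" rule are all assumptions made for illustration, and an application may need a different criterion for which replica holds the latest data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    instance: str       # hypothetical instance name
    state: str          # "MASTER", "SLAVE", or "OFFLINE"
    promoted_at: float  # hypothetical: time this replica last became MASTER


def resolve_extra_masters(replicas):
    """Return (instance to keep as master, sorted instances to demote).

    Assumed policy: keep the most recently promoted master. The instance
    name breaks ties so the choice is never arbitrary.
    """
    masters = [r for r in replicas if r.state == "MASTER"]
    if len(masters) <= 1:
        # Zero or one master: nothing to recover.
        return (masters[0].instance if masters else None), []
    keeper = max(masters, key=lambda r: (r.promoted_at, r.instance))
    to_demote = sorted(r.instance for r in masters if r.instance != keeper.instance)
    return keeper.instance, to_demote
```

The point of the sketch is only that the choice is explicit and repeatable: two controllers evaluating the same state would pick the same master to keep, instead of leaving the outcome to the order in which the default operations happen to run.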
