xyuanlu commented on code in PR #2772:
URL: https://github.com/apache/helix/pull/2772#discussion_r1513583433
##########
helix-core/src/main/java/org/apache/helix/controller/stages/BestPossibleStateCalcStage.java:
##########
@@ -327,21 +318,28 @@ public Object call() {
// if yes, auto enable maintenance mode, and use the maintenance rebalancer
for this pipeline.
private boolean validateOfflineInstancesLimit(final
ResourceControllerDataProvider cache,
final HelixManager manager) {
- int maxOfflineInstancesAllowed =
cache.getClusterConfig().getMaxOfflineInstancesAllowed();
- if (maxOfflineInstancesAllowed >= 0) {
- int offlineCount =
- cache.getAssignableInstances().size() -
cache.getAssignableEnabledLiveInstances().size();
- if (offlineCount > maxOfflineInstancesAllowed) {
+ int maxInstancesUnableToAcceptOnlineReplicas =
+ cache.getClusterConfig().getMaxOfflineInstancesAllowed();
+ if (maxInstancesUnableToAcceptOnlineReplicas >= 0) {
+ // Instead of only checking the offline instances, we consider how many
instances in the cluster
+ // are not assignable and live. This is because some instances may be
online but have an unassignable
+ // InstanceOperation such as EVACUATE, DISABLE, or UNKNOWN. We will
exclude SWAP_IN instances from
Review Comment:
We may add capacity with UNKNOWN instance operation. Probably consider not
include UNKNOWN as down instances when auto EMM.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]