xyuanlu commented on code in PR #3043: URL: https://github.com/apache/helix/pull/3043#discussion_r2153564934
########## helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java: ########## @@ -290,7 +327,151 @@ private void generateMessage(final Resource resource, final BaseControllerDataPr } // end of for-each-partition } - private boolean shouldCreateSTCancellation(Message pendingMessage, String desiredState, + /** + * Calculate the current active replica count based on state model type. + * This method determines how many replicas are currently serving traffic for a partition by + * analyzing the current state distribution and applying state model-specific rules. The count includes + * replicas in top states, secondary top states (where applicable), and ERROR states since helix considers + * them active. + * State model handling: + * - Single-top state models: Differentiates between patterns with and without secondary top + * states + * - ONLINE-OFFLINE: Counts ONLINE + ERROR states only + * - MASTER-SLAVE-OFFLINE: Counts MASTER + SLAVE + ERROR states + * - ONLINE-STANDBY-OFFLINE: Counts ONLINE + STANDBY + ERROR states + * - Multi-top state models: Counts only top states + ERROR states + * @param currentStateMap Map of instance name to current state for this partition, representing + * the actual state of each replica before any pending transitions + * @param stateModelDef State model definition containing the state hierarchy and transition rules + * used to determine which states are considered active + * @return The number of replicas currently in active states, used to determine the + * currentActiveReplicaNumber for the partition. + */ + private int calculateCurrentActiveReplicaCount(Map<String, String> currentStateMap, + StateModelDefinition stateModelDef) { + if (stateModelDef.isSingleTopStateModel()) { + return calculateSingleTopStateActiveCount(currentStateMap, stateModelDef); Review Comment: I think it is too complex here. All we need is, get the secondary top state count, get top state, include error, exclude dropped and offline. ########## helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java: ########## @@ -290,7 +327,151 @@ private void generateMessage(final Resource resource, final BaseControllerDataPr } // end of for-each-partition } - private boolean shouldCreateSTCancellation(Message pendingMessage, String desiredState, + /** + * Calculate the current active replica count based on state model type. + * This method determines how many replicas are currently serving traffic for a partition by + * analyzing the current state distribution and applying state model-specific rules. The count includes + * replicas in top states, secondary top states (where applicable), and ERROR states since helix considers + * them active. + * State model handling: + * - Single-top state models: Differentiates between patterns with and without secondary top + * states + * - ONLINE-OFFLINE: Counts ONLINE + ERROR states only + * - MASTER-SLAVE-OFFLINE: Counts MASTER + SLAVE + ERROR states + * - ONLINE-STANDBY-OFFLINE: Counts ONLINE + STANDBY + ERROR states + * - Multi-top state models: Counts only top states + ERROR states + * @param currentStateMap Map of instance name to current state for this partition, representing + * the actual state of each replica before any pending transitions + * @param stateModelDef State model definition containing the state hierarchy and transition rules + * used to determine which states are considered active + * @return The number of replicas currently in active states, used to determine the + * currentActiveReplicaNumber for the partition. + */ + private int calculateCurrentActiveReplicaCount(Map<String, String> currentStateMap, + StateModelDefinition stateModelDef) { + if (stateModelDef.isSingleTopStateModel()) { + return calculateSingleTopStateActiveCount(currentStateMap, stateModelDef); + } else { + return calculateMultiTopStateActiveCount(currentStateMap, stateModelDef); + } + } + + /** + * Calculate active replica count for single-top state models. + * Single-top state models have different active state definitions: + * - ONLINE-OFFLINE: Only ONLINE (top state) + ERROR are active + * - ONLINE-STANDBY-OFFLINE: ONLINE (top state) + STANDBY (secondary top) + ERROR are active + * Note: We need to identify true secondary states (like STANDBY) vs transition-only + * states (like OFFLINE). + */ + private int calculateSingleTopStateActiveCount(Map<String, String> currentStateMap, + StateModelDefinition stateModelDef) { + List<String> trueSecondaryTopStates = getTrueSecondaryTopStates(stateModelDef); + if (trueSecondaryTopStates.isEmpty()) { + // No true secondary states exist (e.g., ONLINE-OFFLINE pattern) + // Count: top + ERROR states only + // Example: OnlineOffline has getSecondTopStates()=["OFFLINE"] but OFFLINE is non-serving state + // so trueSecondaryTopStates=[] and we only count ONLINE + ERROR + return (int) currentStateMap.values().stream() + .filter(state -> stateModelDef.getTopState().contains(state) + || HelixDefinedState.ERROR.name().equals(state)) + .count(); + } else { + // True secondary states exist (e.g., MASTER-SLAVE, ONLINE-STANDBY, LEADER-STANDBY, OnlineOfflineWithBootstrap) + // Count: top + true secondary top + ERROR states + // Example for MasterSlave: trueSecondaryTopStates=["SLAVE"] + // Example for OnlineOfflineWithBootstrap: trueSecondaryTopStates=["BOOTSTRAP"] + return (int) currentStateMap.values().stream() + .filter(state -> stateModelDef.getTopState().contains(state) + || trueSecondaryTopStates.contains(state) + || HelixDefinedState.ERROR.name().equals(state)) + .count(); + } + } + + /** + * Get true secondary top states - states that: + * 1. Are not the top state itself (avoid double-counting) + * 2. Are not non-serving states like OFFLINE and DROPPED. + * Reasons for elimination: + * - getSecondTopStates() can include the top state itself in some state models. + * Example - OnlineOfflineWithBootstrap: + * topState="ONLINE", getSecondTopStates()=["ONLINE", "BOOTSTRAP"] + * After filtering: trueSecondaryTopStates=["BOOTSTRAP"] (removes "ONLINE" as it is top state.) + * - getSecondTopStates() can also include OFFLINE as a secondary top state in some state models. + * Example - OnlineOffline: + * getSecondTopStates() = ["OFFLINE"] as it transitions to ONLINE. + * After filtering: trueSecondaryTopStates=[] (removes "OFFLINE" as it's not a serving state). + * @param stateModelDef State model definition containing state hierarchy information + */ + private List<String> getTrueSecondaryTopStates(StateModelDefinition stateModelDef) { + return stateModelDef.getSecondTopStates().stream() + .filter(state -> !stateModelDef.getTopState().equals(state)) // Remove top state duplicates + .filter(state -> !OnlineOfflineSMD.States.OFFLINE.name().equals(state) Review Comment: Lets not use one single state model's offline state. Even the string name is the same, this is not the elegant way. ########## helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java: ########## @@ -290,7 +327,151 @@ private void generateMessage(final Resource resource, final BaseControllerDataPr } // end of for-each-partition } - private boolean shouldCreateSTCancellation(Message pendingMessage, String desiredState, + /** + * Calculate the current active replica count based on state model type. + * This method determines how many replicas are currently serving traffic for a partition by + * analyzing the current state distribution and applying state model-specific rules. The count includes + * replicas in top states, secondary top states (where applicable), and ERROR states since helix considers + * them active. + * State model handling: + * - Single-top state models: Differentiates between patterns with and without secondary top + * states + * - ONLINE-OFFLINE: Counts ONLINE + ERROR states only + * - MASTER-SLAVE-OFFLINE: Counts MASTER + SLAVE + ERROR states + * - ONLINE-STANDBY-OFFLINE: Counts ONLINE + STANDBY + ERROR states + * - Multi-top state models: Counts only top states + ERROR states + * @param currentStateMap Map of instance name to current state for this partition, representing + * the actual state of each replica before any pending transitions + * @param stateModelDef State model definition containing the state hierarchy and transition rules + * used to determine which states are considered active + * @return The number of replicas currently in active states, used to determine the + * currentActiveReplicaNumber for the partition. + */ + private int calculateCurrentActiveReplicaCount(Map<String, String> currentStateMap, + StateModelDefinition stateModelDef) { + if (stateModelDef.isSingleTopStateModel()) { + return calculateSingleTopStateActiveCount(currentStateMap, stateModelDef); + } else { + return calculateMultiTopStateActiveCount(currentStateMap, stateModelDef); + } + } + + /** + * Calculate active replica count for single-top state models. + * Single-top state models have different active state definitions: + * - ONLINE-OFFLINE: Only ONLINE (top state) + ERROR are active + * - ONLINE-STANDBY-OFFLINE: ONLINE (top state) + STANDBY (secondary top) + ERROR are active + * Note: We need to identify true secondary states (like STANDBY) vs transition-only + * states (like OFFLINE). + */ + private int calculateSingleTopStateActiveCount(Map<String, String> currentStateMap, + StateModelDefinition stateModelDef) { + List<String> trueSecondaryTopStates = getTrueSecondaryTopStates(stateModelDef); + if (trueSecondaryTopStates.isEmpty()) { + // No true secondary states exist (e.g., ONLINE-OFFLINE pattern) + // Count: top + ERROR states only + // Example: OnlineOffline has getSecondTopStates()=["OFFLINE"] but OFFLINE is non-serving state + // so trueSecondaryTopStates=[] and we only count ONLINE + ERROR + return (int) currentStateMap.values().stream() Review Comment: Isnt this condition include the condition in line 384? Meaning we only need to have the else? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org For additional commands, e-mail: reviews-h...@helix.apache.org