junkaixue commented on code in PR #2772:
URL: https://github.com/apache/helix/pull/2772#discussion_r1545770980
##########
helix-common/src/main/java/org/apache/helix/constants/InstanceConstants.java:
##########
@@ -1,17 +1,67 @@
package org.apache.helix.constants;
+import java.util.Set;
+
public class InstanceConstants {
public static final String INSTANCE_NOT_DISABLED = "INSTANCE_NOT_DISABLED";
+ /**
+ * The set contains the InstanceOperations that are allowed to be assigned
replicas by the rebalancer.
+ */
+ public static final Set<InstanceOperation> ASSIGNABLE_INSTANCE_OPERATIONS =
+ Set.of(InstanceOperation.ENABLE, InstanceOperation.DISABLE);
+
+
+ /**
+ * The set contains the InstanceOperations that are overridden when the
deprecated HELIX_ENABLED
+ * field is set to false. This will maintain backwards compatibility with
the deprecated field.
+ */
+ public static final Set<InstanceOperation>
INSTANCE_DISABLED_OVERRIDABLE_OPERATIONS =
+ Set.of(InstanceOperation.ENABLE, InstanceOperation.DISABLE,
InstanceOperation.EVACUATE);
Review Comment:
NIT: let's mark these are backward compatible support as the TODO instead of
comments. In Helix 2.0, we will remove these logic as only instance operation
alive.
Remember to deprecate the old field of ENABLE/DISABLE
##########
helix-core/src/main/java/org/apache/helix/model/InstanceConfig.java:
##########
@@ -302,13 +310,14 @@ public void setInstanceDisabledReason(String
disabledReason) {
* It will be a no-op when instance is enabled.
*/
public void setInstanceDisabledType(InstanceConstants.InstanceDisabledType
disabledType) {
Review Comment:
Shall we make it more generic? Not only for disable but evacuate and else?
Probably call "setInstanceNonServingType" ?
##########
helix-core/src/main/java/org/apache/helix/controller/stages/IntermediateStateCalcStage.java:
##########
@@ -661,11 +661,11 @@ private Map<String, Integer> getRequiredStates(String
resourceName,
// preference list
if (preferenceList != null) {
return stateModelDefinition.getStateCountMap((int)
preferenceList.stream().filter(
- i ->
resourceControllerDataProvider.getAssignableEnabledLiveInstances().contains(i))
+ i ->
resourceControllerDataProvider.getEnabledLiveInstances().contains(i))
.count(), requiredNumReplica); // StateModelDefinition's counts
}
return stateModelDefinition.getStateCountMap(
-
resourceControllerDataProvider.getAssignableEnabledLiveInstances().size(),
+ resourceControllerDataProvider.getEnabledLiveInstances().size(),
Review Comment:
Shall we use assignable here? Even if evacuating is in the progress of
moving things out, but before it brings down to offline/dropped, we should
count that state.
##########
helix-core/src/main/java/org/apache/helix/model/InstanceConfig.java:
##########
@@ -777,6 +850,38 @@ public boolean
validateTopologySettingInInstanceConfig(ClusterConfig clusterConf
return true;
}
+ /**
+ * Overwrite the InstanceConfigProperties from the given InstanceConfig to
this InstanceConfig.
+ * The merge is done by overwriting the properties in this InstanceConfig
with the properties
+ * from the given InstanceConfig. {@link #NON_OVERWRITABLE_PROPERTIES} will
not be overridden.
+ *
+ * @param overwritingInstanceConfig the InstanceConfig to override into this
InstanceConfig
+ */
+ public void overwriteInstanceConfig(InstanceConfig
overwritingInstanceConfig) {
+ for (InstanceConfigProperty keyEnum : InstanceConfigProperty.values()) {
+ if (NON_OVERWRITABLE_PROPERTIES.contains(keyEnum)) {
+ continue;
+ }
+
+ String key = keyEnum.toString();
+ // Add key value pair if it exists in overwritingInstanceConfig and
remove it if it doesn't
+ // exist in overwritingInstanceConfig but does exist in this
InstanceConfig.
+ if
(overwritingInstanceConfig.getRecord().getSimpleFields().containsKey(key)) {
+ _record.setSimpleField(key,
overwritingInstanceConfig.getRecord().getSimpleField(key));
+ } else if (_record.getSimpleFields().containsKey(key)) {
+ _record.getSimpleFields().remove(key);
+ } else if
(overwritingInstanceConfig.getRecord().getListFields().containsKey(key)) {
+ _record.setListField(key,
overwritingInstanceConfig.getRecord().getListField(key));
+ } else if (_record.getListFields().containsKey(key)) {
+ _record.getListFields().remove(key);
+ } else if
(overwritingInstanceConfig.getRecord().getMapFields().containsKey(key)) {
+ _record.setMapField(key,
overwritingInstanceConfig.getRecord().getMapField(key));
+ } else if (_record.getMapFields().containsKey(key)) {
+ _record.getMapFields().remove(key);
+ }
Review Comment:
Why not just addAll the defined keys for three fields.... This is too
complicated.
##########
helix-core/src/main/java/org/apache/helix/controller/dataproviders/BaseControllerDataProvider.java:
##########
@@ -1127,7 +1066,7 @@ private void
updateDisabledInstances(Collection<InstanceConfig> allInstanceConfi
_disabledInstanceSet.clear();
for (InstanceConfig config : allInstanceConfigs) {
Map<String, List<String>> disabledPartitionMap =
config.getDisabledPartitionsMap();
- if (!InstanceValidationUtil.isInstanceEnabled(config, clusterConfig)) {
+ if
(config.getInstanceOperation().equals(InstanceConstants.InstanceOperation.DISABLE))
{
Review Comment:
Why we change to the operation instead of the function? We can keep backward
compatible logic in function instead of changing the check everywhere.
##########
helix-core/src/main/java/org/apache/helix/controller/rebalancer/DelayedAutoRebalancer.java:
##########
@@ -363,6 +364,9 @@ protected Map<String, String>
computeBestPossibleStateForPartition(Set<String> l
currentMapWithPreferenceList.keySet().retainAll(preferenceList);
combinedPreferenceList.addAll(currentInstances);
+ if (cache != null) {
+ combinedPreferenceList.removeAll(cache.getUnknownInstances());
+ }
Review Comment:
This can be confusing. If there are unknown instances, they should be
filtered out before adding to combined preference list.
If you remove here, the N -> N + 1 can be stuck as even N + 1 replica be
bootstrapped.
##########
helix-core/src/main/java/org/apache/helix/model/InstanceConfig.java:
##########
@@ -302,13 +310,14 @@ public void setInstanceDisabledReason(String
disabledReason) {
* It will be a no-op when instance is enabled.
*/
public void setInstanceDisabledType(InstanceConstants.InstanceDisabledType
disabledType) {
- if (!getInstanceEnabled()) {
+ if
(getInstanceOperation().equals(InstanceConstants.InstanceOperation.DISABLE)) {
Review Comment:
Same here. To me, just disable would be too specific as evacuate, unknown
they can all have the type?
##########
helix-core/src/main/java/org/apache/helix/model/InstanceConfig.java:
##########
@@ -321,29 +330,93 @@ public String getInstanceDisabledReason() {
* Default is am empty string.
*/
public String getInstanceDisabledType() {
- if (getInstanceEnabled()) {
+ if
(!getInstanceOperation().equals(InstanceConstants.InstanceOperation.DISABLE)) {
return InstanceConstants.INSTANCE_NOT_DISABLED;
}
return
_record.getStringField(InstanceConfigProperty.HELIX_DISABLED_TYPE.name(),
InstanceConstants.InstanceDisabledType.DEFAULT_INSTANCE_DISABLE_TYPE.name());
}
/**
- * Get the timestamp (milliseconds from epoch) when this instance was
enabled/disabled last time.
+ * Set the instance operation for this instance.
*
- * @return
+ * @param operation the instance operation
*/
- public long getInstanceEnabledTime() {
- return
_record.getLongField(InstanceConfigProperty.HELIX_ENABLED_TIMESTAMP.name(), -1);
- }
-
public void setInstanceOperation(InstanceConstants.InstanceOperation
operation) {
_record.setSimpleField(InstanceConfigProperty.INSTANCE_OPERATION.name(),
operation == null ? "" : operation.name());
+ if (operation == null || operation ==
InstanceConstants.InstanceOperation.ENABLE
+ || operation == InstanceConstants.InstanceOperation.DISABLE) {
+ // We are still setting the HELIX_ENABLED field for backwards
compatibility.
+ // It is possible that users will be using earlier version of HelixAdmin
or helix-rest
+ // is on older version.
+ // TODO: Remove this when we are sure that all users are using the new
field INSTANCE_OPERATION.
+ setInstanceEnabledHelper(!(operation ==
InstanceConstants.InstanceOperation.DISABLE));
+ }
+ }
+
+ private void setInstanceOperationInit(InstanceConstants.InstanceOperation
operation) {
+ if (operation == null) {
+ return;
+ }
+ _record.setSimpleField(InstanceConfigProperty.INSTANCE_OPERATION.name(),
operation.name());
+ }
+
+ /**
+ * Get the InstanceOperation of this instance, default is ENABLE if nothing
is set. If
+ * HELIX_ENABLED is set to false, then the instance operation is DISABLE for
backwards
+ * compatibility.
+ *
+ * @return the instance operation
+ */
+ public InstanceConstants.InstanceOperation getInstanceOperation() {
+ String instanceOperationString =
+
_record.getSimpleField(InstanceConfigProperty.INSTANCE_OPERATION.name());
+
+ InstanceConstants.InstanceOperation instanceOperation;
+ try {
+ // If INSTANCE_OPERATION is not set, then the instance is enabled.
+ instanceOperation = instanceOperationString == null ||
instanceOperationString.isEmpty()
+ ? InstanceConstants.InstanceOperation.ENABLE
+ :
InstanceConstants.InstanceOperation.valueOf(instanceOperationString);
+ } catch (IllegalArgumentException e) {
+ _logger.error("Invalid instance operation: " + instanceOperationString +
" for instance: "
+ + _record.getId()
+ + ". You may need to update your version of Helix to get support for
this "
+ + "type of InstanceOperation. Defaulting to UNKNOWN.");
+ return InstanceConstants.InstanceOperation.UNKNOWN;
+ }
+
+ // Always respect the HELIX_ENABLED being set to false when instance
operation is unset
+ // for backwards compatibility.
+ if (!_record.getBooleanField(InstanceConfigProperty.HELIX_ENABLED.name(),
+ HELIX_ENABLED_DEFAULT_VALUE)
+ &&
(InstanceConstants.INSTANCE_DISABLED_OVERRIDABLE_OPERATIONS.contains(
+ instanceOperation))) {
+ return InstanceConstants.InstanceOperation.DISABLE;
+ }
+
+ return instanceOperation;
}
- public String getInstanceOperation() {
- return
_record.getStringField(InstanceConfigProperty.INSTANCE_OPERATION.name(), "");
+ /**
+ * Check if this instance is enabled. This is used to determine if the
instance can host online
+ * replicas and take new assignment.
+ *
+ * @return true if enabled, false otherwise
+ */
+ public boolean getInstanceEnabled() {
Review Comment:
This needs backward compatible work. It not only checks operation field but
also the old field. We can not assume user use the new API already carrying
over the old fields with new fields.
##########
helix-core/src/main/java/org/apache/helix/model/InstanceConfig.java:
##########
@@ -777,6 +850,38 @@ public boolean
validateTopologySettingInInstanceConfig(ClusterConfig clusterConf
return true;
}
+ /**
+ * Overwrite the InstanceConfigProperties from the given InstanceConfig to
this InstanceConfig.
+ * The merge is done by overwriting the properties in this InstanceConfig
with the properties
+ * from the given InstanceConfig. {@link #NON_OVERWRITABLE_PROPERTIES} will
not be overridden.
+ *
+ * @param overwritingInstanceConfig the InstanceConfig to override into this
InstanceConfig
+ */
+ public void overwriteInstanceConfig(InstanceConfig
overwritingInstanceConfig) {
+ for (InstanceConfigProperty keyEnum : InstanceConfigProperty.values()) {
+ if (NON_OVERWRITABLE_PROPERTIES.contains(keyEnum)) {
+ continue;
+ }
+
+ String key = keyEnum.toString();
Review Comment:
use name() would be better.
##########
helix-core/src/main/java/org/apache/helix/controller/stages/BestPossibleStateCalcStage.java:
##########
@@ -161,49 +162,47 @@ private void
addSwapInInstancesToBestPossibleState(Map<String, Resource> resourc
return;
}
- // If the corresponding swap-in instance is not enabled,
assign replicas with
- // initial state.
- if (!enabledSwapInInstances.contains(
- swapOutToSwapInInstancePairs.get(swapOutInstance))) {
-
stateMap.put(swapOutToSwapInInstancePairs.get(swapOutInstance),
- stateModelDef.getInitialState());
- return;
- }
-
- // If the swap-in node is live and enabled, do assignment with
the following logic:
- // 1. If the swap-out instance's replica is a secondTopState,
set the swap-in instance's replica
- // to the same secondTopState.
- // 2. If the swap-out instance's replica is any other state
and is in the preferenceList,
- // set the swap-in instance's replica to the topState if the
StateModel allows another to be added.
- // If not, set the swap-in instance's replica to the
secondTopState.
- // We can make this assumption because if there is assignment
to the swapOutInstance, it must be either
- // a topState or a secondTopState.
- if (stateMap.containsKey(swapOutInstance) &&
stateModelDef.getSecondTopStates()
- .contains(stateMap.get(swapOutInstance))) {
- // If the swap-out instance's replica is a secondTopState,
set the swap-in instance's replica
- // to the same secondTopState.
-
stateMap.put(swapOutToSwapInInstancePairs.get(swapOutInstance),
- stateMap.get(swapOutInstance));
- } else {
- // If the swap-out instance's replica is any other state in
the stateMap or not present in the
- // stateMap, set the swap-in instance's replica to the
topState if the StateModel allows another
- // to be added. If not, set the swap-in to the
secondTopState.
- String topStateCount =
-
stateModelDef.getNumInstancesPerState(stateModelDef.getTopState());
- if (topStateCount.equals(
-
StateModelDefinition.STATE_REPLICA_COUNT_ALL_CANDIDATE_NODES)
- || topStateCount.equals(
- StateModelDefinition.STATE_REPLICA_COUNT_ALL_REPLICAS)) {
- // If the StateModel allows for another replica with the
topState to be added,
- // set the swap-in instance's replica to the topState.
-
stateMap.put(swapOutToSwapInInstancePairs.get(swapOutInstance),
- stateModelDef.getTopState());
+ // If the swap-in node is live, do assignment with the
following logic:
+ // 1. If the swap-out instance's replica is in the stateMap:
+ // - if the swap-out instance's replica is a topState, set the
swap-in instance's replica to the topState.
+ // if another is allowed to be added, otherwise set the
swap-in instance's replica to a secondTopState.
+ // - if the swap-out instance's replica is not a topState or
ERROR, set the swap-in instance's replica to the same state.
+ // - if the swap-out instance's replica is ERROR, set the
swap-in instance's replica to the initialState.
+ // 2. If the swap-out instance's replica is not in the
stateMap, set the swap-in instance's replica to the initialState.
+ // This happens when the swap-out node is offline.
+ if (stateMap.containsKey(swapOutInstance)) {
+ if
(stateMap.get(swapOutInstance).equals(stateModelDef.getTopState())
+ ||
stateMap.get(swapOutInstance).equals(HelixDefinedState.ERROR.name())) {
+ // If the swap-out instance's replica is a topState, set
the swap-in instance's replica
+ // to the topState if the StateModel allows another to be
added. If not, set the swap-in
+ // to the secondTopState.
+ String topStateCount =
+
stateModelDef.getNumInstancesPerState(stateModelDef.getTopState());
+ if (topStateCount.equals(
+
StateModelDefinition.STATE_REPLICA_COUNT_ALL_CANDIDATE_NODES)
+ || topStateCount.equals(
+
StateModelDefinition.STATE_REPLICA_COUNT_ALL_REPLICAS)) {
+ // If the StateModel allows for another replica with the
topState to be added,
+ // set the swap-in instance's replica to the topState.
+
stateMap.put(swapOutToSwapInInstancePairs.get(swapOutInstance),
+ stateModelDef.getTopState());
+ } else {
+ // If StateModel does not allow another topState replica
to be
+ // added, set the swap-in instance's replica to the
secondTopState.
+
stateMap.put(swapOutToSwapInInstancePairs.get(swapOutInstance),
+
stateModelDef.getSecondTopStates().iterator().next());
+ }
} else {
- // If StateModel does not allow another topState replica
to be
- // added, set the swap-in instance's replica to the
secondTopState.
+ // If the swap-out instance's replica is not a topState or
ERROR, set the swap-in instance's replica
+ // to the same state
stateMap.put(swapOutToSwapInInstancePairs.get(swapOutInstance),
- stateModelDef.getSecondTopStates().iterator().next());
+ stateMap.get(swapOutInstance));
}
+ } else {
+ // If the swap-out instance's replica is not in the
stateMap, set the swap-in instance's replica
+ // to the initialState. This happens when the swap-out node
is offline.
+
stateMap.put(swapOutToSwapInInstancePairs.get(swapOutInstance),
+ stateModelDef.getInitialState());
}
});
});
Review Comment:
I would recommend not hybrid the logics here. My understanding of this
process is:
1. Get partition mappings to swap-in instances first.
2. Going through again for assigning states with second top (single leader)
or top.
3. If you really would like to handle the swap out is already offline case,
we can do a minor adjustment. But I personally do not like that way. Because
the behavior is unclear to me. Before swap in fully bootstrap and make it as
serving node is not the contract we have for this API?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]