Hi Devs,
Is there a way to validate the capacity availability of cluster instances
when adding a resource and rebalancing it with WAGED? The resource
addition seems to be processed in an event pipeline, so when the rebalancer
hits the "FAILED_TO_CALCULATE" exception, it doesn't seem to propagate
to the place where we add the resource. That makes it tricky to validate
capacity availability beforehand.
While looking into this I found [1], but I couldn't clearly understand the
usage of the "WAGED simulation API" mentioned there. What I've tried so far
is shown in the code below the questions.
So the questions are:
- Is this approach correct?
- If so, can encountering "getIdealAssignmentForWagedFullAuto():
Calculation failed: Failed to compute BestPossibleState!" be
considered equivalent to "FAILED_TO_CALCULATE"?
- If so, is there a way to get the actual reason for the failure, like "Unable
to find any available candidate node for partition resource4_0; Fail
reasons: {resource4-resource4_0-ONLINE={c8cep_on_localhost_12002=[Node has
insufficient capacity]..."?
- Or is there a better way of doing this?
try {
    IdealState newIS = getIdealState(resourceName);
    ResourceConfig newResourceConfig = new ResourceConfig(resourceName);

    // Set PARTITION_CAPACITY_MAP
    Map<String, String> capacityDataMap = ImmutableMap.of("CPU", "20", "MEMORY", "60");
    newResourceConfig.getRecord().setMapField(
        ResourceConfig.ResourceConfigProperty.PARTITION_CAPACITY_MAP.name(),
        Collections.singletonMap(ResourceConfig.DEFAULT_PARTITION_KEY,
            OBJECT_MAPPER.writeValueAsString(capacityDataMap)));

    // Read existing cluster/instances/resources info
    final ZKHelixDataAccessor dataAccessor = new ZKHelixDataAccessor(CLUSTER_NAME,
        new ZkBaseDataAccessor.Builder<ZNRecord>().setZkAddress(ZK_ADDRESS).build());
    ClusterConfig clusterConfig =
        dataAccessor.getProperty(dataAccessor.keyBuilder().clusterConfig());
    List<InstanceConfig> instanceConfigs =
        dataAccessor.getChildValues(dataAccessor.keyBuilder().instanceConfigs(), true);
    List<String> liveInstances =
        dataAccessor.getChildNames(dataAccessor.keyBuilder().liveInstances());
    List<IdealState> idealStates =
        dataAccessor.getChildValues(dataAccessor.keyBuilder().idealStates(), true);
    List<ResourceConfig> resourceConfigs =
        dataAccessor.getChildValues(dataAccessor.keyBuilder().resourceConfigs(), true);

    // Do we need to add these?
    idealStates.add(newIS);
    resourceConfigs.add(newResourceConfig);

    // Verify that utilResult contains the assignment for the resources added
    Map<String, ResourceAssignment> utilResult = HelixUtil
        .getTargetAssignmentForWagedFullAuto(ZK_ADDRESS, clusterConfig, instanceConfigs,
            liveInstances, idealStates, resourceConfigs);
} catch (HelixException e) {
    // Getting "getIdealAssignmentForWagedFullAuto(): Calculation failed:
    // Failed to compute BestPossibleState!" means not enough capacity?
}
[1] https://github.com/apache/helix/pull/1701
Thank you,
Grainier Perera.