If you did not turn on rack awareness, it should be a capacity problem. You
can file an issue on the Apache Helix GitHub to improve the exception or logs.
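
Until the exception carries more detail, a coarse pre-check can rule out the
obvious case before calling the simulation API. The following is only a sketch
in plain Java, not part of Helix: the class name CapacityPrecheck, the method
shortfall, and the idea of passing in remaining per-instance capacities are all
assumptions for illustration. It checks a necessary condition only (total
demand per capacity key does not exceed total capacity), so a passing result
does not guarantee that WAGED will find a placement.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a Helix class.
final class CapacityPrecheck {

    /**
     * Returns, per capacity key, how much the total demand of the new resource
     * exceeds the summed capacity of the given instances. An empty map means the
     * aggregate check passed; it does not prove a valid placement exists.
     *
     * instanceCapacities: assumed remaining capacity of each live instance.
     * partitionWeight: per-partition weight, e.g. CPU=20, MEMORY=60.
     */
    static Map<String, Long> shortfall(List<Map<String, Long>> instanceCapacities,
                                       Map<String, Long> partitionWeight,
                                       int partitions,
                                       int replicas) {
        // Sum the capacity of all instances for each key.
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> capacity : instanceCapacities) {
            capacity.forEach((key, value) -> total.merge(key, value, Long::sum));
        }

        // Compare total demand against total capacity, key by key.
        Map<String, Long> missing = new HashMap<>();
        partitionWeight.forEach((key, weight) -> {
            long demand = weight * partitions * replicas;
            long available = total.getOrDefault(key, 0L);
            if (demand > available) {
                missing.put(key, demand - available);
            }
        });
        return missing;
    }
}
```

If the returned map is non-empty, the keys in it are the dimensions that
cannot fit even in aggregate, which is at least a readable reason to report
back to the caller.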

Best,

Junkai

On Fri, Jun 24, 2022 at 2:10 AM Grainier Perera <[email protected]> wrote:

> Hi Devs,
>
> Is there a way to validate the capacity availability of cluster instances
> when adding a resource and rebalancing it with WAGED? The resource
> addition process seems to happen in an event pipeline, so when it
> encounters the "FAILED_TO_CALCULATE" exception, the exception doesn't seem
> to propagate to the place where we add the resource. Therefore, it seems
> tricky to validate capacity availability beforehand.
>
> While looking for this I found [1]. But I couldn't clearly understand the
> usage of the "WAGED simulation API" mentioned there. So, here's what I've
> tried;
>
> So the questions are:
> - Is it correct?
> - If so, can encountering "getIdealAssignmentForWagedFullAuto():
> Calculation failed: Failed to compute BestPossibleState!" be
> considered "FAILED_TO_CALCULATE"?
> - If so, is there a way to get the proper reason for the failure, such as
> "Unable to find any available candidate node for partition resource4_0; Fail
> reasons: {resource4-resource4_0-ONLINE={c8cep_on_localhost_12002=[Node has
> insufficient capacity]..."?
> - Or, is there a better way of doing this?
>
>     try {
>         IdealState newIS = getIdealState(resourceName);
>         ResourceConfig newResourceConfig = new ResourceConfig(resourceName);
>         // Set PARTITION_CAPACITY_MAP
>         Map<String, String> capacityDataMap = ImmutableMap.of("CPU", "20", "MEMORY", "60");
>         newResourceConfig.getRecord().setMapField(
>                 ResourceConfig.ResourceConfigProperty.PARTITION_CAPACITY_MAP.name(),
>                 Collections.singletonMap(ResourceConfig.DEFAULT_PARTITION_KEY,
>                         OBJECT_MAPPER.writeValueAsString(capacityDataMap)));
>
>         // Read existing cluster/instances/resources info
>         final ZKHelixDataAccessor dataAccessor = new ZKHelixDataAccessor(CLUSTER_NAME,
>                 new ZkBaseDataAccessor.Builder<ZNRecord>().setZkAddress(ZK_ADDRESS).build());
>         ClusterConfig clusterConfig =
>                 dataAccessor.getProperty(dataAccessor.keyBuilder().clusterConfig());
>         List<InstanceConfig> instanceConfigs =
>                 dataAccessor.getChildValues(dataAccessor.keyBuilder().instanceConfigs(), true);
>         List<String> liveInstances =
>                 dataAccessor.getChildNames(dataAccessor.keyBuilder().liveInstances());
>         List<IdealState> idealStates =
>                 dataAccessor.getChildValues(dataAccessor.keyBuilder().idealStates(), true);
>         List<ResourceConfig> resourceConfigs =
>                 dataAccessor.getChildValues(dataAccessor.keyBuilder().resourceConfigs(), true);
>
>         // Do we need add this?
>         idealStates.add(newIS);
>         resourceConfigs.add(newResourceConfig);
>
>         // Verify that utilResult contains the assignment for the resources added
>         Map<String, ResourceAssignment> utilResult = HelixUtil
>                 .getTargetAssignmentForWagedFullAuto(ZK_ADDRESS, clusterConfig, instanceConfigs,
>                         liveInstances, idealStates, resourceConfigs);
>
>     } catch (HelixException e) {
>         // Getting "getIdealAssignmentForWagedFullAuto(): Calculation failed:
>         // Failed to compute BestPossibleState!" means not enough capacity?
>     }
>
> [1] https://github.com/apache/helix/pull/1701
>
> Thank you,
> Grainier Perera.
>


-- 
Junkai Xue
