Hi Junkai,

Thank you for the explanation. I'll create an issue to improve the
exception message.

Thanks,
Grainier Perera.


On Sat, 25 Jun 2022 at 09:38, Junkai Xue <[email protected]> wrote:

> If you did not turn on rack-aware placement, it should be a capacity problem.
> You can file an issue on the Apache Helix GitHub to improve the exception or logs.
>
> Best,
>
> Junkai
>
> On Fri, Jun 24, 2022 at 2:10 AM Grainier Perera <[email protected]>
> wrote:
>
>> Hi Devs,
>>
>> Is there a way to validate the capacity availability of cluster instances
>> when adding a resource and rebalancing it with WAGED? The resource addition
>> seems to happen in an event pipeline, so when the pipeline hits the
>> "FAILED_TO_CALCULATE" exception, it doesn't propagate back to the place where
>> we add the resource. That makes it tricky to validate capacity availability
>> beforehand.
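>>
>> For context, I'm assuming the instance-side capacities need to be registered
>> along these lines so the check has something to compare the new resource's
>> partition weights against (an illustrative sketch only; the capacity keys,
>> instanceName, and numbers below are placeholders):
>>
>>     // Illustrative sketch only: register capacity keys on the ClusterConfig and
>>     // per-instance capacities on each InstanceConfig (placeholder names/values).
>>     ZKHelixDataAccessor dataAccessor = new ZKHelixDataAccessor(CLUSTER_NAME,
>>             new ZkBaseDataAccessor.Builder<ZNRecord>().setZkAddress(ZK_ADDRESS).build());
>>
>>     ClusterConfig config = dataAccessor.getProperty(dataAccessor.keyBuilder().clusterConfig());
>>     config.setInstanceCapacityKeys(Arrays.asList("CPU", "MEMORY"));
>>     dataAccessor.setProperty(dataAccessor.keyBuilder().clusterConfig(), config);
>>
>>     InstanceConfig instanceConfig =
>>             dataAccessor.getProperty(dataAccessor.keyBuilder().instanceConfig(instanceName));
>>     instanceConfig.setInstanceCapacityMap(ImmutableMap.of("CPU", 100, "MEMORY", 1024));
>>     dataAccessor.setProperty(dataAccessor.keyBuilder().instanceConfig(instanceName), instanceConfig);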
>>
>> While looking into this, I found [1], but I couldn't clearly understand the
>> usage of the "WAGED simulation API" mentioned there. So here's what I've
>> tried (snippet below):
>>
>> My questions are:
>> - Is this approach correct?
>> - If so, can encountering "getIdealAssignmentForWagedFullAuto():
>> Calculation failed: Failed to compute BestPossibleState!" be considered
>> "FAILED_TO_CALCULATE"?
>> - If so, is there a way to get the actual reason for the failure, e.g. "Unable
>> to find any available candidate node for partition resource4_0; Fail
>> reasons: {resource4-resource4_0-ONLINE={c8cep_on_localhost_12002=[Node has
>> insufficient capacity]..."? (See the sketch after the snippet for what I mean.)
>> - Or is there a better way of doing this?
>>
>>     try {
>>         IdealState newIS = getIdealState(resourceName);
>>         ResourceConfig newResourceConfig = new ResourceConfig(resourceName);
>>         // Set PARTITION_CAPACITY_MAP
>>         Map<String, String> capacityDataMap = ImmutableMap.of("CPU", "20", "MEMORY", "60");
>>         newResourceConfig.getRecord().setMapField(
>>                 ResourceConfig.ResourceConfigProperty.PARTITION_CAPACITY_MAP.name(),
>>                 Collections.singletonMap(ResourceConfig.DEFAULT_PARTITION_KEY,
>>                         OBJECT_MAPPER.writeValueAsString(capacityDataMap)));
>>
>>         // Read existing cluster/instances/resources info
>>         final ZKHelixDataAccessor dataAccessor = new ZKHelixDataAccessor(CLUSTER_NAME,
>>                 new ZkBaseDataAccessor.Builder<ZNRecord>().setZkAddress(ZK_ADDRESS).build());
>>         ClusterConfig clusterConfig =
>>                 dataAccessor.getProperty(dataAccessor.keyBuilder().clusterConfig());
>>         List<InstanceConfig> instanceConfigs =
>>                 dataAccessor.getChildValues(dataAccessor.keyBuilder().instanceConfigs(), true);
>>         List<String> liveInstances =
>>                 dataAccessor.getChildNames(dataAccessor.keyBuilder().liveInstances());
>>         List<IdealState> idealStates =
>>                 dataAccessor.getChildValues(dataAccessor.keyBuilder().idealStates(), true);
>>         List<ResourceConfig> resourceConfigs =
>>                 dataAccessor.getChildValues(dataAccessor.keyBuilder().resourceConfigs(), true);
>>
>>         // Do we need to add these?
>>         idealStates.add(newIS);
>>         resourceConfigs.add(newResourceConfig);
>>
>>         // Verify that utilResult contains the assignment for the resources added
>>         Map<String, ResourceAssignment> utilResult = HelixUtil
>>                 .getTargetAssignmentForWagedFullAuto(ZK_ADDRESS, clusterConfig, instanceConfigs,
>>                         liveInstances, idealStates, resourceConfigs);
>>
>>     } catch (HelixException e) {
>>         // Getting "getIdealAssignmentForWagedFullAuto(): Calculation failed:
>>         // Failed to compute BestPossibleState!" means not enough capacity?
>>     }
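>>
>> On the third question, here's a minimal sketch of what I'm thinking of trying
>> in the catch block above: walk the exception's cause chain and collect every
>> message, on the assumption (not verified against the Helix source) that the
>> detailed "Fail reasons: {...}" text, if available at all, may be attached to a
>> nested cause rather than to the top-level HelixException message:
>>
>>     // Sketch only: collect messages from the whole cause chain, assuming the
>>     // per-partition "Fail reasons" detail may be on a nested cause. If no
>>     // message mentions capacity, all we really know is FAILED_TO_CALCULATE.
>>     private static String collectFailureMessages(Throwable error) {
>>         StringBuilder details = new StringBuilder();
>>         for (Throwable t = error; t != null; t = t.getCause()) {
>>             if (t.getMessage() != null) {
>>                 details.append(t.getMessage()).append(System.lineSeparator());
>>             }
>>         }
>>         return details.toString();
>>     }
>>
>> ...and then log collectFailureMessages(e) inside the catch block instead of
>> guessing from the top-level message alone.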
>>
>> [1] https://github.com/apache/helix/pull/1701
>>
>> Thank you,
>> Grainier Perera.
>>
>
>
> --
> Junkai Xue
>
