Hi Junkai,

Thank you for the explanation. I'll create an issue to improve the exception message.
Thanks,
Grainier Perera.

On Sat, 25 Jun 2022 at 09:38, Junkai Xue <[email protected]> wrote:

> If you did not turn on rack-aware placement, it should be a capacity
> problem. You can file an issue in the Apache Helix GitHub to improve the
> exception or logs.
>
> Best,
>
> Junkai
>
> On Fri, Jun 24, 2022 at 2:10 AM Grainier Perera <[email protected]>
> wrote:
>
>> Hi Devs,
>>
>> Is there a way to validate the capacity availability of cluster instances
>> when adding a resource and rebalancing it with WAGED? The resource
>> addition process seems to happen in an event pipeline, so when it
>> encounters the "FAILED_TO_CALCULATE" exception, the error doesn't seem to
>> propagate back to the place where we add the resource. That makes it
>> tricky to validate capacity availability beforehand.
>>
>> While looking into this I found [1], but I couldn't clearly understand
>> the usage of the "WAGED simulation API" mentioned there. Here's what
>> I've tried:
>>
>> try {
>>     IdealState newIS = getIdealState(resourceName);
>>     ResourceConfig newResourceConfig = new ResourceConfig(resourceName);
>>     // Set PARTITION_CAPACITY_MAP
>>     Map<String, String> capacityDataMap =
>>         ImmutableMap.of("CPU", "20", "MEMORY", "60");
>>     newResourceConfig.getRecord().setMapField(
>>         ResourceConfig.ResourceConfigProperty.PARTITION_CAPACITY_MAP.name(),
>>         Collections.singletonMap(ResourceConfig.DEFAULT_PARTITION_KEY,
>>             OBJECT_MAPPER.writeValueAsString(capacityDataMap)));
>>
>>     // Read existing cluster/instances/resources info
>>     final ZKHelixDataAccessor dataAccessor = new ZKHelixDataAccessor(
>>         CLUSTER_NAME,
>>         new ZkBaseDataAccessor.Builder<ZNRecord>()
>>             .setZkAddress(ZK_ADDRESS).build());
>>     ClusterConfig clusterConfig =
>>         dataAccessor.getProperty(dataAccessor.keyBuilder().clusterConfig());
>>     List<InstanceConfig> instanceConfigs = dataAccessor.getChildValues(
>>         dataAccessor.keyBuilder().instanceConfigs(), true);
>>     List<String> liveInstances = dataAccessor.getChildNames(
>>         dataAccessor.keyBuilder().liveInstances());
>>     List<IdealState> idealStates = dataAccessor.getChildValues(
>>         dataAccessor.keyBuilder().idealStates(), true);
>>     List<ResourceConfig> resourceConfigs = dataAccessor.getChildValues(
>>         dataAccessor.keyBuilder().resourceConfigs(), true);
>>
>>     // Do we need to add these?
>>     idealStates.add(newIS);
>>     resourceConfigs.add(newResourceConfig);
>>
>>     // Verify that utilResult contains the assignment for the
>>     // resources added
>>     Map<String, ResourceAssignment> utilResult =
>>         HelixUtil.getTargetAssignmentForWagedFullAuto(ZK_ADDRESS,
>>             clusterConfig, instanceConfigs, liveInstances, idealStates,
>>             resourceConfigs);
>> } catch (HelixException e) {
>>     // Getting "getIdealAssignmentForWagedFullAuto(): Calculation
>>     // failed: Failed to compute BestPossibleState!"
>>     // means not enough capacity?
>> }
>>
>> So the questions are:
>> - Is this approach correct?
>> - If so, can encountering "getIdealAssignmentForWagedFullAuto():
>>   Calculation failed: Failed to compute BestPossibleState!" be
>>   considered "FAILED_TO_CALCULATE"?
>> - If so, is there a way to get the actual reason for the failure, like
>>   "Unable to find any available candidate node for partition
>>   resource4_0; Fail reasons:
>>   {resource4-resource4_0-ONLINE={c8cep_on_localhost_12002=[Node has
>>   insufficient capacity]..."?
>> - Or, is there a better way of doing this?
>>
>> [1] https://github.com/apache/helix/pull/1701
>>
>> Thank you,
>> Grainier Perera.
>
>
> --
> Junkai Xue
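For reference, the PARTITION_CAPACITY_MAP field set in the snippet above holds a JSON-encoded map of capacity key to value per partition; the snippet produces it with Jackson's `writeValueAsString`. Below is a minimal, JDK-only sketch of that serialization so the stored value is easy to see (the class name and helper are illustrative, not part of Helix):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CapacityMapSketch {
    // Serialize a capacity map to the JSON string stored under
    // PARTITION_CAPACITY_MAP (normally produced by Jackson's ObjectMapper).
    static String toJson(Map<String, String> capacity) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : capacity.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append('"').append(e.getKey()).append("\":\"")
              .append(e.getValue()).append('"');
            first = false;
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        // Same values as the capacityDataMap in the thread above.
        Map<String, String> capacity = new LinkedHashMap<>();
        capacity.put("CPU", "20");
        capacity.put("MEMORY", "60");
        System.out.println(toJson(capacity));
        // prints {"CPU":"20","MEMORY":"60"}
    }
}
```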
