If you did not turn on rack awareness, this should be a capacity problem. You can file an issue on the Apache Helix GitHub to improve the exception or logs.
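To rule out the obvious capacity case before adding a resource, a coarse aggregate check can be run first. The following is only a minimal sketch: `CapacityPrecheck` and `shortfalls` are hypothetical names and not part of Helix, and the free-capacity maps are assumed to have been computed by the caller from the instance and resource configs. Passing this check does not guarantee that WAGED can place every replica; failing it does mean the cluster cannot fit the new resource.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not a Helix API): coarse aggregate capacity pre-check. */
final class CapacityPrecheck {
    private CapacityPrecheck() {}

    /**
     * For each capacity key (e.g. "CPU", "MEMORY"), compares the total demand of
     * the new resource against the total free capacity across all instances.
     *
     * @param freeCapacityPerInstance free capacity of each live instance, by key
     * @param weightPerReplica        capacity consumed by one partition replica, by key
     * @param replicaCount            partitions multiplied by replicas per partition
     * @return per-key shortfall; an empty map means the aggregate check passed
     */
    static Map<String, Long> shortfalls(
            List<Map<String, Integer>> freeCapacityPerInstance,
            Map<String, Integer> weightPerReplica,
            int replicaCount) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : weightPerReplica.entrySet()) {
            long demand = (long) entry.getValue() * replicaCount;
            long free = 0;
            for (Map<String, Integer> instance : freeCapacityPerInstance) {
                // An instance that does not declare a key contributes nothing to it.
                free += instance.getOrDefault(entry.getKey(), 0);
            }
            if (demand > free) {
                result.put(entry.getKey(), demand - free);
            }
        }
        return result;
    }
}
```

With per-replica weights like the CPU 20 / MEMORY 60 values in the quoted example, a non-empty result names the keys that are short and by how much, which is the kind of reason the current exception message does not surface.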

Best,

Junkai

On Fri, Jun 24, 2022 at 2:10 AM Grainier Perera <[email protected]> wrote:

> Hi Devs,
>
> Is there a way to validate the capacity availability of cluster instances
> when adding a resource and rebalancing it with WAGED? Because the resource
> addition process seems to happen in an event pipeline. So, when it
> encounters the "FAILED_TO_CALCULATE" exception, it doesn't seem to
> propagate to the place where we add the resource. Therefore, it seems
> tricky to validate capacity availability beforehand.
>
> While looking for this I found [1]. But I couldn't clearly understand the
> usage of the "WAGED simulation API" mentioned there. So, here's what I've
> tried;
>
> So the questions are:
> - Is it correct?
> - If so, is encountering a ""getIdealAssignmentForWagedFullAuto():
>   Calculation failed: Failed to compute BestPossibleState!"" can be
>   considered "FAILED_TO_CALCULATE"?
> - If so, is there a way to get the proper reason for the failure. Like "Unable
>   to find any available candidate node for partition resource4_0; Fail
>   reasons: {resource4-resource4_0-ONLINE={c8cep_on_localhost_12002=[Node has
>   insufficient capacity]..."
> - Or, is there a better way of doing this?
>
> try {
>     IdealState newIS = getIdealState(resourceName);
>     ResourceConfig newResourceConfig = new ResourceConfig(resourceName);
>     // Set PARTITION_CAPACITY_MAP
>     Map<String, String> capacityDataMap = ImmutableMap.of("CPU", "20", "MEMORY", "60");
>
>     newResourceConfig.getRecord().setMapField(
>         ResourceConfig.ResourceConfigProperty.PARTITION_CAPACITY_MAP.name(),
>         Collections.singletonMap(ResourceConfig.DEFAULT_PARTITION_KEY,
>             OBJECT_MAPPER.writeValueAsString(capacityDataMap)));
>
>     // Read existing cluster/instances/resources info
>     final ZKHelixDataAccessor dataAccessor = new ZKHelixDataAccessor(CLUSTER_NAME,
>         new ZkBaseDataAccessor.Builder<ZNRecord>().setZkAddress(ZK_ADDRESS).build());
>     ClusterConfig clusterConfig =
>         dataAccessor.getProperty(dataAccessor.keyBuilder().clusterConfig());
>     List<InstanceConfig> instanceConfigs =
>         dataAccessor.getChildValues(dataAccessor.keyBuilder().instanceConfigs(), true);
>     List<String> liveInstances =
>         dataAccessor.getChildNames(dataAccessor.keyBuilder().liveInstances());
>     List<IdealState> idealStates =
>         dataAccessor.getChildValues(dataAccessor.keyBuilder().idealStates(), true);
>     List<ResourceConfig> resourceConfigs =
>         dataAccessor.getChildValues(dataAccessor.keyBuilder().resourceConfigs(), true);
>
>     // Do we need add this?
>     idealStates.add(newIS);
>     resourceConfigs.add(newResourceConfig);
>
>     // Verify that utilResult contains the assignment for the resources added
>     Map<String, ResourceAssignment> utilResult = HelixUtil
>         .getTargetAssignmentForWagedFullAuto(ZK_ADDRESS, clusterConfig, instanceConfigs,
>             liveInstances, idealStates, resourceConfigs);
>
> } catch (HelixException e) {
>     // Getting "getIdealAssignmentForWagedFullAuto(): Calculation failed:
>     // Failed to compute BestPossibleState!" means not enough capacity?
> }
>
> [1] https://github.com/apache/helix/pull/1701
>
> Thank you,
> Grainier Perera.
>

--
Junkai Xue
