What I mean is the ZNode inside ZooKeeper under the path /[your cluster name]/CONFIGS/CLUSTER/[your cluster name].
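[Editor's note: a minimal sketch of the ZNode path described above, using the hypothetical cluster name "helix-ctrl" mentioned later in this thread. Actually reading the node would require a ZooKeeper client (e.g. zkCli.sh), which is not shown here.]

```python
# Build the ClusterConfig ZNode path for a given cluster name.
# "helix-ctrl" is an illustrative name taken from this thread; substitute
# your own cluster name. Reading the node would be done with a ZooKeeper
# client, e.g.:  zkCli.sh> get /helix-ctrl/CONFIGS/CLUSTER/helix-ctrl
cluster_name = "helix-ctrl"
znode_path = "/{0}/CONFIGS/CLUSTER/{0}".format(cluster_name)
print(znode_path)  # -> /helix-ctrl/CONFIGS/CLUSTER/helix-ctrl
```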
Best,
Junkai

On Tue, Jun 7, 2022 at 7:24 PM Phong X. Nguyen <[email protected]> wrote:

> Yes, it involves enable/disable operations as the server comes up and
> down. In the logs we would sometimes not see the host in the "Current
> quota capacity" log message, either.
>
> When you refer to Cluster Config, did you mean what's accessible by
> "-listClusterInfo helix-ctrl"?
>
> Thanks,
> Phong X. Nguyen
>
> On Tue, Jun 7, 2022 at 7:19 PM Junkai Xue <[email protected]> wrote:
>
>> Hi Phong,
>>
>> Thanks for trying out Helix 1.0.3. I have a question about your test:
>> does it involve enable/disable operations? If yes, this could be a bug
>> introduced in 1.0.3 that can leave an instance disabled after a batch
>> enable/disable. One thing you can verify: check the Cluster Config to
>> see whether the map field of disabled instances still contains the
>> instance that came back.
>>
>> We are working on the 1.0.4 release to fix that.
>>
>> Best,
>>
>> Junkai
>>
>> On Tue, Jun 7, 2022 at 6:50 PM Phong X. Nguyen <[email protected]>
>> wrote:
>>
>>> Helix Team,
>>>
>>> We're testing an upgrade from Helix 1.0.1 to Helix 1.0.3, primarily
>>> for the log4j2 fixes. As we test it, we're discovering that WAGED
>>> seems to be rebalancing in a slightly different way than before.
>>>
>>> Our configuration has 32 instances and 32 partitions. The
>>> simpleFields configuration is as follows:
>>>
>>> "simpleFields" : {
>>>   "HELIX_ENABLED" : "true",
>>>   "NUM_PARTITIONS" : "32",
>>>   "MAX_PARTITIONS_PER_INSTANCE" : "4",
>>>   "DELAY_REBALANCE_ENABLE" : "true",
>>>   "DELAY_REBALANCE_TIME" : "30000",
>>>   "REBALANCE_MODE" : "FULL_AUTO",
>>>   "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>   "REPLICAS" : "1",
>>>   "STATE_MODEL_DEF_REF" : "OnlineOffline",
>>>   "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
>>> }
>>>
>>> Out of the 32 instances, we have 2 production test servers, e.g.
>>> 'server01' and 'server02'.
>>>
>>> Previously, if we restarted the application on 'server01' in order to
>>> deploy some test code, Helix would move one of the partitions over to
>>> another host, and when 'server01' came back online the partition
>>> would be rebalanced back. Currently we are not seeing this behavior;
>>> the partition stays with the other host and does not go back. While
>>> this is within the constraints of the max partitions, we're confused
>>> as to why this might happen now.
>>>
>>> Have there been any changes to WAGED that might account for this? The
>>> release notes mentioned that both 1.0.2 and 1.0.3 made some changes
>>> to Helix.
>>>
>>> Thanks,
>>> - Phong X. Nguyen
>>
>>
>> --
>> Junkai Xue
>

--
Junkai Xue
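[Editor's note: the verification Junkai suggests above can be sketched as follows. This parses ClusterConfig JSON as it might be read from the ZNode and checks whether the restarted instance is still listed in the disabled-instances map field. The map-field key "DISABLED_INSTANCES", the instance name, and the sample record are assumptions for illustration, not taken from the thread.]

```python
import json

# Hypothetical ClusterConfig content as read from
# /<cluster>/CONFIGS/CLUSTER/<cluster>. The DISABLED_INSTANCES map field
# and its entry are assumed for illustration.
cluster_config_json = """
{
  "id": "helix-ctrl",
  "simpleFields": { "HELIX_ENABLED": "true" },
  "mapFields": {
    "DISABLED_INSTANCES": {
      "server01_12000": "1654650000000"
    }
  },
  "listFields": {}
}
"""

def is_batch_disabled(config_json, instance_name):
    """Return True if instance_name still appears in the
    disabled-instances map field of the ClusterConfig."""
    config = json.loads(config_json)
    disabled = config.get("mapFields", {}).get("DISABLED_INSTANCES", {})
    return instance_name in disabled

# If the instance that came back online is still listed here, that would
# match the 1.0.3 batch enable/disable bug described in the thread.
print(is_batch_disabled(cluster_config_json, "server01_12000"))  # True
print(is_batch_disabled(cluster_config_json, "server02_12000"))  # False
```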
