BTW, have you set up the proper capacity in the InstanceConfig of the only instance?
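
To make the capacity question concrete, here is a rough back-of-the-envelope sketch in plain Java. It does not call Helix and it is not how WagedRebalancer actually computes assignments; it only checks whether capacity alone could explain missing assignments. It assumes the numbers quoted further down in this thread: an instance capacity of CPU=100 and MEMORY=100, a per-partition weight of CPU=10 and MEMORY=10, and a per-instance partition limit of 1 as that limit is described in this thread. The class and method names (`CapacityCheck`, `fitByCapacity`, `fitWithLimit`) are made up for illustration.

```java
import java.util.Map;

public class CapacityCheck {

    /**
     * How many partitions of the given weight fit on one instance when only
     * capacity is considered. A capacity key missing from the instance map is
     * treated as 0 (an assumption made for this sketch).
     */
    public static int fitByCapacity(Map<String, Integer> capacity, Map<String, Integer> weight) {
        int fit = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> entry : weight.entrySet()) {
            int w = entry.getValue();
            if (w <= 0) {
                continue; // a zero weight never limits placement
            }
            int cap = capacity.getOrDefault(entry.getKey(), 0);
            fit = Math.min(fit, cap / w); // the scarcest dimension decides
        }
        return fit;
    }

    /** Same as above, but also capped by a per-instance partition limit (values <= 0 mean "no limit"). */
    public static int fitWithLimit(Map<String, Integer> capacity, Map<String, Integer> weight, int limit) {
        int byCapacity = fitByCapacity(capacity, weight);
        return limit > 0 ? Math.min(byCapacity, limit) : byCapacity;
    }

    public static void main(String[] args) {
        Map<String, Integer> capacity = Map.of("CPU", 100, "MEMORY", 100);
        Map<String, Integer> weight = Map.of("CPU", 10, "MEMORY", 10);

        System.out.println(fitByCapacity(capacity, weight));   // prints 10
        System.out.println(fitWithLimit(capacity, weight, 1)); // prints 1
    }
}
```

With these numbers, capacity alone leaves room for ten such partitions on the single instance, so a handful of one-partition resources should fit unless the instance's own InstanceConfig declares a smaller capacity than the cluster default.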

Best,
Junkai

On Sat, Jun 18, 2022 at 7:10 PM Junkai Xue <[email protected]> wrote:

> Interesting. Is this reproducible? We can try it with your data.
>
> Best,
>
> Junkai
>
> On Sat, Jun 18, 2022 at 4:31 AM Grainier Perera <[email protected]>
> wrote:
>
>> Hi Junkai,
>>
>> I tried removing `MAX_PARTITIONS_PER_INSTANCE`, but the behavior is still
>> the same. What's weird is that when I add a few resources, some of them
>> still do not reach the `ONLINE` state. In the sample below, only the 2nd
>> and 4th resources have proper `mapFields`, whereas the 1st and 3rd
>> resources don't seem to have any mapping (all of them have the same
>> IdealState). However, after a restart, this can change so that 1 & 3
>> become `ONLINE` and 2 & 4 may lose their mapping. The pattern remains,
>> and I cannot understand why.
>>
>>
>> *ExternalView for _mm:root:_system:cron1:*
>> {
>>   "id" : "_mm:root:_system:cron1",
>>   "simpleFields" : {
>>     "BUCKET_SIZE" : "0",
>>     "DELAY_REBALANCE_ENABLED" : "true",
>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>     "NUM_PARTITIONS" : "1",
>>     "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>     "REBALANCE_DELAY" : "10000",
>>     "REBALANCE_MODE" : "FULL_AUTO",
>>     "REPLICAS" : "1",
>>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>>   },
>>   *"mapFields" : { },*
>>   "listFields" : { }
>> }
>>
>>
>> *ExternalView for _mm:root:_system:cron2:*
>> {
>>   "id" : "_mm:root:_system:cron2",
>>   "simpleFields" : {
>>     "BUCKET_SIZE" : "0",
>>     "DELAY_REBALANCE_ENABLED" : "true",
>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>     "NUM_PARTITIONS" : "1",
>>     "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>     "REBALANCE_DELAY" : "10000",
>>     "REBALANCE_MODE" : "FULL_AUTO",
>>     "REPLICAS" : "1",
>>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>>   },
>>   *"mapFields" : {*
>>   *  "_mm:root:_system:cron2_0" : {*
>>   *    "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
>>   *  }*
>>   *},*
>>   "listFields" : { }
>> }
>>
>>
>> *ExternalView for _mm:root:_system:cron3:*
>> {
>>   "id" : "_mm:root:_system:cron3",
>>   "simpleFields" : {
>>     "BUCKET_SIZE" : "0",
>>     "DELAY_REBALANCE_ENABLED" : "true",
>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>     "NUM_PARTITIONS" : "1",
>>     "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>     "REBALANCE_DELAY" : "10000",
>>     "REBALANCE_MODE" : "FULL_AUTO",
>>     "REPLICAS" : "1",
>>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>>   },
>>   *"mapFields" : { },*
>>   "listFields" : { }
>> }
>>
>>
>> *ExternalView for _mm:root:_system:cron4:*
>> {
>>   "id" : "_mm:root:_system:cron4",
>>   "simpleFields" : {
>>     "BUCKET_SIZE" : "0",
>>     "DELAY_REBALANCE_ENABLED" : "true",
>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>     "NUM_PARTITIONS" : "1",
>>     "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>     "REBALANCE_DELAY" : "10000",
>>     "REBALANCE_MODE" : "FULL_AUTO",
>>     "REPLICAS" : "1",
>>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>>   },
>>   *"mapFields" : {*
>>   *  "_mm:root:_system:cron4_0" : {*
>>   *    "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
>>   *  }*
>>   *},*
>>   "listFields" : { }
>> }
>>
>> Thanks,
>> Grainier Perera.
>>
>>
>> On Sat, 18 Jun 2022 at 13:21, Junkai Xue <[email protected]> wrote:
>>
>>> Then most likely it is caused by this config entry:
>>> "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>> Usually, we never set this config. It restricts the assignment per
>>> instance. So now that you already have one partition (3_0) assigned,
>>> no other partition can be assigned.
>>>
>>> So either removing this config entry or adding more instances
>>> may help.
>>>
>>> Please let us know if you have further questions.
>>>
>>> best,
>>>
>>> Junkai
>>>
>>> On Fri, Jun 17, 2022 at 11:38 PM Grainier Perera <[email protected]>
>>> wrote:
>>>
>>>> Hi Junkai,
>>>>
>>>> - Correct. I haven't added any rack-aware information.
>>>> - I'm connecting 1 instance at startup and then expanding on demand
>>>>   (I've set ALLOW_PARTICIPANT_AUTO_JOIN to true).
>>>> - I've checked the live instances and other znodes in ZooKeeper.
>>>>   Everything looks OK, except that
>>>>   /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron2 has empty
>>>>   `mapFields` while /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron3
>>>>   has `mapFields` with an ONLINE record. I still cannot understand why,
>>>>   or what I'm doing wrong :(
>>>>
>>>>
>>>> *[zk: localhost:2181(CONNECTED) 18] get /C8CEPCluster/CONFIGS/CLUSTER/C8CEPCluster*
>>>> {
>>>>   "id" : "C8CEPCluster",
>>>>   "simpleFields" : {
>>>>     "allowParticipantAutoJoin" : "true"
>>>>   },
>>>>   "mapFields" : {
>>>>     "DEFAULT_INSTANCE_CAPACITY_MAP" : {
>>>>       "MEMORY" : "100",
>>>>       "CPU" : "100"
>>>>     },
>>>>     "DEFAULT_PARTITION_WEIGHT_MAP" : {
>>>>       "MEMORY" : "5",
>>>>       "CPU" : "5"
>>>>     }
>>>>   },
>>>>   "listFields" : {
>>>>     "INSTANCE_CAPACITY_KEYS" : [ "CPU", "MEMORY" ]
>>>>   }
>>>> }
>>>>
>>>> *[zk: localhost:2181(CONNECTED) 8] get /C8CEPCluster/LIVEINSTANCES/c8cep-0.c8cep.c8.svc.cluster.local_12000*
>>>> {
>>>>   "id" : "c8cep-0.c8cep.c8.svc.cluster.local_12000",
>>>>   "simpleFields" : {
>>>>     "CURRENT_TASK_THREAD_POOL_SIZE" : "40",
>>>>     "HELIX_VERSION" : "1.0.4",
>>>>     "LIVE_INSTANCE" : "[email protected]",
>>>>     "SESSION_ID" : "106a30539a8003e"
>>>>   },
>>>>   "mapFields" : { },
>>>>   "listFields" : { }
>>>> }
>>>>
>>>> [zk: localhost:2181(CONNECTED) 26] get /C8CEPCluster/CONFIGS/RESOURCE/_mm:root:_system:cron2
>>>> {
>>>>   "id" : "_mm:root:_system:cron2",
>>>>   "simpleFields" : { },
>>>>   "mapFields" : {
>>>>     "PARTITION_CAPACITY_MAP" : {
>>>>       "DEFAULT" : "{\"CPU\":\"10\",\"MEMORY\":\"10\"}"
>>>>     }
>>>>   },
>>>>   "listFields" : { }
>>>> }
>>>>
>>>> *[zk: localhost:2181(CONNECTED) 27] get /C8CEPCluster/CONFIGS/RESOURCE/_mm:root:_system:cron3*
>>>> {
>>>>   "id" : "_mm:root:_system:cron3",
>>>>   "simpleFields" : { },
>>>>   "mapFields" : {
>>>>     "PARTITION_CAPACITY_MAP" : {
>>>>       "DEFAULT" : "{\"CPU\":\"10\",\"MEMORY\":\"10\"}"
>>>>     }
>>>>   },
>>>>   "listFields" : { }
>>>> }
>>>>
>>>> *[zk: localhost:2181(CONNECTED) 38] get /C8CEPCluster/IDEALSTATES/_mm:root:_system:cron2*
>>>> {
>>>>   "id" : "_mm:root:_system:cron2",
>>>>   "simpleFields" : {
>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>     "NUM_PARTITIONS" : "1",
>>>>     "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>     "REBALANCE_DELAY" : "10000",
>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>     "REPLICAS" : "1",
>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>   },
>>>>   "mapFields" : {
>>>>     "_mm:root:_system:cron2_0" : { }
>>>>   },
>>>>   "listFields" : {
>>>>     "_mm:root:_system:cron2_0" : [ ]
>>>>   }
>>>> }
>>>>
>>>> *[zk: localhost:2181(CONNECTED) 39] get /C8CEPCluster/IDEALSTATES/_mm:root:_system:cron3*
>>>> {
>>>>   "id" : "_mm:root:_system:cron3",
>>>>   "simpleFields" : {
>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>     "NUM_PARTITIONS" : "1",
>>>>     "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>     "REBALANCE_DELAY" : "10000",
>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>     "REPLICAS" : "1",
>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>   },
>>>>   "mapFields" : {
>>>>     "_mm:root:_system:cron3_0" : { }
>>>>   },
>>>>   "listFields" : {
>>>>     "_mm:root:_system:cron3_0" : [ ]
>>>>   }
>>>> }
>>>>
>>>> *[zk: localhost:2181(CONNECTED) 42] get /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron2*
>>>> {
>>>>   "id" : "_mm:root:_system:cron2",
>>>>   "simpleFields" : {
>>>>     "BUCKET_SIZE" : "0",
>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>     "NUM_PARTITIONS" : "1",
>>>>     "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>     "REBALANCE_DELAY" : "10000",
>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>     "REPLICAS" : "1",
>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>   },
>>>>   *"mapFields" : { },*
>>>>   "listFields" : { }
>>>> }
>>>>
>>>> *[zk: localhost:2181(CONNECTED) 43] get /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron3*
>>>> {
>>>>   "id" : "_mm:root:_system:cron3",
>>>>   "simpleFields" : {
>>>>     "BUCKET_SIZE" : "0",
>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>     "NUM_PARTITIONS" : "1",
>>>>     "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>     "REBALANCE_DELAY" : "10000",
>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>     "REPLICAS" : "1",
>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>   },
>>>>   *"mapFields" : {*
>>>>   *  "_mm:root:_system:cron3_0" : {*
>>>>   *    "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
>>>>   *  }*
>>>>   *},*
>>>>   "listFields" : { }
>>>> }
>>>>
>>>> Thank you.
>>>> Grainier Perera.
>>>>
>>>>
>>>> On Sat, 18 Jun 2022 at 10:45, Junkai Xue <[email protected]> wrote:
>>>>
>>>>> OK. So you don't put any rack-aware information. Then how many instances
>>>>> do you have connecting to that cluster? Please double check the live
>>>>> instances in ZooKeeper as well.
>>>>>
>>>>> Best,
>>>>>
>>>>> Junkai
>>>>>
>>>>> On Fri, Jun 17, 2022 at 10:01 PM Grainier Perera <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Junkai,
>>>>>>
>>>>>> I've added cluster init code to the gist [1]. Apart from that,
>>>>>> ClusterConfig is configured like this:
>>>>>>
>>>>>>     ClusterConfig clusterConfig = configAccessor.getClusterConfig(CLUSTER_NAME);
>>>>>>     // Configuring the capacity keys in the Cluster Config. For example, MEMORY.
>>>>>>     clusterConfig.setInstanceCapacityKeys(INSTANCE_CAPACITY_KEYS);
>>>>>>     // Configuring the instance capacity in the Instance Config. For example, MEMORY = 100.
>>>>>>     clusterConfig.setDefaultInstanceCapacityMap(INSTANCE_CAPACITY);
>>>>>>     // Configuring the partition weight in the Resource Config. For example, MEMORY = 5.
>>>>>>     clusterConfig.setDefaultPartitionWeightMap(DEFAULT_RESOURCE_USAGE);
>>>>>>     configAccessor.setClusterConfig(CLUSTER_NAME, clusterConfig);
>>>>>>
>>>>>> [1]
>>>>>> https://gist.github.com/grainier/aa1c0b279ea99f88d74c1e94d79f5cdb#file-clustersetup-java
>>>>>>
>>>>>> Thanks,
>>>>>> Grainier Perera.
>>>>>>
>>>>>>
>>>>>> On Sat, 18 Jun 2022 at 10:00, Junkai Xue <[email protected]> wrote:
>>>>>>
>>>>>>> Could you please share your cluster config as well?
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Junkai
>>>>>>>
>>>>>>> On Fri, Jun 17, 2022 at 8:24 PM Grainier Perera <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Devs,
>>>>>>>>
>>>>>>>> I'm trying to add several resources to the cluster using the
>>>>>>>> following configurations [1]. However, only some will become `ONLINE`.
>>>>>>>> What could be the reason? Is there a way to guarantee every resource
>>>>>>>> will become `ONLINE` if WAGED capacity constraints are met?
>>>>>>>>
>>>>>>>> You can see that with the same IdealState, "_mm:root:_system:cron3" has
>>>>>>>> mapFields and is ONLINE, and "_mm:root:_system:cron2" is not.
>>>>>>>> Furthermore, I see this behavior more often when the replica count is
>>>>>>>> set to 1.
>>>>>>>>
>>>>>>>> ResourceInfo:
>>>>>>>> 1. "_mm:root:_system:cron2"
>>>>>>>>
>>>>>>>> IdealState for _mm:root:_system:cron2:
>>>>>>>> {
>>>>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>>>>   "simpleFields" : {
>>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>>     "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>>     "REPLICAS" : "1",
>>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>>   },
>>>>>>>>   "mapFields" : {
>>>>>>>>     "_mm:root:_system:cron2_0" : { }
>>>>>>>>   },
>>>>>>>>   "listFields" : {
>>>>>>>>     "_mm:root:_system:cron2_0" : [ ]
>>>>>>>>   }
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>> ExternalView for _mm:root:_system:cron2:
>>>>>>>> {
>>>>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>>>>   "simpleFields" : {
>>>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>>     "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>>     "REPLICAS" : "1",
>>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>>   },
>>>>>>>>   *"mapFields" : { },*
>>>>>>>>   "listFields" : { }
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>> 2. "_mm:root:_system:cron3"
>>>>>>>>
>>>>>>>> IdealState for _mm:root:_system:cron3:
>>>>>>>> {
>>>>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>>>>   "simpleFields" : {
>>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>>     "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>>     "REPLICAS" : "1",
>>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>>   },
>>>>>>>>   "mapFields" : {
>>>>>>>>     "_mm:root:_system:cron3_0" : { }
>>>>>>>>   },
>>>>>>>>   "listFields" : {
>>>>>>>>     "_mm:root:_system:cron3_0" : [ ]
>>>>>>>>   }
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>> ExternalView for _mm:root:_system:cron3:
>>>>>>>> {
>>>>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>>>>   "simpleFields" : {
>>>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>>     "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>>     "REPLICAS" : "1",
>>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>>   },
>>>>>>>>   *"mapFields" : {*
>>>>>>>>   *  "_mm:root:_system:cron3_0" : {*
>>>>>>>>   *    "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
>>>>>>>>   *  }*
>>>>>>>>   *},*
>>>>>>>>   "listFields" : { }
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>> [1]:
>>>>>>>> https://gist.github.com/grainier/aa1c0b279ea99f88d74c1e94d79f5cdb
>>>>>>>>
>>>>>>>> Thank you.
>>>>>>>> Grainier Perera.
>>>>>>>>
>>>>>>>
>>>
>>> --
>>> Junkai Xue
>>>
>>
