Interesting. Is this reproducible? If so, we can try to reproduce it with your data.

Best,

Junkai

On Sat, Jun 18, 2022 at 4:31 AM Grainier Perera <[email protected]> wrote:

> Hi Junkai,
>
> I tried removing `MAX_PARTITIONS_PER_INSTANCE`, but the behavior is still the same.
> What's weird is that when I add a few resources, some of them still don't
> reach the `ONLINE` state. In the sample below, only the 2nd and 4th
> resources have proper `mapFields`, whereas the 1st and 3rd have no mapping
> at all (all of them have the same IdealState). After a restart this can
> flip, so that 1 & 3 become `ONLINE` and 2 & 4 lose their mapping, but the
> pattern remains, and I cannot understand why.
>
>
> *ExternalView for _mm:root:_system:cron1:*
> {
>   "id" : "_mm:root:_system:cron1",
>   "simpleFields" : {
>     "BUCKET_SIZE" : "0",
>     "DELAY_REBALANCE_ENABLED" : "true",
>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>     "NUM_PARTITIONS" : "1",
>     "REBALANCER_CLASS_NAME" :
> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>     "REBALANCE_DELAY" : "10000",
>     "REBALANCE_MODE" : "FULL_AUTO",
>     "REPLICAS" : "1",
>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>   },
>   *"mapFields" : { },*
>   "listFields" : { }
> }
>
>
> *ExternalView for _mm:root:_system:cron2:*
> {
>   "id" : "_mm:root:_system:cron2",
>   "simpleFields" : {
>     "BUCKET_SIZE" : "0",
>     "DELAY_REBALANCE_ENABLED" : "true",
>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>     "NUM_PARTITIONS" : "1",
>     "REBALANCER_CLASS_NAME" :
> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>     "REBALANCE_DELAY" : "10000",
>     "REBALANCE_MODE" : "FULL_AUTO",
>     "REPLICAS" : "1",
>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>   },
>   *"mapFields" : {*
> *    "_mm:root:_system:cron2_0" : {*
> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
> *    }*
> *  },*
>   "listFields" : { }
> }
>
>
> *ExternalView for _mm:root:_system:cron3:*
> {
>   "id" : "_mm:root:_system:cron3",
>   "simpleFields" : {
>     "BUCKET_SIZE" : "0",
>     "DELAY_REBALANCE_ENABLED" : "true",
>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>     "NUM_PARTITIONS" : "1",
>     "REBALANCER_CLASS_NAME" :
> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>     "REBALANCE_DELAY" : "10000",
>     "REBALANCE_MODE" : "FULL_AUTO",
>     "REPLICAS" : "1",
>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>   },
>   *"mapFields" : { },*
>   "listFields" : { }
> }
>
>
> *ExternalView for _mm:root:_system:cron4:*
> {
>   "id" : "_mm:root:_system:cron4",
>   "simpleFields" : {
>     "BUCKET_SIZE" : "0",
>     "DELAY_REBALANCE_ENABLED" : "true",
>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>     "NUM_PARTITIONS" : "1",
>     "REBALANCER_CLASS_NAME" :
> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>     "REBALANCE_DELAY" : "10000",
>     "REBALANCE_MODE" : "FULL_AUTO",
>     "REPLICAS" : "1",
>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>   },
>   *"mapFields" : {*
> *    "_mm:root:_system:cron4_0" : {*
> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
> *    }*
> *  },*
>   "listFields" : { }
> }
>
> Thanks,
> Grainier Perera.
>
>
> On Sat, 18 Jun 2022 at 13:21, Junkai Xue <[email protected]> wrote:
>
>> Then most likely it is caused by this config entry:
>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>> We usually never set this config. It restricts how many partitions can be
>> assigned to an instance. Since you already have one partition (cron3_0)
>> assigned to your only instance, no other partition can be assigned.
>>
>> So either removing this config entry or adding more instances may help.
>>
>> Please let us know if you have further questions.
>>
>> best,
>>
>> Junkai
>>
>> On Fri, Jun 17, 2022 at 11:38 PM Grainier Perera <[email protected]>
>> wrote:
>>
>>> Hi Junkai,
>>>
>>> - Correct. I haven't added any rack-aware information.
>>> - I'm connecting 1 instance at the startup and then expanding on-demand
>>> (I've set ALLOW_PARTICIPANT_AUTO_JOIN to true).
>>> - I've checked the live instances and other znodes in ZooKeeper.
>>> Everything looks OK, except that
>>> /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron2 has empty `mapFields`
>>> while /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron3 has `mapFields`
>>> with an `ONLINE` record. I still cannot understand why, or what I'm doing
>>> wrong :(
>>>
>>>
>>> [zk: localhost:2181(CONNECTED) 18] get /C8CEPCluster/CONFIGS/CLUSTER/C8CEPCluster
>>> {
>>>   "id" : "C8CEPCluster",
>>>   "simpleFields" : {
>>>     "allowParticipantAutoJoin" : "true"
>>>   },
>>>   "mapFields" : {
>>>     "DEFAULT_INSTANCE_CAPACITY_MAP" : {
>>>       "MEMORY" : "100",
>>>       "CPU" : "100"
>>>     },
>>>     "DEFAULT_PARTITION_WEIGHT_MAP" : {
>>>       "MEMORY" : "5",
>>>       "CPU" : "5"
>>>     }
>>>   },
>>>   "listFields" : {
>>>     "INSTANCE_CAPACITY_KEYS" : [ "CPU", "MEMORY" ]
>>>   }
>>> }
>>>
>>> [zk: localhost:2181(CONNECTED) 8] get /C8CEPCluster/LIVEINSTANCES/c8cep-0.c8cep.c8.svc.cluster.local_12000
>>> {
>>>   "id" : "c8cep-0.c8cep.c8.svc.cluster.local_12000",
>>>   "simpleFields" : {
>>>     "CURRENT_TASK_THREAD_POOL_SIZE" : "40",
>>>     "HELIX_VERSION" : "1.0.4",
>>>     "LIVE_INSTANCE" : "[email protected]",
>>>     "SESSION_ID" : "106a30539a8003e"
>>>   },
>>>   "mapFields" : { },
>>>   "listFields" : { }
>>> }
>>>
>>> [zk: localhost:2181(CONNECTED) 26] get /C8CEPCluster/CONFIGS/RESOURCE/_mm:root:_system:cron2
>>> {
>>>   "id" : "_mm:root:_system:cron2",
>>>   "simpleFields" : { },
>>>   "mapFields" : {
>>>     "PARTITION_CAPACITY_MAP" : {
>>>       "DEFAULT" : "{\"CPU\":\"10\",\"MEMORY\":\"10\"}"
>>>     }
>>>   },
>>>   "listFields" : { }
>>> }
>>>
>>> [zk: localhost:2181(CONNECTED) 27] get /C8CEPCluster/CONFIGS/RESOURCE/_mm:root:_system:cron3
>>> {
>>>   "id" : "_mm:root:_system:cron3",
>>>   "simpleFields" : { },
>>>   "mapFields" : {
>>>     "PARTITION_CAPACITY_MAP" : {
>>>       "DEFAULT" : "{\"CPU\":\"10\",\"MEMORY\":\"10\"}"
>>>     }
>>>   },
>>>   "listFields" : { }
>>> }
>>>
>>> [zk: localhost:2181(CONNECTED) 38] get /C8CEPCluster/IDEALSTATES/_mm:root:_system:cron2
>>> {
>>>   "id" : "_mm:root:_system:cron2",
>>>   "simpleFields" : {
>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>     "NUM_PARTITIONS" : "1",
>>>     "REBALANCER_CLASS_NAME" :
>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>     "REBALANCE_DELAY" : "10000",
>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>     "REPLICAS" : "1",
>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>   },
>>>   "mapFields" : {
>>>     "_mm:root:_system:cron2_0" : { }
>>>   },
>>>   "listFields" : {
>>>     "_mm:root:_system:cron2_0" : [ ]
>>>   }
>>> }
>>>
>>> [zk: localhost:2181(CONNECTED) 39] get /C8CEPCluster/IDEALSTATES/_mm:root:_system:cron3
>>> {
>>>   "id" : "_mm:root:_system:cron3",
>>>   "simpleFields" : {
>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>     "NUM_PARTITIONS" : "1",
>>>     "REBALANCER_CLASS_NAME" :
>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>     "REBALANCE_DELAY" : "10000",
>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>     "REPLICAS" : "1",
>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>   },
>>>   "mapFields" : {
>>>     "_mm:root:_system:cron3_0" : { }
>>>   },
>>>   "listFields" : {
>>>     "_mm:root:_system:cron3_0" : [ ]
>>>   }
>>> }
>>>
>>> [zk: localhost:2181(CONNECTED) 42] get /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron2
>>> {
>>>   "id" : "_mm:root:_system:cron2",
>>>   "simpleFields" : {
>>>     "BUCKET_SIZE" : "0",
>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>     "NUM_PARTITIONS" : "1",
>>>     "REBALANCER_CLASS_NAME" :
>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>     "REBALANCE_DELAY" : "10000",
>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>     "REPLICAS" : "1",
>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>   },
>>>   *"mapFields" : { },*
>>>   "listFields" : { }
>>> }
>>>
>>> [zk: localhost:2181(CONNECTED) 43] get /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron3
>>> {
>>>   "id" : "_mm:root:_system:cron3",
>>>   "simpleFields" : {
>>>     "BUCKET_SIZE" : "0",
>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>     "NUM_PARTITIONS" : "1",
>>>     "REBALANCER_CLASS_NAME" :
>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>     "REBALANCE_DELAY" : "10000",
>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>     "REPLICAS" : "1",
>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>   },
>>>   *"mapFields" : {*
>>> *    "_mm:root:_system:cron3_0" : {*
>>> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
>>> *    }*
>>> *  },*
>>>   "listFields" : { }
>>> }
>>>
>>> Thank you.
>>> Grainier Perera.
>>>
>>>
>>> On Sat, 18 Jun 2022 at 10:45, Junkai Xue <[email protected]> wrote:
>>>
>>>> OK. So you haven't set any rack-aware information. How many instances
>>>> do you have connected to that cluster? Please double-check the live
>>>> instances in ZooKeeper as well.
>>>>
>>>> Best,
>>>>
>>>> Junkai
>>>>
>>>> On Fri, Jun 17, 2022 at 10:01 PM Grainier Perera <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Junkai,
>>>>>
>>>>> I've added the cluster init code to the gist [1]. Apart from that,
>>>>> the ClusterConfig is configured like this:
>>>>>
>>>>>     ClusterConfig clusterConfig = configAccessor.getClusterConfig(CLUSTER_NAME);
>>>>>     // Capacity keys tracked by WAGED, set in the ClusterConfig (e.g. CPU, MEMORY).
>>>>>     clusterConfig.setInstanceCapacityKeys(INSTANCE_CAPACITY_KEYS);
>>>>>     // Cluster-wide default capacity of each instance (e.g. MEMORY = 100).
>>>>>     clusterConfig.setDefaultInstanceCapacityMap(INSTANCE_CAPACITY);
>>>>>     // Cluster-wide default weight of each partition (e.g. MEMORY = 5).
>>>>>     clusterConfig.setDefaultPartitionWeightMap(DEFAULT_RESOURCE_USAGE);
>>>>>     // Persist the updated ClusterConfig.
>>>>>     configAccessor.setClusterConfig(CLUSTER_NAME, clusterConfig);
>>>>>
>>>>> [1]
>>>>> https://gist.github.com/grainier/aa1c0b279ea99f88d74c1e94d79f5cdb#file-clustersetup-java
>>>>>
>>>>> Thanks,
>>>>> Grainier Perera.
>>>>>
>>>>>
>>>>> On Sat, 18 Jun 2022 at 10:00, Junkai Xue <[email protected]> wrote:
>>>>>
>>>>>> Could you please share your cluster config as well?
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Junkai
>>>>>>
>>>>>> On Fri, Jun 17, 2022 at 8:24 PM Grainier Perera <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Devs,
>>>>>>>
>>>>>>> I'm trying to add several resources to the cluster using the
>>>>>>> following configurations [1]. However, only some of them become `ONLINE`.
>>>>>>> What could be the reason? Is there a way to guarantee that every resource
>>>>>>> becomes `ONLINE` if the WAGED capacity constraints are met?
>>>>>>>
>>>>>>> As you can see, with the same IdealState, "_mm:root:_system:cron3" has
>>>>>>> mapFields and is ONLINE, while "_mm:root:_system:cron2" does not.
>>>>>>> Furthermore, I see this behavior more often when the replica count is
>>>>>>> set to 1.
>>>>>>>
>>>>>>> ResourceInfo:
>>>>>>> 1. "_mm:root:_system:cron2"
>>>>>>>
>>>>>>> IdealState for _mm:root:_system:cron2:
>>>>>>> {
>>>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>>>   "simpleFields" : {
>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>     "REPLICAS" : "1",
>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>   },
>>>>>>>   "mapFields" : {
>>>>>>>     "_mm:root:_system:cron2_0" : { }
>>>>>>>   },
>>>>>>>   "listFields" : {
>>>>>>>     "_mm:root:_system:cron2_0" : [ ]
>>>>>>>   }
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> ExternalView for _mm:root:_system:cron2:
>>>>>>> {
>>>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>>>   "simpleFields" : {
>>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>     "REPLICAS" : "1",
>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>   },
>>>>>>>   *"mapFields" : { },*
>>>>>>>   "listFields" : { }
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> 2. "_mm:root:_system:cron3"
>>>>>>>
>>>>>>> IdealState for _mm:root:_system:cron3:
>>>>>>> {
>>>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>>>   "simpleFields" : {
>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>     "REPLICAS" : "1",
>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>   },
>>>>>>>   "mapFields" : {
>>>>>>>     "_mm:root:_system:cron3_0" : { }
>>>>>>>   },
>>>>>>>   "listFields" : {
>>>>>>>     "_mm:root:_system:cron3_0" : [ ]
>>>>>>>   }
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> ExternalView for _mm:root:_system:cron3:
>>>>>>> {
>>>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>>>   "simpleFields" : {
>>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>     "REPLICAS" : "1",
>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>   },
>>>>>>>   *"mapFields" : {*
>>>>>>> *    "_mm:root:_system:cron3_0" : {*
>>>>>>> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
>>>>>>> *    }*
>>>>>>> *  },*
>>>>>>>   "listFields" : { }
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> [1]:
>>>>>>> https://gist.github.com/grainier/aa1c0b279ea99f88d74c1e94d79f5cdb
>>>>>>>
>>>>>>> Thank you.
>>>>>>> Grainier Perera.
>>>>>>>
>>>>>>
>>
>> --
>> Junkai Xue
>>
>
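
A quick sanity check of the capacity numbers quoted in this thread, as a minimal sketch. It only sums per-partition weights and compares them with a single instance's capacity; it is not the WAGED algorithm, and `CapacityCheck`, `totalDemand` and `fits` are hypothetical names, not Helix APIs. It assumes all four cron resources carry the 10/10 weight shown for cron2 and cron3.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Helix): compares summed partition
// weights against the capacity of a single instance.
class CapacityCheck {

    // Total demand per capacity key for `replicaCount` replicas of equal weight.
    static Map<String, Integer> totalDemand(Map<String, Integer> weightPerReplica, int replicaCount) {
        Map<String, Integer> total = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : weightPerReplica.entrySet()) {
            total.put(e.getKey(), e.getValue() * replicaCount);
        }
        return total;
    }

    // True when every demanded key exists in the capacity map and is not exceeded.
    static boolean fits(Map<String, Integer> capacity, Map<String, Integer> demand) {
        for (Map.Entry<String, Integer> e : demand.entrySet()) {
            Integer available = capacity.get(e.getKey());
            if (available == null || e.getValue() > available) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // DEFAULT_INSTANCE_CAPACITY_MAP from the cluster config in this thread.
        Map<String, Integer> capacity = new LinkedHashMap<>();
        capacity.put("CPU", 100);
        capacity.put("MEMORY", 100);

        // PARTITION_CAPACITY_MAP "DEFAULT" from the resource configs in this thread.
        Map<String, Integer> weight = new LinkedHashMap<>();
        weight.put("CPU", 10);
        weight.put("MEMORY", 10);

        // cron1..cron4: 4 resources x 1 partition x 1 replica (assumed equal weights).
        Map<String, Integer> demand = totalDemand(weight, 4);
        System.out.println("demand=" + demand + " fits=" + fits(capacity, demand));
    }
}
```

With these numbers the four replicas need 40 of the 100 units per key, so capacity alone would not explain the empty `mapFields`.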
