BTW, have you set up the proper capacity in the InstanceConfig of the only instance?
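
For reference, here is a rough sanity check of the capacity numbers quoted below, assuming the instance simply inherits DEFAULT_INSTANCE_CAPACITY_MAP (CPU = 100, MEMORY = 100) and each partition uses the PARTITION_CAPACITY_MAP weight (CPU = 10, MEMORY = 10). It is only a simplified model of a hard capacity constraint, not the WAGED rebalancer's actual algorithm, and the names CapacityCheck and maxPartitionsThatFit are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CapacityCheck {

    /**
     * Returns how many partitions of the given weight fit into the given
     * capacity. The most constrained capacity key decides the result.
     */
    static int maxPartitionsThatFit(Map<String, Integer> capacity,
                                    Map<String, Integer> weight) {
        int fit = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> entry : weight.entrySet()) {
            int w = entry.getValue();
            if (w <= 0) {
                continue; // a zero-weight key never limits placement
            }
            Integer cap = capacity.get(entry.getKey());
            if (cap == null) {
                // Assumption for this sketch only: a key with no declared
                // capacity means nothing can be placed.
                return 0;
            }
            fit = Math.min(fit, cap / w);
        }
        return fit;
    }

    public static void main(String[] args) {
        // Values taken from the cluster and resource configs in this thread.
        Map<String, Integer> capacity = new LinkedHashMap<>();
        capacity.put("CPU", 100);
        capacity.put("MEMORY", 100);

        Map<String, Integer> weight = new LinkedHashMap<>();
        weight.put("CPU", 10);
        weight.put("MEMORY", 10);

        int resources = 4; // cron1..cron4, one partition and one replica each
        int fit = maxPartitionsThatFit(capacity, weight);
        System.out.println("partitions that fit: " + fit);
        System.out.println("all " + resources + " resources fit: " + (fit >= resources));
    }
}
```

Under those assumptions all four single-partition resources would fit on the one instance, so it is worth confirming what the InstanceConfig actually contains.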

Best,

Junkai

On Sat, Jun 18, 2022 at 7:10 PM Junkai Xue <[email protected]> wrote:

> Interesting. Is this reproducible? We can have a try on your data.
>
> Best,
>
> Junkai
>
> On Sat, Jun 18, 2022 at 4:31 AM Grainier Perera <[email protected]>
> wrote:
>
>> Hi Junkai,
>>
>> I tried removing `MAX_PARTITIONS_PER_INSTANCE`, but the behavior is still the same.
>> What's weird is that when I add a few resources, some of them still do not
>> reach the `ONLINE` state. In the sample below, only the
>> 2nd and 4th resources have proper `mapFields`, whereas the 1st and 3rd
>> resources don't seem to have any mapping (all of them have the
>> same IdealState). However, after a restart this can flip: 1 & 3
>> become `ONLINE`, and 2 & 4 may lose their mapping. The pattern
>> remains, but I cannot understand why.
>>
>>
>> *ExternalView for _mm:root:_system:cron1:*
>> {
>>   "id" : "_mm:root:_system:cron1",
>>   "simpleFields" : {
>>     "BUCKET_SIZE" : "0",
>>     "DELAY_REBALANCE_ENABLED" : "true",
>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>     "NUM_PARTITIONS" : "1",
>>     "REBALANCER_CLASS_NAME" :
>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>     "REBALANCE_DELAY" : "10000",
>>     "REBALANCE_MODE" : "FULL_AUTO",
>>     "REPLICAS" : "1",
>>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>>   },
>>   *"mapFields" : { },*
>>   "listFields" : { }
>> }
>>
>>
>> *ExternalView for _mm:root:_system:cron2:*
>> {
>>   "id" : "_mm:root:_system:cron2",
>>   "simpleFields" : {
>>     "BUCKET_SIZE" : "0",
>>     "DELAY_REBALANCE_ENABLED" : "true",
>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>     "NUM_PARTITIONS" : "1",
>>     "REBALANCER_CLASS_NAME" :
>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>     "REBALANCE_DELAY" : "10000",
>>     "REBALANCE_MODE" : "FULL_AUTO",
>>     "REPLICAS" : "1",
>>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>>   },
>>   *"mapFields" : {*
>> *    "_mm:root:_system:cron2_0" : {*
>> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
>> *    }*
>> *  },*
>>   "listFields" : { }
>> }
>>
>>
>> *ExternalView for _mm:root:_system:cron3:*
>> {
>>   "id" : "_mm:root:_system:cron3",
>>   "simpleFields" : {
>>     "BUCKET_SIZE" : "0",
>>     "DELAY_REBALANCE_ENABLED" : "true",
>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>     "NUM_PARTITIONS" : "1",
>>     "REBALANCER_CLASS_NAME" :
>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>     "REBALANCE_DELAY" : "10000",
>>     "REBALANCE_MODE" : "FULL_AUTO",
>>     "REPLICAS" : "1",
>>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>>   },
>>   *"mapFields" : { },*
>>   "listFields" : { }
>> }
>>
>>
>> *ExternalView for _mm:root:_system:cron4:*
>> {
>>   "id" : "_mm:root:_system:cron4",
>>   "simpleFields" : {
>>     "BUCKET_SIZE" : "0",
>>     "DELAY_REBALANCE_ENABLED" : "true",
>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>     "NUM_PARTITIONS" : "1",
>>     "REBALANCER_CLASS_NAME" :
>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>     "REBALANCE_DELAY" : "10000",
>>     "REBALANCE_MODE" : "FULL_AUTO",
>>     "REPLICAS" : "1",
>>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>>   },
>>   *"mapFields" : {*
>> *    "_mm:root:_system:cron4_0" : {*
>> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
>> *    }*
>> *  },*
>>   "listFields" : { }
>> }
>>
>> Thanks,
>> Grainier Perera.
>>
>>
>> On Sat, 18 Jun 2022 at 13:21, Junkai Xue <[email protected]> wrote:
>>
>>> Then most likely it is caused by this config entry:
>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>> Usually, we never set this config. It restricts how many partitions can be
>>> assigned to an instance. You already have one partition, 3_0, assigned, so
>>> no other partition can be assigned.
>>>
>>> So either removing this config entry or adding more instances
>>> may help.
>>>
>>> Please let us know if you have further questions.
>>>
>>> best,
>>>
>>> Junkai
>>>
>>> On Fri, Jun 17, 2022 at 11:38 PM Grainier Perera <[email protected]>
>>> wrote:
>>>
>>>> Hi Junkai,
>>>>
>>>> - Correct. I haven't added any rack-aware information.
>>>> - I'm connecting 1 instance at the startup and then expanding on-demand
>>>> (I've set ALLOW_PARTICIPANT_AUTO_JOIN to true).
>>>> - I've checked the live instances and other znodes in ZooKeeper.
>>>> Everything looks OK, except that
>>>> /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron2 has empty
>>>> `mapFields` while /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron3
>>>> has `mapFields` with an ONLINE record. I still cannot understand why, or
>>>> what I'm doing wrong :(
>>>>
>>>>
>>>> *[zk: localhost:2181(CONNECTED) 18] get /C8CEPCluster/CONFIGS/CLUSTER/C8CEPCluster*
>>>> {
>>>>   "id" : "C8CEPCluster",
>>>>   "simpleFields" : {
>>>>     "allowParticipantAutoJoin" : "true"
>>>>   },
>>>>   "mapFields" : {
>>>>     "DEFAULT_INSTANCE_CAPACITY_MAP" : {
>>>>       "MEMORY" : "100",
>>>>       "CPU" : "100"
>>>>     },
>>>>     "DEFAULT_PARTITION_WEIGHT_MAP" : {
>>>>       "MEMORY" : "5",
>>>>       "CPU" : "5"
>>>>     }
>>>>   },
>>>>   "listFields" : {
>>>>     "INSTANCE_CAPACITY_KEYS" : [ "CPU", "MEMORY" ]
>>>>   }
>>>> }
>>>>
>>>> *[zk: localhost:2181(CONNECTED) 8] get /C8CEPCluster/LIVEINSTANCES/c8cep-0.c8cep.c8.svc.cluster.local_12000*
>>>> {
>>>>   "id" : "c8cep-0.c8cep.c8.svc.cluster.local_12000",
>>>>   "simpleFields" : {
>>>>     "CURRENT_TASK_THREAD_POOL_SIZE" : "40",
>>>>     "HELIX_VERSION" : "1.0.4",
>>>>     "LIVE_INSTANCE" : "[email protected]",
>>>>     "SESSION_ID" : "106a30539a8003e"
>>>>   },
>>>>   "mapFields" : { },
>>>>   "listFields" : { }
>>>> }
>>>>
>>>> *[zk: localhost:2181(CONNECTED) 26] get /C8CEPCluster/CONFIGS/RESOURCE/_mm:root:_system:cron2*
>>>> {
>>>>   "id" : "_mm:root:_system:cron2",
>>>>   "simpleFields" : { },
>>>>   "mapFields" : {
>>>>     "PARTITION_CAPACITY_MAP" : {
>>>>       "DEFAULT" : "{\"CPU\":\"10\",\"MEMORY\":\"10\"}"
>>>>     }
>>>>   },
>>>>   "listFields" : { }
>>>> }
>>>>
>>>> *[zk: localhost:2181(CONNECTED) 27] get /C8CEPCluster/CONFIGS/RESOURCE/_mm:root:_system:cron3*
>>>> {
>>>>   "id" : "_mm:root:_system:cron3",
>>>>   "simpleFields" : { },
>>>>   "mapFields" : {
>>>>     "PARTITION_CAPACITY_MAP" : {
>>>>       "DEFAULT" : "{\"CPU\":\"10\",\"MEMORY\":\"10\"}"
>>>>     }
>>>>   },
>>>>   "listFields" : { }
>>>> }
>>>>
>>>> *[zk: localhost:2181(CONNECTED) 38] get /C8CEPCluster/IDEALSTATES/_mm:root:_system:cron2*
>>>> {
>>>>   "id" : "_mm:root:_system:cron2",
>>>>   "simpleFields" : {
>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>     "NUM_PARTITIONS" : "1",
>>>>     "REBALANCER_CLASS_NAME" :
>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>     "REBALANCE_DELAY" : "10000",
>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>     "REPLICAS" : "1",
>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>   },
>>>>   "mapFields" : {
>>>>     "_mm:root:_system:cron2_0" : { }
>>>>   },
>>>>   "listFields" : {
>>>>     "_mm:root:_system:cron2_0" : [ ]
>>>>   }
>>>> }
>>>>
>>>> *[zk: localhost:2181(CONNECTED) 39] get /C8CEPCluster/IDEALSTATES/_mm:root:_system:cron3*
>>>> {
>>>>   "id" : "_mm:root:_system:cron3",
>>>>   "simpleFields" : {
>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>     "NUM_PARTITIONS" : "1",
>>>>     "REBALANCER_CLASS_NAME" :
>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>     "REBALANCE_DELAY" : "10000",
>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>     "REPLICAS" : "1",
>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>   },
>>>>   "mapFields" : {
>>>>     "_mm:root:_system:cron3_0" : { }
>>>>   },
>>>>   "listFields" : {
>>>>     "_mm:root:_system:cron3_0" : [ ]
>>>>   }
>>>> }
>>>>
>>>> *[zk: localhost:2181(CONNECTED) 42] get /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron2*
>>>> {
>>>>   "id" : "_mm:root:_system:cron2",
>>>>   "simpleFields" : {
>>>>     "BUCKET_SIZE" : "0",
>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>     "NUM_PARTITIONS" : "1",
>>>>     "REBALANCER_CLASS_NAME" :
>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>     "REBALANCE_DELAY" : "10000",
>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>     "REPLICAS" : "1",
>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>   },
>>>>   *"mapFields" : { },*
>>>>   "listFields" : { }
>>>> }
>>>>
>>>> *[zk: localhost:2181(CONNECTED) 43] get /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron3*
>>>> {
>>>>   "id" : "_mm:root:_system:cron3",
>>>>   "simpleFields" : {
>>>>     "BUCKET_SIZE" : "0",
>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>     "NUM_PARTITIONS" : "1",
>>>>     "REBALANCER_CLASS_NAME" :
>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>     "REBALANCE_DELAY" : "10000",
>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>     "REPLICAS" : "1",
>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>   },
>>>>   *"mapFields" : {*
>>>> *    "_mm:root:_system:cron3_0" : {*
>>>> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
>>>> *    }*
>>>> *  },*
>>>>   "listFields" : { }
>>>> }
>>>>
>>>> Thank you.
>>>> Grainier Perera.
>>>>
>>>>
>>>> On Sat, 18 Jun 2022 at 10:45, Junkai Xue <[email protected]> wrote:
>>>>
>>>>> OK. So you haven't configured any rack-aware information. Then how many
>>>>> instances do you have connected to that cluster? Please double-check the
>>>>> live instances in ZooKeeper as well.
>>>>>
>>>>> Best,
>>>>>
>>>>> Junkai
>>>>>
>>>>> On Fri, Jun 17, 2022 at 10:01 PM Grainier Perera <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Junkai,
>>>>>>
>>>>>> I've added cluster init code to the gist [1]. Apart from that,
>>>>>> ClusterConfig is configured like this;
>>>>>>
>>>>>>     ClusterConfig clusterConfig = configAccessor.getClusterConfig(CLUSTER_NAME);
>>>>>>     // Declare the capacity keys in the ClusterConfig, for example MEMORY.
>>>>>>     clusterConfig.setInstanceCapacityKeys(INSTANCE_CAPACITY_KEYS);
>>>>>>     // Set the default instance capacity in the ClusterConfig, for example MEMORY = 100.
>>>>>>     clusterConfig.setDefaultInstanceCapacityMap(INSTANCE_CAPACITY);
>>>>>>     // Set the default partition weight in the ClusterConfig, for example MEMORY = 5.
>>>>>>     clusterConfig.setDefaultPartitionWeightMap(DEFAULT_RESOURCE_USAGE);
>>>>>>     configAccessor.setClusterConfig(CLUSTER_NAME, clusterConfig);
>>>>>>
>>>>>> [1]
>>>>>> https://gist.github.com/grainier/aa1c0b279ea99f88d74c1e94d79f5cdb#file-clustersetup-java
>>>>>>
>>>>>> Thanks,
>>>>>> Grainier Perera.
>>>>>>
>>>>>>
>>>>>> On Sat, 18 Jun 2022 at 10:00, Junkai Xue <[email protected]> wrote:
>>>>>>
>>>>>>> Could you please share your cluster config as well?
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Junkai
>>>>>>>
>>>>>>> On Fri, Jun 17, 2022 at 8:24 PM Grainier Perera <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Devs,
>>>>>>>>
>>>>>>>> I'm trying to add several resources to the cluster using the
>>>>>>>> following configurations [1]. However, only some become `ONLINE`.
>>>>>>>> What could be the reason? Is there a way to guarantee that every
>>>>>>>> resource becomes `ONLINE` if the WAGED capacity constraints are met?
>>>>>>>>
>>>>>>>> You can see that with the same IdealState, "_mm:root:_system:cron3" has
>>>>>>>> mapFields and is ONLINE, while "_mm:root:_system:cron2" is not.
>>>>>>>> Furthermore, I see this behavior more often when the replica count is
>>>>>>>> set to 1.
>>>>>>>>
>>>>>>>> ResourceInfo:
>>>>>>>> 1. "_mm:root:_system:cron2"
>>>>>>>>
>>>>>>>> IdealState for _mm:root:_system:cron2:
>>>>>>>> {
>>>>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>>>>   "simpleFields" : {
>>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>>     "REPLICAS" : "1",
>>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>>   },
>>>>>>>>   "mapFields" : {
>>>>>>>>     "_mm:root:_system:cron2_0" : { }
>>>>>>>>   },
>>>>>>>>   "listFields" : {
>>>>>>>>     "_mm:root:_system:cron2_0" : [ ]
>>>>>>>>   }
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>> ExternalView for _mm:root:_system:cron2:
>>>>>>>> {
>>>>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>>>>   "simpleFields" : {
>>>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>>     "REPLICAS" : "1",
>>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>>   },
>>>>>>>>   *"mapFields" : { },*
>>>>>>>>   "listFields" : { }
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>> 2. "_mm:root:_system:cron3"
>>>>>>>>
>>>>>>>> IdealState for _mm:root:_system:cron3:
>>>>>>>> {
>>>>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>>>>   "simpleFields" : {
>>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>>     "REPLICAS" : "1",
>>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>>   },
>>>>>>>>   "mapFields" : {
>>>>>>>>     "_mm:root:_system:cron3_0" : { }
>>>>>>>>   },
>>>>>>>>   "listFields" : {
>>>>>>>>     "_mm:root:_system:cron3_0" : [ ]
>>>>>>>>   }
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>> ExternalView for _mm:root:_system:cron3:
>>>>>>>> {
>>>>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>>>>   "simpleFields" : {
>>>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>>     "REPLICAS" : "1",
>>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>>   },
>>>>>>>>   *"mapFields" : {*
>>>>>>>> *    "_mm:root:_system:cron3_0" : {*
>>>>>>>> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
>>>>>>>> *    }*
>>>>>>>> *  },*
>>>>>>>>   "listFields" : { }
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>> [1]:
>>>>>>>> https://gist.github.com/grainier/aa1c0b279ea99f88d74c1e94d79f5cdb
>>>>>>>>
>>>>>>>> Thank you.
>>>>>>>> Grainier Perera.
>>>>>>>>
>>>>>>>
>>>
>>> --
>>> Junkai Xue
>>>
>>
