Hi Junkai,

I tried removing `MAX_PARTITIONS_PER_INSTANCE`, but it's still the same.
What's weird is that when I add a few resources, some of them still don't
reach the `ONLINE` state. In the sample below, only the 2nd and 4th
resources have proper `mapFields`, whereas the 1st and 3rd resources don't
have any mapping (all of them have the same IdealState). After a restart,
this can flip: 1 & 3 become `ONLINE` and 2 & 4 may lose their mapping. But
the pattern remains, and I cannot understand why.


*ExternalView for _mm:root:_system:cron1:*
{
  "id" : "_mm:root:_system:cron1",
  "simpleFields" : {
    "BUCKET_SIZE" : "0",
    "DELAY_REBALANCE_ENABLED" : "true",
    "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
    "NUM_PARTITIONS" : "1",
    "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
    "REBALANCE_DELAY" : "10000",
    "REBALANCE_MODE" : "FULL_AUTO",
    "REPLICAS" : "1",
    "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
  },
  *"mapFields" : { },*
  "listFields" : { }
}


*ExternalView for _mm:root:_system:cron2:*
{
  "id" : "_mm:root:_system:cron2",
  "simpleFields" : {
    "BUCKET_SIZE" : "0",
    "DELAY_REBALANCE_ENABLED" : "true",
    "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
    "NUM_PARTITIONS" : "1",
    "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
    "REBALANCE_DELAY" : "10000",
    "REBALANCE_MODE" : "FULL_AUTO",
    "REPLICAS" : "1",
    "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
  },
  *"mapFields" : {*
*    "_mm:root:_system:cron2_0" : {*
*      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
*    }*
*  },*
  "listFields" : { }
}


*ExternalView for _mm:root:_system:cron3:*
{
  "id" : "_mm:root:_system:cron3",
  "simpleFields" : {
    "BUCKET_SIZE" : "0",
    "DELAY_REBALANCE_ENABLED" : "true",
    "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
    "NUM_PARTITIONS" : "1",
    "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
    "REBALANCE_DELAY" : "10000",
    "REBALANCE_MODE" : "FULL_AUTO",
    "REPLICAS" : "1",
    "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
  },
  *"mapFields" : { },*
  "listFields" : { }
}


*ExternalView for _mm:root:_system:cron4:*
{
  "id" : "_mm:root:_system:cron4",
  "simpleFields" : {
    "BUCKET_SIZE" : "0",
    "DELAY_REBALANCE_ENABLED" : "true",
    "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
    "NUM_PARTITIONS" : "1",
    "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
    "REBALANCE_DELAY" : "10000",
    "REBALANCE_MODE" : "FULL_AUTO",
    "REPLICAS" : "1",
    "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
  },
  *"mapFields" : {*
*    "_mm:root:_system:cron4_0" : {*
*      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
*    }*
*  },*
  "listFields" : { }
}
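To spot this pattern without reading every dump, a small check over the ExternalView `mapFields` can list the resources that have no `ONLINE` replica. This is a plain-Java sketch over hand-built maps with no Helix dependency; in a real client the per-partition state maps would likely come from `HelixAdmin.getResourceExternalView(...)` and `ExternalView.getStateMap(...)`. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper over a plain-Java model of ExternalView mapFields:
// resource -> (partition -> (instance -> state)), mirroring the dumps above.
class ExternalViewCheck {

    // Returns the resources in which no replica of any partition is in `state`.
    // A resource with empty mapFields (like cron1 and cron3 above) is always reported.
    static List<String> resourcesWithoutState(
            Map<String, Map<String, Map<String, String>>> externalViews, String state) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, Map<String, Map<String, String>>> view : externalViews.entrySet()) {
            boolean found = false;
            for (Map<String, String> instanceStates : view.getValue().values()) {
                if (instanceStates.containsValue(state)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                missing.add(view.getKey());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String instance = "c8cep-0.c8cep.c8.svc.cluster.local_12000";
        // LinkedHashMap keeps the resources in insertion order for stable output.
        Map<String, Map<String, Map<String, String>>> views = new LinkedHashMap<>();
        views.put("_mm:root:_system:cron1", Map.of());
        views.put("_mm:root:_system:cron2",
                Map.of("_mm:root:_system:cron2_0", Map.of(instance, "ONLINE")));
        views.put("_mm:root:_system:cron3", Map.of());
        views.put("_mm:root:_system:cron4",
                Map.of("_mm:root:_system:cron4_0", Map.of(instance, "ONLINE")));
        System.out.println(resourcesWithoutState(views, "ONLINE"));
    }
}
```

With the sample above, `main` prints `[_mm:root:_system:cron1, _mm:root:_system:cron3]`, which makes it easy to compare the set of unassigned resources before and after a restart.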

Thanks,
Grainier Perera.


On Sat, 18 Jun 2022 at 13:21, Junkai Xue <[email protected]> wrote:

> Then it is most likely caused by this config entry:
>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
> Usually, we never set this config. It restricts how many partitions can be
> assigned to an instance. Since one partition (_mm:root:_system:cron3_0) is
> already assigned, no other partition can be assigned.
>
> So either removing this config entry or adding more instances may help.
>
> Please let us know if you have further questions.
>
> best,
>
> Junkai
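The reply above describes the cap as limiting the total number of partitions on an instance. A toy greedy assigner makes that reading concrete: with one live instance and a cap of 1, only the first partition gets placed. This is only an illustration of that hypothesis, NOT how the WAGED rebalancer actually places replicas, and the later message at the top of this thread notes that removing the field did not change the behavior. Names are hypothetical and shortened.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy greedy assigner: NOT Helix's WAGED algorithm. Assumption (taken from the
// reply above): the cap counts every partition on an instance, across resources.
class PartitionCapModel {

    // Places each partition on the first instance still below the cap and returns
    // partition -> instance. Partitions that fit nowhere are simply left out.
    static Map<String, String> assign(List<String> partitions, List<String> instances,
                                      int maxPartitionsPerInstance) {
        Map<String, Integer> load = new LinkedHashMap<>();
        for (String instance : instances) {
            load.put(instance, 0);
        }
        Map<String, String> assignment = new LinkedHashMap<>();
        for (String partition : partitions) {
            for (String instance : instances) {
                int current = load.get(instance);
                if (current < maxPartitionsPerInstance) {
                    load.put(instance, current + 1);
                    assignment.put(partition, instance);
                    break;
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Shortened partition and instance names, for readability.
        List<String> partitions = List.of("cron1_0", "cron2_0", "cron3_0", "cron4_0");
        // One live instance with a cap of 1: only the first partition is placed.
        System.out.println(assign(partitions, List.of("c8cep-0"), 1));
        // A higher cap (or, in this model, more instances) places all four.
        System.out.println(assign(partitions, List.of("c8cep-0"), 10));
    }
}
```

The first call prints `{cron1_0=c8cep-0}`; the second places all four partitions on `c8cep-0`.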
>
> On Fri, Jun 17, 2022 at 11:38 PM Grainier Perera <[email protected]>
> wrote:
>
>> Hi Junkai,
>>
>> - Correct. I haven't added any rack-aware information.
>> - I'm connecting 1 instance at startup and then expanding on demand
>> (I've set ALLOW_PARTICIPANT_AUTO_JOIN to true).
>> - I've checked the live instances and other znodes in ZooKeeper.
>> Everything looks OK, except that
>> /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron2 has empty `mapFields`
>> while /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron3 has `mapFields`
>> with an ONLINE record. I still cannot understand why, or what I'm doing
>> wrong :(
>>
>>
>> *[zk: localhost:2181(CONNECTED) 18] get /C8CEPCluster/CONFIGS/CLUSTER/C8CEPCluster*
>> {
>>   "id" : "C8CEPCluster",
>>   "simpleFields" : {
>>     "allowParticipantAutoJoin" : "true"
>>   },
>>   "mapFields" : {
>>     "DEFAULT_INSTANCE_CAPACITY_MAP" : {
>>       "MEMORY" : "100",
>>       "CPU" : "100"
>>     },
>>     "DEFAULT_PARTITION_WEIGHT_MAP" : {
>>       "MEMORY" : "5",
>>       "CPU" : "5"
>>     }
>>   },
>>   "listFields" : {
>>     "INSTANCE_CAPACITY_KEYS" : [ "CPU", "MEMORY" ]
>>   }
>> }
>>
>> *[zk: localhost:2181(CONNECTED) 8] get /C8CEPCluster/LIVEINSTANCES/c8cep-0.c8cep.c8.svc.cluster.local_12000*
>> {
>>   "id" : "c8cep-0.c8cep.c8.svc.cluster.local_12000",
>>   "simpleFields" : {
>>     "CURRENT_TASK_THREAD_POOL_SIZE" : "40",
>>     "HELIX_VERSION" : "1.0.4",
>>     "LIVE_INSTANCE" : "[email protected]",
>>     "SESSION_ID" : "106a30539a8003e"
>>   },
>>   "mapFields" : { },
>>   "listFields" : { }
>> }
>> *[zk: localhost:2181(CONNECTED) 26] get /C8CEPCluster/CONFIGS/RESOURCE/_mm:root:_system:cron2*
>> {
>>   "id" : "_mm:root:_system:cron2",
>>   "simpleFields" : { },
>>   "mapFields" : {
>>     "PARTITION_CAPACITY_MAP" : {
>>       "DEFAULT" : "{\"CPU\":\"10\",\"MEMORY\":\"10\"}"
>>     }
>>   },
>>   "listFields" : { }
>> }
>>
>> *[zk: localhost:2181(CONNECTED) 27] get /C8CEPCluster/CONFIGS/RESOURCE/_mm:root:_system:cron3*
>> {
>>   "id" : "_mm:root:_system:cron3",
>>   "simpleFields" : { },
>>   "mapFields" : {
>>     "PARTITION_CAPACITY_MAP" : {
>>       "DEFAULT" : "{\"CPU\":\"10\",\"MEMORY\":\"10\"}"
>>     }
>>   },
>>   "listFields" : { }
>> }
>>
>> *[zk: localhost:2181(CONNECTED) 38] get /C8CEPCluster/IDEALSTATES/_mm:root:_system:cron2*
>> {
>>   "id" : "_mm:root:_system:cron2",
>>   "simpleFields" : {
>>     "DELAY_REBALANCE_ENABLED" : "true",
>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>     "NUM_PARTITIONS" : "1",
>>     "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>     "REBALANCE_DELAY" : "10000",
>>     "REBALANCE_MODE" : "FULL_AUTO",
>>     "REPLICAS" : "1",
>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>   },
>>   "mapFields" : {
>>     "_mm:root:_system:cron2_0" : { }
>>   },
>>   "listFields" : {
>>     "_mm:root:_system:cron2_0" : [ ]
>>   }
>> }
>>
>> *[zk: localhost:2181(CONNECTED) 39] get /C8CEPCluster/IDEALSTATES/_mm:root:_system:cron3*
>> {
>>   "id" : "_mm:root:_system:cron3",
>>   "simpleFields" : {
>>     "DELAY_REBALANCE_ENABLED" : "true",
>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>     "NUM_PARTITIONS" : "1",
>>     "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>     "REBALANCE_DELAY" : "10000",
>>     "REBALANCE_MODE" : "FULL_AUTO",
>>     "REPLICAS" : "1",
>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>   },
>>   "mapFields" : {
>>     "_mm:root:_system:cron3_0" : { }
>>   },
>>   "listFields" : {
>>     "_mm:root:_system:cron3_0" : [ ]
>>   }
>> }
>>
>> *[zk: localhost:2181(CONNECTED) 42] get /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron2*
>> {
>>   "id" : "_mm:root:_system:cron2",
>>   "simpleFields" : {
>>     "BUCKET_SIZE" : "0",
>>     "DELAY_REBALANCE_ENABLED" : "true",
>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>     "NUM_PARTITIONS" : "1",
>>     "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>     "REBALANCE_DELAY" : "10000",
>>     "REBALANCE_MODE" : "FULL_AUTO",
>>     "REPLICAS" : "1",
>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>   },
>>   *"mapFields" : { },*
>>   "listFields" : { }
>> }
>>
>> *[zk: localhost:2181(CONNECTED) 43] get /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron3*
>> {
>>   "id" : "_mm:root:_system:cron3",
>>   "simpleFields" : {
>>     "BUCKET_SIZE" : "0",
>>     "DELAY_REBALANCE_ENABLED" : "true",
>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>     "NUM_PARTITIONS" : "1",
>>     "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>     "REBALANCE_DELAY" : "10000",
>>     "REBALANCE_MODE" : "FULL_AUTO",
>>     "REPLICAS" : "1",
>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>   },
>>   *"mapFields" : {*
>> *    "_mm:root:_system:cron3_0" : {*
>> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
>> *    }*
>> *  },*
>>   "listFields" : { }
>> }
>>
>> Thank you.
>> Grainier Perera.
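A back-of-the-envelope check on the configs dumped in the message above: with an instance capacity of 100 for both CPU and MEMORY and a partition weight of 10 (the resource-level `PARTITION_CAPACITY_MAP`), one empty instance has room for 10 such partitions, or 20 at the cluster-level default weight of 5. The sketch assumes a simplified rule in which a partition fits only if every capacity key stays within its limit; it ignores WAGED's other constraints, and the class and method names are hypothetical.

```java
import java.util.Map;

// Simplified capacity check (an assumption, not WAGED's full constraint set):
// how many identical partitions fit on one empty instance when every capacity
// key must stay within its limit.
class CapacityFit {

    static int maxPartitionsThatFit(Map<String, Integer> instanceCapacity,
                                    Map<String, Integer> partitionWeight) {
        int fit = Integer.MAX_VALUE; // stays MAX_VALUE if no key has a positive weight
        for (Map.Entry<String, Integer> entry : instanceCapacity.entrySet()) {
            int weight = partitionWeight.getOrDefault(entry.getKey(), 0);
            if (weight > 0) {
                // Integer division: only whole partitions count.
                fit = Math.min(fit, entry.getValue() / weight);
            }
        }
        return fit;
    }

    public static void main(String[] args) {
        // DEFAULT_INSTANCE_CAPACITY_MAP from the cluster config.
        Map<String, Integer> capacity = Map.of("CPU", 100, "MEMORY", 100);
        // PARTITION_CAPACITY_MAP "DEFAULT" from the cron2/cron3 resource configs.
        System.out.println(maxPartitionsThatFit(capacity, Map.of("CPU", 10, "MEMORY", 10)));
        // DEFAULT_PARTITION_WEIGHT_MAP from the cluster config.
        System.out.println(maxPartitionsThatFit(capacity, Map.of("CPU", 5, "MEMORY", 5)));
    }
}
```

If this simplified model matches WAGED's capacity rule, capacity alone should not stop a handful of single-partition, single-replica resources from fitting on the one live instance.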
>>
>>
>> On Sat, 18 Jun 2022 at 10:45, Junkai Xue <[email protected]> wrote:
>>
>>> OK. So you don't have any rack-aware information. Then how many instances
>>> do you have connecting to that cluster? Please double-check the live
>>> instances in ZooKeeper as well.
>>>
>>> Best,
>>>
>>> Junkai
>>>
>>> On Fri, Jun 17, 2022 at 10:01 PM Grainier Perera <[email protected]>
>>> wrote:
>>>
>>>> Hi Junkai,
>>>>
>>>> I've added the cluster init code to the gist [1]. Apart from that,
>>>> ClusterConfig is configured like this:
>>>>
>>>>     ClusterConfig clusterConfig = configAccessor.getClusterConfig(CLUSTER_NAME);
>>>>     // Configure the capacity keys in the cluster config, e.g. MEMORY.
>>>>     clusterConfig.setInstanceCapacityKeys(INSTANCE_CAPACITY_KEYS);
>>>>     // Configure the cluster-wide default instance capacity, e.g. MEMORY = 100.
>>>>     clusterConfig.setDefaultInstanceCapacityMap(INSTANCE_CAPACITY);
>>>>     // Configure the cluster-wide default partition weight, e.g. MEMORY = 5.
>>>>     clusterConfig.setDefaultPartitionWeightMap(DEFAULT_RESOURCE_USAGE);
>>>>     configAccessor.setClusterConfig(CLUSTER_NAME, clusterConfig);
>>>>
>>>> [1] https://gist.github.com/grainier/aa1c0b279ea99f88d74c1e94d79f5cdb#file-clustersetup-java
>>>>
>>>> Thanks,
>>>> Grainier Perera.
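The ClusterConfig snippet in the message above references `INSTANCE_CAPACITY_KEYS`, `INSTANCE_CAPACITY` and `DEFAULT_RESOURCE_USAGE` without showing them. Judging by the ZooKeeper dump of `/C8CEPCluster/CONFIGS/CLUSTER/C8CEPCluster` elsewhere in this thread, they were probably equivalent to the following. This is a reconstruction, not the original code, and it assumes the `ClusterConfig` setters take a `List<String>` and `Map<String, Integer>`; the `missingKeys` helper is a hypothetical sanity check that every declared capacity key has a value.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical reconstruction of the constants passed to ClusterConfig, with
// values read off the ZooKeeper dump of the cluster config in this thread.
class WagedClusterDefaults {
    static final List<String> INSTANCE_CAPACITY_KEYS = List.of("CPU", "MEMORY");
    static final Map<String, Integer> INSTANCE_CAPACITY = Map.of("CPU", 100, "MEMORY", 100);
    static final Map<String, Integer> DEFAULT_RESOURCE_USAGE = Map.of("CPU", 5, "MEMORY", 5);

    // Hypothetical sanity check: the capacity keys that have no value in `values`.
    static List<String> missingKeys(Map<String, Integer> values) {
        return INSTANCE_CAPACITY_KEYS.stream()
                .filter(key -> !values.containsKey(key))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(missingKeys(INSTANCE_CAPACITY));
        System.out.println(missingKeys(DEFAULT_RESOURCE_USAGE));
    }
}
```

Both calls in `main` print `[]`, since each map covers CPU and MEMORY.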
>>>>
>>>>
>>>> On Sat, 18 Jun 2022 at 10:00, Junkai Xue <[email protected]> wrote:
>>>>
>>>>> Could you please share your cluster config as well?
>>>>>
>>>>> Best,
>>>>>
>>>>> Junkai
>>>>>
>>>>> On Fri, Jun 17, 2022 at 8:24 PM Grainier Perera <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Devs,
>>>>>>
>>>>>> I'm trying to add several resources to the cluster using the
>>>>>> following configurations [1]. However, only some of them become
>>>>>> `ONLINE`. What could be the reason? Is there a way to guarantee that
>>>>>> every resource becomes `ONLINE` as long as the WAGED capacity
>>>>>> constraints are met?
>>>>>>
>>>>>> As you can see, with the same IdealState, "_mm:root:_system:cron3" has
>>>>>> mapFields and is ONLINE, whereas "_mm:root:_system:cron2" is not.
>>>>>> Furthermore, I see this behavior more often when the replica count is
>>>>>> set to 1.
>>>>>>
>>>>>> ResourceInfo:
>>>>>> 1. "_mm:root:_system:cron2"
>>>>>>
>>>>>> IdealState for _mm:root:_system:cron2:
>>>>>> {
>>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>>   "simpleFields" : {
>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>     "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>     "REPLICAS" : "1",
>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>   },
>>>>>>   "mapFields" : {
>>>>>>     "_mm:root:_system:cron2_0" : { }
>>>>>>   },
>>>>>>   "listFields" : {
>>>>>>     "_mm:root:_system:cron2_0" : [ ]
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>>
>>>>>> ExternalView for _mm:root:_system:cron2:
>>>>>> {
>>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>>   "simpleFields" : {
>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>     "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>     "REPLICAS" : "1",
>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>   },
>>>>>>   *"mapFields" : { },*
>>>>>>   "listFields" : { }
>>>>>> }
>>>>>>
>>>>>>
>>>>>> 2. "_mm:root:_system:cron3"
>>>>>>
>>>>>> IdealState for _mm:root:_system:cron3:
>>>>>> {
>>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>>   "simpleFields" : {
>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>     "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>     "REPLICAS" : "1",
>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>   },
>>>>>>   "mapFields" : {
>>>>>>     "_mm:root:_system:cron3_0" : { }
>>>>>>   },
>>>>>>   "listFields" : {
>>>>>>     "_mm:root:_system:cron3_0" : [ ]
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>>
>>>>>> ExternalView for _mm:root:_system:cron3:
>>>>>> {
>>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>>   "simpleFields" : {
>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>     "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>     "REPLICAS" : "1",
>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>   },
>>>>>>   *"mapFields" : {*
>>>>>> *    "_mm:root:_system:cron3_0" : {*
>>>>>> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
>>>>>> *    }*
>>>>>> *  },*
>>>>>>   "listFields" : { }
>>>>>> }
>>>>>>
>>>>>>
>>>>>> [1]: https://gist.github.com/grainier/aa1c0b279ea99f88d74c1e94d79f5cdb
>>>>>>
>>>>>> Thank you.
>>>>>> Grainier Perera.
>>>>>>
>>>>>
>
> --
> Junkai Xue
>
