Hi Wei,

I’ve finally identified two VMs that kept causing the CPU overcommit ratio (cpuOvercommitRatio) entry to be recreated, which prevented the host from rejoining the management server. I deleted the offending VMs and recreated them from a template.

Regards,
Levin
On 1 Jul 2025 at 01:30 +0800, Levin Ng <levindec...@gmail.com>, wrote:
> Hi Wei,
>
> Today, after restarting the management server, I got the same error for
> the same host that rejoined last time. Do you have any hint?
>
> 2025-07-01 01:11:19,119 ERROR [c.c.a.m.ClusteredAgentManagerImpl]
> (AgentConnectTaskPool-126:[ctx-ef0bee48]) (logid:08898435) Monitor
> ComputeCapacityListener says there is an error in the connect process for 125
> due to Duplicate key cpuOvercommitRatio (attempted merging values 12 and 12)
> java.lang.IllegalStateException: Duplicate key cpuOvercommitRatio (attempted
> merging values 12 and 12)
>
> Regards,
> Levin
>
> On 28 Jun 2025 at 16:49 +0800, Levin Ng <levindec...@gmail.com>, wrote:
> > Hi Wei,
> >
> > I did search the user_vm_details and vm_instance tables with the host_id,
> > but I couldn’t find any duplicate records. I just shut down the running VMs
> > on those hosts, removed the hosts, and let the agent re-join the ACS. The
> > problem is gone, thanks to your help again! It’s been really frustrating
> > with the recent ACS upgrade.
> >
> > Regards,
> > Levin
> > On 28 Jun 2025 at 16:34 +0800, Wei ZHOU <ustcweiz...@gmail.com>, wrote:
> > > Can you also check user_vm_details for the VMs running on the host?
> > >
> > >
> > > -Wei
> > >
> > > On Sat, Jun 28, 2025 at 10:04 AM Levin Ng <levindec...@gmail.com> wrote:
> > >
> > > > Hi Wei,
> > > >
> > > > Thanks again. The problematic cluster_id 7 contains just one
> > > > cpuOvercommitRatio row. Any idea?
> > > >
> > > > Regards,
> > > > Levin
> > > >
> > > > MariaDB [cloud]> select * from cluster_details;
> > > > +----+------------+-----------------------+-------+
> > > > | id | cluster_id | name                  | value |
> > > > +----+------------+-----------------------+-------+
> > > > |  1 |          1 | memoryOvercommitRatio | 1.0   |
> > > > |  2 |          1 | cpuOvercommitRatio    | 1.0   |
> > > > |  3 |          2 | memoryOvercommitRatio | 1.0   |
> > > > |  4 |          2 | cpuOvercommitRatio    | 1.0   |
> > > > |  5 |          3 | memoryOvercommitRatio | 1.0   |
> > > > |  6 |          3 | cpuOvercommitRatio    | 1.0   |
> > > > |  7 |          4 | memoryOvercommitRatio | 1.0   |
> > > > |  8 |          4 | cpuOvercommitRatio    | 1.0   |
> > > > |  9 |          5 | memoryOvercommitRatio | 1.0   |
> > > > | 10 |          5 | cpuOvercommitRatio    | 1.0   |
> > > > | 11 |          6 | memoryOvercommitRatio | 1.0   |
> > > > | 12 |          6 | cpuOvercommitRatio    | 1.0   |
> > > > | 13 |          7 | memoryOvercommitRatio | 1.0   |
> > > > | 14 |          7 | cpuOvercommitRatio    | 12    |
> > > > | 15 |          7 | resourceHAEnabled     | false |
> > > > | 16 |          8 | memoryOvercommitRatio | 1.3   |
> > > > | 17 |          8 | cpuOvercommitRatio    | 15.0  |
> > > > | 18 |          9 | memoryOvercommitRatio | 1.3   |
> > > > | 19 |          9 | cpuOvercommitRatio    | 15.0  |
> > > > | 20 |         10 | memoryOvercommitRatio | 1.3   |
> > > > | 21 |         10 | cpuOvercommitRatio    | 15.0  |
> > > > | 22 |         11 | memoryOvercommitRatio | 1.0   |
> > > > | 23 |         11 | cpuOvercommitRatio    | 12.0  |
> > > > +----+------------+-----------------------+-------+
> > > > 23 rows in set (0.001 sec)
> > > >
> > > > MariaDB [cloud]> desc cluster_details;
> > > > +------------+---------------------+------+-----+---------+----------------+
> > > > | Field      | Type                | Null | Key | Default | Extra          |
> > > > +------------+---------------------+------+-----+---------+----------------+
> > > > | id         | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment |
> > > > | cluster_id | bigint(20) unsigned | NO   | MUL | NULL    |                |
> > > > | name       | varchar(255)        | NO   | MUL | NULL    |                |
> > > > | value      | varchar(255)        | NO   |     | NULL    |                |
> > > > +------------+---------------------+------+-----+---------+----------------+
> > > > 4 rows in set (0.005 sec)
> > > >
> > > > On 28 Jun 2025 at 15:54 +0800, Wei ZHOU <ustcweiz...@gmail.com>, wrote:
> > > > > Hi,
> > > > >
> > > > > Maybe check cluster_details if there are multiple records with the
> > > > > same name "cpuOvercommitRatio" for a cluster.
> > > > >
> > > > >
> > > > > -Wei
> > > > >
> > > > > On Sat, Jun 28, 2025 at 9:37 AM Levin Ng <levindec...@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I’m having trouble after the 4.20.1 upgrade: some of the existing
> > > > > > hosts are not able to reconnect to the ACS management server, and I
> > > > > > found an SQL error in the log. Does anyone have an idea how to
> > > > > > resolve it? Thank you very much.
> > > > > >
> > > > > > 2025-06-28 15:30:49,259 ERROR [c.c.a.m.ClusteredAgentManagerImpl]
> > > > > > (AgentConnectTaskPool-1092:[ctx-99bfb3dd]) (logid:b354f521) Monitor
> > > > > > ComputeCapacityListener says there is an error in the connect
> > > > > > process for 110 due to Duplicate key cpuOvercommitRatio (attempted
> > > > > > merging values 12 and 12) java.lang.IllegalStateException:
> > > > > > Duplicate key cpuOvercommitRatio (attempted merging values 12 and 12)
> > > > > >
> > > > > >
> > > > > > Regards,
> > > > > > Levin
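
Note on the error text: "Duplicate key ... (attempted merging values ... and ...)" is the message that Java's two-argument Collectors.toMap produces (Java 9 and later) when two stream elements map to the same key. Whether ComputeCapacityListener builds its map this way is an assumption; the thread does not show that code, nor where the second cpuOvercommitRatio entry comes from. The sketch below only reproduces the same kind of failure on made-up rows. The Detail class and the three helper methods are hypothetical names for illustration, not CloudStack classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class DuplicateKeyDemo {

    /** Hypothetical stand-in for one name/value detail row; not a CloudStack class. */
    static final class Detail {
        final String name;
        final String value;

        Detail(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Two-argument toMap: throws IllegalStateException if two rows share a name. */
    static Map<String, String> strictMap(List<Detail> rows) {
        return rows.stream().collect(Collectors.toMap(d -> d.name, d -> d.value));
    }

    /** Three-argument toMap: the merge function keeps the first value for a repeated name. */
    static Map<String, String> tolerantMap(List<Detail> rows) {
        return rows.stream()
                .collect(Collectors.toMap(d -> d.name, d -> d.value, (first, second) -> first));
    }

    /** Returns the names that occur more than once, in sorted order. */
    static List<String> duplicateNames(List<Detail> rows) {
        Map<String, Long> counts = rows.stream()
                .collect(Collectors.groupingBy(d -> d.name, TreeMap::new, Collectors.counting()));
        List<String> dups = new ArrayList<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            if (e.getValue() > 1) {
                dups.add(e.getKey());
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        // Made-up rows: the same name appears twice with the same value.
        List<Detail> rows = new ArrayList<>();
        rows.add(new Detail("memoryOvercommitRatio", "1.0"));
        rows.add(new Detail("cpuOvercommitRatio", "12"));
        rows.add(new Detail("cpuOvercommitRatio", "12"));

        try {
            strictMap(rows);
        } catch (IllegalStateException e) {
            System.out.println("strict: " + e.getMessage());
        }
        System.out.println("tolerant: " + tolerantMap(rows).get("cpuOvercommitRatio"));
        System.out.println("duplicates: " + duplicateNames(rows));
    }
}
```

The point of the sketch is that the failure depends only on a repeated key, not on differing values, which would be consistent with "attempted merging values 12 and 12" in the log.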