Hi Wei,

I’ve finally identified two VMs that are constantly causing the CPU overcommit 
ratio to be recreated, which prevents the host from rejoining the management 
server. I deleted the offending VMs and recreated them from a template.
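
For anyone chasing the same error, a duplicated (vm_id, name) pair in a details table is one place to look. The sketch below shows the kind of query that would surface it. It assumes a user_vm_details table with vm_id, name and value columns, uses made-up rows, and runs against an in-memory SQLite database purely so that it is self-contained. The SELECT itself is plain SQL that should also be accepted by MariaDB.

```python
import sqlite3

# Hypothetical stand-in for the MariaDB `cloud` database. The table and
# column names (user_vm_details: vm_id, name, value) are assumptions, and
# the rows are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_vm_details ("
    "id INTEGER PRIMARY KEY, vm_id INTEGER, name TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO user_vm_details (vm_id, name, value) VALUES (?, ?, ?)",
    [
        (101, "cpuOvercommitRatio", "12"),
        (101, "cpuOvercommitRatio", "12"),  # duplicated detail row
        (101, "memoryOvercommitRatio", "1.0"),
        (102, "cpuOvercommitRatio", "12"),
    ],
)

# Plain SQL: list every (vm_id, name) pair that occurs more than once.
DUPLICATE_DETAILS_SQL = """
SELECT vm_id, name, COUNT(*) AS copies
FROM user_vm_details
GROUP BY vm_id, name
HAVING COUNT(*) > 1
ORDER BY vm_id, name
"""

duplicates = conn.execute(DUPLICATE_DETAILS_SQL).fetchall()
for vm_id, name, copies in duplicates:
    print(f"vm_id={vm_id} name={name} copies={copies}")
```

On a real deployment only the SELECT would be needed, run from the MariaDB prompt against the cloud database, ideally restricted to the VMs on the affected host.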

Regards,
Levin
On 1 Jul 2025 at 01:30 +0800, Levin Ng <levindec...@gmail.com>, wrote:
> Hi Wei,
>
> Today, after restarting the management server, I got the same error for the
> same host that rejoined last time. Do you have any hints?
>
> 2025-07-01 01:11:19,119 ERROR [c.c.a.m.ClusteredAgentManagerImpl] 
> (AgentConnectTaskPool-126:[ctx-ef0bee48]) (logid:08898435) Monitor 
> ComputeCapacityListener says there is an error in the connect process for 125 
> due to Duplicate key cpuOvercommitRatio (attempted merging values 12 and 12) 
> java.lang.IllegalStateException: Duplicate key cpuOvercommitRatio (attempted 
> merging values 12 and 12)
>
> Regards,
> Levin
>
> On 28 Jun 2025 at 16:49 +0800, Levin Ng <levindec...@gmail.com>, wrote:
> > Hi Wei,
> >
> > I did search the user_vm_details and vm_instance tables with the host_id, 
> > but I couldn’t find any duplicate records. I just shut down the running VMs 
> > on those hosts, removed the hosts, and let the agent re-join the ACS. The 
> > problem is gone, thanks to your help again! It’s been really frustrating 
> > with the recent ACS upgrade.
> >
> > Regards,
> > Levin
> > On 28 Jun 2025 at 16:34 +0800, Wei ZHOU <ustcweiz...@gmail.com>, wrote:
> > > Can you also check user_vm_details for the VMs running on the host?
> > >
> > >
> > > -Wei
> > >
> > > On Sat, Jun 28, 2025 at 10:04 AM Levin Ng <levindec...@gmail.com> wrote:
> > >
> > > > Hi Wei,
> > > >
> > > > > Thanks again. The problematic cluster (cluster_id 7) contains just one
> > > > > cpuOvercommitRatio row. Any idea?
> > > >
> > > > > Regards,
> > > > Levin
> > > >
> > > > MariaDB [cloud]> select * from cluster_details;
> > > > +----+------------+-----------------------+-------+
> > > > | id | cluster_id | name | value |
> > > > +----+------------+-----------------------+-------+
> > > > | 1 | 1 | memoryOvercommitRatio | 1.0 |
> > > > | 2 | 1 | cpuOvercommitRatio | 1.0 |
> > > > | 3 | 2 | memoryOvercommitRatio | 1.0 |
> > > > | 4 | 2 | cpuOvercommitRatio | 1.0 |
> > > > | 5 | 3 | memoryOvercommitRatio | 1.0 |
> > > > | 6 | 3 | cpuOvercommitRatio | 1.0 |
> > > > | 7 | 4 | memoryOvercommitRatio | 1.0 |
> > > > | 8 | 4 | cpuOvercommitRatio | 1.0 |
> > > > | 9 | 5 | memoryOvercommitRatio | 1.0 |
> > > > | 10 | 5 | cpuOvercommitRatio | 1.0 |
> > > > | 11 | 6 | memoryOvercommitRatio | 1.0 |
> > > > | 12 | 6 | cpuOvercommitRatio | 1.0 |
> > > > | 13 | 7 | memoryOvercommitRatio | 1.0 |
> > > > | 14 | 7 | cpuOvercommitRatio | 12 |
> > > > | 15 | 7 | resourceHAEnabled | false |
> > > > | 16 | 8 | memoryOvercommitRatio | 1.3 |
> > > > | 17 | 8 | cpuOvercommitRatio | 15.0 |
> > > > | 18 | 9 | memoryOvercommitRatio | 1.3 |
> > > > | 19 | 9 | cpuOvercommitRatio | 15.0 |
> > > > | 20 | 10 | memoryOvercommitRatio | 1.3 |
> > > > | 21 | 10 | cpuOvercommitRatio | 15.0 |
> > > > | 22 | 11 | memoryOvercommitRatio | 1.0 |
> > > > | 23 | 11 | cpuOvercommitRatio | 12.0 |
> > > > +----+------------+-----------------------+-------+
> > > > 23 rows in set (0.001 sec)
> > > >
> > > > MariaDB [cloud]> desc cluster_details;
> > > >
> > > > +------------+---------------------+------+-----+---------+----------------+
> > > > | Field | Type | Null | Key | Default | Extra |
> > > >
> > > > +------------+---------------------+------+-----+---------+----------------+
> > > > | id | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
> > > > | cluster_id | bigint(20) unsigned | NO | MUL | NULL | |
> > > > | name | varchar(255) | NO | MUL | NULL | |
> > > > | value | varchar(255) | NO | | NULL | |
> > > >
> > > > +------------+---------------------+------+-----+---------+----------------+
> > > > 4 rows in set (0.005 sec)
> > > >
> > > > On 28 Jun 2025 at 15:54 +0800, Wei ZHOU <ustcweiz...@gmail.com>, wrote:
> > > > > Hi,
> > > > >
> > > > > > Maybe check cluster_details to see whether there are multiple records
> > > > > > with the same name "cpuOvercommitRatio" for a cluster.
> > > > >
> > > > >
> > > > > -Wei
> > > > >
> > > > > > On Sat, Jun 28, 2025 at 9:37 AM Levin Ng <levindec...@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > > I’m having trouble after the 4.20.1 upgrade: some of the existing
> > > > > > > hosts are not able to reconnect to the ACS management server, and I
> > > > > > > found some SQL errors in the log. Does anyone have an idea how to
> > > > > > > resolve this? Thank you very much.
> > > > > >
> > > > > > 2025-06-28 15:30:49,259 ERROR [c.c.a.m.ClusteredAgentManagerImpl]
> > > > > > (AgentConnectTaskPool-1092:[ctx-99bfb3dd]) (logid:b354f521) Monitor
> > > > > > ComputeCapacityListener says there is an error in the connect 
> > > > > > process
> > > > for
> > > > > > 110 due to Duplicate key cpuOvercommitRatio (attempted merging 
> > > > > > values
> > > > 12
> > > > > > and 12) java.lang.IllegalStateException: Duplicate key
> > > > cpuOvercommitRatio
> > > > > > (attempted merging values 12 and 12)
> > > > > >
> > > > > >
> > > > > > Regards,
> > > > > > Levin
> > > > > >
> > > >
