Re: All 4 hosts disconnected in Alert state due to ComputeCapacityListener NULL: how to fix?

Janis Viklis | Files.fm Wed, 03 Jul 2024 06:28:03 -0700

mgmt_server_id is *NULL* just for those 4 hosts; the other hosts are fine. Looking at the logs, the cs1 management server first starts connecting the storage pools:

2024-07-01 16:31:29,617 DEBUG [c.c.s.l.StoragePoolMonitor] (AgentTaskPool-380:ctx-f411cc14) (logid:284129f8) Host 248 connected, connecting host to shared pool id 152 and sending storage pool...

...

2024-07-01 16:31:29,839 DEBUG [c.c.a.t.Request](AgentTaskPool-380:ctx-f411cc14) (logid:284129f8) Seq248-1798343626204381188: Received: { Ans: , MgmtId: 95534596974, via:248(xs31.failiem.lv), Ver: v1, Flags: 10, {ModifyStoragePoolAnswer } }


------------------------------------------------------------------------

DB Tables cloud.host and cloud.mshost:

SELECT id, status, type, mgmt_server_id FROM cloud.host WHERE id IN (74, 77, 170, 248, 254, 257, 260);

id      status  type            mgmt_server_id
260     Alert   Routing         NULL
257     Alert   Routing         NULL
254     Alert   Routing         NULL
248     Alert   Routing         NULL
170     Up      Routing         95534596974
77      Up      Routing         95534596974
74      Up      Routing         95534596974

179 95534596974 1720012401793 localhost      b34f493a-42c0-47a8-ada4-04be4cdd8c49 Up   4.13.1.0 10.10.10.11 9090 2024-07-03 13:13:47
178 95536034244 1718828790629 cs2.failiem.lv 70420423-b362-4335-b083-8ad1342ce485 Down 4.13.1.0 10.10.10.12 9090 2024-06-19 20:39:19
176 95530190206 1719663483676 localhost      96a155b6-7041-48ff-9f20-268ea77c5098 Down 4.13.1.0 10.10.10.13 9090 2024-06-29 12:24:28
175 95536505104 1719666507512 localhost      c8e6fefa-7464-4bb7-a379-5eafb55c666d Down 4.13.1.0 10.10.10.11 9090 2024-06-29 13:38:00
174 95534962877 1682516323955 localhost      45a057c6-6d50-41a9-bbad-cab370c01832 Down 4.13.1.0 10.10.10.11 9090 2024-06-15 08:36:06
172 95529749065 1658756353180 localhost      535277d3-33df-4b2a-9f1d-07f05084d473 Down 4.13.1.0 10.10.10.13 9090 2024-06-15 07:53:32
170 95529797928 1603725530943 localhost      5892611f-7af8-4686-8818-95ade086e6cf Down 4.13.1.0 10.10.10.13 9090 2020-11-03 04:05:40
167 95534560846 1658756323907 localhost      e7ffd55a-77b7-4848-90de-5b5f10cc4500 Down 4.13.1.0 10.10.10.11 9090 2023-04-17 09:50:14
163 95534279505 1582559260879 cs1.failiem.lv 8c254697-9783-11ea-900f-00163e4db64e Down 4.11.1.0 10.10.10.11 9090 2020-05-16 14:07:09
161 95531601526 1582559325515 cs3.failiem.lv 8c25457e-9783-11ea-900f-00163e4db64e Down 4.11.1.0 10.10.10.13 9090 2020-05-16 14:07:21

        1

Janis


On 2024-07-03 13:11, Nux wrote:

A shot in the dark, haven't checked the log files properly.
For these hosts in the disconnected state, if you check them in the DB cloud.host table (type="Routing" btw), which mgmt_server_id are they reporting?
Then check the cloud.mshost table and see whether the management server with that id is in there and marked as Up etc.
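
The cross-check described above can be sketched as a single LEFT JOIN. The snippet below is only an illustration, not something taken from CloudStack itself: it loads a few of the rows quoted in this thread into an in-memory SQLite database so that it runs anywhere, and the mshost column names msid and state are assumptions about the real cloud.mshost schema. Against the real database, the query text would be run in MySQL/MariaDB with the cloud. prefix on both tables.

```python
import sqlite3

# Stand-in tables holding only the columns needed for the check.
# Column names "msid" and "state" on mshost are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE host (id INTEGER PRIMARY KEY, status TEXT, type TEXT, mgmt_server_id INTEGER);
CREATE TABLE mshost (id INTEGER PRIMARY KEY, msid INTEGER, state TEXT);
""")

# Rows copied from the output quoted in this thread (NULL becomes None).
conn.executemany("INSERT INTO host VALUES (?, ?, ?, ?)", [
    (260, "Alert", "Routing", None),
    (257, "Alert", "Routing", None),
    (254, "Alert", "Routing", None),
    (248, "Alert", "Routing", None),
    (170, "Up", "Routing", 95534596974),
    (77, "Up", "Routing", 95534596974),
    (74, "Up", "Routing", 95534596974),
])
conn.executemany("INSERT INTO mshost VALUES (?, ?, ?)", [
    (179, 95534596974, "Up"),
    (178, 95536034244, "Down"),
    (176, 95530190206, "Down"),
])

# LEFT JOIN keeps hosts whose mgmt_server_id is NULL or matches no mshost row.
QUERY = """
SELECT h.id, h.status, h.mgmt_server_id, m.state
FROM host h
LEFT JOIN mshost m ON m.msid = h.mgmt_server_id
WHERE h.type = 'Routing'
ORDER BY h.id
"""


def orphaned_hosts(connection):
    """Return ids of Routing hosts not owned by a management server in state Up."""
    rows = connection.execute(QUERY).fetchall()
    return [host_id for host_id, _status, _ms_id, ms_state in rows if ms_state != "Up"]


print(orphaned_hosts(conn))
```

With the rows above, the hosts that come back are exactly the four in Alert state, because their mgmt_server_id is NULL and the join finds no management server for them.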
HTH

On 2024-07-03 06:57, Janis Viklis | Files.fm wrote:
(sorry, some bad formatting in previous email)
Could anyone have any ideas why this error occurs and how to debug it? (248 is a host id)
Monitor ComputeCapacityListener says there is an error in the connect process for 248 due to null
Janis

On 2024-07-01 21:44, Janis Viklis | Files.fm wrote:
Hi,
looking for help after 2 weeks: what could be the reason that, suddenly after restarting the 4.13.1 Management server, all 4 XEN (xcp-ng 8.1) hosts of one Intel cluster disconnect and go into "Alert state" with an error:
Monitor ComputeCapacityListener says there is an error in the connect process for 248 due to null
I haven't been able to find the reason for 2 weeks. The other AMD Xenserver 6.5 cluster is working just fine.
Everything seems ok: the network is working. I restarted the toolstack, both system vms (SSVM, consolev) and one of the hosts, then removed the host and added it back.
Previously there were 3 management servers via Haproxy and Galera Mariadb; I left only one. (tried upgrade to 3.14.1, didn't help). I can manage hosts via Xencenter. There are 5 storage pools and 3 secondary.
Thanks, hoping for some clues or directions, Janis.

Below is LOG output:
