Upgraded to 4.19.0.1.

The same error persists, plus some new ones:
Web UI: The given command 'readyForShutdown' either does not exist.
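
If I'm not mistaken, 'readyForShutdown' is one of the newer APIs that the 4.19 UI calls, so a quick sanity check here (a minimal sketch, assuming the default 'cloud' database name) is to confirm the database schema actually got upgraded along with the packages:

-- show the most recent schema upgrade step recorded by the management server
SELECT version, updated, step
  FROM cloud.version
 ORDER BY id DESC
 LIMIT 1;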

[root@cs1 ~]# log | grep WARN
2024-07-23 20:48:13,896 WARN  [c.c.a.m.AgentAttache] (StatsCollector-1:ctx-0331434c) (logid:1eb04bf4) Seq 77-731834939447705632: Timed out on null
2024-07-23 20:48:13,896 WARN  [c.c.a.m.AgentManagerImpl] (StatsCollector-1:ctx-0331434c) (logid:1eb04bf4) Operation timed out: Commands 731834939447705632 to Host 77 timed out after 172800
2024-07-23 20:48:13,896 WARN  [c.c.v.VirtualMachineManagerImpl] (StatsCollector-1:ctx-0331434c) (logid:1eb04bf4) Unable to obtain VM network statistics.
2024-07-23 20:48:13,930 WARN  [c.c.a.m.AgentAttache] (StatsCollector-1:ctx-0331434c) (logid:1eb04bf4) Seq 170-4359484439294640163: Timed out on null
2024-07-23 20:48:13,931 WARN  [c.c.a.m.AgentManagerImpl] (StatsCollector-1:ctx-0331434c) (logid:1eb04bf4) Operation timed out: Commands 4359484439294640163 to Host 170 timed out after 172800
2024-07-23 20:48:13,931 WARN  [c.c.v.VirtualMachineManagerImpl] (StatsCollector-1:ctx-0331434c) (logid:1eb04bf4) Unable to obtain VM network statistics.
2024-07-23 20:48:19,729 WARN  [c.c.h.x.d.XcpServerDiscoverer] (AgentTaskPool-9:ctx-8ba2cf7c) (logid:3c35ed8f) defaulting to xenserver650 resource for product brand: XCP-ng with product version: 8.1.0
2024-07-23 20:48:19,940 WARN  [c.c.h.x.d.XcpServerDiscoverer] (AgentTaskPool-10:ctx-2fe9992d) (logid:bcbe402f) defaulting to xenserver650 resource for product brand: XCP-ng with product version: 8.1.0
2024-07-23 20:48:20,072 WARN  [c.c.h.x.d.XcpServerDiscoverer] (AgentTaskPool-11:ctx-0f428011) (logid:3f311481) defaulting to xenserver650 resource for product brand: XCP-ng with product version: 8.1.0
2024-07-23 20:48:20,145 WARN  [c.c.h.x.d.XcpServerDiscoverer] (AgentTaskPool-12:ctx-7cf4ea84) (logid:2936e5ed) defaulting to xenserver650 resource for product brand: XCP-ng with product version: 8.1.0
2024-07-23 20:48:24,469 WARN  [c.c.r.ResourceManagerImpl] (AgentTaskPool-9:ctx-8ba2cf7c) (logid:3c35ed8f) Unable to connect due to
2024-07-23 20:48:26,003 WARN  [c.c.r.ResourceManagerImpl] (AgentTaskPool-10:ctx-2fe9992d) (logid:bcbe402f) Unable to connect due to
2024-07-23 20:48:26,028 WARN  [c.c.r.ResourceManagerImpl] (AgentTaskPool-12:ctx-7cf4ea84) (logid:2936e5ed) Unable to connect due to
2024-07-23 20:48:26,247 WARN  [c.c.r.ResourceManagerImpl] (AgentTaskPool-11:ctx-0f428011) (logid:3f311481) Unable to connect due to
2024-07-23 20:48:29,635 WARN  [c.c.a.m.AgentAttache] (StatsCollector-6:ctx-6ae014e7) (logid:e109ae1e) Seq 77-731834939447705633: Timed out on null
2024-07-23 20:48:29,635 WARN  [c.c.a.m.AgentManagerImpl] (StatsCollector-6:ctx-6ae014e7) (logid:e109ae1e) Operation timed out: Commands 731834939447705633 to Host 77 timed out after 172800
2024-07-23 20:48:29,635 WARN  [c.c.v.VirtualMachineManagerImpl] (StatsCollector-6:ctx-6ae014e7) (logid:e109ae1e) Unable to obtain VM statistics.
2024-07-23 20:48:29,676 WARN  [c.c.a.m.AgentAttache] (StatsCollector-6:ctx-6ae014e7) (logid:e109ae1e) Seq 170-4359484439294640164: Timed out on null
2024-07-23 20:48:29,676 WARN  [c.c.a.m.AgentManagerImpl] (StatsCollector-6:ctx-6ae014e7) (logid:e109ae1e) Operation timed out: Commands 4359484439294640164 to Host 170 timed out after 172800
2024-07-23 20:48:29,676 WARN  [c.c.v.VirtualMachineManagerImpl] (StatsCollector-6:ctx-6ae014e7) (logid:e109ae1e) Unable to obtain VM statistics.
2024-07-23 20:48:29,707 WARN  [c.c.a.m.AgentAttache] (StatsCollector-1:ctx-2fd01c39) (logid:5a92adb5) Seq 18-1364027737139838981: Timed out on null
2024-07-23 20:48:29,707 WARN  [c.c.a.m.AgentManagerImpl] (StatsCollector-1:ctx-2fd01c39) (logid:5a92adb5) Operation timed out: Commands 1364027737139838981 to Host 18 timed out after 172800
2024-07-23 20:48:29,707 WARN  [c.c.r.ResourceManagerImpl] (StatsCollector-1:ctx-2fd01c39) (logid:5a92adb5) Unable to obtain host 18 statistics.
2024-07-23 20:48:29,707 WARN  [c.c.s.StatsCollector] (StatsCollector-1:ctx-2fd01c39) (logid:5a92adb5) The Host stats is null for host: 18
2024-07-23 20:48:29,735 WARN  [c.c.a.m.AgentAttache] (StatsCollector-1:ctx-2fd01c39) (logid:5a92adb5) Seq 74-9075316199104970777: Timed out on null
2024-07-23 20:48:29,735 WARN  [c.c.a.m.AgentManagerImpl] (StatsCollector-1:ctx-2fd01c39) (logid:5a92adb5) Operation timed out: Commands 9075316199104970777 to Host 74 timed out after 172800
2024-07-23 20:48:29,735 WARN  [c.c.r.ResourceManagerImpl] (StatsCollector-1:ctx-2fd01c39) (logid:5a92adb5) Unable to obtain host 74 statistics.
2024-07-23 20:48:29,736 WARN  [c.c.s.StatsCollector] (StatsCollector-1:ctx-2fd01c39) (logid:5a92adb5) The Host stats is null for host: 74
2024-07-23 20:48:29,765 WARN  [c.c.a.m.AgentAttache] (StatsCollector-1:ctx-2fd01c39) (logid:5a92adb5) Seq 77-731834939447705634: Timed out on null
2024-07-23 20:48:29,765 WARN  [c.c.a.m.AgentManagerImpl] (StatsCollector-1:ctx-2fd01c39) (logid:5a92adb5) Operation timed out: Commands 731834939447705634 to Host 77 timed out after 172800
2024-07-23 20:48:29,765 WARN  [c.c.r.ResourceManagerImpl] (StatsCollector-1:ctx-2fd01c39) (logid:5a92adb5) Unable to obtain host 77 statistics.
2024-07-23 20:48:29,765 WARN  [c.c.s.StatsCollector] (StatsCollector-1:ctx-2fd01c39) (logid:5a92adb5) The Host stats is null for host: 77
2024-07-23 20:48:29,792 WARN  [c.c.a.m.AgentAttache] (StatsCollector-1:ctx-2fd01c39) (logid:5a92adb5) Seq 170-4359484439294640165: Timed out on null
2024-07-23 20:48:29,792 WARN  [c.c.a.m.AgentManagerImpl] (StatsCollector-1:ctx-2fd01c39) (logid:5a92adb5) Operation timed out: Commands 4359484439294640165 to Host 170 timed out after 172800
2024-07-23 20:48:29,792 WARN  [c.c.r.ResourceManagerImpl] (StatsCollector-1:ctx-2fd01c39) (logid:5a92adb5) Unable to obtain host 170 statistics.
2024-07-23 20:48:29,792 WARN  [c.c.s.StatsCollector] (StatsCollector-1:ctx-2fd01c39) (logid:5a92adb5) The Host stats is null for host: 170
2024-07-23 20:48:31,609 WARN  [c.c.a.m.AgentAttache] (StatsCollector-4:ctx-c7c5ff50) (logid:67a9af5e) Seq 77-731834939447705635: Timed out on null
2024-07-23 20:48:31,626 WARN  [c.c.a.m.AgentAttache] (StatsCollector-4:ctx-c7c5ff50) (logid:67a9af5e) Seq 170-4359484439294640166: Timed out on null
2024-07-23 20:48:31,641 WARN  [c.c.a.m.AgentAttache] (StatsCollector-4:ctx-c7c5ff50) (logid:67a9af5e) Seq 74-9075316199104970778: Timed out on null
2024-07-23 20:48:31,667 WARN  [c.c.a.m.AgentAttache] (StatsCollector-4:ctx-c7c5ff50) (logid:67a9af5e) Seq 77-731834939447705636: Timed out on null
2024-07-23 20:48:31,684 WARN  [c.c.a.m.AgentAttache] (StatsCollector-4:ctx-c7c5ff50) (logid:67a9af5e) Seq 170-4359484439294640167: Timed out on null
2024-07-23 20:48:31,700 WARN  [c.c.a.m.AgentAttache] (StatsCollector-4:ctx-c7c5ff50) (logid:67a9af5e) Seq 74-9075316199104970779: Timed out on null
2024-07-23 20:48:31,728 WARN  [c.c.a.m.AgentAttache] (StatsCollector-4:ctx-c7c5ff50) (logid:67a9af5e) Seq 74-9075316199104970780: Timed out on null
2024-07-23 20:48:31,744 WARN  [c.c.a.m.AgentAttache] (StatsCollector-4:ctx-c7c5ff50) (logid:67a9af5e) Seq 170-4359484439294640168: Timed out on null
2024-07-23 20:48:31,761 WARN  [c.c.a.m.AgentAttache] (StatsCollector-4:ctx-c7c5ff50) (logid:67a9af5e) Seq 77-731834939447705637: Timed out on null
2024-07-23 20:48:31,787 WARN  [c.c.a.m.AgentAttache] (StatsCollector-4:ctx-c7c5ff50) (logid:67a9af5e) Seq 170-4359484439294640169: Timed out on null
2024-07-23 20:48:31,803 WARN  [c.c.a.m.AgentAttache] (StatsCollector-4:ctx-c7c5ff50) (logid:67a9af5e) Seq 74-9075316199104970781: Timed out on null
2024-07-23 20:48:31,820 WARN  [c.c.a.m.AgentAttache] (StatsCollector-4:ctx-c7c5ff50) (logid:67a9af5e) Seq 77-731834939447705638: Timed out on null
2024-07-23 20:48:31,844 WARN  [c.c.a.m.AgentAttache] (StatsCollector-4:ctx-c7c5ff50) (logid:67a9af5e) Seq 74-9075316199104970782: Timed out on null
2024-07-23 20:48:31,870 WARN  [c.c.a.m.AgentAttache] (StatsCollector-4:ctx-c7c5ff50) (logid:67a9af5e) Seq 77-731834939447705639: Timed out on null
2024-07-23 20:48:31,886 WARN  [c.c.a.m.AgentAttache] (StatsCollector-4:ctx-c7c5ff50) (logid:67a9af5e) Seq 170-4359484439294640170: Timed out on null
2024-07-23 20:48:44,035 WARN  [c.c.a.AlertManagerImpl] (CapacityChecker:ctx-ad836823) (logid:febec4ab) alertType=[24] dataCenterId=[9] podId=[null] clusterId=[null] message=[System Alert: Number of unallocated shared network IPs is low in availability zone LTC-DC].

2024-07-23 20:45:43,969 ERROR [c.c.a.AlertManagerImpl] (CapacityChecker:ctx-834ef0f3) (logid:7e9bd4f4) Caught exception in recalculating capacity
2024-07-23 20:46:19,513 ERROR [c.c.u.s.SshHelper] (AgentTaskPool-5:ctx-4a180df2) (logid:44b1d912) SSH execution of command xe sm-list | grep "resigning of duplicates" has an error status code in return. Result output:
2024-07-23 20:46:19,679 ERROR [c.c.u.s.SshHelper] (AgentTaskPool-6:ctx-4510f484) (logid:e8eaa85e) SSH execution of command xe sm-list | grep "resigning of duplicates" has an error status code in return. Result output:
2024-07-23 20:46:19,846 ERROR [c.c.u.s.SshHelper] (AgentTaskPool-7:ctx-18d4342d) (logid:c1d175a1) SSH execution of command xe sm-list | grep "resigning of duplicates" has an error status code in return. Result output:
2024-07-23 20:46:19,896 ERROR [c.c.u.s.SshHelper] (AgentTaskPool-8:ctx-574e515b) (logid:10346a2e) SSH execution of command xe sm-list | grep "resigning of duplicates" has an error status code in return. Result output:
2024-07-23 20:46:24,407 ERROR [c.c.a.m.AgentManagerImpl] (AgentTaskPool-5:ctx-4a180df2) (logid:44b1d912) Monitor ComputeCapacityListener says there is an error in the connect process for 248 due to null
2024-07-23 20:46:25,735 ERROR [c.c.a.m.AgentManagerImpl] (AgentTaskPool-6:ctx-4510f484) (logid:e8eaa85e) Monitor ComputeCapacityListener says there is an error in the connect process for 254 due to null
2024-07-23 20:46:25,817 ERROR [c.c.a.m.AgentManagerImpl] (AgentTaskPool-8:ctx-574e515b) (logid:10346a2e) Monitor ComputeCapacityListener says there is an error in the connect process for 260 due to null
2024-07-23 20:46:26,029 ERROR [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-18d4342d) (logid:c1d175a1) Monitor ComputeCapacityListener says there is an error in the connect process for 257 due to null

Janis

On 2024-07-23 16:44, Janis Viklis | Files.fm wrote:
Seems like a similar issue to this one: https://issues.apache.org/jira/browse/CLOUDSTACK-8747

Sending Connect to listener: ComputeCapacityListener
Found 5 VMs on host 27
Found 1 VM, not running on host 27
Monitor ComputeCapacityListener says there is an error in the connect process for 27 due to null
Host 27 is disconnecting with event AgentDisconnected
The next status of agent 27 is Alert, current status is Connecting

Janis

On 2024-07-04 3:48, Nux wrote:
Janis,

No clue, it's been a while since I used XenServer, and you are on quite an old version as well, right? There have been many bugs fixed since 4.13.

Would it be possible to include a much larger fragment from the logs or the full logs?

Also, have you checked the XCP logs? Is there anything there? Is XenCenter showing anything out of the ordinary?

HTH

On 2024-07-03 14:36, Janis Viklis | Files.fm wrote:
If I set a valid management server id, it reverts to NULL after the next host check cycle.
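
(To be precise, the value in question is the mgmt_server_id column in cloud.host; a minimal sketch of a query to watch it flip back, with the four host ids from this thread:)

-- check whether mgmt_server_id has reverted to NULL for the four Alert hosts
SELECT id, status, mgmt_server_id, last_ping
  FROM cloud.host
 WHERE id IN (248, 254, 257, 260);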

I wonder whether it could somehow be related to total or cluster resources (but I tried to find and check/change all the overprovisioning multipliers).
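
(As far as I understand, those multipliers live per cluster in cloud.cluster_details; a sketch of the check, assuming the usual detail names:)

-- list the per-cluster over-provisioning factors; a missing or empty value here
-- might be relevant to the NPE in updateCapacityForHost below
SELECT cluster_id, name, value
  FROM cloud.cluster_details
 WHERE name IN ('cpuOvercommitRatio', 'memoryOvercommitRatio');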

2024-07-03 16:30:16,036 DEBUG [c.c.c.CapacityManagerImpl] (CapacityChecker:ctx-af9f7c42) (logid:31d432e5) Found 32 VMs on host 248
2024-07-03 16:30:16,039 DEBUG [c.c.c.CapacityManagerImpl] (CapacityChecker:ctx-af9f7c42) (logid:31d432e5) Found 0 VMs are Migrating from host 248
2024-07-03 16:30:16,138 ERROR [c.c.a.AlertManagerImpl] (CapacityChecker:ctx-af9f7c42) (logid:31d432e5) Caught exception in recalculating capacity
java.lang.NullPointerException
        at com.cloud.capacity.CapacityManagerImpl.updateCapacityForHost(CapacityManagerImpl.java:677)
        at com.cloud.alert.AlertManagerImpl.recalculateCapacity(AlertManagerImpl.java:279)
        at com.cloud.alert.AlertManagerImpl.checkForAlerts(AlertManagerImpl.java:432)
        at com.cloud.alert.AlertManagerImpl$CapacityChecker.runInContext(AlertManagerImpl.java:422)
        at org.apache.cloudstack.managed.context.ManagedContextTimerTask$1.runInContext(ManagedContextTimerTask.java:30)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
        at org.apache.cloudstack.managed.context.ManagedContextTimerTask.run(ManagedContextTimerTask.java:32)
        at java.util.TimerThread.mainLoop(Timer.java:555)
        at java.util.TimerThread.run(Timer.java:505)

Janis

On 2024-07-03 16:27, Nux wrote:
Hello,

What happens if you update the 4 problematic hosts with a valid mgmt id?
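
Something along these lines (a sketch only, with the msid and host ids taken from the data in this thread, not necessarily the exact statement to run):

-- find the msid of the management server that is currently Up
SELECT msid, name, state, version
  FROM cloud.mshost
 WHERE state = 'Up' AND removed IS NULL;

-- then point the four Alert hosts at that msid
UPDATE cloud.host
   SET mgmt_server_id = 95534596974
 WHERE id IN (248, 254, 257, 260);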

On 2024-07-03 14:23, Janis Viklis | Files.fm wrote:
mgmt_server_id is NULL just for those 4 hosts; the other hosts are fine.
Looking at the logs, the cs1 management server starts to connect the
pools at first:

2024-07-01 16:31:29,617 DEBUG [c.c.s.l.StoragePoolMonitor]
(AgentTaskPool-380:ctx-f411cc14) (logid:284129f8) Host 248 connected,
connecting host to shared pool id 152 and sending storage pool...

------------------------------------------------------------------------


DB Tables: cloud.host and cloud.mshost:

SELECT id, status, Type, mgmt_server_id FROM cloud.host where ID in
(74,77,170, 248, 254, 257, 260) :

 id   status  type     mgmt_server_id
 260  Alert   Routing  NULL
 257  Alert   Routing  NULL
 254  Alert   Routing  NULL
 248  Alert   Routing  NULL
 170  Up      Routing  95534596974
  77  Up      Routing  95534596974
  74  Up      Routing  95534596974

cloud.mshost (column order per the mshost schema: id, msid, runid, name, uuid, state, version, service_ip, service_port, last_update, removed, alert_count):

 179  95534596974  1720012401793  localhost       b34f493a-42c0-47a8-ada4-04be4cdd8c49  Up    4.13.1.0  10.10.10.11  9090  2024-07-03 13:13:47  NULL  0
 178  95536034244  1718828790629  cs2.failiem.lv  70420423-b362-4335-b083-8ad1342ce485  Down  4.13.1.0  10.10.10.12  9090  2024-06-19 20:39:19  NULL  1
 176  95530190206  1719663483676  localhost       96a155b6-7041-48ff-9f20-268ea77c5098  Down  4.13.1.0  10.10.10.13  9090  2024-06-29 12:24:28  NULL  1
 175  95536505104  1719666507512  localhost       c8e6fefa-7464-4bb7-a379-5eafb55c666d  Down  4.13.1.0  10.10.10.11  9090  2024-06-29 13:38:00  NULL  0
 174  95534962877  1682516323955  localhost       45a057c6-6d50-41a9-bbad-cab370c01832  Down  4.13.1.0  10.10.10.11  9090  2024-06-15 08:36:06  NULL  1
 172  95529749065  1658756353180  localhost       535277d3-33df-4b2a-9f1d-07f05084d473  Down  4.13.1.0  10.10.10.13  9090  2024-06-15 07:53:32  NULL  1
 170  95529797928  1603725530943  localhost       5892611f-7af8-4686-8818-95ade086e6cf  Down  4.13.1.0  10.10.10.13  9090  2020-11-03 04:05:40  NULL  1
 167  95534560846  1658756323907  localhost       e7ffd55a-77b7-4848-90de-5b5f10cc4500  Down  4.13.1.0  10.10.10.11  9090  2023-04-17 09:50:14  NULL  1
 163  95534279505  1582559260879  cs1.failiem.lv  8c254697-9783-11ea-900f-00163e4db64e  Down  4.11.1.0  10.10.10.11  9090  2020-05-16 14:07:09  NULL  1
 161  95531601526  1582559325515  cs3.failiem.lv  8c25457e-9783-11ea-900f-00163e4db64e  Down  4.11.1.0  10.10.10.13  9090  2020-05-16 14:07:21  NULL  1

Janis

On 2024-07-03 13:11, Nux wrote:

A shot in the dark, haven't checked the log files properly.
For these hosts in the disconnected state, if you check them in the
DB cloud.host table (type="Routing" btw), which mgmt_server_id are
they reporting?

Then check cloud.mshost table and see whether the management server
with that id is in there and marked as UP etc.
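
Roughly these two queries, as a sketch:

-- which mgmt_server_id each disconnected routing host is reporting
SELECT id, name, status, type, mgmt_server_id
  FROM cloud.host
 WHERE type = 'Routing' AND status <> 'Up' AND removed IS NULL;

-- and whether a management server with that msid is present and Up
SELECT id, msid, name, state, version, last_update
  FROM cloud.mshost;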

HTH

On 2024-07-03 06:57, Janis Viklis | Files.fm wrote:
(sorry, some bad formatting in previous email)

Does anyone have any ideas about why this error occurs and how to debug
it? (248 is a host id)

Monitor ComputeCapacityListener says there is an error in the
connect process for 248 due to null

Janis

On 2024-07-01 21:44, Janis Viklis | Files.fm wrote:
Hi,

looking for help after 2 weeks: what could be the reason that,
suddenly, after restarting the 4.13.1 management server, all 4 XEN
(XCP-ng 8.1) hosts of one Intel cluster disconnect and go into
"Alert" state with the error:

Monitor ComputeCapacityListener says there is an error in the
connect process for 248 due to null

I haven't been able to find the reason for 2 weeks. The other AMD
XenServer 6.5 cluster is working just fine.

Everything seems OK: the network is working. I restarted the toolstack,
both system VMs (SSVM, console VM) and one of the hosts, then removed it
and added it back.

Previously there were 3 management servers behind HAProxy and Galera
MariaDB; I left only one (I tried an upgrade to 3.14.1, which didn't
help). I can manage the hosts via XenCenter. There are 5 storage pools
and 3 secondary storages.

Thanks, hoping for some clues or directions, Janis.

Below is the log output:
