Hello Albert, thanks, and sorry for the late response.
On Sept. 26 vdsmd went down on the host again. (didn't monitor properly, so, only now, I realized it was down:) This time I could not find any segvs in the logs. When did it start? We used until now, 4.3, and then upgraded first to 4.5.1 (through 4.4), as described in the documentation. Until then, we never encountered crashes of the vdsmd. Only ony after going to 4.5.2 and switching to compatibility level 4.7, random crashed of the vdsmd happen. The Ovirt engine, btw. is on a seperate Virtual machine, outside of the ovirt cluster. On the host ,/var/log/vdsm/vdsm.log: I only can see, logging stoppped, and after restarting vdsmd, it started again. Also, after restarting vdsmd, all Virtual machines on that host are shown with the proper running state. ---<snipp>-- 2022-09-26 11:52:25,161+0200 INFO (jsonrpc/6) [api.host] FINISH getAllVmStats return={'status': {'code': 0, 'message': 'Done'}, 'statsList': (suppressed)} from=::1,58732 (api:54) 2022-09-26 11:52:25,316+0200 INFO (periodic/2) [vdsm.api] START repoStats(domains=()) from=internal, task_id=cfb0ce21-b884-4fd9-baa6-d020107035fe (api:48) 2022-09-26 11:52:25,317+0200 INFO (periodic/2) [vdsm.api] FINISH repoStats return={'6f018fbd-de93-4c56-880d-8ede2aad2674': {'code': 0, 'lastCheck': '4.6', 'delay': '0.000571996', 'valid': True, 'version': 5, 'a cquired': True, 'actual': True}, '41012bfb-b802-4092-b699-7f5284d95c8e': {'code': 0, 'lastCheck': '5.7', 'delay': '0.00073916', 'valid': True, 'version': 5, 'acquired': True, 'actual': True}, 'a15496dc-c241-4658 -af9d-0dfe11783916': {'code': 0, 'lastCheck': '4.4', 'delay': '0.00066827', 'valid': True, 'version': 5, 'acquired': True, 'actual': True}, '2c870e06-6c70-45ec-b665-ce29408c8a8e': {'code': 0, 'lastCheck': '4.4', 'delay': '0.000624771', 'valid': True, 'version': 5, 'acquired': True, 'actual': True}, '8008fa12-17c7-4d9f-b871-153bdd4a283e': {'code': 0, 'lastCheck': '1.8', 'delay': '0.000390569', 'valid': True, 'version': 5, 'acquired': True, 'actual': True}} from=internal, task_id=cfb0ce21-b884-4fd9-baa6-d020107035fe (api:54) 2022-09-26 11:52:26,171+0200 INFO (jsonrpc/4) [vdsm.api] START getSpmStatus(spUUID='5836aaac-0030-0064-024d-0000000002e4') from=::ffff:10.2.0.4,37000, task_id=d9d88470-3f04-4060-8d19-f9c68fa928ec (api:48) 2022-09-26 11:52:26,191+0200 INFO (jsonrpc/4) [vdsm.api] FINISH getSpmStatus return={'spm_st': {'spmStatus': 'SPM', 'spmLver': 66, 'spmId': 2}} from=::ffff:10.2.0.4,37000, task_id=d9d88470-3f04-4060-8d19-f9c68f a928ec (api:54) 2022-09-26 11:52:26,218+0200 INFO (jsonrpc/5) [vdsm.api] START getStoragePoolInfo(spUUID='5836aaac-0030-0064-024d-0000000002e4') from=::ffff:10.2.0.4,54708, task_id=96aab09f-60ca-4ae7-be31-aa3070a08030 (api:48) 2022-09-26 11:52:26,240+0200 INFO (jsonrpc/5) [vdsm.api] FINISH getStoragePoolInfo return={'info': {'domains': '8008fa12-17c7-4d9f-b871-153bdd4a283e:Active,6f018fbd-de93-4c56-880d-8ede2aad2674:Active,2c870e06-6 c70-45ec-b665-ce29408c8a8e:Active,a15496dc-c241-4658-af9d-0dfe11783916:Active,41012bfb-b802-4092-b699-7f5284d95c8e:Active', 'isoprefix': '', 'lver': 66, 'master_uuid': '41012bfb-b802-4092-b699-7f5284d95c8e', 'ma ster_ver': 293, 'name': 'No Description', 'pool_status': 'connected', 'spm_id': 2, 'type': 'ISCSI', 'version': '5'}, 'dominfo': {'8008fa12-17c7-4d9f-b871-153bdd4a283e': {'status': 'Active', 'alerts': [], 'isopre fix': '', 'version': 5, 'disktotal': '879066349568', 'diskfree': '863233900544'}, '6f018fbd-de93-4c56-880d-8ede2aad2674': {'status': 'Active', 'alerts': [], 'isoprefix': '', 'version': 5, 'disktotal': '854966927 36', 'diskfree': '67779952640'}, '2c870e06-6c70-45ec-b665-ce29408c8a8e': {'status': 'Active', 'alerts': [], 'isoprefix': '', 'version': 5, 'disktotal': '536468258816', 'diskfree': '445602856960'}, 'a15496dc-c241 -4658-af9d-0dfe11783916': {'status': 'Active', 'alerts': [], 'isoprefix': '', 'version': 5, 'disktotal': '2683951906816', 'diskfree': '1178028998656'}, '41012bfb-b802-4092-b699-7f5284d95c8e': {'status': 'Active' , 'alerts': [], 'isoprefix': '', 'version': 5, 'disktotal': '536468258816', 'diskfree': '450300477440'}}} from=::ffff:10.2.0.4,54708, task_id=96aab09f-60ca-4ae7-be31-aa3070a08030 (api:54) 2022-09-26 11:52:29,273+0200 INFO (jsonrpc/1) [api.host] START getAllVmStats() from=::ffff:10.2.0.4,37000 (api:48) 2022-09-26 11:52:29,302+0200 INFO (jsonrpc/1) [api.host] FINISH getAllVmStats return={'status': {'code': 0, 'message': 'Done'}, 'statsList': (suppressed)} from=::ffff:10.2.0.4,37000 (api:54) 2022-09-26 11:52:33,280+0200 INFO (jsonrpc/2) [api.host] START getStats() from=::ffff:10.2.0.4,37000 (api:48) 2022-09-26 11:52:33,392+0200 INFO (jsonrpc/2) [vdsm.api] START repoStats(domains=()) from=::ffff:10.2.0.4,37000, task_id=a6e9dfd1-ed16-4178-b2b5-3396216e39cc (api:48) 2022-09-26 11:52:33,393+0200 INFO (jsonrpc/2) [vdsm.api] FINISH repoStats return={'6f018fbd-de93-4c56-880d-8ede2aad2674': {'code': 0, 'lastCheck': '2.6', 'delay': '0.000660599', 'valid': True, 'version': 5, 'ac quired': True, 'actual': True}, '41012bfb-b802-4092-b699-7f5284d95c8e': {'code': 0, 'lastCheck': '3.7', 'delay': '0.00630966', 'valid': True, 'version': 5, 'acquired': True, 'actual': True}, 'a15496dc-c241-4658- af9d-0dfe11783916': {'code': 0, 'lastCheck': '2.5', 'delay': '0.000574606', 'valid': True, 'version': 5, 'acquired': True, 'actual': True}, '2c870e06-6c70-45ec-b665-ce29408c8a8e': {'code': 0, 'lastCheck': '2.5', 'delay': '0.000632427', 'valid': True, 'version': 5, 'acquired': True, 'actual': True}, '8008fa12-17c7-4d9f-b871-153bdd4a283e': {'code': 0, 'lastCheck': '5.2', 'delay': '0.000390569', 'valid': True, 'version': 5, 'acquired': True, 'actual': True}} from=::ffff:10.2.0.4,37000, task_id=a6e9dfd1-ed16-4178-b2b5-3396216e39cc (api:54) 2022-09-26 11:52:33,393+0200 INFO (jsonrpc/2) [vdsm.api] START multipath_health() from=::ffff:10.2.0.4,37000, task_id=730e795b-ea87-4523-8711-caa27d7491fa (api:48) 2022-09-26 11:52:33,394+0200 INFO (jsonrpc/2) [vdsm.api] FINISH multipath_health return={} from=::ffff:10.2.0.4,37000, task_id=730e795b-ea87-4523-8711-caa27d7491fa (api:54) 2022-09-26 11:52:33,413+0200 INFO (jsonrpc/2) [api.host] FINISH getStats return={'status': {'code': 0, 'message': 'Done'}, 'info': (suppressed)} from=::ffff:10.2.0.4,37000 (api:54) 2022-10-01 07:03:26,112+0200 INFO (MainThread) [vds] (PID: 4130768) I am the actual vdsm 4.50.2.2.1 vserv05 (4.18.0-372.26.1.el8_6.x86_64) (vdsmd:152) 2022-10-01 07:03:26,113+0200 INFO (MainThread) [vds] VDSM will run with cpu affinity: frozenset({1}) (vdsmd:271) 2022-10-01 07:03:26,122+0200 INFO (MainThread) [storage.hsm] START HSM init (hsm:217) 2022-10-01 07:03:26,125+0200 INFO (MainThread) [storage.hsm] Creating data-center mount directory '/rhev/data-center/mnt' (hsm:222) 2022-10-01 07:03:26,125+0200 INFO (MainThread) [storage.fileutils] Creating directory: /rhev/data-center/mnt mode: None (fileUtils:231) 2022-10-01 07:03:26,127+0200 INFO (MainThread) [storage.storagepool] Unmounting master /rhev/data-center/mnt/blockSD/41012bfb-b802-4092-b699-7f5284d95c8e/master (sp:438) 2022-10-01 07:03:26,215+0200 INFO (MainThread) [storage.mount] unmounting /rhev/data-center/mnt/blockSD/41012bfb-b802-4092-b699-7f5284d95c8e/master (mount:215) 2022-10-01 07:03:26,338+0200 INFO (MainThread) [storage.hsm] Unlinking file '/rhev/data-center/5836aaac-0030-0064-024d-0000000002e4/2c870e06-6c70-45ec-b665-ce29408c8a8e' (hsm:344) 2022-10-01 07:03:26,339+0200 INFO (MainThread) [storage.hsm] Unlinking file '/rhev/data-center/5836aaac-0030-0064-024d-0000000002e4/6f018fbd-de93-4c56-880d-8ede2aad2674' (hsm:344) 2022-10-01 07:03:26,339+0200 INFO (MainThread) [storage.hsm] Unlinking file '/rhev/data-center/5836aaac-0030-0064-024d-0000000002e4/a15496dc-c241-4658-af9d-0dfe11783916' (hsm:344) ---<snapp>--- On Ovirt-engine, I can see, it cannot communicate with vdsmd, wich is not really surprinsing, when the vdsmd is down. ---<snipp>--- 2022-09-26 11:53:51,974+02 INFO [org.ovirt.engine.core.bll.aaa.TerminateSessionsForTokenCommand] (default task-237) [5ad44f73] Running command: TerminateSessionsForTokenCommand internal: true. 2022-09-26 11:53:54,705+02 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to /10.2.0.5 2022-09-26 11:53:54,705+02 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connected to /10.2.0.5:54321 2022-09-26 11:53:54,707+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-15) [] Unable to RefreshCapabilities: ConnectExce ption: Connection refused 2022-09-26 11:53:54,715+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesAsyncVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-15) [] Command 'GetCapabilitiesAsy ncVDSCommand(HostName = vserv05, VdsIdAndVdsVDSCommandParametersBase:{hostId='75188aec-44ef-43b9-92b5-e5b06ae22ada', vds='Host[vserv05,75188aec-44ef-43b9-92b5-e5b06ae22ada]'})' execution failed: java.net.Connect Exception: Connection refused 2022-09-26 11:53:56,246+02 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connection timeout for host '10.2.0.5', last response arrived 90001 ms ago. 2022-09-26 11:53:56,269+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-51) [3ba89774] EVENT_ID: IRS_BROKER_COMMAND_FAILURE(10,803), VDSM command GetStoragePoolInfoVDS failed: Connection timeout for host '10.2.0.5', last response arrived 90001 ms ago. 2022-09-26 11:53:56,270+02 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-51) [3ba89774] ERROR, org.ovirt.engine.core.vdsbroker.irsbroker.GetStoragePoolInfoVDSCommand, exception: VDSGenericException: VDSNetworkException: Connection timeout for host '10.2.0.5', last response arrived 90001 ms ago., log id: c571eb8 2022-09-26 11:53:56,273+02 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-51) [3ba89774] Exception VDSNetworkException: VDSGenericException: VDSNetworkException: Connection timeout for host '10.2.0.5', last response arrived 90001 ms ago. 2022-09-26 11:53:57,726+02 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to /10.2.0.5 2022-09-26 11:53:57,727+02 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connected to /10.2.0.5:54321 2022-09-26 11:53:57,732+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-85) [] Unable to RefreshCapabilities: ConnectException: Connection refused 2022-09-26 11:53:57,734+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesAsyncVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-85) [] Command 'GetCapabilitiesAsyncVDSCommand(HostName = vserv05, VdsIdAndVdsVDSCommandParametersBase:{hostId='75188aec-44ef-43b9-92b5-e5b06ae22ada', vds='Host[vserv05,75188aec-44ef-43b9-92b5-e5b06ae22ada]'})' execution failed: java.net.ConnectException: Connection refused 2022-09-26 11:54:00,750+02 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to /10.2.0.5 2022-09-26 11:54:00,750+02 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connected to /10.2.0.5:54321 2022-09-26 11:54:00,754+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-50) [] Unable to RefreshCapabilities: ConnectException: Connection refused 2022-09-26 11:54:00,758+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesAsyncVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-50) [] Command 'GetCapabilitiesAsyncVDSCommand(HostName = vserv05, VdsIdAndVdsVDSCommandParametersBase:{hostId='75188aec-44ef-43b9-92b5-e5b06ae22ada', vds='Host[vserv05,75188aec-44ef-43b9-92b5-e5b06ae22ada]'})' execution failed: java.net.ConnectException: Connection refused 2022-09-26 11:54:03,772+02 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to /10.2.0.5 2022-09-26 11:54:03,773+02 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connected to /10.2.0.5:54321 2022-09-26 11:54:03,780+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-100) [] Unable to RefreshCapabilities: ConnectException: Connection refused 2022-09-26 11:54:03,782+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesAsyncVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-100) [] Command 'GetCapabilitiesAsyncVDSCommand(HostName = vserv05, VdsIdAndVdsVDSCommandParametersBase:{hostId='75188aec-44ef-43b9-92b5-e5b06ae22ada', vds='Host[vserv05,75188aec-44ef-43b9-92b5-e5b06ae22ada]'})' execution failed: java.net.ConnectException: Connection refused 2022-09-26 11:54:06,330+02 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to /10.2.0.5 2022-09-26 11:54:06,330+02 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connected to /10.2.0.5:54321 2022-09-26 11:54:06,333+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-92) [] Command 'SpmStatusVDSCommand(HostName = vserv05, SpmStatusVDSCommandParameters:{hostId='75188aec-44ef-43b9-92b5-e5b06ae22ada', storagePoolId='5836aaac-0030-0064-024d-0000000002e4'})' execution failed: java.net.ConnectException: Connection refused 2022-09-26 11:54:06,517+02 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to /10.2.0.5 2022-09-26 11:54:06,517+02 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connected to /10.2.0.5:54321 ---<snapp>---- _______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-le...@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/5BA5HV57DO77Y5EN5SPMDN5CRKCYPCFB/