Hello Albert, 

Thanks, and sorry for the late response.

On Sept. 26, vdsmd went down on the host again. (I didn't monitor it properly, 
so I only now realized it was down.)

This time I could not find any segvs in the logs.

When did it start?
Until recently we ran 4.3, and then upgraded first to 4.5.1 (through 4.4), as 
described in the documentation. Until then, we never encountered crashes of 
vdsmd. Only after going to 4.5.2 and switching to compatibility level 4.7 did 
random crashes of vdsmd start to happen.

The oVirt engine, by the way, is on a separate virtual machine, outside of the 
oVirt cluster.

On the host, /var/log/vdsm/vdsm.log:
I can only see that logging stopped, and that it started again after I 
restarted vdsmd. Also, after restarting vdsmd, all virtual machines on that 
host are shown with the proper running state.

---<snipp>--
2022-09-26 11:52:25,161+0200 INFO  (jsonrpc/6) [api.host] FINISH getAllVmStats 
return={'status': {'code': 0, 'message': 'Done'}, 'statsList': (suppressed)} 
from=::1,58732 (api:54)
2022-09-26 11:52:25,316+0200 INFO  (periodic/2) [vdsm.api] START 
repoStats(domains=()) from=internal, 
task_id=cfb0ce21-b884-4fd9-baa6-d020107035fe (api:48)
2022-09-26 11:52:25,317+0200 INFO  (periodic/2) [vdsm.api] FINISH repoStats 
return={'6f018fbd-de93-4c56-880d-8ede2aad2674': {'code': 0, 'lastCheck': '4.6', 
'delay': '0.000571996', 'valid': True, 'version': 5, 
'acquired': True, 'actual': True}, '41012bfb-b802-4092-b699-7f5284d95c8e': 
{'code': 0, 'lastCheck': '5.7', 'delay': '0.00073916', 'valid': True, 
'version': 5, 'acquired': True, 'actual': True}, 
'a15496dc-c241-4658-af9d-0dfe11783916': {'code': 0, 'lastCheck': '4.4', 
'delay': '0.00066827', 
'valid': True, 'version': 5, 'acquired': True, 'actual': True}, 
'2c870e06-6c70-45ec-b665-ce29408c8a8e': {'code': 0, 'lastCheck': '4.4',
 'delay': '0.000624771', 'valid': True, 'version': 5, 'acquired': True, 
'actual': True}, '8008fa12-17c7-4d9f-b871-153bdd4a283e': {'code': 0, 
'lastCheck': '1.8', 'delay': '0.000390569', 'valid': True, 'version': 
5, 'acquired': True, 'actual': True}} from=internal, 
task_id=cfb0ce21-b884-4fd9-baa6-d020107035fe (api:54)
2022-09-26 11:52:26,171+0200 INFO  (jsonrpc/4) [vdsm.api] START 
getSpmStatus(spUUID='5836aaac-0030-0064-024d-0000000002e4') 
from=::ffff:10.2.0.4,37000, task_id=d9d88470-3f04-4060-8d19-f9c68fa928ec 
(api:48)
2022-09-26 11:52:26,191+0200 INFO  (jsonrpc/4) [vdsm.api] FINISH getSpmStatus 
return={'spm_st': {'spmStatus': 'SPM', 'spmLver': 66, 'spmId': 2}} 
from=::ffff:10.2.0.4,37000, task_id=d9d88470-3f04-4060-8d19-f9c68fa928ec 
(api:54)
2022-09-26 11:52:26,218+0200 INFO  (jsonrpc/5) [vdsm.api] START 
getStoragePoolInfo(spUUID='5836aaac-0030-0064-024d-0000000002e4') 
from=::ffff:10.2.0.4,54708, task_id=96aab09f-60ca-4ae7-be31-aa3070a08030 
(api:48)
2022-09-26 11:52:26,240+0200 INFO  (jsonrpc/5) [vdsm.api] FINISH 
getStoragePoolInfo return={'info': {'domains': 
'8008fa12-17c7-4d9f-b871-153bdd4a283e:Active,6f018fbd-de93-4c56-880d-8ede2aad2674:Active,2c870e06-6c70-45ec-b665-ce29408c8a8e:Active,a15496dc-c241-4658-af9d-0dfe11783916:Active,41012bfb-b802-4092-b699-7f5284d95c8e:Active',
 'isoprefix': '', 'lver': 66, 'master_uuid': 
'41012bfb-b802-4092-b699-7f5284d95c8e', 'master_ver': 293, 
'name': 'No Description', 'pool_status': 'connected', 'spm_id': 
2, 'type': 'ISCSI', 'version': '5'}, 'dominfo': 
{'8008fa12-17c7-4d9f-b871-153bdd4a283e': {'status': 'Active', 'alerts': [], 
'isoprefix': '', 'version': 5, 'disktotal': '879066349568', 'diskfree': 
'863233900544'}, '6f018fbd-de93-4c56-880d-8ede2aad2674': {'status': 'Active', 
'alerts': [], 'isoprefix': '', 'version': 5, 'disktotal': '85496692736', 
'diskfree': '67779952640'}, '2c870e06-6c70-45ec-b665-ce29408c8a8e': 
{'status': 'Active', 'alerts': [], 'isoprefix': '', 'version': 5, 'disktotal': 
'536468258816', 'diskfree': '445602856960'}, 
'a15496dc-c241-4658-af9d-0dfe11783916': {'status': 'Active', 'alerts': [], 
'isoprefix': '', 
'version': 5, 'disktotal': '2683951906816', 'diskfree': '1178028998656'}, 
'41012bfb-b802-4092-b699-7f5284d95c8e': {'status': 'Active', 
'alerts': [], 'isoprefix': '', 'version': 5, 'disktotal': '536468258816', 
'diskfree': '450300477440'}}} from=::ffff:10.2.0.4,54708, 
task_id=96aab09f-60ca-4ae7-be31-aa3070a08030 (api:54)
2022-09-26 11:52:29,273+0200 INFO  (jsonrpc/1) [api.host] START getAllVmStats() 
from=::ffff:10.2.0.4,37000 (api:48)
2022-09-26 11:52:29,302+0200 INFO  (jsonrpc/1) [api.host] FINISH getAllVmStats 
return={'status': {'code': 0, 'message': 'Done'}, 'statsList': (suppressed)} 
from=::ffff:10.2.0.4,37000 (api:54)
2022-09-26 11:52:33,280+0200 INFO  (jsonrpc/2) [api.host] START getStats() 
from=::ffff:10.2.0.4,37000 (api:48)
2022-09-26 11:52:33,392+0200 INFO  (jsonrpc/2) [vdsm.api] START 
repoStats(domains=()) from=::ffff:10.2.0.4,37000, 
task_id=a6e9dfd1-ed16-4178-b2b5-3396216e39cc (api:48)
2022-09-26 11:52:33,393+0200 INFO  (jsonrpc/2) [vdsm.api] FINISH repoStats 
return={'6f018fbd-de93-4c56-880d-8ede2aad2674': {'code': 0, 'lastCheck': '2.6', 
'delay': '0.000660599', 'valid': True, 'version': 5, 
'acquired': True, 'actual': True}, '41012bfb-b802-4092-b699-7f5284d95c8e': 
{'code': 0, 'lastCheck': '3.7', 'delay': '0.00630966', 'valid': True, 
'version': 5, 'acquired': True, 'actual': True}, 
'a15496dc-c241-4658-af9d-0dfe11783916': {'code': 0, 'lastCheck': '2.5', 
'delay': '0.000574606', 
'valid': True, 'version': 5, 'acquired': True, 'actual': True}, 
'2c870e06-6c70-45ec-b665-ce29408c8a8e': {'code': 0, 'lastCheck': '2.5',
 'delay': '0.000632427', 'valid': True, 'version': 5, 'acquired': True, 
'actual': True}, '8008fa12-17c7-4d9f-b871-153bdd4a283e': {'code': 0, 
'lastCheck': '5.2', 'delay': '0.000390569', 'valid': True, 'version': 
5, 'acquired': True, 'actual': True}} from=::ffff:10.2.0.4,37000, 
task_id=a6e9dfd1-ed16-4178-b2b5-3396216e39cc (api:54)
2022-09-26 11:52:33,393+0200 INFO  (jsonrpc/2) [vdsm.api] START 
multipath_health() from=::ffff:10.2.0.4,37000, 
task_id=730e795b-ea87-4523-8711-caa27d7491fa (api:48)
2022-09-26 11:52:33,394+0200 INFO  (jsonrpc/2) [vdsm.api] FINISH 
multipath_health return={} from=::ffff:10.2.0.4,37000, 
task_id=730e795b-ea87-4523-8711-caa27d7491fa (api:54)
2022-09-26 11:52:33,413+0200 INFO  (jsonrpc/2) [api.host] FINISH getStats 
return={'status': {'code': 0, 'message': 'Done'}, 'info': (suppressed)} 
from=::ffff:10.2.0.4,37000 (api:54)
2022-10-01 07:03:26,112+0200 INFO  (MainThread) [vds] (PID: 4130768) I am the 
actual vdsm 4.50.2.2.1 vserv05 (4.18.0-372.26.1.el8_6.x86_64) (vdsmd:152)
2022-10-01 07:03:26,113+0200 INFO  (MainThread) [vds] VDSM will run with cpu 
affinity: frozenset({1}) (vdsmd:271)
2022-10-01 07:03:26,122+0200 INFO  (MainThread) [storage.hsm] START HSM init 
(hsm:217)
2022-10-01 07:03:26,125+0200 INFO  (MainThread) [storage.hsm] Creating 
data-center mount directory '/rhev/data-center/mnt' (hsm:222)
2022-10-01 07:03:26,125+0200 INFO  (MainThread) [storage.fileutils] Creating 
directory: /rhev/data-center/mnt mode: None (fileUtils:231)
2022-10-01 07:03:26,127+0200 INFO  (MainThread) [storage.storagepool] 
Unmounting master 
/rhev/data-center/mnt/blockSD/41012bfb-b802-4092-b699-7f5284d95c8e/master 
(sp:438)
2022-10-01 07:03:26,215+0200 INFO  (MainThread) [storage.mount] unmounting 
/rhev/data-center/mnt/blockSD/41012bfb-b802-4092-b699-7f5284d95c8e/master 
(mount:215)
2022-10-01 07:03:26,338+0200 INFO  (MainThread) [storage.hsm] Unlinking file 
'/rhev/data-center/5836aaac-0030-0064-024d-0000000002e4/2c870e06-6c70-45ec-b665-ce29408c8a8e'
 (hsm:344)
2022-10-01 07:03:26,339+0200 INFO  (MainThread) [storage.hsm] Unlinking file 
'/rhev/data-center/5836aaac-0030-0064-024d-0000000002e4/6f018fbd-de93-4c56-880d-8ede2aad2674'
 (hsm:344)
2022-10-01 07:03:26,339+0200 INFO  (MainThread) [storage.hsm] Unlinking file 
'/rhev/data-center/5836aaac-0030-0064-024d-0000000002e4/a15496dc-c241-4658-af9d-0dfe11783916'
 (hsm:344)
---<snapp>---
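
To find the point where logging stopped without scrolling through the whole 
file, the timestamps of consecutive entries can be compared. What follows is 
only a minimal sketch in Python, not something shipped with vdsm: the names 
parse_ts and find_gaps and the 5 minute threshold are my own assumptions, and 
it relies only on the timestamp format visible in the excerpt above.

```python
import re
from datetime import datetime, timedelta

# Timestamp at the start of each vdsm.log entry,
# e.g. "2022-09-26 11:52:33,413+0200". Wrapped continuation lines do not match.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})([+-]\d{4})")


def parse_ts(line):
    """Return the entry timestamp as an aware datetime, or None if absent."""
    m = TS_RE.match(line)
    if m is None:
        return None
    base = datetime.strptime(m.group(1) + m.group(3), "%Y-%m-%d %H:%M:%S%z")
    return base + timedelta(milliseconds=int(m.group(2)))


def find_gaps(lines, threshold=timedelta(minutes=5)):
    """Return (last_before, first_after) pairs where the log was silent
    for longer than threshold (hypothetical default: 5 minutes)."""
    gaps = []
    prev = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # continuation of a wrapped entry
        if prev is not None and ts - prev > threshold:
            gaps.append((prev, ts))
        prev = ts
    return gaps


# Possible usage on the host (path as in the excerpt above):
#   with open("/var/log/vdsm/vdsm.log") as f:
#       for before, after in find_gaps(f):
#           print("silent from", before, "to", after, "=", after - before)
```

Applied to the excerpt above, this would report one gap, between the entry at 
2022-09-26 11:52:33,413 and the restart entry at 2022-10-01 07:03:26,112.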

On the oVirt engine, I can see that it cannot communicate with vdsmd, which is 
not really surprising when vdsmd is down. 

---<snipp>---
2022-09-26 11:53:51,974+02 INFO  
[org.ovirt.engine.core.bll.aaa.TerminateSessionsForTokenCommand] (default 
task-237) [5ad44f73] Running command: TerminateSessionsForTokenCommand 
internal: true.
2022-09-26 11:53:54,705+02 INFO  
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] 
Connecting to /10.2.0.5
2022-09-26 11:53:54,705+02 INFO  
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] 
Connected to /10.2.0.5:54321
2022-09-26 11:53:54,707+02 ERROR 
[org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] 
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-15) [] 
Unable to RefreshCapabilities: ConnectException: Connection refused
2022-09-26 11:53:54,715+02 ERROR 
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesAsyncVDSCommand] 
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-15) [] 
Command 'GetCapabilitiesAsyncVDSCommand(HostName = vserv05, 
VdsIdAndVdsVDSCommandParametersBase:{hostId='75188aec-44ef-43b9-92b5-e5b06ae22ada',
 vds='Host[vserv05,75188aec-44ef-43b9-92b5-e5b06ae22ada]'})' execution failed: 
java.net.ConnectException: Connection refused
2022-09-26 11:53:56,246+02 ERROR 
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] 
Connection timeout for host '10.2.0.5', last response arrived 90001 ms ago.
2022-09-26 11:53:56,269+02 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-51) 
[3ba89774] EVENT_ID: IRS_BROKER_COMMAND_FAILURE(10,803), VDSM command 
GetStoragePoolInfoVDS failed: Connection timeout for host '10.2.0.5', last 
response arrived 90001 ms ago.
2022-09-26 11:53:56,270+02 ERROR 
[org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] 
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-51) 
[3ba89774] ERROR, 
org.ovirt.engine.core.vdsbroker.irsbroker.GetStoragePoolInfoVDSCommand, 
exception: VDSGenericException: VDSNetworkException: Connection timeout for 
host '10.2.0.5', last response arrived 90001 ms ago., log id: c571eb8
2022-09-26 11:53:56,273+02 ERROR 
[org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] 
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-51) 
[3ba89774] Exception VDSNetworkException: VDSGenericException: 
VDSNetworkException: Connection timeout for host '10.2.0.5', last response 
arrived 90001 ms ago.
2022-09-26 11:53:57,726+02 INFO  
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] 
Connecting to /10.2.0.5
2022-09-26 11:53:57,727+02 INFO  
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] 
Connected to /10.2.0.5:54321
2022-09-26 11:53:57,732+02 ERROR 
[org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] 
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-85) [] 
Unable to RefreshCapabilities: ConnectException: Connection refused
2022-09-26 11:53:57,734+02 ERROR 
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesAsyncVDSCommand] 
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-85) [] 
Command 'GetCapabilitiesAsyncVDSCommand(HostName = vserv05, 
VdsIdAndVdsVDSCommandParametersBase:{hostId='75188aec-44ef-43b9-92b5-e5b06ae22ada',
 vds='Host[vserv05,75188aec-44ef-43b9-92b5-e5b06ae22ada]'})' execution failed: 
java.net.ConnectException: Connection refused
2022-09-26 11:54:00,750+02 INFO  
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] 
Connecting to /10.2.0.5
2022-09-26 11:54:00,750+02 INFO  
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] 
Connected to /10.2.0.5:54321
2022-09-26 11:54:00,754+02 ERROR 
[org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] 
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-50) [] 
Unable to RefreshCapabilities: ConnectException: Connection refused
2022-09-26 11:54:00,758+02 ERROR 
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesAsyncVDSCommand] 
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-50) [] 
Command 'GetCapabilitiesAsyncVDSCommand(HostName = vserv05, 
VdsIdAndVdsVDSCommandParametersBase:{hostId='75188aec-44ef-43b9-92b5-e5b06ae22ada',
 vds='Host[vserv05,75188aec-44ef-43b9-92b5-e5b06ae22ada]'})' execution failed: 
java.net.ConnectException: Connection refused
2022-09-26 11:54:03,772+02 INFO  
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] 
Connecting to /10.2.0.5
2022-09-26 11:54:03,773+02 INFO  
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] 
Connected to /10.2.0.5:54321
2022-09-26 11:54:03,780+02 ERROR 
[org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] 
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-100) [] 
Unable to RefreshCapabilities: ConnectException: Connection refused
2022-09-26 11:54:03,782+02 ERROR 
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesAsyncVDSCommand] 
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-100) [] 
Command 'GetCapabilitiesAsyncVDSCommand(HostName = vserv05, 
VdsIdAndVdsVDSCommandParametersBase:{hostId='75188aec-44ef-43b9-92b5-e5b06ae22ada',
 vds='Host[vserv05,75188aec-44ef-43b9-92b5-e5b06ae22ada]'})' execution failed: 
java.net.ConnectException: Connection refused
2022-09-26 11:54:06,330+02 INFO  
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] 
Connecting to /10.2.0.5
2022-09-26 11:54:06,330+02 INFO  
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] 
Connected to /10.2.0.5:54321
2022-09-26 11:54:06,333+02 ERROR 
[org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand] 
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-92) [] 
Command 'SpmStatusVDSCommand(HostName = vserv05, 
SpmStatusVDSCommandParameters:{hostId='75188aec-44ef-43b9-92b5-e5b06ae22ada', 
storagePoolId='5836aaac-0030-0064-024d-0000000002e4'})' execution failed: 
java.net.ConnectException: Connection refused
2022-09-26 11:54:06,517+02 INFO  
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] 
Connecting to /10.2.0.5
2022-09-26 11:54:06,517+02 INFO  
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] 
Connected to /10.2.0.5:54321
---<snapp>----
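
The "last response arrived 90001 ms ago" messages in the engine log can be 
turned into an absolute time and compared with the last entries in vdsm.log. 
This is just a small sketch of that arithmetic. The function name 
last_response_time is my own, and it assumes the engine and the host clocks 
are in sync and in the same time zone.

```python
from datetime import datetime, timedelta


def last_response_time(engine_ts, ms_ago):
    """Subtract the reported age from an engine.log timestamp.

    engine_ts looks like "2022-09-26 11:53:56,246+02". The UTC offset is
    dropped here, so the result is in the engine's local time.
    """
    # First 23 characters: "YYYY-MM-DD HH:MM:SS,mmm"
    reported = datetime.strptime(engine_ts[:23], "%Y-%m-%d %H:%M:%S,%f")
    return reported - timedelta(milliseconds=ms_ago)


print(last_response_time("2022-09-26 11:53:56,246+02", 90001))
# -> 2022-09-26 11:52:26.245000
```

So, under those assumptions, the engine received its last response on that 
connection at about 11:52:26, which falls within the final seconds of the 
vdsm.log excerpt.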