Hi,
I have an oVirt 4.4.3 setup with 2 clusters, a hosted engine and iSCSI
storage. The first cluster, composed of 2 servers (host1 and host2), is
dedicated to the hosted engine; the second cluster is for VMs. In
addition, there is a SAN with 3 LUNs: one for the hosted engine storage,
one for VMs and one unused. The SAN is built on top of a 2-node
pacemaker/DRBD cluster with a virtual IP used as the iSCSI portal IP.
Since today, after a failover of the iSCSI cluster, I have been unable
to start the hosted engine. It looks like a storage problem.
At the moment only one node (host1) is running in the cluster. It seems
there is some lock on the LVs, but I'm not sure about this.

Here are some details about the problem:

1. iscsiadm -m session -P 3
iSCSI Transport Class version 2.0-870
version 6.2.0.878-2
Target: iqn.2003-01.org.linux-iscsi.s1-node1.x8664:sn.2a734f67d5b1
(non-flash)
Current Portal: 10.3.8.8:3260,1
Persistent Portal: 10.3.8.8:3260,1
**********
Interface:
**********
Iface Name: default
Iface Transport: tcp
Iface Initiatorname: iqn.1994-05.com.redhat:4b668221d9a9
Iface IPaddress: 10.3.8.10
Iface HWaddress: default
Iface Netdev: default
SID: 1
iSCSI Connection State: LOGGED IN
iSCSI Session State: LOGGED_IN
Internal iscsid Session State: NO CHANGE
*********
Timeouts:
*********
Recovery Timeout: 5
Target Reset Timeout: 30
LUN Reset Timeout: 30
Abort Timeout: 15
*****
CHAP:
*****
username: <empty>
password: ********
username_in: <empty>
password_in: ********
************************
Negotiated iSCSI params:
************************
HeaderDigest: None
DataDigest: None
MaxRecvDataSegmentLength: 262144
MaxXmitDataSegmentLength: 262144
FirstBurstLength: 65536
MaxBurstLength: 262144
ImmediateData: Yes
InitialR2T: Yes
MaxOutstandingR2T: 1
************************
Attached SCSI devices:
************************
Host Number: 7 State: running
scsi7 Channel 00 Id 0 Lun: 0
Attached scsi disk sdb State: running
scsi7 Channel 00 Id 0 Lun: 1
Attached scsi disk sdc State: running

2. vdsm.log errors:

2020-11-27 18:37:16,786+0100 INFO  (jsonrpc/0) [api] FINISH getStats
error=Virtual machine does not exist: {'vmId':
'f3a1194d-0632-43c6-8e12-7f22518cff87'} (api:129)
.....
2020-11-27 18:37:52,864+0100 INFO  (jsonrpc/4) [vdsm.api] FINISH
getVolumeInfo error=(-223, 'Sanlock resource read failure', 'Lease does not
exist on storage') from=::1,60880,
task_id=138a3615-d537-4e5f-a39c-335269ad0917 (api:52)
2020-11-27 18:37:52,864+0100 ERROR (jsonrpc/4) [storage.TaskManager.Task]
(Task='138a3615-d537-4e5f-a39c-335269ad0917') Unexpected error (task:880)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 887,
in _run
    return fn(*args, **kargs)
  File "<decorator-gen-159>", line 2, in getVolumeInfo
  File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in
method
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 3142,
in getVolumeInfo
    info = self._produce_volume(sdUUID, imgUUID, volUUID).getInfo()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/volume.py", line 258,
in getInfo
    leasestatus = self.getLeaseStatus()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/volume.py", line 203,
in getLeaseStatus
    self.volUUID)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sd.py", line 549, in
inquireVolumeLease
    return self._domainLock.inquire(lease)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/clusterlock.py", line
464, in inquire
    sector=self._block_size)
sanlock.SanlockException: (-223, 'Sanlock resource read failure', 'Lease
does not exist on storage')
2020-11-27 18:37:52,865+0100 INFO  (jsonrpc/4) [storage.TaskManager.Task]
(Task='138a3615-d537-4e5f-a39c-335269ad0917') aborting: Task is aborted:
"value=(-223, 'Sanlock resource read failure', 'Lease does not exist on
storage') abortedcode=100" (task:1190)


3. supervdsm.log:
MainProcess|monitor/de4645f::DEBUG::2020-11-27
18:41:25,286::commands::153::common.commands::(start) /usr/bin/taskset
--cpu-list 0-11 /usr/sbin/dmsetup remove
de4645fc--f379--4837--916b--a0c2b89927d9-dfa4e933--2b9c--4057--a4c5--aa4485b070e9
(cwd None)
MainProcess|monitor/de4645f::DEBUG::2020-11-27
18:41:25,293::commands::98::common.commands::(run) FAILED: <err> =
b'device-mapper: remove ioctl on
de4645fc--f379--4837--916b--a0c2b89927d9-dfa4e933--2b9c--4057--a4c5--aa4485b070e9
 failed: Device or resource busy\nCommand failed.\n'; <rc> = 1
MainProcess|monitor/de4645f::ERROR::2020-11-27
18:41:25,294::supervdsm_server::97::SuperVdsm.ServerCallback::(wrapper)
Error in devicemapper_removeMapping
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/devicemapper.py",
line 141, in removeMapping
    commands.run(cmd)
  File "/usr/lib/python3.6/site-packages/vdsm/common/commands.py", line
101, in run
    raise cmdutils.Error(args, p.returncode, out, err)
vdsm.common.cmdutils.Error: Command ['/usr/sbin/dmsetup', 'remove',
'de4645fc--f379--4837--916b--a0c2b89927d9-dfa4e933--2b9c--4057--a4c5--aa4485b070e9']
failed with rc=1 out=b'' err=b'device-mapper: remove ioctl on
de4645fc--f379--4837--916b--a0c2b89927d9-dfa4e933--2b9c--4057--a4c5--aa4485b070e9
 failed: Device or resource busy\nCommand failed.\n'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/supervdsm_server.py", line
95, in wrapper
    res = func(*args, **kwargs)
  File
"/usr/lib/python3.6/site-packages/vdsm/supervdsm_api/devicemapper.py", line
29, in devicemapper_removeMapping
    return devicemapper.removeMapping(deviceName)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/devicemapper.py",
line 143, in removeMapping
    raise Error("Could not remove mapping: {}".format(e))
vdsm.storage.devicemapper.Error: Could not remove mapping: Command
['/usr/sbin/dmsetup', 'remove',
'de4645fc--f379--4837--916b--a0c2b89927d9-dfa4e933--2b9c--4057--a4c5--aa4485b070e9']
failed with rc=1 out=b'' err=b'device-mapper: remove ioctl on
de4645fc--f379--4837--916b--a0c2b89927d9-dfa4e933--2b9c--4057--a4c5--aa4485b070e9
 failed: Device or resource busy\nCommand failed.\n'
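
In case it helps to narrow things down: as far as I understand LVM's
naming, the device-mapper name in the failing dmsetup remove is
"<VG name>-<LV name>" with every '-' inside the names doubled. A small
Python sketch to decode it (the storage-domain/volume reading is my
assumption from how oVirt names its VGs and LVs):

```python
def split_dm_name(dm_name):
    """Split a dm name on the single '-' separator, un-doubling
    the escaped '-' characters inside each part."""
    parts, current, i = [], [], 0
    while i < len(dm_name):
        if dm_name[i] == '-':
            if i + 1 < len(dm_name) and dm_name[i + 1] == '-':
                current.append('-')   # '--' is an escaped '-' inside a name
                i += 2
            else:
                parts.append(''.join(current))  # single '-' separates VG and LV
                current = []
                i += 1
        else:
            current.append(dm_name[i])
            i += 1
    parts.append(''.join(current))
    return parts

name = ("de4645fc--f379--4837--916b--a0c2b89927d9"
        "-dfa4e933--2b9c--4057--a4c5--aa4485b070e9")
vg, lv = split_dm_name(name)
print(vg)  # de4645fc-f379-4837-916b-a0c2b89927d9
print(lv)  # dfa4e933-2b9c-4057-a4c5-aa4485b070e9
```

If that decoding is right, the "busy" mapping corresponds to storage
domain (VG) de4645fc-f379-4837-916b-a0c2b89927d9 and volume (LV)
dfa4e933-2b9c-4057-a4c5-aa4485b070e9, so something on host1 apparently
still holds that volume open.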


4. /var/log/messages:
...
Nov 27 18:43:46 host1 journal[20573]: ovirt-ha-agent
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Failed to
start necessary monitors
Nov 27 18:43:46 host1 journal[20573]: ovirt-ha-agent
ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent call
last):#012  File
"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
line 85, in start_monitor#012    response = self._proxy.start_monitor(type,
options)#012  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1112, in
__call__#012    return self.__send(self.__name, args)#012  File
"/usr/lib64/python3.6/xmlrpc/client.py", line 1452, in __request#012
 verbose=self.__verbose#012  File "/usr/lib64/python3.6/xmlrpc/client.py",
line 1154, in request#012    return self.single_request(host, handler,
request_body, verbose)#012  File "/usr/lib64/python3.6/xmlrpc/client.py",
line 1166, in single_request#012    http_conn = self.send_request(host,
handler, request_body, verbose)#012  File
"/usr/lib64/python3.6/xmlrpc/client.py", line 1279, in send_request#012
 self.send_content(connection, request_body)#012  File
"/usr/lib64/python3.6/xmlrpc/client.py", line 1309, in send_content#012
 connection.endheaders(request_body)#012  File
"/usr/lib64/python3.6/http/client.py", line 1249, in endheaders#012
 self._send_output(message_body, encode_chunked=encode_chunked)#012  File
"/usr/lib64/python3.6/http/client.py", line 1036, in _send_output#012
 self.send(msg)#012  File "/usr/lib64/python3.6/http/client.py", line 974,
in send#012    self.connect()#012  File
"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py",
line 74, in connect#012
 self.sock.connect(base64.b16decode(self.host))#012FileNotFoundError:
[Errno 2] No such file or directory#012#012During handling of the above
exception, another exception occurred:#012#012Traceback (most recent call
last):#012  File
"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
line 131, in _run_agent#012    return action(he)#012  File
"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
line 55, in action_proper#012    return he.start_monitoring()#012  File
"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
line 437, in start_monitoring#012    self._initialize_broker()#012  File
"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
line 561, in _initialize_broker#012    m.get('options', {}))#012  File
"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
line 91, in start_monitor#012    ).format(t=type, o=options,
e=e)#012ovirt_hosted_engine_ha.lib.exceptions.RequestError: brokerlink -
failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or
directory, [monitor: 'network', options: {'addr': '10.3.7.1',
'network_test': 'dns', 'tcp_t_address': '', 'tcp_t_port': ''}]
Nov 27 18:43:46 host1 journal[20573]: ovirt-ha-agent
ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart agent
Nov 27 18:43:46 host1 systemd[1]: ovirt-ha-agent.service: Main process
exited, code=exited, status=157/n/a
Nov 27 18:43:46 host1 systemd[1]: ovirt-ha-agent.service: Failed with
result 'exit-code'.
....
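
One thing I notice in the agent traceback: unixrpc.py fails in
self.sock.connect(base64.b16decode(self.host)) with FileNotFoundError,
so the "host" is really a base16-encoded unix socket path and the
decoded path does not exist, i.e. ovirt-ha-broker apparently never
created its socket. A quick Python sketch of that encoding (the socket
path below is my assumption of the usual broker location, just for
illustration):

```python
import base64

# ovirt-ha-agent passes the broker's unix socket path base16-encoded in
# the XML-RPC "host" field; unixrpc.py decodes it before connect().
# NOTE: this path is an assumed example of the usual broker socket location.
path = b"/var/run/ovirt-hosted-engine-ha/broker.socket"

encoded = base64.b16encode(path)     # what ends up in self.host
decoded = base64.b16decode(encoded)  # what connect() receives
assert decoded == path               # the encoding round-trips
print(decoded.decode())
```

So before anything storage-related, I probably also need to check why
ovirt-ha-broker is not up and listening on its socket.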

Please let me know if any other logs are needed.
Thank you
_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/6F4SO3EVMLNEGZDUTZ7NCPR75X3RVEAM/
