On Thursday, 3 September 2020 22:49:17 CEST Gillingham, Eric J (US 393D) via Users wrote:
> I recently removed a host from my cluster to upgrade it to 4.4, after I
> removed the host from the datacenter VMs started to pause on the second
> system they all migrated to. Investigating via the engine showed the
> storage domain was showing as "unknown", when I try to activate it via the
> engine it cycles to locked then to unknown again.
>
> /var/log/sanlock.log contains a repeating:
> add_lockspace
> e1270474-108c-4cae-83d6-51698cffebbf:1:/dev/e1270474-108c-4cae-83d6-51698cffebbf/ids:0
> conflicts with name of list1 s1
> e1270474-108c-4cae-83d6-51698cffebbf:3:/dev/e1270474-108c-4cae-83d6-51698cffebbf/ids:0
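
A note on reading the sanlock message quoted above: a sanlock lockspace string has the form name:host_id:path:offset (see man sanlock). The small Python sketch below only illustrates how the two strings from the log differ. Lockspace, parse_lockspace and differing_fields are names made up for this example and are not part of sanlock or vdsm.

```python
from typing import List, NamedTuple


class Lockspace(NamedTuple):
    """One sanlock lockspace string: name:host_id:path:offset."""
    name: str
    host_id: int
    path: str
    offset: int


def parse_lockspace(spec: str) -> Lockspace:
    """Split a lockspace string into its four fields (hypothetical helper)."""
    # The first two fields never contain ':'; the offset is the last field.
    name, host_id, rest = spec.split(":", 2)
    path, offset = rest.rsplit(":", 1)
    return Lockspace(name, int(host_id), path, int(offset))


def differing_fields(a: Lockspace, b: Lockspace) -> List[str]:
    """Return the names of the fields in which two lockspaces differ."""
    return [f for f in Lockspace._fields if getattr(a, f) != getattr(b, f)]


# The two strings from the sanlock.log message quoted above.
requested = parse_lockspace(
    "e1270474-108c-4cae-83d6-51698cffebbf:1:"
    "/dev/e1270474-108c-4cae-83d6-51698cffebbf/ids:0"
)
existing = parse_lockspace(
    "e1270474-108c-4cae-83d6-51698cffebbf:3:"
    "/dev/e1270474-108c-4cae-83d6-51698cffebbf/ids:0"
)

print(differing_fields(requested, existing))  # ['host_id']
```

So the name, path and offset are identical and only the host id differs (1 in the new request, 3 in the entry sanlock already has). My reading, which I have not verified against the sanlock source, is that the host is still joined to the lockspace with host id 3 while the new request asks for host id 1. Running sanlock client status on the host should show which lockspaces it currently holds.
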

How did you remove the first host? Did you put it into maintenance first? I wonder how this situation (two lockspaces with conflicting names) can occur. You can try to re-initialize the lockspace directly with the sanlock command (see man sanlock), but it would be good to understand the situation first.

>
> vdsm.log contains these (maybe related) snippets:
> ---
> 2020-09-03 20:19:53,483+0000 INFO (jsonrpc/6) [vdsm.api] FINISH getAllTasksStatuses error=Secured object is not in safe state from=::ffff:137.79.52.43,36326, flow_id=18031a91, task_id=8e92f059-743a-48c8-aa9d-e7c4c836337b (api:52)
> 2020-09-03 20:19:53,483+0000 ERROR (jsonrpc/6) [storage.TaskManager.Task] (Task='8e92f059-743a-48c8-aa9d-e7c4c836337b') Unexpected error (task:875)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
>     return fn(*args, **kargs)
>   File "<string>", line 2, in getAllTasksStatuses
>   File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
>     ret = func(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2201, in getAllTasksStatuses
>     allTasksStatus = self._pool.getAllTasksStatuses()
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 77, in wrapper
>     raise SecureError("Secured object is not in safe state")
> SecureError: Secured object is not in safe state
> 2020-09-03 20:19:53,483+0000 INFO (jsonrpc/6) [storage.TaskManager.Task] (Task='8e92f059-743a-48c8-aa9d-e7c4c836337b') aborting: Task is aborted: u'Secured object is not in safe state' - code 100 (task:1181)
> 2020-09-03 20:19:53,483+0000 ERROR (jsonrpc/6) [storage.Dispatcher] FINISH getAllTasksStatuses error=Secured object is not in safe state (dispatcher:87)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/dispatcher.py", line 74, in wrapper
>     result = ctask.prepare(func, *args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 108, in wrapper
>     return m(self, *a, **kw)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 1189, in prepare
>     raise self.error
> SecureError: Secured object is not in safe state
> ---
> 2020-09-03 20:44:23,252+0000 INFO (tasks/2) [storage.ThreadPool.WorkerThread] START task 76415a77-9d29-4b72-ade1-53207cfc503b (cmd=<bound method Task.commit of <vdsm.storage.task.Task instance at 0x7fe99c27dea8>>, args=None) (threadPool:208)
> 2020-09-03 20:44:23,266+0000 INFO (tasks/2) [storage.SANLock] Acquiring host id for domain e1270474-108c-4cae-83d6-51698cffebbf (id=1, wait=True) (clusterlock:313)
> 2020-09-03 20:44:23,267+0000 ERROR (tasks/2) [storage.TaskManager.Task] (Task='76415a77-9d29-4b72-ade1-53207cfc503b') Unexpected error (task:875)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
>     return fn(*args, **kargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 336, in run
>     return self.cmd(*self.argslist, **self.argsdict)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 317, in startSpm
>     self.masterDomain.acquireHostId(self.id)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 957, in acquireHostId
>     self._manifest.acquireHostId(hostId, wait)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 501, in acquireHostId
>     self._domainLock.acquireHostId(hostId, wait)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 344, in acquireHostId
>     raise se.AcquireHostIdFailure(self._sdUUID, e)
> AcquireHostIdFailure: Cannot acquire host id: ('e1270474-108c-4cae-83d6-51698cffebbf', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))
> ---
>
> Another symptom is in the hosts view of the engine SPM bounces between
> "Normal" and "Contending".
> When it's Normal if I select Management -> Select as SPM I get
> "Error while executing action: Cannot force select SPM. Unknown Data Center status."
> I've tried rebooting the one remaining host in the cluster no to avail,
> hosted-engine --reinitialize-lockspace also seems to not solve the issue.
>
> I'm kind of stumped as to what else to try, would appreciate any guidance on
> how to resolve this.
> Thank You
>
> _______________________________________________
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/FMJZV2OEKHPTSTROSPLCQ3WJUIPB6CKL/