I recently removed a host from my cluster to upgrade it to 4.4. After I removed 
the host from the datacenter, VMs started to pause on the second system, to 
which they had all migrated. The engine shows the storage domain as "unknown"; 
when I try to activate it via the engine, it cycles to "locked" and then back 
to "unknown".

/var/log/sanlock.log contains a repeating:
add_lockspace e1270474-108c-4cae-83d6-51698cffebbf:1:/dev/e1270474-108c-4cae-83d6-51698cffebbf/ids:0 conflicts with name of list1 s1 e1270474-108c-4cae-83d6-51698cffebbf:3:/dev/e1270474-108c-4cae-83d6-51698cffebbf/ids:0
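
To make the conflict easier to see, here is a small Python sketch that splits 
the two lockspace strings from the message above and reports which field 
differs. It assumes the sanlock lockspace format is name:host_id:path:offset; 
the helper names (Lockspace, parse_lockspace, differing_fields) are mine and 
not part of sanlock or vdsm. If that reading is right, the host is asking for 
host id 1 while a lockspace with the same name is already registered with 
host id 3.

```python
from typing import List, NamedTuple


class Lockspace(NamedTuple):
    """One sanlock lockspace string, assumed to be name:host_id:path:offset."""
    name: str
    host_id: int
    path: str
    offset: int


def parse_lockspace(spec: str) -> Lockspace:
    # Take name and host_id from the left, then the offset from the right,
    # so the path in the middle is kept whole.
    name, host_id, rest = spec.split(":", 2)
    path, offset = rest.rsplit(":", 1)
    return Lockspace(name, int(host_id), path, int(offset))


def differing_fields(a: Lockspace, b: Lockspace) -> List[str]:
    # Names of the fields whose values are not equal in the two lockspaces.
    return [f for f in Lockspace._fields if getattr(a, f) != getattr(b, f)]


# The two strings copied from the sanlock.log message.
requested = parse_lockspace(
    "e1270474-108c-4cae-83d6-51698cffebbf:1:"
    "/dev/e1270474-108c-4cae-83d6-51698cffebbf/ids:0"
)
existing = parse_lockspace(
    "e1270474-108c-4cae-83d6-51698cffebbf:3:"
    "/dev/e1270474-108c-4cae-83d6-51698cffebbf/ids:0"
)

print(differing_fields(requested, existing))  # prints ['host_id']
```

So the only difference between the requested and the existing lockspace is 
the host id (1 versus 3), which lines up with the "Acquiring host id ... 
(id=1, wait=True)" line in vdsm.log.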


vdsm.log contains these (maybe related) snippets:
---
2020-09-03 20:19:53,483+0000 INFO  (jsonrpc/6) [vdsm.api] FINISH 
getAllTasksStatuses error=Secured object is not in safe state 
from=::ffff:137.79.52.43,36326, flow_id=18031a91, 
task_id=8e92f059-743a-48c8-aa9d-e7c4c836337b (api:52)
2020-09-03 20:19:53,483+0000 ERROR (jsonrpc/6) [storage.TaskManager.Task] 
(Task='8e92f059-743a-48c8-aa9d-e7c4c836337b') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in 
_run
    return fn(*args, **kargs)
  File "<string>", line 2, in getAllTasksStatuses
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2201, in 
getAllTasksStatuses
    allTasksStatus = self._pool.getAllTasksStatuses()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 77, 
in wrapper
    raise SecureError("Secured object is not in safe state")
SecureError: Secured object is not in safe state
2020-09-03 20:19:53,483+0000 INFO  (jsonrpc/6) [storage.TaskManager.Task] 
(Task='8e92f059-743a-48c8-aa9d-e7c4c836337b') aborting: Task is aborted: 
u'Secured object is not in safe state' - code 100 (task:1181)
2020-09-03 20:19:53,483+0000 ERROR (jsonrpc/6) [storage.Dispatcher] FINISH 
getAllTasksStatuses error=Secured object is not in safe state (dispatcher:87)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/dispatcher.py", line 74, 
in wrapper
    result = ctask.prepare(func, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 108, in 
wrapper
    return m(self, *a, **kw)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 1189, in 
prepare
    raise self.error
SecureError: Secured object is not in safe state
---
2020-09-03 20:44:23,252+0000 INFO  (tasks/2) [storage.ThreadPool.WorkerThread] 
START task 76415a77-9d29-4b72-ade1-53207cfc503b (cmd=<bound method Task.commit 
of <vdsm.storage.task.Task instance at 0x7fe99c27dea8>>, args=None) 
(threadPool:208)
2020-09-03 20:44:23,266+0000 INFO  (tasks/2) [storage.SANLock] Acquiring host 
id for domain e1270474-108c-4cae-83d6-51698cffebbf (id=1, wait=True) 
(clusterlock:313)
2020-09-03 20:44:23,267+0000 ERROR (tasks/2) [storage.TaskManager.Task] 
(Task='76415a77-9d29-4b72-ade1-53207cfc503b') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in 
_run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 336, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 317, in 
startSpm
    self.masterDomain.acquireHostId(self.id)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 957, in 
acquireHostId
    self._manifest.acquireHostId(hostId, wait)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 501, in 
acquireHostId
    self._domainLock.acquireHostId(hostId, wait)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 
344, in acquireHostId
    raise se.AcquireHostIdFailure(self._sdUUID, e)
AcquireHostIdFailure: Cannot acquire host id: 
('e1270474-108c-4cae-83d6-51698cffebbf', SanlockException(22, 'Sanlock 
lockspace add failure', 'Invalid argument'))
---

Another symptom: in the hosts view of the engine, SPM bounces between "Normal" 
and "Contending". When it's Normal, if I select Management -> Select as SPM, I 
get "Error while executing action: Cannot force select SPM. Unknown Data Center 
status."

I've tried rebooting the one remaining host in the cluster, to no avail; 
hosted-engine --reinitialize-lockspace also does not seem to solve the issue.


I'm kind of stumped as to what else to try and would appreciate any guidance 
on how to resolve this.

Thank You

_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/FMJZV2OEKHPTSTROSPLCQ3WJUIPB6CKL/