This is using iSCSI storage. I stopped the oVirt broker, agent, and vdsm services and used sanlock to remove the locks it was complaining about, but as soon as I started the oVirt services back up and the engine came online again, the same messages reappeared.
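
For reference, the cleanup sequence was roughly along these lines (a sketch, not an exact transcript; the lockspace string format is the one shown in the sanlock.log excerpt quoted below):

    # stop the oVirt pieces that hold sanlock leases on this host
    systemctl stop ovirt-ha-agent ovirt-ha-broker vdsmd

    # list the lockspaces ("s ...") and resources ("r ...") sanlock currently holds
    sanlock client status

    # remove the offending lockspace, passing the name:host_id:path:offset string
    # exactly as printed by the status output above
    sanlock client rem_lockspace -s <name:host_id:path:offset>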

After spending more than a day trying to resolve this cleanly, I gave up. I installed ovirt-node on the host I had originally removed, added it to the cluster, then removed and nuked the misbehaving host and did a clean install there. I did run into an issue where the first host had a nearly empty hosted-engine.conf (it only had the cert and the id settings in it), so it wouldn't connect properly, but I worked around that by copying the fully populated one from the semi-working host and changing the id to match (sketched below).
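
The conf workaround itself was essentially just this (the source host name is a placeholder; the file path and the host_id setting are the standard hosted-engine ones):

    # copy the fully populated config from the working host
    scp <working-host>:/etc/ovirt-hosted-engine/hosted-engine.conf /etc/ovirt-hosted-engine/hosted-engine.conf

    # then edit host_id= in the copied file so it matches the id the engine
    # assigned to this host rather than to the source host
    vi /etc/ovirt-hosted-engine/hosted-engine.conf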
No idea if this is the right solution, but it _seems_ to be working and my VMs are running again; I just got too frustrated trying to debug it through the normal methods and the solutions offered by the oVirt tools and documentation.

- Eric

On 9/4/20, 10:59 AM, "Strahil Nikolov" <hunter86...@yahoo.com> wrote:

    Is this an HCI setup?
    If yes, check the Gluster status (I prefer the CLI, but it is also visible in the UI):

    gluster pool list
    gluster volume status

    gluster volume heal <VOL> info summary

    Best Regards,
    Strahil Nikolov






    On Friday, September 4, 2020, 00:38:13 GMT+3, Gillingham, Eric J (US 393D) via Users <users@ovirt.org> wrote:





    I recently removed a host from my cluster to upgrade it to 4.4. After I removed the host from the datacenter, VMs started to pause on the second system they had all migrated to. Investigating via the engine showed the storage domain as "unknown"; when I try to activate it via the engine it cycles to "locked" and then back to "unknown".

    /var/log/sanlock.log contains this repeating message:
    add_lockspace e1270474-108c-4cae-83d6-51698cffebbf:1:/dev/e1270474-108c-4cae-83d6-51698cffebbf/ids:0 conflicts with name of list1 s1 e1270474-108c-4cae-83d6-51698cffebbf:3:/dev/e1270474-108c-4cae-83d6-51698cffebbf/ids:0
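
    That message suggests the host is trying to add the lockspace with host id 1 while sanlock already has the same lockspace registered locally under host id 3. A minimal way to see what sanlock currently holds (standard sanlock CLI, nothing oVirt-specific):

    # lockspaces are listed as "s name:host_id:path:offset", resources as "r ..."
    sanlock client status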


    vdsm.log contains these (maybe related) snippets:
    ---
    2020-09-03 20:19:53,483+0000 INFO  (jsonrpc/6) [vdsm.api] FINISH 
getAllTasksStatuses error=Secured object is not in safe state 
from=::ffff:137.79.52.43,36326, flow_id=18031a91, 
task_id=8e92f059-743a-48c8-aa9d-e7c4c836337b (api:52)
    2020-09-03 20:19:53,483+0000 ERROR (jsonrpc/6) [storage.TaskManager.Task] 
(Task='8e92f059-743a-48c8-aa9d-e7c4c836337b') Unexpected error (task:875)
    Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, 
in _run
        return fn(*args, **kargs)
      File "<string>", line 2, in getAllTasksStatuses
      File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in 
method
        ret = func(*args, **kwargs)
      File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2201, 
in getAllTasksStatuses
        allTasksStatus = self._pool.getAllTasksStatuses()
      File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 
77, in wrapper
        raise SecureError("Secured object is not in safe state")
    SecureError: Secured object is not in safe state
    2020-09-03 20:19:53,483+0000 INFO  (jsonrpc/6) [storage.TaskManager.Task] 
(Task='8e92f059-743a-48c8-aa9d-e7c4c836337b') aborting: Task is aborted: 
u'Secured object is not in safe state' - code 100 (task:1181)
    2020-09-03 20:19:53,483+0000 ERROR (jsonrpc/6) [storage.Dispatcher] FINISH 
getAllTasksStatuses error=Secured object is not in safe state (dispatcher:87)
    Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/vdsm/storage/dispatcher.py", line 
74, in wrapper
        result = ctask.prepare(func, *args, **kwargs)
      File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 108, 
in wrapper
        return m(self, *a, **kw)
      File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 1189, 
in prepare
        raise self.error
    SecureError: Secured object is not in safe state
    ---
    2020-09-03 20:44:23,252+0000 INFO  (tasks/2) [storage.ThreadPool.WorkerThread] START task 76415a77-9d29-4b72-ade1-53207cfc503b (cmd=<bound method Task.commit of <vdsm.storage.task.Task instance at 0x7fe99c27dea8>>, args=None) (threadPool:208)
    2020-09-03 20:44:23,266+0000 INFO  (tasks/2) [storage.SANLock] Acquiring 
host id for domain e1270474-108c-4cae-83d6-51698cffebbf (id=1, wait=True) 
(clusterlock:313)
    2020-09-03 20:44:23,267+0000 ERROR (tasks/2) [storage.TaskManager.Task] 
(Task='76415a77-9d29-4b72-ade1-53207cfc503b') Unexpected error (task:875)
    Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, 
in _run
        return fn(*args, **kargs)
      File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 336, 
in run
        return self.cmd(*self.argslist, **self.argsdict)
      File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 317, in 
startSpm
        self.masterDomain.acquireHostId(self.id)
      File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 957, in 
acquireHostId
        self._manifest.acquireHostId(hostId, wait)
      File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 501, in 
acquireHostId
        self._domainLock.acquireHostId(hostId, wait)
      File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 
344, in acquireHostId
        raise se.AcquireHostIdFailure(self._sdUUID, e)
    AcquireHostIdFailure: Cannot acquire host id: 
('e1270474-108c-4cae-83d6-51698cffebbf', SanlockException(22, 'Sanlock 
lockspace add failure', 'Invalid argument'))
    ---

    Another symptom: in the Hosts view of the engine, SPM bounces between "Normal" and "Contending". When it is "Normal" and I select Management -> Select as SPM, I get "Error while executing action: Cannot force select SPM. Unknown Data Center status."

    I've tried rebooting the one remaining host in the cluster, to no avail, and hosted-engine --reinitialize-lockspace also does not seem to solve the issue.


    I'm kind of stumped as to what else to try and would appreciate any guidance on how to resolve this.

    Thank You


_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/NIFHQDAGC3PAGS2KONO2VLECX7AWSH3Y/
