Following up on this, I was able to recover everything, with only minor (and 
easy to fix) data loss.

The old hosted engine refused to come up, even after a few hours of sitting. 
That is when I dug into the issue and found the agent service reporting that the 
image didn't exist (no such file or directory). That seems to have been just one 
aspect of storage being impacted by the unexpected outage.

Regarding the memory issue, I was only getting it on one host. I was able to 
install and recover on another host in my cluster without hitting it.

The broken host has these versions of Ansible and the engine setup packages:
ansible-2.9.18-1.el7.noarch
ovirt-ansible-hosted-engine-setup-1.0.32-1.el7.noarch
ovirt-ansible-engine-setup-1.1.9-1.el7.noarch
ovirt-hosted-engine-setup-2.3.13-1.el7.noarch

The host that works has:
ansible-2.8.3-1.el7.noarch
ovirt-ansible-hosted-engine-setup-1.0.26-1.el7.noarch
ovirt-ansible-engine-setup-1.1.9-1.el7.noarch
ovirt-hosted-engine-setup-2.3.11-1.el7.noarch

All of the SANLOCK issues I saw before were also remediated by the new 
deployment and recovery of the cluster.

Regards,
Seann

From: Roman Bednar
Sent: Thursday, April 01, 2021 6:07 AM
To: Thomas Hoberg <[email protected]>
Cc: [email protected]
Subject: [ovirt-users] Re: Power failure makes cluster and hosted engine 
unusable

Hi Thomas,

Thanks for looking into this, the problem is really somewhere around this tasks 
file. However I just tried faking the memory values directly inside the tasks 
file to something way higher and everything looks fine. I think the problem 
resides in registering the output of the "free -m" at the beginning of this 
file. There are also debug tasks which print registered values from the shell 
commands where we could take a closer look, see if it looks normal (stdout 
mainly).

This part of the output that Seann provided seems particularly strange: 
Available memory ( {'failed': False, 'changed': False, 'ansible_facts': 
{u'max_mem': u'180746'}}MB )

Normally it should just show the exact value/string; here we're most likely 
getting a Python dictionary instead. I'd check whether the latest version of 
ansible is installed and, if an update is available, see whether this can still 
be reproduced.
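
To illustrate the difference between the two cases, here is a minimal Python 
sketch. It is not the role's actual code: the `result` dictionary is copied from 
the output quoted above, and `render` is a made-up helper that only imitates how 
a value gets embedded in the message.

```python
# Copied from the output quoted above: the whole task result, not just the value.
result = {
    "failed": False,
    "changed": False,
    "ansible_facts": {"max_mem": "180746"},
}


def render(value):
    """Hypothetical helper: embed a value in the message text."""
    return "Available memory ( {}MB )".format(value)


# Embedding the whole result reproduces the strange-looking message.
whole_result = render(result)

# Embedding only the fact gives the plain number one would expect.
just_value = render(result["ansible_facts"]["max_mem"])

print(whole_result)
print(just_value)
```

If the message in the log looks like the first line rather than the second, the 
variable being printed holds the whole task result instead of the plain value.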

If the issue persists please provide full log of the ansible run (ideally with 
-vvvv).


-Roman

On Wed, Mar 31, 2021 at 9:19 PM Thomas Hoberg <[email protected]> wrote:
Roman, I believe the bug is in 
/usr/share/ansible/roles/ovirt.hosted_engine_setup/tasks/pre_checks/validate_memory_size.yml

  - name: Set Max memory
    set_fact:
      max_mem: "{{ free_mem.stdout|int + cached_mem.stdout|int - he_reserved_memory_MB + he_avail_memory_grace_MB }}"


If these lines are casting the result of `free -m` into 'int', that seems to 
fail at bigger RAM sizes.

I wound up having to delete all the available memory checks from that file to 
have the wizard progress on a machine with 512GB of RAM.
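
One way to sanity-check the cast on its own is a small Python sketch of the same 
arithmetic. Everything here is an assumption for illustration: `to_int` is only 
a rough stand-in for the template's `int` filter, the two constants are 
placeholder values rather than the role's real defaults, and the input strings 
are invented.

```python
def to_int(text, default=0):
    """Rough stand-in for an `int` filter: fall back to a default on bad input."""
    try:
        return int(text)
    except (TypeError, ValueError):
        return default


def max_mem(free_stdout, cached_stdout, reserved_mb, grace_mb):
    """Same expression as the set_fact line: free + cached - reserved + grace."""
    return to_int(free_stdout) + to_int(cached_stdout) - reserved_mb + grace_mb


# Placeholder values, not taken from the role's defaults.
HE_RESERVED_MEMORY_MB = 512
HE_AVAIL_MEMORY_GRACE_MB = 200

# Invented inputs roughly matching a host with about 512GB of RAM.
big_host = max_mem("500000", "20000", HE_RESERVED_MEMORY_MB, HE_AVAIL_MEMORY_GRACE_MB)

# Empty stdout: the cast silently falls back to 0 instead of raising an error.
bad_input = max_mem("", "20000", HE_RESERVED_MEMORY_MB, HE_AVAIL_MEMORY_GRACE_MB)

print(big_host, bad_input)
```

Python integers are arbitrary precision, so values of this size do not overflow 
in the sketch. Whether the role behaves the same way depends on the Ansible and 
Jinja2 versions involved and on what the registered stdout actually contains.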
_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/CARDJXYUPFUFJT2VE2UNXELL2PSUZSPS/
_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/YVTTGIG4IPB6SJLLILXBHD2A4YDAY3GX/