Seann,

If this happens again, try doing nothing (seriously). Each time I've had a
power failure, the engine has taken a really long time to come back up. I
don't know if that's by design or not. The host logs are flooded with errors,
all seemingly storage-related. However, my Gluster setup is on fast
SSDs and is back up and running pretty much straight away: it takes maybe 5
minutes for the nodes to re-join and the volumes to show 'UP' with no
heals. Even so, the hosted-engine still takes a good hour or two to
simmer down and finally start up.

Sometimes I try to help by restarting ovirt-ha-broker and ovirt-ha-agent, or
plunking in other random commands from the mess of documentation, but it
seems to sort itself out in its own time, regardless of my tinkering.

I wish I could get more insight into the process, but definitely, doing
nothing and waiting has been the most successful troubleshooting step I
have taken.
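
If you'd rather watch than guess, here is a minimal sketch of "wait and
poll" in Python. It is not an official tool, just an illustration: it
assumes `hosted-engine --vm-status --json` is available on the host, and
that its output has one entry per host id with an "engine-status" section
containing "vm" and "health" fields. The function names, the poll interval
and the three-hour timeout are my own choices, so adjust them to what your
version actually prints.

```python
import json
import subprocess
import time


def engine_is_healthy(status):
    """Return True if any host reports the engine VM up with good health.

    `status` is the parsed JSON from `hosted-engine --vm-status --json`.
    Assumed layout: one entry per host id, each with an "engine-status"
    section holding "vm" and "health". Non-host entries such as
    "global_maintenance" are skipped.
    """
    for host in status.values():
        if not isinstance(host, dict):
            continue  # e.g. "global_maintenance": false
        engine = host.get("engine-status", {})
        if isinstance(engine, str):
            # Assumption: some versions may return this as a JSON string.
            try:
                engine = json.loads(engine)
            except ValueError:
                continue
        if not isinstance(engine, dict):
            continue
        if engine.get("vm") == "up" and engine.get("health") == "good":
            return True
    return False


def wait_for_engine(poll_seconds=60, timeout_seconds=3 * 3600):
    """Poll the HA status until the engine looks healthy or time runs out."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        result = subprocess.run(
            ["hosted-engine", "--vm-status", "--json"],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            try:
                if engine_is_healthy(json.loads(result.stdout)):
                    return True
            except ValueError:
                pass  # output was not valid JSON yet; keep waiting
        time.sleep(poll_seconds)
    return False
```

Calling `wait_for_engine()` on one of the hosts returns True once some host
reports the engine up and healthy, and False if the timeout passes first. It
only reads status and changes nothing, which fits the "do nothing" approach.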

Cheers!


On Mon, Mar 29, 2021 at 11:32 AM Seann G. Clark via Users <[email protected]>
wrote:

> All,
>
>
>
> After a power failure and a generator failure I lost my cluster, and the
> hosted engine refused to restart after power was restored. I would expect
> that, once storage comes up, the hosted engine comes back online without too
> much of a fight. In practice, because the SPM went down as well, there is no
> (clearly documented) way to clear any of the stale locks, and no way to
> recover both the hosted engine and the cluster.
>
>
>
> I have spent the last 12 hours trying to get a functional hosted-engine
> back online on a new node, and each attempt hits a new error, from the
> installer not understanding that 16384MB of dedicated VM memory out of
> 192GB free on the host is indeed bigger than 4096MB, to Ansible dying on
> an error like this: “Error while executing action: Cannot add Storage
> Connection. Storage connection already exists.”
>
> The memory error referenced above shows up as:
>
> [ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg":
> "Available memory ( {'failed': False, 'changed': False, 'ansible_facts':
> {u'max_mem': u'180746'}}MB ) is less then the minimal requirement (4096MB).
> Be aware that 512MB is reserved for the host and cannot be allocated to the
> engine VM."}
>
> That is what I typically get when I try the steps outlined in the KB
> “CHAPTER 7. RECOVERING A SELF-HOSTED ENGINE FROM AN EXISTING BACKUP” from
> the RH Customer portal. I have tried this numerous ways, and the cluster
> still remains in a bad state, with the hosted engine being 100% inoperable.
>
>
>
> What I do have are the two hosts that are part of the cluster and can host
> the engine, and backups of the original hosted engine, both disk and
> engine-backup generated. I am not sure what I can do next to recover this
> cluster; any suggestions would be appreciated.
>
>
>
> Regards,
>
> Seann
>
>
>
>
> _______________________________________________
> Users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/[email protected]/message/JLDIFTKYDPQ6YK5IGH7RVOXKTTRD6ZBH/
>
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/ZKG4NSVEXZQ73O4BGKFVURQ2YZNCQC3Q/
