I have investigated this a bit further.

If I restart the node from within oVirt using the power management option "restart", the node restarts and vdsmd DOES NOT start.

If I go into the DRAC and issue the command to power cycle the machine, the machine restarts and vdsmd DOES start.

I can also run the following command from another node in the cluster:

    fence_drac5 -a 192.168.200.105 -l root -p <password> -x -o reboot

and the node restarts and vdsmd DOES start.
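To compare the three restart paths, something like the following could be run on the node after each restart. This is only a sketch: it assumes the el6 SysV tooling (`service`, `chkconfig`) and the usual LSB exit codes for an init script's `status` action, and the function names `classify_status` and `report_vdsmd` are made up for illustration; they are not part of vdsm or oVirt.

```sh
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of vdsm or oVirt.

# Translate the exit code of 'service vdsmd status' into a label.
# LSB init-script convention: 0 = running, 1 or 2 = dead with a stale
# pid or lock file, 3 = not running. Anything else is reported as unknown.
classify_status() {
    case "$1" in
        0)   echo "running" ;;
        1|2) echo "dead" ;;
        3)   echo "stopped" ;;
        *)   echo "unknown" ;;
    esac
}

# Report the current vdsmd state and whether it is enabled at boot.
# Assumes 'service' and 'chkconfig' are available on the node.
report_vdsmd() {
    service vdsmd status >/dev/null 2>&1
    rc=$?
    echo "vdsmd: $(classify_status "$rc")"
    chkconfig --list vdsmd
}
```

Calling `report_vdsmd` once after each kind of restart (oVirt power management, DRAC power cycle, manual fence_drac5) would show whether vdsmd is enabled for the runlevel in all three cases and differs only in whether it actually came up.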

On Sun, Jan 25, 2015 at 1:56 AM, ILanit Stein <ist...@redhat.com> wrote:

> Hi Rob,
>
> Thanks for this report.
>
> Would you please provide these logs, covering the time frame in which
> the host failure occurred:
> 1. oVirt Engine: /var/log/ovirt-engine/engine.log
> 2. host: /var/log/vdsm/vdsm.log
>
> If it is reproducible, please add this info as well.
>
> You can also check the vdsm service status on the host, while the host
> is reported as Non Responsive, by running 'service vdsmd status' on the
> host.
> There might be some problem that prevented the vdsm service from
> coming up on the host.
>
> Ilanit.
>
> ----- Original Message -----
> From: "Rob Abshear" <rabsh...@citytwist.net>
> To: users@ovirt.org
> Sent: Friday, January 23, 2015 9:22:42 PM
> Subject: [ovirt-users] Host remains Non-Responsive after reboot
>
> I am running oVirt Engine Version 3.5.0.1-1.el6. I have 4 hosts in the
> cluster. Each host has a drac5 and it is configured and working. I am
> trying to simulate a node failure. I am running one HA VM on one of the
> hosts for testing. I simulate the failure by powering off the host with
> the VM running.
>
> Here is what is happening.
>
> * Host is powered off
> * ~4 minutes pass and the host is recognized as not responding
> * Automatic fence runs and the VM migrates. Another host in the node
>   is chosen as a proxy to execute Status command on the host.
> * Same host is chosen as proxy to execute Start command on the host.
> * Same host is chosen as proxy to execute Status command on the host.
> * The host DOES physically start.
> * The host never shows status of UP.
> * I select "confirm host has been rebooted" and I see a manual fence
>   start.
> * Host stays non-responsive.
> * I put the host in maintenance and then activate it.
> * Host still non-responsive
> * I put the host in maintenance and do a reinstall
> * Reinstall finishes and host becomes UP
>
> So, everything seems to go fine with the HA functionality, but the host
> never recovers without being reinstalled.
> Please let me know which logs you need to look at to help me out with
> this.
>
> Thanks
>
> Sent with Mixmax
>
> _______________________________________________
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users