Hello oVirt Users Community,

I've been working with Red Hat, RHEL, and its clones for about 11 years, though I still consider myself an amateur, mostly because I'm more of a networking guy. :) I'm a one-man IT department, so I get very little time to tinker.

I'm evaluating oVirt (because the boss said no to VMware) and will likely begin implementation soon to virtualize our datacenter. For now I have a SuperMicro Twin2 (4-node) system and a cheap managed L2+ switch to work with: dual 6-core Xeons and 24GB of RAM per node. The two on-board 82574Ls are bonded with 802.3ad, and there have been no issues there (so far). Each node currently has two 1TB WD RE4 SATA drives configured as RAID1 using the Intel RAID BIOS, which I understand is software RAID. That's all working fine; I set it up this way so that if a drive dies I can still boot the machine(s). Each node has a 500MB ext4 partition for /boot, a 48GB ext4 root, 24GB of swap, and the rest (800-something GB) on LVM formatted XFS for Gluster.
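
For what it's worth, the Gluster portion on each node was laid down roughly like this (the device name, VG/LV names, and brick mount point below are placeholders from memory, not exact copies of what's on the boxes):

    # leftover space on the mirrored disks, set aside for the Gluster brick
    pvcreate /dev/md126p5                 # placeholder device name
    vgcreate vg_gluster /dev/md126p5
    lvcreate -l 100%FREE -n lv_brick vg_gluster
    mkfs.xfs -i size=512 /dev/vg_gluster/lv_brick   # 512-byte inodes, as generally recommended for Gluster bricks
    mkdir -p /gluster/brick1
    echo '/dev/vg_gluster/lv_brick /gluster/brick1 xfs defaults 0 0' >> /etc/fstab
    mount /gluster/brick1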

I've been following Jason Brooks' "Up and Running with oVirt" guides (which are great, BTW!). I have the cluster up and running with CentOS 7 and oVirt 3.5, the hosted engine on CentOS 6.6, and CTDB hosting a virtual IP for the engine NFS mount. There are a couple of test VMs running along with the engine on various nodes. I found it interesting that I was able to upload a ripped ISO of Win 2k3 Enterprise (not SP2) and boot it successfully, after which I promptly installed SP2 and the oVirt guest tools. I do very little with Windows, but there's always that one remaining customer that needs IIS, and we're not about to buy a new Windows Server 2012 license just for them.
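
For the CTDB piece, the relevant config looks roughly like this (the IPs, interface name, and lock path are placeholders; I'm going from memory rather than pasting the real files):

    # /etc/ctdb/nodes -- one private IP per node
    10.0.0.1
    10.0.0.2
    10.0.0.3
    10.0.0.4

    # /etc/ctdb/public_addresses -- the floating VIP that the engine's NFS mount points at
    10.0.0.100/24 ovirtmgmt

    # /etc/sysconfig/ctdb (excerpt)
    CTDB_RECOVERY_LOCK=/gluster/lock/lockfile
    CTDB_NODES=/etc/ctdb/nodes
    CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses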

So anyway, I'm having a problem with node reboots. They simply will not shut down and reboot cleanly. Instead, it looks like they hang after all processes are shut down, or at least after shutdown has been attempted. Then, after a couple of minutes, the hardware watchdog resets the system. I've come to the conclusion that sanlock and/or wdmd is causing the hang-up. I'm guessing an active but non-responsive NFS mount is the culprit, possibly the ISO domain NFS mount, which lives on the engine. I've tried manually shutting down all the oVirt, VDSM, etc. processes and unmounting all NFS shares, but it seems sanlock still has a hold on something under /rhev/.. I've Googled a bit and have come across posts about this as well. Any tips here?
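
To be specific, the sequence I've been trying by hand before a reboot looks roughly like this (typed from memory, so treat the exact commands and mount path as approximate):

    # put the host into hosted-engine local maintenance so the HA agent stands down
    hosted-engine --set-maintenance --mode=local

    # stop the oVirt/VDSM services
    systemctl stop ovirt-ha-agent ovirt-ha-broker vdsmd

    # see which lockspaces/resources sanlock still holds
    sanlock client status

    # lazy-unmount the NFS domains (a hung NFS server won't release a normal umount)
    umount -l /rhev/data-center/mnt/<iso-domain-server>:_path_to_iso

    # then try to shut sanlock down cleanly
    sanlock client shutdown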

Then I experienced something else odd yesterday. I did a yum update for the glibc vulnerability stuff. Gluster was updated as well, which really threw a wrench into things because I wasn't paying attention and quorum broke, etc.; I got that fixed. I then rebooted all nodes (which is when I found the sanlock/watchdog problem). Nodes 2, 3, and 4 came back up, but node1 did not. I logged into the IP-KVM console and found that it had no network configuration: all of the /etc/sysconfig/network-scripts/ifcfg-* files were gone. I was able to manually reconfigure the physical interfaces, set the bonding back up, and add the ovirtmgmt bridge. The engine then reported the host as non-operational due to '..does not comply with cluster default networks... ovirtmgmt missing', which I resolved by reconfiguring the host's network in the engine GUI, and all is now well. I'm just curious how/why the ifcfg files were wiped out? I haven't touched the network config on any host since running hosted-engine --deploy.
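
In case it matters, what I put back by hand was essentially this (NIC names and addresses below are placeholders, not the real values):

    # /etc/sysconfig/network-scripts/ifcfg-eth0 (and the same for ifcfg-eth1)
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    BOOTPROTO=none
    ONBOOT=yes
    NM_CONTROLLED=no

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    BONDING_OPTS="mode=802.3ad miimon=100"
    BRIDGE=ovirtmgmt
    ONBOOT=yes
    NM_CONTROLLED=no

    # /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
    DEVICE=ovirtmgmt
    TYPE=Bridge
    DELAY=0
    BOOTPROTO=none
    IPADDR=10.0.0.11
    PREFIX=24
    GATEWAY=10.0.0.254
    ONBOOT=yes
    NM_CONTROLLED=no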

Please forgive my ignorance and point me to the correct place if these issues have been discussed and/or resolved already.

And overall I'm very much liking oVirt, especially as a viable and cost-effective alternative to vSphere.

Thanks,
George