On Sun, Jul 3, 2016 at 5:57 AM, Kevin Hung <kh...@nullaxiom.com> wrote:
> Looks like there still needs to be some work done on oVirt 4.0 Node and
> ovirt-hosted-engine-setup before it's ready for general consumption. I have
> spent days trying to get this to work, and only got it running (on one host)
> after encountering 8 serious issues (7 below and the initial glusterfs one).
> I have not been able to successfully deploy a second host (see issue 7
> below). I will be moving back to deploying hosts using CentOS (with either
> oVirt 4.0 or oVirt 3.6) as I need a working oVirt deployment up and running.
>
> In case anyone is interested in reproducing the issues, I used the Node ISO
> here [1] and the latest (7/2/2016) engine appliance OVA here [2]. Those seem
> to be the "official" files as far as I can tell (which is difficult as the
> documentation is not clear).
>
> List of issues:
> 1. The error I mentioned seems to be a problem with the code. I bypassed it
> by deleting /usr/libexec/vdsm/hooks/before_network_setup/50_fcoe.
> 2. ovirt-hosted-engine-setup is unable to connect to the vdsm service if the
> FQDN of the node is not resolvable (i.e. if a DNS server is not entered in
> the initial setup). This should be checked in either the initial oVirt Node
> setup process or the beginning of ovirt-hosted-engine-setup.
> 3. The management bridge does not get created properly when the server is
> set up with a manually configured DNS server and running NetworkManager (the
> default on Node). It seems like a bug was filed for this back in 2014. [3]
> 4. Using cloud-init with default values to customize the engine appliance
> can fail on the line "Creating/refreshing DWH database schema" if it takes
> longer than 600 seconds to return output. This may apply to any other step
> that takes a long time to complete. The VM no longer appears to exist
> after the setup exits, so I am unable to debug.

600 seconds seems more than a reasonable time to create an empty DB; if
it requires more than 10 minutes for a simple/short operation, there is
probably something strange with the storage.

> 5. Without using cloud-init, the setup creates an engine VM that I cannot
> log into (it does not seem to use the engine admin password or a blank
> password).

Yes, the engine VM host-name and its root password are configured via
cloud-init and there is no default password.
If you want to avoid using cloud-init, you have to reset the root
password of the engine VM as for any el7 machine.

> 6. Destroying the VM (option 4) leaves the files intact on the shared
> storage so I cannot restart setup without deleting those first. This may be
> intentional, but the use of kvm terminology (destroy for power off) is not
> common, not to mention that "virsh -r list --all" does not list the VM
> anymore.

On failures, there is not just the engine VM disk but a whole storage
domain for hosted-engine, which also contains ancillary disks.
Re-deploying over a dirty storage is not supported, so please clean up
the whole storage domain on failures.

> 7. Unable to deploy second host through web UI (error "Failed to configure
> management network on host node2 due to setup networks failure.") or using

This is not hosted-engine specific:
https://bugzilla.redhat.com/show_bug.cgi?id=1350763

> ovirt-hosted-engine-setup (it looks like it can't connect to or doesn't
> start the broker service).
> 8. Random errors to stderr: "vcpu0 unhandled rdmsr" (this seems to be an

Are you running in a nested env?

> upstream bug) and "multipath: error getting device" (this has been an issue
> for years with oVirt and seems to be due to multipathing being on by default
> even for systems where that does not apply).
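
Regarding issue 2, a pre-flight check along the lines suggested there
could look roughly like the sketch below. This is only an illustration
using the Python standard library: the function name fqdn_resolves and
its resolver parameter are hypothetical and are not part of
ovirt-hosted-engine-setup or oVirt Node. It simply asks the system
resolver (DNS or /etc/hosts) whether the host's FQDN maps to at least
one address.

```python
import socket


def fqdn_resolves(fqdn=None, resolver=socket.getaddrinfo):
    """Return True if `fqdn` resolves to at least one address.

    Hypothetical helper for illustration only; `resolver` is injectable
    so the check can be exercised without a live DNS server.
    """
    name = fqdn or socket.getfqdn()
    try:
        # getaddrinfo consults whatever the system resolver is
        # configured to use (DNS, /etc/hosts, ...).
        return len(resolver(name, None)) > 0
    except socket.gaierror:
        # Raised when the name cannot be resolved.
        return False


if __name__ == "__main__":
    host = socket.getfqdn()
    if fqdn_resolves(host):
        print("%s resolves" % host)
    else:
        print("%s does not resolve; fix DNS or /etc/hosts first" % host)
```

Running something like this on the node before starting the deployment
would at least show whether the FQDN is resolvable; it does not verify
that vdsm itself is reachable.
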
>
> [1]
> http://resources.ovirt.org/pub/ovirt-4.0/iso/ovirt-node-ng-installer/ovirt-node-ng-installer-ovirt-4.0-2016062412.iso
> [2]
> http://jenkins.ovirt.org/view/All/job/ovirt-appliance_ovirt-4.0_build-artifacts-el7-x86_64/
> [3] https://bugzilla.redhat.com/show_bug.cgi?id=1160423
>
>
> On 7/1/2016 8:37 PM, Kevin Hung wrote:
>>
>> It looks like I'm now getting an error when the deployment tries to
>> configure the management bridge.
>>
>> Setup log:
>>
>> 2016-07-01 20:29:47 INFO otopi.plugins.gr_he_common.network.bridge bridge._misc:372 Configuring the management bridge
>> 2016-07-01 20:29:48 DEBUG otopi.plugins.gr_he_common.network.bridge bridge._misc:384 networks: {'ovirtmgmt': {'nic': 'eno1', 'ipaddr': u'192.168.1.211', 'netmask': u'255.255.255.0', 'bootproto': u'none', 'gateway': u'192.168.1.1', 'defaultRoute': True}}
>> 2016-07-01 20:29:48 DEBUG otopi.plugins.gr_he_common.network.bridge bridge._misc:385 bonds: {}
>> 2016-07-01 20:29:48 DEBUG otopi.plugins.gr_he_common.network.bridge bridge._misc:386 options: {'connectivityCheck': False}
>> 2016-07-01 20:29:48 DEBUG otopi.context context._executeMethod:142 method exception
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in _executeMethod
>>     method['method']()
>>   File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-common/network/bridge.py", line 387, in _misc
>>     _setupNetworks(conn, networks, bonds, options)
>>   File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-common/network/bridge.py", line 405, in _setupNetworks
>>     'message: "%s"' % (networks, code, message))
>> RuntimeError: Failed to setup networks {'ovirtmgmt': {'nic': 'eno1', 'ipaddr': u'192.168.1.211', 'netmask': u'255.255.255.0', 'bootproto': u'none', 'gateway': u'192.168.1.1', 'defaultRoute': True}}. Error code: "78" message: "Hook error: Hook Error: ('Traceback (most recent call last):\n File "/usr/libexec/vdsm/hooks/before_network_setup/50_fcoe", line 18, in <module>\n from vdsm.netconfpersistence import RunningConfig\nImportError: No module named netconfpersistence\n',)"
>> 2016-07-01 20:29:48 ERROR otopi.context context._executeMethod:151 Failed to execute stage 'Misc configuration': Failed to setup networks {'ovirtmgmt': {'nic': 'eno1', 'ipaddr': u'192.168.1.211', 'netmask': u'255.255.255.0', 'bootproto': u'none', 'gateway': u'192.168.1.1', 'defaultRoute': True}}. Error code: "78" message: "Hook error: Hook Error: ('Traceback (most recent call last):\n File "/usr/libexec/vdsm/hooks/before_network_setup/50_fcoe", line 18, in <module>\n from vdsm.netconfpersistence import RunningConfig\nImportError: No module named netconfpersistence\n',)"
>>
>>
>> On 7/1/2016 5:21 PM, Kevin Hung wrote:
>>>
>>> Thank you Sahina, that was the issue. I upgraded my glusterfs server to
>>> 3.7.11 and I was able to continue with the deployment. I am seeing other
>>> issues with deployment, but I will look into those myself first. Bug has
>>> been logged [1].
>>>
>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1352165
>>>
>
> _______________________________________________
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users