On 03/10/2015 10:20 AM, Simone Tiraboschi wrote:
>
> ----- Original Message -----
>> From: "Bob Doolittle" <b...@doolittle.us.com>
>> To: "Simone Tiraboschi" <stira...@redhat.com>
>> Cc: "users-ovirt" <users@ovirt.org>
>> Sent: Tuesday, March 10, 2015 2:40:13 PM
>> Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state)
>>
>> On 03/10/2015 04:58 AM, Simone Tiraboschi wrote:
>>> ----- Original Message -----
>>>> From: "Bob Doolittle" <b...@doolittle.us.com>
>>>> To: "Simone Tiraboschi" <stira...@redhat.com>
>>>> Cc: "users-ovirt" <users@ovirt.org>
>>>> Sent: Monday, March 9, 2015 11:48:03 PM
>>>> Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state)
>>>>
>>>> On 03/09/2015 02:47 PM, Bob Doolittle wrote:
>>>>> Resending with CC to list (and an update).
>>>>>
>>>>> On 03/09/2015 01:40 PM, Simone Tiraboschi wrote:
>>>>>> ----- Original Message -----
>>>>>>> From: "Bob Doolittle" <b...@doolittle.us.com>
>>>>>>> To: "Simone Tiraboschi" <stira...@redhat.com>
>>>>>>> Cc: "users-ovirt" <users@ovirt.org>
>>>>>>> Sent: Monday, March 9, 2015 6:26:30 PM
>>>>>>> Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (Cannot add the host to cluster ... SSH has failed)
>>>>>>>
>> ...
>>>>>>> OK, I've started over. Simply removing the storage domain was insufficient;
>>>>>>> the hosted-engine deploy failed when it found the HA and Broker services
>>>>>>> already configured. I decided to start over fresh, beginning with
>>>>>>> re-installing the OS on my host.
>>>>>>>
>>>>>>> I can't deploy DNS at the moment, so I have to simply replicate /etc/hosts
>>>>>>> files on my host/engine. I did that this time, but have run into a new
>>>>>>> problem:
>>>>>>>
>>>>>>> [ INFO ] Engine replied: DB Up!Welcome to Health Status!
>>>>>>>           Enter the name of the cluster to which you want to add the host (Default) [Default]:
>>>>>>> [ INFO ] Waiting for the host to become operational in the engine. This may take several minutes...
>>>>>>> [ ERROR ] The VDSM host was found in a failed state. Please check engine and bootstrap installation logs.
>>>>>>> [ ERROR ] Unable to add ovirt-vm to the manager
>>>>>>>           Please shutdown the VM allowing the system to launch it as a monitored service.
>>>>>>>           The system will wait until the VM is down.
>>>>>>> [ ERROR ] Failed to execute stage 'Closing up': [Errno 111] Connection refused
>>>>>>> [ INFO ] Stage: Clean up
>>>>>>> [ ERROR ] Failed to execute stage 'Clean up': [Errno 111] Connection refused
>>>>>>>
>>>>>>> I've attached my engine log and the ovirt-hosted-engine-setup log. I think I
>>>>>>> had an issue with resolving external hostnames, or else a connectivity issue
>>>>>>> during the install.
>>>>>> For some reason your engine wasn't able to deploy your host, although the SSH
>>>>>> session this time was established:
>>>>>>
>>>>>> 2015-03-09 13:05:58,514 ERROR [org.ovirt.engine.core.bll.InstallVdsInternalCommand]
>>>>>> (org.ovirt.thread.pool-8-thread-3) [3cf91626] Host installation failed
>>>>>> for host 217016bb-fdcd-4344-a0ca-4548262d10a8, ovirt-vm.:
>>>>>> java.io.IOException: Command returned failure code 1 during SSH session
>>>>>> 'r...@xion2.smartcity.net'
>>>>>>
>>>>>> Can you please attach the host-deploy logs from the engine VM?
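(For anyone following along: on the engine VM these logs are normally collected under /var/log/ovirt-engine/host-deploy/, one file per deployment attempt, so the newest file is the one to attach:)

    # ls -lt /var/log/ovirt-engine/host-deploy/ | head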
>>>>> OK, attached.
>>>>>
>>>>> Like I said, it looks to me like a name-resolution issue during the yum
>>>>> update on the engine. I think I've fixed that, but do you have a better
>>>>> suggestion for cleaning up and re-deploying than re-installing the OS
>>>>> on my host and starting all over again?
>>>> I just finished starting over from scratch, beginning with OS installation on
>>>> my host/node, and wound up with a very similar problem - the engine couldn't
>>>> reach the hosts during the yum operation. But this time the error was
>>>> "Network is unreachable". Which is weird, because I can ssh into the engine
>>>> and ping many of those hosts after the operation has failed.
>>>>
>>>> Here's my latest host-deploy log from the engine. I'd appreciate any clues.
>>> It seems that your host is now able to resolve those addresses, but it's not
>>> able to connect over http.
>>> Some of those hosts resolve to IPv6 addresses; can you please try using curl
>>> to fetch one of the files it wasn't able to fetch?
>>> Can you please check your network configuration before and after host-deploy?
>> I can give you the network configuration after host-deploy, at least for the
>> host/Node. The engine won't start for me this morning, after I shut down the
>> host for the night.
>>
>> In order to give you the config before host-deploy (or, apparently, for the
>> engine), I'll have to re-install the OS on the host and start again from
>> scratch. Obviously I'd rather not do that unless absolutely necessary.
>>
>> Here's the host config after the failed host-deploy:
>>
>> Host/Node:
>>
>> # ip route
>> 169.254.0.0/16 dev ovirtmgmt scope link metric 1007
>> 172.16.0.0/16 dev ovirtmgmt proto kernel scope link src 172.16.0.58
> You are missing a default gateway, and hence the issue.
> Are you sure it was properly configured before trying to deploy that host?
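(A minimal form of the curl check Simone suggests, using one of the mirror URLs that appears in the failed yum transaction further down; the URL is just an example, and -4/-6 force IPv4 or IPv6 so the two paths can be tested separately:)

    $ curl -4 -v -o /dev/null http://resources.ovirt.org/pub/ovirt-3.5/rpm/fc20/noarch/sos-3.2-0.2.fc20.ovirt.noarch.rpm
    $ curl -6 -v -o /dev/null http://resources.ovirt.org/pub/ovirt-3.5/rpm/fc20/noarch/sos-3.2-0.2.fc20.ovirt.noarch.rpm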
It should have been; it was a fresh OS install. So I'm starting again, and keeping
careful records of my network config (one simple way to capture them is sketched
after the dumps below).

Here is the initial network config of my host/node, immediately following a new OS
install:

% ip route
default via 172.16.0.1 dev p3p1  proto static  metric 1024
172.16.0.0/16 dev p3p1  proto kernel  scope link  src 172.16.0.58

% ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: p3p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff
    inet 172.16.0.58/16 brd 172.16.255.255 scope global p3p1
       valid_lft forever preferred_lft forever
    inet6 fe80::baca:3aff:fe79:2212/64 scope link
       valid_lft forever preferred_lft forever
3: wlp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 1c:3e:84:50:8d:c3 brd ff:ff:ff:ff:ff:ff

After the VM is first created, the host/node config is:

# ip route
default via 172.16.0.1 dev ovirtmgmt
169.254.0.0/16 dev ovirtmgmt scope link metric 1006
172.16.0.0/16 dev ovirtmgmt proto kernel scope link src 172.16.0.58

# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: p3p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovirtmgmt state UP group default qlen 1000
    link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::baca:3aff:fe79:2212/64 scope link
       valid_lft forever preferred_lft forever
3: wlp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 1c:3e:84:50:8d:c3 brd ff:ff:ff:ff:ff:ff
4: bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 92:cb:9d:97:18:36 brd ff:ff:ff:ff:ff:ff
5: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
    link/ether 9a:bc:29:52:82:38 brd ff:ff:ff:ff:ff:ff
6: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff
    inet 172.16.0.58/16 brd 172.16.255.255 scope global ovirtmgmt
       valid_lft forever preferred_lft forever
    inet6 fe80::baca:3aff:fe79:2212/64 scope link
       valid_lft forever preferred_lft forever
7: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovirtmgmt state UNKNOWN group default qlen 500
    link/ether fe:16:3e:16:a4:37 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc16:3eff:fe16:a437/64 scope link
       valid_lft forever preferred_lft forever

At this point I was already seeing a problem on the host/node. I remembered that a
newer version of the sos package is delivered from the oVirt repositories.
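(The before/after record-keeping mentioned above can be done with a simple snapshot-and-diff; a sketch, assuming the snapshots are kept in /root:)

    # capture the state before deploying
    ip addr  > /root/net-before.txt
    ip route >> /root/net-before.txt

    # ... run hosted-engine --deploy ...

    # capture again afterwards and compare
    ip addr  > /root/net-after.txt
    ip route >> /root/net-after.txt
    diff -u /root/net-before.txt /root/net-after.txt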
With that in mind, I tried a "yum update" on my host, and hit a similar problem:

% sudo yum update
[sudo] password for rad:
Loaded plugins: langpacks, refresh-packagekit
Resolving Dependencies
--> Running transaction check
---> Package sos.noarch 0:3.1-1.fc20 will be updated
---> Package sos.noarch 0:3.2-0.2.fc20.ovirt will be an update
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package        Arch          Version                   Repository         Size
================================================================================
Updating:
 sos            noarch        3.2-0.2.fc20.ovirt        ovirt-3.5         292 k

Transaction Summary
================================================================================
Upgrade  1 Package

Total download size: 292 k
Is this ok [y/d/N]: y
Downloading packages:
No Presto metadata available for ovirt-3.5
sos-3.2-0.2.fc20.ovirt.noarch. FAILED
http://www.gtlib.gatech.edu/pub/oVirt/pub/ovirt-3.5/rpm/fc20/noarch/sos-3.2-0.2.fc20.ovirt.noarch.rpm: [Errno 14] curl#6 - "Could not resolve host: www.gtlib.gatech.edu"
Trying other mirror.
sos-3.2-0.2.fc20.ovirt.noarch. FAILED
ftp://ftp.gtlib.gatech.edu/pub/oVirt/pub/ovirt-3.5/rpm/fc20/noarch/sos-3.2-0.2.fc20.ovirt.noarch.rpm: [Errno 14] curl#6 - "Could not resolve host: ftp.gtlib.gatech.edu"
Trying other mirror.
sos-3.2-0.2.fc20.ovirt.noarch. FAILED
http://resources.ovirt.org/pub/ovirt-3.5/rpm/fc20/noarch/sos-3.2-0.2.fc20.ovirt.noarch.rpm: [Errno 14] curl#6 - "Could not resolve host: resources.ovirt.org"
Trying other mirror.
sos-3.2-0.2.fc20.ovirt.noarch. FAILED
http://ftp.snt.utwente.nl/pub/software/ovirt/ovirt-3.5/rpm/fc20/noarch/sos-3.2-0.2.fc20.ovirt.noarch.rpm: [Errno 14] curl#6 - "Could not resolve host: ftp.snt.utwente.nl"
Trying other mirror.
sos-3.2-0.2.fc20.ovirt.noarch. FAILED
http://ftp.nluug.nl/os/Linux/virtual/ovirt/ovirt-3.5/rpm/fc20/noarch/sos-3.2-0.2.fc20.ovirt.noarch.rpm: [Errno 14] curl#6 - "Could not resolve host: ftp.nluug.nl"
Trying other mirror.
sos-3.2-0.2.fc20.ovirt.noarch. FAILED
http://mirror.linux.duke.edu/ovirt/pub/ovirt-3.5/rpm/fc20/noarch/sos-3.2-0.2.fc20.ovirt.noarch.rpm: [Errno 14] curl#6 - "Could not resolve host: mirror.linux.duke.edu"
Trying other mirror.

Error downloading packages:
  sos-3.2-0.2.fc20.ovirt.noarch: [Errno 256] No more mirrors to try.

This was similar to my previous failures. I took a look, and the problem was that
/etc/resolv.conf had no nameservers, and the
/etc/sysconfig/network-scripts/ifcfg-ovirtmgmt file contained no entries for DNS1
or DOMAIN. So it appears that when hosted-engine set up my bridged network, it
neglected to carry the DNS configuration over to the bridge.

Note that I am using *static* network configuration rather than DHCP; during
installation of the OS I set the network configuration up as Manual. Perhaps the
hosted-engine script is not properly prepared to deal with that?

I went ahead and modified the ifcfg-ovirtmgmt network script (to take effect at the
next service restart/boot) and resolv.conf (I was afraid to restart the network in
the middle of hosted-engine execution, since I don't know what might already be
connected to the engine).
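(Roughly what that hand-edit looks like; the gateway matches the routes shown earlier, but the DNS server and search domain below are stand-ins for whatever your static configuration actually uses:)

    # appended to /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
    DNS1=172.16.0.1
    DOMAIN=example.lan

    # and mirrored in /etc/resolv.conf
    search example.lan
    nameserver 172.16.0.1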
This time it got further, but it still failed at the very end:

[ INFO ] Waiting for the host to become operational in the engine. This may take several minutes...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] The VDSM Host is now operational
          Please shutdown the VM allowing the system to launch it as a monitored service.
          The system will wait until the VM is down.
[ ERROR ] Failed to execute stage 'Closing up': Error acquiring VM status
[ INFO ] Stage: Clean up
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20150310140028.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination

At that point, neither the ovirt-ha-broker nor the ovirt-ha-agent service was
running. Note that there was no significant pause after it said "The system will
wait until the VM is down".

After the script completed, I shut down the VM and manually started the HA services
(see the sketch at the end of this message), and the VM came up. I could log in to
the Administration Portal and finally see my HostedEngine VM. :-)

I seem to be in a bad state, however: the Data Center has *no* storage domains
attached. I'm not sure what else might need cleaning up. Any assistance appreciated.

-Bob

>> # ip addr
>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
>>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>     inet 127.0.0.1/8 scope host lo
>>        valid_lft forever preferred_lft forever
>>     inet6 ::1/128 scope host
>>        valid_lft forever preferred_lft forever
>> 2: p3p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovirtmgmt state UP group default qlen 1000
>>     link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff
>>     inet6 fe80::baca:3aff:fe79:2212/64 scope link
>>        valid_lft forever preferred_lft forever
>> 3: bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN group default
>>     link/ether 56:56:f7:cf:73:27 brd ff:ff:ff:ff:ff:ff
>> 4: wlp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
>>     link/ether 1c:3e:84:50:8d:c3 brd ff:ff:ff:ff:ff:ff
>> 6: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
>>     link/ether 22:a1:01:9e:30:71 brd ff:ff:ff:ff:ff:ff
>> 7: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
>>     link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff
>>     inet 172.16.0.58/16 brd 172.16.255.255 scope global ovirtmgmt
>>        valid_lft forever preferred_lft forever
>>     inet6 fe80::baca:3aff:fe79:2212/64 scope link
>>        valid_lft forever preferred_lft forever
>>
>> The only unusual thing about my setup that I can think of, from the network
>> perspective, is that my physical host has a wireless interface, which I've
>> not configured. Could it be confusing hosted-engine --deploy?
>>
>> -Bob
>>
>>
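(P.S. "Manually started the HA services" above means starting the two hosted-engine systemd units named earlier by hand; a minimal sketch:)

    # systemctl start ovirt-ha-broker ovirt-ha-agent
    # systemctl status ovirt-ha-broker ovirt-ha-agent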
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users