Are the vdsm.conf or mom.conf files in /etc/vdsm relevant in this situation?

From: Anthony.Fillmore
Sent: Wednesday, July 19, 2017 9:57 AM
To: 'Alan Griffiths' <apgriffith...@gmail.com>
Cc: Pavel Gashev <p...@acronis.com>; users@ovirt.org; Brandon.Markgraf 
<brandon.markg...@target.com>; Sandeep.Mendiratta 
<sandeep.mendira...@target.com>
Subject: RE: [ovirt-users] [EXTERNAL] Re: Host stuck unresponsive after Network 
Outage

[boxname ~]# systemctl | grep -i dead
mom-vdsm.service                    start MOM instance configured for VDSM purposes
vdsmd.service                       start Virtual Desktop Server Manager


[boxname ~]# systemctl | grep -i exited
blk-availability.service            Availability of block devices
iptables.service                    IPv4 firewall with iptables
kdump.service                       Crash recovery kernel arming
kmod-static-nodes.service           Create list of required static device nodes for the current kernel
lvm2-monitor.service                Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling
lvm2-pvscan@253:3.service           LVM2 PV scan on device 253:3
lvm2-pvscan@253:4.service           LVM2 PV scan on device 253:4
lvm2-pvscan@8:3.service             LVM2 PV scan on device 8:3
network.service                     LSB: Bring up/down networking
openvswitch-nonetwork.service       Open vSwitch Internal Unit
openvswitch.service                 Open vSwitch
rhel-dmesg.service                  Dump dmesg to /var/log/dmesg
rhel-import-state.service           Import network configuration from initramfs
rhel-readonly.service               Configure read-only root support
systemd-journal-flush.service       Flush Journal to Persistent Storage
systemd-modules-load.service        Load Kernel Modules
systemd-random-seed.service         Load/Save Random Seed
systemd-readahead-collect.service   Collect Read-Ahead Data
systemd-readahead-replay.service    Replay Read-Ahead Data
systemd-remount-fs.service          Remount Root and Kernel File Systems
systemd-sysctl.service              Apply Kernel Variables
systemd-tmpfiles-setup-dev.service  Create Static Device Nodes in /dev
systemd-tmpfiles-setup.service      Create Volatile Files and Directories
systemd-udev-trigger.service        udev Coldplug all Devices
systemd-update-utmp.service         Update UTMP about System Boot/Shutdown
systemd-user-sessions.service       Permit User Sessions
systemd-vconsole-setup.service      Setup Virtual Console
vdsm-network-init.service           Virtual Desktop Server Manager network IP+link restoration

From: Alan Griffiths [mailto:apgriffith...@gmail.com]
Sent: Wednesday, July 19, 2017 9:47 AM
To: Anthony.Fillmore <anthony.fillm...@target.com>
Cc: Pavel Gashev <p...@acronis.com>; users@ovirt.org; Brandon.Markgraf 
<brandon.markg...@target.com>; Sandeep.Mendiratta 
<sandeep.mendira...@target.com>
Subject: Re: [ovirt-users] [EXTERNAL] Re: Host stuck unresponsive after Network 
Outage

Are there other failed services?

systemctl --state=failed
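
One caveat with that filter, going by the status output quoted further down: vdsmd shows "inactive (dead)" after its start job failed with result 'dependency', so it would presumably not show up under --state=failed. A minimal sketch for listing dead units and walking the dependency tree follows. The helper name units_in_substate is made up for illustration, and the commented commands assume a systemd version that supports --plain and --no-legend.

```shell
#!/bin/sh
# Print the names of units whose SUB state (4th column) equals $1.
# Expects text in the column layout of
#   systemctl list-units --all --plain --no-legend
# on stdin: UNIT LOAD ACTIVE SUB DESCRIPTION...
units_in_substate() {
    awk -v want="$1" '$4 == want { print $1 }'
}

# On a live host (left commented out; needs a running systemd):
#   systemctl list-units --all --plain --no-legend --type=service \
#       | units_in_substate dead
#   systemctl list-dependencies vdsmd.service   # what vdsmd waits on
```

Matching on the SUB column avoids the false hits that a plain "grep -i dead" can produce when the word appears in a unit description.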

On 19 July 2017 at 15:40, Anthony.Fillmore <anthony.fillm...@target.com> wrote:
Hey Alan,

rpcbind is running on my box, so there appears to be no issue there.  Any other 
ideas on what could be keeping vdsmd dead?  I even uninstalled all oVirt-related 
components from the host and attempted a reinstall of the host through oVirt 
(just short of fully removing the host from oVirt and re-adding it, which I 
want to avoid).  The reinstall times out when it attempts to start VDSM; the 
logs show the service is dead at that point.

Thanks,
Tony

From: Alan Griffiths [mailto:apgriffith...@gmail.com]
Sent: Wednesday, July 19, 2017 4:14 AM
To: Anthony.Fillmore <anthony.fillm...@target.com>
Cc: Pavel Gashev <p...@acronis.com>; users@ovirt.org; Brandon.Markgraf 
<brandon.markg...@target.com>; Sandeep.Mendiratta 
<sandeep.mendira...@target.com>
Subject: Re: [ovirt-users] [EXTERNAL] Re: Host stuck unresponsive after Network 
Outage

Is rpcbind running? This is a dependency for vdsmd.

I've seen issues where rpcbind will not start on boot if IPv6 is disabled. The 
solution for me was to rebuild the initramfs with "dracut -f".
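
A rough sketch of that check sequence, assuming a systemd-based host where the IPv6 sysctl file lives at /proc/sys/net/ipv6/conf/all/disable_ipv6. The ipv6_state helper is a made-up name for illustration, and the commands that change the system are left commented out.

```shell
#!/bin/sh
# Report the IPv6 state from the kernel's sysctl file.
# The optional path argument exists only so the function can be tried
# against a scratch file; it defaults to the real sysctl location.
ipv6_state() {
    f="${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}"
    if [ ! -e "$f" ]; then
        echo "unavailable"   # no IPv6 support present in the running kernel
    elif [ "$(cat "$f")" = "1" ]; then
        echo "disabled"
    else
        echo "enabled"
    fi
}

# Manual sequence on the host (run as root):
#   systemctl status rpcbind.socket rpcbind.service
#   ipv6_state
#   dracut -f     # rebuild the initramfs for the running kernel
#   reboot
```

If ipv6_state prints "unavailable" or "disabled" and rpcbind.socket is not active, that matches the situation described above.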

On 18 July 2017 at 18:13, Anthony.Fillmore <anthony.fillm...@target.com> wrote:
[boxname ~]# systemctl status -l vdsm-network
● vdsm-network.service - Virtual Desktop Server Manager network restoration
   Loaded: loaded (/usr/lib/systemd/system/vdsm-network.service; enabled; vendor preset: enabled)
   Active: activating (start) since Tue 2017-07-18 10:42:57 CDT; 1h 29min ago
  Process: 8216 ExecStartPre=/usr/bin/vdsm-tool --vvverbose --append --logfile=/var/log/vdsm/upgrade.log upgrade-unified-persistence (code=exited, status=0/SUCCESS)
 Main PID: 8231 (vdsm-tool)
   CGroup: /system.slice/vdsm-network.service
           ├─8231 /usr/bin/python /usr/bin/vdsm-tool restore-nets
           └─8240 /usr/bin/python /usr/share/vdsm/vdsm-restore-net-config

Jul 18 10:42:57 t0894bmh1001.stores.target.com systemd[1]: Starting Virtual Desktop Server Manager network restoration...

Thanks,
Tony

From: Pavel Gashev [mailto:p...@acronis.com]
Sent: Tuesday, July 18, 2017 11:17 AM
To: Anthony.Fillmore <anthony.fillm...@target.com>; users@ovirt.org
Cc: Brandon.Markgraf <brandon.markg...@target.com>; Sandeep.Mendiratta 
<sandeep.mendira...@target.com>
Subject: [EXTERNAL] Re: [ovirt-users] Host stuck unresponsive after Network 
Outage

Anthony,

Output of “systemctl status -l vdsm-network” would help.


From: <users-boun...@ovirt.org> on behalf of "Anthony.Fillmore" 
<anthony.fillm...@target.com>
Date: Tuesday, 18 July 2017 at 18:13
To: "users@ovirt.org" <users@ovirt.org>
Cc: "Brandon.Markgraf" <brandon.markg...@target.com>, "Sandeep.Mendiratta" 
<sandeep.mendira...@target.com>
Subject: [ovirt-users] Host stuck unresponsive after Network Outage

Hey Ovirt Users and Team,

I have a host that I am unable to recover after a network outage.  The host is 
stuck in unresponsive mode, even though it is on the network, reachable over 
SSH, and seems to be healthy.  I’ve tried several things to recover the host in 
oVirt, but have had no success so far.  I’d like to reach out to the community 
before blowing away and rebuilding the host.

Environment: I have an oVirt Engine server managing about 26 Datacenters, with 
2 to 3 hosts per Datacenter.  The Engine server is hosted centrally; the hosts 
are bare-metal and distributed throughout my environment.  oVirt Engine is 
version 4.0.6.

What I’ve tried: put the host into maintenance mode and rebooted it.  Confirmed 
the host had rebooted and tried to activate it; it goes back to unresponsive.  
Attempted a reinstall, which fails.

Checking from the host perspective, I can see the following problems:

[boxname~]# systemctl status vdsmd
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: inactive (dead)

Jul 14 12:34:28 boxname systemd[1]: Dependency failed for Virtual Desktop Server Manager.
Jul 14 12:34:28 boxname systemd[1]: Job vdsmd.service/start failed with result 'dependency'.

Going a bit deeper, the results of journalctl -xe:

[root@boxname ~]# journalctl -xe
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit libvirtd.service has begun shutting down.
Jul 18 09:07:31 boxname systemd[1]: Stopped Virtualization daemon.
-- Subject: Unit libvirtd.service has finished shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit libvirtd.service has finished shutting down.
Jul 18 09:07:31 boxname systemd[1]: Reloading.
Jul 18 09:07:31 boxname systemd[1]: Binding to IPv6 address not available since kernel does not support IPv6.
Jul 18 09:07:31 boxname systemd[1]: [/usr/lib/systemd/system/rpcbind.socket:6] Failed to parse address value, ignoring: [::
Jul 18 09:07:31 boxname systemd[1]: Started Auxiliary vdsm service for running helper functions as root.
-- Subject: Unit supervdsmd.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit supervdsmd.service has finished starting up.
--
-- The start-up result is done.
Jul 18 09:07:31 boxname systemd[1]: Starting Auxiliary vdsm service for running helper functions as root...
-- Subject: Unit supervdsmd.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit supervdsmd.service has begun starting up.
Jul 18 09:07:31 boxname systemd[1]: Starting Virtualization daemon...
-- Subject: Unit libvirtd.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit libvirtd.service has begun starting up.
Jul 18 09:07:32 boxname systemd[1]: Started Virtualization daemon.
-- Subject: Unit libvirtd.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit libvirtd.service has finished starting up.
--
-- The start-up result is done.
Jul 18 09:07:32 boxname systemd[1]: Starting Virtual Desktop Server Manager network restoration...
-- Subject: Unit vdsm-network.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit vdsm-network.service has begun starting up.
lines 2751-2797/2797 (END)

Does the community have suggestions on what can be done next to recover this 
host within oVirt?  I can provide additional log dumps as needed; please let me 
know what you need to assist further.

Thank you,
Tony


_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
