Re: [ovirt-users] Host loses all network configuration on update to oVirt 3.5.4

2015-09-09 Thread Ondřej Svoboda
Hi everyone,

it turns out that ifcfg files can be lost even in this very simple scenario:

1) Install/upgrade to VDSM 4.16.21/oVirt 3.5.4
2) Set up a network over eth0
   vdsClient -s 0 setupNetworks 'networks={pokus:{nic:eth0,bootproto:dhcp,blockingdhcp:true,bridged:false}}'
3) Persist the configuration (declare it safe)
   vdsClient -s 0 setSafeNetworkConfig
4) Add a placeholder in /var/lib/vdsm/netconfback/ifcfg-eth0 with:
# original file did not exist
5) Reboot
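
Before rebooting, one way to see whether a host carries such placeholders is to scan the backup directory. What follows is only a sketch: the directory and the placeholder text are the ones quoted in this thread, while the function name is hypothetical and none of this is VDSM code.

```python
import os

# Placeholder text as quoted in this thread; VDSM's own constant is not shown here.
PLACEHOLDER = '# original file did not exist'


def find_placeholder_backups(backup_dir='/var/lib/vdsm/netconfback'):
    """Return the sorted names of backup files that hold only the placeholder."""
    hits = []
    if not os.path.isdir(backup_dir):
        return hits
    for name in sorted(os.listdir(backup_dir)):
        path = os.path.join(backup_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path) as backup:
            # Ignore surrounding whitespace such as a trailing newline.
            if backup.read().strip() == PLACEHOLDER:
                hits.append(name)
    return hits


if __name__ == '__main__':
    for name in find_placeholder_backups():
        print('placeholder backup: %s' % name)
```

Any file it lists is a backup that records "there was nothing to back up", which is the state that triggers the loss described in the steps above.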

I created a fix [1] and prepared it for backport to 3.6 [2] and 3.5 branches 
[3] (so as to appear in 3.5.5) and linked it to 
https://bugzilla.redhat.com/show_bug.cgi?id=1256252

Patrick, to apply the patch you can run the two commands below and then paste
the diff into patch's standard input (the line after "nicFile.writelines(l)" is
a single space, so please add it if it gets eaten by e-mail goblins):

cd /usr/share/vdsm/
patch -p1

diff --git vdsm/network/configurators/ifcfg.py vdsm/network/configurators/ifcfg.py
index 161a3b2..8332224 100644
--- vdsm/network/configurators/ifcfg.py
+++ vdsm/network/configurators/ifcfg.py
@@ -647,11 +647,21 @@ class ConfigWriter(object):
     def removeNic(self, nic):
         cf = netinfo.NET_CONF_PREF + nic
         self._backup(cf)
-        with open(cf) as nicFile:
-            hwlines = [line for line in nicFile if line.startswith('HWADDR=')]
+        try:
+            with open(cf) as nicFile:
+                hwlines = [line for line in nicFile if line.startswith(
+                    'HWADDR=')]
+        except IOError as e:
+            logging.warning("%s couldn't be read (errno %s)", cf, e.errno)
+            try:
+                hwlines = ['HWADDR=%s\n' % netinfo.gethwaddr(nic)]
+            except IOError as e:
+                logging.exception("couldn't determine hardware address of %s "
+                                  "(errno %s)", nic, e.errno)
+                hwlines = []
         l = [self.CONFFILE_HEADER + '\n', 'DEVICE=%s\n' % nic, 'ONBOOT=yes\n',
              'MTU=%s\n' % netinfo.DEFAULT_MTU] + hwlines
-        l += 'NM_CONTROLLED=no\n'
+        l.append('NM_CONTROLLED=no\n')
         with open(cf, 'w') as nicFile:
             nicFile.writelines(l)
 

Michael, will you please give it a try as well?

Thanks,
Ondra

[1] https://gerrit.ovirt.org/#/c/45893/
[2] https://gerrit.ovirt.org/#/c/45932/
[3] https://gerrit.ovirt.org/#/c/45933/
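
For anyone who wants to follow the patched logic outside the diff, here is a standalone sketch of it. It is not the VDSM code: `build_nic_lines`, `HEADER`, `DEFAULT_MTU` and the `get_hwaddr` callback are hypothetical stand-ins for `removeNic`, `CONFFILE_HEADER`, `netinfo.DEFAULT_MTU` and `netinfo.gethwaddr`, and their values are assumptions. The last lines show why `l += '...'` was replaced by `l.append(...)`: `+=` with a string extends a list one character at a time.

```python
import logging

HEADER = '# Generated by VDSM'   # assumed stand-in for CONFFILE_HEADER
DEFAULT_MTU = 1500               # assumed stand-in for netinfo.DEFAULT_MTU


def build_nic_lines(cf, nic, get_hwaddr):
    """Return the lines of a minimal ifcfg file for `nic`.

    `cf` is the path of the existing ifcfg file; `get_hwaddr` is a callable
    that returns the NIC's MAC address or raises IOError.
    """
    try:
        with open(cf) as nic_file:
            hwlines = [line for line in nic_file
                       if line.startswith('HWADDR=')]
    except IOError as e:
        # The ifcfg file is missing or unreadable: ask the system instead.
        logging.warning("%s couldn't be read (errno %s)", cf, e.errno)
        try:
            hwlines = ['HWADDR=%s\n' % get_hwaddr(nic)]
        except IOError as e:
            logging.warning("couldn't determine hardware address of %s "
                            "(errno %s)", nic, e.errno)
            hwlines = []
    lines = [HEADER + '\n', 'DEVICE=%s\n' % nic, 'ONBOOT=yes\n',
             'MTU=%s\n' % DEFAULT_MTU] + hwlines
    lines.append('NM_CONTROLLED=no\n')
    return lines


# The old `l += 'NM_CONTROLLED=no\n'` form extends the list per character:
chars = []
chars += 'no\n'
# chars is now ['n', 'o', '\n'] rather than ['no\n']
```

Because `writelines()` simply concatenates its items, the old `+=` form still produced the same file; `append` just keeps the list one entry per line.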

----- Original Message -----
> From: "Patrick Hurrelmann" 
> To: "Dan Kenigsberg" 
> Cc: "oVirt Mailing List" 
> Sent: Monday, September 7, 2015 2:46:05 PM
> Subject: Re: [ovirt-users] Host loses all network configuration on update to oVirt 3.5.4
> 
> On 07.09.2015 14:44, Patrick Hurrelmann wrote:
> > On 07.09.2015 13:54, Dan Kenigsberg wrote:
> >> On Mon, Sep 07, 2015 at 11:47:48AM +0200, Patrick Hurrelmann wrote:
> >>> On 06.09.2015 11:30, Dan Kenigsberg wrote:
> >>>> On Fri, Sep 04, 2015 at 10:26:39AM +0200, Patrick Hurrelmann wrote:

Re: [ovirt-users] Host loses all network configuration on update to oVirt 3.5.4

2015-09-07 Thread Patrick Hurrelmann
On 07.09.2015 14:44, Patrick Hurrelmann wrote:
> On 07.09.2015 13:54, Dan Kenigsberg wrote:
>> On Mon, Sep 07, 2015 at 11:47:48AM +0200, Patrick Hurrelmann wrote:
>>> On 06.09.2015 11:30, Dan Kenigsberg wrote:
>>>> On Fri, Sep 04, 2015 at 10:26:39AM +0200, Patrick Hurrelmann wrote:

Re: [ovirt-users] Host loses all network configuration on update to oVirt 3.5.4

2015-09-07 Thread Patrick Hurrelmann
On 07.09.2015 13:54, Dan Kenigsberg wrote:
> On Mon, Sep 07, 2015 at 11:47:48AM +0200, Patrick Hurrelmann wrote:
>> On 06.09.2015 11:30, Dan Kenigsberg wrote:
>>> On Fri, Sep 04, 2015 at 10:26:39AM +0200, Patrick Hurrelmann wrote:

Re: [ovirt-users] Host loses all network configuration on update to oVirt 3.5.4

2015-09-07 Thread Dan Kenigsberg
On Mon, Sep 07, 2015 at 11:47:48AM +0200, Patrick Hurrelmann wrote:
> On 06.09.2015 11:30, Dan Kenigsberg wrote:
> > On Fri, Sep 04, 2015 at 10:26:39AM +0200, Patrick Hurrelmann wrote:

Re: [ovirt-users] Host loses all network configuration on update to oVirt 3.5.4

2015-09-06 Thread Dan Kenigsberg
On Fri, Sep 04, 2015 at 10:26:39AM +0200, Patrick Hurrelmann wrote:
> Hi all,
> 
> I just updated my existing oVirt 3.5.3 installation (iSCSI hosted-engine on
> CentOS 7.1). The engine update went fine. Updating the hosts succeeds until
> the first reboot. After a reboot the host does not come up again. It is
> missing all network configuration. All network cfgs in
> /etc/sysconfig/network-scripts are missing except ifcfg-lo. The host boots up
> without working networking. Using IPMI and config backups, I was able to
> restore the lost network configs. Once these are restored and the host is
> rebooted again, all seems to be back to normal. This has now happened to 2
> updated hosts (this installation has a total of 4 hosts, so 2 more to
> debug/try). I'm happy to assist in further debugging.
> 
> Before updating the second host, I gathered some information. All these hosts
> have 3 physical NICs. One is used for the ovirtmgmt bridge and the other 2 are
> used for iSCSI storage VLANs.
> 
> ifcfgs before update:
> 
> /etc/sysconfig/network-scripts/ifcfg-em1
> # Generated by VDSM version 4.16.20-0.el7.centos
> DEVICE=em1
> HWADDR=d0:67:e5:f0:e5:c6
> BRIDGE=ovirtmgmt
> ONBOOT=yes
> NM_CONTROLLED=no
> 
> /etc/sysconfig/network-scripts/ifcfg-lo
> DEVICE=lo
> IPADDR=127.0.0.1
> NETMASK=255.0.0.0
> NETWORK=127.0.0.0
> # If you're having problems with gated making 127.0.0.0/8 a martian,
> # you can change this to something else (255.255.255.255, for example)
> BROADCAST=127.255.255.255
> ONBOOT=yes
> NAME=loopback
> 
> /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
> # Generated by VDSM version 4.16.20-0.el7.centos
> DEVICE=ovirtmgmt
> TYPE=Bridge
> DELAY=0
> STP=off
> ONBOOT=yes
> IPADDR=1.2.3.16
> NETMASK=255.255.255.0
> GATEWAY=1.2.3.11
> BOOTPROTO=none
> DEFROUTE=yes
> NM_CONTROLLED=no
> HOTPLUG=no
> 
> /etc/sysconfig/network-scripts/ifcfg-p4p1
> # Generated by VDSM version 4.16.20-0.el7.centos
> DEVICE=p4p1
> HWADDR=68:05:ca:01:bc:0c
> ONBOOT=no
> IPADDR=4.5.7.102
> NETMASK=255.255.255.0
> BOOTPROTO=none
> MTU=9000
> DEFROUTE=no
> NM_CONTROLLED=no
> 
> /etc/sysconfig/network-scripts/ifcfg-p3p1
> # Generated by VDSM version 4.16.20-0.el7.centos
> DEVICE=p3p1
> HWADDR=68:05:ca:18:86:45
> ONBOOT=no
> IPADDR=4.5.6.102
> NETMASK=255.255.255.0
> BOOTPROTO=none
> MTU=9000
> DEFROUTE=no
> NM_CONTROLLED=no
> 
> /etc/sysconfig/network-scripts/ifcfg-lo
> 
> 
> ip link before update:
> 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT
>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> 2: bond0:  mtu 1500 qdisc noop state DOWN mode DEFAULT
>     link/ether 46:50:22:7a:f3:9d brd ff:ff:ff:ff:ff:ff
> 3: em1:  mtu 1500 qdisc mq master ovirtmgmt state UP mode DEFAULT qlen 1000
>     link/ether d0:67:e5:f0:e5:c6 brd ff:ff:ff:ff:ff:ff
> 4: p3p1:  mtu 9000 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
>     link/ether 68:05:ca:18:86:45 brd ff:ff:ff:ff:ff:ff
> 5: p4p1:  mtu 9000 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
>     link/ether 68:05:ca:01:bc:0c brd ff:ff:ff:ff:ff:ff
> 7: ovirtmgmt:  mtu 1500 qdisc noqueue state UP mode DEFAULT
>     link/ether d0:67:e5:f0:e5:c6 brd ff:ff:ff:ff:ff:ff
> 8: ;vdsmdummy;:  mtu 1500 qdisc noop state DOWN mode DEFAULT
>     link/ether ce:0f:16:49:a7:da brd ff:ff:ff:ff:ff:ff
> 
> vdsm files before update:
> /var/lib/vdsm
> /var/lib/vdsm/bonding-defaults.json
> /var/lib/vdsm/netconfback
> /var/lib/vdsm/netconfback/ifcfg-ovirtmgmt
> /var/lib/vdsm/netconfback/ifcfg-em1
> /var/lib/vdsm/netconfback/route-ovirtmgmt
> /var/lib/vdsm/netconfback/rule-ovirtmgmt
> /var/lib/vdsm/netconfback/ifcfg-p4p1
> /var/lib/vdsm/netconfback/ifcfg-p3p1
> /var/lib/vdsm/persistence
> /var/lib/vdsm/persistence/netconf
> /var/lib/vdsm/persistence/netconf.141697752319079
> /var/lib/vdsm/persistence/netconf.141697752319079/nets
> /var/lib/vdsm/persistence/netconf.141697752319079/nets/san1
> /var/lib/vdsm/persistence/netconf.141697752319079/nets/san2
> /var/lib/vdsm/persistence/netconf.141697752319079/nets/ovirtmgmt
> /var/lib/vdsm/upgrade
> /var/lib/vdsm/upgrade/upgrade-unified-persistence
> /var/lib/vdsm/transient
> 
> 
> Each file in /var/lib/vdsm/netconfback contained only a comment:
> # original file did not exist

This is quite peculiar. Do you know when these were created?
Have you made any networking changes on 3.5.3 just before boot?

> 
> /var/lib/vdsm/persistence/netconf.141697752319079/nets/ovirtmgmt
> {"nic": "em1", "netmask": "255.255.255.0", "bootproto": "none", "ipaddr": "1.2.3.16", "gateway": "1.2.3.11"}
> 
> /var/lib/vdsm/persistence/netconf.141697752319079/nets/san1
> {"nic": "p3p1", "netmask": "255.255.255.0", "ipaddr": "4.5.6.102", "bridged": "false", "mtu": "9000"}
> 
> /var/lib/vdsm/persistence/netconf.141697752319079/nets/san2
> {"nic": "p4p1", "netmask": "255.255.255.0", "ipaddr": "4.5.7.102", "bridged": "false", "mtu": "9000"}
> 
> 
> After update and reboot, no ifcfg scripts are left. Only interface lo is up.
>
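
To make the relation between the persisted JSON and the ifcfg files quoted above more concrete, here is a rough sketch that renders a non-bridged network definition as ifcfg-style lines. The key mapping is inferred only from the san1 and ifcfg-p3p1 listings in this thread, the function name is hypothetical, and this is not how VDSM itself restores configuration.

```python
import json

# Mapping inferred from the listings in this thread (an assumption,
# not VDSM's own table).
KEY_MAP = [('nic', 'DEVICE'), ('ipaddr', 'IPADDR'), ('netmask', 'NETMASK'),
           ('bootproto', 'BOOTPROTO'), ('mtu', 'MTU')]


def render_ifcfg(persisted_json):
    """Render a persisted non-bridged network definition as ifcfg lines."""
    net = json.loads(persisted_json)
    # Networks without a "bridged" key are treated as bridged (assumption
    # based on the ovirtmgmt entry, which is a bridge and has no such key).
    if str(net.get('bridged', 'true')).lower() != 'false':
        raise ValueError('only non-bridged networks are handled in this sketch')
    return ['%s=%s' % (ifcfg_key, net[key])
            for key, ifcfg_key in KEY_MAP if key in net]


san1 = ('{"nic": "p3p1", "netmask": "255.255.255.0", "ipaddr": "4.5.6.102", '
        '"bridged": "false", "mtu": "9000"}')
print('\n'.join(render_ifcfg(san1)))
```

Comparing its output with a host's ifcfg-p3p1 is a quick way to see which keys the persisted definition covers and which (HWADDR, ONBOOT, NM_CONTROLLED and so on) have to come from elsewhere.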