On Thu, Mar 3, 2016 at 2:54 AM, David LeVene <[email protected]> wrote:
> Hi,
>
> Thanks for the quick responses & help.. answers in-line at the end of this email.
>
> Cheers
> David
>
> -----Original Message-----
> From: Edward Haas [mailto:[email protected]]
> Sent: Wednesday, March 02, 2016 20:05
> To: David LeVene <[email protected]>; Dan Kenigsberg <[email protected]>
> Cc: [email protected]
> Subject: Re: [ovirt-users] 3.6 looses network on reboot
>
> On 03/02/2016 01:36 AM, David LeVene wrote:
> > Hi Dan,
> >
> > I missed the email as the subject line changed!
> >
> > So we use and run IPv6 in our network - not sure if this is related. The addresses are handed out via SLAAC, so that would be where the IPv6 address is coming from.
> >
> > My memory is a bit sketchy... but I think if I remove the vmfex/SR-IOV vNIC and only run with the one vNIC it works fine; it's when I bring the second NIC into play with SR-IOV that the issues arise.
> >
> > Answers inline.
> >
> > -----Original Message-----
> > From: Dan Kenigsberg [mailto:[email protected]]
> > Sent: Tuesday, March 01, 2016 00:28
> > To: David LeVene <[email protected]>
> > Cc: [email protected]; [email protected]
> > Subject: Re: [ovirt-users] 3.6 looses network on reboot
> >
> > This sounds very bad. Changing the subject, so the wider, more problematic issue is visible.
> >
> > Did any other user see this behavior?
> >
> > On Mon, Feb 29, 2016 at 06:27:46AM +0000, David LeVene wrote:
> >> Hi Dan,
> >>
> >> Answers as follows;
> >>
> >> # rpm -qa | grep -i vdsm
> >> vdsm-jsonrpc-4.17.18-1.el7.noarch
> >> vdsm-hook-vmfex-4.17.18-1.el7.noarch
> >> vdsm-infra-4.17.18-1.el7.noarch
> >> vdsm-4.17.18-1.el7.noarch
> >> vdsm-python-4.17.18-1.el7.noarch
> >> vdsm-yajsonrpc-4.17.18-1.el7.noarch
> >> vdsm-cli-4.17.18-1.el7.noarch
> >> vdsm-xmlrpc-4.17.18-1.el7.noarch
> >> vdsm-hook-vmfex-dev-4.17.18-1.el7.noarch
> >>
> >> There was an ifcfg-ovirtmgmt bridge setup in this folder, and also route-ovirtmgmt & rule-ovirtmgmt.. but they were removed after the reboot.
> >>
> >> # ls -althr | grep ifcfg
> >> -rw-r--r--. 1 root root 254 Sep 16 21:21 ifcfg-lo
> >> -rw-r--r--. 1 root root 120 Feb 25 14:07 ifcfg-enp7s0f0
> >> -rw-rw-r--. 1 root root 174 Feb 25 14:40 ifcfg-enp6s0
> >>
> >> I think I modified ifcfg-enp6s0 to get networking up again (e.g. it was set to bridge.. but the bridge wasn't configured).. it was a few days ago.. if it's important I can reboot the box again to see what state it comes up with.
> >>
> >> # cat ifcfg-enp6s0
> >> BOOTPROTO="none"
> >> IPADDR="10.80.10.117"
> >> NETMASK="255.255.255.0"
> >> GATEWAY="10.80.10.1"
> >> DEVICE="enp6s0"
> >> HWADDR="00:25:b5:00:0b:4f"
> >> ONBOOT=yes
> >> PEERDNS=yes
> >> PEERROUTES=yes
> >> MTU=1500
> >>
> >> # cat ifcfg-enp7s0f0
> >> # Generated by VDSM version 4.17.18-1.el7
> >> DEVICE=enp7s0f0
> >> ONBOOT=yes
> >> MTU=1500
> >> HWADDR=00:25:b5:00:0b:0f
> >> NM_CONTROLLED=no
> >>
> >> # find /var/lib/vdsm/persistence
> >> /var/lib/vdsm/persistence
> >> /var/lib/vdsm/persistence/netconf
> >> /var/lib/vdsm/persistence/netconf.1456371473833165545
> >> /var/lib/vdsm/persistence/netconf.1456371473833165545/nets
> >> /var/lib/vdsm/persistence/netconf.1456371473833165545/nets/ovirtmgmt
> >>
> >> # cat /var/lib/vdsm/persistence/netconf.1456371473833165545/nets/ovirtmgmt
> >> {
> >>     "nic": "enp6s0",
> >>     "ipaddr": "10.80.10.117",
> >>     "mtu": "1500",
> >>     "netmask": "255.255.255.0",
> >>     "STP": "no",
> >>     "bridged": "true",
> >>     "gateway": "10.80.10.1",
> >>     "defaultRoute": true
> >> }
> >>
> >> Supervdsm log is attached.
> >
> > Have you edited ifcfg-ovirtmgmt manually?
> Nope
> >
> > Can you somehow reproduce it, and share its content?
> Yea, I should be able to reproduce it - just gotta fix it first (create the networking manually and get VDSM on-line). Also it's a side project/investigation at the moment, so time isn't on my side...
> >
>
> Would it help if I take an sosreport before and after? I don't mind emailing these directly to yourself.
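[For reference: given the persisted ovirtmgmt definition quoted above (bridged, enp6s0, 10.80.10.117/24), the ifcfg pair VDSM regenerates would look roughly like the following. This is a sketch reconstructed from the persisted JSON and the VDSM-generated ifcfg-enp7s0f0 shown above, not files copied from the affected host.]

```ini
# ifcfg-ovirtmgmt (bridge device; sketch)
DEVICE=ovirtmgmt
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=none
IPADDR=10.80.10.117
NETMASK=255.255.255.0
GATEWAY=10.80.10.1
MTU=1500
STP=off
NM_CONTROLLED=no

# ifcfg-enp6s0 (bridge port; sketch - note no IP settings here,
# they move to the bridge)
DEVICE=enp6s0
HWADDR=00:25:b5:00:0b:4f
BRIDGE=ovirtmgmt
ONBOOT=yes
MTU=1500
NM_CONTROLLED=no
```

The ifcfg-enp6s0 quoted earlier carries the IP address directly on the NIC with no BRIDGE= line, which matches David's note that he rewrote it by hand after the bridge config disappeared.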
> >
> > Do you have NetworkManager running? which version?
> > NM is disabled, but the version is...
> > # rpm -q NetworkManager
> > NetworkManager-1.0.6-27.el7.x86_64
> > # systemctl status NetworkManager.service
> > ● NetworkManager.service - Network Manager
> >    Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; disabled; vendor preset: enabled)
> >    Active: inactive (dead)
> >
> > It seems that Vdsm has two bugs: on boot, initscripts end up setting an ipv6 address that Vdsm never requested
> >
> > As mentioned above this would have come from SLAAC, which we have set up in our network
> >
> > restore-net::INFO::2016-02-25 14:14:58,024::vdsm-restore-net-config::261::root::(_find_changed_or_missing) ovirtmgmt is different or missing from persistent configuration. current: {'nic': 'enp6s0', 'dhcpv6': False, 'ipaddr': '10.80.10.117', 'mtu': '1500', 'netmask': '255.255.255.0', 'bootproto': 'none', 'stp': False, 'bridged': True, 'ipv6addr': ['2400:7d00:110:3:225:b5ff:fe00:b4f/64'], 'gateway': '10.80.10.1', 'defaultRoute': True}, persisted: {u'nic': u'enp6s0', 'dhcpv6': False, u'ipaddr': u'10.80.10.117', u'mtu': '1500', u'netmask': u'255.255.255.0', 'bootproto': 'none', 'stp': False, u'bridged': True, u'gateway': u'10.80.10.1', u'defaultRoute': True}
> >
> > Then, Vdsm tries to drop the unsolicited address, but fails. Both must be fixed ASAP.
> >
> > restore-net::ERROR::2016-02-25 14:14:59,490::__init__::58::root::(__exit__) Failed rollback transaction last known good network.
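[The `_find_changed_or_missing` check in the log above is essentially a dict diff between the running and the persisted network definition. A minimal standalone sketch of that comparison (not VDSM's actual code; the dict values are copied from the restore-net INFO line) shows that the SLAAC-assigned `ipv6addr` is the only difference, i.e. the trigger for the rollback:]

```python
# Sketch of the current-vs-persisted comparison from the restore-net log.
# Not VDSM code; values are copied verbatim from the INFO line above.

current = {
    'nic': 'enp6s0', 'dhcpv6': False, 'ipaddr': '10.80.10.117',
    'mtu': '1500', 'netmask': '255.255.255.0', 'bootproto': 'none',
    'stp': False, 'bridged': True,
    'ipv6addr': ['2400:7d00:110:3:225:b5ff:fe00:b4f/64'],
    'gateway': '10.80.10.1', 'defaultRoute': True,
}
persisted = {
    'nic': 'enp6s0', 'dhcpv6': False, 'ipaddr': '10.80.10.117',
    'mtu': '1500', 'netmask': '255.255.255.0', 'bootproto': 'none',
    'stp': False, 'bridged': True,
    'gateway': '10.80.10.1', 'defaultRoute': True,
}

def changed_keys(current, persisted):
    """Return the keys whose values differ between the two configs."""
    keys = set(current) | set(persisted)
    return {k for k in keys if current.get(k) != persisted.get(k)}

print(changed_keys(current, persisted))  # -> {'ipv6addr'}
```

Every other key matches (the u'' prefixes in the persisted dict are just Python 2 unicode strings), so the unsolicited SLAAC address alone makes VDSM treat ovirtmgmt as "different or missing" and attempt the rollback that then fails.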
> > Traceback (most recent call last):
> >   File "/usr/share/vdsm/network/api.py", line 918, in setupNetworks
> >     keep_bridge=keep_bridge)
> >   File "/usr/share/vdsm/network/api.py", line 222, in wrapped
> >     ret = func(**attrs)
> >   File "/usr/share/vdsm/network/api.py", line 502, in _delNetwork
> >     configurator.removeQoS(net_ent)
> >   File "/usr/share/vdsm/network/configurators/__init__.py", line 122, in removeQoS
> >     qos.remove_outbound(top_device)
> >   File "/usr/share/vdsm/network/configurators/qos.py", line 60, in remove_outbound
> >     device, pref=_NON_VLANNED_ID if vlan_tag is None else vlan_tag)
> >   File "/usr/share/vdsm/network/tc/filter.py", line 31, in delete
> >     _wrapper.process_request(command)
> >   File "/usr/share/vdsm/network/tc/_wrapper.py", line 38, in process_request
> >     raise TrafficControlException(retcode, err, command)
> > TrafficControlException: (None, 'Message truncated', ['/usr/sbin/tc', 'filter', 'del', 'dev', 'enp6s0', 'pref', '5000'])
> >
> > Regards,
> > Dan.
>
> Hi David,
>
> You have encountered two issues: the first with IPv6, which we do not fully support in 3.6, and the second with an unmanaged failure during network setup on boot.
> We are going to back-port both fixes very soon.
>
> Can you check our patches? They should resolve the problem we saw in the log: https://gerrit.ovirt.org/#/c/54237 (based on oVirt-3.6.3)
>
> -- I've manually applied the patch to the node that I was testing on and the networking comes on-line correctly - now I'm encountering a gluster issue with "cannot find master domain".

Please attach vdsm logs showing gluster connection attempts.

You should also have interesting logs in /var/log/glusterfs/ - there should be a log for each gluster connection (server:/path).

Nir
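[The rollback died while running `tc filter del dev enp6s0 pref 5000`, so a stale QoS filter may be left on the NIC. To check by hand, one can run `tc filter show dev enp6s0` as root and look at the `pref` IDs. A rough sketch of pulling those IDs out of tc's output (the sample line below is illustrative of tc's output format, not captured from the affected host):]

```python
import re

def filter_prefs(tc_output):
    """Extract the distinct 'pref N' IDs from `tc filter show dev <dev>` output."""
    return sorted({int(m) for m in re.findall(r'\bpref (\d+)\b', tc_output)})

# Illustrative sample of tc's output format (not from the affected host):
sample = (
    "filter parent 1: protocol all pref 5000 u32\n"
    "filter parent 1: protocol all pref 5000 u32 fh 800: ht divisor 1\n"
)
print(filter_prefs(sample))  # -> [5000]

# Each leftover pref could then be removed manually with:
#   tc filter del dev enp6s0 pref <N>
# which is the same command VDSM's rollback attempted above.
```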
_______________________________________________
Users mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/users

