[Yahoo-eng-team] [Bug 1460164] Re: restart of openvswitch-switch causes instance network down when l2population enabled

2016-12-01 Corey Bryant
Marking Juno as "Won't fix" for the Ubuntu Cloud Archive since it is
EOL.

** Changed in: cloud-archive/juno
   Status: New => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1460164

Title:
  restart of openvswitch-switch causes instance network down when
  l2population enabled

Status in Ubuntu Cloud Archive:
  Invalid
Status in Ubuntu Cloud Archive icehouse series:
  Fix Released
Status in Ubuntu Cloud Archive juno series:
  Won't Fix
Status in Ubuntu Cloud Archive kilo series:
  Fix Released
Status in neutron:
  Fix Released
Status in neutron kilo series:
  New
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Trusty:
  Fix Released
Status in neutron source package in Wily:
  Fix Released
Status in neutron source package in Xenial:
  Fix Released

Bug description:
  [Impact]
  Restarts of openvswitch (typically on upgrade) result in loss of tunnel
  connectivity when the l2population driver is in use. This results in loss
  of access to all instances on the affected compute hosts.

  [Test Case]
  1. Deploy a cloud with ml2/ovs/l2population enabled.
  2. Boot instances.
  3. Restart OVS; instance connectivity is lost until
     neutron-openvswitch-agent is restarted on the compute hosts.
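The restart in step 3, and the agent restart that recovers from it, can be sketched as a small shell helper for a single compute host. This is a minimal sketch, not part of the bug report: it assumes Ubuntu Trusty's `service` wrapper and assumes the agent job is named `neutron-plugin-openvswitch-agent` (the name used in the original report below; set `AGENT_SERVICE` if your release uses `neutron-openvswitch-agent`). The function name `restart_ovs_and_agent`, the `OVS_SETTLE_SECS` variable and its 5-second default are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): restart Open vSwitch as in
# the test case, then apply the workaround of restarting the OVS agent.
# Assumes Ubuntu Trusty's `service` wrapper. The agent job name is an
# assumption; set AGENT_SERVICE if your release names it differently.
AGENT_SERVICE="${AGENT_SERVICE:-neutron-plugin-openvswitch-agent}"

restart_ovs_and_agent() {
    # On unfixed packages this restart is what leaves l2population
    # tunnels broken.
    service openvswitch-switch restart || return 1
    # Arbitrary settle time for ovs-vswitchd to come back (assumption).
    sleep "${OVS_SETTLE_SECS:-5}"
    # Per this report, restarting the agent restores instance connectivity.
    service "$AGENT_SERVICE" restart || return 1
}

# Only act when invoked explicitly, e.g. `sh restart-ovs.sh --run` as root.
if [ "${1:-}" = "--run" ]; then
    restart_ovs_and_agent
fi
```

On packages carrying the fix (for example neutron 1:2014.1.5-0ubuntu3 on Trusty, per the changelog later in this thread) the second restart should no longer be needed, so running only the first step is a way to confirm the fix.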

  [Regression Potential]
  Minimal; the fix is already in multiple upstream stable branches.

  [Original Bug Report]
  On 2015-05-28, our Landscape auto-upgraded packages on two of our
  OpenStack clouds.  On both clouds, but only on some compute nodes, the
  upgrade of openvswitch-switch and corresponding downtime of
  ovs-vswitchd appears to have triggered some sort of race condition
  within neutron-plugin-openvswitch-agent, leaving it in a broken state;
  any new instances come up with a non-functional network, but pre-existing
  instances appear unaffected.  Restarting n-p-ovs-agent on the affected
  compute nodes is sufficient to work around the problem.

  The packages Landscape upgraded (from /var/log/apt/history.log):

  Start-Date: 2015-05-28  14:23:07
  Upgrade: nova-compute-libvirt:amd64 (2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1),
    libsystemd-login0:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12),
    nova-compute-kvm:amd64 (2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1),
    systemd-services:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12),
    isc-dhcp-common:amd64 (4.2.4-7ubuntu12.1, 4.2.4-7ubuntu12.2),
    nova-common:amd64 (2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1),
    python-nova:amd64 (2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1),
    libsystemd-daemon0:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12),
    grub-common:amd64 (2.02~beta2-9ubuntu1.1, 2.02~beta2-9ubuntu1.2),
    libpam-systemd:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12),
    udev:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12),
    grub2-common:amd64 (2.02~beta2-9ubuntu1.1, 2.02~beta2-9ubuntu1.2),
    openvswitch-switch:amd64 (2.0.2-0ubuntu0.14.04.1, 2.0.2-0ubuntu0.14.04.2),
    libudev1:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12),
    isc-dhcp-client:amd64 (4.2.4-7ubuntu12.1, 4.2.4-7ubuntu12.2),
    python-eventlet:amd64 (0.13.0-1ubuntu2, 0.13.0-1ubuntu2.1),
    python-novaclient:amd64 (2.17.0-0ubuntu1.1, 2.17.0-0ubuntu1.2),
    grub-pc-bin:amd64 (2.02~beta2-9ubuntu1.1, 2.02~beta2-9ubuntu1.2),
    grub-pc:amd64 (2.02~beta2-9ubuntu1.1, 2.02~beta2-9ubuntu1.2),
    nova-compute:amd64 (2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1),
    openvswitch-common:amd64 (2.0.2-0ubuntu0.14.04.1, 2.0.2-0ubuntu0.14.04.2)
  End-Date: 2015-05-28  14:24:47
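Because the trigger was an unattended upgrade of openvswitch-switch (and the resulting ovs-vswitchd downtime), one way to find hosts that may be affected is to scan the apt history log for transactions that upgraded that package. A minimal sketch follows; the function name is hypothetical, and it assumes the standard `history.log` layout in which each transaction starts with a `Start-Date:` line and lists all its upgrades on a single `Upgrade:` line (the excerpt above is wrapped for readability), with packages in the `name:arch` form used on Trusty.

```shell
#!/bin/sh
# Hypothetical helper: print the Start-Date of every apt transaction whose
# Upgrade: line includes openvswitch-switch. Assumes the standard layout of
# /var/log/apt/history.log: one Upgrade: line per transaction, with package
# names in "name:arch" form.
ovs_upgrade_times() {
    awk '
        # Remember the start time of the current transaction.
        /^Start-Date:/ { start = $0; sub(/^Start-Date:[ ]*/, "", start) }
        # Report it if this transaction upgraded openvswitch-switch.
        /^Upgrade:/ && /openvswitch-switch:/ { print start }
    ' "${1:-/var/log/apt/history.log}"
}
```

For the transaction quoted above this prints `2015-05-28  14:23:07`. Older transactions are typically in rotated, gzip-compressed `history.log.*.gz` files, which would need to be decompressed (for example with `zcat`) before being passed in.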

  From /var/log/neutron/openvswitch-agent.log:

  2015-05-28 14:24:18.336 47866 ERROR neutron.agent.linux.ovsdb_monitor [-] Error received from ovsdb monitor: ovsdb-client: unix:/var/run/openvswitch/db.sock: receive failed (End of file)
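The agent-side trace of the outage is this ovsdb_monitor error, logged when the agent's ovsdb-client connection to db.sock was closed during the upgrade. Checking a host for it amounts to searching the agent log for that message. A minimal sketch; the function name is hypothetical, and the default log path is the one quoted above, which may differ between releases.

```shell
#!/bin/sh
# Hypothetical helper: print the ovsdb_monitor "End of file" errors from the
# OVS agent log and return 0 if any were found. The default path is the one
# quoted in this bug report and is an assumption for other releases.
ovsdb_monitor_lost() {
    grep -F 'neutron.agent.linux.ovsdb_monitor' \
        "${1:-/var/log/neutron/openvswitch-agent.log}" \
        | grep -F 'receive failed (End of file)'
}
```

A match only shows that the agent lost its OVSDB connection; on affected versions it is a hint, not proof, that the host needs the agent restart.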

  Looking at a stuck instance, all the right tunnels and bridges and
  whatnot appear to be there:

  root@vector:~# ip l l | grep c-3b
  460002: qbr7ed8b59c-3b:  mtu 1500 qdisc noqueue state UP mode DEFAULT group default
  460003: qvo7ed8b59c-3b:  mtu 1500 qdisc pfifo_fast master ovs-system state UP mode DEFAULT group default qlen 1000
  460004: qvb7ed8b59c-3b:  mtu 1500 qdisc pfifo_fast master qbr7ed8b59c-3b state UP mode DEFAULT group default qlen 1000
  460005: tap7ed8b59c-3b:  mtu 1500 qdisc pfifo_fast master qbr7ed8b59c-3b state UNKNOWN mode DEFAULT group default qlen 500
  root@vector:~# ovs-vsctl list-ports br-int | grep c-3b
  qvo7ed8b59c-3b
  root@vector:~#

  But I can't ping the unit from within the qrouter-${id} namespace on
  the neutron gateway.  If I tcpdump the {q,t}*c-3b interfaces, I don't
  see any traffic.
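The local check done in the transcript (the qbr, qvo, qvb and tap devices for the port all exist, and the qvo end is attached to br-int) can be scripted as follows. This is a minimal sketch with a hypothetical function name: it takes the port-ID fragment that appears in the device names (`7ed8b59c-3b` here) and uses only `ip link show` and `ovs-vsctl list-ports`; the `OVS_VSCTL` override is also hypothetical and exists only so the check can be exercised without Open vSwitch.

```shell
#!/bin/sh
# Hypothetical helper: confirm the hybrid-plug devices for one port exist and
# that the qvo end is attached to br-int. Prints one line per check and
# returns non-zero if anything is missing.
check_port_plumbing() {
    id="$1"
    rc=0
    for prefix in qbr qvo qvb tap; do
        if ip link show "$prefix$id" >/dev/null 2>&1; then
            echo "$prefix$id: present"
        else
            echo "$prefix$id: MISSING"
            rc=1
        fi
    done
    # OVS_VSCTL lets the ovs-vsctl binary be swapped out (e.g. for testing).
    if "${OVS_VSCTL:-ovs-vsctl}" list-ports br-int | grep -qx "qvo$id"; then
        echo "qvo$id: attached to br-int"
    else
        echo "qvo$id: NOT attached to br-int"
        rc=1
    fi
    return "$rc"
}
```

As this report shows, a host can pass this check and still have no connectivity, because the breakage is on the tunnel side of the agent's OVS configuration rather than in the per-port devices. Assuming the agent's default tunnel bridge name `br-tun`, comparing `ovs-ofctl dump-flows br-tun` before and after an agent restart is likely a more direct signal.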

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1460164/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : 

[Yahoo-eng-team] [Bug 1460164] Re: restart of openvswitch-switch causes instance network down when l2population enabled

2016-05-09 Dave Walker
** Also affects: neutron/kilo
   Importance: Undecided
   Status: New

Title:
  restart of openvswitch-switch causes instance network down when
  l2population enabled

Status in Ubuntu Cloud Archive:
  Invalid
Status in Ubuntu Cloud Archive icehouse series:
  Fix Released
Status in Ubuntu Cloud Archive juno series:
  New
Status in Ubuntu Cloud Archive kilo series:
  Fix Released
Status in neutron:
  Fix Released
Status in neutron kilo series:
  New
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Trusty:
  Fix Released
Status in neutron source package in Wily:
  Fix Released
Status in neutron source package in Xenial:
  Fix Released


[Yahoo-eng-team] [Bug 1460164] Re: restart of openvswitch-switch causes instance network down when l2population enabled

2016-03-30 James Page
** Changed in: cloud-archive
   Status: In Progress => Invalid

Title:
  restart of openvswitch-switch causes instance network down when
  l2population enabled

Status in Ubuntu Cloud Archive:
  Invalid
Status in Ubuntu Cloud Archive icehouse series:
  Fix Released
Status in Ubuntu Cloud Archive juno series:
  New
Status in Ubuntu Cloud Archive kilo series:
  Fix Released
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Trusty:
  Fix Released
Status in neutron source package in Wily:
  Fix Released
Status in neutron source package in Xenial:
  Fix Released


[Yahoo-eng-team] [Bug 1460164] Re: restart of openvswitch-switch causes instance network down when l2population enabled

2016-03-30 James Page
This bug was fixed in the package neutron - 1:2015.1.3-0ubuntu1
---

neutron (1:2015.1.3-0ubuntu1) trusty-kilo; urgency=medium

  [ Corey Bryant ]
  * d/tests/neutron-agents: Give daemon 5 seconds to start to prevent race.

  [ James Page ]
  * New upstream stable release (LP: #1559215):
    - d/p/dhcp-protect-against-case-when-device-name-is-none.patch: Dropped,
      included upstream.
  * Ensure tunnels are fully reconstructed after openvswitch restarts
    (LP: #1460164):
    - d/p/ovs-restart.patch: Cherry picked from upstream review.


** Changed in: cloud-archive/kilo
   Status: In Progress => Fix Released

Title:
  restart of openvswitch-switch causes instance network down when
  l2population enabled

Status in Ubuntu Cloud Archive:
  Invalid
Status in Ubuntu Cloud Archive icehouse series:
  Fix Released
Status in Ubuntu Cloud Archive juno series:
  New
Status in Ubuntu Cloud Archive kilo series:
  Fix Released
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Trusty:
  Fix Released
Status in neutron source package in Wily:
  Fix Released
Status in neutron source package in Xenial:
  Fix Released

[Yahoo-eng-team] [Bug 1460164] Re: restart of openvswitch-switch causes instance network down when l2population enabled

2016-03-03 Corey Bryant
** Changed in: cloud-archive/icehouse
   Status: Fix Committed => Fix Released

Title:
  restart of openvswitch-switch causes instance network down when
  l2population enabled

Status in Ubuntu Cloud Archive:
  In Progress
Status in Ubuntu Cloud Archive icehouse series:
  Fix Released
Status in Ubuntu Cloud Archive juno series:
  New
Status in Ubuntu Cloud Archive kilo series:
  In Progress
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Trusty:
  Fix Released
Status in neutron source package in Wily:
  Fix Released
Status in neutron source package in Xenial:
  Fix Released


[Yahoo-eng-team] [Bug 1460164] Re: restart of openvswitch-switch causes instance network down when l2population enabled

2016-03-03 Launchpad Bug Tracker
This bug was fixed in the package neutron - 1:2014.1.5-0ubuntu3

---
neutron (1:2014.1.5-0ubuntu3) trusty; urgency=medium

  [ Corey Bryant ]
  * d/p/make_del_fdb_flow_idempotent.patch: Cherry pick from Juno
    to prevent KeyError on duplicate port removal in del_fdb_flow()
    (LP: #1531963).
  * d/tests/*-plugin: Fix race between service restart and pidof test.

  [ James Page ]
  * d/p/ovs-restart.patch: Ensure that tunnels are fully reset on ovs
    restart (LP: #1460164).

 -- Corey Bryant   Wed, 10 Feb 2016 14:52:04 -0500

** Changed in: neutron (Ubuntu Trusty)
   Status: Fix Committed => Fix Released

** Changed in: neutron (Ubuntu Wily)
   Status: Fix Committed => Fix Released

Title:
  restart of openvswitch-switch causes instance network down when
  l2population enabled

Status in Ubuntu Cloud Archive:
  In Progress
Status in Ubuntu Cloud Archive icehouse series:
  Fix Committed
Status in Ubuntu Cloud Archive juno series:
  New
Status in Ubuntu Cloud Archive kilo series:
  In Progress
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Trusty:
  Fix Released
Status in neutron source package in Wily:
  Fix Released
Status in neutron source package in Xenial:
  Fix Released

[Yahoo-eng-team] [Bug 1460164] Re: restart of openvswitch-switch causes instance network down when l2population enabled

2016-02-22 James Page
** Also affects: cloud-archive/icehouse
   Importance: Undecided
   Status: New

** Changed in: cloud-archive/icehouse
   Status: New => Fix Committed

Title:
  restart of openvswitch-switch causes instance network down when
  l2population enabled

Status in Ubuntu Cloud Archive:
  In Progress
Status in Ubuntu Cloud Archive icehouse series:
  Fix Committed
Status in Ubuntu Cloud Archive juno series:
  New
Status in Ubuntu Cloud Archive kilo series:
  In Progress
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Trusty:
  Fix Committed
Status in neutron source package in Wily:
  In Progress
Status in neutron source package in Xenial:
  Fix Released

Bug description:
  [Impact]
  Restarts of openvswitch (typically on upgrade) result in loss of tunnel 
connectivity when the l2population driver is in use.  This results in loss of 
access to all instances on the effected compute hosts

  [Test Case]
  Deploy cloud with ml2/ovs/l2population enabled
  boot instances
  restart ovs; instance connectivity will be lost until the 
neutron-openvswitch-agent is restarted on the compute hosts.

  [Regression Potential]
  Minimal - in multiple stable branches upstream.

  [Original Bug Report]
  On 2015-05-28, our Landscape auto-upgraded packages on two of our
  OpenStack clouds.  On both clouds, but only on some compute nodes, the
  upgrade of openvswitch-switch and corresponding downtime of
  ovs-vswitchd appears to have triggered some sort of race condition
  within neutron-plugin-openvswitch-agent leaving it in a broken state;
  any new instances come up with non-functional network but pre-existing
  instances appear unaffected.  Restarting n-p-ovs-agent on the affected
  compute nodes is sufficient to work around the problem.

  The packages Landscape upgraded (from /var/log/apt/history.log):

  Start-Date: 2015-05-28  14:23:07
  Upgrade: nova-compute-libvirt:amd64 (2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1),
    libsystemd-login0:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12),
    nova-compute-kvm:amd64 (2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1),
    systemd-services:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12),
    isc-dhcp-common:amd64 (4.2.4-7ubuntu12.1, 4.2.4-7ubuntu12.2),
    nova-common:amd64 (2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1),
    python-nova:amd64 (2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1),
    libsystemd-daemon0:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12),
    grub-common:amd64 (2.02~beta2-9ubuntu1.1, 2.02~beta2-9ubuntu1.2),
    libpam-systemd:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12),
    udev:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12),
    grub2-common:amd64 (2.02~beta2-9ubuntu1.1, 2.02~beta2-9ubuntu1.2),
    openvswitch-switch:amd64 (2.0.2-0ubuntu0.14.04.1, 2.0.2-0ubuntu0.14.04.2),
    libudev1:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12),
    isc-dhcp-client:amd64 (4.2.4-7ubuntu12.1, 4.2.4-7ubuntu12.2),
    python-eventlet:amd64 (0.13.0-1ubuntu2, 0.13.0-1ubuntu2.1),
    python-novaclient:amd64 (2.17.0-0ubuntu1.1, 2.17.0-0ubuntu1.2),
    grub-pc-bin:amd64 (2.02~beta2-9ubuntu1.1, 2.02~beta2-9ubuntu1.2),
    grub-pc:amd64 (2.02~beta2-9ubuntu1.1, 2.02~beta2-9ubuntu1.2),
    nova-compute:amd64 (2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1),
    openvswitch-common:amd64 (2.0.2-0ubuntu0.14.04.1, 2.0.2-0ubuntu0.14.04.2)
  End-Date: 2015-05-28  14:24:47

  From /var/log/neutron/openvswitch-agent.log:

  2015-05-28 14:24:18.336 47866 ERROR neutron.agent.linux.ovsdb_monitor
  [-] Error received from ovsdb monitor: ovsdb-client:
  unix:/var/run/openvswitch/db.sock: receive failed (End of file)
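The EOF on db.sock is most likely the agent noticing that Open vSwitch went away underneath it during the upgrade. To act on a restart (for example, by rebuilding tunnels), an agent needs some reliable way to tell that ovs-vswitchd came back with an empty flow table. One generic technique is a "canary" flow: install a marker flow once and treat its disappearance as a restart. The sketch below models that idea in plain Python; CANARY_FLOW, ToyBridge and RestartDetector are hypothetical names, the flow table is just a set, and nothing here is neutron's agent code or talks to a real switch.

```python
# Hypothetical sketch of canary-flow restart detection. The flow table is
# modelled as a Python set; nothing here talks to a real Open vSwitch.

CANARY_FLOW = "table=23,priority=0,actions=drop"  # assumed marker flow


class ToyBridge:
    """Stands in for one OVS bridge whose flows live only in ovs-vswitchd."""

    def __init__(self):
        self.flows = set()

    def restart(self):
        # A restarted ovs-vswitchd comes back with an empty flow table.
        self.flows.clear()


class RestartDetector:
    def __init__(self, bridge):
        self.bridge = bridge
        self.bridge.flows.add(CANARY_FLOW)

    def check(self):
        """Return "normal" if the canary survived, else "restarted"."""
        if CANARY_FLOW in self.bridge.flows:
            return "normal"
        # Re-arm the canary so the next poll reports "normal" again.
        self.bridge.flows.add(CANARY_FLOW)
        return "restarted"


if __name__ == "__main__":
    br = ToyBridge()
    detector = RestartDetector(br)
    print(detector.check())  # normal
    br.restart()
    print(detector.check())  # restarted
    print(detector.check())  # normal
```

Polling a marker like this only tells the agent that flows were lost; deciding what to rebuild afterwards is a separate step.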

  Looking at one of the stuck instances, all the right tunnels, bridges and
  whatnot appear to be there:

  root@vector:~# ip l l | grep c-3b
  460002: qbr7ed8b59c-3b:  mtu 1500 qdisc noqueue state UP mode DEFAULT group default
  460003: qvo7ed8b59c-3b:  mtu 1500 qdisc pfifo_fast master ovs-system state UP mode DEFAULT group default qlen 1000
  460004: qvb7ed8b59c-3b:  mtu 1500 qdisc pfifo_fast master qbr7ed8b59c-3b state UP mode DEFAULT group default qlen 1000
  460005: tap7ed8b59c-3b:  mtu 1500 qdisc pfifo_fast master qbr7ed8b59c-3b state UNKNOWN mode DEFAULT group default qlen 500
  root@vector:~# ovs-vsctl list-ports br-int | grep c-3b
  qvo7ed8b59c-3b
  root@vector:~#

  But I can't ping the unit from within the qrouter-${id} namespace on
  the neutron gateway.  If I tcpdump the {q,t}*c-3b interfaces, I don't
  see any traffic.
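If the cause is the one diagnosed later in this thread (tunnel ports still present in ovsdb while their flows are missing), the port and link listings will look healthy, exactly as they do here. Comparing the flow table against the port list is more telling. A minimal sketch, assuming you have captured the text of "ovs-ofctl dump-flows br-tun" and built a name-to-ofport mapping (for example with "ovs-vsctl get Interface <name> ofport" for each port). The helper name, port names and sample output are made up for illustration.

```python
import re

# Matches numeric port references such as "in_port=2" or "output:2".
# Assumes the numeric form; OVS versions that print port names instead
# (e.g. output:"vxlan-0a000002") would need a different pattern.
_PORT_REF = re.compile(r"(?:in_port=|output:)(\d+)\b")


def ports_without_flows(dump_flows_text, ofports):
    """Return the names of ports whose ofport no flow refers to.

    dump_flows_text: text captured from `ovs-ofctl dump-flows <bridge>`.
    ofports: mapping of port name -> ofport number.
    """
    referenced = {int(num) for num in _PORT_REF.findall(dump_flows_text)}
    return sorted(name for name, ofport in ofports.items()
                  if ofport not in referenced)


if __name__ == "__main__":
    # Made-up, shortened sample output; not captured from the affected hosts.
    sample = (
        " cookie=0x0, table=0, n_packets=0, priority=1,in_port=1 "
        "actions=resubmit(,2)\n"
        " cookie=0x0, table=22, n_packets=0, dl_vlan=1 "
        "actions=strip_vlan,set_tunnel:0x3e9,output:2\n"
    )
    ports = {"patch-int": 1, "vxlan-0a000002": 2, "vxlan-0a000003": 3}
    print(ports_without_flows(sample, ports))  # ['vxlan-0a000003']
```

A port with no referencing flow is a hint, not proof: flows can also match or output on ports in other ways, so treat the result as a starting point for a closer look at the dump.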

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1460164/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp

[Yahoo-eng-team] [Bug 1460164] Re: restart of openvswitch-switch causes instance network down when l2population enabled

2016-02-11 Thread James Page
** Also affects: neutron (Ubuntu Xenial)
   Importance: High
   Status: Triaged

** Also affects: neutron (Ubuntu Trusty)
   Importance: Undecided
   Status: New

** Also affects: neutron (Ubuntu Wily)
   Importance: Undecided
   Status: New

** Changed in: neutron (Ubuntu Xenial)
   Status: Triaged => Fix Released

** Changed in: neutron (Ubuntu Wily)
   Importance: Undecided => High

** Changed in: neutron (Ubuntu Wily)
   Status: New => In Progress

** Changed in: neutron (Ubuntu Wily)
 Assignee: (unassigned) => James Page (james-page)

Title:
  restart of openvswitch-switch causes instance network down when
  l2population enabled

Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Trusty:
  New
Status in neutron source package in Wily:
  In Progress
Status in neutron source package in Xenial:
  Fix Released

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1460164/+subscriptions


[Yahoo-eng-team] [Bug 1460164] Re: restart of openvswitch-switch causes instance network down when l2population enabled

2016-02-11 Thread James Page
** Description changed:

+ [Impact]
+ Restarts of openvswitch (typically on upgrade) result in loss of tunnel
+ connectivity when the l2population driver is in use.  This results in loss of
+ access to all instances on the affected compute hosts.
+ 
+ [Test Case]
+ Deploy a cloud with ml2/ovs/l2population enabled.
+ Boot instances.
+ Restart ovs; instance connectivity will be lost until
+ neutron-openvswitch-agent is restarted on the compute hosts.
+ 
+ [Regression Potential]
+ Minimal; the fix is already in multiple upstream stable branches.
+ 
+ [Original Bug Report]
  On 2015-05-28, our Landscape auto-upgraded packages on two of our
  OpenStack clouds.  On both clouds, but only on some compute nodes, the
  upgrade of openvswitch-switch and corresponding downtime of
  ovs-vswitchd appears to have triggered some sort of race condition
  within neutron-plugin-openvswitch-agent leaving it in a broken state;
  any new instances come up with non-functional network but pre-existing
  instances appear unaffected.  Restarting n-p-ovs-agent on the affected
  compute nodes is sufficient to work around the problem.
  
  The packages Landscape upgraded (from /var/log/apt/history.log):
  
  Start-Date: 2015-05-28  14:23:07
  Upgrade: nova-compute-libvirt:amd64 (2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1),
    libsystemd-login0:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12),
    nova-compute-kvm:amd64 (2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1),
    systemd-services:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12),
    isc-dhcp-common:amd64 (4.2.4-7ubuntu12.1, 4.2.4-7ubuntu12.2),
    nova-common:amd64 (2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1),
    python-nova:amd64 (2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1),
    libsystemd-daemon0:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12),
    grub-common:amd64 (2.02~beta2-9ubuntu1.1, 2.02~beta2-9ubuntu1.2),
    libpam-systemd:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12),
    udev:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12),
    grub2-common:amd64 (2.02~beta2-9ubuntu1.1, 2.02~beta2-9ubuntu1.2),
    openvswitch-switch:amd64 (2.0.2-0ubuntu0.14.04.1, 2.0.2-0ubuntu0.14.04.2),
    libudev1:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12),
    isc-dhcp-client:amd64 (4.2.4-7ubuntu12.1, 4.2.4-7ubuntu12.2),
    python-eventlet:amd64 (0.13.0-1ubuntu2, 0.13.0-1ubuntu2.1),
    python-novaclient:amd64 (2.17.0-0ubuntu1.1, 2.17.0-0ubuntu1.2),
    grub-pc-bin:amd64 (2.02~beta2-9ubuntu1.1, 2.02~beta2-9ubuntu1.2),
    grub-pc:amd64 (2.02~beta2-9ubuntu1.1, 2.02~beta2-9ubuntu1.2),
    nova-compute:amd64 (2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1),
    openvswitch-common:amd64 (2.0.2-0ubuntu0.14.04.1, 2.0.2-0ubuntu0.14.04.2)
  End-Date: 2015-05-28  14:24:47
  
  From /var/log/neutron/openvswitch-agent.log:
  
  2015-05-28 14:24:18.336 47866 ERROR neutron.agent.linux.ovsdb_monitor
  [-] Error received from ovsdb monitor: ovsdb-client:
  unix:/var/run/openvswitch/db.sock: receive failed (End of file)
  
  Looking at one of the stuck instances, all the right tunnels, bridges and
  whatnot appear to be there:
  
  root@vector:~# ip l l | grep c-3b
- 460002: qbr7ed8b59c-3b:  mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
+ 460002: qbr7ed8b59c-3b:  mtu 1500 qdisc noqueue state UP mode DEFAULT group default
  460003: qvo7ed8b59c-3b:  mtu 1500 qdisc pfifo_fast master ovs-system state UP mode DEFAULT group default qlen 1000
  460004: qvb7ed8b59c-3b:  mtu 1500 qdisc pfifo_fast master qbr7ed8b59c-3b state UP mode DEFAULT group default qlen 1000
  460005: tap7ed8b59c-3b:  mtu 1500 qdisc pfifo_fast master qbr7ed8b59c-3b state UNKNOWN mode DEFAULT group default qlen 500
  root@vector:~# ovs-vsctl list-ports br-int | grep c-3b
  qvo7ed8b59c-3b
- root@vector:~# 
+ root@vector:~#
  
  But I can't ping the unit from within the qrouter-${id} namespace on
  the neutron gateway.  If I tcpdump the {q,t}*c-3b interfaces, I don't
  see any traffic.

** Changed in: neutron (Ubuntu Trusty)
   Status: New => In Progress

** Changed in: neutron (Ubuntu Trusty)
 Assignee: (unassigned) => James Page (james-page)

** Also affects: cloud-archive
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/juno
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/kilo
   Importance: Undecided
   Status: New

** Changed in: cloud-archive/kilo
   Importance: Undecided => Medium

** Changed in: cloud-archive/juno
   Importance: Undecided => Medium

** Changed in: neutron (Ubuntu Trusty)
   Importance: Undecided => High

Title:
  restart of openvswitch-switch causes instance network down when
  l2population enabled

Status in Ubuntu Cloud Archive:
  New
Status in Ubuntu Cloud Archive juno series:
  New
Status in Ubuntu Cloud Archive kilo series:
  

[Yahoo-eng-team] [Bug 1460164] Re: restart of openvswitch-switch causes instance network down when l2population enabled

2016-01-26 Thread OpenStack Infra
Reviewed:  https://review.openstack.org/259485
Committed: 
https://git.openstack.org/cgit/openstack/neutron/commit/?id=17c14977ce0e2291e911739f8c85838f1c1f3473
Submitter: Jenkins
Branch:master

commit 17c14977ce0e2291e911739f8c85838f1c1f3473
Author: James Page 
Date:   Fri Dec 18 15:02:11 2015 +

Ensure that tunnels are fully reset on ovs restart

When the l2population mechanism driver is enabled, if ovs is restarted
tunnel ports are not re-configured in full due to stale ofport handles
in the OVS agent.

Reset all handles when OVS is restarted to ensure that tunnels are
fully recreated in this situation.

Change-Id: If0e034a034a7f000a1c58aa8a43d2c857dee6582
Closes-bug: #1460164
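The idea in this commit message can be illustrated with a toy model: once the agent learns that OVS has restarted, it drops every cached ofport handle, so the next fdb_add call rebuilds each tunnel port and its flows instead of skipping them. This is a sketch under stated assumptions, not the committed change: ToySwitch, ToyAgent, tun_ofports, handle_ovs_restart and the "vxlan-<ip>" port naming are hypothetical, and the model assumes the l2population fdb entries reach the agent again after the reset.

```python
# Toy model (hypothetical names, not neutron's code) of the fix's idea:
# forget every cached ofport handle once OVS has restarted.


class ToySwitch:
    """Ports persist in ovsdb across a restart; flows do not."""

    def __init__(self):
        self.ports = {}     # port name -> ofport
        self.flows = set()  # (remote_ip, ofport) tunnel flows
        self._next_ofport = 1

    def add_tunnel_port(self, name):
        # Re-adding an existing port returns its current ofport.
        if name not in self.ports:
            self.ports[name] = self._next_ofport
            self._next_ofport += 1
        return self.ports[name]

    def restart(self):
        self.flows.clear()


class ToyAgent:
    def __init__(self, switch):
        self.switch = switch
        self.tun_ofports = {}  # remote_ip -> cached ofport handle

    def fdb_add(self, remote_ip):
        if remote_ip in self.tun_ofports:
            return  # cached handle: setup (and flow install) is skipped
        ofport = self.switch.add_tunnel_port("vxlan-" + remote_ip)
        self.tun_ofports[remote_ip] = ofport
        self.switch.flows.add((remote_ip, ofport))

    def handle_ovs_restart(self):
        # The fix, in spirit: reset all handles so the next fdb_add
        # re-creates the tunnel in full.
        self.tun_ofports.clear()


if __name__ == "__main__":
    switch = ToySwitch()
    agent = ToyAgent(switch)
    agent.fdb_add("10.0.0.2")
    switch.restart()
    agent.handle_ovs_restart()
    agent.fdb_add("10.0.0.2")  # assumes the fdb entry is replayed
    print(switch.flows)  # {('10.0.0.2', 1)}
```

Clearing the whole cache rather than individual entries matches the commit's "reset all handles": after a restart no cached handle can be trusted, so nothing is kept.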


** Changed in: neutron
   Status: In Progress => Fix Released

Title:
  restart of openvswitch-switch causes instance network down when
  l2population enabled

Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Triaged

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1460164/+subscriptions


[Yahoo-eng-team] [Bug 1460164] Re: restart of openvswitch-switch causes instance network down when l2population enabled

2015-12-18 Thread James Page
The fdb_add method checks whether the agent already has a registered
ofport for each tunnel it requires. After an OVS restart that handle is
stale, so tunnel setup is skipped: the tunnel ports are still present in
the ovsdb, but all of the flows are missing, resulting in full loss of
instance connectivity.
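The failure mode can be reproduced in a toy model that mimics only the check described here: fdb_add skips tunnel setup when a handle for that tunnel is already cached. All names (ToySwitch, ToyAgent, tun_ofports, the "vxlan-<ip>" port naming) are hypothetical and nothing calls neutron or OVS; following the analysis above, the model assumes tunnel ports survive an OVS restart in ovsdb while the flows are lost.

```python
# Toy model (hypothetical names, not neutron's code) of the stale-ofport bug:
# a cached handle makes fdb_add skip setup even though the flows are gone.


class ToySwitch:
    """Ports persist in ovsdb across a restart; flows do not."""

    def __init__(self):
        self.ports = {}     # port name -> ofport
        self.flows = set()  # (remote_ip, ofport) tunnel flows
        self._next_ofport = 1

    def add_tunnel_port(self, name):
        if name not in self.ports:
            self.ports[name] = self._next_ofport
            self._next_ofport += 1
        return self.ports[name]

    def restart(self):
        # ovsdb keeps the ports, but ovs-vswitchd starts with no flows.
        self.flows.clear()


class ToyAgent:
    def __init__(self, switch):
        self.switch = switch
        self.tun_ofports = {}  # remote_ip -> cached ofport handle

    def fdb_add(self, remote_ip):
        if remote_ip in self.tun_ofports:
            return  # stale handle after a restart: nothing is reinstalled
        ofport = self.switch.add_tunnel_port("vxlan-" + remote_ip)
        self.tun_ofports[remote_ip] = ofport
        self.switch.flows.add((remote_ip, ofport))


def demo():
    """Return (flows before restart, flows after re-adding, ports)."""
    switch = ToySwitch()
    agent = ToyAgent(switch)
    agent.fdb_add("10.0.0.2")
    before = len(switch.flows)
    switch.restart()
    agent.fdb_add("10.0.0.2")  # same fdb entry delivered again
    return before, len(switch.flows), dict(switch.ports)


if __name__ == "__main__":
    print(demo())  # (1, 0, {'vxlan-10.0.0.2': 1})
```

The result mirrors what the reporter saw: the port is still listed, so the setup looks complete, yet no flow forwards traffic through it.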

** Also affects: neutron
   Importance: Undecided
   Status: New

Title:
  restart of openvswitch-switch causes instance network down when
  l2population enabled

Status in neutron:
  New
Status in neutron package in Ubuntu:
  Triaged

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1460164/+subscriptions
