[Bug 1749425] Re: Neutron integrated with OpenVSwitch drops packets and fails to plug/unplug interfaces from OVS on router interfaces at scale

2018-02-23 Thread Edward Hope-Morley
Hi folks, while trying to reproduce this behaviour myself I think i've
stumbled upon some interesting behaviour. I setup a test as follows and
checked for errors at specific points. I have a 4 node setup (24
core/64G ram) with 3 gateways and 1 compute. Neutron is configured with
l3_ha enabled and max_agents_per_router set to 3;

 stage 1: created two projects with 1 router each (which gives two sets
of keepalived each with the same VR_ID (1)) and checked keepalived logs
- system load is minimal, no re-elections observed post-create.

 stage 2: scaled horizontally to 200 projects each with 1 router (giving
200 routers with VR_ID 1 each within their own network). system load is
minimal, no re-elections observed post-create, observed that all master
state routers are on the same host.

 stage 3: scaled one project vertically by creating 200 routers within
same project. As i started to get into the VR_70s i started to see some
of the extant routers get re-elected e.g. "VRRP_Instance(VR_76) Received
higher prio advert". If i run a tcpdump on one of my ha- interfaces
inside a qrouter- namespace I see a flood of "VRRPv2, Advertisement"
with each VR_ID being advertised every 2s from the current master (as
expected since that's the default interval in neutron). The consequence
of this is that neutron is frequently having to cathup with keepalived
(by running neutron-keepalived-state-change) which causes more traffic
and all without cause since there is no need for these failovers to be
occurring.

 Since the advert interval is configurable in neutron [1] I am going to
go ahead and try changing it to see of that can stop these re-elections
but that seems a little hacky as a fix so just wondering if there's
another way to mitigate these effects. I need to double check the vrrp
spec but iirc since these advertisements are sent out by the master, if
the master dies it would affect how long it takes for a re-election to
occur (spec says "(3 * Advertisement_Interval) + Skew_time") and during
that time VMs would be unreachable so maybe there's another way.

[1]
https://github.com/openstack/neutron/blob/b90ec94dc3f83f63bdb505ace1e4c272435c494b/neutron/conf/agent/l3/ha.py#L35

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1749425

Title:
  Neutron integrated with OpenVSwitch drops packets and fails to
  plug/unplug interfaces from OVS on router interfaces at scale

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1749425/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1749425] Re: Neutron integrated with OpenVSwitch drops packets and fails to plug/unplug interfaces from OVS on router interfaces at scale

2018-02-16 Thread James Page
I've spent the last two days trying to reproduce these problems on an
OpenStack Cloud with three gateway/network units, each with 8 cores and
16G of RAM;  I have 99 routers configured on the cloud, plumbed into a
shared external provider network, and I've been switching them between
ha and non-ha configuration continually for the last 24 hours using:

   openstack router set --disable $router_id
   openstack router set --{no-}ha $router_id
   openstack router set --enable $router_id

Routers successfully switched between HA and non-HA mode, with
keepalived, netns and ovs ports being tidied as appropriate.

We think we saw issues with ovs performance when the db contained a
large number of dangling ovs interfaces on br-int, so I loaded on of my
units up with 400 dangling ovs instances.

I'll run the switching again for a while and see if that exacerbates
things sufficiently to cause the same issue again.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1749425

Title:
  Neutron integrated with OpenVSwitch drops packets and fails to
  plug/unplug interfaces from OVS on router interfaces at scale

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1749425/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1749425] Re: Neutron integrated with OpenVSwitch drops packets and fails to plug/unplug interfaces from OVS on router interfaces at scale

2018-02-15 Thread James Page
Hi Brian

I'm working on a reproducer so this is a little easier to test; once
I've reproduced using Ocata, I'll then re-test using the same process
with pike and/or queens to see if things have improved.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1749425

Title:
  Neutron integrated with OpenVSwitch drops packets and fails to
  plug/unplug interfaces from OVS on router interfaces at scale

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1749425/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1749425] Re: Neutron integrated with OpenVSwitch drops packets and fails to plug/unplug interfaces from OVS on router interfaces at scale

2018-02-15 Thread Brian Haley
And just as a follow-on to my previous comment - is it possible to test
this with the latest code from the master branch?  I know you're running
packages from Ubuntu I just didn't want to waste cycles tracking things
down that might be fixed already, and maybe just need a backport to a
stable branch.  That's not saying there isn't a new bug here as there
most likely is.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1749425

Title:
  Neutron integrated with OpenVSwitch drops packets and fails to
  plug/unplug interfaces from OVS on router interfaces at scale

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1749425/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1749425] Re: Neutron integrated with OpenVSwitch drops packets and fails to plug/unplug interfaces from OVS on router interfaces at scale

2018-02-14 Thread Brian Haley
You described a lot of issues in comment #14:

* Missing interfaces inside qrouter namespaces (OVS taps)
* Missing iptables rules
* Missing floating IP aliases on OVS interfaces inside the qrouter namespaces

Some of those might be fixed in master, especially the iptables one, and
should have been cherry-picked to the stable branches but probably only
to Ocata.  The "add floating ip" path should re-queue the message and
retry in a second or two, if it doesn't then please see if there is a
trackback and put the info here or another bug.

There could also be something happening with keepalived where it's not
getting things done, since it is managing the VIPs when HA is enabled.

Finally, regarding the traceback, I've never seen that before.  My first
thought is to sprinkle "if instance" in all those code paths, but maybe
there's something else going on here that we should figure out.  For
example, if the initial creation of the instance failed, then a message
came to add a floating IP, returning without doing anything (not
instance case) isn't what we want to do.  This would require some log
examination to figure out what exactly happened.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1749425

Title:
  Neutron integrated with OpenVSwitch drops packets and fails to
  plug/unplug interfaces from OVS on router interfaces at scale

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1749425/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1749425] Re: Neutron integrated with OpenVSwitch drops packets and fails to plug/unplug interfaces from OVS on router interfaces at scale

2018-02-14 Thread James Troup
I've unsubscribed Canonical Field Critical as removing the 4095 tagged
ports in OVS on all 3 Neutron Gateway nodes seems to have made the
packet loss go away (for now?).

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1749425

Title:
  Neutron integrated with OpenVSwitch drops packets and fails to
  plug/unplug interfaces from OVS on router interfaces at scale

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1749425/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1749425] Re: Neutron integrated with OpenVSwitch drops packets and fails to plug/unplug interfaces from OVS on router interfaces at scale

2018-02-14 Thread James Hebden
Correction - The `remove_vip_by_ip_address` call failing just after the
IPTablesManager is called and succeeds.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1749425

Title:
  Neutron integrated with OpenVSwitch drops packets and fails to
  plug/unplug interfaces from OVS on router interfaces at scale

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1749425/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1749425] Re: Neutron integrated with OpenVSwitch drops packets and fails to plug/unplug interfaces from OVS on router interfaces at scale

2018-02-14 Thread James Hebden
@james-page, @axino -

Just a +1 to the HA property being changed requiring the router to be
set down prior, and back up after to start the recreation of the router
as HA.

We have seen various other side effects in Neutron/OVS environments and 
specifically the environment in question, such as -
* Missing interfaces inside qrouter namespaces (OVS taps)
* Missing iptables rules 
* Missing floating IP aliases on OVS interfaces inside the qrouter namespaces
All of which are tasks which are performed during bringup of HA routers. We 
have seen fewer of these issues on non-HA routers, and whether the router is HA 
or not, rescheduling the router or converting from HA to non-HA or vice versa 
will rebuild and as a result repair the router.

I should also point out that at the time of these issues, we have rarely
observed high system load, but I do also agree that the number of
routers and therefore the workload on both Neutron and OVS to
orchestrate interface plugging and unplugging and namespace (and
associated network stack plumbing) work is much higher than a typical
environment. Having three servers doing this work rather than scaling
horizontally seems like it might be exposing bottlenecks in either
Neutron or OVS when it comes to the orchestration of these tasks.

I'm not sure if you are seeing the following traceback in the logs
provided, but the below traceback has also been common when this issue
crops up, and shows an example of a task performed during the bringup of
a router (the IPTablesManager initialisation) falling over.

2018-02-14 05:04:32.101 1352665 DEBUG neutron.agent.linux.utils [-] Exit code: 
0 execute /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:158
2018-02-14 05:04:32.103 1352665 DEBUG neutron.agent.linux.iptables_manager [-] 
IPTablesManager.apply completed with success. 0 iptables commands were issued 
_apply_synchronized 
/usr/lib/python2.7/dist-packages/neutron/agent/linux/iptables_manager.py:576
2018-02-14 05:04:32.103 1352665 DEBUG oslo_concurrency.lockutils [-] Releasing 
semaphore "iptables-qrouter-43801324-72ce-469f-a628-a5c645041e30" lock 
/usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:228
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info [-] 
'NoneType' object has no attribute 'remove_vip_by_ip_address'
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info Traceback 
(most recent call last):
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info   File 
"/usr/lib/python2.7/dist-packages/neutron/common/utils.py", line 253, in call
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info return 
func(*args, **kwargs)
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 1115, 
in process
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info 
self.process_external()
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 890, 
in process_external
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info 
self._process_external_gateway(ex_gw_port)
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 777, 
in _process_external_gateway
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info 
self.external_gateway_updated(ex_gw_port, interface_name)
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 403, in 
external_gateway_updated
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info 
self._remove_vip(old_gateway_cidr)
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 202, in 
_remove_vip
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info 
instance.remove_vip_by_ip_address(ip_cidr)
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info 
AttributeError: 'NoneType' object has no attribute 'remove_vip_by_ip_address'
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info 
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent [-] Failed to 
process compatible router: 43801324-72ce-469f-a628-a5c645041e30
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent Traceback (most 
recent call last):
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 517, in 
_process_router_update
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent 
self._process_router_if_compatible(router)
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 454, in 

[Bug 1749425] Re: Neutron integrated with OpenVSwitch drops packets and fails to plug/unplug interfaces from OVS on router interfaces at scale

2018-02-14 Thread James Page
OK so the original bug report was actually about packet loss in ovs; the
network nodes have a large number of lost tap devices tagged with 4095
segmentation id, reporting no such device in OVS.

Theory might be is is dosing the ovs-vswitchd process resulting in some
userspace datapath lossyness.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1749425

Title:
  Neutron integrated with OpenVSwitch drops packets and fails to
  plug/unplug interfaces from OVS on router interfaces at scale

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1749425/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1749425] Re: Neutron integrated with OpenVSwitch drops packets and fails to plug/unplug interfaces from OVS on router interfaces at scale

2018-02-14 Thread James Page
You can see an update message prior to the remove-from-agent processing
with:

 "ha_vr_id": null,

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1749425

Title:
  Neutron integrated with OpenVSwitch drops packets and fails to
  plug/unplug interfaces from OVS on router interfaces at scale

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1749425/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1749425] Re: Neutron integrated with OpenVSwitch drops packets and fails to plug/unplug interfaces from OVS on router interfaces at scale

2018-02-14 Thread James Page
Supposition might be that ha_vr_id has been unset at this point in time?

def _get_keepalived_instance(self):
return self.keepalived_manager.config.get_instance(self.ha_vr_id)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1749425

Title:
  Neutron integrated with OpenVSwitch drops packets and fails to
  plug/unplug interfaces from OVS on router interfaces at scale

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1749425/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1749425] Re: Neutron integrated with OpenVSwitch drops packets and fails to plug/unplug interfaces from OVS on router interfaces at scale

2018-02-14 Thread James Page
OK and now:

2018-02-13 14:18:08.281 2065380 DEBUG neutron.agent.l3.agent [-]
Starting router update for 213b6544-ab4b-4e46-a5c6-5d8d587a0c6d, action
1, priority 0 _process_router_update /usr/lib/python2.7/dist-
packages/neutron/agent/l3/agent.py:483

which I think is the actual processing of the message from the work
queue from 14:12:57 (wow).

action 1 is a DELETE_ROUTER.

this fails:

2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent [-] Error while 
deleting router 213b6544-ab4b-4e46-a5c6-5d8d587a0c6d
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent Traceback (most 
recent call last):
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 366, in 
_safe_router_removed
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent 
self._router_removed(router_id)
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 385, in 
_router_removed
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent ri.delete()
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 420, in 
delete
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent 
self.destroy_state_change_monitor(self.process_monitor)
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 355, in 
destroy_state_change_monitor
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent pm = 
self._get_state_change_monitor_process_manager()
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 326, in 
_get_state_change_monitor_process_manager
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent 
default_cmd_callback=self._get_state_change_monitor_callback())
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 330, in 
_get_state_change_monitor_callback
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent ha_cidr = 
self._get_primary_vip()
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 166, in 
_get_primary_vip
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent return 
self._get_keepalived_instance().get_primary_vip()
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent AttributeError: 
'NoneType' object has no attribute 'get_primary_vip'
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent
2018-02-13 14:18:09.960 2065380 DEBUG neutron.agent.l3.agent [-] Finished a 
router update for 213b6544-ab4b-4e46-a5c6-5d8d587a0c6d _process_router_update 
/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py:513

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1749425

Title:
  Neutron integrated with OpenVSwitch drops packets and fails to
  plug/unplug interfaces from OVS on router interfaces at scale

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1749425/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1749425] Re: Neutron integrated with OpenVSwitch drops packets and fails to plug/unplug interfaces from OVS on router interfaces at scale

2018-02-14 Thread James Page
Some sequencing:

unbind from agent message inbound:

2018-02-13 14:12:57.931 2065380 DEBUG neutron.agent.l3.agent [req-
13456a6c-5583-4147-9b47-94026ea7f3b4 b327544aba2a482b9f12f1e6e615c394
9a4311b33381401fbc835c739981ce03 - - -] Got router removed from agent
:{u'router_id': u'213b6544-ab4b-4e46-a5c6-5d8d587a0c6d'}
router_removed_from_agent /usr/lib/python2.7/dist-
packages/neutron/agent/l3/agent.py:419

some sort of update message:

2018-02-13 14:13:00.667 2065380 DEBUG neutron.agent.l3.agent [req-
13456a6c-5583-4147-9b47-94026ea7f3b4 b327544aba2a482b9f12f1e6e615c394
9a4311b33381401fbc835c739981ce03 - - -] Got routers updated notification
:[u'213b6544-ab4b-4e46-a5c6-5d8d587a0c6d'] routers_updated
/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py:409

and then we see a state change on the router (goes to master) inferring
that no teardown has occurred since the first message above:

2018-02-13 14:14:33.113 2065380 DEBUG neutron.agent.l3.ha [-] Handling 
notification for router 213b6544-ab4b-4e46-a5c6-5d8d587a0c6d, state master 
enqueue /usr/lib/python2.7/dist-packages/neutron/agent/l3/ha.py:50
2018-02-13 14:14:33.113 2065380 INFO neutron.agent.l3.ha [-] Router 
213b6544-ab4b-4e46-a5c6-5d8d587a0c6d transitioned to master

and then:

2018-02-13 14:14:40.380 2065380 DEBUG neutron.agent.l3.ha [-] Spawning
metadata proxy for router 213b6544-ab4b-4e46-a5c6-5d8d587a0c6d
_update_metadata_proxy /usr/lib/python2.7/dist-
packages/neutron/agent/l3/ha.py:156

(I think the agent still thinks this is a HA router...)

and then:

2018-02-13 14:14:59.839 2065380 DEBUG neutron.agent.l3.ha [-] Updating
server with HA routers states {'28015629-217b-4eec-b557-6f93a2bb0230':
'active', '237d2839-5687-4104-9270-ff974de59800': 'active',
'266167d5-39e5-4d97-a3a8-45d4ddad9407': 'active', '213b6544-ab4b-
4e46-a5c6-5d8d587a0c6d': 'active', '2367f6bd-
02c1-4ef1-a642-fe54d916fe2e': 'active'} notify_server /usr/lib/python2.7
/dist-packages/neutron/agent/l3/ha.py:177

and slight after:

2018-02-13 14:18:08.275 2065380 ERROR neutron.agent.l3.router_info [-]
'NoneType' object has no attribute 'remove_vip_by_ip_address'

This seems to support the theory that the HA router never actually gets
torn down before the new non-ha router is scheduled to the same network
node, resulting in the agent not really knowing whether the router is
arthur or marther.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1749425

Title:
  Neutron integrated with OpenVSwitch drops packets and fails to
  plug/unplug interfaces from OVS on router interfaces at scale

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1749425/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1749425] Re: Neutron integrated with OpenVSwitch drops packets and fails to plug/unplug interfaces from OVS on router interfaces at scale

2018-02-14 Thread James Page
@axino

yes switching between ha/non-ha is support so long as the router is in
admin state false at the time of change (which I believe is the case in
this issue).

Some further observations; switching a router from ha mode to non-ha
mode should remove all existing router agent hosting arrangements and
then re-schedule the new non-ha router on an appropriate agent.

In the cloud experiencing this issue, I see only one remaining qrouter
namespace but its also still got a keepalived and state monitor running
- which I don't think is right.

I'm wondering whether due to the sheer number of routers being hosted on
each network node, the router never gets removed before being
rescheduled back to the same node, resulting in the teardown being
skipped in someway.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1749425

Title:
  Neutron integrated with OpenVSwitch drops packets and fails to
  plug/unplug interfaces from OVS on router interfaces at scale

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1749425/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1749425] Re: Neutron integrated with OpenVSwitch drops packets and fails to plug/unplug interfaces from OVS on router interfaces at scale

2018-02-14 Thread Junien Fridrick
This is happening on all 3 neutron-l3-agents.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1749425

Title:
  Neutron integrated with OpenVSwitch drops packets and fails to
  plug/unplug interfaces from OVS on router interfaces at scale

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1749425/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1749425] Re: Neutron integrated with OpenVSwitch drops packets and fails to plug/unplug interfaces from OVS on router interfaces at scale

2018-02-14 Thread Junien Fridrick
We're going to provide the logs, but neutron-l3-agent is getting spammed
with https://pastebin.canonical.com/p/PBPpJJkRgy/ for dozens of routers.

Does neutron support changing the "ha" property of a router live ?

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1749425

Title:
  Neutron integrated with OpenVSwitch drops packets and fails to
  plug/unplug interfaces from OVS on router interfaces at scale

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1749425/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1749425] Re: Neutron integrated with OpenVSwitch drops packets and fails to plug/unplug interfaces from OVS on router interfaces at scale

2018-02-14 Thread James Page
Would it be possible to get log files from /var/log/neutron and
/var/log/openvswitch as well as the syslog and auth.log files from an
impacted machine. I suspect that due to the large number of namespaces
being managed something is just taking to long to parse/sync/refresh
causing this issue.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1749425

Title:
  Neutron integrated with OpenVSwitch drops packets and fails to
  plug/unplug interfaces from OVS on router interfaces at scale

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1749425/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1749425] Re: Neutron integrated with OpenVSwitch drops packets and fails to plug/unplug interfaces from OVS on router interfaces at scale

2018-02-14 Thread Chris Sanders
I've subscribed Canonical Field Critical

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1749425

Title:
  Neutron integrated with OpenVSwitch drops packets and fails to
  plug/unplug interfaces from OVS on router interfaces at scale

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1749425/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1749425] Re: Neutron integrated with OpenVSwitch drops packets and fails to plug/unplug interfaces from OVS on router interfaces at scale

2018-02-14 Thread James Hebden
** Attachment added: "ovs-missing-taps-14-02-2018.log"
   
https://bugs.launchpad.net/neutron/+bug/1749425/+attachment/5054969/+files/ovs-missing-taps-14-02-2018.log

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1749425

Title:
  Neutron integrated with OpenVSwitch drops packets and fails to
  plug/unplug interfaces from OVS on router interfaces at scale

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1749425/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1749425] Re: Neutron integrated with OpenVSwitch drops packets and fails to plug/unplug interfaces from OVS on router interfaces at scale

2018-02-14 Thread James Hebden
** Attachment added: "SystemTap output"
   
https://bugs.launchpad.net/neutron/+bug/1749425/+attachment/5054990/+files/drops-14-02-2018.log

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1749425

Title:
  Neutron integrated with OpenVSwitch drops packets and fails to
  plug/unplug interfaces from OVS on router interfaces at scale

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1749425/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1749425] Re: Neutron integrated with OpenVSwitch drops packets and fails to plug/unplug interfaces from OVS on router interfaces at scale

2018-02-14 Thread James Hebden
** Attachment added: "SystemTap script"
   
https://bugs.launchpad.net/neutron/+bug/1749425/+attachment/5054991/+files/drops-14-02-2018.stp

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1749425

Title:
  Neutron integrated with OpenVSwitch drops packets and fails to
  plug/unplug interfaces from OVS on router interfaces at scale

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1749425/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs