Hi Stuart,

As far as I can tell, this is the first time I've heard about this problem. I can't make any judgment based on the details you've shared here, but I would initially focus on OVS, the kernel, and their interactions. For Neutron's l3 agent, the only thing I can say is that it uses the conntrack module for doing SNAT on the default gateway and for managing floating IPs - but I guess that won't help you much.
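To make the SNAT point a little more concrete, here is a toy model of how a NAT source port might be chosen. It is a simplification, not the kernel's code: candidate ports are probed from a random offset until one is found whose tuple is not already in use. In the kernel, each such check goes through nf_nat_used_tuple, which in turn calls nf_conntrack_tuple_taken. The function names (pick_snat_port, average_probes), the 1024-65535 port range, and the uniformly random occupancy of ports toward a single destination are all hypothetical assumptions for illustration. The model shows that the number of lookups per new connection grows sharply as those ports fill up.

```python
import random

def pick_snat_port(taken, port_lo, port_hi, rng):
    """Toy SNAT port search (hypothetical, not kernel code): probe ports
    linearly from a random offset and return (port, probes) for the first
    port not in `taken`, or (None, probes) if every port is in use."""
    size = port_hi - port_lo + 1
    start = rng.randrange(size)
    for i in range(size):
        port = port_lo + (start + i) % size
        # Each membership test stands in for one
        # nf_nat_used_tuple -> nf_conntrack_tuple_taken lookup.
        if port not in taken:
            return port, i + 1
    return None, size

def average_probes(occupancy, port_lo=1024, port_hi=65535, trials=2000, seed=0):
    """Average lookups per new connection when a fraction `occupancy` of the
    port range (toward one destination) is already taken, chosen uniformly
    at random. This is an assumption of the toy model."""
    rng = random.Random(seed)
    size = port_hi - port_lo + 1
    taken = set(rng.sample(range(port_lo, port_hi + 1), int(size * occupancy)))
    total = 0
    for _ in range(trials):
        # Occupancy is held fixed: chosen ports are not added to `taken`.
        _, probes = pick_snat_port(taken, port_lo, port_hi, rng)
        total += probes
    return total / trials
```

For example, comparing `average_probes(0.5)`, `average_probes(0.9)` and `average_probes(0.99)` should give values close to 2, 10 and 100. With uniformly random occupancy, the expected number of probes is roughly 1/(1 - occupancy). Real traffic patterns will differ, so treat this only as intuition for why lookup counts can spike under load.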
I think the Neutron community could do more to help you if we understood a bit more about your particular situation.

- You mentioned 43 nodes between compute and controllers, but a single "neutron router" (which I reckon is the l3 agent). How many logical routers is that agent hosting? Are you able to share how many internal interfaces are connected to those routers? This is just to get an idea of the traffic passing through the l3 agent.
- Have you noticed any other call counters spiking? The one you mentioned seems to be called only by nf_nat_used_tuple, which is actually used in a number of places.

Regards,
Salvatore

PS: If you have not already done so, consider also submitting this kind of question to ask.openstack.org

On 16 August 2014 18:12, Stuart Fox <stu...@demonware.net> wrote:
> Hey neutron dev!
>
> I'm having a serious problem with my neutron router getting spin-locked in
> nf_conntrack_tuple_taken.
> Has anybody else experienced it?
> "perf top" shows nf_conntrack_tuple_taken at 75%.
> As the incoming request rate goes up, nf_conntrack_tuple_taken runs
> very hot on CPU0, causing ksoftirqd/0 to run at 100%. At that point internal
> pings on the GRE network go sky high and it's game over. Pinging from a VM
> to the subnet default gateway on the neutron router goes from 0.2ms to 11s!
> Pinging from the same VM to another VM in the same subnet stays constant at
> 0.2ms.
>
> That very much indicates to me that the neutron router is having serious
> problems.
> No other part of the system seems under pressure.
>
> IPv6 is disabled, and nf_conntrack_max/nf_conntrack_hash are set to 256k.
> We've tried the default 3.13 and the utopic 3.16 kernel (3.16 has lots of
> work on removing spinlocks around nf_conntrack). 3.16 survives a little
> longer but still ends up in the same state.
>
> Neutron router
> 1 x Ubuntu 14.04/Icehouse 2014.1.1 on an IBM x3550 with 4 10G Intel NICs.
> eth0 - Mgt
> eth1 - GRE
> eth2 - Public
> eth3 - unused
>
> Compute/controller nodes
> 43 x Ubuntu 14.04/Icehouse 2014.1.1 IBM x240 Flex blades with 4 Emulex NICs
> eth0 Mgt
> eth2 GRE
>
> Any help very much appreciated!
> Replacing the l2/l3 functions with hardware is very much an option if that's
> a better solution.
> I'm running out of time before my client decides to stay on AWS.
>
>
>
> BR,
> Stuart
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStackemail@example.com
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
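With nf_conntrack_max and the hash size both set to 256k, a quick sanity check when the problem hits is how full the conntrack table actually is and how long its hash chains are on average. The sketch below is minimal and assumes the usual procfs/sysfs locations: nf_conntrack_count and nf_conntrack_max under /proc/sys/net/netfilter, and the bucket count under /sys/module/nf_conntrack/parameters/hashsize. `conntrack_load` and `read_int` are hypothetical helper names. The `root` argument exists only so the function can be pointed at a fake directory tree for testing. Connection counts may be kept per network namespace, so on the l3 agent host it may need to be run inside the relevant router namespace (for example via `ip netns exec`).

```python
from pathlib import Path

def read_int(path: Path) -> int:
    """Read a single integer from a procfs/sysfs-style file."""
    return int(path.read_text().strip())

def conntrack_load(root: Path = Path("/")) -> dict:
    """Summarize conntrack table usage under `root` (normally "/")."""
    netfilter = root / "proc" / "sys" / "net" / "netfilter"
    count = read_int(netfilter / "nf_conntrack_count")
    max_entries = read_int(netfilter / "nf_conntrack_max")
    # Number of hash buckets, assumed readable from the module parameter.
    buckets = read_int(root / "sys" / "module" / "nf_conntrack" / "parameters" / "hashsize")
    return {
        "count": count,
        "max": max_entries,
        "buckets": buckets,
        # Fraction of the table in use; at 1.0 new connections get dropped.
        "fill_ratio": count / max_entries,
        # Average entries per bucket, i.e. the mean chain length per lookup.
        "avg_chain_length": count / buckets,
    }
```

Calling `conntrack_load()` on the router host and printing the result while load is rising would show whether the table is near capacity or the chains are getting long. Either condition would make each tuple lookup more expensive.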