I also can't say I've seen this, but I am curious now.  I did have a few
questions for you though.

1. When you say you set nf_conntrack_max/nf_conntrack_hash to 256k, did you
really set the hash size that large?  Typically the hash is 1/8 of the max,
meaning you'd have 8 entries per hashbucket.

2. Does /sys/module/nf_conntrack/parameters/hashsize look correct?

3. Are you seeing any messages such as "nf_conntrack: table full, dropping 

4. How many entries are in the conntrack table?  'sudo conntrack -C'

5. Have you been able to drill down any further into what's taking all the time
in nf_conntrack_tuple_taken() ?  I can't imagine you have a single bucket with
tons of entries and you're spinning looking at each, but it could be that 
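
If it helps, here is a rough sketch that pulls the numbers behind questions 1,
2 and 4 in one go and works out the average entries per bucket using the same
arithmetic as in question 1.  The two /proc paths are where I'd expect the max
and count sysctls to live with the nf_conntrack module loaded, so treat them as
an assumption about your box; read_int() and chain_lengths() are just helper
names made up for this example.

```python
from pathlib import Path

# Assumed locations on a Linux host with the nf_conntrack module loaded.
MAX_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_max")
COUNT_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_count")
HASHSIZE_PATH = Path("/sys/module/nf_conntrack/parameters/hashsize")


def read_int(path):
    """Return the integer stored in a one-line procfs/sysfs file."""
    return int(Path(path).read_text().strip())


def chain_lengths(max_entries, hashsize, count):
    """Average entries per bucket with a full table, and with the current count."""
    return max_entries / hashsize, count / hashsize


if __name__ == "__main__":
    try:
        max_entries = read_int(MAX_PATH)
        hashsize = read_int(HASHSIZE_PATH)
        count = read_int(COUNT_PATH)
    except (OSError, ValueError) as exc:
        # Module not loaded, unexpected path, or unreadable contents.
        print(f"could not read conntrack settings: {exc}")
    else:
        full, now = chain_lengths(max_entries, hashsize, count)
        print(f"max={max_entries} hashsize={hashsize} count={count}")
        print(f"avg entries/bucket: {full:.1f} when full, {now:.2f} now")
```

These are only averages, so they won't show a single overloaded bucket, but
they would at least tell us whether the hash size and max really ended up where
you intended.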



On 08/16/2014 12:12 PM, Stuart Fox wrote:
> Hey neutron dev!
> Im having a serious problem with my neutron router getting spin locked in
> nf_conntrack_tuple_taken.
> Has anybody else experienced it?
> "perf top" shows nf_conntrack_tuple_taken at 75%
> As the incoming request rate goes up, so nf_conntrack_tuple_taken runs very hot
> on CPU0 causing ksoftirqd/0 to run at 100%. At that point internal pings on the
> GRE network go sky high and its game over. Pinging from a vm to the subnet
> default gateway on the neutron goes from 0.2ms to 11s! pinging from the same vm
> to another vm in the same subnet stays constant at 0.2ms.
> Very much indicates to me that the neutron router is having serious problems.
> No other part of the system seems under pressure.
> ipv6 is disabled, and nf_conntrack_max/nf_conntrack_hash are set to 256k.
> We've tried the default 3.13 and the utopic 3.16 kernel (3.16 has lots of work
> on removing spinlocks around nf_conntrack). 3.16 survives a little longer but
> still gets in the same state
> Neutron router
> 1 x Ubuntu 14.04/Icehouse 2014.1.1 on an ibm x3550 with 4 10G intel nics.
> eth0 - Mgt
> eth1 - GRE
> eth2 - Public
> eth3 - unused 
> Compute/controller nodes
> 43 x Ubuntu 14.04/Icehouse 2014.1.1 ibm x240 flex blades with 4 emulex nics
> eth0 Mgt
> eth2 GRE
> Any help very much appreciated!
> Replace the l2/l3 functions with hardware is very much an option if thats a
> better solution.
> Im running out of time before my client decides to stay on AWS.
> BR,
> Stuart
> _______________________________________________
> OpenStack-dev mailing list
