Hi one more time! I found another issue, described below. In my
opinion it is still relevant after merging [1]: I saw that a fix for
that commit was merged last week, but it doesn't cover the case I
describe here.
I create limits via Open vSwitch and run a TCP flood this way:

hping3 -S -I host -p 10880 -i u5 10.255.41.101 -c 100

(Here, -i u5 is important: it sets a 5 microsecond interval between
packets, so the relevant timers are < jiffies; the issue is much less
likely to reproduce otherwise.)
And I set the following limit on the zone where these connections
arrive:
ovs-dpctl ct-set-limits zone=9,limit=100
I start the traffic and see it on the interface: TCP SYN -> TCP
SYN+ACK -> TCP RST. Zone 9 overflows, but the connections immediately
become CLOSED.

I run hping3 again with the same arguments: I don't see anything on
the interface, and I see messages in dmesg from openvswitch saying the
number of connections exceeds the limit. At this moment, if I call
ovs-dpctl ct-get-limits, traffic immediately starts flowing again,
because the stale connection count is garbage-collected back to zero.
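(As I understand it, the reason the query flushes the list is that the
skb == NULL path in count_tree() runs a GC pass; paraphrasing the
current code, which is also quoted further down in this thread:)

	/* count_tree(): a limits query passes skb == NULL */
	if (!skb) {
		nf_conncount_gc_list(net, &rbconn->list);
		return rbconn->list.count;
	}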
I think the problem is as follows: before commit [1], we called
__nf_conncount_gc_list for every connection, and that function walks
the whole list and cleans up the entries that are already closed.

What we have now is that when __nf_conncount_add tries to add a
connection and finds it already in the list (including via the -EAGAIN
error-handling path of find_or_evict), it does not continue the
iteration any further and immediately exits the function with zero. So
we only clean up entries in the list until we reach the connection we
are committing now, which means the connection count goes stale.
Furthermore, for entries whose conntrack is found but already closed
(already_closed), we increment the collect variable, which also
prevents the list from ever being fully cleaned: we free at most 8
(CONNCOUNT_GC_MAX_NODES) entries per call to __nf_conncount_add.
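For reference, here is a condensed paraphrase of the current loop in
__nf_conncount_add (not a verbatim quote; the comments mark where the
cleanup stops early):

	list_for_each_entry_safe(conn, conn_n, &list->head, node) {
		if (collect > CONNCOUNT_GC_MAX_NODES)
			break;				/* GC budget: at most 8 entries */

		found = find_or_evict(net, list, conn);
		if (IS_ERR(found)) {
			if (PTR_ERR(found) == -EAGAIN) {
				/* not confirmed yet, may be the one being committed */
				if (nf_ct_tuple_equal(&conn->tuple, &tuple) &&
				    nf_ct_zone_id(&conn->zone, conn->zone.dir) ==
				    nf_ct_zone_id(zone, zone->dir))
					goto out_put;	/* (1) stops the scan early */
			} else {
				collect++;
			}
			continue;
		}

		found_ct = nf_ct_tuplehash_to_ctrack(found);
		if (nf_ct_tuple_equal(&conn->tuple, &tuple) &&
		    nf_ct_zone_id(&conn->zone, conn->zone.dir) ==
		    nf_ct_zone_id(zone, zone->dir)) {
			nf_ct_put(found_ct);
			goto out_put;			/* (2) stops the scan early */
		} else if (already_closed(found_ct)) {
			nf_ct_put(found_ct);
			conn_free(list, conn);
			collect++;			/* closed entries burn the GC budget */
			continue;
		}
		nf_ct_put(found_ct);
	}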
Also, with such a TCP attack, when connections transition to CLOSED
immediately, __nf_conncount_gc_list won't be called at all, because we
are constantly calling __nf_conncount_add, which keeps refreshing
last_gc. That's why ovs-dpctl ct-get-limits helps: it simply calls
__nf_conncount_gc_list and cleans up all the closed connections.
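For context, the skip works roughly like this, as far as I can tell
from [1] (paraphrased, not verbatim):

	/* __nf_conncount_add(): don't scan the list if a scan ran recently */
	if (time_is_after_eq_jiffies((unsigned long)list->last_gc))
		goto add_new_node;
	...
	/* the add path refreshes the timestamp, so under constant traffic
	 * the full scan in __nf_conncount_gc_list() keeps being postponed
	 */
	list->last_gc = (u32)jiffies;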
I propose the following behavior, which is similar to what we had
before [2]:
diff --git a/net/netfilter/nf_conncount.c b/net/netfilter/nf_conncount.c
index 19039a0802b8..e5224785f01e 100644
--- a/net/netfilter/nf_conncount.c
+++ b/net/netfilter/nf_conncount.c
@@ -171,6 +171,7 @@ static int __nf_conncount_add(struct net *net,
 	struct nf_conn *found_ct;
 	unsigned int collect = 0;
 	bool refcounted = false;
+	bool need_add = true;
 	if (!get_ct_or_tuple_from_skb(net, skb, l3num, &ct, &tuple, &zone,
 				      &refcounted))
 		return -ENOENT;
@@ -196,7 +197,8 @@ static int __nf_conncount_add(struct net *net,
 				if (nf_ct_tuple_equal(&conn->tuple, &tuple) &&
 				    nf_ct_zone_id(&conn->zone, conn->zone.dir) ==
 				    nf_ct_zone_id(zone, zone->dir))
-					goto out_put; /* already exists */
+					/* already exists */
+					need_add = false;
 			} else {
 				collect++;
 			}
@@ -214,7 +216,7 @@ static int __nf_conncount_add(struct net *net,
 			 * Attempt to avoid a re-add in this case.
 			 */
 			nf_ct_put(found_ct);
-			goto out_put;
+			need_add = false;
 		} else if (already_closed(found_ct)) {
 			/*
 			 * we do not care about connections which are
@@ -222,13 +224,16 @@ static int __nf_conncount_add(struct net *net,
 			 */
 			nf_ct_put(found_ct);
 			conn_free(list, conn);
-			collect++;
 			continue;
 		}
 
 		nf_ct_put(found_ct);
 	}
 
+	if (!need_add) {
+		goto out_put;
+	}
+
 add_new_node:
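To make the resulting control flow easier to review, this is what the
patched loop boils down to (a sketch only; the tuple/zone comparisons
are elided, and need_add starts out true):

	list_for_each_entry_safe(conn, conn_n, &list->head, node) {
		if (collect > CONNCOUNT_GC_MAX_NODES)
			break;		/* the cap still applies to evicted entries */
		...
		if (/* conn is the connection being committed */)
			need_add = false;	/* remember it, but keep scanning */
		else if (already_closed(found_ct)) {
			nf_ct_put(found_ct);
			conn_free(list, conn);	/* no collect++: closed entries are
						 * always freed, the cap no longer
						 * throttles this cleanup
						 */
			continue;
		}
		...
	}

	if (!need_add)
		goto out_put;	/* the list is fully cleaned, the count is fresh */

add_new_node: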
[1] netfilter: nf_conncount: reduce unnecessary GC
    https://github.com/torvalds/linux/commit/d265929930e2ffafc744c0ae05fb70acd53be1ee
[2] netfilter: nf_conncount: merge lookup and add functions
    https://github.com/torvalds/linux/commit/df4a902509766897f7371fdfa4c3bf8bc321b55d
[3] netfilter: nft_connlimit: update the count if add was skipped
    https://github.com/torvalds/linux/commit/69894e5b4c5e28cda5f32af33d4a92b7a4b93b0e
Do you think I missed any cases? And how will this affect the
function's performance? Thanks!
On 10.12.2025 11:03, Fernando Fernandez Mancera wrote:
>
> On 12/9/25 9:42 AM, Pablo Neira Ayuso wrote:
>> On Tue, Dec 09, 2025 at 08:57:59AM +0100, Fernando Fernandez Mancera wrote:
>>> On 12/8/25 1:27 PM, Odintsov Vladislav wrote:
>>>> On 08.12.2025 15:06, Rukomoinikova Aleksandra wrote:
>>>>> Hi!
>>>>> I was testing conntrack limiting using Open vSwitch and noticed the
>>>>> following issue: under certain limits, a CPU lock occurred.
>>>>>
>>>>> [ 491.682936] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [ovs-dpctl:19437]
>>>>>
>>>>> This occurs at a high packet rate when trying to get the
>>>>> configured limits through ovs-dpctl ct-get-limits.
>>>>>
>>>>> In the trace, I can see that the lockup occurred on an attempt to
>>>>> acquire a spinlock.
>>>>>
>>>>> [ 491.683056] <IRQ>
>>>>> [ 491.683059] _raw_spin_lock_bh+0x29/0x30
>>>>> [ 491.683064] count_tree+0x19b/0x1f0 [nf_conncount]
>>>>> [ 491.683069] ovs_ct_commit+0x196/0x490 [openvswitch]
>>>>>
>>>>> Prior to this, the trace shows the processing of a task from
>>>>> userspace (ovs-dpctl):
>>>>>
>>>>> [ 491.683236] </IRQ>
>>>>> [ 491.683237] <TASK>
>>>>> [ 491.683238] asm_common_interrupt+0x22/0x40
>>>>> [ 491.683240] RIP: 0010:nf_conncount_gc_list+0x18a/0x200 [nf_conncount]
>>>>>
>>>>> Inside the nf_conncount_gc_list function, a lock is taken at
>>>>> nf_conncount.c:335, spin_trylock_bh(&list->list_lock). After this,
>>>>> the not-so-fast __nf_conncount_gc_list function is executed. If a
>>>>> packet interrupt arrives on the same CPU core at this moment (and
>>>>> spin_trylock_bh doesn't disable interrupts on that core), then the
>>>>> scenario I encountered occurs: the first lock remains held, while
>>>>> the packet interrupt also attempts to acquire it while committing
>>>>> to conntrack, at nf_conncount.c:502,
>>>>> spin_lock_bh(&rbconn->list.list_lock). This attempt fails, leading
>>>>> to a soft lockup.
>>>>>
>>>
>>> Yes, that makes sense. That nf_conncount_gc_list() was added there
>>> to cover a different scenario, which might also be affected by this
>>> soft lockup under the same conditions.
>>
>> See below, a quick browsing tells me OVS forgot to disable BH to
>> perform this GC.
>>
>>>>> Hence my question: shouldn't we avoid calling nf_conncount_gc_list
>>>>> when querying limits without an skb (as OVS does in
>>>>> openvswitch/conntrack.c:1773)? The limit retrieval operation should
>>>>> be read-only with regard to the conntrack state and not involve
>>>>> potential modification.
>>>>> Like this:
>>>>> --- a/net/netfilter/nf_conncount.c
>>>>> +++ b/net/netfilter/nf_conncount.c
>>>>> @@ -495,7 +495,6 @@ count_tree(struct net *net,
>>>>> int ret;
>>>>>
>>>>> if (!skb) {
>>>>> - nf_conncount_gc_list(net, &rbconn->list);
>>>>> return rbconn->list.count;
>>>>> }
>>>>>
>>>
>>> Let me think about it; I would like to provide a solution that is
>>> suitable for OVS + xt/nft_connlimit, because this change would break
>>> some xt_connlimit use-cases. Also, without this nf_conncount_gc_list()
>>> call, the connection count wouldn't be accurate: if some connections
>>> have already closed, the count would still include them.
>>
>> Side note, this particular line only affects OVS, which is the only
>> caller passing NULL as skb:
>>
>> net/netfilter/xt_connlimit.c:   connections = nf_conncount_count_skb(net, skb, xt_family(par), info->data, key);
>> net/openvswitch/conntrack.c:    connections = nf_conncount_count_skb(net, skb, info->family,
>> net/openvswitch/conntrack.c:    zone_limit.count = nf_conncount_count_skb(net, NULL, 0, data,
>>
>> Another relevant aspect: nf_conncount_gc_list() is called _without_
>> disabling BH (before recent Fernando's changes).
>>
>> You fix it here, Fernando:
>>
>> commit c0362b5748282e22fa1592a8d3474f726ad964c2
>> Author: Fernando Fernandez Mancera <[email protected]>
>> Date: Fri Nov 21 01:14:31 2025 +0100
>>
>> netfilter: nf_conncount: make nf_conncount_gc_list() to disable BH
>>
>> I think it is only a matter of backporting it to -stable.
>
> That is right, thanks Pablo. Just a note: that commit doesn't have a
> Fixes tag because I only did it to simplify its use, so it won't be
> picked up automatically. Should we send a request to the stable
> mailing list?
>
> Thanks,
> Fernando.
--
regards,
Alexandra.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev