On 11/6/25 2:39 AM, Florian Westphal wrote:
Fernando Fernandez Mancera <[email protected]> wrote:
Since commit d265929930e2 ("netfilter: nf_conncount: reduce unnecessary
GC") if nftables/iptables connlimit is used without a check for new
connections there can be duplicated entries tracked.

Pass the nf_conn struct directly to the nf_conncount API and check
whether the connection is confirmed or not inside nf_conncount_add(). If
the connection is confirmed, skip it.

I think there is a bit too much noise here, can this be
split in several chunks?


Not sure, I could try but the noise comes from removing zone and tuple which requires many changes around. Otherwise this would compile with warnings/errors. I am not sure I can split this more.

-unsigned int nf_conncount_count(struct net *net,
+unsigned int nf_conncount_count(struct net *net, const struct nf_conn *ct,
                                struct nf_conncount_data *data,
-                               const u32 *key,
-                               const struct nf_conntrack_tuple *tuple,
-                               const struct nf_conntrack_zone *zone);
+                               const u32 *key);

nf_conn *ct has pitfalls that I did not realize earlier.
Current code is also buggy in that regard.
Maybe we need additional function to ease this for callers.

See below.
+static int __nf_conncount_add(const struct nf_conn *ct,
+                             struct nf_conncount_list *list)
  {
        const struct nf_conntrack_tuple_hash *found;
        struct nf_conncount_tuple *conn, *conn_n;
+       const struct nf_conntrack_tuple *tuple;
+       const struct nf_conntrack_zone *zone;
        struct nf_conn *found_ct;
        unsigned int collect = 0;
+ if (!ct || nf_ct_is_confirmed(ct))
+               return -EINVAL;
+

When is the caller expected to pass a NULL ct?
Maybe this just misses a comment.


No, I don't think the caller is expected to pass a NULL ct. That should be removed.

index fc35a11cdca2..e815c0235b62 100644
--- a/net/netfilter/nft_connlimit.c
+++ b/net/netfilter/nft_connlimit.c
@@ -24,28 +24,35 @@ static inline void nft_connlimit_do_eval(struct 
nft_connlimit *priv,
                                         const struct nft_pktinfo *pkt,
                                         const struct nft_set_ext *ext)
  {
-       const struct nf_conntrack_zone *zone = &nf_ct_zone_dflt;
-       const struct nf_conntrack_tuple *tuple_ptr;
+       struct nf_conntrack_tuple_hash *h;
        struct nf_conntrack_tuple tuple;
        enum ip_conntrack_info ctinfo;
        const struct nf_conn *ct;
        unsigned int count;
-
-       tuple_ptr = &tuple;
+       int err;
ct = nf_ct_get(pkt->skb, &ctinfo);
-       if (ct != NULL) {
-               tuple_ptr = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
-               zone = nf_ct_zone(ct);
-       } else if (!nf_ct_get_tuplepr(pkt->skb, skb_network_offset(pkt->skb),
-                                     nft_pf(pkt), nft_net(pkt), &tuple)) {
-               regs->verdict.code = NF_DROP;
-               return;
+       if (!ct) {
+               if (!nf_ct_get_tuplepr(pkt->skb, skb_network_offset(pkt->skb),
+                                      nft_pf(pkt), nft_net(pkt), &tuple)) {
+                       regs->verdict.code = NF_DROP;
+                       return;
+               }
+
+               h = nf_conntrack_find_get(nft_net(pkt), &nf_ct_zone_dflt, 
&tuple);
+               if (!h) {
+                       regs->verdict.code = NF_DROP;
+                       return;
+               }
+               ct = nf_ct_tuplehash_to_ctrack(h);

This ct had its refcount incremented.


Thank you! Will fix it.

I also see that this shared copypaste with xtables.
Would it be possible to pass 'const struct sk_buff *'
and let the nf_conncount core handle this internally?

nf_conncount_add(net, pf, skb, priv->list);

which does:
        ct = nf_ct_get(skb, &ctinfo);
        if (ct && !nf_ct_is_template(ct))
                return __nf_conncount_add(ct, list);

        if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), pf, net,
                                &tuple))
                return -ERR;

        if (ct) /* its a template, so do lookup in right zone */
                zone = nf_ct_zone(ct);
        else
                zone = &nf_ct_zone_dflt;

        h = nf_conntrack_find_get(nft_net(pkt), zone, &tuple);
        if (!h)
                return -ERR;

        ct = nf_ct_tuplehash_to_ctrack(h);

        err = __nf_conncount_add(ct, list);

        nf_ct_put(ct):

        return err;

I.e., the existing nf_conncount_add() becomes a helper that takes
a ct, as you have already proposed, but its renamed and turned into
an internal helper so frontends don't need to duplicate tuple lookup.

Alternatively, no need to rename it and instead add a new API call
that takes the ct argument, e.g. 'nf_conncount_add_ct' or whatever,
and then make nf_conncount_add() internal in a followup patch.


Unfortunately, I do not think this is possible. xt_connlimit is using the rbtree with nf_conncount_count() while nft_connlimit isn't. I believe we do not want to change that. In addition, for the rbtree we need to calculate the key.. I would leave this code as duplicated given that is shared only between xt_connlimit and nft_connlimit. Openvswitch doesn't care about this as they always call nf_conncount_count() while holding a reference to a ct..

+               h = nf_conntrack_find_get(net, &nf_ct_zone_dflt, &tuple);

We should not restrict the lookup to the default zone,
but we should follow what the template (if any) says.

I under-estimated the work to get this right, I think this is too much
code to copypaste between nft+xt frontends.  Sorry for making that
suggestion originally.

No worries at all, I think there is some benefit from this change even with the copy-paste. Maybe we can create a helper function to just get the ct from the sk_buff.. what about "nf_conntrack_get_or_find()"? I accept suggestions for a better name :)

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to