Re: [ovs-dev] [PATCH net] netfilter: nf_nat: fix action not being set for all ct states

2023-12-28 Thread Xin Long
On Sat, Dec 23, 2023 at 9:48 PM Brad Cowie  wrote:
>
> On Sun, 24 Dec 2023 at 10:13, Simon Horman  wrote:
> > Thanks Brad,
> >
> > I agree with your analysis and that the problem appears to
> > have been introduced by the cited commit.
>
> Thanks for the review Simon.
>
> > I am curious to know what use case triggers this /
> > why it when unnoticed for a year.
>
> We encountered this issue while upgrading some routers from
> linux 5.15 to 6.2. The dataplane on these routers is provided
> by an openvswitch bridge which is controlled via openflow by
> faucet. These routers are also performing SNAT on all traffic
> to/from the wan interface via openvswitch conntrack openflow
> rules.
>
> We noticed that after upgrading the linux kernel, traceroute/mtr
> no longer worked when run from clients behind the router.
> We eventually discovered the reason for this is that the
> ICMP time exceeded messages elicited by traceroute were
> matching openflow rules with the incorrect destination ip,
> despite there being an openflow rule to undo the nat.
> Other packets in the established or new state matched the
> expected openflow rules.
>
> A git bisect between 5.15 and 6.2 showed that this change in
> behaviour was introduced by commit ebddb1404900. After the
> above patch is applied our routers perform nat correctly
> again for traceroute/mtr.

Acked-by: Xin Long 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next 2/3] net: sched: set IPS_CONFIRMED in tmpl status only when commit is set in act_ct

2023-07-16 Thread Xin Long
With the following flows, the packets will be dropped if OVS TC offload is
enabled.

  'ip,ct_state=-trk,in_port=1 actions=ct(zone=1)'
  'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=1)'
  'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=2),normal'

In the 1st flow, it finds the exp from the hashtable and removes it then
creates the ct with this exp in act_ct. However, in the 2nd flow it goes
to the OVS upcall at the 1st time. When the skb comes back from userspace,
it has to create the ct again without exp(the exp was removed last time).
With no 'rel' set in the ct, the 3rd flow can never get matched.

In OVS conntrack, it works around it by adding its own exp lookup function
ovs_ct_expect_find() where it doesn't remove the exp. Instead of creating
a real ct, it only updates its keys with the exp and its master info. So
when the skb comes back, the exp is still in the hashtable.

However, we can't do this trick in act_ct, as tc flower match is using a
real ct, and passing the exp and its master info to flower parsing via
tc_skb_cb is also not possible (tc_skb_cb size is not big enough).

The simple and clear fix is to not remove the exp at the 1st flow, namely,
not set IPS_CONFIRMED in tmpl when commit is not set in act_ct.

Reported-by: Shuang Li 
Signed-off-by: Xin Long 
---
 net/sched/act_ct.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index abc71a06d634..7c652d14528b 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -1238,7 +1238,8 @@ static int tcf_ct_fill_params(struct net *net,
}
}
 
-   __set_bit(IPS_CONFIRMED_BIT, >status);
+   if (p->ct_action & TCA_CT_ACT_COMMIT)
+   __set_bit(IPS_CONFIRMED_BIT, >status);
return 0;
 err:
nf_ct_put(p->tmpl);
-- 
2.39.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack

2023-07-16 Thread Xin Long
By not setting IPS_CONFIRMED in tmpl that allows the exp not to be removed
from the hashtable when lookup, we can simplify the exp processing code a
lot in openvswitch conntrack.

Signed-off-by: Xin Long 
---
 net/openvswitch/conntrack.c | 78 +
 1 file changed, 10 insertions(+), 68 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 331730fd3580..fa955e892210 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -455,45 +455,6 @@ static int ovs_ct_handle_fragments(struct net *net, struct 
sw_flow_key *key,
return 0;
 }
 
-static struct nf_conntrack_expect *
-ovs_ct_expect_find(struct net *net, const struct nf_conntrack_zone *zone,
-  u16 proto, const struct sk_buff *skb)
-{
-   struct nf_conntrack_tuple tuple;
-   struct nf_conntrack_expect *exp;
-
-   if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), proto, net, 
))
-   return NULL;
-
-   exp = __nf_ct_expect_find(net, zone, );
-   if (exp) {
-   struct nf_conntrack_tuple_hash *h;
-
-   /* Delete existing conntrack entry, if it clashes with the
-* expectation.  This can happen since conntrack ALGs do not
-* check for clashes between (new) expectations and existing
-* conntrack entries.  nf_conntrack_in() will check the
-* expectations only if a conntrack entry can not be found,
-* which can lead to OVS finding the expectation (here) in the
-* init direction, but which will not be removed by the
-* nf_conntrack_in() call, if a matching conntrack entry is
-* found instead.  In this case all init direction packets
-* would be reported as new related packets, while reply
-* direction packets would be reported as un-related
-* established packets.
-*/
-   h = nf_conntrack_find_get(net, zone, );
-   if (h) {
-   struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
-
-   nf_ct_delete(ct, 0, 0);
-   nf_ct_put(ct);
-   }
-   }
-
-   return exp;
-}
-
 /* This replicates logic from nf_conntrack_core.c that is not exported. */
 static enum ip_conntrack_info
 ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h)
@@ -852,36 +813,16 @@ static int ovs_ct_lookup(struct net *net, struct 
sw_flow_key *key,
 const struct ovs_conntrack_info *info,
 struct sk_buff *skb)
 {
-   struct nf_conntrack_expect *exp;
-
-   /* If we pass an expected packet through nf_conntrack_in() the
-* expectation is typically removed, but the packet could still be
-* lost in upcall processing.  To prevent this from happening we
-* perform an explicit expectation lookup.  Expected connections are
-* always new, and will be passed through conntrack only when they are
-* committed, as it is OK to remove the expectation at that time.
-*/
-   exp = ovs_ct_expect_find(net, >zone, info->family, skb);
-   if (exp) {
-   u8 state;
-
-   /* NOTE: New connections are NATted and Helped only when
-* committed, so we are not calling into NAT here.
-*/
-   state = OVS_CS_F_TRACKED | OVS_CS_F_NEW | OVS_CS_F_RELATED;
-   __ovs_ct_update_key(key, state, >zone, exp->master);
-   } else {
-   struct nf_conn *ct;
-   int err;
+   struct nf_conn *ct;
+   int err;
 
-   err = __ovs_ct_lookup(net, key, info, skb);
-   if (err)
-   return err;
+   err = __ovs_ct_lookup(net, key, info, skb);
+   if (err)
+   return err;
 
-   ct = (struct nf_conn *)skb_nfct(skb);
-   if (ct)
-   nf_ct_deliver_cached_events(ct);
-   }
+   ct = (struct nf_conn *)skb_nfct(skb);
+   if (ct)
+   nf_ct_deliver_cached_events(ct);
 
return 0;
 }
@@ -1460,7 +1401,8 @@ int ovs_ct_copy_action(struct net *net, const struct 
nlattr *attr,
if (err)
goto err_free_ct;
 
-   __set_bit(IPS_CONFIRMED_BIT, _info.ct->status);
+   if (ct_info.commit)
+   __set_bit(IPS_CONFIRMED_BIT, _info.ct->status);
return 0;
 err_free_ct:
__ovs_ct_free_action(_info);
-- 
2.39.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next 1/3] netfilter: allow exp not to be removed in nf_ct_find_expectation

2023-07-16 Thread Xin Long
Currently nf_conntrack_in() calling nf_ct_find_expectation() will
remove the exp from the hash table. However, in some scenario, we
expect the exp not to be removed when the created ct will not be
confirmed, like in OVS and TC conntrack in the following patches.

This patch allows exp not to be removed by setting IPS_CONFIRMED
in the status of the tmpl.

Signed-off-by: Xin Long 
---
 include/net/netfilter/nf_conntrack_expect.h | 2 +-
 net/netfilter/nf_conntrack_core.c   | 2 +-
 net/netfilter/nf_conntrack_expect.c | 4 ++--
 net/netfilter/nft_ct.c  | 2 ++
 4 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_expect.h 
b/include/net/netfilter/nf_conntrack_expect.h
index cf0d81be5a96..165e7a03b8e9 100644
--- a/include/net/netfilter/nf_conntrack_expect.h
+++ b/include/net/netfilter/nf_conntrack_expect.h
@@ -100,7 +100,7 @@ nf_ct_expect_find_get(struct net *net,
 struct nf_conntrack_expect *
 nf_ct_find_expectation(struct net *net,
   const struct nf_conntrack_zone *zone,
-  const struct nf_conntrack_tuple *tuple);
+  const struct nf_conntrack_tuple *tuple, bool unlink);
 
 void nf_ct_unlink_expect_report(struct nf_conntrack_expect *exp,
u32 portid, int report);
diff --git a/net/netfilter/nf_conntrack_core.c 
b/net/netfilter/nf_conntrack_core.c
index 992393102d5f..9f6f2e643575 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1756,7 +1756,7 @@ init_conntrack(struct net *net, struct nf_conn *tmpl,
cnet = nf_ct_pernet(net);
if (cnet->expect_count) {
spin_lock_bh(_conntrack_expect_lock);
-   exp = nf_ct_find_expectation(net, zone, tuple);
+   exp = nf_ct_find_expectation(net, zone, tuple, !tmpl || 
nf_ct_is_confirmed(tmpl));
if (exp) {
/* Welcome, Mr. Bond.  We've been expecting you... */
__set_bit(IPS_EXPECTED_BIT, >status);
diff --git a/net/netfilter/nf_conntrack_expect.c 
b/net/netfilter/nf_conntrack_expect.c
index 96948e98ec53..81ca348915c9 100644
--- a/net/netfilter/nf_conntrack_expect.c
+++ b/net/netfilter/nf_conntrack_expect.c
@@ -171,7 +171,7 @@ EXPORT_SYMBOL_GPL(nf_ct_expect_find_get);
 struct nf_conntrack_expect *
 nf_ct_find_expectation(struct net *net,
   const struct nf_conntrack_zone *zone,
-  const struct nf_conntrack_tuple *tuple)
+  const struct nf_conntrack_tuple *tuple, bool unlink)
 {
struct nf_conntrack_net *cnet = nf_ct_pernet(net);
struct nf_conntrack_expect *i, *exp = NULL;
@@ -211,7 +211,7 @@ nf_ct_find_expectation(struct net *net,
 !refcount_inc_not_zero(>master->ct_general.use)))
return NULL;
 
-   if (exp->flags & NF_CT_EXPECT_PERMANENT) {
+   if (exp->flags & NF_CT_EXPECT_PERMANENT || !unlink) {
refcount_inc(>use);
return exp;
} else if (del_timer(>timeout)) {
diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c
index 38958e067aa8..e87fd4314c68 100644
--- a/net/netfilter/nft_ct.c
+++ b/net/netfilter/nft_ct.c
@@ -262,6 +262,7 @@ static void nft_ct_set_zone_eval(const struct nft_expr 
*expr,
regs->verdict.code = NF_DROP;
return;
}
+   __set_bit(IPS_CONFIRMED_BIT, >status);
}
 
nf_ct_set(skb, ct, IP_CT_NEW);
@@ -368,6 +369,7 @@ static bool nft_ct_tmpl_alloc_pcpu(void)
return false;
}
 
+   __set_bit(IPS_CONFIRMED_BIT, >status);
per_cpu(nft_ct_pcpu_template, cpu) = tmp;
}
 
-- 
2.39.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next 0/3] net: handle the exp removal problem with ovs upcall properly

2023-07-16 Thread Xin Long
With the OVS upcall, the original ct in the skb will be dropped, and when
the skb comes back from userspace it has to create a new ct again through
nf_conntrack_in() in either OVS __ovs_ct_lookup() or TC tcf_ct_act().

However, the new ct will not be able to have the exp as the original ct
has taken it away from the hash table in nf_ct_find_expectation(). This
will cause some flow never to be matched, like:

  'ip,ct_state=-trk,in_port=1 actions=ct(zone=1)'
  'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=1)'
  'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=2),normal'

if the 2nd flow triggers the OVS upcall, the 3rd flow will never get
matched.

OVS conntrack works around this by adding its own exp lookup function to
not remove the exp from the hash table and saving the exp and its master
info to the flow keys instead of create a real ct. But this way doesn't
work for TC act_ct.

The patch 1/3 allows nf_ct_find_expectation() not to remove the exp from
the hash table if tmpl is set with IPS_CONFIRMED when doing lookup. This
allows both OVS conntrack and TC act_ct to have a simple and clear fix
for this problem in the patch 2/3 and 3/3.

Xin Long (3):
  netfilter: allow exp not to be removed in nf_ct_find_expectation
  net: sched: set IPS_CONFIRMED in tmpl status only when commit is set
in act_ct
  openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set
in conntrack

 include/net/netfilter/nf_conntrack_expect.h |  2 +-
 net/netfilter/nf_conntrack_core.c   |  2 +-
 net/netfilter/nf_conntrack_expect.c |  4 +-
 net/netfilter/nft_ct.c  |  2 +
 net/openvswitch/conntrack.c | 78 +++--
 net/sched/act_ct.c  |  3 +-
 6 files changed, 18 insertions(+), 73 deletions(-)

-- 
2.39.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCHv2 net-next 3/5] openvswitch: move key and ovs_cb update out of handle_fragments

2023-02-07 Thread Xin Long
This patch has no functional changes and just moves key and ovs_cb update
out of handle_fragments, and skb_clear_hash() and skb->ignore_df change
into handle_fragments(), to make it easier to move the duplicate code
from handle_fragments() into nf_conntrack_ovs later.

Note that it changes to pass info->family to handle_fragments() instead
of key for the packet type check, as info->family is set according to
key->eth.type in ovs_ct_copy_action() when creating the action.

Signed-off-by: Xin Long 
---
 net/openvswitch/conntrack.c | 37 +
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 47a58657b1e4..962e2f70e597 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -437,13 +437,12 @@ static int ovs_ct_set_labels(struct nf_conn *ct, struct 
sw_flow_key *key,
 /* Returns 0 on success, -EINPROGRESS if 'skb' is stolen, or other nonzero
  * value if 'skb' is freed.
  */
-static int handle_fragments(struct net *net, struct sw_flow_key *key,
-   u16 zone, struct sk_buff *skb)
+static int handle_fragments(struct net *net, struct sk_buff *skb,
+   u16 zone, u8 family, u8 *proto, u16 *mru)
 {
-   struct ovs_skb_cb ovs_cb = *OVS_CB(skb);
int err;
 
-   if (key->eth.type == htons(ETH_P_IP)) {
+   if (family == NFPROTO_IPV4) {
enum ip_defrag_users user = IP_DEFRAG_CONNTRACK_IN + zone;
 
memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
@@ -451,9 +450,9 @@ static int handle_fragments(struct net *net, struct 
sw_flow_key *key,
if (err)
return err;
 
-   ovs_cb.mru = IPCB(skb)->frag_max_size;
+   *mru = IPCB(skb)->frag_max_size;
 #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
-   } else if (key->eth.type == htons(ETH_P_IPV6)) {
+   } else if (family == NFPROTO_IPV6) {
enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone;
 
memset(IP6CB(skb), 0, sizeof(struct inet6_skb_parm));
@@ -464,22 +463,35 @@ static int handle_fragments(struct net *net, struct 
sw_flow_key *key,
return err;
}
 
-   key->ip.proto = ipv6_hdr(skb)->nexthdr;
-   ovs_cb.mru = IP6CB(skb)->frag_max_size;
+   *proto = ipv6_hdr(skb)->nexthdr;
+   *mru = IP6CB(skb)->frag_max_size;
 #endif
} else {
kfree_skb(skb);
return -EPFNOSUPPORT;
}
 
+   skb_clear_hash(skb);
+   skb->ignore_df = 1;
+
+   return 0;
+}
+
+static int ovs_ct_handle_fragments(struct net *net, struct sw_flow_key *key,
+  u16 zone, int family, struct sk_buff *skb)
+{
+   struct ovs_skb_cb ovs_cb = *OVS_CB(skb);
+   int err;
+
+   err = handle_fragments(net, skb, zone, family, >ip.proto, 
_cb.mru);
+   if (err)
+   return err;
+
/* The key extracted from the fragment that completed this datagram
 * likely didn't have an L4 header, so regenerate it.
 */
ovs_flow_key_update_l3l4(skb, key);
-
key->ip.frag = OVS_FRAG_TYPE_NONE;
-   skb_clear_hash(skb);
-   skb->ignore_df = 1;
*OVS_CB(skb) = ovs_cb;
 
return 0;
@@ -,7 +1123,8 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb,
}
 
if (key->ip.frag != OVS_FRAG_TYPE_NONE) {
-   err = handle_fragments(net, key, info->zone.id, skb);
+   err = ovs_ct_handle_fragments(net, key, info->zone.id,
+ info->family, skb);
if (err)
return err;
}
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCHv2 net-next 5/5] net: extract nf_ct_handle_fragments to nf_conntrack_ovs

2023-02-07 Thread Xin Long
Now handle_fragments() in OVS and TC have the similar code, and
this patch removes the duplicate code by moving the function
to nf_conntrack_ovs.

Note that skb_clear_hash(skb) or skb->ignore_df = 1 should be
done only when defrag returns 0, as it does in other places
in kernel.

Signed-off-by: Xin Long 
---
 include/net/netfilter/nf_conntrack.h |  2 ++
 net/netfilter/nf_conntrack_ovs.c | 48 
 net/openvswitch/conntrack.c  | 45 +-
 net/sched/act_ct.c   | 46 ++
 4 files changed, 53 insertions(+), 88 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h 
b/include/net/netfilter/nf_conntrack.h
index a6e89d7212f8..7bbab8f2b73d 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -363,6 +363,8 @@ static inline struct nf_conntrack_net *nf_ct_pernet(const 
struct net *net)
 }
 
 int nf_ct_skb_network_trim(struct sk_buff *skb, int family);
+int nf_ct_handle_fragments(struct net *net, struct sk_buff *skb,
+  u16 zone, u8 family, u8 *proto, u16 *mru);
 
 #define NF_CT_STAT_INC(net, count)   __this_cpu_inc((net)->ct.stat->count)
 #define NF_CT_STAT_INC_ATOMIC(net, count) this_cpu_inc((net)->ct.stat->count)
diff --git a/net/netfilter/nf_conntrack_ovs.c b/net/netfilter/nf_conntrack_ovs.c
index c60ef71d1aea..52b776bdf526 100644
--- a/net/netfilter/nf_conntrack_ovs.c
+++ b/net/netfilter/nf_conntrack_ovs.c
@@ -3,6 +3,8 @@
 
 #include 
 #include 
+#include 
+#include 
 #include 
 
 /* 'skb' should already be pulled to nh_ofs. */
@@ -128,3 +130,49 @@ int nf_ct_skb_network_trim(struct sk_buff *skb, int family)
return pskb_trim_rcsum(skb, len);
 }
 EXPORT_SYMBOL_GPL(nf_ct_skb_network_trim);
+
+/* Returns 0 on success, -EINPROGRESS if 'skb' is stolen, or other nonzero
+ * value if 'skb' is freed.
+ */
+int nf_ct_handle_fragments(struct net *net, struct sk_buff *skb,
+  u16 zone, u8 family, u8 *proto, u16 *mru)
+{
+   int err;
+
+   if (family == NFPROTO_IPV4) {
+   enum ip_defrag_users user = IP_DEFRAG_CONNTRACK_IN + zone;
+
+   memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
+   local_bh_disable();
+   err = ip_defrag(net, skb, user);
+   local_bh_enable();
+   if (err)
+   return err;
+
+   *mru = IPCB(skb)->frag_max_size;
+#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
+   } else if (family == NFPROTO_IPV6) {
+   enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone;
+
+   memset(IP6CB(skb), 0, sizeof(struct inet6_skb_parm));
+   err = nf_ct_frag6_gather(net, skb, user);
+   if (err) {
+   if (err != -EINPROGRESS)
+   kfree_skb(skb);
+   return err;
+   }
+
+   *proto = ipv6_hdr(skb)->nexthdr;
+   *mru = IP6CB(skb)->frag_max_size;
+#endif
+   } else {
+   kfree_skb(skb);
+   return -EPFNOSUPPORT;
+   }
+
+   skb_clear_hash(skb);
+   skb->ignore_df = 1;
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(nf_ct_handle_fragments);
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 962e2f70e597..5d40ad02cabc 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -434,56 +434,13 @@ static int ovs_ct_set_labels(struct nf_conn *ct, struct 
sw_flow_key *key,
return 0;
 }
 
-/* Returns 0 on success, -EINPROGRESS if 'skb' is stolen, or other nonzero
- * value if 'skb' is freed.
- */
-static int handle_fragments(struct net *net, struct sk_buff *skb,
-   u16 zone, u8 family, u8 *proto, u16 *mru)
-{
-   int err;
-
-   if (family == NFPROTO_IPV4) {
-   enum ip_defrag_users user = IP_DEFRAG_CONNTRACK_IN + zone;
-
-   memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
-   err = ip_defrag(net, skb, user);
-   if (err)
-   return err;
-
-   *mru = IPCB(skb)->frag_max_size;
-#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
-   } else if (family == NFPROTO_IPV6) {
-   enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone;
-
-   memset(IP6CB(skb), 0, sizeof(struct inet6_skb_parm));
-   err = nf_ct_frag6_gather(net, skb, user);
-   if (err) {
-   if (err != -EINPROGRESS)
-   kfree_skb(skb);
-   return err;
-   }
-
-   *proto = ipv6_hdr(skb)->nexthdr;
-   *mru = IP6CB(skb)->frag_max_size;
-#endif
-   } else {
-   kfree_skb(skb);
-   return -EPFNOSUPPORT;
-   }
-
-   skb_clear_hash(skb);
-   skb->ignore_df = 1;
-
-   return 0;
-}
-
 static int 

[ovs-dev] [PATCHv2 net-next 4/5] net: sched: move frag check and tc_skb_cb update out of handle_fragments

2023-02-07 Thread Xin Long
This patch has no functional changes and just moves frag check and
tc_skb_cb update out of handle_fragments, to make it easier to move
the duplicate code from handle_fragments() into nf_conntrack_ovs later.

Signed-off-by: Xin Long 
---
 net/sched/act_ct.c | 71 +-
 1 file changed, 39 insertions(+), 32 deletions(-)

diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index 0a1ecc972a8b..9f133ed93815 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -778,29 +778,10 @@ static int tcf_ct_ipv6_is_fragment(struct sk_buff *skb, 
bool *frag)
return 0;
 }
 
-static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb,
-  u8 family, u16 zone, bool *defrag)
+static int handle_fragments(struct net *net, struct sk_buff *skb,
+   u16 zone, u8 family, u16 *mru)
 {
-   enum ip_conntrack_info ctinfo;
-   struct nf_conn *ct;
-   int err = 0;
-   bool frag;
-   u16 mru;
-
-   /* Previously seen (loopback)? Ignore. */
-   ct = nf_ct_get(skb, );
-   if ((ct && !nf_ct_is_template(ct)) || ctinfo == IP_CT_UNTRACKED)
-   return 0;
-
-   if (family == NFPROTO_IPV4)
-   err = tcf_ct_ipv4_is_fragment(skb, );
-   else
-   err = tcf_ct_ipv6_is_fragment(skb, );
-   if (err || !frag)
-   return err;
-
-   skb_get(skb);
-   mru = tc_skb_cb(skb)->mru;
+   int err;
 
if (family == NFPROTO_IPV4) {
enum ip_defrag_users user = IP_DEFRAG_CONNTRACK_IN + zone;
@@ -812,10 +793,8 @@ static int tcf_ct_handle_fragments(struct net *net, struct 
sk_buff *skb,
if (err && err != -EINPROGRESS)
return err;
 
-   if (!err) {
-   *defrag = true;
-   mru = IPCB(skb)->frag_max_size;
-   }
+   if (!err)
+   *mru = IPCB(skb)->frag_max_size;
} else { /* NFPROTO_IPV6 */
 #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone;
@@ -825,18 +804,14 @@ static int tcf_ct_handle_fragments(struct net *net, 
struct sk_buff *skb,
if (err && err != -EINPROGRESS)
goto out_free;
 
-   if (!err) {
-   *defrag = true;
-   mru = IP6CB(skb)->frag_max_size;
-   }
+   if (!err)
+   *mru = IP6CB(skb)->frag_max_size;
 #else
err = -EOPNOTSUPP;
goto out_free;
 #endif
}
 
-   if (err != -EINPROGRESS)
-   tc_skb_cb(skb)->mru = mru;
skb_clear_hash(skb);
skb->ignore_df = 1;
return err;
@@ -846,6 +821,38 @@ static int tcf_ct_handle_fragments(struct net *net, struct 
sk_buff *skb,
return err;
 }
 
+static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb,
+  u8 family, u16 zone, bool *defrag)
+{
+   enum ip_conntrack_info ctinfo;
+   struct nf_conn *ct;
+   int err = 0;
+   bool frag;
+   u16 mru;
+
+   /* Previously seen (loopback)? Ignore. */
+   ct = nf_ct_get(skb, );
+   if ((ct && !nf_ct_is_template(ct)) || ctinfo == IP_CT_UNTRACKED)
+   return 0;
+
+   if (family == NFPROTO_IPV4)
+   err = tcf_ct_ipv4_is_fragment(skb, );
+   else
+   err = tcf_ct_ipv6_is_fragment(skb, );
+   if (err || !frag)
+   return err;
+
+   skb_get(skb);
+   err = handle_fragments(net, skb, zone, family, );
+   if (err)
+   return err;
+
+   *defrag = true;
+   tc_skb_cb(skb)->mru = mru;
+
+   return 0;
+}
+
 static void tcf_ct_params_free(struct tcf_ct_params *params)
 {
if (params->helper) {
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCHv2 net-next 1/5] net: create nf_conntrack_ovs for ovs and tc use

2023-02-07 Thread Xin Long
Similar to nf_nat_ovs created by Commit ebddb1404900 ("net: move the
nat function to nf_nat_ovs for ovs and tc"), this patch is to create
nf_conntrack_ovs to get these functions shared by OVS and TC only.

There are nf_ct_helper() and nf_ct_add_helper() from nf_conntrak_helper
in this patch, and will be more in the following patches.

Signed-off-by: Xin Long 
---
 net/netfilter/Kconfig   |   3 +
 net/netfilter/Makefile  |   1 +
 net/netfilter/nf_conntrack_helper.c |  98 --
 net/netfilter/nf_conntrack_ovs.c| 104 
 net/openvswitch/Kconfig |   1 +
 net/sched/Kconfig   |   1 +
 6 files changed, 110 insertions(+), 98 deletions(-)
 create mode 100644 net/netfilter/nf_conntrack_ovs.c

diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index f71b41c7ce2f..4d6737160857 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -189,6 +189,9 @@ config NF_CONNTRACK_LABELS
  to connection tracking entries.  It can be used with xtables connlabel
  match and the nftables ct expression.
 
+config NF_CONNTRACK_OVS
+   bool
+
 config NF_CT_PROTO_DCCP
bool 'DCCP protocol connection tracking support'
depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index ba2a6b5e93d9..5ffef1cd6143 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -11,6 +11,7 @@ nf_conntrack-$(CONFIG_NF_CONNTRACK_TIMEOUT) += 
nf_conntrack_timeout.o
 nf_conntrack-$(CONFIG_NF_CONNTRACK_TIMESTAMP) += nf_conntrack_timestamp.o
 nf_conntrack-$(CONFIG_NF_CONNTRACK_EVENTS) += nf_conntrack_ecache.o
 nf_conntrack-$(CONFIG_NF_CONNTRACK_LABELS) += nf_conntrack_labels.o
+nf_conntrack-$(CONFIG_NF_CONNTRACK_OVS) += nf_conntrack_ovs.o
 nf_conntrack-$(CONFIG_NF_CT_PROTO_DCCP) += nf_conntrack_proto_dccp.o
 nf_conntrack-$(CONFIG_NF_CT_PROTO_SCTP) += nf_conntrack_proto_sctp.o
 nf_conntrack-$(CONFIG_NF_CT_PROTO_GRE) += nf_conntrack_proto_gre.o
diff --git a/net/netfilter/nf_conntrack_helper.c 
b/net/netfilter/nf_conntrack_helper.c
index 48ea6d0264b5..0c4db2f2ac43 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -242,104 +242,6 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct 
nf_conn *tmpl,
 }
 EXPORT_SYMBOL_GPL(__nf_ct_try_assign_helper);
 
-/* 'skb' should already be pulled to nh_ofs. */
-int nf_ct_helper(struct sk_buff *skb, struct nf_conn *ct,
-enum ip_conntrack_info ctinfo, u16 proto)
-{
-   const struct nf_conntrack_helper *helper;
-   const struct nf_conn_help *help;
-   unsigned int protoff;
-   int err;
-
-   if (ctinfo == IP_CT_RELATED_REPLY)
-   return NF_ACCEPT;
-
-   help = nfct_help(ct);
-   if (!help)
-   return NF_ACCEPT;
-
-   helper = rcu_dereference(help->helper);
-   if (!helper)
-   return NF_ACCEPT;
-
-   if (helper->tuple.src.l3num != NFPROTO_UNSPEC &&
-   helper->tuple.src.l3num != proto)
-   return NF_ACCEPT;
-
-   switch (proto) {
-   case NFPROTO_IPV4:
-   protoff = ip_hdrlen(skb);
-   proto = ip_hdr(skb)->protocol;
-   break;
-   case NFPROTO_IPV6: {
-   u8 nexthdr = ipv6_hdr(skb)->nexthdr;
-   __be16 frag_off;
-   int ofs;
-
-   ofs = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), ,
-  _off);
-   if (ofs < 0 || (frag_off & htons(~0x7)) != 0) {
-   pr_debug("proto header not found\n");
-   return NF_ACCEPT;
-   }
-   protoff = ofs;
-   proto = nexthdr;
-   break;
-   }
-   default:
-   WARN_ONCE(1, "helper invoked on non-IP family!");
-   return NF_DROP;
-   }
-
-   if (helper->tuple.dst.protonum != proto)
-   return NF_ACCEPT;
-
-   err = helper->help(skb, protoff, ct, ctinfo);
-   if (err != NF_ACCEPT)
-   return err;
-
-   /* Adjust seqs after helper.  This is needed due to some helpers (e.g.,
-* FTP with NAT) adusting the TCP payload size when mangling IP
-* addresses and/or port numbers in the text-based control connection.
-*/
-   if (test_bit(IPS_SEQ_ADJUST_BIT, >status) &&
-   !nf_ct_seq_adjust(skb, ct, ctinfo, protoff))
-   return NF_DROP;
-   return NF_ACCEPT;
-}
-EXPORT_SYMBOL_GPL(nf_ct_helper);
-
-int nf_ct_add_helper(struct nf_conn *ct, const char *name, u8 family,
-u8 proto, bool nat, struct nf_conntrack_helper **hp)
-{
-   struct nf_conntrack_helper *helper;
-   struct nf_conn_help *help;
-   int ret = 0;
-
-   helper = nf_conntrack_helper_try_module_get(name, family, proto);
-   if (!hel

[ovs-dev] [PATCHv2 net-next 0/5] net: move more duplicate code of ovs and tc conntrack into nf_conntrack_ovs

2023-02-07 Thread Xin Long
We've moved some duplicate code into nf_nat_ovs in:

  "net: eliminate the duplicate code in the ct nat functions of ovs and tc"

This patchset addresses more code duplication in the conntrack of ovs
and tc then creates nf_conntrack_ovs for them, and four functions will
be extracted and moved into it:

  nf_ct_handle_fragments()
  nf_ct_skb_network_trim()
  nf_ct_helper()
  nf_ct_add_helper()

v1->v2:
  - In patch 1/5, fix the wrong option name 'NF_NF_CONNTRACK' used in
net/openvswitch/Kconfig, found by kernel test robot.

Xin Long (5):
  net: create nf_conntrack_ovs for ovs and tc use
  net: extract nf_ct_skb_network_trim function to nf_conntrack_ovs
  openvswitch: move key and ovs_cb update out of handle_fragments
  net: sched: move frag check and tc_skb_cb update out of
handle_fragments
  net: extract nf_ct_handle_fragments to nf_conntrack_ovs

 include/net/netfilter/nf_conntrack.h |   4 +
 net/netfilter/Kconfig|   3 +
 net/netfilter/Makefile   |   1 +
 net/netfilter/nf_conntrack_helper.c  |  98 ---
 net/netfilter/nf_conntrack_ovs.c | 178 +++
 net/openvswitch/Kconfig  |   1 +
 net/openvswitch/conntrack.c  |  80 ++--
 net/sched/Kconfig|   1 +
 net/sched/act_ct.c   |  76 ++--
 9 files changed, 207 insertions(+), 235 deletions(-)
 create mode 100644 net/netfilter/nf_conntrack_ovs.c

-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCHv2 net-next 2/5] net: extract nf_ct_skb_network_trim function to nf_conntrack_ovs

2023-02-07 Thread Xin Long
There are almost the same code in ovs_skb_network_trim() and
tcf_ct_skb_network_trim(), this patch extracts them into a function
nf_ct_skb_network_trim() and moves the function to nf_conntrack_ovs.

Signed-off-by: Xin Long 
---
 include/net/netfilter/nf_conntrack.h |  2 ++
 net/netfilter/nf_conntrack_ovs.c | 26 
 net/openvswitch/conntrack.c  | 36 
 net/sched/act_ct.c   | 27 +
 4 files changed, 33 insertions(+), 58 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h 
b/include/net/netfilter/nf_conntrack.h
index 6a2019aaa464..a6e89d7212f8 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -362,6 +362,8 @@ static inline struct nf_conntrack_net *nf_ct_pernet(const 
struct net *net)
return net_generic(net, nf_conntrack_net_id);
 }
 
+int nf_ct_skb_network_trim(struct sk_buff *skb, int family);
+
 #define NF_CT_STAT_INC(net, count)   __this_cpu_inc((net)->ct.stat->count)
 #define NF_CT_STAT_INC_ATOMIC(net, count) this_cpu_inc((net)->ct.stat->count)
 #define NF_CT_STAT_ADD_ATOMIC(net, count, v) 
this_cpu_add((net)->ct.stat->count, (v))
diff --git a/net/netfilter/nf_conntrack_ovs.c b/net/netfilter/nf_conntrack_ovs.c
index eff4d53f8b8c..c60ef71d1aea 100644
--- a/net/netfilter/nf_conntrack_ovs.c
+++ b/net/netfilter/nf_conntrack_ovs.c
@@ -102,3 +102,29 @@ int nf_ct_add_helper(struct nf_conn *ct, const char *name, 
u8 family,
return ret;
 }
 EXPORT_SYMBOL_GPL(nf_ct_add_helper);
+
+/* Trim the skb to the length specified by the IP/IPv6 header,
+ * removing any trailing lower-layer padding. This prepares the skb
+ * for higher-layer processing that assumes skb->len excludes padding
+ * (such as nf_ip_checksum). The caller needs to pull the skb to the
+ * network header, and ensure ip_hdr/ipv6_hdr points to valid data.
+ */
+int nf_ct_skb_network_trim(struct sk_buff *skb, int family)
+{
+   unsigned int len;
+
+   switch (family) {
+   case NFPROTO_IPV4:
+   len = skb_ip_totlen(skb);
+   break;
+   case NFPROTO_IPV6:
+   len = sizeof(struct ipv6hdr)
+   + ntohs(ipv6_hdr(skb)->payload_len);
+   break;
+   default:
+   len = skb->len;
+   }
+
+   return pskb_trim_rcsum(skb, len);
+}
+EXPORT_SYMBOL_GPL(nf_ct_skb_network_trim);
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 2172930b1f17..47a58657b1e4 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -1090,36 +1090,6 @@ static int ovs_ct_commit(struct net *net, struct 
sw_flow_key *key,
return 0;
 }
 
-/* Trim the skb to the length specified by the IP/IPv6 header,
- * removing any trailing lower-layer padding. This prepares the skb
- * for higher-layer processing that assumes skb->len excludes padding
- * (such as nf_ip_checksum). The caller needs to pull the skb to the
- * network header, and ensure ip_hdr/ipv6_hdr points to valid data.
- */
-static int ovs_skb_network_trim(struct sk_buff *skb)
-{
-   unsigned int len;
-   int err;
-
-   switch (skb->protocol) {
-   case htons(ETH_P_IP):
-   len = skb_ip_totlen(skb);
-   break;
-   case htons(ETH_P_IPV6):
-   len = sizeof(struct ipv6hdr)
-   + ntohs(ipv6_hdr(skb)->payload_len);
-   break;
-   default:
-   len = skb->len;
-   }
-
-   err = pskb_trim_rcsum(skb, len);
-   if (err)
-   kfree_skb(skb);
-
-   return err;
-}
-
 /* Returns 0 on success, -EINPROGRESS if 'skb' is stolen, or other nonzero
  * value if 'skb' is freed.
  */
@@ -1134,9 +1104,11 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb,
nh_ofs = skb_network_offset(skb);
skb_pull_rcsum(skb, nh_ofs);
 
-   err = ovs_skb_network_trim(skb);
-   if (err)
+   err = nf_ct_skb_network_trim(skb, info->family);
+   if (err) {
+   kfree_skb(skb);
return err;
+   }
 
if (key->ip.frag != OVS_FRAG_TYPE_NONE) {
err = handle_fragments(net, key, info->zone.id, skb);
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index b126f03c1bb6..0a1ecc972a8b 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -726,31 +726,6 @@ static bool tcf_ct_skb_nfct_cached(struct net *net, struct 
sk_buff *skb,
return false;
 }
 
-/* Trim the skb to the length specified by the IP/IPv6 header,
- * removing any trailing lower-layer padding. This prepares the skb
- * for higher-layer processing that assumes skb->len excludes padding
- * (such as nf_ip_checksum). The caller needs to pull the skb to the
- * network header, and ensure ip_hdr/ipv6_hdr points to valid data.
- */
-static int tcf_ct_skb_network_trim(struct sk_buff *skb, int family)
-{
-   unsigne

Re: [ovs-dev] [PATCH net-next 1/5] net: create nf_conntrack_ovs for ovs and tc use

2023-02-04 Thread Xin Long
On Sat, Feb 4, 2023 at 8:11 PM kernel test robot  wrote:
>
> Hi Xin,
>
> I love your patch! Yet something to improve:
>
> [auto build test ERROR on net-next/master]
>
> url:
> https://github.com/intel-lab-lkp/linux/commits/Xin-Long/net-create-nf_conntrack_ovs-for-ovs-and-tc-use/20230205-060514
> patch link:
> https://lore.kernel.org/r/6eca3cf10a8c06f733fac943bcb997c06ec5daa3.1675548023.git.lucien.xin%40gmail.com
> patch subject: [PATCH net-next 1/5] net: create nf_conntrack_ovs for ovs and 
> tc use
> config: s390-defconfig 
> (https://download.01.org/0day-ci/archive/20230205/202302050823.jwxmwch9-...@intel.com/config)
> compiler: s390-linux-gcc (GCC) 12.1.0
> reproduce (this is a W=1 build):
> wget 
> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
> ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # 
> https://github.com/intel-lab-lkp/linux/commit/e2bb7f965a86f833a4ae6ec28444ba328fdfc358
> git remote add linux-review https://github.com/intel-lab-lkp/linux
> git fetch --no-tags linux-review 
> Xin-Long/net-create-nf_conntrack_ovs-for-ovs-and-tc-use/20230205-060514
> git checkout e2bb7f965a86f833a4ae6ec28444ba328fdfc358
> # save the config file
> mkdir build_dir && cp config build_dir/.config
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 
> O=build_dir ARCH=s390 olddefconfig
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 
> O=build_dir ARCH=s390 SHELL=/bin/bash
>
> If you fix the issue, kindly add following tag where applicable
> | Reported-by: kernel test robot 
>
> All errors (new ones prefixed by >>, old ones prefixed by <<):
>
> >> ERROR: modpost: "nf_ct_helper" [net/openvswitch/openvswitch.ko] undefined!
> >> ERROR: modpost: "nf_ct_add_helper" [net/openvswitch/openvswitch.ko] 
> >> undefined!
>
caused by:

+   select NF_CONNTRACK_OVS if NF_NF_CONNTRACK

"NF_NF_CONNTRACK", incorrect option name was used, will fix it.
The same cause is for Patch 2/5 and 5/5.

Thanks.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next 5/5] net: extract nf_ct_handle_fragments to nf_conntrack_ovs

2023-02-04 Thread Xin Long
Now handle_fragments() in OVS and TC have the similar code, and
this patch removes the duplicate code by moving the function
to nf_conntrack_ovs.

Note that skb_clear_hash(skb) or skb->ignore_df = 1 should be
done only when defrag returns 0, as it does in other places
in kernel.

Signed-off-by: Xin Long 
---
 include/net/netfilter/nf_conntrack.h |  2 ++
 net/netfilter/nf_conntrack_ovs.c | 48 
 net/openvswitch/conntrack.c  | 45 +-
 net/sched/act_ct.c   | 46 ++
 4 files changed, 53 insertions(+), 88 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h 
b/include/net/netfilter/nf_conntrack.h
index a6e89d7212f8..7bbab8f2b73d 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -363,6 +363,8 @@ static inline struct nf_conntrack_net *nf_ct_pernet(const 
struct net *net)
 }
 
 int nf_ct_skb_network_trim(struct sk_buff *skb, int family);
+int nf_ct_handle_fragments(struct net *net, struct sk_buff *skb,
+  u16 zone, u8 family, u8 *proto, u16 *mru);
 
 #define NF_CT_STAT_INC(net, count)   __this_cpu_inc((net)->ct.stat->count)
 #define NF_CT_STAT_INC_ATOMIC(net, count) this_cpu_inc((net)->ct.stat->count)
diff --git a/net/netfilter/nf_conntrack_ovs.c b/net/netfilter/nf_conntrack_ovs.c
index c60ef71d1aea..52b776bdf526 100644
--- a/net/netfilter/nf_conntrack_ovs.c
+++ b/net/netfilter/nf_conntrack_ovs.c
@@ -3,6 +3,8 @@
 
 #include 
 #include 
+#include 
+#include 
 #include 
 
 /* 'skb' should already be pulled to nh_ofs. */
@@ -128,3 +130,49 @@ int nf_ct_skb_network_trim(struct sk_buff *skb, int family)
return pskb_trim_rcsum(skb, len);
 }
 EXPORT_SYMBOL_GPL(nf_ct_skb_network_trim);
+
+/* Returns 0 on success, -EINPROGRESS if 'skb' is stolen, or other nonzero
+ * value if 'skb' is freed.
+ */
+int nf_ct_handle_fragments(struct net *net, struct sk_buff *skb,
+  u16 zone, u8 family, u8 *proto, u16 *mru)
+{
+   int err;
+
+   if (family == NFPROTO_IPV4) {
+   enum ip_defrag_users user = IP_DEFRAG_CONNTRACK_IN + zone;
+
+   memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
+   local_bh_disable();
+   err = ip_defrag(net, skb, user);
+   local_bh_enable();
+   if (err)
+   return err;
+
+   *mru = IPCB(skb)->frag_max_size;
+#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
+   } else if (family == NFPROTO_IPV6) {
+   enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone;
+
+   memset(IP6CB(skb), 0, sizeof(struct inet6_skb_parm));
+   err = nf_ct_frag6_gather(net, skb, user);
+   if (err) {
+   if (err != -EINPROGRESS)
+   kfree_skb(skb);
+   return err;
+   }
+
+   *proto = ipv6_hdr(skb)->nexthdr;
+   *mru = IP6CB(skb)->frag_max_size;
+#endif
+   } else {
+   kfree_skb(skb);
+   return -EPFNOSUPPORT;
+   }
+
+   skb_clear_hash(skb);
+   skb->ignore_df = 1;
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(nf_ct_handle_fragments);
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 962e2f70e597..5d40ad02cabc 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -434,56 +434,13 @@ static int ovs_ct_set_labels(struct nf_conn *ct, struct 
sw_flow_key *key,
return 0;
 }
 
-/* Returns 0 on success, -EINPROGRESS if 'skb' is stolen, or other nonzero
- * value if 'skb' is freed.
- */
-static int handle_fragments(struct net *net, struct sk_buff *skb,
-   u16 zone, u8 family, u8 *proto, u16 *mru)
-{
-   int err;
-
-   if (family == NFPROTO_IPV4) {
-   enum ip_defrag_users user = IP_DEFRAG_CONNTRACK_IN + zone;
-
-   memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
-   err = ip_defrag(net, skb, user);
-   if (err)
-   return err;
-
-   *mru = IPCB(skb)->frag_max_size;
-#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
-   } else if (family == NFPROTO_IPV6) {
-   enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone;
-
-   memset(IP6CB(skb), 0, sizeof(struct inet6_skb_parm));
-   err = nf_ct_frag6_gather(net, skb, user);
-   if (err) {
-   if (err != -EINPROGRESS)
-   kfree_skb(skb);
-   return err;
-   }
-
-   *proto = ipv6_hdr(skb)->nexthdr;
-   *mru = IP6CB(skb)->frag_max_size;
-#endif
-   } else {
-   kfree_skb(skb);
-   return -EPFNOSUPPORT;
-   }
-
-   skb_clear_hash(skb);
-   skb->ignore_df = 1;
-
-   return 0;
-}
-
 static int 

[ovs-dev] [PATCH net-next 3/5] openvswitch: move key and ovs_cb update out of handle_fragments

2023-02-04 Thread Xin Long
This patch has no functional changes and just moves key and ovs_cb update
out of handle_fragments, and skb_clear_hash() and skb->ignore_df change
into handle_fragments(), to make it easier to move the duplicate code
from handle_fragments() into nf_conntrack_ovs later.

Note that it changes to pass info->family to handle_fragments() instead
of key for the packet type check, as info->family is set according to
key->eth.type in ovs_ct_copy_action() when creating the action.

Signed-off-by: Xin Long 
---
 net/openvswitch/conntrack.c | 37 +
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 47a58657b1e4..962e2f70e597 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -437,13 +437,12 @@ static int ovs_ct_set_labels(struct nf_conn *ct, struct 
sw_flow_key *key,
 /* Returns 0 on success, -EINPROGRESS if 'skb' is stolen, or other nonzero
  * value if 'skb' is freed.
  */
-static int handle_fragments(struct net *net, struct sw_flow_key *key,
-   u16 zone, struct sk_buff *skb)
+static int handle_fragments(struct net *net, struct sk_buff *skb,
+   u16 zone, u8 family, u8 *proto, u16 *mru)
 {
-   struct ovs_skb_cb ovs_cb = *OVS_CB(skb);
int err;
 
-   if (key->eth.type == htons(ETH_P_IP)) {
+   if (family == NFPROTO_IPV4) {
enum ip_defrag_users user = IP_DEFRAG_CONNTRACK_IN + zone;
 
memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
@@ -451,9 +450,9 @@ static int handle_fragments(struct net *net, struct 
sw_flow_key *key,
if (err)
return err;
 
-   ovs_cb.mru = IPCB(skb)->frag_max_size;
+   *mru = IPCB(skb)->frag_max_size;
 #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
-   } else if (key->eth.type == htons(ETH_P_IPV6)) {
+   } else if (family == NFPROTO_IPV6) {
enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone;
 
memset(IP6CB(skb), 0, sizeof(struct inet6_skb_parm));
@@ -464,22 +463,35 @@ static int handle_fragments(struct net *net, struct 
sw_flow_key *key,
return err;
}
 
-   key->ip.proto = ipv6_hdr(skb)->nexthdr;
-   ovs_cb.mru = IP6CB(skb)->frag_max_size;
+   *proto = ipv6_hdr(skb)->nexthdr;
+   *mru = IP6CB(skb)->frag_max_size;
 #endif
} else {
kfree_skb(skb);
return -EPFNOSUPPORT;
}
 
+   skb_clear_hash(skb);
+   skb->ignore_df = 1;
+
+   return 0;
+}
+
+static int ovs_ct_handle_fragments(struct net *net, struct sw_flow_key *key,
+  u16 zone, int family, struct sk_buff *skb)
+{
+   struct ovs_skb_cb ovs_cb = *OVS_CB(skb);
+   int err;
+
+   err = handle_fragments(net, skb, zone, family, >ip.proto, 
_cb.mru);
+   if (err)
+   return err;
+
/* The key extracted from the fragment that completed this datagram
 * likely didn't have an L4 header, so regenerate it.
 */
ovs_flow_key_update_l3l4(skb, key);
-
key->ip.frag = OVS_FRAG_TYPE_NONE;
-   skb_clear_hash(skb);
-   skb->ignore_df = 1;
*OVS_CB(skb) = ovs_cb;
 
return 0;
@@ -,7 +1123,8 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb,
}
 
if (key->ip.frag != OVS_FRAG_TYPE_NONE) {
-   err = handle_fragments(net, key, info->zone.id, skb);
+   err = ovs_ct_handle_fragments(net, key, info->zone.id,
+ info->family, skb);
if (err)
return err;
}
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next 4/5] net: sched: move frag check and tc_skb_cb update out of handle_fragments

2023-02-04 Thread Xin Long
This patch has no functional changes and just moves frag check and
tc_skb_cb update out of handle_fragments, to make it easier to move
the duplicate code from handle_fragments() into nf_conntrack_ovs later.

Signed-off-by: Xin Long 
---
 net/sched/act_ct.c | 71 +-
 1 file changed, 39 insertions(+), 32 deletions(-)

diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index 0a1ecc972a8b..9f133ed93815 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -778,29 +778,10 @@ static int tcf_ct_ipv6_is_fragment(struct sk_buff *skb, 
bool *frag)
return 0;
 }
 
-static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb,
-  u8 family, u16 zone, bool *defrag)
+static int handle_fragments(struct net *net, struct sk_buff *skb,
+   u16 zone, u8 family, u16 *mru)
 {
-   enum ip_conntrack_info ctinfo;
-   struct nf_conn *ct;
-   int err = 0;
-   bool frag;
-   u16 mru;
-
-   /* Previously seen (loopback)? Ignore. */
-   ct = nf_ct_get(skb, );
-   if ((ct && !nf_ct_is_template(ct)) || ctinfo == IP_CT_UNTRACKED)
-   return 0;
-
-   if (family == NFPROTO_IPV4)
-   err = tcf_ct_ipv4_is_fragment(skb, );
-   else
-   err = tcf_ct_ipv6_is_fragment(skb, );
-   if (err || !frag)
-   return err;
-
-   skb_get(skb);
-   mru = tc_skb_cb(skb)->mru;
+   int err;
 
if (family == NFPROTO_IPV4) {
enum ip_defrag_users user = IP_DEFRAG_CONNTRACK_IN + zone;
@@ -812,10 +793,8 @@ static int tcf_ct_handle_fragments(struct net *net, struct 
sk_buff *skb,
if (err && err != -EINPROGRESS)
return err;
 
-   if (!err) {
-   *defrag = true;
-   mru = IPCB(skb)->frag_max_size;
-   }
+   if (!err)
+   *mru = IPCB(skb)->frag_max_size;
} else { /* NFPROTO_IPV6 */
 #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone;
@@ -825,18 +804,14 @@ static int tcf_ct_handle_fragments(struct net *net, 
struct sk_buff *skb,
if (err && err != -EINPROGRESS)
goto out_free;
 
-   if (!err) {
-   *defrag = true;
-   mru = IP6CB(skb)->frag_max_size;
-   }
+   if (!err)
+   *mru = IP6CB(skb)->frag_max_size;
 #else
err = -EOPNOTSUPP;
goto out_free;
 #endif
}
 
-   if (err != -EINPROGRESS)
-   tc_skb_cb(skb)->mru = mru;
skb_clear_hash(skb);
skb->ignore_df = 1;
return err;
@@ -846,6 +821,38 @@ static int tcf_ct_handle_fragments(struct net *net, struct 
sk_buff *skb,
return err;
 }
 
+static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb,
+  u8 family, u16 zone, bool *defrag)
+{
+   enum ip_conntrack_info ctinfo;
+   struct nf_conn *ct;
+   int err = 0;
+   bool frag;
+   u16 mru;
+
+   /* Previously seen (loopback)? Ignore. */
+   ct = nf_ct_get(skb, );
+   if ((ct && !nf_ct_is_template(ct)) || ctinfo == IP_CT_UNTRACKED)
+   return 0;
+
+   if (family == NFPROTO_IPV4)
+   err = tcf_ct_ipv4_is_fragment(skb, );
+   else
+   err = tcf_ct_ipv6_is_fragment(skb, );
+   if (err || !frag)
+   return err;
+
+   skb_get(skb);
+   err = handle_fragments(net, skb, zone, family, );
+   if (err)
+   return err;
+
+   *defrag = true;
+   tc_skb_cb(skb)->mru = mru;
+
+   return 0;
+}
+
 static void tcf_ct_params_free(struct tcf_ct_params *params)
 {
if (params->helper) {
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next 0/5] net: move more duplicate code of ovs and tc conntrack into nf_conntrack_ovs

2023-02-04 Thread Xin Long
We've moved some duplicate code into nf_nat_ovs in:

  "net: eliminate the duplicate code in the ct nat functions of ovs and tc"

This patchset addresses more code duplication in the conntrack of ovs
and tc then creates nf_conntrack_ovs for them, and four functions will
be extracted and moved into it:

  nf_ct_handle_fragments()
  nf_ct_skb_network_trim()
  nf_ct_helper()
  nf_ct_add_helper()

Xin Long (5):
  net: create nf_conntrack_ovs for ovs and tc use
  net: extract nf_ct_skb_network_trim function to nf_conntrack_ovs
  openvswitch: move key and ovs_cb update out of handle_fragments
  net: sched: move frag check and tc_skb_cb update out of
handle_fragments
  net: extract nf_ct_handle_fragments to nf_conntrack_ovs

 include/net/netfilter/nf_conntrack.h |   4 +
 net/netfilter/Kconfig|   6 +
 net/netfilter/Makefile   |   1 +
 net/netfilter/nf_conntrack_helper.c  |  98 ---
 net/netfilter/nf_conntrack_ovs.c | 178 +++
 net/openvswitch/Kconfig  |   1 +
 net/openvswitch/conntrack.c  |  80 ++--
 net/sched/Kconfig|   1 +
 net/sched/act_ct.c   |  76 ++--
 9 files changed, 210 insertions(+), 235 deletions(-)
 create mode 100644 net/netfilter/nf_conntrack_ovs.c

-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next 2/5] net: extract nf_ct_skb_network_trim function to nf_conntrack_ovs

2023-02-04 Thread Xin Long
There are almost the same code in ovs_skb_network_trim() and
tcf_ct_skb_network_trim(), this patch extracts them into a function
nf_ct_skb_network_trim() and moves the function to nf_conntrack_ovs.

Signed-off-by: Xin Long 
---
 include/net/netfilter/nf_conntrack.h |  2 ++
 net/netfilter/nf_conntrack_ovs.c | 26 
 net/openvswitch/conntrack.c  | 36 
 net/sched/act_ct.c   | 27 +
 4 files changed, 33 insertions(+), 58 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h 
b/include/net/netfilter/nf_conntrack.h
index 6a2019aaa464..a6e89d7212f8 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -362,6 +362,8 @@ static inline struct nf_conntrack_net *nf_ct_pernet(const 
struct net *net)
return net_generic(net, nf_conntrack_net_id);
 }
 
+int nf_ct_skb_network_trim(struct sk_buff *skb, int family);
+
 #define NF_CT_STAT_INC(net, count)   __this_cpu_inc((net)->ct.stat->count)
 #define NF_CT_STAT_INC_ATOMIC(net, count) this_cpu_inc((net)->ct.stat->count)
 #define NF_CT_STAT_ADD_ATOMIC(net, count, v) 
this_cpu_add((net)->ct.stat->count, (v))
diff --git a/net/netfilter/nf_conntrack_ovs.c b/net/netfilter/nf_conntrack_ovs.c
index eff4d53f8b8c..c60ef71d1aea 100644
--- a/net/netfilter/nf_conntrack_ovs.c
+++ b/net/netfilter/nf_conntrack_ovs.c
@@ -102,3 +102,29 @@ int nf_ct_add_helper(struct nf_conn *ct, const char *name, 
u8 family,
return ret;
 }
 EXPORT_SYMBOL_GPL(nf_ct_add_helper);
+
+/* Trim the skb to the length specified by the IP/IPv6 header,
+ * removing any trailing lower-layer padding. This prepares the skb
+ * for higher-layer processing that assumes skb->len excludes padding
+ * (such as nf_ip_checksum). The caller needs to pull the skb to the
+ * network header, and ensure ip_hdr/ipv6_hdr points to valid data.
+ */
+int nf_ct_skb_network_trim(struct sk_buff *skb, int family)
+{
+   unsigned int len;
+
+   switch (family) {
+   case NFPROTO_IPV4:
+   len = skb_ip_totlen(skb);
+   break;
+   case NFPROTO_IPV6:
+   len = sizeof(struct ipv6hdr)
+   + ntohs(ipv6_hdr(skb)->payload_len);
+   break;
+   default:
+   len = skb->len;
+   }
+
+   return pskb_trim_rcsum(skb, len);
+}
+EXPORT_SYMBOL_GPL(nf_ct_skb_network_trim);
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 2172930b1f17..47a58657b1e4 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -1090,36 +1090,6 @@ static int ovs_ct_commit(struct net *net, struct 
sw_flow_key *key,
return 0;
 }
 
-/* Trim the skb to the length specified by the IP/IPv6 header,
- * removing any trailing lower-layer padding. This prepares the skb
- * for higher-layer processing that assumes skb->len excludes padding
- * (such as nf_ip_checksum). The caller needs to pull the skb to the
- * network header, and ensure ip_hdr/ipv6_hdr points to valid data.
- */
-static int ovs_skb_network_trim(struct sk_buff *skb)
-{
-   unsigned int len;
-   int err;
-
-   switch (skb->protocol) {
-   case htons(ETH_P_IP):
-   len = skb_ip_totlen(skb);
-   break;
-   case htons(ETH_P_IPV6):
-   len = sizeof(struct ipv6hdr)
-   + ntohs(ipv6_hdr(skb)->payload_len);
-   break;
-   default:
-   len = skb->len;
-   }
-
-   err = pskb_trim_rcsum(skb, len);
-   if (err)
-   kfree_skb(skb);
-
-   return err;
-}
-
 /* Returns 0 on success, -EINPROGRESS if 'skb' is stolen, or other nonzero
  * value if 'skb' is freed.
  */
@@ -1134,9 +1104,11 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb,
nh_ofs = skb_network_offset(skb);
skb_pull_rcsum(skb, nh_ofs);
 
-   err = ovs_skb_network_trim(skb);
-   if (err)
+   err = nf_ct_skb_network_trim(skb, info->family);
+   if (err) {
+   kfree_skb(skb);
return err;
+   }
 
if (key->ip.frag != OVS_FRAG_TYPE_NONE) {
err = handle_fragments(net, key, info->zone.id, skb);
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index b126f03c1bb6..0a1ecc972a8b 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -726,31 +726,6 @@ static bool tcf_ct_skb_nfct_cached(struct net *net, struct 
sk_buff *skb,
return false;
 }
 
-/* Trim the skb to the length specified by the IP/IPv6 header,
- * removing any trailing lower-layer padding. This prepares the skb
- * for higher-layer processing that assumes skb->len excludes padding
- * (such as nf_ip_checksum). The caller needs to pull the skb to the
- * network header, and ensure ip_hdr/ipv6_hdr points to valid data.
- */
-static int tcf_ct_skb_network_trim(struct sk_buff *skb, int family)
-{
-   unsigne

[ovs-dev] [PATCH net-next 1/5] net: create nf_conntrack_ovs for ovs and tc use

2023-02-04 Thread Xin Long
Similar to nf_nat_ovs created by Commit ebddb1404900 ("net: move the
nat function to nf_nat_ovs for ovs and tc"), this patch is to create
nf_conntrack_ovs to get these functions shared by OVS and TC only.

There are nf_ct_helper() and nf_ct_add_helper() from nf_conntrak_helper
in this patch, and will be more in the following patches.

Signed-off-by: Xin Long 
---
 net/netfilter/Kconfig   |   6 ++
 net/netfilter/Makefile  |   1 +
 net/netfilter/nf_conntrack_helper.c |  98 --
 net/netfilter/nf_conntrack_ovs.c| 104 
 net/openvswitch/Kconfig |   1 +
 net/sched/Kconfig   |   1 +
 6 files changed, 113 insertions(+), 98 deletions(-)
 create mode 100644 net/netfilter/nf_conntrack_ovs.c

diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index f71b41c7ce2f..645664f462e5 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -189,6 +189,12 @@ config NF_CONNTRACK_LABELS
  to connection tracking entries.  It can be used with xtables connlabel
  match and the nftables ct expression.
 
+config NF_CONNTRACK_OVS
+   bool "Connection tracking for openvswitch"
+   help
+ This option enables support for openvswitch.  It can be used by
+ openvswitch and tc conntrack.
+
 config NF_CT_PROTO_DCCP
bool 'DCCP protocol connection tracking support'
depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index ba2a6b5e93d9..5ffef1cd6143 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -11,6 +11,7 @@ nf_conntrack-$(CONFIG_NF_CONNTRACK_TIMEOUT) += 
nf_conntrack_timeout.o
 nf_conntrack-$(CONFIG_NF_CONNTRACK_TIMESTAMP) += nf_conntrack_timestamp.o
 nf_conntrack-$(CONFIG_NF_CONNTRACK_EVENTS) += nf_conntrack_ecache.o
 nf_conntrack-$(CONFIG_NF_CONNTRACK_LABELS) += nf_conntrack_labels.o
+nf_conntrack-$(CONFIG_NF_CONNTRACK_OVS) += nf_conntrack_ovs.o
 nf_conntrack-$(CONFIG_NF_CT_PROTO_DCCP) += nf_conntrack_proto_dccp.o
 nf_conntrack-$(CONFIG_NF_CT_PROTO_SCTP) += nf_conntrack_proto_sctp.o
 nf_conntrack-$(CONFIG_NF_CT_PROTO_GRE) += nf_conntrack_proto_gre.o
diff --git a/net/netfilter/nf_conntrack_helper.c 
b/net/netfilter/nf_conntrack_helper.c
index 48ea6d0264b5..0c4db2f2ac43 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -242,104 +242,6 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct 
nf_conn *tmpl,
 }
 EXPORT_SYMBOL_GPL(__nf_ct_try_assign_helper);
 
-/* 'skb' should already be pulled to nh_ofs. */
-int nf_ct_helper(struct sk_buff *skb, struct nf_conn *ct,
-enum ip_conntrack_info ctinfo, u16 proto)
-{
-   const struct nf_conntrack_helper *helper;
-   const struct nf_conn_help *help;
-   unsigned int protoff;
-   int err;
-
-   if (ctinfo == IP_CT_RELATED_REPLY)
-   return NF_ACCEPT;
-
-   help = nfct_help(ct);
-   if (!help)
-   return NF_ACCEPT;
-
-   helper = rcu_dereference(help->helper);
-   if (!helper)
-   return NF_ACCEPT;
-
-   if (helper->tuple.src.l3num != NFPROTO_UNSPEC &&
-   helper->tuple.src.l3num != proto)
-   return NF_ACCEPT;
-
-   switch (proto) {
-   case NFPROTO_IPV4:
-   protoff = ip_hdrlen(skb);
-   proto = ip_hdr(skb)->protocol;
-   break;
-   case NFPROTO_IPV6: {
-   u8 nexthdr = ipv6_hdr(skb)->nexthdr;
-   __be16 frag_off;
-   int ofs;
-
-   ofs = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), ,
-  _off);
-   if (ofs < 0 || (frag_off & htons(~0x7)) != 0) {
-   pr_debug("proto header not found\n");
-   return NF_ACCEPT;
-   }
-   protoff = ofs;
-   proto = nexthdr;
-   break;
-   }
-   default:
-   WARN_ONCE(1, "helper invoked on non-IP family!");
-   return NF_DROP;
-   }
-
-   if (helper->tuple.dst.protonum != proto)
-   return NF_ACCEPT;
-
-   err = helper->help(skb, protoff, ct, ctinfo);
-   if (err != NF_ACCEPT)
-   return err;
-
-   /* Adjust seqs after helper.  This is needed due to some helpers (e.g.,
-* FTP with NAT) adusting the TCP payload size when mangling IP
-* addresses and/or port numbers in the text-based control connection.
-*/
-   if (test_bit(IPS_SEQ_ADJUST_BIT, >status) &&
-   !nf_ct_seq_adjust(skb, ct, ctinfo, protoff))
-   return NF_DROP;
-   return NF_ACCEPT;
-}
-EXPORT_SYMBOL_GPL(nf_ct_helper);
-
-int nf_ct_add_helper(struct nf_conn *ct, const char *name, u8 family,
-u8 proto, bool nat, struct nf_conntrack_helper **hp)
-{
-   str

[ovs-dev] [PATCHv4 net-next 5/5] net: move the nat function to nf_nat_ovs for ovs and tc

2022-12-08 Thread Xin Long
There are two nat functions are nearly the same in both OVS and
TC code, (ovs_)ct_nat_execute() and ovs_ct_nat/tcf_ct_act_nat().

This patch creates nf_nat_ovs.c under netfilter and moves them
there then exports nf_ct_nat() so that it can be shared by both
OVS and TC, and keeps the nat (type) check and nat flag update
in OVS and TC's own place, as these parts are different between
OVS and TC.

Note that in OVS nat function it was using skb->protocol to get
the proto as it already skips vlans in key_extract(), while it
doesn't in TC, and TC has to call skb_protocol() to get proto.
So in nf_ct_nat_execute(), we keep using skb_protocol() which
works for both OVS and TC contrack.

Signed-off-by: Xin Long 
---
 include/net/netfilter/nf_nat.h |   4 +
 net/netfilter/Kconfig  |   3 +
 net/netfilter/Makefile |   1 +
 net/netfilter/nf_nat_ovs.c | 135 
 net/openvswitch/Kconfig|   1 +
 net/openvswitch/conntrack.c| 137 +++--
 net/sched/Kconfig  |   1 +
 net/sched/act_ct.c | 136 +++-
 8 files changed, 166 insertions(+), 252 deletions(-)
 create mode 100644 net/netfilter/nf_nat_ovs.c

diff --git a/include/net/netfilter/nf_nat.h b/include/net/netfilter/nf_nat.h
index e9eb01e99d2f..9877f064548a 100644
--- a/include/net/netfilter/nf_nat.h
+++ b/include/net/netfilter/nf_nat.h
@@ -104,6 +104,10 @@ unsigned int
 nf_nat_inet_fn(void *priv, struct sk_buff *skb,
   const struct nf_hook_state *state);
 
+int nf_ct_nat(struct sk_buff *skb, struct nf_conn *ct,
+ enum ip_conntrack_info ctinfo, int *action,
+ const struct nf_nat_range2 *range, bool commit);
+
 static inline int nf_nat_initialized(const struct nf_conn *ct,
 enum nf_nat_manip_type manip)
 {
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 0846bd75b1da..f71b41c7ce2f 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -459,6 +459,9 @@ config NF_NAT_REDIRECT
 config NF_NAT_MASQUERADE
bool
 
+config NF_NAT_OVS
+   bool
+
 config NETFILTER_SYNPROXY
tristate
 
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 1d4db1943936..3754eb06fb41 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -59,6 +59,7 @@ obj-$(CONFIG_NF_LOG_SYSLOG) += nf_log_syslog.o
 obj-$(CONFIG_NF_NAT) += nf_nat.o
 nf_nat-$(CONFIG_NF_NAT_REDIRECT) += nf_nat_redirect.o
 nf_nat-$(CONFIG_NF_NAT_MASQUERADE) += nf_nat_masquerade.o
+nf_nat-$(CONFIG_NF_NAT_OVS) += nf_nat_ovs.o
 
 ifeq ($(CONFIG_NF_NAT),m)
 nf_nat-$(CONFIG_DEBUG_INFO_BTF_MODULES) += nf_nat_bpf.o
diff --git a/net/netfilter/nf_nat_ovs.c b/net/netfilter/nf_nat_ovs.c
new file mode 100644
index ..551abd2da614
--- /dev/null
+++ b/net/netfilter/nf_nat_ovs.c
@@ -0,0 +1,135 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Support nat functions for openvswitch and used by OVS and TC conntrack. */
+
+#include 
+
+/* Modelled after nf_nat_ipv[46]_fn().
+ * range is only used for new, uninitialized NAT state.
+ * Returns either NF_ACCEPT or NF_DROP.
+ */
+static int nf_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
+enum ip_conntrack_info ctinfo, int *action,
+const struct nf_nat_range2 *range,
+enum nf_nat_manip_type maniptype)
+{
+   __be16 proto = skb_protocol(skb, true);
+   int hooknum, err = NF_ACCEPT;
+
+   /* See HOOK2MANIP(). */
+   if (maniptype == NF_NAT_MANIP_SRC)
+   hooknum = NF_INET_LOCAL_IN; /* Source NAT */
+   else
+   hooknum = NF_INET_LOCAL_OUT; /* Destination NAT */
+
+   switch (ctinfo) {
+   case IP_CT_RELATED:
+   case IP_CT_RELATED_REPLY:
+   if (proto == htons(ETH_P_IP) &&
+   ip_hdr(skb)->protocol == IPPROTO_ICMP) {
+   if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
+  hooknum))
+   err = NF_DROP;
+   goto out;
+   } else if (IS_ENABLED(CONFIG_IPV6) && proto == 
htons(ETH_P_IPV6)) {
+   __be16 frag_off;
+   u8 nexthdr = ipv6_hdr(skb)->nexthdr;
+   int hdrlen = ipv6_skip_exthdr(skb,
+ sizeof(struct ipv6hdr),
+ , _off);
+
+   if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) {
+   if (!nf_nat_icmpv6_reply_translation(skb, ct,
+ctinfo,
+hooknum,
+hdrlen))
+

[ovs-dev] [PATCHv4 net-next 4/5] net: sched: update the nat flag for icmp error packets in ct_nat_execute

2022-12-08 Thread Xin Long
In ovs_ct_nat_execute(), the packet flow key nat flags are updated
when it processes ICMP(v6) error packets translation successfully.

In ct_nat_execute() when processing ICMP(v6) error packets translation
successfully, it should have done the same in ct_nat_execute() to set
post_ct_s/dnat flag, which will be used to update flow key nat flags
in OVS module later.

Reviewed-by: Saeed Mahameed 
Signed-off-by: Xin Long 
---
 net/sched/act_ct.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index dd5ae7551956..bb87d1e910ea 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -936,13 +936,13 @@ static int ct_nat_execute(struct sk_buff *skb, struct 
nf_conn *ct,
}
 
err = nf_nat_packet(ct, ctinfo, hooknum, skb);
+out:
if (err == NF_ACCEPT) {
if (maniptype == NF_NAT_MANIP_SRC)
tc_skb_cb(skb)->post_ct_snat = 1;
if (maniptype == NF_NAT_MANIP_DST)
tc_skb_cb(skb)->post_ct_dnat = 1;
}
-out:
return err;
 }
 #endif /* CONFIG_NF_NAT */
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCHv4 net-next 1/5] openvswitch: delete the unncessary skb_pull_rcsum call in ovs_ct_nat_execute

2022-12-08 Thread Xin Long
The calls to ovs_ct_nat_execute() are as below:

  ovs_ct_execute()
ovs_ct_lookup()
  __ovs_ct_lookup()
ovs_ct_nat()
  ovs_ct_nat_execute()
ovs_ct_commit()
  __ovs_ct_lookup()
ovs_ct_nat()
  ovs_ct_nat_execute()

and since skb_pull_rcsum() and skb_push_rcsum() are already
called in ovs_ct_execute(), there's no need to do it again
in ovs_ct_nat_execute().

Reviewed-by: Saeed Mahameed 
Acked-by: Aaron Conole 
Signed-off-by: Xin Long 
---
 net/openvswitch/conntrack.c | 17 ++---
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index d78f0fc4337d..dff093a10d6d 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -735,10 +735,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct 
nf_conn *ct,
  const struct nf_nat_range2 *range,
  enum nf_nat_manip_type maniptype, struct 
sw_flow_key *key)
 {
-   int hooknum, nh_off, err = NF_ACCEPT;
-
-   nh_off = skb_network_offset(skb);
-   skb_pull_rcsum(skb, nh_off);
+   int hooknum, err = NF_ACCEPT;
 
/* See HOOK2MANIP(). */
if (maniptype == NF_NAT_MANIP_SRC)
@@ -755,7 +752,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct 
nf_conn *ct,
if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
   hooknum))
err = NF_DROP;
-   goto push;
+   goto out;
} else if (IS_ENABLED(CONFIG_IPV6) &&
   skb->protocol == htons(ETH_P_IPV6)) {
__be16 frag_off;
@@ -770,7 +767,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct 
nf_conn *ct,
 hooknum,
 hdrlen))
err = NF_DROP;
-   goto push;
+   goto out;
}
}
/* Non-ICMP, fall thru to initialize if needed. */
@@ -788,7 +785,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct 
nf_conn *ct,
? nf_nat_setup_info(ct, range, maniptype)
: nf_nat_alloc_null_binding(ct, hooknum);
if (err != NF_ACCEPT)
-   goto push;
+   goto out;
}
break;
 
@@ -798,13 +795,11 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct 
nf_conn *ct,
 
default:
err = NF_DROP;
-   goto push;
+   goto out;
}
 
err = nf_nat_packet(ct, ctinfo, hooknum, skb);
-push:
-   skb_push_rcsum(skb, nh_off);
-
+out:
/* Update the flow key if NAT successful. */
if (err == NF_ACCEPT)
ovs_nat_update_key(key, skb, maniptype);
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCHv4 net-next 2/5] openvswitch: return NF_ACCEPT when OVS_CT_NAT is not set in info nat

2022-12-08 Thread Xin Long
Either OVS_CT_SRC_NAT or OVS_CT_DST_NAT is set, OVS_CT_NAT must be
set in info->nat. Thus, if OVS_CT_NAT is not set in info->nat, it
will definitely not do NAT but returns NF_ACCEPT in ovs_ct_nat().

This patch changes nothing funcational but only makes this return
earlier in ovs_ct_nat() to keep consistent with TC's processing
in tcf_ct_act_nat().

Reviewed-by: Saeed Mahameed 
Acked-by: Aaron Conole 
Signed-off-by: Xin Long 
---
 net/openvswitch/conntrack.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index dff093a10d6d..5ea74270da46 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -816,6 +816,9 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key 
*key,
enum nf_nat_manip_type maniptype;
int err;
 
+   if (!(info->nat & OVS_CT_NAT))
+   return NF_ACCEPT;
+
/* Add NAT extension if not confirmed yet. */
if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct))
return NF_ACCEPT;   /* Can't NAT. */
@@ -825,8 +828,7 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key 
*key,
 * Make sure new expected connections (IP_CT_RELATED) are NATted only
 * when committing.
 */
-   if (info->nat & OVS_CT_NAT && ctinfo != IP_CT_NEW &&
-   ct->status & IPS_NAT_MASK &&
+   if (ctinfo != IP_CT_NEW && ct->status & IPS_NAT_MASK &&
(ctinfo != IP_CT_RELATED || info->commit)) {
/* NAT an established or related connection like before. */
if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY)
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCHv4 net-next 3/5] openvswitch: return NF_DROP when fails to add nat ext in ovs_ct_nat

2022-12-08 Thread Xin Long
When it fails to allocate nat ext, the packet should be dropped, like
the memory allocation failures in other places in ovs_ct_nat().

This patch changes to return NF_DROP when fails to add nat ext before
doing NAT in ovs_ct_nat(), also it would keep consistent with tc
action ct' processing in tcf_ct_act_nat().

Signed-off-by: Xin Long 
---
 net/openvswitch/conntrack.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 5ea74270da46..58c9f0edc3c4 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -821,7 +821,7 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key 
*key,
 
/* Add NAT extension if not confirmed yet. */
if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct))
-   return NF_ACCEPT;   /* Can't NAT. */
+   return NF_DROP;   /* Can't NAT. */
 
/* Determine NAT type.
 * Check if the NAT type can be deduced from the tracked connection.
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCHv4 net-next 0/5] net: eliminate the duplicate code in the ct nat functions of ovs and tc

2022-12-08 Thread Xin Long
The changes in the patchset:

  "net: add helper support in tc act_ct for ovs offloading"

had moved some common ct code used by both OVS and TC into netfilter.

There are still some big functions pretty similar defined and used in
each of OVS and TC. It is not good to maintain such big function in 2
places. This patchset is to extract the functions for NAT processing
from OVS and TC to netfilter.

To make this change clear and safe, this patchset gets the common code
out of OVS and TC step by step: The patch 1-4 make some minor changes
in OVS and TC to make the NAT code of them completely the same, then
the patch 5 moves the common code to the netfilter and exports one
function called by each of OVS and TC.

v1->v2:
  - Create nf_nat_ovs.c to include the nat functions, as Pablo suggested.
v2->v3:
  - fix a typo in subject of patch 2/5, as Marcelo noticed.
  - fix in openvswitch to keep OVS ct nat and TC ct nat consistent in
patch 3/5 instead of in tc, as Marcelo noticed.
  - use BIT(var) macro instead of (1 << var) in patch 5/5, as Marcelo
suggested.
  - use ifdef in netfilter/Makefile to build nf_nat_ovs only when OVS
or TC ct action is enabled in patch 5/5, as Marcelo suggested.
v3->v4:
  - add NF_NAT_OVS in netfilter/Kconfig and add select NF_NAT_OVS in
OVS and TC Kconfig instead of using ifdef in netfilter/Makefile,
    as Pablo suggested.

Xin Long (5):
  openvswitch: delete the unncessary skb_pull_rcsum call in
ovs_ct_nat_execute
  openvswitch: return NF_ACCEPT when OVS_CT_NAT is not set in info nat
  openvswitch: return NF_DROP when fails to add nat ext in ovs_ct_nat
  net: sched: update the nat flag for icmp error packets in
ct_nat_execute
  net: move the nat function to nf_nat_ovs for ovs and tc

 include/net/netfilter/nf_nat.h |   4 +
 net/netfilter/Kconfig  |   3 +
 net/netfilter/Makefile |   1 +
 net/netfilter/nf_nat_ovs.c | 135 ++
 net/openvswitch/Kconfig|   1 +
 net/openvswitch/conntrack.c| 146 +++--
 net/sched/Kconfig  |   1 +
 net/sched/act_ct.c | 136 +++---
 8 files changed, 169 insertions(+), 258 deletions(-)
 create mode 100644 net/netfilter/nf_nat_ovs.c

-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCHv3 net-next 5/5] net: move the nat function to nf_nat_ovs for ovs and tc

2022-12-07 Thread Xin Long
On Tue, Dec 6, 2022 at 10:32 PM Xin Long  wrote:
>
> On Tue, Dec 6, 2022 at 6:54 PM Pablo Neira Ayuso  wrote:
> >
> > On Tue, Dec 06, 2022 at 06:31:16PM -0500, Xin Long wrote:
> > [...]
> > > diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
> > > index 1d4db1943936..0976d34b1e5f 100644
> > > --- a/net/netfilter/Makefile
> > > +++ b/net/netfilter/Makefile
> > > @@ -54,6 +54,12 @@ obj-$(CONFIG_NF_CONNTRACK_TFTP) += nf_conntrack_tftp.o
> > >
> > >  nf_nat-y := nf_nat_core.o nf_nat_proto.o nf_nat_helper.o
> > >
> > > +ifdef CONFIG_OPENVSWITCH
> > > +nf_nat-y += nf_nat_ovs.o
> > > +else ifdef CONFIG_NET_ACT_CT
> > > +nf_nat-y += nf_nat_ovs.o
> > > +endif
> >
> > Maybe add CONFIG_NF_NAT_OVS and select it from OPENVSWITCH Kconfig
> > (select is a hammer, but it should be fine in this case since
> > OPENVSWITCH already depends on NF_NAT?).
> not really completely depends, it's:
>
>   depends on (!NF_NAT || NF_NAT)
>
> but it's fine, the select will be:
>
>   select NF_NAT_OVS if NF_NAT
>
> >
> > Then in Makefile:
> >
> > nf_nat-$(CONFIG_NF_NAT_OVS)  += nf_nat_ovs.o
> >
> > And CONFIG_NF_NAT_OVS depends on OPENVSWITCH.
> Sounds great!
> Then it will be:
>
> --- a/net/netfilter/Kconfig
> +++ b/net/netfilter/Kconfig
> @@ -459,6 +459,10 @@ config NF_NAT_REDIRECT
>  config NF_NAT_MASQUERADE
> bool
>
> +config NF_NAT_OVS
> +   bool
> +   depends on OPENVSWITCH || NET_ACT_CT
> +
Just FYI, "depends on" is not necessary in this case.
Even without this "depends on OPENVSWITCH || NET_ACT_CT",
it will still be disabled automatically if OPENVSWITCH and
NET_ACT_CT are disabled, and you can't enable it manually either.

Thanks.

>
> --- a/net/netfilter/Makefile
> +++ b/net/netfilter/Makefile
> @@ -59,6 +59,7 @@ obj-$(CONFIG_NF_LOG_SYSLOG) += nf_log_syslog.o
>  obj-$(CONFIG_NF_NAT) += nf_nat.o
>  nf_nat-$(CONFIG_NF_NAT_REDIRECT) += nf_nat_redirect.o
>  nf_nat-$(CONFIG_NF_NAT_MASQUERADE) += nf_nat_masquerade.o
> +nf_nat-$(CONFIG_NF_NAT_OVS)  += nf_nat_ovs.o
>
> --- a/net/openvswitch/Kconfig
> +++ b/net/openvswitch/Kconfig
> @@ -15,6 +15,7 @@ config OPENVSWITCH
> select NET_MPLS_GSO
> select DST_CACHE
> select NET_NSH
> +   select NF_NAT_OVS if NF_NAT
>
> --- a/net/sched/Kconfig
> +++ b/net/sched/Kconfig
> @@ -977,6 +977,7 @@ config NET_ACT_TUNNEL_KEY
>  config NET_ACT_CT
> tristate "connection tracking tc action"
> depends on NET_CLS_ACT && NF_CONNTRACK && (!NF_NAT || NF_NAT)
> && NF_FLOW_TABLE
> +   select NF_NAT_OVS if NF_NAT
>
>
> I will prepare v4, Thanks.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCHv3 net-next 5/5] net: move the nat function to nf_nat_ovs for ovs and tc

2022-12-06 Thread Xin Long
On Tue, Dec 6, 2022 at 6:54 PM Pablo Neira Ayuso  wrote:
>
> On Tue, Dec 06, 2022 at 06:31:16PM -0500, Xin Long wrote:
> [...]
> > diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
> > index 1d4db1943936..0976d34b1e5f 100644
> > --- a/net/netfilter/Makefile
> > +++ b/net/netfilter/Makefile
> > @@ -54,6 +54,12 @@ obj-$(CONFIG_NF_CONNTRACK_TFTP) += nf_conntrack_tftp.o
> >
> >  nf_nat-y := nf_nat_core.o nf_nat_proto.o nf_nat_helper.o
> >
> > +ifdef CONFIG_OPENVSWITCH
> > +nf_nat-y += nf_nat_ovs.o
> > +else ifdef CONFIG_NET_ACT_CT
> > +nf_nat-y += nf_nat_ovs.o
> > +endif
>
> Maybe add CONFIG_NF_NAT_OVS and select it from OPENVSWITCH Kconfig
> (select is a hammer, but it should be fine in this case since
> OPENVSWITCH already depends on NF_NAT?).
not really completely depends, it's:

  depends on (!NF_NAT || NF_NAT)

but it's fine, the select will be:

  select NF_NAT_OVS if NF_NAT

>
> Then in Makefile:
>
> nf_nat-$(CONFIG_NF_NAT_OVS)  += nf_nat_ovs.o
>
> And CONFIG_NF_NAT_OVS depends on OPENVSWITCH.
Sounds great!
Then it will be:

--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -459,6 +459,10 @@ config NF_NAT_REDIRECT
 config NF_NAT_MASQUERADE
bool

+config NF_NAT_OVS
+   bool
+   depends on OPENVSWITCH || NET_ACT_CT
+

--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -59,6 +59,7 @@ obj-$(CONFIG_NF_LOG_SYSLOG) += nf_log_syslog.o
 obj-$(CONFIG_NF_NAT) += nf_nat.o
 nf_nat-$(CONFIG_NF_NAT_REDIRECT) += nf_nat_redirect.o
 nf_nat-$(CONFIG_NF_NAT_MASQUERADE) += nf_nat_masquerade.o
+nf_nat-$(CONFIG_NF_NAT_OVS)  += nf_nat_ovs.o

--- a/net/openvswitch/Kconfig
+++ b/net/openvswitch/Kconfig
@@ -15,6 +15,7 @@ config OPENVSWITCH
select NET_MPLS_GSO
select DST_CACHE
select NET_NSH
+   select NF_NAT_OVS if NF_NAT

--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -977,6 +977,7 @@ config NET_ACT_TUNNEL_KEY
 config NET_ACT_CT
tristate "connection tracking tc action"
depends on NET_CLS_ACT && NF_CONNTRACK && (!NF_NAT || NF_NAT)
&& NF_FLOW_TABLE
+   select NF_NAT_OVS if NF_NAT


I will prepare v4, Thanks.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCHv3 net-next 4/5] net: sched: update the nat flag for icmp error packets in ct_nat_execute

2022-12-06 Thread Xin Long
In ovs_ct_nat_execute(), the packet flow key nat flags are updated
when it processes ICMP(v6) error packets translation successfully.

In ct_nat_execute() when processing ICMP(v6) error packets translation
successfully, it should have done the same in ct_nat_execute() to set
post_ct_s/dnat flag, which will be used to update flow key nat flags
in OVS module later.

Reviewed-by: Saeed Mahameed 
Signed-off-by: Xin Long 
---
 net/sched/act_ct.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index dd5ae7551956..bb87d1e910ea 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -936,13 +936,13 @@ static int ct_nat_execute(struct sk_buff *skb, struct 
nf_conn *ct,
}
 
err = nf_nat_packet(ct, ctinfo, hooknum, skb);
+out:
if (err == NF_ACCEPT) {
if (maniptype == NF_NAT_MANIP_SRC)
tc_skb_cb(skb)->post_ct_snat = 1;
if (maniptype == NF_NAT_MANIP_DST)
tc_skb_cb(skb)->post_ct_dnat = 1;
}
-out:
return err;
 }
 #endif /* CONFIG_NF_NAT */
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCHv3 net-next 5/5] net: move the nat function to nf_nat_ovs for ovs and tc

2022-12-06 Thread Xin Long
There are two nat functions are nearly the same in both OVS and
TC code, (ovs_)ct_nat_execute() and ovs_ct_nat/tcf_ct_act_nat().

This patch creates nf_nat_ovs.c under netfilter and moves them
there then exports nf_ct_nat() so that it can be shared by both
OVS and TC, and keeps the nat (type) check and nat flag update
in OVS and TC's own place, as these parts are different between
OVS and TC.

Note that in OVS nat function it was using skb->protocol to get
the proto as it already skips vlans in key_extract(), while it
doesn't in TC, and TC has to call skb_protocol() to get proto.
So in nf_ct_nat_execute(), we keep using skb_protocol() which
works for both OVS and TC contrack.

Reviewed-by: Saeed Mahameed 
Signed-off-by: Xin Long 
---
 include/net/netfilter/nf_nat.h |   4 +
 net/netfilter/Makefile |   6 ++
 net/netfilter/nf_nat_ovs.c | 135 
 net/openvswitch/conntrack.c| 137 +++--
 net/sched/act_ct.c | 136 +++-
 5 files changed, 166 insertions(+), 252 deletions(-)
 create mode 100644 net/netfilter/nf_nat_ovs.c

diff --git a/include/net/netfilter/nf_nat.h b/include/net/netfilter/nf_nat.h
index e9eb01e99d2f..9877f064548a 100644
--- a/include/net/netfilter/nf_nat.h
+++ b/include/net/netfilter/nf_nat.h
@@ -104,6 +104,10 @@ unsigned int
 nf_nat_inet_fn(void *priv, struct sk_buff *skb,
   const struct nf_hook_state *state);
 
+int nf_ct_nat(struct sk_buff *skb, struct nf_conn *ct,
+ enum ip_conntrack_info ctinfo, int *action,
+ const struct nf_nat_range2 *range, bool commit);
+
 static inline int nf_nat_initialized(const struct nf_conn *ct,
 enum nf_nat_manip_type manip)
 {
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 1d4db1943936..0976d34b1e5f 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -54,6 +54,12 @@ obj-$(CONFIG_NF_CONNTRACK_TFTP) += nf_conntrack_tftp.o
 
 nf_nat-y   := nf_nat_core.o nf_nat_proto.o nf_nat_helper.o
 
+ifdef CONFIG_OPENVSWITCH
+nf_nat-y += nf_nat_ovs.o
+else ifdef CONFIG_NET_ACT_CT
+nf_nat-y += nf_nat_ovs.o
+endif
+
 obj-$(CONFIG_NF_LOG_SYSLOG) += nf_log_syslog.o
 
 obj-$(CONFIG_NF_NAT) += nf_nat.o
diff --git a/net/netfilter/nf_nat_ovs.c b/net/netfilter/nf_nat_ovs.c
new file mode 100644
index ..551abd2da614
--- /dev/null
+++ b/net/netfilter/nf_nat_ovs.c
@@ -0,0 +1,135 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Support nat functions for openvswitch and used by OVS and TC conntrack. */
+
+#include 
+
+/* Modelled after nf_nat_ipv[46]_fn().
+ * range is only used for new, uninitialized NAT state.
+ * Returns either NF_ACCEPT or NF_DROP.
+ */
+static int nf_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
+enum ip_conntrack_info ctinfo, int *action,
+const struct nf_nat_range2 *range,
+enum nf_nat_manip_type maniptype)
+{
+   __be16 proto = skb_protocol(skb, true);
+   int hooknum, err = NF_ACCEPT;
+
+   /* See HOOK2MANIP(). */
+   if (maniptype == NF_NAT_MANIP_SRC)
+   hooknum = NF_INET_LOCAL_IN; /* Source NAT */
+   else
+   hooknum = NF_INET_LOCAL_OUT; /* Destination NAT */
+
+   switch (ctinfo) {
+   case IP_CT_RELATED:
+   case IP_CT_RELATED_REPLY:
+   if (proto == htons(ETH_P_IP) &&
+   ip_hdr(skb)->protocol == IPPROTO_ICMP) {
+   if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
+  hooknum))
+   err = NF_DROP;
+   goto out;
+   } else if (IS_ENABLED(CONFIG_IPV6) && proto == 
htons(ETH_P_IPV6)) {
+   __be16 frag_off;
+   u8 nexthdr = ipv6_hdr(skb)->nexthdr;
+   int hdrlen = ipv6_skip_exthdr(skb,
+ sizeof(struct ipv6hdr),
+ , _off);
+
+   if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) {
+   if (!nf_nat_icmpv6_reply_translation(skb, ct,
+ctinfo,
+hooknum,
+hdrlen))
+   err = NF_DROP;
+   goto out;
+   }
+   }
+   /* Non-ICMP, fall thru to initialize if needed. */
+   fallthrough;
+   case IP_CT_NEW:
+   /* Seen it before?  This can happen for loopback, retrans,
+* or local packets.
+*/
+   i

[ovs-dev] [PATCHv3 net-next 2/5] openvswitch: return NF_ACCEPT when OVS_CT_NAT is not set in info nat

2022-12-06 Thread Xin Long
Either OVS_CT_SRC_NAT or OVS_CT_DST_NAT is set, OVS_CT_NAT must be
set in info->nat. Thus, if OVS_CT_NAT is not set in info->nat, it
will definitely not do NAT but returns NF_ACCEPT in ovs_ct_nat().

This patch changes nothing funcational but only makes this return
earlier in ovs_ct_nat() to keep consistent with TC's processing
in tcf_ct_act_nat().

Reviewed-by: Saeed Mahameed 
Acked-by: Aaron Conole 
Signed-off-by: Xin Long 
---
 net/openvswitch/conntrack.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index dff093a10d6d..5ea74270da46 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -816,6 +816,9 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key 
*key,
enum nf_nat_manip_type maniptype;
int err;
 
+   if (!(info->nat & OVS_CT_NAT))
+   return NF_ACCEPT;
+
/* Add NAT extension if not confirmed yet. */
if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct))
return NF_ACCEPT;   /* Can't NAT. */
@@ -825,8 +828,7 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key 
*key,
 * Make sure new expected connections (IP_CT_RELATED) are NATted only
 * when committing.
 */
-   if (info->nat & OVS_CT_NAT && ctinfo != IP_CT_NEW &&
-   ct->status & IPS_NAT_MASK &&
+   if (ctinfo != IP_CT_NEW && ct->status & IPS_NAT_MASK &&
(ctinfo != IP_CT_RELATED || info->commit)) {
/* NAT an established or related connection like before. */
if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY)
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCHv3 net-next 1/5] openvswitch: delete the unncessary skb_pull_rcsum call in ovs_ct_nat_execute

2022-12-06 Thread Xin Long
The calls to ovs_ct_nat_execute() are as below:

  ovs_ct_execute()
ovs_ct_lookup()
  __ovs_ct_lookup()
ovs_ct_nat()
  ovs_ct_nat_execute()
ovs_ct_commit()
  __ovs_ct_lookup()
ovs_ct_nat()
  ovs_ct_nat_execute()

and since skb_pull_rcsum() and skb_push_rcsum() are already
called in ovs_ct_execute(), there's no need to do it again
in ovs_ct_nat_execute().

Reviewed-by: Saeed Mahameed 
Acked-by: Aaron Conole 
Signed-off-by: Xin Long 
---
 net/openvswitch/conntrack.c | 17 ++---
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index d78f0fc4337d..dff093a10d6d 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -735,10 +735,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct 
nf_conn *ct,
  const struct nf_nat_range2 *range,
  enum nf_nat_manip_type maniptype, struct 
sw_flow_key *key)
 {
-   int hooknum, nh_off, err = NF_ACCEPT;
-
-   nh_off = skb_network_offset(skb);
-   skb_pull_rcsum(skb, nh_off);
+   int hooknum, err = NF_ACCEPT;
 
/* See HOOK2MANIP(). */
if (maniptype == NF_NAT_MANIP_SRC)
@@ -755,7 +752,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct 
nf_conn *ct,
if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
   hooknum))
err = NF_DROP;
-   goto push;
+   goto out;
} else if (IS_ENABLED(CONFIG_IPV6) &&
   skb->protocol == htons(ETH_P_IPV6)) {
__be16 frag_off;
@@ -770,7 +767,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct 
nf_conn *ct,
 hooknum,
 hdrlen))
err = NF_DROP;
-   goto push;
+   goto out;
}
}
/* Non-ICMP, fall thru to initialize if needed. */
@@ -788,7 +785,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct 
nf_conn *ct,
? nf_nat_setup_info(ct, range, maniptype)
: nf_nat_alloc_null_binding(ct, hooknum);
if (err != NF_ACCEPT)
-   goto push;
+   goto out;
}
break;
 
@@ -798,13 +795,11 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct 
nf_conn *ct,
 
default:
err = NF_DROP;
-   goto push;
+   goto out;
}
 
err = nf_nat_packet(ct, ctinfo, hooknum, skb);
-push:
-   skb_push_rcsum(skb, nh_off);
-
+out:
/* Update the flow key if NAT successful. */
if (err == NF_ACCEPT)
ovs_nat_update_key(key, skb, maniptype);
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCHv3 net-next 3/5] openvswitch: return NF_DROP when fails to add nat ext in ovs_ct_nat

2022-12-06 Thread Xin Long
When it fails to allocate nat ext, the packet should be dropped, like
the memory allocation failures in other places in ovs_ct_nat().

This patch changes to return NF_DROP when fails to add nat ext before
doing NAT in ovs_ct_nat(), also it would keep consistent with tc
action ct' processing in tcf_ct_act_nat().

Signed-off-by: Xin Long 
---
 net/openvswitch/conntrack.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 5ea74270da46..58c9f0edc3c4 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -821,7 +821,7 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key 
*key,
 
/* Add NAT extension if not confirmed yet. */
if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct))
-   return NF_ACCEPT;   /* Can't NAT. */
+   return NF_DROP;   /* Can't NAT. */
 
/* Determine NAT type.
 * Check if the NAT type can be deduced from the tracked connection.
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCHv3 net-next 0/5] net: eliminate the duplicate code in the ct nat functions of ovs and tc

2022-12-06 Thread Xin Long
The changes in the patchset:

  "net: add helper support in tc act_ct for ovs offloading"

had moved some common ct code used by both OVS and TC into netfilter.

There are still some big functions pretty similar defined and used in
each of OVS and TC. It is not good to maintain such big function in 2
places. This patchset is to extract the functions for NAT processing
from OVS and TC to netfilter.

To make this change clear and safe, this patchset gets the common code
out of OVS and TC step by step: The patch 1-4 make some minor changes
in OVS and TC to make the NAT code of them completely the same, then
the patch 5 moves the common code to the netfilter and exports one
function called by each of OVS and TC.

v1->v2:
  - Create nf_nat_ovs.c to include the nat functions, as Pablo suggested.
v2->v3:
  - Fix a typo in subject of patch 2/5, as Marcelo noticed.
  - Fix in openvswitch to keep OVS ct nat and TC ct nat consistent in
patch 3/5 instead of in tc, as Marcelo noticed.
  - Use BIT(var) macro instead of (1 << var) in patch 5/5, as Marcelo
suggested.
  - Use ifdef in netfilter/Makefile to build nf_nat_ovs only when OVS
or TC ct action is enabled in patch 5/5, as Marcelo suggested.

Xin Long (5):
  openvswitch: delete the unncessary skb_pull_rcsum call in
ovs_ct_nat_execute
  openvswitch: return NF_ACCEPT when OVS_CT_NAT is not set in info nat
  openvswitch: return NF_DROP when fails to add nat ext in ovs_ct_nat
  net: sched: update the nat flag for icmp error packets in
ct_nat_execute
  net: move the nat function to nf_nat_ovs for ovs and tc

 include/net/netfilter/nf_nat.h |   4 +
 net/netfilter/Makefile |   6 ++
 net/netfilter/nf_nat_ovs.c | 135 ++
 net/openvswitch/conntrack.c| 146 +++--
 net/sched/act_ct.c | 136 +++---
 5 files changed, 169 insertions(+), 258 deletions(-)
 create mode 100644 net/netfilter/nf_nat_ovs.c

-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCHv2 net-next 3/5] net: sched: return NF_ACCEPT when fails to add nat ext in tcf_ct_act_nat

2022-12-01 Thread Xin Long
On Wed, Nov 23, 2022 at 9:24 AM Marcelo Ricardo Leitner
 wrote:
>
> On Tue, Nov 22, 2022 at 12:32:19PM -0500, Xin Long wrote:
> > This patch changes to return NF_ACCEPT when fails to add nat
> > ext before doing NAT in tcf_ct_act_nat(), to keep consistent
> > with OVS' processing in ovs_ct_nat().
> >
> > Reviewed-by: Saeed Mahameed 
> > Signed-off-by: Xin Long 
> > ---
> >  net/sched/act_ct.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
> > index da0b7f665277..8869b3ef6642 100644
> > --- a/net/sched/act_ct.c
> > +++ b/net/sched/act_ct.c
> > @@ -994,7 +994,7 @@ static int tcf_ct_act_nat(struct sk_buff *skb,
> >
> >   /* Add NAT extension if not confirmed yet. */
> >   if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct))
> > - return NF_DROP;   /* Can't NAT. */
> > + return NF_ACCEPT;   /* Can't NAT. */
>
> I'm wondering if the fix should actually be in OVS, to make it drop
> the packet? Aaron, Eelco?
>
> If the user asked for NAT, and it can't NAT, it doesn't seem right to
> forward the packet while not performing the asked action.
>
> If we follow the code here, it may even commit the entry without the
> NAT extension, rendering the connection useless/broken per the first
> if condition above. It just won't try again.
nf_ct_nat_ext_add() returning NULL is caused by krealloc() failure
like an -ENOMEM error, and a similar thing could happen in
nfct_seqadj_ext_add() called by ovs_ct_nat() -> nf_nat_setup_info()
when doing NAT where it returns DROP. So I think it's right that
we should fix this in openvswitch instead of TC.

Anyway, in ovs_ct_nat():

if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct))
return NF_ACCEPT;   /* Can't NAT. */

git blame shows this was added at the beginning by:

05752523e565 ("openvswitch: Interface with NAT.")

So add Jarno Rajahalme to Cc: list.

Thanks.

>
> >
> >   if (ctinfo != IP_CT_NEW && (ct->status & IPS_NAT_MASK) &&
> >   (ctinfo != IP_CT_RELATED || commit)) {
> > --
> > 2.31.1
> >
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCHv2 net-next 5/5] net: move the nat function to nf_nat_ovs for ovs and tc

2022-12-01 Thread Xin Long
On Wed, Nov 23, 2022 at 1:52 PM Marcelo Ricardo Leitner
 wrote:
>
> On Tue, Nov 22, 2022 at 12:32:21PM -0500, Xin Long wrote:
> > --- a/net/netfilter/Makefile
> > +++ b/net/netfilter/Makefile
> > @@ -52,7 +52,7 @@ obj-$(CONFIG_NF_CONNTRACK_SANE) += nf_conntrack_sane.o
> >  obj-$(CONFIG_NF_CONNTRACK_SIP) += nf_conntrack_sip.o
> >  obj-$(CONFIG_NF_CONNTRACK_TFTP) += nf_conntrack_tftp.o
> >
> > -nf_nat-y := nf_nat_core.o nf_nat_proto.o nf_nat_helper.o
> > +nf_nat-y := nf_nat_core.o nf_nat_proto.o nf_nat_helper.o nf_nat_ovs.o
>
> Considering that the code in nf_nat_ovs is only used if ovs or act_ct
> are enabled, shouldn't it be using an invisible knob here that gets
> automatically selected by them? Pablo?
It's good to not build it if no place is using it.
Making nf_nat_ovs a module might be too much.
If it's okay to Netfilter devel, I will change to use "ifdef" in Makefile:

+ifdef CONFIG_OPENVSWITCH
+nf_nat-y += nf_nat_ovs.o
+else ifdef CONFIG_NET_ACT_CT
+nf_nat-y += nf_nat_ovs.o
+endif
+

Thanks.

>
> I think this is my last comment on this series. The rest LGTM.
>
> >
> >  obj-$(CONFIG_NF_LOG_SYSLOG) += nf_log_syslog.o
> >
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCHv2 net-next 5/5] net: move the nat function to nf_nat_ovs for ovs and tc

2022-11-23 Thread Xin Long
On Wed, Nov 23, 2022 at 4:21 PM Marcelo Ricardo Leitner
 wrote:
>
> On Wed, Nov 23, 2022 at 02:55:05PM -0500, Xin Long wrote:
> > On Wed, Nov 23, 2022 at 2:17 PM Marcelo Ricardo Leitner
> >  wrote:
> > >
> > > On Wed, Nov 23, 2022 at 01:54:41PM -0500, Xin Long wrote:
> > > > On Wed, Nov 23, 2022 at 1:48 PM Marcelo Ricardo Leitner
> > > >  wrote:
> > > > >
> > > > > On Wed, Nov 23, 2022 at 12:31:38PM -0500, Xin Long wrote:
> > > > > > On Wed, Nov 23, 2022 at 10:13 AM Marcelo Ricardo Leitner
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Wed, Nov 23, 2022 at 12:09:55PM -0300, Marcelo Ricardo Leitner 
> > > > > > > wrote:
> > > > > > > > On Tue, Nov 22, 2022 at 12:32:21PM -0500, Xin Long wrote:
> > > > > > > > > +int nf_ct_nat(struct sk_buff *skb, struct nf_conn *ct,
> > > > > > > > > + enum ip_conntrack_info ctinfo, int *action,
> > > > > > > > > + const struct nf_nat_range2 *range, bool commit)
> > > > > > > > > +{
> > > > > > > > > +   enum nf_nat_manip_type maniptype;
> > > > > > > > > +   int err, ct_action = *action;
> > > > > > > > > +
> > > > > > > > > +   *action = 0;
> > > > > > > > > +
> > > > > > > > > +   /* Add NAT extension if not confirmed yet. */
> > > > > > > > > +   if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct))
> > > > > > > > > +   return NF_ACCEPT;   /* Can't NAT. */
> > > > > > > > > +
> > > > > > > > > +   if (ctinfo != IP_CT_NEW && (ct->status & IPS_NAT_MASK) &&
> > > > > > > > > +   (ctinfo != IP_CT_RELATED || commit)) {
> > > > > > > > > +   /* NAT an established or related connection like 
> > > > > > > > > before. */
> > > > > > > > > +   if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY)
> > > > > > > > > +   /* This is the REPLY direction for a 
> > > > > > > > > connection
> > > > > > > > > +* for which NAT was applied in the 
> > > > > > > > > forward
> > > > > > > > > +* direction.  Do the reverse NAT.
> > > > > > > > > +*/
> > > > > > > > > +   maniptype = ct->status & IPS_SRC_NAT
> > > > > > > > > +   ? NF_NAT_MANIP_DST : 
> > > > > > > > > NF_NAT_MANIP_SRC;
> > > > > > > > > +   else
> > > > > > > > > +   maniptype = ct->status & IPS_SRC_NAT
> > > > > > > > > +   ? NF_NAT_MANIP_SRC : 
> > > > > > > > > NF_NAT_MANIP_DST;
> > > > > > > > > +   } else if (ct_action & (1 << NF_NAT_MANIP_SRC)) {
> > > > > > > > > +   maniptype = NF_NAT_MANIP_SRC;
> > > > > > > > > +   } else if (ct_action & (1 << NF_NAT_MANIP_DST)) {
> > > > > > > > > +   maniptype = NF_NAT_MANIP_DST;
> > > > > > > > > +   } else {
> > > > > > > > > +   return NF_ACCEPT;
> > > > > > > > > +   }
> > > > > > > > > +
> > > > > > > > > +   err = nf_ct_nat_execute(skb, ct, ctinfo, action, range, 
> > > > > > > > > maniptype);
> > > > > > > > > +   if (err == NF_ACCEPT && ct->status & IPS_DST_NAT) {
> > > > > > > > > +   if (ct->status & IPS_SRC_NAT) {
> > > > > > > > > +   if (maniptype == NF_NAT_MANIP_SRC)
> > > > > > > > > +   maniptype = NF_NAT_MANIP_DST;
> > > > > > > > > +   else
> > > > > > > > > +   maniptype = NF_NAT_MANIP_SRC;
> > > > > > > > > +
> > > > > > > > > +   err = nf_ct_n

Re: [ovs-dev] [PATCHv2 net-next 5/5] net: move the nat function to nf_nat_ovs for ovs and tc

2022-11-23 Thread Xin Long
On Wed, Nov 23, 2022 at 2:17 PM Marcelo Ricardo Leitner
 wrote:
>
> On Wed, Nov 23, 2022 at 01:54:41PM -0500, Xin Long wrote:
> > On Wed, Nov 23, 2022 at 1:48 PM Marcelo Ricardo Leitner
> >  wrote:
> > >
> > > On Wed, Nov 23, 2022 at 12:31:38PM -0500, Xin Long wrote:
> > > > On Wed, Nov 23, 2022 at 10:13 AM Marcelo Ricardo Leitner
> > > >  wrote:
> > > > >
> > > > > On Wed, Nov 23, 2022 at 12:09:55PM -0300, Marcelo Ricardo Leitner 
> > > > > wrote:
> > > > > > On Tue, Nov 22, 2022 at 12:32:21PM -0500, Xin Long wrote:
> > > > > > > +int nf_ct_nat(struct sk_buff *skb, struct nf_conn *ct,
> > > > > > > + enum ip_conntrack_info ctinfo, int *action,
> > > > > > > + const struct nf_nat_range2 *range, bool commit)
> > > > > > > +{
> > > > > > > +   enum nf_nat_manip_type maniptype;
> > > > > > > +   int err, ct_action = *action;
> > > > > > > +
> > > > > > > +   *action = 0;
> > > > > > > +
> > > > > > > +   /* Add NAT extension if not confirmed yet. */
> > > > > > > +   if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct))
> > > > > > > +   return NF_ACCEPT;   /* Can't NAT. */
> > > > > > > +
> > > > > > > +   if (ctinfo != IP_CT_NEW && (ct->status & IPS_NAT_MASK) &&
> > > > > > > +   (ctinfo != IP_CT_RELATED || commit)) {
> > > > > > > +   /* NAT an established or related connection like 
> > > > > > > before. */
> > > > > > > +   if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY)
> > > > > > > +   /* This is the REPLY direction for a 
> > > > > > > connection
> > > > > > > +* for which NAT was applied in the forward
> > > > > > > +* direction.  Do the reverse NAT.
> > > > > > > +*/
> > > > > > > +   maniptype = ct->status & IPS_SRC_NAT
> > > > > > > +   ? NF_NAT_MANIP_DST : NF_NAT_MANIP_SRC;
> > > > > > > +   else
> > > > > > > +   maniptype = ct->status & IPS_SRC_NAT
> > > > > > > +   ? NF_NAT_MANIP_SRC : NF_NAT_MANIP_DST;
> > > > > > > +   } else if (ct_action & (1 << NF_NAT_MANIP_SRC)) {
> > > > > > > +   maniptype = NF_NAT_MANIP_SRC;
> > > > > > > +   } else if (ct_action & (1 << NF_NAT_MANIP_DST)) {
> > > > > > > +   maniptype = NF_NAT_MANIP_DST;
> > > > > > > +   } else {
> > > > > > > +   return NF_ACCEPT;
> > > > > > > +   }
> > > > > > > +
> > > > > > > +   err = nf_ct_nat_execute(skb, ct, ctinfo, action, range, 
> > > > > > > maniptype);
> > > > > > > +   if (err == NF_ACCEPT && ct->status & IPS_DST_NAT) {
> > > > > > > +   if (ct->status & IPS_SRC_NAT) {
> > > > > > > +   if (maniptype == NF_NAT_MANIP_SRC)
> > > > > > > +   maniptype = NF_NAT_MANIP_DST;
> > > > > > > +   else
> > > > > > > +   maniptype = NF_NAT_MANIP_SRC;
> > > > > > > +
> > > > > > > +   err = nf_ct_nat_execute(skb, ct, ctinfo, 
> > > > > > > action, range,
> > > > > > > +   maniptype);
> > > > > > > +   } else if (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) {
> > > > > > > +   err = nf_ct_nat_execute(skb, ct, ctinfo, 
> > > > > > > action, NULL,
> > > > > > > +   NF_NAT_MANIP_SRC);
> > > > > > > +   }
> > > > > > > +   }
> > > > > > > +   return err;
> > > > > > > +}
> > > > > > > +EXPORT_SYMBOL_GPL(nf_ct_nat);
> > > > > > > diff --git a/net/openvswitch/conntrack.c 
> > > > 

Re: [ovs-dev] [PATCHv2 net-next 5/5] net: move the nat function to nf_nat_ovs for ovs and tc

2022-11-23 Thread Xin Long
On Wed, Nov 23, 2022 at 1:48 PM Marcelo Ricardo Leitner
 wrote:
>
> On Wed, Nov 23, 2022 at 12:31:38PM -0500, Xin Long wrote:
> > On Wed, Nov 23, 2022 at 10:13 AM Marcelo Ricardo Leitner
> >  wrote:
> > >
> > > On Wed, Nov 23, 2022 at 12:09:55PM -0300, Marcelo Ricardo Leitner wrote:
> > > > On Tue, Nov 22, 2022 at 12:32:21PM -0500, Xin Long wrote:
> > > > > +int nf_ct_nat(struct sk_buff *skb, struct nf_conn *ct,
> > > > > + enum ip_conntrack_info ctinfo, int *action,
> > > > > + const struct nf_nat_range2 *range, bool commit)
> > > > > +{
> > > > > +   enum nf_nat_manip_type maniptype;
> > > > > +   int err, ct_action = *action;
> > > > > +
> > > > > +   *action = 0;
> > > > > +
> > > > > +   /* Add NAT extension if not confirmed yet. */
> > > > > +   if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct))
> > > > > +   return NF_ACCEPT;   /* Can't NAT. */
> > > > > +
> > > > > +   if (ctinfo != IP_CT_NEW && (ct->status & IPS_NAT_MASK) &&
> > > > > +   (ctinfo != IP_CT_RELATED || commit)) {
> > > > > +   /* NAT an established or related connection like before. 
> > > > > */
> > > > > +   if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY)
> > > > > +   /* This is the REPLY direction for a connection
> > > > > +* for which NAT was applied in the forward
> > > > > +* direction.  Do the reverse NAT.
> > > > > +*/
> > > > > +   maniptype = ct->status & IPS_SRC_NAT
> > > > > +   ? NF_NAT_MANIP_DST : NF_NAT_MANIP_SRC;
> > > > > +   else
> > > > > +   maniptype = ct->status & IPS_SRC_NAT
> > > > > +   ? NF_NAT_MANIP_SRC : NF_NAT_MANIP_DST;
> > > > > +   } else if (ct_action & (1 << NF_NAT_MANIP_SRC)) {
> > > > > +   maniptype = NF_NAT_MANIP_SRC;
> > > > > +   } else if (ct_action & (1 << NF_NAT_MANIP_DST)) {
> > > > > +   maniptype = NF_NAT_MANIP_DST;
> > > > > +   } else {
> > > > > +   return NF_ACCEPT;
> > > > > +   }
> > > > > +
> > > > > +   err = nf_ct_nat_execute(skb, ct, ctinfo, action, range, 
> > > > > maniptype);
> > > > > +   if (err == NF_ACCEPT && ct->status & IPS_DST_NAT) {
> > > > > +   if (ct->status & IPS_SRC_NAT) {
> > > > > +   if (maniptype == NF_NAT_MANIP_SRC)
> > > > > +   maniptype = NF_NAT_MANIP_DST;
> > > > > +   else
> > > > > +   maniptype = NF_NAT_MANIP_SRC;
> > > > > +
> > > > > +   err = nf_ct_nat_execute(skb, ct, ctinfo, action, 
> > > > > range,
> > > > > +   maniptype);
> > > > > +   } else if (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) {
> > > > > +   err = nf_ct_nat_execute(skb, ct, ctinfo, action, 
> > > > > NULL,
> > > > > +   NF_NAT_MANIP_SRC);
> > > > > +   }
> > > > > +   }
> > > > > +   return err;
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(nf_ct_nat);
> > > > > diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> > > > > index cc643a556ea1..d03c75165663 100644
> > > > > --- a/net/openvswitch/conntrack.c
> > > > > +++ b/net/openvswitch/conntrack.c
> > > > > @@ -726,144 +726,27 @@ static void ovs_nat_update_key(struct 
> > > > > sw_flow_key *key,
> > > > > }
> > > > >  }
> > > > >
> > > > > -/* Modelled after nf_nat_ipv[46]_fn().
> > > > > - * range is only used for new, uninitialized NAT state.
> > > > > - * Returns either NF_ACCEPT or NF_DROP.
> > > > > - */
> > > > > -static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn 
> > > > > *ct,
> > > > > - enum ip_conntrack_info ctinfo,
> &g

Re: [ovs-dev] [PATCHv2 net-next 5/5] net: move the nat function to nf_nat_ovs for ovs and tc

2022-11-23 Thread Xin Long
On Wed, Nov 23, 2022 at 10:13 AM Marcelo Ricardo Leitner
 wrote:
>
> On Wed, Nov 23, 2022 at 12:09:55PM -0300, Marcelo Ricardo Leitner wrote:
> > On Tue, Nov 22, 2022 at 12:32:21PM -0500, Xin Long wrote:
> > > +int nf_ct_nat(struct sk_buff *skb, struct nf_conn *ct,
> > > + enum ip_conntrack_info ctinfo, int *action,
> > > + const struct nf_nat_range2 *range, bool commit)
> > > +{
> > > +   enum nf_nat_manip_type maniptype;
> > > +   int err, ct_action = *action;
> > > +
> > > +   *action = 0;
> > > +
> > > +   /* Add NAT extension if not confirmed yet. */
> > > +   if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct))
> > > +   return NF_ACCEPT;   /* Can't NAT. */
> > > +
> > > +   if (ctinfo != IP_CT_NEW && (ct->status & IPS_NAT_MASK) &&
> > > +   (ctinfo != IP_CT_RELATED || commit)) {
> > > +   /* NAT an established or related connection like before. */
> > > +   if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY)
> > > +   /* This is the REPLY direction for a connection
> > > +* for which NAT was applied in the forward
> > > +* direction.  Do the reverse NAT.
> > > +*/
> > > +   maniptype = ct->status & IPS_SRC_NAT
> > > +   ? NF_NAT_MANIP_DST : NF_NAT_MANIP_SRC;
> > > +   else
> > > +   maniptype = ct->status & IPS_SRC_NAT
> > > +   ? NF_NAT_MANIP_SRC : NF_NAT_MANIP_DST;
> > > +   } else if (ct_action & (1 << NF_NAT_MANIP_SRC)) {
> > > +   maniptype = NF_NAT_MANIP_SRC;
> > > +   } else if (ct_action & (1 << NF_NAT_MANIP_DST)) {
> > > +   maniptype = NF_NAT_MANIP_DST;
> > > +   } else {
> > > +   return NF_ACCEPT;
> > > +   }
> > > +
> > > +   err = nf_ct_nat_execute(skb, ct, ctinfo, action, range, maniptype);
> > > +   if (err == NF_ACCEPT && ct->status & IPS_DST_NAT) {
> > > +   if (ct->status & IPS_SRC_NAT) {
> > > +   if (maniptype == NF_NAT_MANIP_SRC)
> > > +   maniptype = NF_NAT_MANIP_DST;
> > > +   else
> > > +   maniptype = NF_NAT_MANIP_SRC;
> > > +
> > > +   err = nf_ct_nat_execute(skb, ct, ctinfo, action, 
> > > range,
> > > +   maniptype);
> > > +   } else if (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) {
> > > +   err = nf_ct_nat_execute(skb, ct, ctinfo, action, NULL,
> > > +   NF_NAT_MANIP_SRC);
> > > +   }
> > > +   }
> > > +   return err;
> > > +}
> > > +EXPORT_SYMBOL_GPL(nf_ct_nat);
> > > diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> > > index cc643a556ea1..d03c75165663 100644
> > > --- a/net/openvswitch/conntrack.c
> > > +++ b/net/openvswitch/conntrack.c
> > > @@ -726,144 +726,27 @@ static void ovs_nat_update_key(struct sw_flow_key 
> > > *key,
> > > }
> > >  }
> > >
> > > -/* Modelled after nf_nat_ipv[46]_fn().
> > > - * range is only used for new, uninitialized NAT state.
> > > - * Returns either NF_ACCEPT or NF_DROP.
> > > - */
> > > -static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
> > > - enum ip_conntrack_info ctinfo,
> > > - const struct nf_nat_range2 *range,
> > > - enum nf_nat_manip_type maniptype, struct 
> > > sw_flow_key *key)
> > > -{
> > > -   int hooknum, err = NF_ACCEPT;
> > > -
> > > -   /* See HOOK2MANIP(). */
> > > -   if (maniptype == NF_NAT_MANIP_SRC)
> > > -   hooknum = NF_INET_LOCAL_IN; /* Source NAT */
> > > -   else
> > > -   hooknum = NF_INET_LOCAL_OUT; /* Destination NAT */
> > > -
> > > -   switch (ctinfo) {
> > > -   case IP_CT_RELATED:
> > > -   case IP_CT_RELATED_REPLY:
> > > -   if (IS_ENABLED(CONFIG_NF_NAT) &&
> > > -   skb->protocol == htons(ETH_P_IP) &&
> > > -   ip_hdr(skb)->protocol == IPPROTO_ICMP) {
> > >

[ovs-dev] [PATCHv2 net-next 5/5] net: move the nat function to nf_nat_ovs for ovs and tc

2022-11-22 Thread Xin Long
There are two nat functions are nearly the same in both OVS and
TC code, (ovs_)ct_nat_execute() and ovs_ct_nat/tcf_ct_act_nat().

This patch creates nf_nat_ovs.c under netfilter and moves them
there then exports nf_ct_nat() so that it can be shared by both
OVS and TC, and keeps the nat (type) check and nat flag update
in OVS and TC's own place, as these parts are different between
OVS and TC.

Note that in OVS nat function it was using skb->protocol to get
the proto as it already skips vlans in key_extract(), while it
doesn't in TC, and TC has to call skb_protocol() to get proto.
So in nf_ct_nat_execute(), we keep using skb_protocol() which
works for both OVS and TC contrack.

Reviewed-by: Saeed Mahameed 
Signed-off-by: Xin Long 
---
 include/net/netfilter/nf_nat.h |   4 +
 net/netfilter/Makefile |   2 +-
 net/netfilter/nf_nat_ovs.c | 135 
 net/openvswitch/conntrack.c| 137 +++--
 net/sched/act_ct.c | 136 +++-
 5 files changed, 161 insertions(+), 253 deletions(-)
 create mode 100644 net/netfilter/nf_nat_ovs.c

diff --git a/include/net/netfilter/nf_nat.h b/include/net/netfilter/nf_nat.h
index e9eb01e99d2f..9877f064548a 100644
--- a/include/net/netfilter/nf_nat.h
+++ b/include/net/netfilter/nf_nat.h
@@ -104,6 +104,10 @@ unsigned int
 nf_nat_inet_fn(void *priv, struct sk_buff *skb,
   const struct nf_hook_state *state);
 
+int nf_ct_nat(struct sk_buff *skb, struct nf_conn *ct,
+ enum ip_conntrack_info ctinfo, int *action,
+ const struct nf_nat_range2 *range, bool commit);
+
 static inline int nf_nat_initialized(const struct nf_conn *ct,
 enum nf_nat_manip_type manip)
 {
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 1d4db1943936..4fa50d2842ec 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -52,7 +52,7 @@ obj-$(CONFIG_NF_CONNTRACK_SANE) += nf_conntrack_sane.o
 obj-$(CONFIG_NF_CONNTRACK_SIP) += nf_conntrack_sip.o
 obj-$(CONFIG_NF_CONNTRACK_TFTP) += nf_conntrack_tftp.o
 
-nf_nat-y   := nf_nat_core.o nf_nat_proto.o nf_nat_helper.o
+nf_nat-y   := nf_nat_core.o nf_nat_proto.o nf_nat_helper.o nf_nat_ovs.o
 
 obj-$(CONFIG_NF_LOG_SYSLOG) += nf_log_syslog.o
 
diff --git a/net/netfilter/nf_nat_ovs.c b/net/netfilter/nf_nat_ovs.c
new file mode 100644
index ..daff80e7a43a
--- /dev/null
+++ b/net/netfilter/nf_nat_ovs.c
@@ -0,0 +1,135 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Support nat functions for openvswitch and used by OVS and TC conntrack. */
+
+#include 
+
+/* Modelled after nf_nat_ipv[46]_fn().
+ * range is only used for new, uninitialized NAT state.
+ * Returns either NF_ACCEPT or NF_DROP.
+ */
+static int nf_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
+enum ip_conntrack_info ctinfo, int *action,
+const struct nf_nat_range2 *range,
+enum nf_nat_manip_type maniptype)
+{
+   __be16 proto = skb_protocol(skb, true);
+   int hooknum, err = NF_ACCEPT;
+
+   /* See HOOK2MANIP(). */
+   if (maniptype == NF_NAT_MANIP_SRC)
+   hooknum = NF_INET_LOCAL_IN; /* Source NAT */
+   else
+   hooknum = NF_INET_LOCAL_OUT; /* Destination NAT */
+
+   switch (ctinfo) {
+   case IP_CT_RELATED:
+   case IP_CT_RELATED_REPLY:
+   if (proto == htons(ETH_P_IP) &&
+   ip_hdr(skb)->protocol == IPPROTO_ICMP) {
+   if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
+  hooknum))
+   err = NF_DROP;
+   goto out;
+   } else if (IS_ENABLED(CONFIG_IPV6) && proto == 
htons(ETH_P_IPV6)) {
+   __be16 frag_off;
+   u8 nexthdr = ipv6_hdr(skb)->nexthdr;
+   int hdrlen = ipv6_skip_exthdr(skb,
+ sizeof(struct ipv6hdr),
+ , _off);
+
+   if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) {
+   if (!nf_nat_icmpv6_reply_translation(skb, ct,
+ctinfo,
+hooknum,
+hdrlen))
+   err = NF_DROP;
+   goto out;
+   }
+   }
+   /* Non-ICMP, fall thru to initialize if needed. */
+   fallthrough;
+   case IP_CT_NEW:
+   /* Seen it before?  This can happen for loopback, retrans,
+* or local packets.
+*/
+ 

[ovs-dev] [PATCHv2 net-next 4/5] net: sched: update the nat flag for icmp error packets in ct_nat_execute

2022-11-22 Thread Xin Long
In ovs_ct_nat_execute(), the packet flow key nat flags are updated
when it processes ICMP(v6) error packets translation successfully.

In ct_nat_execute() when processing ICMP(v6) error packets translation
successfully, it should have done the same in ct_nat_execute() to set
post_ct_s/dnat flag, which will be used to update flow key nat flags
in OVS module later.

Reviewed-by: Saeed Mahameed 
Signed-off-by: Xin Long 
---
 net/sched/act_ct.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index 8869b3ef6642..c7782c9a6ab6 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -936,13 +936,13 @@ static int ct_nat_execute(struct sk_buff *skb, struct 
nf_conn *ct,
}
 
err = nf_nat_packet(ct, ctinfo, hooknum, skb);
+out:
if (err == NF_ACCEPT) {
if (maniptype == NF_NAT_MANIP_SRC)
tc_skb_cb(skb)->post_ct_snat = 1;
if (maniptype == NF_NAT_MANIP_DST)
tc_skb_cb(skb)->post_ct_dnat = 1;
}
-out:
return err;
 }
 #endif /* CONFIG_NF_NAT */
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCHv2 net-next 3/5] net: sched: return NF_ACCEPT when fails to add nat ext in tcf_ct_act_nat

2022-11-22 Thread Xin Long
This patch changes to return NF_ACCEPT when fails to add nat
ext before doing NAT in tcf_ct_act_nat(), to keep consistent
with OVS' processing in ovs_ct_nat().

Reviewed-by: Saeed Mahameed 
Signed-off-by: Xin Long 
---
 net/sched/act_ct.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index da0b7f665277..8869b3ef6642 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -994,7 +994,7 @@ static int tcf_ct_act_nat(struct sk_buff *skb,
 
/* Add NAT extension if not confirmed yet. */
if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct))
-   return NF_DROP;   /* Can't NAT. */
+   return NF_ACCEPT;   /* Can't NAT. */
 
if (ctinfo != IP_CT_NEW && (ct->status & IPS_NAT_MASK) &&
(ctinfo != IP_CT_RELATED || commit)) {
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCHv2 net-next 2/5] openvswitch: return NF_ACCEPT when OVS_CT_NAT is net set in info nat

2022-11-22 Thread Xin Long
Either OVS_CT_SRC_NAT or OVS_CT_DST_NAT is set, OVS_CT_NAT must be
set in info->nat. Thus, if OVS_CT_NAT is not set in info->nat, it
will definitely not do NAT but returns NF_ACCEPT in ovs_ct_nat().

This patch changes nothing funcational but only makes this return
earlier in ovs_ct_nat() to keep consistent with TC's processing
in tcf_ct_act_nat().

Reviewed-by: Saeed Mahameed 
Acked-by: Aaron Conole 
Signed-off-by: Xin Long 
---
 net/openvswitch/conntrack.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 4c5e5a6475af..cc643a556ea1 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -816,6 +816,9 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key 
*key,
enum nf_nat_manip_type maniptype;
int err;
 
+   if (!(info->nat & OVS_CT_NAT))
+   return NF_ACCEPT;
+
/* Add NAT extension if not confirmed yet. */
if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct))
return NF_ACCEPT;   /* Can't NAT. */
@@ -825,8 +828,7 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key 
*key,
 * Make sure new expected connections (IP_CT_RELATED) are NATted only
 * when committing.
 */
-   if (info->nat & OVS_CT_NAT && ctinfo != IP_CT_NEW &&
-   ct->status & IPS_NAT_MASK &&
+   if (ctinfo != IP_CT_NEW && ct->status & IPS_NAT_MASK &&
(ctinfo != IP_CT_RELATED || info->commit)) {
/* NAT an established or related connection like before. */
if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY)
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCHv2 net-next 1/5] openvswitch: delete the unncessary skb_pull_rcsum call in ovs_ct_nat_execute

2022-11-22 Thread Xin Long
The calls to ovs_ct_nat_execute() are as below:

  ovs_ct_execute()
ovs_ct_lookup()
  __ovs_ct_lookup()
ovs_ct_nat()
  ovs_ct_nat_execute()
ovs_ct_commit()
  __ovs_ct_lookup()
ovs_ct_nat()
  ovs_ct_nat_execute()

and since skb_pull_rcsum() and skb_push_rcsum() are already
called in ovs_ct_execute(), there's no need to do it again
in ovs_ct_nat_execute().

Reviewed-by: Saeed Mahameed 
Acked-by: Aaron Conole 
Signed-off-by: Xin Long 
---
 net/openvswitch/conntrack.c | 17 ++---
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 4348321856af..4c5e5a6475af 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -735,10 +735,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct 
nf_conn *ct,
  const struct nf_nat_range2 *range,
  enum nf_nat_manip_type maniptype, struct 
sw_flow_key *key)
 {
-   int hooknum, nh_off, err = NF_ACCEPT;
-
-   nh_off = skb_network_offset(skb);
-   skb_pull_rcsum(skb, nh_off);
+   int hooknum, err = NF_ACCEPT;
 
/* See HOOK2MANIP(). */
if (maniptype == NF_NAT_MANIP_SRC)
@@ -755,7 +752,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct 
nf_conn *ct,
if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
   hooknum))
err = NF_DROP;
-   goto push;
+   goto out;
} else if (IS_ENABLED(CONFIG_IPV6) &&
   skb->protocol == htons(ETH_P_IPV6)) {
__be16 frag_off;
@@ -770,7 +767,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct 
nf_conn *ct,
 hooknum,
 hdrlen))
err = NF_DROP;
-   goto push;
+   goto out;
}
}
/* Non-ICMP, fall thru to initialize if needed. */
@@ -788,7 +785,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct 
nf_conn *ct,
? nf_nat_setup_info(ct, range, maniptype)
: nf_nat_alloc_null_binding(ct, hooknum);
if (err != NF_ACCEPT)
-   goto push;
+   goto out;
}
break;
 
@@ -798,13 +795,11 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct 
nf_conn *ct,
 
default:
err = NF_DROP;
-   goto push;
+   goto out;
}
 
err = nf_nat_packet(ct, ctinfo, hooknum, skb);
-push:
-   skb_push_rcsum(skb, nh_off);
-
+out:
/* Update the flow key if NAT successful. */
if (err == NF_ACCEPT)
ovs_nat_update_key(key, skb, maniptype);
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCHv2 net-next 0/5] net: eliminate the duplicate code in the ct nat functions of ovs and tc

2022-11-22 Thread Xin Long
The changes in the patchset:

  "net: add helper support in tc act_ct for ovs offloading"

had moved some common ct code used by both OVS and TC into netfilter.

There are still some big functions pretty similar defined and used in
each of OVS and TC. It is not good to maintain such big function in 2
places. This patchset is to extract the functions for NAT processing
from OVS and TC to netfilter.

To make this change clear and safe, this patchset gets the common code
out of OVS and TC step by step: The patch 1-4 make some minor changes
in OVS and TC to make the NAT code of them completely the same, then
the patch 5 moves the common code to the netfilter and exports one
function called by each of OVS and TC.

v1->v2:
  - Create nf_nat_ovs.c to include the nat functions, as Pablo suggested.

Xin Long (5):
  openvswitch: delete the unncessary skb_pull_rcsum call in
ovs_ct_nat_execute
  openvswitch: return NF_ACCEPT when OVS_CT_NAT is net set in info nat
  net: sched: return NF_ACCEPT when fails to add nat ext in
tcf_ct_act_nat
  net: sched: update the nat flag for icmp error packets in
ct_nat_execute
  net: move the nat function to nf_nat_ovs for ovs and tc

 include/net/netfilter/nf_nat.h |   4 +
 net/netfilter/Makefile |   2 +-
 net/netfilter/nf_nat_ovs.c | 135 ++
 net/openvswitch/conntrack.c| 146 +++--
 net/sched/act_ct.c | 136 +++---
 5 files changed, 164 insertions(+), 259 deletions(-)
 create mode 100644 net/netfilter/nf_nat_ovs.c

-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next 5/5] net: move the nat function to nf_nat_core for ovs and tc

2022-11-17 Thread Xin Long
On Thu, Nov 17, 2022 at 5:57 AM Pablo Neira Ayuso  wrote:
>
> On Wed, Nov 16, 2022 at 07:51:40PM -0500, Xin Long wrote:
> > On Wed, Nov 16, 2022 at 4:54 PM Pablo Neira Ayuso  
> > wrote:
> > > On Tue, Nov 15, 2022 at 10:50:57AM -0500, Xin Long wrote:
> [...]
> > > I'd suggest you move this code to nf_nat_ovs.c or such so we remember
> > > these symbols are used by act_ct.c and ovs.
> >
> > Good idea, do you think we should also create nf_conntrack_ovs.c
> > to have nf_ct_helper() and nf_ct_add_helper()?
> > which were added by:
> >
> > https://lore.kernel.org/netdev/20221101150031.a6rtrgzwfd7kzknn@t14s.localdomain/T/
>
> If it is used by ovs infra, I would suggest to move there too.
OK, I will create only "nf_conntrack_ovs.c" to have the nat functions
in this patch.
and post another patch to move nf_ct_helper() and nf_ct_add_helper() there, too.

I think one file "nf_conntrack_ovs.c" should be enough to include all
these functions
used by ovs conntrack infra.

Thanks.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next 5/5] net: move the nat function to nf_nat_core for ovs and tc

2022-11-16 Thread Xin Long
On Wed, Nov 16, 2022 at 4:54 PM Pablo Neira Ayuso  wrote:
>
> On Tue, Nov 15, 2022 at 10:50:57AM -0500, Xin Long wrote:
> > diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
> > index e29e4ccb5c5a..1c72b8caa24e 100644
> > --- a/net/netfilter/nf_nat_core.c
> > +++ b/net/netfilter/nf_nat_core.c
> > @@ -784,6 +784,137 @@ nf_nat_inet_fn(void *priv, struct sk_buff *skb,
> >  }
> >  EXPORT_SYMBOL_GPL(nf_nat_inet_fn);
> >
> > +/* Modelled after nf_nat_ipv[46]_fn().
> > + * range is only used for new, uninitialized NAT state.
> > + * Returns either NF_ACCEPT or NF_DROP.
> > + */
> > +static int nf_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
> > +  enum ip_conntrack_info ctinfo, int *action,
> > +  const struct nf_nat_range2 *range,
> > +  enum nf_nat_manip_type maniptype)
> > +{
> > + __be16 proto = skb_protocol(skb, true);
> > + int hooknum, err = NF_ACCEPT;
> > +
> > + /* See HOOK2MANIP(). */
> > + if (maniptype == NF_NAT_MANIP_SRC)
> > + hooknum = NF_INET_LOCAL_IN; /* Source NAT */
> > + else
> > + hooknum = NF_INET_LOCAL_OUT; /* Destination NAT */
> > +
> > + switch (ctinfo) {
> > + case IP_CT_RELATED:
> > + case IP_CT_RELATED_REPLY:
> > + if (proto == htons(ETH_P_IP) &&
> > + ip_hdr(skb)->protocol == IPPROTO_ICMP) {
> > + if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
> > +hooknum))
> > + err = NF_DROP;
> > + goto out;
> > + } else if (IS_ENABLED(CONFIG_IPV6) && proto == 
> > htons(ETH_P_IPV6)) {
> > + __be16 frag_off;
> > + u8 nexthdr = ipv6_hdr(skb)->nexthdr;
> > + int hdrlen = ipv6_skip_exthdr(skb,
> > +   sizeof(struct ipv6hdr),
> > +   , _off);
> > +
> > + if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) {
> > + if (!nf_nat_icmpv6_reply_translation(skb, ct,
> > +  ctinfo,
> > +  hooknum,
> > +  hdrlen))
> > + err = NF_DROP;
> > + goto out;
> > + }
> > + }
> > + /* Non-ICMP, fall thru to initialize if needed. */
> > + fallthrough;
> > + case IP_CT_NEW:
> > + /* Seen it before?  This can happen for loopback, retrans,
> > +  * or local packets.
> > +  */
> > + if (!nf_nat_initialized(ct, maniptype)) {
> > + /* Initialize according to the NAT action. */
> > + err = (range && range->flags & NF_NAT_RANGE_MAP_IPS)
> > + /* Action is set up to establish a new
> > +  * mapping.
> > +  */
> > + ? nf_nat_setup_info(ct, range, maniptype)
> > + : nf_nat_alloc_null_binding(ct, hooknum);
> > + if (err != NF_ACCEPT)
> > + goto out;
> > + }
> > + break;
> > +
> > + case IP_CT_ESTABLISHED:
> > + case IP_CT_ESTABLISHED_REPLY:
> > + break;
> > +
> > + default:
> > + err = NF_DROP;
> > + goto out;
> > + }
> > +
> > + err = nf_nat_packet(ct, ctinfo, hooknum, skb);
> > + if (err == NF_ACCEPT)
> > + *action |= (1 << maniptype);
> > +out:
> > + return err;
> > +}
> > +
> > +int nf_ct_nat(struct sk_buff *skb, struct nf_conn *ct,
> > +   enum ip_conntrack_info ctinfo, int *action,
> > +   const struct nf_nat_range2 *range, bool commit)
> > +{
> > + enum nf_nat_manip_type maniptype;
> > + int err, ct_action = *action;
> > +
> > + *action = 0;
> > +
> > + /* Add NAT extension if not confirmed yet. */
> > + if (!nf_ct_is_confirmed(c

Re: [ovs-dev] [PATCH net-next 5/5] net: move the nat function to nf_nat_core for ovs and tc

2022-11-16 Thread Xin Long
On Wed, Nov 16, 2022 at 4:05 PM Aaron Conole  wrote:
>
> Hi Xin,
>
> Xin Long  writes:
>
> > There are two nat functions are nearly the same in both OVS and
> > TC code, (ovs_)ct_nat_execute() and ovs_ct_nat/tcf_ct_act_nat().
> >
> > This patch is to move them to netfilter nf_nat_core and export
> > nf_ct_nat() so that it can be shared by both OVS and TC, and
> > keep the nat (type) check and nat flag update in OVS and TC's
> > own place, as these parts are different between OVS and TC.
> >
> > Note that in OVS nat function it was using skb->protocol to get
> > the proto as it already skips vlans in key_extract(), while it
> > doesn't in TC, and TC has to call skb_protocol() to get proto.
> > So in nf_ct_nat_execute(), we keep using skb_protocol() which
> > works for both OVS and TC.
> >
> > Signed-off-by: Xin Long 
> > ---
> >  include/net/netfilter/nf_nat.h |   4 +
> >  net/netfilter/nf_nat_core.c| 131 +++
> >  net/openvswitch/conntrack.c| 137 +++--
> >  net/sched/act_ct.c | 136 +++-
> >  4 files changed, 156 insertions(+), 252 deletions(-)
> >
> > diff --git a/include/net/netfilter/nf_nat.h b/include/net/netfilter/nf_nat.h
> > index e9eb01e99d2f..9877f064548a 100644
> > --- a/include/net/netfilter/nf_nat.h
> > +++ b/include/net/netfilter/nf_nat.h
> > @@ -104,6 +104,10 @@ unsigned int
> >  nf_nat_inet_fn(void *priv, struct sk_buff *skb,
> >  const struct nf_hook_state *state);
> >
> > +int nf_ct_nat(struct sk_buff *skb, struct nf_conn *ct,
> > +   enum ip_conntrack_info ctinfo, int *action,
> > +   const struct nf_nat_range2 *range, bool commit);
> > +
> >  static inline int nf_nat_initialized(const struct nf_conn *ct,
> >enum nf_nat_manip_type manip)
> >  {
> > diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
> > index e29e4ccb5c5a..1c72b8caa24e 100644
> > --- a/net/netfilter/nf_nat_core.c
> > +++ b/net/netfilter/nf_nat_core.c
> > @@ -784,6 +784,137 @@ nf_nat_inet_fn(void *priv, struct sk_buff *skb,
> >  }
> >  EXPORT_SYMBOL_GPL(nf_nat_inet_fn);
> >
> > +/* Modelled after nf_nat_ipv[46]_fn().
> > + * range is only used for new, uninitialized NAT state.
> > + * Returns either NF_ACCEPT or NF_DROP.
> > + */
> > +static int nf_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
> > +  enum ip_conntrack_info ctinfo, int *action,
> > +  const struct nf_nat_range2 *range,
> > +  enum nf_nat_manip_type maniptype)
> > +{
> > + __be16 proto = skb_protocol(skb, true);
> > + int hooknum, err = NF_ACCEPT;
> > +
> > + /* See HOOK2MANIP(). */
> > + if (maniptype == NF_NAT_MANIP_SRC)
> > + hooknum = NF_INET_LOCAL_IN; /* Source NAT */
> > + else
> > + hooknum = NF_INET_LOCAL_OUT; /* Destination NAT */
> > +
> > + switch (ctinfo) {
> > + case IP_CT_RELATED:
> > + case IP_CT_RELATED_REPLY:
> > + if (proto == htons(ETH_P_IP) &&
> > + ip_hdr(skb)->protocol == IPPROTO_ICMP) {
> > + if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
> > +hooknum))
> > + err = NF_DROP;
> > + goto out;
> > + } else if (IS_ENABLED(CONFIG_IPV6) && proto == 
> > htons(ETH_P_IPV6)) {
> > + __be16 frag_off;
> > + u8 nexthdr = ipv6_hdr(skb)->nexthdr;
> > + int hdrlen = ipv6_skip_exthdr(skb,
> > +   sizeof(struct ipv6hdr),
> > +   , _off);
> > +
> > + if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) {
> > + if (!nf_nat_icmpv6_reply_translation(skb, ct,
> > +  ctinfo,
> > +  hooknum,
> > +  hdrlen))
> > + err = NF_DROP;
> > + goto out;
> > + }
> > + }
> &g

[ovs-dev] [PATCH net-next 5/5] net: move the nat function to nf_nat_core for ovs and tc

2022-11-15 Thread Xin Long
There are two nat functions are nearly the same in both OVS and
TC code, (ovs_)ct_nat_execute() and ovs_ct_nat/tcf_ct_act_nat().

This patch is to move them to netfilter nf_nat_core and export
nf_ct_nat() so that it can be shared by both OVS and TC, and
keep the nat (type) check and nat flag update in OVS and TC's
own place, as these parts are different between OVS and TC.

Note that in OVS nat function it was using skb->protocol to get
the proto as it already skips vlans in key_extract(), while it
doesn't in TC, and TC has to call skb_protocol() to get proto.
So in nf_ct_nat_execute(), we keep using skb_protocol() which
works for both OVS and TC.

Signed-off-by: Xin Long 
---
 include/net/netfilter/nf_nat.h |   4 +
 net/netfilter/nf_nat_core.c| 131 +++
 net/openvswitch/conntrack.c| 137 +++--
 net/sched/act_ct.c | 136 +++-
 4 files changed, 156 insertions(+), 252 deletions(-)

diff --git a/include/net/netfilter/nf_nat.h b/include/net/netfilter/nf_nat.h
index e9eb01e99d2f..9877f064548a 100644
--- a/include/net/netfilter/nf_nat.h
+++ b/include/net/netfilter/nf_nat.h
@@ -104,6 +104,10 @@ unsigned int
 nf_nat_inet_fn(void *priv, struct sk_buff *skb,
   const struct nf_hook_state *state);
 
+int nf_ct_nat(struct sk_buff *skb, struct nf_conn *ct,
+ enum ip_conntrack_info ctinfo, int *action,
+ const struct nf_nat_range2 *range, bool commit);
+
 static inline int nf_nat_initialized(const struct nf_conn *ct,
 enum nf_nat_manip_type manip)
 {
diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
index e29e4ccb5c5a..1c72b8caa24e 100644
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -784,6 +784,137 @@ nf_nat_inet_fn(void *priv, struct sk_buff *skb,
 }
 EXPORT_SYMBOL_GPL(nf_nat_inet_fn);
 
+/* Modelled after nf_nat_ipv[46]_fn().
+ * range is only used for new, uninitialized NAT state.
+ * Returns either NF_ACCEPT or NF_DROP.
+ */
+static int nf_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
+enum ip_conntrack_info ctinfo, int *action,
+const struct nf_nat_range2 *range,
+enum nf_nat_manip_type maniptype)
+{
+   __be16 proto = skb_protocol(skb, true);
+   int hooknum, err = NF_ACCEPT;
+
+   /* See HOOK2MANIP(). */
+   if (maniptype == NF_NAT_MANIP_SRC)
+   hooknum = NF_INET_LOCAL_IN; /* Source NAT */
+   else
+   hooknum = NF_INET_LOCAL_OUT; /* Destination NAT */
+
+   switch (ctinfo) {
+   case IP_CT_RELATED:
+   case IP_CT_RELATED_REPLY:
+   if (proto == htons(ETH_P_IP) &&
+   ip_hdr(skb)->protocol == IPPROTO_ICMP) {
+   if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
+  hooknum))
+   err = NF_DROP;
+   goto out;
+   } else if (IS_ENABLED(CONFIG_IPV6) && proto == 
htons(ETH_P_IPV6)) {
+   __be16 frag_off;
+   u8 nexthdr = ipv6_hdr(skb)->nexthdr;
+   int hdrlen = ipv6_skip_exthdr(skb,
+ sizeof(struct ipv6hdr),
+ , _off);
+
+   if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) {
+   if (!nf_nat_icmpv6_reply_translation(skb, ct,
+ctinfo,
+hooknum,
+hdrlen))
+   err = NF_DROP;
+   goto out;
+   }
+   }
+   /* Non-ICMP, fall thru to initialize if needed. */
+   fallthrough;
+   case IP_CT_NEW:
+   /* Seen it before?  This can happen for loopback, retrans,
+* or local packets.
+*/
+   if (!nf_nat_initialized(ct, maniptype)) {
+   /* Initialize according to the NAT action. */
+   err = (range && range->flags & NF_NAT_RANGE_MAP_IPS)
+   /* Action is set up to establish a new
+* mapping.
+*/
+   ? nf_nat_setup_info(ct, range, maniptype)
+   : nf_nat_alloc_null_binding(ct, hooknum);
+   if (err != NF_ACCEPT)
+   goto out;
+   }
+   break;
+
+   case IP_CT_ESTABLISHED:
+   case IP_CT_ESTABLIS

[ovs-dev] [PATCH net-next 4/5] net: sched: update the nat flag for icmp error packets in ct_nat_execute

2022-11-15 Thread Xin Long
In ovs_ct_nat_execute(), the packet flow key nat flags are updated
when it processes ICMP(v6) error packets translation successfully.

In ct_nat_execute() when processing ICMP(v6) error packets translation
successfully, it should have done the same in ct_nat_execute() to set
post_ct_s/dnat flag, which will be used to update flow key nat flags
in OVS module later.

Signed-off-by: Xin Long 
---
 net/sched/act_ct.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index 8869b3ef6642..c7782c9a6ab6 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -936,13 +936,13 @@ static int ct_nat_execute(struct sk_buff *skb, struct 
nf_conn *ct,
}
 
err = nf_nat_packet(ct, ctinfo, hooknum, skb);
+out:
if (err == NF_ACCEPT) {
if (maniptype == NF_NAT_MANIP_SRC)
tc_skb_cb(skb)->post_ct_snat = 1;
if (maniptype == NF_NAT_MANIP_DST)
tc_skb_cb(skb)->post_ct_dnat = 1;
}
-out:
return err;
 }
 #endif /* CONFIG_NF_NAT */
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next 3/5] net: sched: return NF_ACCEPT when fails to add nat ext in tcf_ct_act_nat

2022-11-15 Thread Xin Long
This patch changes to return NF_ACCEPT when fails to add nat
ext before doing NAT in tcf_ct_act_nat(), to keep consistent
with OVS' processing in ovs_ct_nat().

Signed-off-by: Xin Long 
---
 net/sched/act_ct.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index da0b7f665277..8869b3ef6642 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -994,7 +994,7 @@ static int tcf_ct_act_nat(struct sk_buff *skb,
 
/* Add NAT extension if not confirmed yet. */
if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct))
-   return NF_DROP;   /* Can't NAT. */
+   return NF_ACCEPT;   /* Can't NAT. */
 
if (ctinfo != IP_CT_NEW && (ct->status & IPS_NAT_MASK) &&
(ctinfo != IP_CT_RELATED || commit)) {
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next 1/5] openvswitch: delete the unncessary skb_pull_rcsum call in ovs_ct_nat_execute

2022-11-15 Thread Xin Long
The calls to ovs_ct_nat_execute() are as below:

  ovs_ct_execute()
ovs_ct_lookup()
  __ovs_ct_lookup()
ovs_ct_nat()
  ovs_ct_nat_execute()
ovs_ct_commit()
  __ovs_ct_lookup()
ovs_ct_nat()
  ovs_ct_nat_execute()

and since skb_pull_rcsum() and skb_push_rcsum() are already
called in ovs_ct_execute(), there's no need to do it again
in ovs_ct_nat_execute().

Signed-off-by: Xin Long 
---
 net/openvswitch/conntrack.c | 17 ++---
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 4348321856af..4c5e5a6475af 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -735,10 +735,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct 
nf_conn *ct,
  const struct nf_nat_range2 *range,
  enum nf_nat_manip_type maniptype, struct 
sw_flow_key *key)
 {
-   int hooknum, nh_off, err = NF_ACCEPT;
-
-   nh_off = skb_network_offset(skb);
-   skb_pull_rcsum(skb, nh_off);
+   int hooknum, err = NF_ACCEPT;
 
/* See HOOK2MANIP(). */
if (maniptype == NF_NAT_MANIP_SRC)
@@ -755,7 +752,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct 
nf_conn *ct,
if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
   hooknum))
err = NF_DROP;
-   goto push;
+   goto out;
} else if (IS_ENABLED(CONFIG_IPV6) &&
   skb->protocol == htons(ETH_P_IPV6)) {
__be16 frag_off;
@@ -770,7 +767,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct 
nf_conn *ct,
 hooknum,
 hdrlen))
err = NF_DROP;
-   goto push;
+   goto out;
}
}
/* Non-ICMP, fall thru to initialize if needed. */
@@ -788,7 +785,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct 
nf_conn *ct,
? nf_nat_setup_info(ct, range, maniptype)
: nf_nat_alloc_null_binding(ct, hooknum);
if (err != NF_ACCEPT)
-   goto push;
+   goto out;
}
break;
 
@@ -798,13 +795,11 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct 
nf_conn *ct,
 
default:
err = NF_DROP;
-   goto push;
+   goto out;
}
 
err = nf_nat_packet(ct, ctinfo, hooknum, skb);
-push:
-   skb_push_rcsum(skb, nh_off);
-
+out:
/* Update the flow key if NAT successful. */
if (err == NF_ACCEPT)
ovs_nat_update_key(key, skb, maniptype);
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next 0/5] net: eliminate the duplicate code in the ct nat functions of ovs and tc

2022-11-15 Thread Xin Long
The changes in the patchset:

  "net: add helper support in tc act_ct for ovs offloading"

had moved some common ct code used by both OVS and TC into netfilter.

There are still some big functions pretty similar defined and used in
each of OVS and TC. It is not good to maintain such similar code in 2
places. This patchset is to extract the functions for NAT processing
from OVS and TC to netfilter.

To make this change clear and safe, this patchset gets the common code
out of OVS and TC step by step: The patch 1-4 make some minor changes
in OVS and TC to make the NAT code of them completely the same, then
the patch 5 moves the common code to the netfilter and exports one
function called by each of OVS and TC.

Xin Long (5):
  openvswitch: delete the unncessary skb_pull_rcsum call in
ovs_ct_nat_execute
  openvswitch: return NF_ACCEPT when OVS_CT_NAT is net set in info nat
  net: sched: return NF_ACCEPT when fails to add nat ext in
tcf_ct_act_nat
  net: sched: update the nat flag for icmp error packets in
ct_nat_execute
  net: move the nat function to nf_nat_core for ovs and tc

 include/net/netfilter/nf_nat.h |   4 +
 net/netfilter/nf_nat_core.c| 131 +
 net/openvswitch/conntrack.c| 146 +++--
 net/sched/act_ct.c | 136 +++---
 4 files changed, 159 insertions(+), 258 deletions(-)

-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next 2/5] openvswitch: return NF_ACCEPT when OVS_CT_NAT is net set in info nat

2022-11-15 Thread Xin Long
Either OVS_CT_SRC_NAT or OVS_CT_DST_NAT is set, OVS_CT_NAT must be
set in info->nat. Thus, if OVS_CT_NAT is not set in info->nat, it
will definitely not do NAT but returns NF_ACCEPT in ovs_ct_nat().

This patch changes nothing funcational but only makes this return
earlier in ovs_ct_nat() to keep consistent with TC's processing
in tcf_ct_act_nat().

Signed-off-by: Xin Long 
---
 net/openvswitch/conntrack.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 4c5e5a6475af..cc643a556ea1 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -816,6 +816,9 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key 
*key,
enum nf_nat_manip_type maniptype;
int err;
 
+   if (!(info->nat & OVS_CT_NAT))
+   return NF_ACCEPT;
+
/* Add NAT extension if not confirmed yet. */
if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct))
return NF_ACCEPT;   /* Can't NAT. */
@@ -825,8 +828,7 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key 
*key,
 * Make sure new expected connections (IP_CT_RELATED) are NATted only
 * when committing.
 */
-   if (info->nat & OVS_CT_NAT && ctinfo != IP_CT_NEW &&
-   ct->status & IPS_NAT_MASK &&
+   if (ctinfo != IP_CT_NEW && ct->status & IPS_NAT_MASK &&
(ctinfo != IP_CT_RELATED || info->commit)) {
/* NAT an established or related connection like before. */
if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY)
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCHv4 net-next 4/4] net: sched: add helper support in act_ct

2022-11-06 Thread Xin Long
This patch is to add helper support in act_ct for OVS actions=ct(alg=xxx)
offloading, which is corresponding to Commit cae3a2627520 ("openvswitch:
Allow attaching helpers to ct action") in OVS kernel part.

The difference is when adding TC actions family and proto cannot be got
from the filter/match, other than helper name in tb[TCA_CT_HELPER_NAME],
we also need to send the family in tb[TCA_CT_HELPER_FAMILY] and the
proto in tb[TCA_CT_HELPER_PROTO] to kernel.

Acked-by: Marcelo Ricardo Leitner 
Signed-off-by: Xin Long 
---
 include/net/tc_act/tc_ct.h|  1 +
 include/uapi/linux/tc_act/tc_ct.h |  3 ++
 net/sched/act_ct.c| 89 ---
 3 files changed, 85 insertions(+), 8 deletions(-)

diff --git a/include/net/tc_act/tc_ct.h b/include/net/tc_act/tc_ct.h
index 8250d6f0a462..b24ea2d9400b 100644
--- a/include/net/tc_act/tc_ct.h
+++ b/include/net/tc_act/tc_ct.h
@@ -10,6 +10,7 @@
 #include 
 
 struct tcf_ct_params {
+   struct nf_conntrack_helper *helper;
struct nf_conn *tmpl;
u16 zone;
 
diff --git a/include/uapi/linux/tc_act/tc_ct.h 
b/include/uapi/linux/tc_act/tc_ct.h
index 5fb1d7ac1027..6c5200f0ed38 100644
--- a/include/uapi/linux/tc_act/tc_ct.h
+++ b/include/uapi/linux/tc_act/tc_ct.h
@@ -22,6 +22,9 @@ enum {
TCA_CT_NAT_PORT_MIN,/* be16 */
TCA_CT_NAT_PORT_MAX,/* be16 */
TCA_CT_PAD,
+   TCA_CT_HELPER_NAME, /* string */
+   TCA_CT_HELPER_FAMILY,   /* u8 */
+   TCA_CT_HELPER_PROTO,/* u8 */
__TCA_CT_MAX
 };
 
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index 193a460a9d7f..da0b7f665277 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 static struct workqueue_struct *act_ct_wq;
@@ -655,7 +656,7 @@ struct tc_ct_action_net {
 
 /* Determine whether skb->_nfct is equal to the result of conntrack lookup. */
 static bool tcf_ct_skb_nfct_cached(struct net *net, struct sk_buff *skb,
-  u16 zone_id, bool force)
+  struct tcf_ct_params *p)
 {
enum ip_conntrack_info ctinfo;
struct nf_conn *ct;
@@ -665,11 +666,19 @@ static bool tcf_ct_skb_nfct_cached(struct net *net, 
struct sk_buff *skb,
return false;
if (!net_eq(net, read_pnet(>ct_net)))
goto drop_ct;
-   if (nf_ct_zone(ct)->id != zone_id)
+   if (nf_ct_zone(ct)->id != p->zone)
goto drop_ct;
+   if (p->helper) {
+   struct nf_conn_help *help;
+
+   help = nf_ct_ext_find(ct, NF_CT_EXT_HELPER);
+   if (help && rcu_access_pointer(help->helper) != p->helper)
+   goto drop_ct;
+   }
 
/* Force conntrack entry direction. */
-   if (force && CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) {
+   if ((p->ct_action & TCA_CT_ACT_FORCE) &&
+   CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) {
if (nf_ct_is_confirmed(ct))
nf_ct_kill(ct);
 
@@ -832,6 +841,13 @@ static int tcf_ct_handle_fragments(struct net *net, struct 
sk_buff *skb,
 
 static void tcf_ct_params_free(struct tcf_ct_params *params)
 {
+   if (params->helper) {
+#if IS_ENABLED(CONFIG_NF_NAT)
+   if (params->ct_action & TCA_CT_ACT_NAT)
+   nf_nat_helper_put(params->helper);
+#endif
+   nf_conntrack_helper_put(params->helper);
+   }
if (params->ct_ft)
tcf_ct_flow_table_put(params->ct_ft);
if (params->tmpl)
@@ -1026,13 +1042,14 @@ static int tcf_ct_act(struct sk_buff *skb, const struct 
tc_action *a,
  struct tcf_result *res)
 {
struct net *net = dev_net(skb->dev);
-   bool cached, commit, clear, force;
enum ip_conntrack_info ctinfo;
struct tcf_ct *c = to_ct(a);
struct nf_conn *tmpl = NULL;
struct nf_hook_state state;
+   bool cached, commit, clear;
int nh_ofs, err, retval;
struct tcf_ct_params *p;
+   bool add_helper = false;
bool skip_add = false;
bool defrag = false;
struct nf_conn *ct;
@@ -1043,7 +1060,6 @@ static int tcf_ct_act(struct sk_buff *skb, const struct 
tc_action *a,
retval = READ_ONCE(c->tcf_action);
commit = p->ct_action & TCA_CT_ACT_COMMIT;
clear = p->ct_action & TCA_CT_ACT_CLEAR;
-   force = p->ct_action & TCA_CT_ACT_FORCE;
tmpl = p->tmpl;
 
tcf_lastuse_update(>tcf_tm);
@@ -1086,7 +1102,7 @@ static int tcf_ct_act(struct sk_buff *skb, const struct 
tc_action *a,
 * actually run the packet through conntrack twice unless it's for a
 * different zone.
 */
-   cached = tcf_ct_skb_nfct_cached(net, skb, p->zone, force);
+   ca

[ovs-dev] [PATCHv4 net-next 3/4] net: sched: call tcf_ct_params_free to free params in tcf_ct_init

2022-11-06 Thread Xin Long
This patch is to make the err path simple by calling tcf_ct_params_free(),
so that it won't cause problems when more members are added into param and
need freeing on the err path.

Acked-by: Marcelo Ricardo Leitner 
Signed-off-by: Xin Long 
---
 net/sched/act_ct.c | 35 ++-
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index b38d91d6b249..193a460a9d7f 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -345,11 +345,9 @@ static void tcf_ct_flow_table_cleanup_work(struct 
work_struct *work)
module_put(THIS_MODULE);
 }
 
-static void tcf_ct_flow_table_put(struct tcf_ct_params *params)
+static void tcf_ct_flow_table_put(struct tcf_ct_flow_table *ct_ft)
 {
-   struct tcf_ct_flow_table *ct_ft = params->ct_ft;
-
-   if (refcount_dec_and_test(>ct_ft->ref)) {
+   if (refcount_dec_and_test(_ft->ref)) {
rhashtable_remove_fast(_ht, _ft->node, zones_params);
INIT_RCU_WORK(_ft->rwork, tcf_ct_flow_table_cleanup_work);
queue_rcu_work(act_ct_wq, _ft->rwork);
@@ -832,18 +830,23 @@ static int tcf_ct_handle_fragments(struct net *net, 
struct sk_buff *skb,
return err;
 }
 
-static void tcf_ct_params_free(struct rcu_head *head)
+static void tcf_ct_params_free(struct tcf_ct_params *params)
 {
-   struct tcf_ct_params *params = container_of(head,
-   struct tcf_ct_params, rcu);
-
-   tcf_ct_flow_table_put(params);
-
+   if (params->ct_ft)
+   tcf_ct_flow_table_put(params->ct_ft);
if (params->tmpl)
nf_ct_put(params->tmpl);
kfree(params);
 }
 
+static void tcf_ct_params_free_rcu(struct rcu_head *head)
+{
+   struct tcf_ct_params *params;
+
+   params = container_of(head, struct tcf_ct_params, rcu);
+   tcf_ct_params_free(params);
+}
+
 #if IS_ENABLED(CONFIG_NF_NAT)
 /* Modelled after nf_nat_ipv[46]_fn().
  * range is only used for new, uninitialized NAT state.
@@ -1390,7 +1393,7 @@ static int tcf_ct_init(struct net *net, struct nlattr 
*nla,
 
err = tcf_ct_flow_table_get(net, params);
if (err)
-   goto cleanup_params;
+   goto cleanup;
 
spin_lock_bh(>tcf_lock);
goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch);
@@ -1401,17 +1404,15 @@ static int tcf_ct_init(struct net *net, struct nlattr 
*nla,
if (goto_ch)
tcf_chain_put_by_act(goto_ch);
if (params)
-   call_rcu(>rcu, tcf_ct_params_free);
+   call_rcu(>rcu, tcf_ct_params_free_rcu);
 
return res;
 
-cleanup_params:
-   if (params->tmpl)
-   nf_ct_put(params->tmpl);
 cleanup:
if (goto_ch)
tcf_chain_put_by_act(goto_ch);
-   kfree(params);
+   if (params)
+   tcf_ct_params_free(params);
tcf_idr_release(*a, bind);
return err;
 }
@@ -1423,7 +1424,7 @@ static void tcf_ct_cleanup(struct tc_action *a)
 
params = rcu_dereference_protected(c->params, 1);
if (params)
-   call_rcu(>rcu, tcf_ct_params_free);
+   call_rcu(>rcu, tcf_ct_params_free_rcu);
 }
 
 static int tcf_ct_dump_key_val(struct sk_buff *skb,
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCHv4 net-next 2/4] net: move add ct helper function to nf_conntrack_helper for ovs and tc

2022-11-06 Thread Xin Long
Move ovs_ct_add_helper from openvswitch to nf_conntrack_helper and
rename as nf_ct_add_helper, so that it can be used in TC act_ct in
the next patch.

Acked-by: Marcelo Ricardo Leitner 
Signed-off-by: Xin Long 
---
 include/net/netfilter/nf_conntrack_helper.h |  2 +
 net/netfilter/nf_conntrack_helper.c | 31 +++
 net/openvswitch/conntrack.c | 44 +++--
 3 files changed, 38 insertions(+), 39 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_helper.h 
b/include/net/netfilter/nf_conntrack_helper.h
index b6676249eeeb..f30b1694b690 100644
--- a/include/net/netfilter/nf_conntrack_helper.h
+++ b/include/net/netfilter/nf_conntrack_helper.h
@@ -117,6 +117,8 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct 
nf_conn *tmpl,
 
 int nf_ct_helper(struct sk_buff *skb, struct nf_conn *ct,
 enum ip_conntrack_info ctinfo, u16 proto);
+int nf_ct_add_helper(struct nf_conn *ct, const char *name, u8 family,
+u8 proto, bool nat, struct nf_conntrack_helper **hp);
 
 void nf_ct_helper_destroy(struct nf_conn *ct);
 
diff --git a/net/netfilter/nf_conntrack_helper.c 
b/net/netfilter/nf_conntrack_helper.c
index 88039eedadea..48ea6d0264b5 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -309,6 +309,37 @@ int nf_ct_helper(struct sk_buff *skb, struct nf_conn *ct,
 }
 EXPORT_SYMBOL_GPL(nf_ct_helper);
 
+int nf_ct_add_helper(struct nf_conn *ct, const char *name, u8 family,
+u8 proto, bool nat, struct nf_conntrack_helper **hp)
+{
+   struct nf_conntrack_helper *helper;
+   struct nf_conn_help *help;
+   int ret = 0;
+
+   helper = nf_conntrack_helper_try_module_get(name, family, proto);
+   if (!helper)
+   return -EINVAL;
+
+   help = nf_ct_helper_ext_add(ct, GFP_KERNEL);
+   if (!help) {
+   nf_conntrack_helper_put(helper);
+   return -ENOMEM;
+   }
+#if IS_ENABLED(CONFIG_NF_NAT)
+   if (nat) {
+   ret = nf_nat_helper_try_module_get(name, family, proto);
+   if (ret) {
+   nf_conntrack_helper_put(helper);
+   return ret;
+   }
+   }
+#endif
+   rcu_assign_pointer(help->helper, helper);
+   *hp = helper;
+   return ret;
+}
+EXPORT_SYMBOL_GPL(nf_ct_add_helper);
+
 /* appropriate ct lock protecting must be taken by caller */
 static int unhelp(struct nf_conn *ct, void *me)
 {
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 18f54fa38e8f..4348321856af 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -1291,43 +1291,6 @@ int ovs_ct_clear(struct sk_buff *skb, struct sw_flow_key 
*key)
return 0;
 }
 
-static int ovs_ct_add_helper(struct ovs_conntrack_info *info, const char *name,
-const struct sw_flow_key *key, bool log)
-{
-   struct nf_conntrack_helper *helper;
-   struct nf_conn_help *help;
-   int ret = 0;
-
-   helper = nf_conntrack_helper_try_module_get(name, info->family,
-   key->ip.proto);
-   if (!helper) {
-   OVS_NLERR(log, "Unknown helper \"%s\"", name);
-   return -EINVAL;
-   }
-
-   help = nf_ct_helper_ext_add(info->ct, GFP_KERNEL);
-   if (!help) {
-   nf_conntrack_helper_put(helper);
-   return -ENOMEM;
-   }
-
-#if IS_ENABLED(CONFIG_NF_NAT)
-   if (info->nat) {
-   ret = nf_nat_helper_try_module_get(name, info->family,
-  key->ip.proto);
-   if (ret) {
-   nf_conntrack_helper_put(helper);
-   OVS_NLERR(log, "Failed to load \"%s\" NAT helper, 
error: %d",
- name, ret);
-   return ret;
-   }
-   }
-#endif
-   rcu_assign_pointer(help->helper, helper);
-   info->helper = helper;
-   return ret;
-}
-
 #if IS_ENABLED(CONFIG_NF_NAT)
 static int parse_nat(const struct nlattr *attr,
 struct ovs_conntrack_info *info, bool log)
@@ -1661,9 +1624,12 @@ int ovs_ct_copy_action(struct net *net, const struct 
nlattr *attr,
}
 
if (helper) {
-   err = ovs_ct_add_helper(_info, helper, key, log);
-   if (err)
+   err = nf_ct_add_helper(ct_info.ct, helper, ct_info.family,
+  key->ip.proto, ct_info.nat, 
_info.helper);
+   if (err) {
+   OVS_NLERR(log, "Failed to add %s helper %d", helper, 
err);
goto err_free_ct;
+   }
}
 
err = ovs_nla_add_action(sfa, OVS_ACTION_ATTR_CT, _info,
-- 
2.31.1

__

[ovs-dev] [PATCHv4 net-next 1/4] net: move the ct helper function to nf_conntrack_helper for ovs and tc

2022-11-06 Thread Xin Long
Move ovs_ct_helper from openvswitch to nf_conntrack_helper and rename
as nf_ct_helper so that it can be used in TC act_ct in the next patch.
Note that it also adds the checks for the family and proto, as in TC
act_ct, the packets with correct family and proto are not guaranteed.

Acked-by: Marcelo Ricardo Leitner 
Signed-off-by: Xin Long 
---
 include/net/netfilter/nf_conntrack_helper.h |  3 +
 net/netfilter/nf_conntrack_helper.c | 69 +
 net/openvswitch/conntrack.c | 61 +-
 3 files changed, 73 insertions(+), 60 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_helper.h 
b/include/net/netfilter/nf_conntrack_helper.h
index 9939c366f720..b6676249eeeb 100644
--- a/include/net/netfilter/nf_conntrack_helper.h
+++ b/include/net/netfilter/nf_conntrack_helper.h
@@ -115,6 +115,9 @@ struct nf_conn_help *nf_ct_helper_ext_add(struct nf_conn 
*ct, gfp_t gfp);
 int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl,
  gfp_t flags);
 
+int nf_ct_helper(struct sk_buff *skb, struct nf_conn *ct,
+enum ip_conntrack_info ctinfo, u16 proto);
+
 void nf_ct_helper_destroy(struct nf_conn *ct);
 
 static inline struct nf_conn_help *nfct_help(const struct nf_conn *ct)
diff --git a/net/netfilter/nf_conntrack_helper.c 
b/net/netfilter/nf_conntrack_helper.c
index ff737a76052e..88039eedadea 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -26,7 +26,9 @@
 #include 
 #include 
 #include 
+#include 
 #include 
+#include 
 
 static DEFINE_MUTEX(nf_ct_helper_mutex);
 struct hlist_head *nf_ct_helper_hash __read_mostly;
@@ -240,6 +242,73 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct 
nf_conn *tmpl,
 }
 EXPORT_SYMBOL_GPL(__nf_ct_try_assign_helper);
 
+/* 'skb' should already be pulled to nh_ofs. */
+int nf_ct_helper(struct sk_buff *skb, struct nf_conn *ct,
+enum ip_conntrack_info ctinfo, u16 proto)
+{
+   const struct nf_conntrack_helper *helper;
+   const struct nf_conn_help *help;
+   unsigned int protoff;
+   int err;
+
+   if (ctinfo == IP_CT_RELATED_REPLY)
+   return NF_ACCEPT;
+
+   help = nfct_help(ct);
+   if (!help)
+   return NF_ACCEPT;
+
+   helper = rcu_dereference(help->helper);
+   if (!helper)
+   return NF_ACCEPT;
+
+   if (helper->tuple.src.l3num != NFPROTO_UNSPEC &&
+   helper->tuple.src.l3num != proto)
+   return NF_ACCEPT;
+
+   switch (proto) {
+   case NFPROTO_IPV4:
+   protoff = ip_hdrlen(skb);
+   proto = ip_hdr(skb)->protocol;
+   break;
+   case NFPROTO_IPV6: {
+   u8 nexthdr = ipv6_hdr(skb)->nexthdr;
+   __be16 frag_off;
+   int ofs;
+
+   ofs = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), ,
+  _off);
+   if (ofs < 0 || (frag_off & htons(~0x7)) != 0) {
+   pr_debug("proto header not found\n");
+   return NF_ACCEPT;
+   }
+   protoff = ofs;
+   proto = nexthdr;
+   break;
+   }
+   default:
+   WARN_ONCE(1, "helper invoked on non-IP family!");
+   return NF_DROP;
+   }
+
+   if (helper->tuple.dst.protonum != proto)
+   return NF_ACCEPT;
+
+   err = helper->help(skb, protoff, ct, ctinfo);
+   if (err != NF_ACCEPT)
+   return err;
+
+   /* Adjust seqs after helper.  This is needed due to some helpers (e.g.,
+* FTP with NAT) adusting the TCP payload size when mangling IP
+* addresses and/or port numbers in the text-based control connection.
+*/
+   if (test_bit(IPS_SEQ_ADJUST_BIT, >status) &&
+   !nf_ct_seq_adjust(skb, ct, ctinfo, protoff))
+   return NF_DROP;
+   return NF_ACCEPT;
+}
+EXPORT_SYMBOL_GPL(nf_ct_helper);
+
 /* appropriate ct lock protecting must be taken by caller */
 static int unhelp(struct nf_conn *ct, void *me)
 {
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index c7b10234cf7c..18f54fa38e8f 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -434,65 +434,6 @@ static int ovs_ct_set_labels(struct nf_conn *ct, struct 
sw_flow_key *key,
return 0;
 }
 
-/* 'skb' should already be pulled to nh_ofs. */
-static int ovs_ct_helper(struct sk_buff *skb, u16 proto)
-{
-   const struct nf_conntrack_helper *helper;
-   const struct nf_conn_help *help;
-   enum ip_conntrack_info ctinfo;
-   unsigned int protoff;
-   struct nf_conn *ct;
-   int err;
-
-   ct = nf_ct_get(skb, );
-   if (!ct || ctinfo == IP_CT_RELATED_REPLY)
-   return NF_ACCEPT;
-
-   help = nfct_help(ct);
-   if (!help)
- 

[ovs-dev] [PATCHv4 net-next 0/4] net: add helper support in tc act_ct for ovs offloading

2022-11-06 Thread Xin Long
Ilya reported an issue that FTP traffic would be broken when the OVS flow
with ct(commit,alg=ftp) installed in the OVS kernel module, and it was
caused by that TC didn't support the ftp helper offloaded from OVS.

This patchset is to add the helper support in act_ct for OVS offloading
in kernel net/sched.

The 1st and 2nd patches move some common code into nf_conntrack_helper from
openvswitch so that they could be used by net/sched in the 4th patch (Note
there are still some other common code used in both OVS and TC, and I will
extract it in other patches). The 3rd patch extracts another function in
net/sched to make the 4th patch easier to write. The 4th patch adds this
feature in net/sched.

The user space part will be added in another patch, and with it these OVS
flows (FTP over SNAT) can be used to test this feature:

  table=0, in_port=veth1,tcp,tcp_dst=2121,ct_state=-trk \
actions=ct(table=1, nat), normal
  table=0, in_port=veth2,tcp,ct_state=-trk actions=ct(table=1, nat)
  table=0, in_port=veth1,tcp,ct_state=-trk actions=ct(table=0, nat)
  table=0, in_port=veth1,tcp,ct_state=+trk+rel actions=ct(commit, nat),normal
  table=0, in_port=veth1,tcp,ct_state=+trk+est actions=veth2"

  table=1, in_port=veth1,tcp,tcp_dst=2121,ct_state=+trk+new \
actions=ct(commit, nat(src=7.7.16.1), alg=ftp),normal"
  table=1, in_port=veth1,tcp,tcp_dst=2121,ct_state=+trk+est actions=veth2"
  table=1, in_port=veth2,tcp,ct_state=+trk+est actions=veth1"

v1->v2:
  - add the 2nd patch to extract nf_ct_add_helper from openvswitch for
tc act_ct use.
  - go to drop instead of return -EINVAL when fails to add SEQADJ ext in
tcf_ct_act() as Paolo noticed in the 4th patch.
  - add ct exts only when the ct is not confirmed as Pablo noticed in the
4th patch.

v2->v3:
  - fix a warning of unused variable 'err' when CONFIG_NF_NAT is disabled
in 2nd patch.

v3->v4:
  - have the nf_conn and ip_conntrack_info passed into nf_ct_helper() as
Aaron suggested in the 1st patch.
  - no need to pass the 'force' into tcf_ct_skb_nfct_cached() as Marcelo
noticed, and remove the unnecessary variable 'force' in tcf_ct_act()
in the 4th patch.
  - fix a typo err in the cover letter as Marcelo noticed.

Xin Long (4):
  net: move the ct helper function to nf_conntrack_helper for ovs and tc
  net: move add ct helper function to nf_conntrack_helper for ovs and tc
  net: sched: call tcf_ct_params_free to free params in tcf_ct_init
  net: sched: add helper support in act_ct

 include/net/netfilter/nf_conntrack_helper.h |   5 +
 include/net/tc_act/tc_ct.h  |   1 +
 include/uapi/linux/tc_act/tc_ct.h   |   3 +
 net/netfilter/nf_conntrack_helper.c | 100 
 net/openvswitch/conntrack.c | 105 +
 net/sched/act_ct.c  | 124 
 6 files changed, 214 insertions(+), 124 deletions(-)

-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCHv3 net-next 4/4] net: sched: add helper support in act_ct

2022-11-02 Thread Xin Long
On Tue, Nov 1, 2022 at 11:00 AM Marcelo Ricardo Leitner
 wrote:
>
> On Mon, Oct 31, 2022 at 11:36:10AM -0400, Xin Long wrote:
> ...
> > --- a/net/sched/act_ct.c
> > +++ b/net/sched/act_ct.c
> > @@ -33,6 +33,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >
> >  static struct workqueue_struct *act_ct_wq;
> > @@ -655,7 +656,7 @@ struct tc_ct_action_net {
> >
> >  /* Determine whether skb->_nfct is equal to the result of conntrack 
> > lookup. */
> >  static bool tcf_ct_skb_nfct_cached(struct net *net, struct sk_buff *skb,
> > -u16 zone_id, bool force)
> > +struct tcf_ct_params *p, bool force)
>
> Nit, it could have fetched 'force' from p->ct_action too then, as it
> is only used in this function.
right, we can save one variable here.

>
> There's a typo in Ilya's name in the cover letter.
Ah sorry, it was a letter missed when copying from the last cover.

>
> Other than this, LGTM.
> Acked-by: Marcelo Ricardo Leitner 
Thanks!
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCHv3 net-next 1/4] net: move the ct helper function to nf_conntrack_helper for ovs and tc

2022-11-02 Thread Xin Long
On Wed, Nov 2, 2022 at 3:21 PM Aaron Conole  wrote:
>
> Xin Long  writes:
>
> > Move ovs_ct_helper from openvswitch to nf_conntrack_helper and rename
> > as nf_ct_helper so that it can be used in TC act_ct in the next patch.
> > Note that it also adds the checks for the family and proto, as in TC
> > act_ct, the packets with correct family and proto are not guaranteed.
> >
> > Signed-off-by: Xin Long 
> > ---
>
> Hi Xin,
>
> >  include/net/netfilter/nf_conntrack_helper.h |  2 +
> >  net/netfilter/nf_conntrack_helper.c | 71 +
> >  net/openvswitch/conntrack.c | 61 +-
> >  3 files changed, 74 insertions(+), 60 deletions(-)
> >
> > diff --git a/include/net/netfilter/nf_conntrack_helper.h 
> > b/include/net/netfilter/nf_conntrack_helper.h
> > index 9939c366f720..6c32e59fc16f 100644
> > --- a/include/net/netfilter/nf_conntrack_helper.h
> > +++ b/include/net/netfilter/nf_conntrack_helper.h
> > @@ -115,6 +115,8 @@ struct nf_conn_help *nf_ct_helper_ext_add(struct 
> > nf_conn *ct, gfp_t gfp);
> >  int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl,
> > gfp_t flags);
> >
> > +int nf_ct_helper(struct sk_buff *skb, u16 proto);
> > +
> >  void nf_ct_helper_destroy(struct nf_conn *ct);
> >
> >  static inline struct nf_conn_help *nfct_help(const struct nf_conn *ct)
> > diff --git a/net/netfilter/nf_conntrack_helper.c 
> > b/net/netfilter/nf_conntrack_helper.c
> > index ff737a76052e..83615e479f87 100644
> > --- a/net/netfilter/nf_conntrack_helper.c
> > +++ b/net/netfilter/nf_conntrack_helper.c
> > @@ -26,7 +26,9 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> > +#include 
> >
> >  static DEFINE_MUTEX(nf_ct_helper_mutex);
> >  struct hlist_head *nf_ct_helper_hash __read_mostly;
> > @@ -240,6 +242,75 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, 
> > struct nf_conn *tmpl,
> >  }
> >  EXPORT_SYMBOL_GPL(__nf_ct_try_assign_helper);
> >
> > +/* 'skb' should already be pulled to nh_ofs. */
> > +int nf_ct_helper(struct sk_buff *skb, u16 proto)
>
> AFAICT, in all the places we call this we will have the nf_conn and
> ip_conntrack_info already.  Maybe it makes sense to pass them here
> rather than looking up again?  Originally, that information wasn't
> available in the calling function - over time this has been added so we
> might as well reduce the amount of "extra lookups" performed.
I just tried to keep nf_ct_helper() as similar as possible
to the original ovs_ct_helper().

But sure, we can also pass ct and ctinfo as the arguments,
like some other functions where it passes all of skb, ct and ctinfo.

Thanks.

>
> > +{
> > + const struct nf_conntrack_helper *helper;
> > + const struct nf_conn_help *help;
> > + enum ip_conntrack_info ctinfo;
> > + unsigned int protoff;
> > + struct nf_conn *ct;
> > + int err;
> > +
> > + ct = nf_ct_get(skb, );
> > + if (!ct || ctinfo == IP_CT_RELATED_REPLY)
> > + return NF_ACCEPT;
> > +
> > + help = nfct_help(ct);
> > + if (!help)
> > + return NF_ACCEPT;
> > +
> > + helper = rcu_dereference(help->helper);
> > + if (!helper)
> > + return NF_ACCEPT;
> > +
> > + if (helper->tuple.src.l3num != NFPROTO_UNSPEC &&
> > + helper->tuple.src.l3num != proto)
> > + return NF_ACCEPT;
> > +
> > + switch (proto) {
> > + case NFPROTO_IPV4:
> > + protoff = ip_hdrlen(skb);
> > + proto = ip_hdr(skb)->protocol;
> > + break;
> > + case NFPROTO_IPV6: {
> > + u8 nexthdr = ipv6_hdr(skb)->nexthdr;
> > + __be16 frag_off;
> > + int ofs;
> > +
> > + ofs = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), ,
> > +_off);
> > + if (ofs < 0 || (frag_off & htons(~0x7)) != 0) {
> > + pr_debug("proto header not found\n");
> > + return NF_ACCEPT;
> > + }
> > + protoff = ofs;
> > + proto = nexthdr;
> > + break;
> > + }
> > + default:
> > + WARN_ONCE(1, "helper invoked on non-IP family!");
> > + return NF_DROP;
> > + }
> > +
>

[ovs-dev] [PATCHv3 net-next 4/4] net: sched: add helper support in act_ct

2022-10-31 Thread Xin Long
This patch is to add helper support in act_ct for OVS actions=ct(alg=xxx)
offloading, which is corresponding to Commit cae3a2627520 ("openvswitch:
Allow attaching helpers to ct action") in OVS kernel part.

The difference is when adding TC actions family and proto cannot be got
from the filter/match, other than helper name in tb[TCA_CT_HELPER_NAME],
we also need to send the family in tb[TCA_CT_HELPER_FAMILY] and the
proto in tb[TCA_CT_HELPER_PROTO] to kernel.

Signed-off-by: Xin Long 
---
 include/net/tc_act/tc_ct.h|  1 +
 include/uapi/linux/tc_act/tc_ct.h |  3 ++
 net/sched/act_ct.c| 83 +--
 3 files changed, 82 insertions(+), 5 deletions(-)

diff --git a/include/net/tc_act/tc_ct.h b/include/net/tc_act/tc_ct.h
index 8250d6f0a462..b24ea2d9400b 100644
--- a/include/net/tc_act/tc_ct.h
+++ b/include/net/tc_act/tc_ct.h
@@ -10,6 +10,7 @@
 #include 
 
 struct tcf_ct_params {
+   struct nf_conntrack_helper *helper;
struct nf_conn *tmpl;
u16 zone;
 
diff --git a/include/uapi/linux/tc_act/tc_ct.h 
b/include/uapi/linux/tc_act/tc_ct.h
index 5fb1d7ac1027..6c5200f0ed38 100644
--- a/include/uapi/linux/tc_act/tc_ct.h
+++ b/include/uapi/linux/tc_act/tc_ct.h
@@ -22,6 +22,9 @@ enum {
TCA_CT_NAT_PORT_MIN,/* be16 */
TCA_CT_NAT_PORT_MAX,/* be16 */
TCA_CT_PAD,
+   TCA_CT_HELPER_NAME, /* string */
+   TCA_CT_HELPER_FAMILY,   /* u8 */
+   TCA_CT_HELPER_PROTO,/* u8 */
__TCA_CT_MAX
 };
 
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index 193a460a9d7f..85f4dc5650da 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 static struct workqueue_struct *act_ct_wq;
@@ -655,7 +656,7 @@ struct tc_ct_action_net {
 
 /* Determine whether skb->_nfct is equal to the result of conntrack lookup. */
 static bool tcf_ct_skb_nfct_cached(struct net *net, struct sk_buff *skb,
-  u16 zone_id, bool force)
+  struct tcf_ct_params *p, bool force)
 {
enum ip_conntrack_info ctinfo;
struct nf_conn *ct;
@@ -665,8 +666,15 @@ static bool tcf_ct_skb_nfct_cached(struct net *net, struct 
sk_buff *skb,
return false;
if (!net_eq(net, read_pnet(>ct_net)))
goto drop_ct;
-   if (nf_ct_zone(ct)->id != zone_id)
+   if (nf_ct_zone(ct)->id != p->zone)
goto drop_ct;
+   if (p->helper) {
+   struct nf_conn_help *help;
+
+   help = nf_ct_ext_find(ct, NF_CT_EXT_HELPER);
+   if (help && rcu_access_pointer(help->helper) != p->helper)
+   goto drop_ct;
+   }
 
/* Force conntrack entry direction. */
if (force && CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) {
@@ -832,6 +840,13 @@ static int tcf_ct_handle_fragments(struct net *net, struct 
sk_buff *skb,
 
 static void tcf_ct_params_free(struct tcf_ct_params *params)
 {
+   if (params->helper) {
+#if IS_ENABLED(CONFIG_NF_NAT)
+   if (params->ct_action & TCA_CT_ACT_NAT)
+   nf_nat_helper_put(params->helper);
+#endif
+   nf_conntrack_helper_put(params->helper);
+   }
if (params->ct_ft)
tcf_ct_flow_table_put(params->ct_ft);
if (params->tmpl)
@@ -1033,6 +1048,7 @@ static int tcf_ct_act(struct sk_buff *skb, const struct 
tc_action *a,
struct nf_hook_state state;
int nh_ofs, err, retval;
struct tcf_ct_params *p;
+   bool add_helper = false;
bool skip_add = false;
bool defrag = false;
struct nf_conn *ct;
@@ -1086,7 +1102,7 @@ static int tcf_ct_act(struct sk_buff *skb, const struct 
tc_action *a,
 * actually run the packet through conntrack twice unless it's for a
 * different zone.
 */
-   cached = tcf_ct_skb_nfct_cached(net, skb, p->zone, force);
+   cached = tcf_ct_skb_nfct_cached(net, skb, p, force);
if (!cached) {
if (tcf_ct_flow_table_lookup(p, skb, family)) {
skip_add = true;
@@ -1119,6 +1135,22 @@ static int tcf_ct_act(struct sk_buff *skb, const struct 
tc_action *a,
if (err != NF_ACCEPT)
goto drop;
 
+   if (!nf_ct_is_confirmed(ct) && commit && p->helper && !nfct_help(ct)) {
+   err = __nf_ct_try_assign_helper(ct, p->tmpl, GFP_ATOMIC);
+   if (err)
+   goto drop;
+   add_helper = true;
+   if (p->ct_action & TCA_CT_ACT_NAT && !nfct_seqadj(ct)) {
+   if (!nfct_seqadj_ext_add(ct))
+   goto drop;
+   }
+   }
+
+   if (nf_ct_is_confirmed(ct) ? ((!cached && !skip_add) || add_helper) : 
comm

[ovs-dev] [PATCHv3 net-next 3/4] net: sched: call tcf_ct_params_free to free params in tcf_ct_init

2022-10-31 Thread Xin Long
This patch is to make the err path simple by calling tcf_ct_params_free(),
so that it won't cause problems when more members are added into param and
need freeing on the err path.

Signed-off-by: Xin Long 
---
 net/sched/act_ct.c | 35 ++-
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index b38d91d6b249..193a460a9d7f 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -345,11 +345,9 @@ static void tcf_ct_flow_table_cleanup_work(struct 
work_struct *work)
module_put(THIS_MODULE);
 }
 
-static void tcf_ct_flow_table_put(struct tcf_ct_params *params)
+static void tcf_ct_flow_table_put(struct tcf_ct_flow_table *ct_ft)
 {
-   struct tcf_ct_flow_table *ct_ft = params->ct_ft;
-
-   if (refcount_dec_and_test(>ct_ft->ref)) {
+   if (refcount_dec_and_test(_ft->ref)) {
rhashtable_remove_fast(_ht, _ft->node, zones_params);
INIT_RCU_WORK(_ft->rwork, tcf_ct_flow_table_cleanup_work);
queue_rcu_work(act_ct_wq, _ft->rwork);
@@ -832,18 +830,23 @@ static int tcf_ct_handle_fragments(struct net *net, 
struct sk_buff *skb,
return err;
 }
 
-static void tcf_ct_params_free(struct rcu_head *head)
+static void tcf_ct_params_free(struct tcf_ct_params *params)
 {
-   struct tcf_ct_params *params = container_of(head,
-   struct tcf_ct_params, rcu);
-
-   tcf_ct_flow_table_put(params);
-
+   if (params->ct_ft)
+   tcf_ct_flow_table_put(params->ct_ft);
if (params->tmpl)
nf_ct_put(params->tmpl);
kfree(params);
 }
 
+static void tcf_ct_params_free_rcu(struct rcu_head *head)
+{
+   struct tcf_ct_params *params;
+
+   params = container_of(head, struct tcf_ct_params, rcu);
+   tcf_ct_params_free(params);
+}
+
 #if IS_ENABLED(CONFIG_NF_NAT)
 /* Modelled after nf_nat_ipv[46]_fn().
  * range is only used for new, uninitialized NAT state.
@@ -1390,7 +1393,7 @@ static int tcf_ct_init(struct net *net, struct nlattr 
*nla,
 
err = tcf_ct_flow_table_get(net, params);
if (err)
-   goto cleanup_params;
+   goto cleanup;
 
spin_lock_bh(>tcf_lock);
goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch);
@@ -1401,17 +1404,15 @@ static int tcf_ct_init(struct net *net, struct nlattr 
*nla,
if (goto_ch)
tcf_chain_put_by_act(goto_ch);
if (params)
-   call_rcu(>rcu, tcf_ct_params_free);
+   call_rcu(>rcu, tcf_ct_params_free_rcu);
 
return res;
 
-cleanup_params:
-   if (params->tmpl)
-   nf_ct_put(params->tmpl);
 cleanup:
if (goto_ch)
tcf_chain_put_by_act(goto_ch);
-   kfree(params);
+   if (params)
+   tcf_ct_params_free(params);
tcf_idr_release(*a, bind);
return err;
 }
@@ -1423,7 +1424,7 @@ static void tcf_ct_cleanup(struct tc_action *a)
 
params = rcu_dereference_protected(c->params, 1);
if (params)
-   call_rcu(>rcu, tcf_ct_params_free);
+   call_rcu(>rcu, tcf_ct_params_free_rcu);
 }
 
 static int tcf_ct_dump_key_val(struct sk_buff *skb,
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCHv3 net-next 2/4] net: move add ct helper function to nf_conntrack_helper for ovs and tc

2022-10-31 Thread Xin Long
Move ovs_ct_add_helper from openvswitch to nf_conntrack_helper and
rename as nf_ct_add_helper, so that it can be used in TC act_ct in
the next patch.

Signed-off-by: Xin Long 
---
 include/net/netfilter/nf_conntrack_helper.h |  2 +
 net/netfilter/nf_conntrack_helper.c | 31 +++
 net/openvswitch/conntrack.c | 44 +++--
 3 files changed, 38 insertions(+), 39 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_helper.h 
b/include/net/netfilter/nf_conntrack_helper.h
index 6c32e59fc16f..ad1adbfbeee2 100644
--- a/include/net/netfilter/nf_conntrack_helper.h
+++ b/include/net/netfilter/nf_conntrack_helper.h
@@ -116,6 +116,8 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct 
nf_conn *tmpl,
  gfp_t flags);
 
 int nf_ct_helper(struct sk_buff *skb, u16 proto);
+int nf_ct_add_helper(struct nf_conn *ct, const char *name, u8 family,
+u8 proto, bool nat, struct nf_conntrack_helper **hp);
 
 void nf_ct_helper_destroy(struct nf_conn *ct);
 
diff --git a/net/netfilter/nf_conntrack_helper.c 
b/net/netfilter/nf_conntrack_helper.c
index 83615e479f87..1a2ab77d1bd7 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -311,6 +311,37 @@ int nf_ct_helper(struct sk_buff *skb, u16 proto)
 }
 EXPORT_SYMBOL_GPL(nf_ct_helper);
 
+int nf_ct_add_helper(struct nf_conn *ct, const char *name, u8 family,
+u8 proto, bool nat, struct nf_conntrack_helper **hp)
+{
+   struct nf_conntrack_helper *helper;
+   struct nf_conn_help *help;
+   int ret = 0;
+
+   helper = nf_conntrack_helper_try_module_get(name, family, proto);
+   if (!helper)
+   return -EINVAL;
+
+   help = nf_ct_helper_ext_add(ct, GFP_KERNEL);
+   if (!help) {
+   nf_conntrack_helper_put(helper);
+   return -ENOMEM;
+   }
+#if IS_ENABLED(CONFIG_NF_NAT)
+   if (nat) {
+   ret = nf_nat_helper_try_module_get(name, family, proto);
+   if (ret) {
+   nf_conntrack_helper_put(helper);
+   return ret;
+   }
+   }
+#endif
+   rcu_assign_pointer(help->helper, helper);
+   *hp = helper;
+   return ret;
+}
+EXPORT_SYMBOL_GPL(nf_ct_add_helper);
+
 /* appropriate ct lock protecting must be taken by caller */
 static int unhelp(struct nf_conn *ct, void *me)
 {
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 19b5c54615c8..d37011e678c2 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -1291,43 +1291,6 @@ int ovs_ct_clear(struct sk_buff *skb, struct sw_flow_key 
*key)
return 0;
 }
 
-static int ovs_ct_add_helper(struct ovs_conntrack_info *info, const char *name,
-const struct sw_flow_key *key, bool log)
-{
-   struct nf_conntrack_helper *helper;
-   struct nf_conn_help *help;
-   int ret = 0;
-
-   helper = nf_conntrack_helper_try_module_get(name, info->family,
-   key->ip.proto);
-   if (!helper) {
-   OVS_NLERR(log, "Unknown helper \"%s\"", name);
-   return -EINVAL;
-   }
-
-   help = nf_ct_helper_ext_add(info->ct, GFP_KERNEL);
-   if (!help) {
-   nf_conntrack_helper_put(helper);
-   return -ENOMEM;
-   }
-
-#if IS_ENABLED(CONFIG_NF_NAT)
-   if (info->nat) {
-   ret = nf_nat_helper_try_module_get(name, info->family,
-  key->ip.proto);
-   if (ret) {
-   nf_conntrack_helper_put(helper);
-   OVS_NLERR(log, "Failed to load \"%s\" NAT helper, 
error: %d",
- name, ret);
-   return ret;
-   }
-   }
-#endif
-   rcu_assign_pointer(help->helper, helper);
-   info->helper = helper;
-   return ret;
-}
-
 #if IS_ENABLED(CONFIG_NF_NAT)
 static int parse_nat(const struct nlattr *attr,
 struct ovs_conntrack_info *info, bool log)
@@ -1661,9 +1624,12 @@ int ovs_ct_copy_action(struct net *net, const struct 
nlattr *attr,
}
 
if (helper) {
-   err = ovs_ct_add_helper(_info, helper, key, log);
-   if (err)
+   err = nf_ct_add_helper(ct_info.ct, helper, ct_info.family,
+  key->ip.proto, ct_info.nat, 
_info.helper);
+   if (err) {
+   OVS_NLERR(log, "Failed to add %s helper %d", helper, 
err);
goto err_free_ct;
+   }
}
 
err = ovs_nla_add_action(sfa, OVS_ACTION_ATTR_CT, _info,
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCHv3 net-next 1/4] net: move the ct helper function to nf_conntrack_helper for ovs and tc

2022-10-31 Thread Xin Long
Move ovs_ct_helper from openvswitch to nf_conntrack_helper and rename
as nf_ct_helper so that it can be used in TC act_ct in the next patch.
Note that it also adds the checks for the family and proto, as in TC
act_ct, the packets with correct family and proto are not guaranteed.

Signed-off-by: Xin Long 
---
 include/net/netfilter/nf_conntrack_helper.h |  2 +
 net/netfilter/nf_conntrack_helper.c | 71 +
 net/openvswitch/conntrack.c | 61 +-
 3 files changed, 74 insertions(+), 60 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_helper.h 
b/include/net/netfilter/nf_conntrack_helper.h
index 9939c366f720..6c32e59fc16f 100644
--- a/include/net/netfilter/nf_conntrack_helper.h
+++ b/include/net/netfilter/nf_conntrack_helper.h
@@ -115,6 +115,8 @@ struct nf_conn_help *nf_ct_helper_ext_add(struct nf_conn 
*ct, gfp_t gfp);
 int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl,
  gfp_t flags);
 
+int nf_ct_helper(struct sk_buff *skb, u16 proto);
+
 void nf_ct_helper_destroy(struct nf_conn *ct);
 
 static inline struct nf_conn_help *nfct_help(const struct nf_conn *ct)
diff --git a/net/netfilter/nf_conntrack_helper.c 
b/net/netfilter/nf_conntrack_helper.c
index ff737a76052e..83615e479f87 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -26,7 +26,9 @@
 #include 
 #include 
 #include 
+#include 
 #include 
+#include 
 
 static DEFINE_MUTEX(nf_ct_helper_mutex);
 struct hlist_head *nf_ct_helper_hash __read_mostly;
@@ -240,6 +242,75 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct 
nf_conn *tmpl,
 }
 EXPORT_SYMBOL_GPL(__nf_ct_try_assign_helper);
 
+/* 'skb' should already be pulled to nh_ofs. */
+int nf_ct_helper(struct sk_buff *skb, u16 proto)
+{
+   const struct nf_conntrack_helper *helper;
+   const struct nf_conn_help *help;
+   enum ip_conntrack_info ctinfo;
+   unsigned int protoff;
+   struct nf_conn *ct;
+   int err;
+
+   ct = nf_ct_get(skb, );
+   if (!ct || ctinfo == IP_CT_RELATED_REPLY)
+   return NF_ACCEPT;
+
+   help = nfct_help(ct);
+   if (!help)
+   return NF_ACCEPT;
+
+   helper = rcu_dereference(help->helper);
+   if (!helper)
+   return NF_ACCEPT;
+
+   if (helper->tuple.src.l3num != NFPROTO_UNSPEC &&
+   helper->tuple.src.l3num != proto)
+   return NF_ACCEPT;
+
+   switch (proto) {
+   case NFPROTO_IPV4:
+   protoff = ip_hdrlen(skb);
+   proto = ip_hdr(skb)->protocol;
+   break;
+   case NFPROTO_IPV6: {
+   u8 nexthdr = ipv6_hdr(skb)->nexthdr;
+   __be16 frag_off;
+   int ofs;
+
+   ofs = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), ,
+  _off);
+   if (ofs < 0 || (frag_off & htons(~0x7)) != 0) {
+   pr_debug("proto header not found\n");
+   return NF_ACCEPT;
+   }
+   protoff = ofs;
+   proto = nexthdr;
+   break;
+   }
+   default:
+   WARN_ONCE(1, "helper invoked on non-IP family!");
+   return NF_DROP;
+   }
+
+   if (helper->tuple.dst.protonum != proto)
+   return NF_ACCEPT;
+
+   err = helper->help(skb, protoff, ct, ctinfo);
+   if (err != NF_ACCEPT)
+   return err;
+
+   /* Adjust seqs after helper.  This is needed due to some helpers (e.g.,
+* FTP with NAT) adusting the TCP payload size when mangling IP
+* addresses and/or port numbers in the text-based control connection.
+*/
+   if (test_bit(IPS_SEQ_ADJUST_BIT, >status) &&
+   !nf_ct_seq_adjust(skb, ct, ctinfo, protoff))
+   return NF_DROP;
+   return NF_ACCEPT;
+}
+EXPORT_SYMBOL_GPL(nf_ct_helper);
+
 /* appropriate ct lock protecting must be taken by caller */
 static int unhelp(struct nf_conn *ct, void *me)
 {
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index c7b10234cf7c..19b5c54615c8 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -434,65 +434,6 @@ static int ovs_ct_set_labels(struct nf_conn *ct, struct 
sw_flow_key *key,
return 0;
 }
 
-/* 'skb' should already be pulled to nh_ofs. */
-static int ovs_ct_helper(struct sk_buff *skb, u16 proto)
-{
-   const struct nf_conntrack_helper *helper;
-   const struct nf_conn_help *help;
-   enum ip_conntrack_info ctinfo;
-   unsigned int protoff;
-   struct nf_conn *ct;
-   int err;
-
-   ct = nf_ct_get(skb, );
-   if (!ct || ctinfo == IP_CT_RELATED_REPLY)
-   return NF_ACCEPT;
-
-   help = nfct_help(ct);
-   if (!help)
-   return NF_ACCEPT;
-
-   helper = rcu_dereference(h

[ovs-dev] [PATCHv3 net-next 0/4] net: add helper support in tc act_ct for ovs offloading

2022-10-31 Thread Xin Long
lya reported an issue that FTP traffic would be broken when the OVS flow
with ct(commit,alg=ftp) installed in the OVS kernel module, and it was
caused by that TC didn't support the ftp helper offloaded from OVS.

This patchset is to add the helper support in act_ct for OVS offloading
in kernel net/sched.

The 1st and 2nd patches move some common code into nf_conntrack_helper from
openvswitch so that they could be used by net/sched in the 4th patch (Note
there are still some other common code used in both OVS and TC, and I will
extract it in other patches). The 3rd patch extracts another function in
net/sched to make the 4th patch easier to write. The 4th patch adds this
feature in net/sched.

The user space part will be added in another patch, and with it these OVS
flows (FTP over SNAT) can be used to test this feature:

  table=0, in_port=veth1,tcp,tcp_dst=2121,ct_state=-trk \
actions=ct(table=1, nat), normal
  table=0, in_port=veth2,tcp,ct_state=-trk actions=ct(table=1, nat)
  table=0, in_port=veth1,tcp,ct_state=-trk actions=ct(table=0, nat)
  table=0, in_port=veth1,tcp,ct_state=+trk+rel actions=ct(commit, nat),normal
  table=0, in_port=veth1,tcp,ct_state=+trk+est actions=veth2"

  table=1, in_port=veth1,tcp,tcp_dst=2121,ct_state=+trk+new \
actions=ct(commit, nat(src=7.7.16.1), alg=ftp),normal"
  table=1, in_port=veth1,tcp,tcp_dst=2121,ct_state=+trk+est actions=veth2"
  table=1, in_port=veth2,tcp,ct_state=+trk+est actions=veth1"

v1->v2:
  - go to drop instead of return -EINVAL when fails to add SEQADJ ext in
tcf_ct_act() as Paolo noticed.
  - add the 2nd patch to extract nf_ct_add_helper from openvswitch for
tc act_ct use.
  - add ct exts only when the ct is not confirmed as Pablo noticed.

v2->v3:
  - fix a warning of unused variable 'err' when CONFIG_NF_NAT is disabled.

Xin Long (4):
  net: move the ct helper function to nf_conntrack_helper for ovs and tc
  net: move add ct helper function to nf_conntrack_helper for ovs and tc
  net: sched: call tcf_ct_params_free to free params in tcf_ct_init
  net: sched: add helper support in act_ct

 include/net/netfilter/nf_conntrack_helper.h |   4 +
 include/net/tc_act/tc_ct.h  |   1 +
 include/uapi/linux/tc_act/tc_ct.h   |   3 +
 net/netfilter/nf_conntrack_helper.c | 102 +
 net/openvswitch/conntrack.c | 105 +
 net/sched/act_ct.c  | 118 
 6 files changed, 212 insertions(+), 121 deletions(-)

-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next 4/4] net: sched: add helper support in act_ct

2022-10-17 Thread Xin Long
This patch is to add helper support in act_ct for OVS actions=ct(alg=xxx)
offloading, which is corresponding to Commit cae3a2627520 ("openvswitch:
Allow attaching helpers to ct action") in OVS kernel part.

The difference is when adding TC actions family and proto cannot be got
from the filter/match, other than helper name in tb[TCA_CT_HELPER_NAME],
we also need to send the family in tb[TCA_CT_HELPER_FAMILY] and the
proto in tb[TCA_CT_HELPER_PROTO] to kernel.

Signed-off-by: Xin Long 
---
 include/net/tc_act/tc_ct.h|  1 +
 include/uapi/linux/tc_act/tc_ct.h |  3 ++
 net/sched/act_ct.c| 83 +--
 3 files changed, 82 insertions(+), 5 deletions(-)

diff --git a/include/net/tc_act/tc_ct.h b/include/net/tc_act/tc_ct.h
index 8250d6f0a462..b24ea2d9400b 100644
--- a/include/net/tc_act/tc_ct.h
+++ b/include/net/tc_act/tc_ct.h
@@ -10,6 +10,7 @@
 #include 
 
 struct tcf_ct_params {
+   struct nf_conntrack_helper *helper;
struct nf_conn *tmpl;
u16 zone;
 
diff --git a/include/uapi/linux/tc_act/tc_ct.h 
b/include/uapi/linux/tc_act/tc_ct.h
index 5fb1d7ac1027..6c5200f0ed38 100644
--- a/include/uapi/linux/tc_act/tc_ct.h
+++ b/include/uapi/linux/tc_act/tc_ct.h
@@ -22,6 +22,9 @@ enum {
TCA_CT_NAT_PORT_MIN,/* be16 */
TCA_CT_NAT_PORT_MAX,/* be16 */
TCA_CT_PAD,
+   TCA_CT_HELPER_NAME, /* string */
+   TCA_CT_HELPER_FAMILY,   /* u8 */
+   TCA_CT_HELPER_PROTO,/* u8 */
__TCA_CT_MAX
 };
 
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index 193a460a9d7f..85f4dc5650da 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 static struct workqueue_struct *act_ct_wq;
@@ -655,7 +656,7 @@ struct tc_ct_action_net {
 
 /* Determine whether skb->_nfct is equal to the result of conntrack lookup. */
 static bool tcf_ct_skb_nfct_cached(struct net *net, struct sk_buff *skb,
-  u16 zone_id, bool force)
+  struct tcf_ct_params *p, bool force)
 {
enum ip_conntrack_info ctinfo;
struct nf_conn *ct;
@@ -665,8 +666,15 @@ static bool tcf_ct_skb_nfct_cached(struct net *net, struct 
sk_buff *skb,
return false;
if (!net_eq(net, read_pnet(>ct_net)))
goto drop_ct;
-   if (nf_ct_zone(ct)->id != zone_id)
+   if (nf_ct_zone(ct)->id != p->zone)
goto drop_ct;
+   if (p->helper) {
+   struct nf_conn_help *help;
+
+   help = nf_ct_ext_find(ct, NF_CT_EXT_HELPER);
+   if (help && rcu_access_pointer(help->helper) != p->helper)
+   goto drop_ct;
+   }
 
/* Force conntrack entry direction. */
if (force && CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) {
@@ -832,6 +840,13 @@ static int tcf_ct_handle_fragments(struct net *net, struct 
sk_buff *skb,
 
 static void tcf_ct_params_free(struct tcf_ct_params *params)
 {
+   if (params->helper) {
+#if IS_ENABLED(CONFIG_NF_NAT)
+   if (params->ct_action & TCA_CT_ACT_NAT)
+   nf_nat_helper_put(params->helper);
+#endif
+   nf_conntrack_helper_put(params->helper);
+   }
if (params->ct_ft)
tcf_ct_flow_table_put(params->ct_ft);
if (params->tmpl)
@@ -1033,6 +1048,7 @@ static int tcf_ct_act(struct sk_buff *skb, const struct 
tc_action *a,
struct nf_hook_state state;
int nh_ofs, err, retval;
struct tcf_ct_params *p;
+   bool add_helper = false;
bool skip_add = false;
bool defrag = false;
struct nf_conn *ct;
@@ -1086,7 +1102,7 @@ static int tcf_ct_act(struct sk_buff *skb, const struct 
tc_action *a,
 * actually run the packet through conntrack twice unless it's for a
 * different zone.
 */
-   cached = tcf_ct_skb_nfct_cached(net, skb, p->zone, force);
+   cached = tcf_ct_skb_nfct_cached(net, skb, p, force);
if (!cached) {
if (tcf_ct_flow_table_lookup(p, skb, family)) {
skip_add = true;
@@ -1119,6 +1135,22 @@ static int tcf_ct_act(struct sk_buff *skb, const struct 
tc_action *a,
if (err != NF_ACCEPT)
goto drop;
 
+   if (!nf_ct_is_confirmed(ct) && commit && p->helper && !nfct_help(ct)) {
+   err = __nf_ct_try_assign_helper(ct, p->tmpl, GFP_ATOMIC);
+   if (err)
+   goto drop;
+   add_helper = true;
+   if (p->ct_action & TCA_CT_ACT_NAT && !nfct_seqadj(ct)) {
+   if (!nfct_seqadj_ext_add(ct))
+   goto drop;
+   }
+   }
+
+   if (nf_ct_is_confirmed(ct) ? ((!cached && !skip_add) || add_helper) : 
comm

[ovs-dev] [PATCH net-next 3/4] net: sched: call tcf_ct_params_free to free params in tcf_ct_init

2022-10-17 Thread Xin Long
This patch is to make the err path simple by calling tcf_ct_params_free(),
so that it won't cause problems when more members are added into param and
need freeing on the err path.

Signed-off-by: Xin Long 
---
 net/sched/act_ct.c | 35 ++-
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index b38d91d6b249..193a460a9d7f 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -345,11 +345,9 @@ static void tcf_ct_flow_table_cleanup_work(struct 
work_struct *work)
module_put(THIS_MODULE);
 }
 
-static void tcf_ct_flow_table_put(struct tcf_ct_params *params)
+static void tcf_ct_flow_table_put(struct tcf_ct_flow_table *ct_ft)
 {
-   struct tcf_ct_flow_table *ct_ft = params->ct_ft;
-
-   if (refcount_dec_and_test(>ct_ft->ref)) {
+   if (refcount_dec_and_test(_ft->ref)) {
rhashtable_remove_fast(_ht, _ft->node, zones_params);
INIT_RCU_WORK(_ft->rwork, tcf_ct_flow_table_cleanup_work);
queue_rcu_work(act_ct_wq, _ft->rwork);
@@ -832,18 +830,23 @@ static int tcf_ct_handle_fragments(struct net *net, 
struct sk_buff *skb,
return err;
 }
 
-static void tcf_ct_params_free(struct rcu_head *head)
+static void tcf_ct_params_free(struct tcf_ct_params *params)
 {
-   struct tcf_ct_params *params = container_of(head,
-   struct tcf_ct_params, rcu);
-
-   tcf_ct_flow_table_put(params);
-
+   if (params->ct_ft)
+   tcf_ct_flow_table_put(params->ct_ft);
if (params->tmpl)
nf_ct_put(params->tmpl);
kfree(params);
 }
 
+static void tcf_ct_params_free_rcu(struct rcu_head *head)
+{
+   struct tcf_ct_params *params;
+
+   params = container_of(head, struct tcf_ct_params, rcu);
+   tcf_ct_params_free(params);
+}
+
 #if IS_ENABLED(CONFIG_NF_NAT)
 /* Modelled after nf_nat_ipv[46]_fn().
  * range is only used for new, uninitialized NAT state.
@@ -1390,7 +1393,7 @@ static int tcf_ct_init(struct net *net, struct nlattr 
*nla,
 
err = tcf_ct_flow_table_get(net, params);
if (err)
-   goto cleanup_params;
+   goto cleanup;
 
spin_lock_bh(>tcf_lock);
goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch);
@@ -1401,17 +1404,15 @@ static int tcf_ct_init(struct net *net, struct nlattr 
*nla,
if (goto_ch)
tcf_chain_put_by_act(goto_ch);
if (params)
-   call_rcu(>rcu, tcf_ct_params_free);
+   call_rcu(>rcu, tcf_ct_params_free_rcu);
 
return res;
 
-cleanup_params:
-   if (params->tmpl)
-   nf_ct_put(params->tmpl);
 cleanup:
if (goto_ch)
tcf_chain_put_by_act(goto_ch);
-   kfree(params);
+   if (params)
+   tcf_ct_params_free(params);
tcf_idr_release(*a, bind);
return err;
 }
@@ -1423,7 +1424,7 @@ static void tcf_ct_cleanup(struct tc_action *a)
 
params = rcu_dereference_protected(c->params, 1);
if (params)
-   call_rcu(>rcu, tcf_ct_params_free);
+   call_rcu(>rcu, tcf_ct_params_free_rcu);
 }
 
 static int tcf_ct_dump_key_val(struct sk_buff *skb,
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next 2/4] net: move add ct helper function to nf_conntrack_helper for ovs and tc

2022-10-17 Thread Xin Long
Move ovs_ct_add_helper from openvswitch to nf_conntrack_helper and
rename as nf_ct_add_helper, so that it can be used in TC act_ct in
the next patch.

Signed-off-by: Xin Long 
---
 include/net/netfilter/nf_conntrack_helper.h |  2 +
 net/netfilter/nf_conntrack_helper.c | 31 +++
 net/openvswitch/conntrack.c | 44 +++--
 3 files changed, 38 insertions(+), 39 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_helper.h 
b/include/net/netfilter/nf_conntrack_helper.h
index 6c32e59fc16f..ad1adbfbeee2 100644
--- a/include/net/netfilter/nf_conntrack_helper.h
+++ b/include/net/netfilter/nf_conntrack_helper.h
@@ -116,6 +116,8 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct 
nf_conn *tmpl,
  gfp_t flags);
 
 int nf_ct_helper(struct sk_buff *skb, u16 proto);
+int nf_ct_add_helper(struct nf_conn *ct, const char *name, u8 family,
+u8 proto, bool nat, struct nf_conntrack_helper **hp);
 
 void nf_ct_helper_destroy(struct nf_conn *ct);
 
diff --git a/net/netfilter/nf_conntrack_helper.c 
b/net/netfilter/nf_conntrack_helper.c
index 83615e479f87..ee22b8a059cd 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -311,6 +311,37 @@ int nf_ct_helper(struct sk_buff *skb, u16 proto)
 }
 EXPORT_SYMBOL_GPL(nf_ct_helper);
 
+int nf_ct_add_helper(struct nf_conn *ct, const char *name, u8 family,
+u8 proto, bool nat, struct nf_conntrack_helper **hp)
+{
+   struct nf_conntrack_helper *helper;
+   struct nf_conn_help *help;
+   int err;
+
+   helper = nf_conntrack_helper_try_module_get(name, family, proto);
+   if (!helper)
+   return -EINVAL;
+
+   help = nf_ct_helper_ext_add(ct, GFP_KERNEL);
+   if (!help) {
+   nf_conntrack_helper_put(helper);
+   return -ENOMEM;
+   }
+#if IS_ENABLED(CONFIG_NF_NAT)
+   if (nat) {
+   err = nf_nat_helper_try_module_get(name, family, proto);
+   if (err) {
+   nf_conntrack_helper_put(helper);
+   return err;
+   }
+   }
+#endif
+   rcu_assign_pointer(help->helper, helper);
+   *hp = helper;
+   return 0;
+}
+EXPORT_SYMBOL_GPL(nf_ct_add_helper);
+
 /* appropriate ct lock protecting must be taken by caller */
 static int unhelp(struct nf_conn *ct, void *me)
 {
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 19b5c54615c8..d37011e678c2 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -1291,43 +1291,6 @@ int ovs_ct_clear(struct sk_buff *skb, struct sw_flow_key 
*key)
return 0;
 }
 
-static int ovs_ct_add_helper(struct ovs_conntrack_info *info, const char *name,
-const struct sw_flow_key *key, bool log)
-{
-   struct nf_conntrack_helper *helper;
-   struct nf_conn_help *help;
-   int ret = 0;
-
-   helper = nf_conntrack_helper_try_module_get(name, info->family,
-   key->ip.proto);
-   if (!helper) {
-   OVS_NLERR(log, "Unknown helper \"%s\"", name);
-   return -EINVAL;
-   }
-
-   help = nf_ct_helper_ext_add(info->ct, GFP_KERNEL);
-   if (!help) {
-   nf_conntrack_helper_put(helper);
-   return -ENOMEM;
-   }
-
-#if IS_ENABLED(CONFIG_NF_NAT)
-   if (info->nat) {
-   ret = nf_nat_helper_try_module_get(name, info->family,
-  key->ip.proto);
-   if (ret) {
-   nf_conntrack_helper_put(helper);
-   OVS_NLERR(log, "Failed to load \"%s\" NAT helper, 
error: %d",
- name, ret);
-   return ret;
-   }
-   }
-#endif
-   rcu_assign_pointer(help->helper, helper);
-   info->helper = helper;
-   return ret;
-}
-
 #if IS_ENABLED(CONFIG_NF_NAT)
 static int parse_nat(const struct nlattr *attr,
 struct ovs_conntrack_info *info, bool log)
@@ -1661,9 +1624,12 @@ int ovs_ct_copy_action(struct net *net, const struct 
nlattr *attr,
}
 
if (helper) {
-   err = ovs_ct_add_helper(_info, helper, key, log);
-   if (err)
+   err = nf_ct_add_helper(ct_info.ct, helper, ct_info.family,
+  key->ip.proto, ct_info.nat, 
_info.helper);
+   if (err) {
+   OVS_NLERR(log, "Failed to add %s helper %d", helper, 
err);
goto err_free_ct;
+   }
}
 
err = ovs_nla_add_action(sfa, OVS_ACTION_ATTR_CT, _info,
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next 1/4] net: move the ct helper function to nf_conntrack_helper for ovs and tc

2022-10-17 Thread Xin Long
Move ovs_ct_helper from openvswitch to nf_conntrack_helper and rename
as nf_ct_helper so that it can be used in TC act_ct in the next patch.
Note that it also adds the checks for the family and proto, as in TC
act_ct, the packets with correct family and proto are not guaranteed.

Signed-off-by: Xin Long 
---
 include/net/netfilter/nf_conntrack_helper.h |  2 +
 net/netfilter/nf_conntrack_helper.c | 71 +
 net/openvswitch/conntrack.c | 61 +-
 3 files changed, 74 insertions(+), 60 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_helper.h 
b/include/net/netfilter/nf_conntrack_helper.h
index 9939c366f720..6c32e59fc16f 100644
--- a/include/net/netfilter/nf_conntrack_helper.h
+++ b/include/net/netfilter/nf_conntrack_helper.h
@@ -115,6 +115,8 @@ struct nf_conn_help *nf_ct_helper_ext_add(struct nf_conn 
*ct, gfp_t gfp);
 int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl,
  gfp_t flags);
 
+int nf_ct_helper(struct sk_buff *skb, u16 proto);
+
 void nf_ct_helper_destroy(struct nf_conn *ct);
 
 static inline struct nf_conn_help *nfct_help(const struct nf_conn *ct)
diff --git a/net/netfilter/nf_conntrack_helper.c 
b/net/netfilter/nf_conntrack_helper.c
index ff737a76052e..83615e479f87 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -26,7 +26,9 @@
 #include 
 #include 
 #include 
+#include 
 #include 
+#include 
 
 static DEFINE_MUTEX(nf_ct_helper_mutex);
 struct hlist_head *nf_ct_helper_hash __read_mostly;
@@ -240,6 +242,75 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct 
nf_conn *tmpl,
 }
 EXPORT_SYMBOL_GPL(__nf_ct_try_assign_helper);
 
+/* 'skb' should already be pulled to nh_ofs. */
+int nf_ct_helper(struct sk_buff *skb, u16 proto)
+{
+   const struct nf_conntrack_helper *helper;
+   const struct nf_conn_help *help;
+   enum ip_conntrack_info ctinfo;
+   unsigned int protoff;
+   struct nf_conn *ct;
+   int err;
+
+   ct = nf_ct_get(skb, );
+   if (!ct || ctinfo == IP_CT_RELATED_REPLY)
+   return NF_ACCEPT;
+
+   help = nfct_help(ct);
+   if (!help)
+   return NF_ACCEPT;
+
+   helper = rcu_dereference(help->helper);
+   if (!helper)
+   return NF_ACCEPT;
+
+   if (helper->tuple.src.l3num != NFPROTO_UNSPEC &&
+   helper->tuple.src.l3num != proto)
+   return NF_ACCEPT;
+
+   switch (proto) {
+   case NFPROTO_IPV4:
+   protoff = ip_hdrlen(skb);
+   proto = ip_hdr(skb)->protocol;
+   break;
+   case NFPROTO_IPV6: {
+   u8 nexthdr = ipv6_hdr(skb)->nexthdr;
+   __be16 frag_off;
+   int ofs;
+
+   ofs = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), ,
+  _off);
+   if (ofs < 0 || (frag_off & htons(~0x7)) != 0) {
+   pr_debug("proto header not found\n");
+   return NF_ACCEPT;
+   }
+   protoff = ofs;
+   proto = nexthdr;
+   break;
+   }
+   default:
+   WARN_ONCE(1, "helper invoked on non-IP family!");
+   return NF_DROP;
+   }
+
+   if (helper->tuple.dst.protonum != proto)
+   return NF_ACCEPT;
+
+   err = helper->help(skb, protoff, ct, ctinfo);
+   if (err != NF_ACCEPT)
+   return err;
+
+   /* Adjust seqs after helper.  This is needed due to some helpers (e.g.,
+* FTP with NAT) adusting the TCP payload size when mangling IP
+* addresses and/or port numbers in the text-based control connection.
+*/
+   if (test_bit(IPS_SEQ_ADJUST_BIT, >status) &&
+   !nf_ct_seq_adjust(skb, ct, ctinfo, protoff))
+   return NF_DROP;
+   return NF_ACCEPT;
+}
+EXPORT_SYMBOL_GPL(nf_ct_helper);
+
 /* appropriate ct lock protecting must be taken by caller */
 static int unhelp(struct nf_conn *ct, void *me)
 {
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index c7b10234cf7c..19b5c54615c8 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -434,65 +434,6 @@ static int ovs_ct_set_labels(struct nf_conn *ct, struct 
sw_flow_key *key,
return 0;
 }
 
-/* 'skb' should already be pulled to nh_ofs. */
-static int ovs_ct_helper(struct sk_buff *skb, u16 proto)
-{
-   const struct nf_conntrack_helper *helper;
-   const struct nf_conn_help *help;
-   enum ip_conntrack_info ctinfo;
-   unsigned int protoff;
-   struct nf_conn *ct;
-   int err;
-
-   ct = nf_ct_get(skb, );
-   if (!ct || ctinfo == IP_CT_RELATED_REPLY)
-   return NF_ACCEPT;
-
-   help = nfct_help(ct);
-   if (!help)
-   return NF_ACCEPT;
-
-   helper = rcu_dereference(h

[ovs-dev] [PATCH net-next 0/4] net: add helper support in tc act_ct for ovs offloading

2022-10-17 Thread Xin Long
Ilya reported an issue that FTP traffic would be broken when the OVS flow
with ct(commit,alg=ftp) installed in the OVS kernel module, and it was
caused by that TC didn't support the ftp helper offloaded from OVS.

This patchset is to add the helper support in act_ct for OVS offloading
in kernel net/sched.

The 1st and 2nd patches move some common code into nf_conntrack_helper from
openvswitch so that they could be used by net/sched in the 4th patch (Note
there are still some other common code used in both OVS and TC, and I will
extract it in other patches). The 3rd patch extracts another function in
net/sched to make the 4th patch easier to write. The 4th patch adds this
feature in net/sched.

The user space part will be added in another patch, and with it these OVS
flows (FTP over SNAT) can be used to test this feature:

  table=0, in_port=veth1,tcp,tcp_dst=2121,ct_state=-trk \
actions=ct(table=1, nat), normal
  table=0, in_port=veth2,tcp,ct_state=-trk actions=ct(table=1, nat)
  table=0, in_port=veth1,tcp,ct_state=-trk actions=ct(table=0, nat)
  table=0, in_port=veth1,tcp,ct_state=+trk+rel actions=ct(commit, nat),normal
  table=0, in_port=veth1,tcp,ct_state=+trk+est actions=veth2"

  table=1, in_port=veth1,tcp,tcp_dst=2121,ct_state=+trk+new \
actions=ct(commit, nat(src=7.7.16.1), alg=ftp),normal"
  table=1, in_port=veth1,tcp,tcp_dst=2121,ct_state=+trk+est actions=veth2"
  table=1, in_port=veth2,tcp,ct_state=+trk+est actions=veth1"

v1->v2:
  - go to drop instead of return -EINVAL when fails to add SEQADJ ext in
tcf_ct_act() as Paolo noticed.
  - add the 2nd patch to extract nf_ct_add_helper from openvswitch for
tc act_ct use.
  - add ct exts only when the ct is not confirmed as Pablo noticed.

Xin Long (4):
  net: move the ct helper function to nf_conntrack_helper for ovs and tc
  net: move add ct helper function to nf_conntrack_helper for ovs and tc
  net: sched: call tcf_ct_params_free to free params in tcf_ct_init
  net: sched: add helper support in act_ct

 include/net/netfilter/nf_conntrack_helper.h |   4 +
 include/net/tc_act/tc_ct.h  |   1 +
 include/uapi/linux/tc_act/tc_ct.h   |   3 +
 net/netfilter/nf_conntrack_helper.c | 102 +
 net/openvswitch/conntrack.c | 105 +
 net/sched/act_ct.c  | 118 
 6 files changed, 212 insertions(+), 121 deletions(-)

-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net] openvswitch: add nf_ct_is_confirmed check before assigning the helper

2022-10-11 Thread Xin Long
On Tue, Oct 11, 2022 at 10:06 AM Aaron Conole  wrote:
>
> Aaron Conole  writes:
>
> > Xin Long  writes:
> >
> >> A WARN_ON call trace would be triggered when 'ct(commit, alg=helper)'
> >> applies on a confirmed connection:
> >>
> >>   WARNING: CPU: 0 PID: 1251 at net/netfilter/nf_conntrack_extend.c:98
> >>   RIP: 0010:nf_ct_ext_add+0x12d/0x150 [nf_conntrack]
> >>   Call Trace:
> >>
> >>nf_ct_helper_ext_add+0x12/0x60 [nf_conntrack]
> >>__nf_ct_try_assign_helper+0xc4/0x160 [nf_conntrack]
> >>__ovs_ct_lookup+0x72e/0x780 [openvswitch]
> >>ovs_ct_execute+0x1d8/0x920 [openvswitch]
> >>do_execute_actions+0x4e6/0xb60 [openvswitch]
> >>ovs_execute_actions+0x60/0x140 [openvswitch]
> >>ovs_packet_cmd_execute+0x2ad/0x310 [openvswitch]
> >>genl_family_rcv_msg_doit.isra.15+0x113/0x150
> >>genl_rcv_msg+0xef/0x1f0
> >>
> >> which can be reproduced with these OVS flows:
> >>
> >>   table=0, in_port=veth1,tcp,tcp_dst=2121,ct_state=-trk
> >>   actions=ct(commit, table=1)
> >>   table=1, in_port=veth1,tcp,tcp_dst=2121,ct_state=+trk+new
> >>   actions=ct(commit, alg=ftp),normal
> >>
> >> The issue was introduced by commit 248d45f1e193 ("openvswitch: Allow
> >> attaching helper in later commit") where it somehow removed the check
> >> of nf_ct_is_confirmed before asigning the helper. This patch is to fix
> >> it by bringing it back.
> >>
> >> Fixes: 248d45f1e193 ("openvswitch: Allow attaching helper in later commit")
> >> Reported-by: Pablo Neira Ayuso 
> >> Signed-off-by: Xin Long 
> >> ---
> >
> > Hi Xin,
> >
> > Looking at the original commit, I think this will read like a revert.  I
> > am doing some testing now, but I think we need input from Yi-Hung to
> > find out what the use case is that the original fixed.
>
> I'm also not able to reproduce the WARN_ON.  My env:
>
> kernel: 4c86114194e6 ("Merge tag 'iomap-6.1-merge-1' of 
> git://git.kernel.org/pub/scm/fs/xfs/xfs-linux")
>
> Using current upstream OVS
> I used your flows (adjusting the port names):
>
>  cookie=0x0, duration=246.240s, table=0, n_packets=17, n_bytes=1130, 
> ct_state=-trk,tcp,in_port=v0,tp_dst=2121 actions=ct(commit,table=1)
>  cookie=0x0, duration=246.240s, table=1, n_packets=1, n_bytes=74, 
> ct_state=+new+trk,tcp,in_port=v0,tp_dst=2121 actions=ct(commit,alg=ftp),NORMAL
>
> and ran:
>
> $ ip netns exec server python3 -m pyftpdlib -i 172.31.110.20 &
> $ ip netns exec client curl ftp://172.31.110.20:2121
>
> but no WARN_ON message got triggered.  Are there additional flows you
> used that I am missing, or perhaps this should be on a different kernel
> commit?
>
> > -Aaron
>
Hi, Aaron, thanks for looking at this.

Here is a script to reproduce it, you can try, and let me know if it's
still not reproduced
(it's also upstream ovs, for the first 2 lines, you may adjust on your env.)

export PATH=$PATH:/usr/local/share/openvswitch/scripts
ovs-ctl restart #systemctl restart openvswitch

ip net add ns0
ip net add ns1
ip link add veth0 type veth peer name veth1
ip link add veth3 type veth peer name veth2
ip link set veth0 netns ns0
ip link set veth3 netns ns1
ip net exec ns0 ip addr add  7.7.7.1/24 dev veth0
mac1=`ip net exec ns1 cat /sys/class/net/veth3/address`
mac2=`ip net exec ns0 cat /sys/class/net/veth0/address`
ip net exec ns0 ip neigh add 7.7.16.2 dev veth0 lladdr $mac1
ip net exec ns1 ip addr add  7.7.16.2/24 dev veth3
ip net exec ns1 ip neigh add 7.7.16.1 dev veth3 lladdr $mac2
ip net exec ns0 ip link set veth0 up
ip net exec ns1 ip link set veth3 up
ip net exec ns0 ip route add  7.7.16.2 dev veth0

sleep 0.5
ovs-vsctl set Open_vSwitch . other_config:hw-offload=false
ovs-vsctl set Open_vSwitch . other_config:tc-policy=skip_hw
ovs-vsctl add-br br-ovs
ovs-vsctl add-port br-ovs veth1
ovs-vsctl add-port br-ovs veth2
ip link set br-ovs up
ip link set veth1 up
ip link set veth2 up

ovs-ofctl del-flows br-ovs
ovs-ofctl add-flow br-ovs arp,actions=normal
ovs-ofctl add-flow br-ovs icmp,actions=normal
ovs-ofctl add-flow br-ovs "table=0,
in_port=veth1,tcp,tcp_dst=2121,ct_state=-trk actions=ct(commit,
table=1)"
#ovs-ofctl add-flow br-ovs "table=0,
in_port=veth1,tcp,tcp_dst=2121,ct_state=-trk actions=ct(table=1, nat)"
ovs-ofctl add-flow br-ovs "table=0, in_port=veth2,tcp,ct_state=-trk
actions=ct(table=1, nat)"
ovs-ofctl add-flow br-ovs "table=0, in_port=veth1,tcp,ct_state=-trk
actions=ct(table=0, nat)"

ovs-ofctl add-flow br-ovs "table=0,
in_port=veth1,tcp,ct_state=+trk+rel actions=ct(commit, nat),normal"
ovs-o

[ovs-dev] [PATCH net] openvswitch: add nf_ct_is_confirmed check before assigning the helper

2022-10-06 Thread Xin Long
A WARN_ON call trace would be triggered when 'ct(commit, alg=helper)'
applies on a confirmed connection:

  WARNING: CPU: 0 PID: 1251 at net/netfilter/nf_conntrack_extend.c:98
  RIP: 0010:nf_ct_ext_add+0x12d/0x150 [nf_conntrack]
  Call Trace:
   
   nf_ct_helper_ext_add+0x12/0x60 [nf_conntrack]
   __nf_ct_try_assign_helper+0xc4/0x160 [nf_conntrack]
   __ovs_ct_lookup+0x72e/0x780 [openvswitch]
   ovs_ct_execute+0x1d8/0x920 [openvswitch]
   do_execute_actions+0x4e6/0xb60 [openvswitch]
   ovs_execute_actions+0x60/0x140 [openvswitch]
   ovs_packet_cmd_execute+0x2ad/0x310 [openvswitch]
   genl_family_rcv_msg_doit.isra.15+0x113/0x150
   genl_rcv_msg+0xef/0x1f0

which can be reproduced with these OVS flows:

  table=0, in_port=veth1,tcp,tcp_dst=2121,ct_state=-trk
  actions=ct(commit, table=1)
  table=1, in_port=veth1,tcp,tcp_dst=2121,ct_state=+trk+new
  actions=ct(commit, alg=ftp),normal

The issue was introduced by commit 248d45f1e193 ("openvswitch: Allow
attaching helper in later commit") where it somehow removed the check
of nf_ct_is_confirmed before asigning the helper. This patch is to fix
it by bringing it back.

Fixes: 248d45f1e193 ("openvswitch: Allow attaching helper in later commit")
Reported-by: Pablo Neira Ayuso 
Signed-off-by: Xin Long 
---
 net/openvswitch/conntrack.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 4e70df91d0f2..6862475f0f88 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -1015,7 +1015,8 @@ static int __ovs_ct_lookup(struct net *net, struct 
sw_flow_key *key,
 * connections which we will commit, we may need to attach
 * the helper here.
 */
-   if (info->commit && info->helper && !nfct_help(ct)) {
+   if (!nf_ct_is_confirmed(ct) && info->commit &&
+   info->helper && !nfct_help(ct)) {
int err = __nf_ct_try_assign_helper(ct, info->ct,
GFP_ATOMIC);
if (err)
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next 3/3] net: sched: add helper support in act_ct

2022-10-06 Thread Xin Long
On Thu, Oct 6, 2022 at 10:11 AM Pablo Neira Ayuso  wrote:
>
> On Tue, Oct 04, 2022 at 09:19:56PM -0400, Xin Long wrote:
> [...]
> > @@ -1119,6 +1135,22 @@ static int tcf_ct_act(struct sk_buff *skb, const 
> > struct tc_action *a,
> >   if (err != NF_ACCEPT)
> >   goto drop;
> >
> > + if (commit && p->helper && !nfct_help(ct)) {
> > + err = __nf_ct_try_assign_helper(ct, p->tmpl, GFP_ATOMIC);
> > + if (err)
> > + goto drop;
> > + add_helper = true;
> > + if (p->ct_action & TCA_CT_ACT_NAT && !nfct_seqadj(ct)) {
> > + if (!nfct_seqadj_ext_add(ct))
>
> You can only add ct extensions if ct is !nf_ct_is_confirmed(ct)), is
> this guaranteed in this codepath?
This is a good catch, the same issue exists on __nf_ct_try_assign_helper(),
and also in __ovs_ct_lookup().

I could trigger the warning on OVS conntrack with the flow:

table=0, in_port=veth1,tcp,tcp_dst=2121,ct_state=-trk
actions=ct(commit, table=1)
table=1, in_port=veth1,tcp,tcp_dst=2121,ct_state=+trk+new
actions=ct(commit, alg=ftp),normal

I will prepare a fix for ovs conntrack first.

Thanks.

>
> > + return -EINVAL;
> > + }
> > + }
> > +
> > + if (nf_ct_is_confirmed(ct) ? ((!cached && !skip_add) || add_helper) : 
> > commit) {
> > + if (nf_ct_helper(skb, family) != NF_ACCEPT)
> > + goto drop;
> > + }
> > +
> >   if (commit) {
> >   tcf_ct_act_set_mark(ct, p->mark, p->mark_mask);
> >   tcf_ct_act_set_labels(ct, p->labels, p->labels_mask);
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next 3/3] net: sched: add helper support in act_ct

2022-10-06 Thread Xin Long
On Thu, Oct 6, 2022 at 5:57 AM Paolo Abeni  wrote:
>
> On Tue, 2022-10-04 at 21:19 -0400, Xin Long wrote:
> > This patch is to add helper support in act_ct for OVS actions=ct(alg=xxx)
> > offloading, which is corresponding to Commit cae3a2627520 ("openvswitch:
> > Allow attaching helpers to ct action") in OVS kernel part.
> >
> > The difference is when adding TC actions family and proto cannot be got
> > from the filter/match, other than helper name in tb[TCA_CT_HELPER_NAME],
> > we also need to send the family in tb[TCA_CT_HELPER_FAMILY] and the
> > proto in tb[TCA_CT_HELPER_PROTO] to kernel.
> >
> > Signed-off-by: Xin Long 
> > ---
> >  include/net/tc_act/tc_ct.h|   1 +
> >  include/uapi/linux/tc_act/tc_ct.h |   3 +
> >  net/sched/act_ct.c| 113 --
> >  3 files changed, 112 insertions(+), 5 deletions(-)
> >
> > diff --git a/include/net/tc_act/tc_ct.h b/include/net/tc_act/tc_ct.h
> > index 8250d6f0a462..b24ea2d9400b 100644
> > --- a/include/net/tc_act/tc_ct.h
> > +++ b/include/net/tc_act/tc_ct.h
> > @@ -10,6 +10,7 @@
> >  #include 
> >
> >  struct tcf_ct_params {
> > + struct nf_conntrack_helper *helper;
> >   struct nf_conn *tmpl;
> >   u16 zone;
> >
> > diff --git a/include/uapi/linux/tc_act/tc_ct.h 
> > b/include/uapi/linux/tc_act/tc_ct.h
> > index 5fb1d7ac1027..6c5200f0ed38 100644
> > --- a/include/uapi/linux/tc_act/tc_ct.h
> > +++ b/include/uapi/linux/tc_act/tc_ct.h
> > @@ -22,6 +22,9 @@ enum {
> >   TCA_CT_NAT_PORT_MIN,/* be16 */
> >   TCA_CT_NAT_PORT_MAX,/* be16 */
> >   TCA_CT_PAD,
> > + TCA_CT_HELPER_NAME, /* string */
> > + TCA_CT_HELPER_FAMILY,   /* u8 */
> > + TCA_CT_HELPER_PROTO,/* u8 */
> >   __TCA_CT_MAX
> >  };
> >
> > diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
> > index 193a460a9d7f..f237c27079db 100644
> > --- a/net/sched/act_ct.c
> > +++ b/net/sched/act_ct.c
> > @@ -33,6 +33,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >
> >  static struct workqueue_struct *act_ct_wq;
> > @@ -655,7 +656,7 @@ struct tc_ct_action_net {
> >
> >  /* Determine whether skb->_nfct is equal to the result of conntrack 
> > lookup. */
> >  static bool tcf_ct_skb_nfct_cached(struct net *net, struct sk_buff *skb,
> > -u16 zone_id, bool force)
> > +struct tcf_ct_params *p, bool force)
> >  {
> >   enum ip_conntrack_info ctinfo;
> >   struct nf_conn *ct;
> > @@ -665,8 +666,15 @@ static bool tcf_ct_skb_nfct_cached(struct net *net, 
> > struct sk_buff *skb,
> >   return false;
> >   if (!net_eq(net, read_pnet(>ct_net)))
> >   goto drop_ct;
> > - if (nf_ct_zone(ct)->id != zone_id)
> > + if (nf_ct_zone(ct)->id != p->zone)
> >   goto drop_ct;
> > + if (p->helper) {
> > + struct nf_conn_help *help;
> > +
> > + help = nf_ct_ext_find(ct, NF_CT_EXT_HELPER);
> > + if (help && rcu_access_pointer(help->helper) != p->helper)
> > + goto drop_ct;
> > + }
> >
> >   /* Force conntrack entry direction. */
> >   if (force && CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) {
> > @@ -832,6 +840,13 @@ static int tcf_ct_handle_fragments(struct net *net, 
> > struct sk_buff *skb,
> >
> >  static void tcf_ct_params_free(struct tcf_ct_params *params)
> >  {
> > + if (params->helper) {
> > +#if IS_ENABLED(CONFIG_NF_NAT)
> > + if (params->ct_action & TCA_CT_ACT_NAT)
> > + nf_nat_helper_put(params->helper);
> > +#endif
> > + nf_conntrack_helper_put(params->helper);
> > + }
>
> There is exactly the same code chunk in __ovs_ct_free_action(), I guess
> you can extract a common helper here, too.
>
> >   if (params->ct_ft)
> >   tcf_ct_flow_table_put(params->ct_ft);
> >   if (params->tmpl)
> > @@ -1033,6 +1048,7 @@ static int tcf_ct_act(struct sk_buff *skb, const 
> > struct tc_action *a,
> >   struct nf_hook_state state;
> >   int nh_ofs, err, retval;
> >   struct tcf_ct_params *p;
> > + bool add_helper = false;
> >   bool skip_add = false;
> >   bool defrag = false;
> >   struct nf_conn *c

[ovs-dev] [PATCH net-next 3/3] net: sched: add helper support in act_ct

2022-10-05 Thread Xin Long
This patch is to add helper support in act_ct for OVS actions=ct(alg=xxx)
offloading, which is corresponding to Commit cae3a2627520 ("openvswitch:
Allow attaching helpers to ct action") in OVS kernel part.

The difference is when adding TC actions family and proto cannot be got
from the filter/match, other than helper name in tb[TCA_CT_HELPER_NAME],
we also need to send the family in tb[TCA_CT_HELPER_FAMILY] and the
proto in tb[TCA_CT_HELPER_PROTO] to kernel.

Signed-off-by: Xin Long 
---
 include/net/tc_act/tc_ct.h|   1 +
 include/uapi/linux/tc_act/tc_ct.h |   3 +
 net/sched/act_ct.c| 113 --
 3 files changed, 112 insertions(+), 5 deletions(-)

diff --git a/include/net/tc_act/tc_ct.h b/include/net/tc_act/tc_ct.h
index 8250d6f0a462..b24ea2d9400b 100644
--- a/include/net/tc_act/tc_ct.h
+++ b/include/net/tc_act/tc_ct.h
@@ -10,6 +10,7 @@
 #include 
 
 struct tcf_ct_params {
+   struct nf_conntrack_helper *helper;
struct nf_conn *tmpl;
u16 zone;
 
diff --git a/include/uapi/linux/tc_act/tc_ct.h 
b/include/uapi/linux/tc_act/tc_ct.h
index 5fb1d7ac1027..6c5200f0ed38 100644
--- a/include/uapi/linux/tc_act/tc_ct.h
+++ b/include/uapi/linux/tc_act/tc_ct.h
@@ -22,6 +22,9 @@ enum {
TCA_CT_NAT_PORT_MIN,/* be16 */
TCA_CT_NAT_PORT_MAX,/* be16 */
TCA_CT_PAD,
+   TCA_CT_HELPER_NAME, /* string */
+   TCA_CT_HELPER_FAMILY,   /* u8 */
+   TCA_CT_HELPER_PROTO,/* u8 */
__TCA_CT_MAX
 };
 
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index 193a460a9d7f..f237c27079db 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 static struct workqueue_struct *act_ct_wq;
@@ -655,7 +656,7 @@ struct tc_ct_action_net {
 
 /* Determine whether skb->_nfct is equal to the result of conntrack lookup. */
 static bool tcf_ct_skb_nfct_cached(struct net *net, struct sk_buff *skb,
-  u16 zone_id, bool force)
+  struct tcf_ct_params *p, bool force)
 {
enum ip_conntrack_info ctinfo;
struct nf_conn *ct;
@@ -665,8 +666,15 @@ static bool tcf_ct_skb_nfct_cached(struct net *net, struct 
sk_buff *skb,
return false;
if (!net_eq(net, read_pnet(>ct_net)))
goto drop_ct;
-   if (nf_ct_zone(ct)->id != zone_id)
+   if (nf_ct_zone(ct)->id != p->zone)
goto drop_ct;
+   if (p->helper) {
+   struct nf_conn_help *help;
+
+   help = nf_ct_ext_find(ct, NF_CT_EXT_HELPER);
+   if (help && rcu_access_pointer(help->helper) != p->helper)
+   goto drop_ct;
+   }
 
/* Force conntrack entry direction. */
if (force && CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) {
@@ -832,6 +840,13 @@ static int tcf_ct_handle_fragments(struct net *net, struct 
sk_buff *skb,
 
 static void tcf_ct_params_free(struct tcf_ct_params *params)
 {
+   if (params->helper) {
+#if IS_ENABLED(CONFIG_NF_NAT)
+   if (params->ct_action & TCA_CT_ACT_NAT)
+   nf_nat_helper_put(params->helper);
+#endif
+   nf_conntrack_helper_put(params->helper);
+   }
if (params->ct_ft)
tcf_ct_flow_table_put(params->ct_ft);
if (params->tmpl)
@@ -1033,6 +1048,7 @@ static int tcf_ct_act(struct sk_buff *skb, const struct 
tc_action *a,
struct nf_hook_state state;
int nh_ofs, err, retval;
struct tcf_ct_params *p;
+   bool add_helper = false;
bool skip_add = false;
bool defrag = false;
struct nf_conn *ct;
@@ -1086,7 +1102,7 @@ static int tcf_ct_act(struct sk_buff *skb, const struct 
tc_action *a,
 * actually run the packet through conntrack twice unless it's for a
 * different zone.
 */
-   cached = tcf_ct_skb_nfct_cached(net, skb, p->zone, force);
+   cached = tcf_ct_skb_nfct_cached(net, skb, p, force);
if (!cached) {
if (tcf_ct_flow_table_lookup(p, skb, family)) {
skip_add = true;
@@ -1119,6 +1135,22 @@ static int tcf_ct_act(struct sk_buff *skb, const struct 
tc_action *a,
if (err != NF_ACCEPT)
goto drop;
 
+   if (commit && p->helper && !nfct_help(ct)) {
+   err = __nf_ct_try_assign_helper(ct, p->tmpl, GFP_ATOMIC);
+   if (err)
+   goto drop;
+   add_helper = true;
+   if (p->ct_action & TCA_CT_ACT_NAT && !nfct_seqadj(ct)) {
+   if (!nfct_seqadj_ext_add(ct))
+   return -EINVAL;
+   }
+   }
+
+   if (nf_ct_is_confirmed(ct) ? ((!cached && !skip_add) || add_helper) : 
commit) {
+ 

[ovs-dev] [PATCH net-next 2/3] net: sched: call tcf_ct_params_free to free params in tcf_ct_init

2022-10-05 Thread Xin Long
This patch is to make the err path simple by calling tcf_ct_params_free(),
so that it won't cause problems when more members are added into param and
need freeing on the err path.

Signed-off-by: Xin Long 
---
 net/sched/act_ct.c | 35 ++-
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index b38d91d6b249..193a460a9d7f 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -345,11 +345,9 @@ static void tcf_ct_flow_table_cleanup_work(struct 
work_struct *work)
module_put(THIS_MODULE);
 }
 
-static void tcf_ct_flow_table_put(struct tcf_ct_params *params)
+static void tcf_ct_flow_table_put(struct tcf_ct_flow_table *ct_ft)
 {
-   struct tcf_ct_flow_table *ct_ft = params->ct_ft;
-
-   if (refcount_dec_and_test(>ct_ft->ref)) {
+   if (refcount_dec_and_test(_ft->ref)) {
rhashtable_remove_fast(_ht, _ft->node, zones_params);
INIT_RCU_WORK(_ft->rwork, tcf_ct_flow_table_cleanup_work);
queue_rcu_work(act_ct_wq, _ft->rwork);
@@ -832,18 +830,23 @@ static int tcf_ct_handle_fragments(struct net *net, 
struct sk_buff *skb,
return err;
 }
 
-static void tcf_ct_params_free(struct rcu_head *head)
+static void tcf_ct_params_free(struct tcf_ct_params *params)
 {
-   struct tcf_ct_params *params = container_of(head,
-   struct tcf_ct_params, rcu);
-
-   tcf_ct_flow_table_put(params);
-
+   if (params->ct_ft)
+   tcf_ct_flow_table_put(params->ct_ft);
if (params->tmpl)
nf_ct_put(params->tmpl);
kfree(params);
 }
 
+static void tcf_ct_params_free_rcu(struct rcu_head *head)
+{
+   struct tcf_ct_params *params;
+
+   params = container_of(head, struct tcf_ct_params, rcu);
+   tcf_ct_params_free(params);
+}
+
 #if IS_ENABLED(CONFIG_NF_NAT)
 /* Modelled after nf_nat_ipv[46]_fn().
  * range is only used for new, uninitialized NAT state.
@@ -1390,7 +1393,7 @@ static int tcf_ct_init(struct net *net, struct nlattr 
*nla,
 
err = tcf_ct_flow_table_get(net, params);
if (err)
-   goto cleanup_params;
+   goto cleanup;
 
spin_lock_bh(>tcf_lock);
goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch);
@@ -1401,17 +1404,15 @@ static int tcf_ct_init(struct net *net, struct nlattr 
*nla,
if (goto_ch)
tcf_chain_put_by_act(goto_ch);
if (params)
-   call_rcu(>rcu, tcf_ct_params_free);
+   call_rcu(>rcu, tcf_ct_params_free_rcu);
 
return res;
 
-cleanup_params:
-   if (params->tmpl)
-   nf_ct_put(params->tmpl);
 cleanup:
if (goto_ch)
tcf_chain_put_by_act(goto_ch);
-   kfree(params);
+   if (params)
+   tcf_ct_params_free(params);
tcf_idr_release(*a, bind);
return err;
 }
@@ -1423,7 +1424,7 @@ static void tcf_ct_cleanup(struct tc_action *a)
 
params = rcu_dereference_protected(c->params, 1);
if (params)
-   call_rcu(>rcu, tcf_ct_params_free);
+   call_rcu(>rcu, tcf_ct_params_free_rcu);
 }
 
 static int tcf_ct_dump_key_val(struct sk_buff *skb,
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next 1/3] net: move the helper function to nf_conntrack_helper for ovs and tc

2022-10-05 Thread Xin Long
Move ovs_ct_helper from openvswitch to nf_conntrack_helper and rename
as nf_ct_helper so that it can be used in TC act_ct in the next patch.
Note that it also adds the checks for the family and proto, as in TC
act_ct, the packets with correct family and proto are not guaranteed.

Signed-off-by: Xin Long 
---
 include/net/netfilter/nf_conntrack_helper.h |  2 +
 net/netfilter/nf_conntrack_helper.c | 71 +
 net/openvswitch/conntrack.c | 61 +-
 3 files changed, 74 insertions(+), 60 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_helper.h 
b/include/net/netfilter/nf_conntrack_helper.h
index 9939c366f720..6c32e59fc16f 100644
--- a/include/net/netfilter/nf_conntrack_helper.h
+++ b/include/net/netfilter/nf_conntrack_helper.h
@@ -115,6 +115,8 @@ struct nf_conn_help *nf_ct_helper_ext_add(struct nf_conn 
*ct, gfp_t gfp);
 int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl,
  gfp_t flags);
 
+int nf_ct_helper(struct sk_buff *skb, u16 proto);
+
 void nf_ct_helper_destroy(struct nf_conn *ct);
 
 static inline struct nf_conn_help *nfct_help(const struct nf_conn *ct)
diff --git a/net/netfilter/nf_conntrack_helper.c 
b/net/netfilter/nf_conntrack_helper.c
index ff737a76052e..83615e479f87 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -26,7 +26,9 @@
 #include 
 #include 
 #include 
+#include 
 #include 
+#include 
 
 static DEFINE_MUTEX(nf_ct_helper_mutex);
 struct hlist_head *nf_ct_helper_hash __read_mostly;
@@ -240,6 +242,75 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct 
nf_conn *tmpl,
 }
 EXPORT_SYMBOL_GPL(__nf_ct_try_assign_helper);
 
+/* 'skb' should already be pulled to nh_ofs. */
+int nf_ct_helper(struct sk_buff *skb, u16 proto)
+{
+   const struct nf_conntrack_helper *helper;
+   const struct nf_conn_help *help;
+   enum ip_conntrack_info ctinfo;
+   unsigned int protoff;
+   struct nf_conn *ct;
+   int err;
+
+   ct = nf_ct_get(skb, );
+   if (!ct || ctinfo == IP_CT_RELATED_REPLY)
+   return NF_ACCEPT;
+
+   help = nfct_help(ct);
+   if (!help)
+   return NF_ACCEPT;
+
+   helper = rcu_dereference(help->helper);
+   if (!helper)
+   return NF_ACCEPT;
+
+   if (helper->tuple.src.l3num != NFPROTO_UNSPEC &&
+   helper->tuple.src.l3num != proto)
+   return NF_ACCEPT;
+
+   switch (proto) {
+   case NFPROTO_IPV4:
+   protoff = ip_hdrlen(skb);
+   proto = ip_hdr(skb)->protocol;
+   break;
+   case NFPROTO_IPV6: {
+   u8 nexthdr = ipv6_hdr(skb)->nexthdr;
+   __be16 frag_off;
+   int ofs;
+
+   ofs = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), ,
+  _off);
+   if (ofs < 0 || (frag_off & htons(~0x7)) != 0) {
+   pr_debug("proto header not found\n");
+   return NF_ACCEPT;
+   }
+   protoff = ofs;
+   proto = nexthdr;
+   break;
+   }
+   default:
+   WARN_ONCE(1, "helper invoked on non-IP family!");
+   return NF_DROP;
+   }
+
+   if (helper->tuple.dst.protonum != proto)
+   return NF_ACCEPT;
+
+   err = helper->help(skb, protoff, ct, ctinfo);
+   if (err != NF_ACCEPT)
+   return err;
+
+   /* Adjust seqs after helper.  This is needed due to some helpers (e.g.,
+* FTP with NAT) adusting the TCP payload size when mangling IP
+* addresses and/or port numbers in the text-based control connection.
+*/
+   if (test_bit(IPS_SEQ_ADJUST_BIT, >status) &&
+   !nf_ct_seq_adjust(skb, ct, ctinfo, protoff))
+   return NF_DROP;
+   return NF_ACCEPT;
+}
+EXPORT_SYMBOL_GPL(nf_ct_helper);
+
 /* appropriate ct lock protecting must be taken by caller */
 static int unhelp(struct nf_conn *ct, void *me)
 {
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index cb255d8ed99a..75e709303815 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -434,65 +434,6 @@ static int ovs_ct_set_labels(struct nf_conn *ct, struct 
sw_flow_key *key,
return 0;
 }
 
-/* 'skb' should already be pulled to nh_ofs. */
-static int ovs_ct_helper(struct sk_buff *skb, u16 proto)
-{
-   const struct nf_conntrack_helper *helper;
-   const struct nf_conn_help *help;
-   enum ip_conntrack_info ctinfo;
-   unsigned int protoff;
-   struct nf_conn *ct;
-   int err;
-
-   ct = nf_ct_get(skb, );
-   if (!ct || ctinfo == IP_CT_RELATED_REPLY)
-   return NF_ACCEPT;
-
-   help = nfct_help(ct);
-   if (!help)
-   return NF_ACCEPT;
-
-   helper = rcu_dereference(h

[ovs-dev] [PATCH net-next 0/3] net: add helper support in tc act_ct for ovs offloading

2022-10-05 Thread Xin Long
Ilya reported an issue that FTP traffic would be broken when the OVS flow
with ct(commit,alg=ftp) installed in the OVS kernel module, and it was
caused by that TC didn't support the ftp helper offloaded from OVS.

This patchset is to add the helper support in act_ct for ovs offloading
in kernel net/sched.

The 1st patch moves some code from openvswitch into nf_conntrack_helper
so that it can be used by net/sched in the 3rd patch (Note there are
still some other common code used in both OVS and TC and I will extract
it in other patches). The 2nd patch extracts another function in net/
sched to make the 3rd patch easier to write.

The user space part will be added in another patch, and with it these OVS
flows (FTP over SNAT) can be used to test this feature:

  table=0, in_port=veth1,tcp,tcp_dst=2121,ct_state=-trk \
actions=ct(table=1, nat), normal
  table=0, in_port=veth2,tcp,ct_state=-trk actions=ct(table=1, nat)
  table=0, in_port=veth1,tcp,ct_state=-trk actions=ct(table=0, nat)
  table=0, in_port=veth1,tcp,ct_state=+trk+rel actions=ct(commit, nat),normal
  table=0, in_port=veth1,tcp,ct_state=+trk+est actions=veth2"

  table=1, in_port=veth1,tcp,tcp_dst=2121,ct_state=+trk+new \
actions=ct(commit, nat(src=7.7.16.1), alg=ftp),normal"
  table=1, in_port=veth1,tcp,tcp_dst=2121,ct_state=+trk+est actions=veth2"
  table=1, in_port=veth2,tcp,ct_state=+trk+est actions=veth1"

Xin Long (3):
  net: move the helper function to nf_conntrack_helper for ovs and tc
  net: sched: call tcf_ct_params_free to free params in tcf_ct_init
  net: sched: add helper support in act_ct

 include/net/netfilter/nf_conntrack_helper.h |   2 +
 include/net/tc_act/tc_ct.h  |   1 +
 include/uapi/linux/tc_act/tc_ct.h   |   3 +
 net/netfilter/nf_conntrack_helper.c |  71 ++
 net/openvswitch/conntrack.c |  61 +---
 net/sched/act_ct.c  | 148 +---
 6 files changed, 204 insertions(+), 82 deletions(-)

-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev