Re: [ovs-dev] [PATCH net] netfilter: nf_nat: fix action not being set for all ct states
On Sat, Dec 23, 2023 at 9:48 PM Brad Cowie wrote: > > On Sun, 24 Dec 2023 at 10:13, Simon Horman wrote: > > Thanks Brad, > > > > I agree with your analysis and that the problem appears to > > have been introduced by the cited commit. > > Thanks for the review Simon. > > > I am curious to know what use case triggers this / > > why it when unnoticed for a year. > > We encountered this issue while upgrading some routers from > linux 5.15 to 6.2. The dataplane on these routers is provided > by an openvswitch bridge which is controlled via openflow by > faucet. These routers are also performing SNAT on all traffic > to/from the wan interface via openvswitch conntrack openflow > rules. > > We noticed that after upgrading the linux kernel, traceroute/mtr > no longer worked when run from clients behind the router. > We eventually discovered the reason for this is that the > ICMP time exceeded messages elicited by traceroute were > matching openflow rules with the incorrect destination ip, > despite there being an openflow rule to undo the nat. > Other packets in the established or new state matched the > expected openflow rules. > > A git bisect between 5.15 and 6.2 showed that this change in > behaviour was introduced by commit ebddb1404900. After the > above patch is applied our routers perform nat correctly > again for traceroute/mtr. Acked-by: Xin Long ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next 2/3] net: sched: set IPS_CONFIRMED in tmpl status only when commit is set in act_ct
With the following flows, the packets will be dropped if OVS TC offload is enabled. 'ip,ct_state=-trk,in_port=1 actions=ct(zone=1)' 'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=1)' 'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=2),normal' In the 1st flow, it finds the exp from the hashtable and removes it then creates the ct with this exp in act_ct. However, in the 2nd flow it goes to the OVS upcall at the 1st time. When the skb comes back from userspace, it has to create the ct again without exp(the exp was removed last time). With no 'rel' set in the ct, the 3rd flow can never get matched. In OVS conntrack, it works around it by adding its own exp lookup function ovs_ct_expect_find() where it doesn't remove the exp. Instead of creating a real ct, it only updates its keys with the exp and its master info. So when the skb comes back, the exp is still in the hashtable. However, we can't do this trick in act_ct, as tc flower match is using a real ct, and passing the exp and its master info to flower parsing via tc_skb_cb is also not possible (tc_skb_cb size is not big enough). The simple and clear fix is to not remove the exp at the 1st flow, namely, not set IPS_CONFIRMED in tmpl when commit is not set in act_ct. Reported-by: Shuang Li Signed-off-by: Xin Long --- net/sched/act_ct.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index abc71a06d634..7c652d14528b 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -1238,7 +1238,8 @@ static int tcf_ct_fill_params(struct net *net, } } - __set_bit(IPS_CONFIRMED_BIT, >status); + if (p->ct_action & TCA_CT_ACT_COMMIT) + __set_bit(IPS_CONFIRMED_BIT, >status); return 0; err: nf_ct_put(p->tmpl); -- 2.39.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack
By not setting IPS_CONFIRMED in tmpl that allows the exp not to be removed from the hashtable when lookup, we can simplify the exp processing code a lot in openvswitch conntrack. Signed-off-by: Xin Long --- net/openvswitch/conntrack.c | 78 + 1 file changed, 10 insertions(+), 68 deletions(-) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 331730fd3580..fa955e892210 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -455,45 +455,6 @@ static int ovs_ct_handle_fragments(struct net *net, struct sw_flow_key *key, return 0; } -static struct nf_conntrack_expect * -ovs_ct_expect_find(struct net *net, const struct nf_conntrack_zone *zone, - u16 proto, const struct sk_buff *skb) -{ - struct nf_conntrack_tuple tuple; - struct nf_conntrack_expect *exp; - - if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), proto, net, )) - return NULL; - - exp = __nf_ct_expect_find(net, zone, ); - if (exp) { - struct nf_conntrack_tuple_hash *h; - - /* Delete existing conntrack entry, if it clashes with the -* expectation. This can happen since conntrack ALGs do not -* check for clashes between (new) expectations and existing -* conntrack entries. nf_conntrack_in() will check the -* expectations only if a conntrack entry can not be found, -* which can lead to OVS finding the expectation (here) in the -* init direction, but which will not be removed by the -* nf_conntrack_in() call, if a matching conntrack entry is -* found instead. In this case all init direction packets -* would be reported as new related packets, while reply -* direction packets would be reported as un-related -* established packets. -*/ - h = nf_conntrack_find_get(net, zone, ); - if (h) { - struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h); - - nf_ct_delete(ct, 0, 0); - nf_ct_put(ct); - } - } - - return exp; -} - /* This replicates logic from nf_conntrack_core.c that is not exported. */ static enum ip_conntrack_info ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h) @@ -852,36 +813,16 @@ static int ovs_ct_lookup(struct net *net, struct sw_flow_key *key, const struct ovs_conntrack_info *info, struct sk_buff *skb) { - struct nf_conntrack_expect *exp; - - /* If we pass an expected packet through nf_conntrack_in() the -* expectation is typically removed, but the packet could still be -* lost in upcall processing. To prevent this from happening we -* perform an explicit expectation lookup. Expected connections are -* always new, and will be passed through conntrack only when they are -* committed, as it is OK to remove the expectation at that time. -*/ - exp = ovs_ct_expect_find(net, >zone, info->family, skb); - if (exp) { - u8 state; - - /* NOTE: New connections are NATted and Helped only when -* committed, so we are not calling into NAT here. -*/ - state = OVS_CS_F_TRACKED | OVS_CS_F_NEW | OVS_CS_F_RELATED; - __ovs_ct_update_key(key, state, >zone, exp->master); - } else { - struct nf_conn *ct; - int err; + struct nf_conn *ct; + int err; - err = __ovs_ct_lookup(net, key, info, skb); - if (err) - return err; + err = __ovs_ct_lookup(net, key, info, skb); + if (err) + return err; - ct = (struct nf_conn *)skb_nfct(skb); - if (ct) - nf_ct_deliver_cached_events(ct); - } + ct = (struct nf_conn *)skb_nfct(skb); + if (ct) + nf_ct_deliver_cached_events(ct); return 0; } @@ -1460,7 +1401,8 @@ int ovs_ct_copy_action(struct net *net, const struct nlattr *attr, if (err) goto err_free_ct; - __set_bit(IPS_CONFIRMED_BIT, _info.ct->status); + if (ct_info.commit) + __set_bit(IPS_CONFIRMED_BIT, _info.ct->status); return 0; err_free_ct: __ovs_ct_free_action(_info); -- 2.39.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next 1/3] netfilter: allow exp not to be removed in nf_ct_find_expectation
Currently nf_conntrack_in() calling nf_ct_find_expectation() will remove the exp from the hash table. However, in some scenario, we expect the exp not to be removed when the created ct will not be confirmed, like in OVS and TC conntrack in the following patches. This patch allows exp not to be removed by setting IPS_CONFIRMED in the status of the tmpl. Signed-off-by: Xin Long --- include/net/netfilter/nf_conntrack_expect.h | 2 +- net/netfilter/nf_conntrack_core.c | 2 +- net/netfilter/nf_conntrack_expect.c | 4 ++-- net/netfilter/nft_ct.c | 2 ++ 4 files changed, 6 insertions(+), 4 deletions(-) diff --git a/include/net/netfilter/nf_conntrack_expect.h b/include/net/netfilter/nf_conntrack_expect.h index cf0d81be5a96..165e7a03b8e9 100644 --- a/include/net/netfilter/nf_conntrack_expect.h +++ b/include/net/netfilter/nf_conntrack_expect.h @@ -100,7 +100,7 @@ nf_ct_expect_find_get(struct net *net, struct nf_conntrack_expect * nf_ct_find_expectation(struct net *net, const struct nf_conntrack_zone *zone, - const struct nf_conntrack_tuple *tuple); + const struct nf_conntrack_tuple *tuple, bool unlink); void nf_ct_unlink_expect_report(struct nf_conntrack_expect *exp, u32 portid, int report); diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index 992393102d5f..9f6f2e643575 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -1756,7 +1756,7 @@ init_conntrack(struct net *net, struct nf_conn *tmpl, cnet = nf_ct_pernet(net); if (cnet->expect_count) { spin_lock_bh(_conntrack_expect_lock); - exp = nf_ct_find_expectation(net, zone, tuple); + exp = nf_ct_find_expectation(net, zone, tuple, !tmpl || nf_ct_is_confirmed(tmpl)); if (exp) { /* Welcome, Mr. Bond. We've been expecting you... */ __set_bit(IPS_EXPECTED_BIT, >status); diff --git a/net/netfilter/nf_conntrack_expect.c b/net/netfilter/nf_conntrack_expect.c index 96948e98ec53..81ca348915c9 100644 --- a/net/netfilter/nf_conntrack_expect.c +++ b/net/netfilter/nf_conntrack_expect.c @@ -171,7 +171,7 @@ EXPORT_SYMBOL_GPL(nf_ct_expect_find_get); struct nf_conntrack_expect * nf_ct_find_expectation(struct net *net, const struct nf_conntrack_zone *zone, - const struct nf_conntrack_tuple *tuple) + const struct nf_conntrack_tuple *tuple, bool unlink) { struct nf_conntrack_net *cnet = nf_ct_pernet(net); struct nf_conntrack_expect *i, *exp = NULL; @@ -211,7 +211,7 @@ nf_ct_find_expectation(struct net *net, !refcount_inc_not_zero(>master->ct_general.use))) return NULL; - if (exp->flags & NF_CT_EXPECT_PERMANENT) { + if (exp->flags & NF_CT_EXPECT_PERMANENT || !unlink) { refcount_inc(>use); return exp; } else if (del_timer(>timeout)) { diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c index 38958e067aa8..e87fd4314c68 100644 --- a/net/netfilter/nft_ct.c +++ b/net/netfilter/nft_ct.c @@ -262,6 +262,7 @@ static void nft_ct_set_zone_eval(const struct nft_expr *expr, regs->verdict.code = NF_DROP; return; } + __set_bit(IPS_CONFIRMED_BIT, >status); } nf_ct_set(skb, ct, IP_CT_NEW); @@ -368,6 +369,7 @@ static bool nft_ct_tmpl_alloc_pcpu(void) return false; } + __set_bit(IPS_CONFIRMED_BIT, >status); per_cpu(nft_ct_pcpu_template, cpu) = tmp; } -- 2.39.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next 0/3] net: handle the exp removal problem with ovs upcall properly
With the OVS upcall, the original ct in the skb will be dropped, and when the skb comes back from userspace it has to create a new ct again through nf_conntrack_in() in either OVS __ovs_ct_lookup() or TC tcf_ct_act(). However, the new ct will not be able to have the exp as the original ct has taken it away from the hash table in nf_ct_find_expectation(). This will cause some flow never to be matched, like: 'ip,ct_state=-trk,in_port=1 actions=ct(zone=1)' 'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=1)' 'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=2),normal' if the 2nd flow triggers the OVS upcall, the 3rd flow will never get matched. OVS conntrack works around this by adding its own exp lookup function to not remove the exp from the hash table and saving the exp and its master info to the flow keys instead of create a real ct. But this way doesn't work for TC act_ct. The patch 1/3 allows nf_ct_find_expectation() not to remove the exp from the hash table if tmpl is set with IPS_CONFIRMED when doing lookup. This allows both OVS conntrack and TC act_ct to have a simple and clear fix for this problem in the patch 2/3 and 3/3. Xin Long (3): netfilter: allow exp not to be removed in nf_ct_find_expectation net: sched: set IPS_CONFIRMED in tmpl status only when commit is set in act_ct openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack include/net/netfilter/nf_conntrack_expect.h | 2 +- net/netfilter/nf_conntrack_core.c | 2 +- net/netfilter/nf_conntrack_expect.c | 4 +- net/netfilter/nft_ct.c | 2 + net/openvswitch/conntrack.c | 78 +++-- net/sched/act_ct.c | 3 +- 6 files changed, 18 insertions(+), 73 deletions(-) -- 2.39.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCHv2 net-next 3/5] openvswitch: move key and ovs_cb update out of handle_fragments
This patch has no functional changes and just moves key and ovs_cb update out of handle_fragments, and skb_clear_hash() and skb->ignore_df change into handle_fragments(), to make it easier to move the duplicate code from handle_fragments() into nf_conntrack_ovs later. Note that it changes to pass info->family to handle_fragments() instead of key for the packet type check, as info->family is set according to key->eth.type in ovs_ct_copy_action() when creating the action. Signed-off-by: Xin Long --- net/openvswitch/conntrack.c | 37 + 1 file changed, 25 insertions(+), 12 deletions(-) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 47a58657b1e4..962e2f70e597 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -437,13 +437,12 @@ static int ovs_ct_set_labels(struct nf_conn *ct, struct sw_flow_key *key, /* Returns 0 on success, -EINPROGRESS if 'skb' is stolen, or other nonzero * value if 'skb' is freed. */ -static int handle_fragments(struct net *net, struct sw_flow_key *key, - u16 zone, struct sk_buff *skb) +static int handle_fragments(struct net *net, struct sk_buff *skb, + u16 zone, u8 family, u8 *proto, u16 *mru) { - struct ovs_skb_cb ovs_cb = *OVS_CB(skb); int err; - if (key->eth.type == htons(ETH_P_IP)) { + if (family == NFPROTO_IPV4) { enum ip_defrag_users user = IP_DEFRAG_CONNTRACK_IN + zone; memset(IPCB(skb), 0, sizeof(struct inet_skb_parm)); @@ -451,9 +450,9 @@ static int handle_fragments(struct net *net, struct sw_flow_key *key, if (err) return err; - ovs_cb.mru = IPCB(skb)->frag_max_size; + *mru = IPCB(skb)->frag_max_size; #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6) - } else if (key->eth.type == htons(ETH_P_IPV6)) { + } else if (family == NFPROTO_IPV6) { enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone; memset(IP6CB(skb), 0, sizeof(struct inet6_skb_parm)); @@ -464,22 +463,35 @@ static int handle_fragments(struct net *net, struct sw_flow_key *key, return err; } - key->ip.proto = ipv6_hdr(skb)->nexthdr; - ovs_cb.mru = IP6CB(skb)->frag_max_size; + *proto = ipv6_hdr(skb)->nexthdr; + *mru = IP6CB(skb)->frag_max_size; #endif } else { kfree_skb(skb); return -EPFNOSUPPORT; } + skb_clear_hash(skb); + skb->ignore_df = 1; + + return 0; +} + +static int ovs_ct_handle_fragments(struct net *net, struct sw_flow_key *key, + u16 zone, int family, struct sk_buff *skb) +{ + struct ovs_skb_cb ovs_cb = *OVS_CB(skb); + int err; + + err = handle_fragments(net, skb, zone, family, >ip.proto, _cb.mru); + if (err) + return err; + /* The key extracted from the fragment that completed this datagram * likely didn't have an L4 header, so regenerate it. */ ovs_flow_key_update_l3l4(skb, key); - key->ip.frag = OVS_FRAG_TYPE_NONE; - skb_clear_hash(skb); - skb->ignore_df = 1; *OVS_CB(skb) = ovs_cb; return 0; @@ -,7 +1123,8 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb, } if (key->ip.frag != OVS_FRAG_TYPE_NONE) { - err = handle_fragments(net, key, info->zone.id, skb); + err = ovs_ct_handle_fragments(net, key, info->zone.id, + info->family, skb); if (err) return err; } -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCHv2 net-next 5/5] net: extract nf_ct_handle_fragments to nf_conntrack_ovs
Now handle_fragments() in OVS and TC have the similar code, and this patch removes the duplicate code by moving the function to nf_conntrack_ovs. Note that skb_clear_hash(skb) or skb->ignore_df = 1 should be done only when defrag returns 0, as it does in other places in kernel. Signed-off-by: Xin Long --- include/net/netfilter/nf_conntrack.h | 2 ++ net/netfilter/nf_conntrack_ovs.c | 48 net/openvswitch/conntrack.c | 45 +- net/sched/act_ct.c | 46 ++ 4 files changed, 53 insertions(+), 88 deletions(-) diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h index a6e89d7212f8..7bbab8f2b73d 100644 --- a/include/net/netfilter/nf_conntrack.h +++ b/include/net/netfilter/nf_conntrack.h @@ -363,6 +363,8 @@ static inline struct nf_conntrack_net *nf_ct_pernet(const struct net *net) } int nf_ct_skb_network_trim(struct sk_buff *skb, int family); +int nf_ct_handle_fragments(struct net *net, struct sk_buff *skb, + u16 zone, u8 family, u8 *proto, u16 *mru); #define NF_CT_STAT_INC(net, count) __this_cpu_inc((net)->ct.stat->count) #define NF_CT_STAT_INC_ATOMIC(net, count) this_cpu_inc((net)->ct.stat->count) diff --git a/net/netfilter/nf_conntrack_ovs.c b/net/netfilter/nf_conntrack_ovs.c index c60ef71d1aea..52b776bdf526 100644 --- a/net/netfilter/nf_conntrack_ovs.c +++ b/net/netfilter/nf_conntrack_ovs.c @@ -3,6 +3,8 @@ #include #include +#include +#include #include /* 'skb' should already be pulled to nh_ofs. */ @@ -128,3 +130,49 @@ int nf_ct_skb_network_trim(struct sk_buff *skb, int family) return pskb_trim_rcsum(skb, len); } EXPORT_SYMBOL_GPL(nf_ct_skb_network_trim); + +/* Returns 0 on success, -EINPROGRESS if 'skb' is stolen, or other nonzero + * value if 'skb' is freed. + */ +int nf_ct_handle_fragments(struct net *net, struct sk_buff *skb, + u16 zone, u8 family, u8 *proto, u16 *mru) +{ + int err; + + if (family == NFPROTO_IPV4) { + enum ip_defrag_users user = IP_DEFRAG_CONNTRACK_IN + zone; + + memset(IPCB(skb), 0, sizeof(struct inet_skb_parm)); + local_bh_disable(); + err = ip_defrag(net, skb, user); + local_bh_enable(); + if (err) + return err; + + *mru = IPCB(skb)->frag_max_size; +#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6) + } else if (family == NFPROTO_IPV6) { + enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone; + + memset(IP6CB(skb), 0, sizeof(struct inet6_skb_parm)); + err = nf_ct_frag6_gather(net, skb, user); + if (err) { + if (err != -EINPROGRESS) + kfree_skb(skb); + return err; + } + + *proto = ipv6_hdr(skb)->nexthdr; + *mru = IP6CB(skb)->frag_max_size; +#endif + } else { + kfree_skb(skb); + return -EPFNOSUPPORT; + } + + skb_clear_hash(skb); + skb->ignore_df = 1; + + return 0; +} +EXPORT_SYMBOL_GPL(nf_ct_handle_fragments); diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 962e2f70e597..5d40ad02cabc 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -434,56 +434,13 @@ static int ovs_ct_set_labels(struct nf_conn *ct, struct sw_flow_key *key, return 0; } -/* Returns 0 on success, -EINPROGRESS if 'skb' is stolen, or other nonzero - * value if 'skb' is freed. - */ -static int handle_fragments(struct net *net, struct sk_buff *skb, - u16 zone, u8 family, u8 *proto, u16 *mru) -{ - int err; - - if (family == NFPROTO_IPV4) { - enum ip_defrag_users user = IP_DEFRAG_CONNTRACK_IN + zone; - - memset(IPCB(skb), 0, sizeof(struct inet_skb_parm)); - err = ip_defrag(net, skb, user); - if (err) - return err; - - *mru = IPCB(skb)->frag_max_size; -#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6) - } else if (family == NFPROTO_IPV6) { - enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone; - - memset(IP6CB(skb), 0, sizeof(struct inet6_skb_parm)); - err = nf_ct_frag6_gather(net, skb, user); - if (err) { - if (err != -EINPROGRESS) - kfree_skb(skb); - return err; - } - - *proto = ipv6_hdr(skb)->nexthdr; - *mru = IP6CB(skb)->frag_max_size; -#endif - } else { - kfree_skb(skb); - return -EPFNOSUPPORT; - } - - skb_clear_hash(skb); - skb->ignore_df = 1; - - return 0; -} - static int
[ovs-dev] [PATCHv2 net-next 4/5] net: sched: move frag check and tc_skb_cb update out of handle_fragments
This patch has no functional changes and just moves frag check and tc_skb_cb update out of handle_fragments, to make it easier to move the duplicate code from handle_fragments() into nf_conntrack_ovs later. Signed-off-by: Xin Long --- net/sched/act_ct.c | 71 +- 1 file changed, 39 insertions(+), 32 deletions(-) diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index 0a1ecc972a8b..9f133ed93815 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -778,29 +778,10 @@ static int tcf_ct_ipv6_is_fragment(struct sk_buff *skb, bool *frag) return 0; } -static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, - u8 family, u16 zone, bool *defrag) +static int handle_fragments(struct net *net, struct sk_buff *skb, + u16 zone, u8 family, u16 *mru) { - enum ip_conntrack_info ctinfo; - struct nf_conn *ct; - int err = 0; - bool frag; - u16 mru; - - /* Previously seen (loopback)? Ignore. */ - ct = nf_ct_get(skb, ); - if ((ct && !nf_ct_is_template(ct)) || ctinfo == IP_CT_UNTRACKED) - return 0; - - if (family == NFPROTO_IPV4) - err = tcf_ct_ipv4_is_fragment(skb, ); - else - err = tcf_ct_ipv6_is_fragment(skb, ); - if (err || !frag) - return err; - - skb_get(skb); - mru = tc_skb_cb(skb)->mru; + int err; if (family == NFPROTO_IPV4) { enum ip_defrag_users user = IP_DEFRAG_CONNTRACK_IN + zone; @@ -812,10 +793,8 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, if (err && err != -EINPROGRESS) return err; - if (!err) { - *defrag = true; - mru = IPCB(skb)->frag_max_size; - } + if (!err) + *mru = IPCB(skb)->frag_max_size; } else { /* NFPROTO_IPV6 */ #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6) enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone; @@ -825,18 +804,14 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, if (err && err != -EINPROGRESS) goto out_free; - if (!err) { - *defrag = true; - mru = IP6CB(skb)->frag_max_size; - } + if (!err) + *mru = IP6CB(skb)->frag_max_size; #else err = -EOPNOTSUPP; goto out_free; #endif } - if (err != -EINPROGRESS) - tc_skb_cb(skb)->mru = mru; skb_clear_hash(skb); skb->ignore_df = 1; return err; @@ -846,6 +821,38 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, return err; } +static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, + u8 family, u16 zone, bool *defrag) +{ + enum ip_conntrack_info ctinfo; + struct nf_conn *ct; + int err = 0; + bool frag; + u16 mru; + + /* Previously seen (loopback)? Ignore. */ + ct = nf_ct_get(skb, ); + if ((ct && !nf_ct_is_template(ct)) || ctinfo == IP_CT_UNTRACKED) + return 0; + + if (family == NFPROTO_IPV4) + err = tcf_ct_ipv4_is_fragment(skb, ); + else + err = tcf_ct_ipv6_is_fragment(skb, ); + if (err || !frag) + return err; + + skb_get(skb); + err = handle_fragments(net, skb, zone, family, ); + if (err) + return err; + + *defrag = true; + tc_skb_cb(skb)->mru = mru; + + return 0; +} + static void tcf_ct_params_free(struct tcf_ct_params *params) { if (params->helper) { -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCHv2 net-next 1/5] net: create nf_conntrack_ovs for ovs and tc use
Similar to nf_nat_ovs created by Commit ebddb1404900 ("net: move the nat function to nf_nat_ovs for ovs and tc"), this patch is to create nf_conntrack_ovs to get these functions shared by OVS and TC only. There are nf_ct_helper() and nf_ct_add_helper() from nf_conntrak_helper in this patch, and will be more in the following patches. Signed-off-by: Xin Long --- net/netfilter/Kconfig | 3 + net/netfilter/Makefile | 1 + net/netfilter/nf_conntrack_helper.c | 98 -- net/netfilter/nf_conntrack_ovs.c| 104 net/openvswitch/Kconfig | 1 + net/sched/Kconfig | 1 + 6 files changed, 110 insertions(+), 98 deletions(-) create mode 100644 net/netfilter/nf_conntrack_ovs.c diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig index f71b41c7ce2f..4d6737160857 100644 --- a/net/netfilter/Kconfig +++ b/net/netfilter/Kconfig @@ -189,6 +189,9 @@ config NF_CONNTRACK_LABELS to connection tracking entries. It can be used with xtables connlabel match and the nftables ct expression. +config NF_CONNTRACK_OVS + bool + config NF_CT_PROTO_DCCP bool 'DCCP protocol connection tracking support' depends on NETFILTER_ADVANCED diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile index ba2a6b5e93d9..5ffef1cd6143 100644 --- a/net/netfilter/Makefile +++ b/net/netfilter/Makefile @@ -11,6 +11,7 @@ nf_conntrack-$(CONFIG_NF_CONNTRACK_TIMEOUT) += nf_conntrack_timeout.o nf_conntrack-$(CONFIG_NF_CONNTRACK_TIMESTAMP) += nf_conntrack_timestamp.o nf_conntrack-$(CONFIG_NF_CONNTRACK_EVENTS) += nf_conntrack_ecache.o nf_conntrack-$(CONFIG_NF_CONNTRACK_LABELS) += nf_conntrack_labels.o +nf_conntrack-$(CONFIG_NF_CONNTRACK_OVS) += nf_conntrack_ovs.o nf_conntrack-$(CONFIG_NF_CT_PROTO_DCCP) += nf_conntrack_proto_dccp.o nf_conntrack-$(CONFIG_NF_CT_PROTO_SCTP) += nf_conntrack_proto_sctp.o nf_conntrack-$(CONFIG_NF_CT_PROTO_GRE) += nf_conntrack_proto_gre.o diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c index 48ea6d0264b5..0c4db2f2ac43 100644 --- a/net/netfilter/nf_conntrack_helper.c +++ b/net/netfilter/nf_conntrack_helper.c @@ -242,104 +242,6 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl, } EXPORT_SYMBOL_GPL(__nf_ct_try_assign_helper); -/* 'skb' should already be pulled to nh_ofs. */ -int nf_ct_helper(struct sk_buff *skb, struct nf_conn *ct, -enum ip_conntrack_info ctinfo, u16 proto) -{ - const struct nf_conntrack_helper *helper; - const struct nf_conn_help *help; - unsigned int protoff; - int err; - - if (ctinfo == IP_CT_RELATED_REPLY) - return NF_ACCEPT; - - help = nfct_help(ct); - if (!help) - return NF_ACCEPT; - - helper = rcu_dereference(help->helper); - if (!helper) - return NF_ACCEPT; - - if (helper->tuple.src.l3num != NFPROTO_UNSPEC && - helper->tuple.src.l3num != proto) - return NF_ACCEPT; - - switch (proto) { - case NFPROTO_IPV4: - protoff = ip_hdrlen(skb); - proto = ip_hdr(skb)->protocol; - break; - case NFPROTO_IPV6: { - u8 nexthdr = ipv6_hdr(skb)->nexthdr; - __be16 frag_off; - int ofs; - - ofs = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), , - _off); - if (ofs < 0 || (frag_off & htons(~0x7)) != 0) { - pr_debug("proto header not found\n"); - return NF_ACCEPT; - } - protoff = ofs; - proto = nexthdr; - break; - } - default: - WARN_ONCE(1, "helper invoked on non-IP family!"); - return NF_DROP; - } - - if (helper->tuple.dst.protonum != proto) - return NF_ACCEPT; - - err = helper->help(skb, protoff, ct, ctinfo); - if (err != NF_ACCEPT) - return err; - - /* Adjust seqs after helper. This is needed due to some helpers (e.g., -* FTP with NAT) adusting the TCP payload size when mangling IP -* addresses and/or port numbers in the text-based control connection. -*/ - if (test_bit(IPS_SEQ_ADJUST_BIT, >status) && - !nf_ct_seq_adjust(skb, ct, ctinfo, protoff)) - return NF_DROP; - return NF_ACCEPT; -} -EXPORT_SYMBOL_GPL(nf_ct_helper); - -int nf_ct_add_helper(struct nf_conn *ct, const char *name, u8 family, -u8 proto, bool nat, struct nf_conntrack_helper **hp) -{ - struct nf_conntrack_helper *helper; - struct nf_conn_help *help; - int ret = 0; - - helper = nf_conntrack_helper_try_module_get(name, family, proto); - if (!hel
[ovs-dev] [PATCHv2 net-next 0/5] net: move more duplicate code of ovs and tc conntrack into nf_conntrack_ovs
We've moved some duplicate code into nf_nat_ovs in: "net: eliminate the duplicate code in the ct nat functions of ovs and tc" This patchset addresses more code duplication in the conntrack of ovs and tc then creates nf_conntrack_ovs for them, and four functions will be extracted and moved into it: nf_ct_handle_fragments() nf_ct_skb_network_trim() nf_ct_helper() nf_ct_add_helper() v1->v2: - In patch 1/5, fix the wrong option name 'NF_NF_CONNTRACK' used in net/openvswitch/Kconfig, found by kernel test robot. Xin Long (5): net: create nf_conntrack_ovs for ovs and tc use net: extract nf_ct_skb_network_trim function to nf_conntrack_ovs openvswitch: move key and ovs_cb update out of handle_fragments net: sched: move frag check and tc_skb_cb update out of handle_fragments net: extract nf_ct_handle_fragments to nf_conntrack_ovs include/net/netfilter/nf_conntrack.h | 4 + net/netfilter/Kconfig| 3 + net/netfilter/Makefile | 1 + net/netfilter/nf_conntrack_helper.c | 98 --- net/netfilter/nf_conntrack_ovs.c | 178 +++ net/openvswitch/Kconfig | 1 + net/openvswitch/conntrack.c | 80 ++-- net/sched/Kconfig| 1 + net/sched/act_ct.c | 76 ++-- 9 files changed, 207 insertions(+), 235 deletions(-) create mode 100644 net/netfilter/nf_conntrack_ovs.c -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCHv2 net-next 2/5] net: extract nf_ct_skb_network_trim function to nf_conntrack_ovs
There are almost the same code in ovs_skb_network_trim() and tcf_ct_skb_network_trim(), this patch extracts them into a function nf_ct_skb_network_trim() and moves the function to nf_conntrack_ovs. Signed-off-by: Xin Long --- include/net/netfilter/nf_conntrack.h | 2 ++ net/netfilter/nf_conntrack_ovs.c | 26 net/openvswitch/conntrack.c | 36 net/sched/act_ct.c | 27 + 4 files changed, 33 insertions(+), 58 deletions(-) diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h index 6a2019aaa464..a6e89d7212f8 100644 --- a/include/net/netfilter/nf_conntrack.h +++ b/include/net/netfilter/nf_conntrack.h @@ -362,6 +362,8 @@ static inline struct nf_conntrack_net *nf_ct_pernet(const struct net *net) return net_generic(net, nf_conntrack_net_id); } +int nf_ct_skb_network_trim(struct sk_buff *skb, int family); + #define NF_CT_STAT_INC(net, count) __this_cpu_inc((net)->ct.stat->count) #define NF_CT_STAT_INC_ATOMIC(net, count) this_cpu_inc((net)->ct.stat->count) #define NF_CT_STAT_ADD_ATOMIC(net, count, v) this_cpu_add((net)->ct.stat->count, (v)) diff --git a/net/netfilter/nf_conntrack_ovs.c b/net/netfilter/nf_conntrack_ovs.c index eff4d53f8b8c..c60ef71d1aea 100644 --- a/net/netfilter/nf_conntrack_ovs.c +++ b/net/netfilter/nf_conntrack_ovs.c @@ -102,3 +102,29 @@ int nf_ct_add_helper(struct nf_conn *ct, const char *name, u8 family, return ret; } EXPORT_SYMBOL_GPL(nf_ct_add_helper); + +/* Trim the skb to the length specified by the IP/IPv6 header, + * removing any trailing lower-layer padding. This prepares the skb + * for higher-layer processing that assumes skb->len excludes padding + * (such as nf_ip_checksum). The caller needs to pull the skb to the + * network header, and ensure ip_hdr/ipv6_hdr points to valid data. + */ +int nf_ct_skb_network_trim(struct sk_buff *skb, int family) +{ + unsigned int len; + + switch (family) { + case NFPROTO_IPV4: + len = skb_ip_totlen(skb); + break; + case NFPROTO_IPV6: + len = sizeof(struct ipv6hdr) + + ntohs(ipv6_hdr(skb)->payload_len); + break; + default: + len = skb->len; + } + + return pskb_trim_rcsum(skb, len); +} +EXPORT_SYMBOL_GPL(nf_ct_skb_network_trim); diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 2172930b1f17..47a58657b1e4 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -1090,36 +1090,6 @@ static int ovs_ct_commit(struct net *net, struct sw_flow_key *key, return 0; } -/* Trim the skb to the length specified by the IP/IPv6 header, - * removing any trailing lower-layer padding. This prepares the skb - * for higher-layer processing that assumes skb->len excludes padding - * (such as nf_ip_checksum). The caller needs to pull the skb to the - * network header, and ensure ip_hdr/ipv6_hdr points to valid data. - */ -static int ovs_skb_network_trim(struct sk_buff *skb) -{ - unsigned int len; - int err; - - switch (skb->protocol) { - case htons(ETH_P_IP): - len = skb_ip_totlen(skb); - break; - case htons(ETH_P_IPV6): - len = sizeof(struct ipv6hdr) - + ntohs(ipv6_hdr(skb)->payload_len); - break; - default: - len = skb->len; - } - - err = pskb_trim_rcsum(skb, len); - if (err) - kfree_skb(skb); - - return err; -} - /* Returns 0 on success, -EINPROGRESS if 'skb' is stolen, or other nonzero * value if 'skb' is freed. */ @@ -1134,9 +1104,11 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb, nh_ofs = skb_network_offset(skb); skb_pull_rcsum(skb, nh_ofs); - err = ovs_skb_network_trim(skb); - if (err) + err = nf_ct_skb_network_trim(skb, info->family); + if (err) { + kfree_skb(skb); return err; + } if (key->ip.frag != OVS_FRAG_TYPE_NONE) { err = handle_fragments(net, key, info->zone.id, skb); diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index b126f03c1bb6..0a1ecc972a8b 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -726,31 +726,6 @@ static bool tcf_ct_skb_nfct_cached(struct net *net, struct sk_buff *skb, return false; } -/* Trim the skb to the length specified by the IP/IPv6 header, - * removing any trailing lower-layer padding. This prepares the skb - * for higher-layer processing that assumes skb->len excludes padding - * (such as nf_ip_checksum). The caller needs to pull the skb to the - * network header, and ensure ip_hdr/ipv6_hdr points to valid data. - */ -static int tcf_ct_skb_network_trim(struct sk_buff *skb, int family) -{ - unsigne
Re: [ovs-dev] [PATCH net-next 1/5] net: create nf_conntrack_ovs for ovs and tc use
On Sat, Feb 4, 2023 at 8:11 PM kernel test robot wrote: > > Hi Xin, > > I love your patch! Yet something to improve: > > [auto build test ERROR on net-next/master] > > url: > https://github.com/intel-lab-lkp/linux/commits/Xin-Long/net-create-nf_conntrack_ovs-for-ovs-and-tc-use/20230205-060514 > patch link: > https://lore.kernel.org/r/6eca3cf10a8c06f733fac943bcb997c06ec5daa3.1675548023.git.lucien.xin%40gmail.com > patch subject: [PATCH net-next 1/5] net: create nf_conntrack_ovs for ovs and > tc use > config: s390-defconfig > (https://download.01.org/0day-ci/archive/20230205/202302050823.jwxmwch9-...@intel.com/config) > compiler: s390-linux-gcc (GCC) 12.1.0 > reproduce (this is a W=1 build): > wget > https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O > ~/bin/make.cross > chmod +x ~/bin/make.cross > # > https://github.com/intel-lab-lkp/linux/commit/e2bb7f965a86f833a4ae6ec28444ba328fdfc358 > git remote add linux-review https://github.com/intel-lab-lkp/linux > git fetch --no-tags linux-review > Xin-Long/net-create-nf_conntrack_ovs-for-ovs-and-tc-use/20230205-060514 > git checkout e2bb7f965a86f833a4ae6ec28444ba328fdfc358 > # save the config file > mkdir build_dir && cp config build_dir/.config > COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 > O=build_dir ARCH=s390 olddefconfig > COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 > O=build_dir ARCH=s390 SHELL=/bin/bash > > If you fix the issue, kindly add following tag where applicable > | Reported-by: kernel test robot > > All errors (new ones prefixed by >>, old ones prefixed by <<): > > >> ERROR: modpost: "nf_ct_helper" [net/openvswitch/openvswitch.ko] undefined! > >> ERROR: modpost: "nf_ct_add_helper" [net/openvswitch/openvswitch.ko] > >> undefined! > caused by: + select NF_CONNTRACK_OVS if NF_NF_CONNTRACK "NF_NF_CONNTRACK", incorrect option name was used, will fix it. The same cause is for Patch 2/5 and 5/5. Thanks. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next 5/5] net: extract nf_ct_handle_fragments to nf_conntrack_ovs
Now handle_fragments() in OVS and TC have the similar code, and this patch removes the duplicate code by moving the function to nf_conntrack_ovs. Note that skb_clear_hash(skb) or skb->ignore_df = 1 should be done only when defrag returns 0, as it does in other places in kernel. Signed-off-by: Xin Long --- include/net/netfilter/nf_conntrack.h | 2 ++ net/netfilter/nf_conntrack_ovs.c | 48 net/openvswitch/conntrack.c | 45 +- net/sched/act_ct.c | 46 ++ 4 files changed, 53 insertions(+), 88 deletions(-) diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h index a6e89d7212f8..7bbab8f2b73d 100644 --- a/include/net/netfilter/nf_conntrack.h +++ b/include/net/netfilter/nf_conntrack.h @@ -363,6 +363,8 @@ static inline struct nf_conntrack_net *nf_ct_pernet(const struct net *net) } int nf_ct_skb_network_trim(struct sk_buff *skb, int family); +int nf_ct_handle_fragments(struct net *net, struct sk_buff *skb, + u16 zone, u8 family, u8 *proto, u16 *mru); #define NF_CT_STAT_INC(net, count) __this_cpu_inc((net)->ct.stat->count) #define NF_CT_STAT_INC_ATOMIC(net, count) this_cpu_inc((net)->ct.stat->count) diff --git a/net/netfilter/nf_conntrack_ovs.c b/net/netfilter/nf_conntrack_ovs.c index c60ef71d1aea..52b776bdf526 100644 --- a/net/netfilter/nf_conntrack_ovs.c +++ b/net/netfilter/nf_conntrack_ovs.c @@ -3,6 +3,8 @@ #include #include +#include +#include #include /* 'skb' should already be pulled to nh_ofs. */ @@ -128,3 +130,49 @@ int nf_ct_skb_network_trim(struct sk_buff *skb, int family) return pskb_trim_rcsum(skb, len); } EXPORT_SYMBOL_GPL(nf_ct_skb_network_trim); + +/* Returns 0 on success, -EINPROGRESS if 'skb' is stolen, or other nonzero + * value if 'skb' is freed. + */ +int nf_ct_handle_fragments(struct net *net, struct sk_buff *skb, + u16 zone, u8 family, u8 *proto, u16 *mru) +{ + int err; + + if (family == NFPROTO_IPV4) { + enum ip_defrag_users user = IP_DEFRAG_CONNTRACK_IN + zone; + + memset(IPCB(skb), 0, sizeof(struct inet_skb_parm)); + local_bh_disable(); + err = ip_defrag(net, skb, user); + local_bh_enable(); + if (err) + return err; + + *mru = IPCB(skb)->frag_max_size; +#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6) + } else if (family == NFPROTO_IPV6) { + enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone; + + memset(IP6CB(skb), 0, sizeof(struct inet6_skb_parm)); + err = nf_ct_frag6_gather(net, skb, user); + if (err) { + if (err != -EINPROGRESS) + kfree_skb(skb); + return err; + } + + *proto = ipv6_hdr(skb)->nexthdr; + *mru = IP6CB(skb)->frag_max_size; +#endif + } else { + kfree_skb(skb); + return -EPFNOSUPPORT; + } + + skb_clear_hash(skb); + skb->ignore_df = 1; + + return 0; +} +EXPORT_SYMBOL_GPL(nf_ct_handle_fragments); diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 962e2f70e597..5d40ad02cabc 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -434,56 +434,13 @@ static int ovs_ct_set_labels(struct nf_conn *ct, struct sw_flow_key *key, return 0; } -/* Returns 0 on success, -EINPROGRESS if 'skb' is stolen, or other nonzero - * value if 'skb' is freed. - */ -static int handle_fragments(struct net *net, struct sk_buff *skb, - u16 zone, u8 family, u8 *proto, u16 *mru) -{ - int err; - - if (family == NFPROTO_IPV4) { - enum ip_defrag_users user = IP_DEFRAG_CONNTRACK_IN + zone; - - memset(IPCB(skb), 0, sizeof(struct inet_skb_parm)); - err = ip_defrag(net, skb, user); - if (err) - return err; - - *mru = IPCB(skb)->frag_max_size; -#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6) - } else if (family == NFPROTO_IPV6) { - enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone; - - memset(IP6CB(skb), 0, sizeof(struct inet6_skb_parm)); - err = nf_ct_frag6_gather(net, skb, user); - if (err) { - if (err != -EINPROGRESS) - kfree_skb(skb); - return err; - } - - *proto = ipv6_hdr(skb)->nexthdr; - *mru = IP6CB(skb)->frag_max_size; -#endif - } else { - kfree_skb(skb); - return -EPFNOSUPPORT; - } - - skb_clear_hash(skb); - skb->ignore_df = 1; - - return 0; -} - static int
[ovs-dev] [PATCH net-next 3/5] openvswitch: move key and ovs_cb update out of handle_fragments
This patch has no functional changes and just moves key and ovs_cb update out of handle_fragments, and skb_clear_hash() and skb->ignore_df change into handle_fragments(), to make it easier to move the duplicate code from handle_fragments() into nf_conntrack_ovs later. Note that it changes to pass info->family to handle_fragments() instead of key for the packet type check, as info->family is set according to key->eth.type in ovs_ct_copy_action() when creating the action. Signed-off-by: Xin Long --- net/openvswitch/conntrack.c | 37 + 1 file changed, 25 insertions(+), 12 deletions(-) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 47a58657b1e4..962e2f70e597 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -437,13 +437,12 @@ static int ovs_ct_set_labels(struct nf_conn *ct, struct sw_flow_key *key, /* Returns 0 on success, -EINPROGRESS if 'skb' is stolen, or other nonzero * value if 'skb' is freed. */ -static int handle_fragments(struct net *net, struct sw_flow_key *key, - u16 zone, struct sk_buff *skb) +static int handle_fragments(struct net *net, struct sk_buff *skb, + u16 zone, u8 family, u8 *proto, u16 *mru) { - struct ovs_skb_cb ovs_cb = *OVS_CB(skb); int err; - if (key->eth.type == htons(ETH_P_IP)) { + if (family == NFPROTO_IPV4) { enum ip_defrag_users user = IP_DEFRAG_CONNTRACK_IN + zone; memset(IPCB(skb), 0, sizeof(struct inet_skb_parm)); @@ -451,9 +450,9 @@ static int handle_fragments(struct net *net, struct sw_flow_key *key, if (err) return err; - ovs_cb.mru = IPCB(skb)->frag_max_size; + *mru = IPCB(skb)->frag_max_size; #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6) - } else if (key->eth.type == htons(ETH_P_IPV6)) { + } else if (family == NFPROTO_IPV6) { enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone; memset(IP6CB(skb), 0, sizeof(struct inet6_skb_parm)); @@ -464,22 +463,35 @@ static int handle_fragments(struct net *net, struct sw_flow_key *key, return err; } - key->ip.proto = ipv6_hdr(skb)->nexthdr; - ovs_cb.mru = IP6CB(skb)->frag_max_size; + *proto = ipv6_hdr(skb)->nexthdr; + *mru = IP6CB(skb)->frag_max_size; #endif } else { kfree_skb(skb); return -EPFNOSUPPORT; } + skb_clear_hash(skb); + skb->ignore_df = 1; + + return 0; +} + +static int ovs_ct_handle_fragments(struct net *net, struct sw_flow_key *key, + u16 zone, int family, struct sk_buff *skb) +{ + struct ovs_skb_cb ovs_cb = *OVS_CB(skb); + int err; + + err = handle_fragments(net, skb, zone, family, >ip.proto, _cb.mru); + if (err) + return err; + /* The key extracted from the fragment that completed this datagram * likely didn't have an L4 header, so regenerate it. */ ovs_flow_key_update_l3l4(skb, key); - key->ip.frag = OVS_FRAG_TYPE_NONE; - skb_clear_hash(skb); - skb->ignore_df = 1; *OVS_CB(skb) = ovs_cb; return 0; @@ -,7 +1123,8 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb, } if (key->ip.frag != OVS_FRAG_TYPE_NONE) { - err = handle_fragments(net, key, info->zone.id, skb); + err = ovs_ct_handle_fragments(net, key, info->zone.id, + info->family, skb); if (err) return err; } -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next 4/5] net: sched: move frag check and tc_skb_cb update out of handle_fragments
This patch has no functional changes and just moves frag check and tc_skb_cb update out of handle_fragments, to make it easier to move the duplicate code from handle_fragments() into nf_conntrack_ovs later. Signed-off-by: Xin Long --- net/sched/act_ct.c | 71 +- 1 file changed, 39 insertions(+), 32 deletions(-) diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index 0a1ecc972a8b..9f133ed93815 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -778,29 +778,10 @@ static int tcf_ct_ipv6_is_fragment(struct sk_buff *skb, bool *frag) return 0; } -static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, - u8 family, u16 zone, bool *defrag) +static int handle_fragments(struct net *net, struct sk_buff *skb, + u16 zone, u8 family, u16 *mru) { - enum ip_conntrack_info ctinfo; - struct nf_conn *ct; - int err = 0; - bool frag; - u16 mru; - - /* Previously seen (loopback)? Ignore. */ - ct = nf_ct_get(skb, ); - if ((ct && !nf_ct_is_template(ct)) || ctinfo == IP_CT_UNTRACKED) - return 0; - - if (family == NFPROTO_IPV4) - err = tcf_ct_ipv4_is_fragment(skb, ); - else - err = tcf_ct_ipv6_is_fragment(skb, ); - if (err || !frag) - return err; - - skb_get(skb); - mru = tc_skb_cb(skb)->mru; + int err; if (family == NFPROTO_IPV4) { enum ip_defrag_users user = IP_DEFRAG_CONNTRACK_IN + zone; @@ -812,10 +793,8 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, if (err && err != -EINPROGRESS) return err; - if (!err) { - *defrag = true; - mru = IPCB(skb)->frag_max_size; - } + if (!err) + *mru = IPCB(skb)->frag_max_size; } else { /* NFPROTO_IPV6 */ #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6) enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone; @@ -825,18 +804,14 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, if (err && err != -EINPROGRESS) goto out_free; - if (!err) { - *defrag = true; - mru = IP6CB(skb)->frag_max_size; - } + if (!err) + *mru = IP6CB(skb)->frag_max_size; #else err = -EOPNOTSUPP; goto out_free; #endif } - if (err != -EINPROGRESS) - tc_skb_cb(skb)->mru = mru; skb_clear_hash(skb); skb->ignore_df = 1; return err; @@ -846,6 +821,38 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, return err; } +static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, + u8 family, u16 zone, bool *defrag) +{ + enum ip_conntrack_info ctinfo; + struct nf_conn *ct; + int err = 0; + bool frag; + u16 mru; + + /* Previously seen (loopback)? Ignore. */ + ct = nf_ct_get(skb, ); + if ((ct && !nf_ct_is_template(ct)) || ctinfo == IP_CT_UNTRACKED) + return 0; + + if (family == NFPROTO_IPV4) + err = tcf_ct_ipv4_is_fragment(skb, ); + else + err = tcf_ct_ipv6_is_fragment(skb, ); + if (err || !frag) + return err; + + skb_get(skb); + err = handle_fragments(net, skb, zone, family, ); + if (err) + return err; + + *defrag = true; + tc_skb_cb(skb)->mru = mru; + + return 0; +} + static void tcf_ct_params_free(struct tcf_ct_params *params) { if (params->helper) { -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next 0/5] net: move more duplicate code of ovs and tc conntrack into nf_conntrack_ovs
We've moved some duplicate code into nf_nat_ovs in: "net: eliminate the duplicate code in the ct nat functions of ovs and tc" This patchset addresses more code duplication in the conntrack of ovs and tc then creates nf_conntrack_ovs for them, and four functions will be extracted and moved into it: nf_ct_handle_fragments() nf_ct_skb_network_trim() nf_ct_helper() nf_ct_add_helper() Xin Long (5): net: create nf_conntrack_ovs for ovs and tc use net: extract nf_ct_skb_network_trim function to nf_conntrack_ovs openvswitch: move key and ovs_cb update out of handle_fragments net: sched: move frag check and tc_skb_cb update out of handle_fragments net: extract nf_ct_handle_fragments to nf_conntrack_ovs include/net/netfilter/nf_conntrack.h | 4 + net/netfilter/Kconfig| 6 + net/netfilter/Makefile | 1 + net/netfilter/nf_conntrack_helper.c | 98 --- net/netfilter/nf_conntrack_ovs.c | 178 +++ net/openvswitch/Kconfig | 1 + net/openvswitch/conntrack.c | 80 ++-- net/sched/Kconfig| 1 + net/sched/act_ct.c | 76 ++-- 9 files changed, 210 insertions(+), 235 deletions(-) create mode 100644 net/netfilter/nf_conntrack_ovs.c -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next 2/5] net: extract nf_ct_skb_network_trim function to nf_conntrack_ovs
There are almost the same code in ovs_skb_network_trim() and tcf_ct_skb_network_trim(), this patch extracts them into a function nf_ct_skb_network_trim() and moves the function to nf_conntrack_ovs. Signed-off-by: Xin Long --- include/net/netfilter/nf_conntrack.h | 2 ++ net/netfilter/nf_conntrack_ovs.c | 26 net/openvswitch/conntrack.c | 36 net/sched/act_ct.c | 27 + 4 files changed, 33 insertions(+), 58 deletions(-) diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h index 6a2019aaa464..a6e89d7212f8 100644 --- a/include/net/netfilter/nf_conntrack.h +++ b/include/net/netfilter/nf_conntrack.h @@ -362,6 +362,8 @@ static inline struct nf_conntrack_net *nf_ct_pernet(const struct net *net) return net_generic(net, nf_conntrack_net_id); } +int nf_ct_skb_network_trim(struct sk_buff *skb, int family); + #define NF_CT_STAT_INC(net, count) __this_cpu_inc((net)->ct.stat->count) #define NF_CT_STAT_INC_ATOMIC(net, count) this_cpu_inc((net)->ct.stat->count) #define NF_CT_STAT_ADD_ATOMIC(net, count, v) this_cpu_add((net)->ct.stat->count, (v)) diff --git a/net/netfilter/nf_conntrack_ovs.c b/net/netfilter/nf_conntrack_ovs.c index eff4d53f8b8c..c60ef71d1aea 100644 --- a/net/netfilter/nf_conntrack_ovs.c +++ b/net/netfilter/nf_conntrack_ovs.c @@ -102,3 +102,29 @@ int nf_ct_add_helper(struct nf_conn *ct, const char *name, u8 family, return ret; } EXPORT_SYMBOL_GPL(nf_ct_add_helper); + +/* Trim the skb to the length specified by the IP/IPv6 header, + * removing any trailing lower-layer padding. This prepares the skb + * for higher-layer processing that assumes skb->len excludes padding + * (such as nf_ip_checksum). The caller needs to pull the skb to the + * network header, and ensure ip_hdr/ipv6_hdr points to valid data. + */ +int nf_ct_skb_network_trim(struct sk_buff *skb, int family) +{ + unsigned int len; + + switch (family) { + case NFPROTO_IPV4: + len = skb_ip_totlen(skb); + break; + case NFPROTO_IPV6: + len = sizeof(struct ipv6hdr) + + ntohs(ipv6_hdr(skb)->payload_len); + break; + default: + len = skb->len; + } + + return pskb_trim_rcsum(skb, len); +} +EXPORT_SYMBOL_GPL(nf_ct_skb_network_trim); diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 2172930b1f17..47a58657b1e4 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -1090,36 +1090,6 @@ static int ovs_ct_commit(struct net *net, struct sw_flow_key *key, return 0; } -/* Trim the skb to the length specified by the IP/IPv6 header, - * removing any trailing lower-layer padding. This prepares the skb - * for higher-layer processing that assumes skb->len excludes padding - * (such as nf_ip_checksum). The caller needs to pull the skb to the - * network header, and ensure ip_hdr/ipv6_hdr points to valid data. - */ -static int ovs_skb_network_trim(struct sk_buff *skb) -{ - unsigned int len; - int err; - - switch (skb->protocol) { - case htons(ETH_P_IP): - len = skb_ip_totlen(skb); - break; - case htons(ETH_P_IPV6): - len = sizeof(struct ipv6hdr) - + ntohs(ipv6_hdr(skb)->payload_len); - break; - default: - len = skb->len; - } - - err = pskb_trim_rcsum(skb, len); - if (err) - kfree_skb(skb); - - return err; -} - /* Returns 0 on success, -EINPROGRESS if 'skb' is stolen, or other nonzero * value if 'skb' is freed. */ @@ -1134,9 +1104,11 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb, nh_ofs = skb_network_offset(skb); skb_pull_rcsum(skb, nh_ofs); - err = ovs_skb_network_trim(skb); - if (err) + err = nf_ct_skb_network_trim(skb, info->family); + if (err) { + kfree_skb(skb); return err; + } if (key->ip.frag != OVS_FRAG_TYPE_NONE) { err = handle_fragments(net, key, info->zone.id, skb); diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index b126f03c1bb6..0a1ecc972a8b 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -726,31 +726,6 @@ static bool tcf_ct_skb_nfct_cached(struct net *net, struct sk_buff *skb, return false; } -/* Trim the skb to the length specified by the IP/IPv6 header, - * removing any trailing lower-layer padding. This prepares the skb - * for higher-layer processing that assumes skb->len excludes padding - * (such as nf_ip_checksum). The caller needs to pull the skb to the - * network header, and ensure ip_hdr/ipv6_hdr points to valid data. - */ -static int tcf_ct_skb_network_trim(struct sk_buff *skb, int family) -{ - unsigne
[ovs-dev] [PATCH net-next 1/5] net: create nf_conntrack_ovs for ovs and tc use
Similar to nf_nat_ovs created by Commit ebddb1404900 ("net: move the nat function to nf_nat_ovs for ovs and tc"), this patch is to create nf_conntrack_ovs to get these functions shared by OVS and TC only. There are nf_ct_helper() and nf_ct_add_helper() from nf_conntrak_helper in this patch, and will be more in the following patches. Signed-off-by: Xin Long --- net/netfilter/Kconfig | 6 ++ net/netfilter/Makefile | 1 + net/netfilter/nf_conntrack_helper.c | 98 -- net/netfilter/nf_conntrack_ovs.c| 104 net/openvswitch/Kconfig | 1 + net/sched/Kconfig | 1 + 6 files changed, 113 insertions(+), 98 deletions(-) create mode 100644 net/netfilter/nf_conntrack_ovs.c diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig index f71b41c7ce2f..645664f462e5 100644 --- a/net/netfilter/Kconfig +++ b/net/netfilter/Kconfig @@ -189,6 +189,12 @@ config NF_CONNTRACK_LABELS to connection tracking entries. It can be used with xtables connlabel match and the nftables ct expression. +config NF_CONNTRACK_OVS + bool "Connection tracking for openvswitch" + help + This option enables support for openvswitch. It can be used by + openvswitch and tc conntrack. + config NF_CT_PROTO_DCCP bool 'DCCP protocol connection tracking support' depends on NETFILTER_ADVANCED diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile index ba2a6b5e93d9..5ffef1cd6143 100644 --- a/net/netfilter/Makefile +++ b/net/netfilter/Makefile @@ -11,6 +11,7 @@ nf_conntrack-$(CONFIG_NF_CONNTRACK_TIMEOUT) += nf_conntrack_timeout.o nf_conntrack-$(CONFIG_NF_CONNTRACK_TIMESTAMP) += nf_conntrack_timestamp.o nf_conntrack-$(CONFIG_NF_CONNTRACK_EVENTS) += nf_conntrack_ecache.o nf_conntrack-$(CONFIG_NF_CONNTRACK_LABELS) += nf_conntrack_labels.o +nf_conntrack-$(CONFIG_NF_CONNTRACK_OVS) += nf_conntrack_ovs.o nf_conntrack-$(CONFIG_NF_CT_PROTO_DCCP) += nf_conntrack_proto_dccp.o nf_conntrack-$(CONFIG_NF_CT_PROTO_SCTP) += nf_conntrack_proto_sctp.o nf_conntrack-$(CONFIG_NF_CT_PROTO_GRE) += nf_conntrack_proto_gre.o diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c index 48ea6d0264b5..0c4db2f2ac43 100644 --- a/net/netfilter/nf_conntrack_helper.c +++ b/net/netfilter/nf_conntrack_helper.c @@ -242,104 +242,6 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl, } EXPORT_SYMBOL_GPL(__nf_ct_try_assign_helper); -/* 'skb' should already be pulled to nh_ofs. */ -int nf_ct_helper(struct sk_buff *skb, struct nf_conn *ct, -enum ip_conntrack_info ctinfo, u16 proto) -{ - const struct nf_conntrack_helper *helper; - const struct nf_conn_help *help; - unsigned int protoff; - int err; - - if (ctinfo == IP_CT_RELATED_REPLY) - return NF_ACCEPT; - - help = nfct_help(ct); - if (!help) - return NF_ACCEPT; - - helper = rcu_dereference(help->helper); - if (!helper) - return NF_ACCEPT; - - if (helper->tuple.src.l3num != NFPROTO_UNSPEC && - helper->tuple.src.l3num != proto) - return NF_ACCEPT; - - switch (proto) { - case NFPROTO_IPV4: - protoff = ip_hdrlen(skb); - proto = ip_hdr(skb)->protocol; - break; - case NFPROTO_IPV6: { - u8 nexthdr = ipv6_hdr(skb)->nexthdr; - __be16 frag_off; - int ofs; - - ofs = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), , - _off); - if (ofs < 0 || (frag_off & htons(~0x7)) != 0) { - pr_debug("proto header not found\n"); - return NF_ACCEPT; - } - protoff = ofs; - proto = nexthdr; - break; - } - default: - WARN_ONCE(1, "helper invoked on non-IP family!"); - return NF_DROP; - } - - if (helper->tuple.dst.protonum != proto) - return NF_ACCEPT; - - err = helper->help(skb, protoff, ct, ctinfo); - if (err != NF_ACCEPT) - return err; - - /* Adjust seqs after helper. This is needed due to some helpers (e.g., -* FTP with NAT) adusting the TCP payload size when mangling IP -* addresses and/or port numbers in the text-based control connection. -*/ - if (test_bit(IPS_SEQ_ADJUST_BIT, >status) && - !nf_ct_seq_adjust(skb, ct, ctinfo, protoff)) - return NF_DROP; - return NF_ACCEPT; -} -EXPORT_SYMBOL_GPL(nf_ct_helper); - -int nf_ct_add_helper(struct nf_conn *ct, const char *name, u8 family, -u8 proto, bool nat, struct nf_conntrack_helper **hp) -{ - str
[ovs-dev] [PATCHv4 net-next 5/5] net: move the nat function to nf_nat_ovs for ovs and tc
There are two nat functions are nearly the same in both OVS and TC code, (ovs_)ct_nat_execute() and ovs_ct_nat/tcf_ct_act_nat(). This patch creates nf_nat_ovs.c under netfilter and moves them there then exports nf_ct_nat() so that it can be shared by both OVS and TC, and keeps the nat (type) check and nat flag update in OVS and TC's own place, as these parts are different between OVS and TC. Note that in OVS nat function it was using skb->protocol to get the proto as it already skips vlans in key_extract(), while it doesn't in TC, and TC has to call skb_protocol() to get proto. So in nf_ct_nat_execute(), we keep using skb_protocol() which works for both OVS and TC contrack. Signed-off-by: Xin Long --- include/net/netfilter/nf_nat.h | 4 + net/netfilter/Kconfig | 3 + net/netfilter/Makefile | 1 + net/netfilter/nf_nat_ovs.c | 135 net/openvswitch/Kconfig| 1 + net/openvswitch/conntrack.c| 137 +++-- net/sched/Kconfig | 1 + net/sched/act_ct.c | 136 +++- 8 files changed, 166 insertions(+), 252 deletions(-) create mode 100644 net/netfilter/nf_nat_ovs.c diff --git a/include/net/netfilter/nf_nat.h b/include/net/netfilter/nf_nat.h index e9eb01e99d2f..9877f064548a 100644 --- a/include/net/netfilter/nf_nat.h +++ b/include/net/netfilter/nf_nat.h @@ -104,6 +104,10 @@ unsigned int nf_nat_inet_fn(void *priv, struct sk_buff *skb, const struct nf_hook_state *state); +int nf_ct_nat(struct sk_buff *skb, struct nf_conn *ct, + enum ip_conntrack_info ctinfo, int *action, + const struct nf_nat_range2 *range, bool commit); + static inline int nf_nat_initialized(const struct nf_conn *ct, enum nf_nat_manip_type manip) { diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig index 0846bd75b1da..f71b41c7ce2f 100644 --- a/net/netfilter/Kconfig +++ b/net/netfilter/Kconfig @@ -459,6 +459,9 @@ config NF_NAT_REDIRECT config NF_NAT_MASQUERADE bool +config NF_NAT_OVS + bool + config NETFILTER_SYNPROXY tristate diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile index 1d4db1943936..3754eb06fb41 100644 --- a/net/netfilter/Makefile +++ b/net/netfilter/Makefile @@ -59,6 +59,7 @@ obj-$(CONFIG_NF_LOG_SYSLOG) += nf_log_syslog.o obj-$(CONFIG_NF_NAT) += nf_nat.o nf_nat-$(CONFIG_NF_NAT_REDIRECT) += nf_nat_redirect.o nf_nat-$(CONFIG_NF_NAT_MASQUERADE) += nf_nat_masquerade.o +nf_nat-$(CONFIG_NF_NAT_OVS) += nf_nat_ovs.o ifeq ($(CONFIG_NF_NAT),m) nf_nat-$(CONFIG_DEBUG_INFO_BTF_MODULES) += nf_nat_bpf.o diff --git a/net/netfilter/nf_nat_ovs.c b/net/netfilter/nf_nat_ovs.c new file mode 100644 index ..551abd2da614 --- /dev/null +++ b/net/netfilter/nf_nat_ovs.c @@ -0,0 +1,135 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Support nat functions for openvswitch and used by OVS and TC conntrack. */ + +#include + +/* Modelled after nf_nat_ipv[46]_fn(). + * range is only used for new, uninitialized NAT state. + * Returns either NF_ACCEPT or NF_DROP. + */ +static int nf_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct, +enum ip_conntrack_info ctinfo, int *action, +const struct nf_nat_range2 *range, +enum nf_nat_manip_type maniptype) +{ + __be16 proto = skb_protocol(skb, true); + int hooknum, err = NF_ACCEPT; + + /* See HOOK2MANIP(). */ + if (maniptype == NF_NAT_MANIP_SRC) + hooknum = NF_INET_LOCAL_IN; /* Source NAT */ + else + hooknum = NF_INET_LOCAL_OUT; /* Destination NAT */ + + switch (ctinfo) { + case IP_CT_RELATED: + case IP_CT_RELATED_REPLY: + if (proto == htons(ETH_P_IP) && + ip_hdr(skb)->protocol == IPPROTO_ICMP) { + if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo, + hooknum)) + err = NF_DROP; + goto out; + } else if (IS_ENABLED(CONFIG_IPV6) && proto == htons(ETH_P_IPV6)) { + __be16 frag_off; + u8 nexthdr = ipv6_hdr(skb)->nexthdr; + int hdrlen = ipv6_skip_exthdr(skb, + sizeof(struct ipv6hdr), + , _off); + + if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) { + if (!nf_nat_icmpv6_reply_translation(skb, ct, +ctinfo, +hooknum, +hdrlen)) +
[ovs-dev] [PATCHv4 net-next 4/5] net: sched: update the nat flag for icmp error packets in ct_nat_execute
In ovs_ct_nat_execute(), the packet flow key nat flags are updated when it processes ICMP(v6) error packets translation successfully. In ct_nat_execute() when processing ICMP(v6) error packets translation successfully, it should have done the same in ct_nat_execute() to set post_ct_s/dnat flag, which will be used to update flow key nat flags in OVS module later. Reviewed-by: Saeed Mahameed Signed-off-by: Xin Long --- net/sched/act_ct.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index dd5ae7551956..bb87d1e910ea 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -936,13 +936,13 @@ static int ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct, } err = nf_nat_packet(ct, ctinfo, hooknum, skb); +out: if (err == NF_ACCEPT) { if (maniptype == NF_NAT_MANIP_SRC) tc_skb_cb(skb)->post_ct_snat = 1; if (maniptype == NF_NAT_MANIP_DST) tc_skb_cb(skb)->post_ct_dnat = 1; } -out: return err; } #endif /* CONFIG_NF_NAT */ -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCHv4 net-next 1/5] openvswitch: delete the unncessary skb_pull_rcsum call in ovs_ct_nat_execute
The calls to ovs_ct_nat_execute() are as below: ovs_ct_execute() ovs_ct_lookup() __ovs_ct_lookup() ovs_ct_nat() ovs_ct_nat_execute() ovs_ct_commit() __ovs_ct_lookup() ovs_ct_nat() ovs_ct_nat_execute() and since skb_pull_rcsum() and skb_push_rcsum() are already called in ovs_ct_execute(), there's no need to do it again in ovs_ct_nat_execute(). Reviewed-by: Saeed Mahameed Acked-by: Aaron Conole Signed-off-by: Xin Long --- net/openvswitch/conntrack.c | 17 ++--- 1 file changed, 6 insertions(+), 11 deletions(-) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index d78f0fc4337d..dff093a10d6d 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -735,10 +735,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct, const struct nf_nat_range2 *range, enum nf_nat_manip_type maniptype, struct sw_flow_key *key) { - int hooknum, nh_off, err = NF_ACCEPT; - - nh_off = skb_network_offset(skb); - skb_pull_rcsum(skb, nh_off); + int hooknum, err = NF_ACCEPT; /* See HOOK2MANIP(). */ if (maniptype == NF_NAT_MANIP_SRC) @@ -755,7 +752,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct, if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo, hooknum)) err = NF_DROP; - goto push; + goto out; } else if (IS_ENABLED(CONFIG_IPV6) && skb->protocol == htons(ETH_P_IPV6)) { __be16 frag_off; @@ -770,7 +767,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct, hooknum, hdrlen)) err = NF_DROP; - goto push; + goto out; } } /* Non-ICMP, fall thru to initialize if needed. */ @@ -788,7 +785,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct, ? nf_nat_setup_info(ct, range, maniptype) : nf_nat_alloc_null_binding(ct, hooknum); if (err != NF_ACCEPT) - goto push; + goto out; } break; @@ -798,13 +795,11 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct, default: err = NF_DROP; - goto push; + goto out; } err = nf_nat_packet(ct, ctinfo, hooknum, skb); -push: - skb_push_rcsum(skb, nh_off); - +out: /* Update the flow key if NAT successful. */ if (err == NF_ACCEPT) ovs_nat_update_key(key, skb, maniptype); -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCHv4 net-next 2/5] openvswitch: return NF_ACCEPT when OVS_CT_NAT is not set in info nat
Either OVS_CT_SRC_NAT or OVS_CT_DST_NAT is set, OVS_CT_NAT must be set in info->nat. Thus, if OVS_CT_NAT is not set in info->nat, it will definitely not do NAT but returns NF_ACCEPT in ovs_ct_nat(). This patch changes nothing funcational but only makes this return earlier in ovs_ct_nat() to keep consistent with TC's processing in tcf_ct_act_nat(). Reviewed-by: Saeed Mahameed Acked-by: Aaron Conole Signed-off-by: Xin Long --- net/openvswitch/conntrack.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index dff093a10d6d..5ea74270da46 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -816,6 +816,9 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key *key, enum nf_nat_manip_type maniptype; int err; + if (!(info->nat & OVS_CT_NAT)) + return NF_ACCEPT; + /* Add NAT extension if not confirmed yet. */ if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct)) return NF_ACCEPT; /* Can't NAT. */ @@ -825,8 +828,7 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key *key, * Make sure new expected connections (IP_CT_RELATED) are NATted only * when committing. */ - if (info->nat & OVS_CT_NAT && ctinfo != IP_CT_NEW && - ct->status & IPS_NAT_MASK && + if (ctinfo != IP_CT_NEW && ct->status & IPS_NAT_MASK && (ctinfo != IP_CT_RELATED || info->commit)) { /* NAT an established or related connection like before. */ if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY) -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCHv4 net-next 3/5] openvswitch: return NF_DROP when fails to add nat ext in ovs_ct_nat
When it fails to allocate nat ext, the packet should be dropped, like the memory allocation failures in other places in ovs_ct_nat(). This patch changes to return NF_DROP when fails to add nat ext before doing NAT in ovs_ct_nat(), also it would keep consistent with tc action ct' processing in tcf_ct_act_nat(). Signed-off-by: Xin Long --- net/openvswitch/conntrack.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 5ea74270da46..58c9f0edc3c4 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -821,7 +821,7 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key *key, /* Add NAT extension if not confirmed yet. */ if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct)) - return NF_ACCEPT; /* Can't NAT. */ + return NF_DROP; /* Can't NAT. */ /* Determine NAT type. * Check if the NAT type can be deduced from the tracked connection. -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCHv4 net-next 0/5] net: eliminate the duplicate code in the ct nat functions of ovs and tc
The changes in the patchset: "net: add helper support in tc act_ct for ovs offloading" had moved some common ct code used by both OVS and TC into netfilter. There are still some big functions pretty similar defined and used in each of OVS and TC. It is not good to maintain such big function in 2 places. This patchset is to extract the functions for NAT processing from OVS and TC to netfilter. To make this change clear and safe, this patchset gets the common code out of OVS and TC step by step: The patch 1-4 make some minor changes in OVS and TC to make the NAT code of them completely the same, then the patch 5 moves the common code to the netfilter and exports one function called by each of OVS and TC. v1->v2: - Create nf_nat_ovs.c to include the nat functions, as Pablo suggested. v2->v3: - fix a typo in subject of patch 2/5, as Marcelo noticed. - fix in openvswitch to keep OVS ct nat and TC ct nat consistent in patch 3/5 instead of in tc, as Marcelo noticed. - use BIT(var) macro instead of (1 << var) in patch 5/5, as Marcelo suggested. - use ifdef in netfilter/Makefile to build nf_nat_ovs only when OVS or TC ct action is enabled in patch 5/5, as Marcelo suggested. v3->v4: - add NF_NAT_OVS in netfilter/Kconfig and add select NF_NAT_OVS in OVS and TC Kconfig instead of using ifdef in netfilter/Makefile, as Pablo suggested. Xin Long (5): openvswitch: delete the unncessary skb_pull_rcsum call in ovs_ct_nat_execute openvswitch: return NF_ACCEPT when OVS_CT_NAT is not set in info nat openvswitch: return NF_DROP when fails to add nat ext in ovs_ct_nat net: sched: update the nat flag for icmp error packets in ct_nat_execute net: move the nat function to nf_nat_ovs for ovs and tc include/net/netfilter/nf_nat.h | 4 + net/netfilter/Kconfig | 3 + net/netfilter/Makefile | 1 + net/netfilter/nf_nat_ovs.c | 135 ++ net/openvswitch/Kconfig| 1 + net/openvswitch/conntrack.c| 146 +++-- net/sched/Kconfig | 1 + net/sched/act_ct.c | 136 +++--- 8 files changed, 169 insertions(+), 258 deletions(-) create mode 100644 net/netfilter/nf_nat_ovs.c -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCHv3 net-next 5/5] net: move the nat function to nf_nat_ovs for ovs and tc
On Tue, Dec 6, 2022 at 10:32 PM Xin Long wrote: > > On Tue, Dec 6, 2022 at 6:54 PM Pablo Neira Ayuso wrote: > > > > On Tue, Dec 06, 2022 at 06:31:16PM -0500, Xin Long wrote: > > [...] > > > diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile > > > index 1d4db1943936..0976d34b1e5f 100644 > > > --- a/net/netfilter/Makefile > > > +++ b/net/netfilter/Makefile > > > @@ -54,6 +54,12 @@ obj-$(CONFIG_NF_CONNTRACK_TFTP) += nf_conntrack_tftp.o > > > > > > nf_nat-y := nf_nat_core.o nf_nat_proto.o nf_nat_helper.o > > > > > > +ifdef CONFIG_OPENVSWITCH > > > +nf_nat-y += nf_nat_ovs.o > > > +else ifdef CONFIG_NET_ACT_CT > > > +nf_nat-y += nf_nat_ovs.o > > > +endif > > > > Maybe add CONFIG_NF_NAT_OVS and select it from OPENVSWITCH Kconfig > > (select is a hammer, but it should be fine in this case since > > OPENVSWITCH already depends on NF_NAT?). > not really completely depends, it's: > > depends on (!NF_NAT || NF_NAT) > > but it's fine, the select will be: > > select NF_NAT_OVS if NF_NAT > > > > > Then in Makefile: > > > > nf_nat-$(CONFIG_NF_NAT_OVS) += nf_nat_ovs.o > > > > And CONFIG_NF_NAT_OVS depends on OPENVSWITCH. > Sounds great! > Then it will be: > > --- a/net/netfilter/Kconfig > +++ b/net/netfilter/Kconfig > @@ -459,6 +459,10 @@ config NF_NAT_REDIRECT > config NF_NAT_MASQUERADE > bool > > +config NF_NAT_OVS > + bool > + depends on OPENVSWITCH || NET_ACT_CT > + Just FYI, "depends on" is not necessary in this case. Even without this "depends on OPENVSWITCH || NET_ACT_CT", it will still be disabled automatically if OPENVSWITCH and NET_ACT_CT are disabled, and you can't enable it manually either. Thanks. > > --- a/net/netfilter/Makefile > +++ b/net/netfilter/Makefile > @@ -59,6 +59,7 @@ obj-$(CONFIG_NF_LOG_SYSLOG) += nf_log_syslog.o > obj-$(CONFIG_NF_NAT) += nf_nat.o > nf_nat-$(CONFIG_NF_NAT_REDIRECT) += nf_nat_redirect.o > nf_nat-$(CONFIG_NF_NAT_MASQUERADE) += nf_nat_masquerade.o > +nf_nat-$(CONFIG_NF_NAT_OVS) += nf_nat_ovs.o > > --- a/net/openvswitch/Kconfig > +++ b/net/openvswitch/Kconfig > @@ -15,6 +15,7 @@ config OPENVSWITCH > select NET_MPLS_GSO > select DST_CACHE > select NET_NSH > + select NF_NAT_OVS if NF_NAT > > --- a/net/sched/Kconfig > +++ b/net/sched/Kconfig > @@ -977,6 +977,7 @@ config NET_ACT_TUNNEL_KEY > config NET_ACT_CT > tristate "connection tracking tc action" > depends on NET_CLS_ACT && NF_CONNTRACK && (!NF_NAT || NF_NAT) > && NF_FLOW_TABLE > + select NF_NAT_OVS if NF_NAT > > > I will prepare v4, Thanks. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCHv3 net-next 5/5] net: move the nat function to nf_nat_ovs for ovs and tc
On Tue, Dec 6, 2022 at 6:54 PM Pablo Neira Ayuso wrote: > > On Tue, Dec 06, 2022 at 06:31:16PM -0500, Xin Long wrote: > [...] > > diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile > > index 1d4db1943936..0976d34b1e5f 100644 > > --- a/net/netfilter/Makefile > > +++ b/net/netfilter/Makefile > > @@ -54,6 +54,12 @@ obj-$(CONFIG_NF_CONNTRACK_TFTP) += nf_conntrack_tftp.o > > > > nf_nat-y := nf_nat_core.o nf_nat_proto.o nf_nat_helper.o > > > > +ifdef CONFIG_OPENVSWITCH > > +nf_nat-y += nf_nat_ovs.o > > +else ifdef CONFIG_NET_ACT_CT > > +nf_nat-y += nf_nat_ovs.o > > +endif > > Maybe add CONFIG_NF_NAT_OVS and select it from OPENVSWITCH Kconfig > (select is a hammer, but it should be fine in this case since > OPENVSWITCH already depends on NF_NAT?). not really completely depends, it's: depends on (!NF_NAT || NF_NAT) but it's fine, the select will be: select NF_NAT_OVS if NF_NAT > > Then in Makefile: > > nf_nat-$(CONFIG_NF_NAT_OVS) += nf_nat_ovs.o > > And CONFIG_NF_NAT_OVS depends on OPENVSWITCH. Sounds great! Then it will be: --- a/net/netfilter/Kconfig +++ b/net/netfilter/Kconfig @@ -459,6 +459,10 @@ config NF_NAT_REDIRECT config NF_NAT_MASQUERADE bool +config NF_NAT_OVS + bool + depends on OPENVSWITCH || NET_ACT_CT + --- a/net/netfilter/Makefile +++ b/net/netfilter/Makefile @@ -59,6 +59,7 @@ obj-$(CONFIG_NF_LOG_SYSLOG) += nf_log_syslog.o obj-$(CONFIG_NF_NAT) += nf_nat.o nf_nat-$(CONFIG_NF_NAT_REDIRECT) += nf_nat_redirect.o nf_nat-$(CONFIG_NF_NAT_MASQUERADE) += nf_nat_masquerade.o +nf_nat-$(CONFIG_NF_NAT_OVS) += nf_nat_ovs.o --- a/net/openvswitch/Kconfig +++ b/net/openvswitch/Kconfig @@ -15,6 +15,7 @@ config OPENVSWITCH select NET_MPLS_GSO select DST_CACHE select NET_NSH + select NF_NAT_OVS if NF_NAT --- a/net/sched/Kconfig +++ b/net/sched/Kconfig @@ -977,6 +977,7 @@ config NET_ACT_TUNNEL_KEY config NET_ACT_CT tristate "connection tracking tc action" depends on NET_CLS_ACT && NF_CONNTRACK && (!NF_NAT || NF_NAT) && NF_FLOW_TABLE + select NF_NAT_OVS if NF_NAT I will prepare v4, Thanks. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCHv3 net-next 4/5] net: sched: update the nat flag for icmp error packets in ct_nat_execute
In ovs_ct_nat_execute(), the packet flow key nat flags are updated when it processes ICMP(v6) error packets translation successfully. In ct_nat_execute() when processing ICMP(v6) error packets translation successfully, it should have done the same in ct_nat_execute() to set post_ct_s/dnat flag, which will be used to update flow key nat flags in OVS module later. Reviewed-by: Saeed Mahameed Signed-off-by: Xin Long --- net/sched/act_ct.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index dd5ae7551956..bb87d1e910ea 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -936,13 +936,13 @@ static int ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct, } err = nf_nat_packet(ct, ctinfo, hooknum, skb); +out: if (err == NF_ACCEPT) { if (maniptype == NF_NAT_MANIP_SRC) tc_skb_cb(skb)->post_ct_snat = 1; if (maniptype == NF_NAT_MANIP_DST) tc_skb_cb(skb)->post_ct_dnat = 1; } -out: return err; } #endif /* CONFIG_NF_NAT */ -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCHv3 net-next 5/5] net: move the nat function to nf_nat_ovs for ovs and tc
There are two nat functions are nearly the same in both OVS and TC code, (ovs_)ct_nat_execute() and ovs_ct_nat/tcf_ct_act_nat(). This patch creates nf_nat_ovs.c under netfilter and moves them there then exports nf_ct_nat() so that it can be shared by both OVS and TC, and keeps the nat (type) check and nat flag update in OVS and TC's own place, as these parts are different between OVS and TC. Note that in OVS nat function it was using skb->protocol to get the proto as it already skips vlans in key_extract(), while it doesn't in TC, and TC has to call skb_protocol() to get proto. So in nf_ct_nat_execute(), we keep using skb_protocol() which works for both OVS and TC contrack. Reviewed-by: Saeed Mahameed Signed-off-by: Xin Long --- include/net/netfilter/nf_nat.h | 4 + net/netfilter/Makefile | 6 ++ net/netfilter/nf_nat_ovs.c | 135 net/openvswitch/conntrack.c| 137 +++-- net/sched/act_ct.c | 136 +++- 5 files changed, 166 insertions(+), 252 deletions(-) create mode 100644 net/netfilter/nf_nat_ovs.c diff --git a/include/net/netfilter/nf_nat.h b/include/net/netfilter/nf_nat.h index e9eb01e99d2f..9877f064548a 100644 --- a/include/net/netfilter/nf_nat.h +++ b/include/net/netfilter/nf_nat.h @@ -104,6 +104,10 @@ unsigned int nf_nat_inet_fn(void *priv, struct sk_buff *skb, const struct nf_hook_state *state); +int nf_ct_nat(struct sk_buff *skb, struct nf_conn *ct, + enum ip_conntrack_info ctinfo, int *action, + const struct nf_nat_range2 *range, bool commit); + static inline int nf_nat_initialized(const struct nf_conn *ct, enum nf_nat_manip_type manip) { diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile index 1d4db1943936..0976d34b1e5f 100644 --- a/net/netfilter/Makefile +++ b/net/netfilter/Makefile @@ -54,6 +54,12 @@ obj-$(CONFIG_NF_CONNTRACK_TFTP) += nf_conntrack_tftp.o nf_nat-y := nf_nat_core.o nf_nat_proto.o nf_nat_helper.o +ifdef CONFIG_OPENVSWITCH +nf_nat-y += nf_nat_ovs.o +else ifdef CONFIG_NET_ACT_CT +nf_nat-y += nf_nat_ovs.o +endif + obj-$(CONFIG_NF_LOG_SYSLOG) += nf_log_syslog.o obj-$(CONFIG_NF_NAT) += nf_nat.o diff --git a/net/netfilter/nf_nat_ovs.c b/net/netfilter/nf_nat_ovs.c new file mode 100644 index ..551abd2da614 --- /dev/null +++ b/net/netfilter/nf_nat_ovs.c @@ -0,0 +1,135 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Support nat functions for openvswitch and used by OVS and TC conntrack. */ + +#include + +/* Modelled after nf_nat_ipv[46]_fn(). + * range is only used for new, uninitialized NAT state. + * Returns either NF_ACCEPT or NF_DROP. + */ +static int nf_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct, +enum ip_conntrack_info ctinfo, int *action, +const struct nf_nat_range2 *range, +enum nf_nat_manip_type maniptype) +{ + __be16 proto = skb_protocol(skb, true); + int hooknum, err = NF_ACCEPT; + + /* See HOOK2MANIP(). */ + if (maniptype == NF_NAT_MANIP_SRC) + hooknum = NF_INET_LOCAL_IN; /* Source NAT */ + else + hooknum = NF_INET_LOCAL_OUT; /* Destination NAT */ + + switch (ctinfo) { + case IP_CT_RELATED: + case IP_CT_RELATED_REPLY: + if (proto == htons(ETH_P_IP) && + ip_hdr(skb)->protocol == IPPROTO_ICMP) { + if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo, + hooknum)) + err = NF_DROP; + goto out; + } else if (IS_ENABLED(CONFIG_IPV6) && proto == htons(ETH_P_IPV6)) { + __be16 frag_off; + u8 nexthdr = ipv6_hdr(skb)->nexthdr; + int hdrlen = ipv6_skip_exthdr(skb, + sizeof(struct ipv6hdr), + , _off); + + if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) { + if (!nf_nat_icmpv6_reply_translation(skb, ct, +ctinfo, +hooknum, +hdrlen)) + err = NF_DROP; + goto out; + } + } + /* Non-ICMP, fall thru to initialize if needed. */ + fallthrough; + case IP_CT_NEW: + /* Seen it before? This can happen for loopback, retrans, +* or local packets. +*/ + i
[ovs-dev] [PATCHv3 net-next 2/5] openvswitch: return NF_ACCEPT when OVS_CT_NAT is not set in info nat
Either OVS_CT_SRC_NAT or OVS_CT_DST_NAT is set, OVS_CT_NAT must be set in info->nat. Thus, if OVS_CT_NAT is not set in info->nat, it will definitely not do NAT but returns NF_ACCEPT in ovs_ct_nat(). This patch changes nothing funcational but only makes this return earlier in ovs_ct_nat() to keep consistent with TC's processing in tcf_ct_act_nat(). Reviewed-by: Saeed Mahameed Acked-by: Aaron Conole Signed-off-by: Xin Long --- net/openvswitch/conntrack.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index dff093a10d6d..5ea74270da46 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -816,6 +816,9 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key *key, enum nf_nat_manip_type maniptype; int err; + if (!(info->nat & OVS_CT_NAT)) + return NF_ACCEPT; + /* Add NAT extension if not confirmed yet. */ if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct)) return NF_ACCEPT; /* Can't NAT. */ @@ -825,8 +828,7 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key *key, * Make sure new expected connections (IP_CT_RELATED) are NATted only * when committing. */ - if (info->nat & OVS_CT_NAT && ctinfo != IP_CT_NEW && - ct->status & IPS_NAT_MASK && + if (ctinfo != IP_CT_NEW && ct->status & IPS_NAT_MASK && (ctinfo != IP_CT_RELATED || info->commit)) { /* NAT an established or related connection like before. */ if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY) -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCHv3 net-next 1/5] openvswitch: delete the unncessary skb_pull_rcsum call in ovs_ct_nat_execute
The calls to ovs_ct_nat_execute() are as below: ovs_ct_execute() ovs_ct_lookup() __ovs_ct_lookup() ovs_ct_nat() ovs_ct_nat_execute() ovs_ct_commit() __ovs_ct_lookup() ovs_ct_nat() ovs_ct_nat_execute() and since skb_pull_rcsum() and skb_push_rcsum() are already called in ovs_ct_execute(), there's no need to do it again in ovs_ct_nat_execute(). Reviewed-by: Saeed Mahameed Acked-by: Aaron Conole Signed-off-by: Xin Long --- net/openvswitch/conntrack.c | 17 ++--- 1 file changed, 6 insertions(+), 11 deletions(-) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index d78f0fc4337d..dff093a10d6d 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -735,10 +735,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct, const struct nf_nat_range2 *range, enum nf_nat_manip_type maniptype, struct sw_flow_key *key) { - int hooknum, nh_off, err = NF_ACCEPT; - - nh_off = skb_network_offset(skb); - skb_pull_rcsum(skb, nh_off); + int hooknum, err = NF_ACCEPT; /* See HOOK2MANIP(). */ if (maniptype == NF_NAT_MANIP_SRC) @@ -755,7 +752,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct, if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo, hooknum)) err = NF_DROP; - goto push; + goto out; } else if (IS_ENABLED(CONFIG_IPV6) && skb->protocol == htons(ETH_P_IPV6)) { __be16 frag_off; @@ -770,7 +767,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct, hooknum, hdrlen)) err = NF_DROP; - goto push; + goto out; } } /* Non-ICMP, fall thru to initialize if needed. */ @@ -788,7 +785,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct, ? nf_nat_setup_info(ct, range, maniptype) : nf_nat_alloc_null_binding(ct, hooknum); if (err != NF_ACCEPT) - goto push; + goto out; } break; @@ -798,13 +795,11 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct, default: err = NF_DROP; - goto push; + goto out; } err = nf_nat_packet(ct, ctinfo, hooknum, skb); -push: - skb_push_rcsum(skb, nh_off); - +out: /* Update the flow key if NAT successful. */ if (err == NF_ACCEPT) ovs_nat_update_key(key, skb, maniptype); -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCHv3 net-next 3/5] openvswitch: return NF_DROP when fails to add nat ext in ovs_ct_nat
When it fails to allocate nat ext, the packet should be dropped, like the memory allocation failures in other places in ovs_ct_nat(). This patch changes to return NF_DROP when fails to add nat ext before doing NAT in ovs_ct_nat(), also it would keep consistent with tc action ct' processing in tcf_ct_act_nat(). Signed-off-by: Xin Long --- net/openvswitch/conntrack.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 5ea74270da46..58c9f0edc3c4 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -821,7 +821,7 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key *key, /* Add NAT extension if not confirmed yet. */ if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct)) - return NF_ACCEPT; /* Can't NAT. */ + return NF_DROP; /* Can't NAT. */ /* Determine NAT type. * Check if the NAT type can be deduced from the tracked connection. -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCHv3 net-next 0/5] net: eliminate the duplicate code in the ct nat functions of ovs and tc
The changes in the patchset: "net: add helper support in tc act_ct for ovs offloading" had moved some common ct code used by both OVS and TC into netfilter. There are still some big functions pretty similar defined and used in each of OVS and TC. It is not good to maintain such big function in 2 places. This patchset is to extract the functions for NAT processing from OVS and TC to netfilter. To make this change clear and safe, this patchset gets the common code out of OVS and TC step by step: The patch 1-4 make some minor changes in OVS and TC to make the NAT code of them completely the same, then the patch 5 moves the common code to the netfilter and exports one function called by each of OVS and TC. v1->v2: - Create nf_nat_ovs.c to include the nat functions, as Pablo suggested. v2->v3: - Fix a typo in subject of patch 2/5, as Marcelo noticed. - Fix in openvswitch to keep OVS ct nat and TC ct nat consistent in patch 3/5 instead of in tc, as Marcelo noticed. - Use BIT(var) macro instead of (1 << var) in patch 5/5, as Marcelo suggested. - Use ifdef in netfilter/Makefile to build nf_nat_ovs only when OVS or TC ct action is enabled in patch 5/5, as Marcelo suggested. Xin Long (5): openvswitch: delete the unncessary skb_pull_rcsum call in ovs_ct_nat_execute openvswitch: return NF_ACCEPT when OVS_CT_NAT is not set in info nat openvswitch: return NF_DROP when fails to add nat ext in ovs_ct_nat net: sched: update the nat flag for icmp error packets in ct_nat_execute net: move the nat function to nf_nat_ovs for ovs and tc include/net/netfilter/nf_nat.h | 4 + net/netfilter/Makefile | 6 ++ net/netfilter/nf_nat_ovs.c | 135 ++ net/openvswitch/conntrack.c| 146 +++-- net/sched/act_ct.c | 136 +++--- 5 files changed, 169 insertions(+), 258 deletions(-) create mode 100644 net/netfilter/nf_nat_ovs.c -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCHv2 net-next 3/5] net: sched: return NF_ACCEPT when fails to add nat ext in tcf_ct_act_nat
On Wed, Nov 23, 2022 at 9:24 AM Marcelo Ricardo Leitner wrote: > > On Tue, Nov 22, 2022 at 12:32:19PM -0500, Xin Long wrote: > > This patch changes to return NF_ACCEPT when fails to add nat > > ext before doing NAT in tcf_ct_act_nat(), to keep consistent > > with OVS' processing in ovs_ct_nat(). > > > > Reviewed-by: Saeed Mahameed > > Signed-off-by: Xin Long > > --- > > net/sched/act_ct.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c > > index da0b7f665277..8869b3ef6642 100644 > > --- a/net/sched/act_ct.c > > +++ b/net/sched/act_ct.c > > @@ -994,7 +994,7 @@ static int tcf_ct_act_nat(struct sk_buff *skb, > > > > /* Add NAT extension if not confirmed yet. */ > > if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct)) > > - return NF_DROP; /* Can't NAT. */ > > + return NF_ACCEPT; /* Can't NAT. */ > > I'm wondering if the fix should actually be in OVS, to make it drop > the packet? Aaron, Eelco? > > If the user asked for NAT, and it can't NAT, it doesn't seem right to > forward the packet while not performing the asked action. > > If we follow the code here, it may even commit the entry without the > NAT extension, rendering the connection useless/broken per the first > if condition above. It just won't try again. nf_ct_nat_ext_add() returning NULL is caused by krealloc() failure like an -ENOMEM error, and a similar thing could happen in nfct_seqadj_ext_add() called by ovs_ct_nat() -> nf_nat_setup_info() when doing NAT where it returns DROP. So I think it's right that we should fix this in openvswitch instead of TC. Anyway, in ovs_ct_nat(): if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct)) return NF_ACCEPT; /* Can't NAT. */ git blame shows this was added at the beginning by: 05752523e565 ("openvswitch: Interface with NAT.") So add Jarno Rajahalme to Cc: list. Thanks. > > > > > if (ctinfo != IP_CT_NEW && (ct->status & IPS_NAT_MASK) && > > (ctinfo != IP_CT_RELATED || commit)) { > > -- > > 2.31.1 > > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCHv2 net-next 5/5] net: move the nat function to nf_nat_ovs for ovs and tc
On Wed, Nov 23, 2022 at 1:52 PM Marcelo Ricardo Leitner wrote: > > On Tue, Nov 22, 2022 at 12:32:21PM -0500, Xin Long wrote: > > --- a/net/netfilter/Makefile > > +++ b/net/netfilter/Makefile > > @@ -52,7 +52,7 @@ obj-$(CONFIG_NF_CONNTRACK_SANE) += nf_conntrack_sane.o > > obj-$(CONFIG_NF_CONNTRACK_SIP) += nf_conntrack_sip.o > > obj-$(CONFIG_NF_CONNTRACK_TFTP) += nf_conntrack_tftp.o > > > > -nf_nat-y := nf_nat_core.o nf_nat_proto.o nf_nat_helper.o > > +nf_nat-y := nf_nat_core.o nf_nat_proto.o nf_nat_helper.o nf_nat_ovs.o > > Considering that the code in nf_nat_ovs is only used if ovs or act_ct > are enabled, shouldn't it be using an invisible knob here that gets > automatically selected by them? Pablo? It's good to not build it if no place is using it. Making nf_nat_ovs a module might be too much. If it's okay to Netfilter devel, I will change to use "ifdef" in Makefile: +ifdef CONFIG_OPENVSWITCH +nf_nat-y += nf_nat_ovs.o +else ifdef CONFIG_NET_ACT_CT +nf_nat-y += nf_nat_ovs.o +endif + Thanks. > > I think this is my last comment on this series. The rest LGTM. > > > > > obj-$(CONFIG_NF_LOG_SYSLOG) += nf_log_syslog.o > > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCHv2 net-next 5/5] net: move the nat function to nf_nat_ovs for ovs and tc
On Wed, Nov 23, 2022 at 4:21 PM Marcelo Ricardo Leitner wrote: > > On Wed, Nov 23, 2022 at 02:55:05PM -0500, Xin Long wrote: > > On Wed, Nov 23, 2022 at 2:17 PM Marcelo Ricardo Leitner > > wrote: > > > > > > On Wed, Nov 23, 2022 at 01:54:41PM -0500, Xin Long wrote: > > > > On Wed, Nov 23, 2022 at 1:48 PM Marcelo Ricardo Leitner > > > > wrote: > > > > > > > > > > On Wed, Nov 23, 2022 at 12:31:38PM -0500, Xin Long wrote: > > > > > > On Wed, Nov 23, 2022 at 10:13 AM Marcelo Ricardo Leitner > > > > > > wrote: > > > > > > > > > > > > > > On Wed, Nov 23, 2022 at 12:09:55PM -0300, Marcelo Ricardo Leitner > > > > > > > wrote: > > > > > > > > On Tue, Nov 22, 2022 at 12:32:21PM -0500, Xin Long wrote: > > > > > > > > > +int nf_ct_nat(struct sk_buff *skb, struct nf_conn *ct, > > > > > > > > > + enum ip_conntrack_info ctinfo, int *action, > > > > > > > > > + const struct nf_nat_range2 *range, bool commit) > > > > > > > > > +{ > > > > > > > > > + enum nf_nat_manip_type maniptype; > > > > > > > > > + int err, ct_action = *action; > > > > > > > > > + > > > > > > > > > + *action = 0; > > > > > > > > > + > > > > > > > > > + /* Add NAT extension if not confirmed yet. */ > > > > > > > > > + if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct)) > > > > > > > > > + return NF_ACCEPT; /* Can't NAT. */ > > > > > > > > > + > > > > > > > > > + if (ctinfo != IP_CT_NEW && (ct->status & IPS_NAT_MASK) && > > > > > > > > > + (ctinfo != IP_CT_RELATED || commit)) { > > > > > > > > > + /* NAT an established or related connection like > > > > > > > > > before. */ > > > > > > > > > + if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY) > > > > > > > > > + /* This is the REPLY direction for a > > > > > > > > > connection > > > > > > > > > +* for which NAT was applied in the > > > > > > > > > forward > > > > > > > > > +* direction. Do the reverse NAT. > > > > > > > > > +*/ > > > > > > > > > + maniptype = ct->status & IPS_SRC_NAT > > > > > > > > > + ? NF_NAT_MANIP_DST : > > > > > > > > > NF_NAT_MANIP_SRC; > > > > > > > > > + else > > > > > > > > > + maniptype = ct->status & IPS_SRC_NAT > > > > > > > > > + ? NF_NAT_MANIP_SRC : > > > > > > > > > NF_NAT_MANIP_DST; > > > > > > > > > + } else if (ct_action & (1 << NF_NAT_MANIP_SRC)) { > > > > > > > > > + maniptype = NF_NAT_MANIP_SRC; > > > > > > > > > + } else if (ct_action & (1 << NF_NAT_MANIP_DST)) { > > > > > > > > > + maniptype = NF_NAT_MANIP_DST; > > > > > > > > > + } else { > > > > > > > > > + return NF_ACCEPT; > > > > > > > > > + } > > > > > > > > > + > > > > > > > > > + err = nf_ct_nat_execute(skb, ct, ctinfo, action, range, > > > > > > > > > maniptype); > > > > > > > > > + if (err == NF_ACCEPT && ct->status & IPS_DST_NAT) { > > > > > > > > > + if (ct->status & IPS_SRC_NAT) { > > > > > > > > > + if (maniptype == NF_NAT_MANIP_SRC) > > > > > > > > > + maniptype = NF_NAT_MANIP_DST; > > > > > > > > > + else > > > > > > > > > + maniptype = NF_NAT_MANIP_SRC; > > > > > > > > > + > > > > > > > > > + err = nf_ct_n
Re: [ovs-dev] [PATCHv2 net-next 5/5] net: move the nat function to nf_nat_ovs for ovs and tc
On Wed, Nov 23, 2022 at 2:17 PM Marcelo Ricardo Leitner wrote: > > On Wed, Nov 23, 2022 at 01:54:41PM -0500, Xin Long wrote: > > On Wed, Nov 23, 2022 at 1:48 PM Marcelo Ricardo Leitner > > wrote: > > > > > > On Wed, Nov 23, 2022 at 12:31:38PM -0500, Xin Long wrote: > > > > On Wed, Nov 23, 2022 at 10:13 AM Marcelo Ricardo Leitner > > > > wrote: > > > > > > > > > > On Wed, Nov 23, 2022 at 12:09:55PM -0300, Marcelo Ricardo Leitner > > > > > wrote: > > > > > > On Tue, Nov 22, 2022 at 12:32:21PM -0500, Xin Long wrote: > > > > > > > +int nf_ct_nat(struct sk_buff *skb, struct nf_conn *ct, > > > > > > > + enum ip_conntrack_info ctinfo, int *action, > > > > > > > + const struct nf_nat_range2 *range, bool commit) > > > > > > > +{ > > > > > > > + enum nf_nat_manip_type maniptype; > > > > > > > + int err, ct_action = *action; > > > > > > > + > > > > > > > + *action = 0; > > > > > > > + > > > > > > > + /* Add NAT extension if not confirmed yet. */ > > > > > > > + if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct)) > > > > > > > + return NF_ACCEPT; /* Can't NAT. */ > > > > > > > + > > > > > > > + if (ctinfo != IP_CT_NEW && (ct->status & IPS_NAT_MASK) && > > > > > > > + (ctinfo != IP_CT_RELATED || commit)) { > > > > > > > + /* NAT an established or related connection like > > > > > > > before. */ > > > > > > > + if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY) > > > > > > > + /* This is the REPLY direction for a > > > > > > > connection > > > > > > > +* for which NAT was applied in the forward > > > > > > > +* direction. Do the reverse NAT. > > > > > > > +*/ > > > > > > > + maniptype = ct->status & IPS_SRC_NAT > > > > > > > + ? NF_NAT_MANIP_DST : NF_NAT_MANIP_SRC; > > > > > > > + else > > > > > > > + maniptype = ct->status & IPS_SRC_NAT > > > > > > > + ? NF_NAT_MANIP_SRC : NF_NAT_MANIP_DST; > > > > > > > + } else if (ct_action & (1 << NF_NAT_MANIP_SRC)) { > > > > > > > + maniptype = NF_NAT_MANIP_SRC; > > > > > > > + } else if (ct_action & (1 << NF_NAT_MANIP_DST)) { > > > > > > > + maniptype = NF_NAT_MANIP_DST; > > > > > > > + } else { > > > > > > > + return NF_ACCEPT; > > > > > > > + } > > > > > > > + > > > > > > > + err = nf_ct_nat_execute(skb, ct, ctinfo, action, range, > > > > > > > maniptype); > > > > > > > + if (err == NF_ACCEPT && ct->status & IPS_DST_NAT) { > > > > > > > + if (ct->status & IPS_SRC_NAT) { > > > > > > > + if (maniptype == NF_NAT_MANIP_SRC) > > > > > > > + maniptype = NF_NAT_MANIP_DST; > > > > > > > + else > > > > > > > + maniptype = NF_NAT_MANIP_SRC; > > > > > > > + > > > > > > > + err = nf_ct_nat_execute(skb, ct, ctinfo, > > > > > > > action, range, > > > > > > > + maniptype); > > > > > > > + } else if (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) { > > > > > > > + err = nf_ct_nat_execute(skb, ct, ctinfo, > > > > > > > action, NULL, > > > > > > > + NF_NAT_MANIP_SRC); > > > > > > > + } > > > > > > > + } > > > > > > > + return err; > > > > > > > +} > > > > > > > +EXPORT_SYMBOL_GPL(nf_ct_nat); > > > > > > > diff --git a/net/openvswitch/conntrack.c > > > >
Re: [ovs-dev] [PATCHv2 net-next 5/5] net: move the nat function to nf_nat_ovs for ovs and tc
On Wed, Nov 23, 2022 at 1:48 PM Marcelo Ricardo Leitner wrote: > > On Wed, Nov 23, 2022 at 12:31:38PM -0500, Xin Long wrote: > > On Wed, Nov 23, 2022 at 10:13 AM Marcelo Ricardo Leitner > > wrote: > > > > > > On Wed, Nov 23, 2022 at 12:09:55PM -0300, Marcelo Ricardo Leitner wrote: > > > > On Tue, Nov 22, 2022 at 12:32:21PM -0500, Xin Long wrote: > > > > > +int nf_ct_nat(struct sk_buff *skb, struct nf_conn *ct, > > > > > + enum ip_conntrack_info ctinfo, int *action, > > > > > + const struct nf_nat_range2 *range, bool commit) > > > > > +{ > > > > > + enum nf_nat_manip_type maniptype; > > > > > + int err, ct_action = *action; > > > > > + > > > > > + *action = 0; > > > > > + > > > > > + /* Add NAT extension if not confirmed yet. */ > > > > > + if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct)) > > > > > + return NF_ACCEPT; /* Can't NAT. */ > > > > > + > > > > > + if (ctinfo != IP_CT_NEW && (ct->status & IPS_NAT_MASK) && > > > > > + (ctinfo != IP_CT_RELATED || commit)) { > > > > > + /* NAT an established or related connection like before. > > > > > */ > > > > > + if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY) > > > > > + /* This is the REPLY direction for a connection > > > > > +* for which NAT was applied in the forward > > > > > +* direction. Do the reverse NAT. > > > > > +*/ > > > > > + maniptype = ct->status & IPS_SRC_NAT > > > > > + ? NF_NAT_MANIP_DST : NF_NAT_MANIP_SRC; > > > > > + else > > > > > + maniptype = ct->status & IPS_SRC_NAT > > > > > + ? NF_NAT_MANIP_SRC : NF_NAT_MANIP_DST; > > > > > + } else if (ct_action & (1 << NF_NAT_MANIP_SRC)) { > > > > > + maniptype = NF_NAT_MANIP_SRC; > > > > > + } else if (ct_action & (1 << NF_NAT_MANIP_DST)) { > > > > > + maniptype = NF_NAT_MANIP_DST; > > > > > + } else { > > > > > + return NF_ACCEPT; > > > > > + } > > > > > + > > > > > + err = nf_ct_nat_execute(skb, ct, ctinfo, action, range, > > > > > maniptype); > > > > > + if (err == NF_ACCEPT && ct->status & IPS_DST_NAT) { > > > > > + if (ct->status & IPS_SRC_NAT) { > > > > > + if (maniptype == NF_NAT_MANIP_SRC) > > > > > + maniptype = NF_NAT_MANIP_DST; > > > > > + else > > > > > + maniptype = NF_NAT_MANIP_SRC; > > > > > + > > > > > + err = nf_ct_nat_execute(skb, ct, ctinfo, action, > > > > > range, > > > > > + maniptype); > > > > > + } else if (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) { > > > > > + err = nf_ct_nat_execute(skb, ct, ctinfo, action, > > > > > NULL, > > > > > + NF_NAT_MANIP_SRC); > > > > > + } > > > > > + } > > > > > + return err; > > > > > +} > > > > > +EXPORT_SYMBOL_GPL(nf_ct_nat); > > > > > diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c > > > > > index cc643a556ea1..d03c75165663 100644 > > > > > --- a/net/openvswitch/conntrack.c > > > > > +++ b/net/openvswitch/conntrack.c > > > > > @@ -726,144 +726,27 @@ static void ovs_nat_update_key(struct > > > > > sw_flow_key *key, > > > > > } > > > > > } > > > > > > > > > > -/* Modelled after nf_nat_ipv[46]_fn(). > > > > > - * range is only used for new, uninitialized NAT state. > > > > > - * Returns either NF_ACCEPT or NF_DROP. > > > > > - */ > > > > > -static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn > > > > > *ct, > > > > > - enum ip_conntrack_info ctinfo, > &g
Re: [ovs-dev] [PATCHv2 net-next 5/5] net: move the nat function to nf_nat_ovs for ovs and tc
On Wed, Nov 23, 2022 at 10:13 AM Marcelo Ricardo Leitner wrote: > > On Wed, Nov 23, 2022 at 12:09:55PM -0300, Marcelo Ricardo Leitner wrote: > > On Tue, Nov 22, 2022 at 12:32:21PM -0500, Xin Long wrote: > > > +int nf_ct_nat(struct sk_buff *skb, struct nf_conn *ct, > > > + enum ip_conntrack_info ctinfo, int *action, > > > + const struct nf_nat_range2 *range, bool commit) > > > +{ > > > + enum nf_nat_manip_type maniptype; > > > + int err, ct_action = *action; > > > + > > > + *action = 0; > > > + > > > + /* Add NAT extension if not confirmed yet. */ > > > + if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct)) > > > + return NF_ACCEPT; /* Can't NAT. */ > > > + > > > + if (ctinfo != IP_CT_NEW && (ct->status & IPS_NAT_MASK) && > > > + (ctinfo != IP_CT_RELATED || commit)) { > > > + /* NAT an established or related connection like before. */ > > > + if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY) > > > + /* This is the REPLY direction for a connection > > > +* for which NAT was applied in the forward > > > +* direction. Do the reverse NAT. > > > +*/ > > > + maniptype = ct->status & IPS_SRC_NAT > > > + ? NF_NAT_MANIP_DST : NF_NAT_MANIP_SRC; > > > + else > > > + maniptype = ct->status & IPS_SRC_NAT > > > + ? NF_NAT_MANIP_SRC : NF_NAT_MANIP_DST; > > > + } else if (ct_action & (1 << NF_NAT_MANIP_SRC)) { > > > + maniptype = NF_NAT_MANIP_SRC; > > > + } else if (ct_action & (1 << NF_NAT_MANIP_DST)) { > > > + maniptype = NF_NAT_MANIP_DST; > > > + } else { > > > + return NF_ACCEPT; > > > + } > > > + > > > + err = nf_ct_nat_execute(skb, ct, ctinfo, action, range, maniptype); > > > + if (err == NF_ACCEPT && ct->status & IPS_DST_NAT) { > > > + if (ct->status & IPS_SRC_NAT) { > > > + if (maniptype == NF_NAT_MANIP_SRC) > > > + maniptype = NF_NAT_MANIP_DST; > > > + else > > > + maniptype = NF_NAT_MANIP_SRC; > > > + > > > + err = nf_ct_nat_execute(skb, ct, ctinfo, action, > > > range, > > > + maniptype); > > > + } else if (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) { > > > + err = nf_ct_nat_execute(skb, ct, ctinfo, action, NULL, > > > + NF_NAT_MANIP_SRC); > > > + } > > > + } > > > + return err; > > > +} > > > +EXPORT_SYMBOL_GPL(nf_ct_nat); > > > diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c > > > index cc643a556ea1..d03c75165663 100644 > > > --- a/net/openvswitch/conntrack.c > > > +++ b/net/openvswitch/conntrack.c > > > @@ -726,144 +726,27 @@ static void ovs_nat_update_key(struct sw_flow_key > > > *key, > > > } > > > } > > > > > > -/* Modelled after nf_nat_ipv[46]_fn(). > > > - * range is only used for new, uninitialized NAT state. > > > - * Returns either NF_ACCEPT or NF_DROP. > > > - */ > > > -static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct, > > > - enum ip_conntrack_info ctinfo, > > > - const struct nf_nat_range2 *range, > > > - enum nf_nat_manip_type maniptype, struct > > > sw_flow_key *key) > > > -{ > > > - int hooknum, err = NF_ACCEPT; > > > - > > > - /* See HOOK2MANIP(). */ > > > - if (maniptype == NF_NAT_MANIP_SRC) > > > - hooknum = NF_INET_LOCAL_IN; /* Source NAT */ > > > - else > > > - hooknum = NF_INET_LOCAL_OUT; /* Destination NAT */ > > > - > > > - switch (ctinfo) { > > > - case IP_CT_RELATED: > > > - case IP_CT_RELATED_REPLY: > > > - if (IS_ENABLED(CONFIG_NF_NAT) && > > > - skb->protocol == htons(ETH_P_IP) && > > > - ip_hdr(skb)->protocol == IPPROTO_ICMP) { > > >
[ovs-dev] [PATCHv2 net-next 5/5] net: move the nat function to nf_nat_ovs for ovs and tc
There are two nat functions are nearly the same in both OVS and TC code, (ovs_)ct_nat_execute() and ovs_ct_nat/tcf_ct_act_nat(). This patch creates nf_nat_ovs.c under netfilter and moves them there then exports nf_ct_nat() so that it can be shared by both OVS and TC, and keeps the nat (type) check and nat flag update in OVS and TC's own place, as these parts are different between OVS and TC. Note that in OVS nat function it was using skb->protocol to get the proto as it already skips vlans in key_extract(), while it doesn't in TC, and TC has to call skb_protocol() to get proto. So in nf_ct_nat_execute(), we keep using skb_protocol() which works for both OVS and TC contrack. Reviewed-by: Saeed Mahameed Signed-off-by: Xin Long --- include/net/netfilter/nf_nat.h | 4 + net/netfilter/Makefile | 2 +- net/netfilter/nf_nat_ovs.c | 135 net/openvswitch/conntrack.c| 137 +++-- net/sched/act_ct.c | 136 +++- 5 files changed, 161 insertions(+), 253 deletions(-) create mode 100644 net/netfilter/nf_nat_ovs.c diff --git a/include/net/netfilter/nf_nat.h b/include/net/netfilter/nf_nat.h index e9eb01e99d2f..9877f064548a 100644 --- a/include/net/netfilter/nf_nat.h +++ b/include/net/netfilter/nf_nat.h @@ -104,6 +104,10 @@ unsigned int nf_nat_inet_fn(void *priv, struct sk_buff *skb, const struct nf_hook_state *state); +int nf_ct_nat(struct sk_buff *skb, struct nf_conn *ct, + enum ip_conntrack_info ctinfo, int *action, + const struct nf_nat_range2 *range, bool commit); + static inline int nf_nat_initialized(const struct nf_conn *ct, enum nf_nat_manip_type manip) { diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile index 1d4db1943936..4fa50d2842ec 100644 --- a/net/netfilter/Makefile +++ b/net/netfilter/Makefile @@ -52,7 +52,7 @@ obj-$(CONFIG_NF_CONNTRACK_SANE) += nf_conntrack_sane.o obj-$(CONFIG_NF_CONNTRACK_SIP) += nf_conntrack_sip.o obj-$(CONFIG_NF_CONNTRACK_TFTP) += nf_conntrack_tftp.o -nf_nat-y := nf_nat_core.o nf_nat_proto.o nf_nat_helper.o +nf_nat-y := nf_nat_core.o nf_nat_proto.o nf_nat_helper.o nf_nat_ovs.o obj-$(CONFIG_NF_LOG_SYSLOG) += nf_log_syslog.o diff --git a/net/netfilter/nf_nat_ovs.c b/net/netfilter/nf_nat_ovs.c new file mode 100644 index ..daff80e7a43a --- /dev/null +++ b/net/netfilter/nf_nat_ovs.c @@ -0,0 +1,135 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Support nat functions for openvswitch and used by OVS and TC conntrack. */ + +#include + +/* Modelled after nf_nat_ipv[46]_fn(). + * range is only used for new, uninitialized NAT state. + * Returns either NF_ACCEPT or NF_DROP. + */ +static int nf_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct, +enum ip_conntrack_info ctinfo, int *action, +const struct nf_nat_range2 *range, +enum nf_nat_manip_type maniptype) +{ + __be16 proto = skb_protocol(skb, true); + int hooknum, err = NF_ACCEPT; + + /* See HOOK2MANIP(). */ + if (maniptype == NF_NAT_MANIP_SRC) + hooknum = NF_INET_LOCAL_IN; /* Source NAT */ + else + hooknum = NF_INET_LOCAL_OUT; /* Destination NAT */ + + switch (ctinfo) { + case IP_CT_RELATED: + case IP_CT_RELATED_REPLY: + if (proto == htons(ETH_P_IP) && + ip_hdr(skb)->protocol == IPPROTO_ICMP) { + if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo, + hooknum)) + err = NF_DROP; + goto out; + } else if (IS_ENABLED(CONFIG_IPV6) && proto == htons(ETH_P_IPV6)) { + __be16 frag_off; + u8 nexthdr = ipv6_hdr(skb)->nexthdr; + int hdrlen = ipv6_skip_exthdr(skb, + sizeof(struct ipv6hdr), + , _off); + + if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) { + if (!nf_nat_icmpv6_reply_translation(skb, ct, +ctinfo, +hooknum, +hdrlen)) + err = NF_DROP; + goto out; + } + } + /* Non-ICMP, fall thru to initialize if needed. */ + fallthrough; + case IP_CT_NEW: + /* Seen it before? This can happen for loopback, retrans, +* or local packets. +*/ +
[ovs-dev] [PATCHv2 net-next 4/5] net: sched: update the nat flag for icmp error packets in ct_nat_execute
In ovs_ct_nat_execute(), the packet flow key nat flags are updated when it processes ICMP(v6) error packets translation successfully. In ct_nat_execute() when processing ICMP(v6) error packets translation successfully, it should have done the same in ct_nat_execute() to set post_ct_s/dnat flag, which will be used to update flow key nat flags in OVS module later. Reviewed-by: Saeed Mahameed Signed-off-by: Xin Long --- net/sched/act_ct.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index 8869b3ef6642..c7782c9a6ab6 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -936,13 +936,13 @@ static int ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct, } err = nf_nat_packet(ct, ctinfo, hooknum, skb); +out: if (err == NF_ACCEPT) { if (maniptype == NF_NAT_MANIP_SRC) tc_skb_cb(skb)->post_ct_snat = 1; if (maniptype == NF_NAT_MANIP_DST) tc_skb_cb(skb)->post_ct_dnat = 1; } -out: return err; } #endif /* CONFIG_NF_NAT */ -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCHv2 net-next 3/5] net: sched: return NF_ACCEPT when fails to add nat ext in tcf_ct_act_nat
This patch changes to return NF_ACCEPT when fails to add nat ext before doing NAT in tcf_ct_act_nat(), to keep consistent with OVS' processing in ovs_ct_nat(). Reviewed-by: Saeed Mahameed Signed-off-by: Xin Long --- net/sched/act_ct.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index da0b7f665277..8869b3ef6642 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -994,7 +994,7 @@ static int tcf_ct_act_nat(struct sk_buff *skb, /* Add NAT extension if not confirmed yet. */ if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct)) - return NF_DROP; /* Can't NAT. */ + return NF_ACCEPT; /* Can't NAT. */ if (ctinfo != IP_CT_NEW && (ct->status & IPS_NAT_MASK) && (ctinfo != IP_CT_RELATED || commit)) { -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCHv2 net-next 2/5] openvswitch: return NF_ACCEPT when OVS_CT_NAT is net set in info nat
Either OVS_CT_SRC_NAT or OVS_CT_DST_NAT is set, OVS_CT_NAT must be set in info->nat. Thus, if OVS_CT_NAT is not set in info->nat, it will definitely not do NAT but returns NF_ACCEPT in ovs_ct_nat(). This patch changes nothing funcational but only makes this return earlier in ovs_ct_nat() to keep consistent with TC's processing in tcf_ct_act_nat(). Reviewed-by: Saeed Mahameed Acked-by: Aaron Conole Signed-off-by: Xin Long --- net/openvswitch/conntrack.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 4c5e5a6475af..cc643a556ea1 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -816,6 +816,9 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key *key, enum nf_nat_manip_type maniptype; int err; + if (!(info->nat & OVS_CT_NAT)) + return NF_ACCEPT; + /* Add NAT extension if not confirmed yet. */ if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct)) return NF_ACCEPT; /* Can't NAT. */ @@ -825,8 +828,7 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key *key, * Make sure new expected connections (IP_CT_RELATED) are NATted only * when committing. */ - if (info->nat & OVS_CT_NAT && ctinfo != IP_CT_NEW && - ct->status & IPS_NAT_MASK && + if (ctinfo != IP_CT_NEW && ct->status & IPS_NAT_MASK && (ctinfo != IP_CT_RELATED || info->commit)) { /* NAT an established or related connection like before. */ if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY) -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCHv2 net-next 1/5] openvswitch: delete the unncessary skb_pull_rcsum call in ovs_ct_nat_execute
The calls to ovs_ct_nat_execute() are as below: ovs_ct_execute() ovs_ct_lookup() __ovs_ct_lookup() ovs_ct_nat() ovs_ct_nat_execute() ovs_ct_commit() __ovs_ct_lookup() ovs_ct_nat() ovs_ct_nat_execute() and since skb_pull_rcsum() and skb_push_rcsum() are already called in ovs_ct_execute(), there's no need to do it again in ovs_ct_nat_execute(). Reviewed-by: Saeed Mahameed Acked-by: Aaron Conole Signed-off-by: Xin Long --- net/openvswitch/conntrack.c | 17 ++--- 1 file changed, 6 insertions(+), 11 deletions(-) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 4348321856af..4c5e5a6475af 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -735,10 +735,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct, const struct nf_nat_range2 *range, enum nf_nat_manip_type maniptype, struct sw_flow_key *key) { - int hooknum, nh_off, err = NF_ACCEPT; - - nh_off = skb_network_offset(skb); - skb_pull_rcsum(skb, nh_off); + int hooknum, err = NF_ACCEPT; /* See HOOK2MANIP(). */ if (maniptype == NF_NAT_MANIP_SRC) @@ -755,7 +752,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct, if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo, hooknum)) err = NF_DROP; - goto push; + goto out; } else if (IS_ENABLED(CONFIG_IPV6) && skb->protocol == htons(ETH_P_IPV6)) { __be16 frag_off; @@ -770,7 +767,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct, hooknum, hdrlen)) err = NF_DROP; - goto push; + goto out; } } /* Non-ICMP, fall thru to initialize if needed. */ @@ -788,7 +785,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct, ? nf_nat_setup_info(ct, range, maniptype) : nf_nat_alloc_null_binding(ct, hooknum); if (err != NF_ACCEPT) - goto push; + goto out; } break; @@ -798,13 +795,11 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct, default: err = NF_DROP; - goto push; + goto out; } err = nf_nat_packet(ct, ctinfo, hooknum, skb); -push: - skb_push_rcsum(skb, nh_off); - +out: /* Update the flow key if NAT successful. */ if (err == NF_ACCEPT) ovs_nat_update_key(key, skb, maniptype); -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCHv2 net-next 0/5] net: eliminate the duplicate code in the ct nat functions of ovs and tc
The changes in the patchset: "net: add helper support in tc act_ct for ovs offloading" had moved some common ct code used by both OVS and TC into netfilter. There are still some big functions pretty similar defined and used in each of OVS and TC. It is not good to maintain such big function in 2 places. This patchset is to extract the functions for NAT processing from OVS and TC to netfilter. To make this change clear and safe, this patchset gets the common code out of OVS and TC step by step: The patch 1-4 make some minor changes in OVS and TC to make the NAT code of them completely the same, then the patch 5 moves the common code to the netfilter and exports one function called by each of OVS and TC. v1->v2: - Create nf_nat_ovs.c to include the nat functions, as Pablo suggested. Xin Long (5): openvswitch: delete the unncessary skb_pull_rcsum call in ovs_ct_nat_execute openvswitch: return NF_ACCEPT when OVS_CT_NAT is net set in info nat net: sched: return NF_ACCEPT when fails to add nat ext in tcf_ct_act_nat net: sched: update the nat flag for icmp error packets in ct_nat_execute net: move the nat function to nf_nat_ovs for ovs and tc include/net/netfilter/nf_nat.h | 4 + net/netfilter/Makefile | 2 +- net/netfilter/nf_nat_ovs.c | 135 ++ net/openvswitch/conntrack.c| 146 +++-- net/sched/act_ct.c | 136 +++--- 5 files changed, 164 insertions(+), 259 deletions(-) create mode 100644 net/netfilter/nf_nat_ovs.c -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next 5/5] net: move the nat function to nf_nat_core for ovs and tc
On Thu, Nov 17, 2022 at 5:57 AM Pablo Neira Ayuso wrote: > > On Wed, Nov 16, 2022 at 07:51:40PM -0500, Xin Long wrote: > > On Wed, Nov 16, 2022 at 4:54 PM Pablo Neira Ayuso > > wrote: > > > On Tue, Nov 15, 2022 at 10:50:57AM -0500, Xin Long wrote: > [...] > > > I'd suggest you move this code to nf_nat_ovs.c or such so we remember > > > these symbols are used by act_ct.c and ovs. > > > > Good idea, do you think we should also create nf_conntrack_ovs.c > > to have nf_ct_helper() and nf_ct_add_helper()? > > which were added by: > > > > https://lore.kernel.org/netdev/20221101150031.a6rtrgzwfd7kzknn@t14s.localdomain/T/ > > If it is used by ovs infra, I would suggest to move there too. OK, I will create only "nf_conntrack_ovs.c" to have the nat functions in this patch. and post another patch to move nf_ct_helper() and nf_ct_add_helper() there, too. I think one file "nf_conntrack_ovs.c" should be enough to include all these functions used by ovs conntrack infra. Thanks. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next 5/5] net: move the nat function to nf_nat_core for ovs and tc
On Wed, Nov 16, 2022 at 4:54 PM Pablo Neira Ayuso wrote: > > On Tue, Nov 15, 2022 at 10:50:57AM -0500, Xin Long wrote: > > diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c > > index e29e4ccb5c5a..1c72b8caa24e 100644 > > --- a/net/netfilter/nf_nat_core.c > > +++ b/net/netfilter/nf_nat_core.c > > @@ -784,6 +784,137 @@ nf_nat_inet_fn(void *priv, struct sk_buff *skb, > > } > > EXPORT_SYMBOL_GPL(nf_nat_inet_fn); > > > > +/* Modelled after nf_nat_ipv[46]_fn(). > > + * range is only used for new, uninitialized NAT state. > > + * Returns either NF_ACCEPT or NF_DROP. > > + */ > > +static int nf_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct, > > + enum ip_conntrack_info ctinfo, int *action, > > + const struct nf_nat_range2 *range, > > + enum nf_nat_manip_type maniptype) > > +{ > > + __be16 proto = skb_protocol(skb, true); > > + int hooknum, err = NF_ACCEPT; > > + > > + /* See HOOK2MANIP(). */ > > + if (maniptype == NF_NAT_MANIP_SRC) > > + hooknum = NF_INET_LOCAL_IN; /* Source NAT */ > > + else > > + hooknum = NF_INET_LOCAL_OUT; /* Destination NAT */ > > + > > + switch (ctinfo) { > > + case IP_CT_RELATED: > > + case IP_CT_RELATED_REPLY: > > + if (proto == htons(ETH_P_IP) && > > + ip_hdr(skb)->protocol == IPPROTO_ICMP) { > > + if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo, > > +hooknum)) > > + err = NF_DROP; > > + goto out; > > + } else if (IS_ENABLED(CONFIG_IPV6) && proto == > > htons(ETH_P_IPV6)) { > > + __be16 frag_off; > > + u8 nexthdr = ipv6_hdr(skb)->nexthdr; > > + int hdrlen = ipv6_skip_exthdr(skb, > > + sizeof(struct ipv6hdr), > > + , _off); > > + > > + if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) { > > + if (!nf_nat_icmpv6_reply_translation(skb, ct, > > + ctinfo, > > + hooknum, > > + hdrlen)) > > + err = NF_DROP; > > + goto out; > > + } > > + } > > + /* Non-ICMP, fall thru to initialize if needed. */ > > + fallthrough; > > + case IP_CT_NEW: > > + /* Seen it before? This can happen for loopback, retrans, > > + * or local packets. > > + */ > > + if (!nf_nat_initialized(ct, maniptype)) { > > + /* Initialize according to the NAT action. */ > > + err = (range && range->flags & NF_NAT_RANGE_MAP_IPS) > > + /* Action is set up to establish a new > > + * mapping. > > + */ > > + ? nf_nat_setup_info(ct, range, maniptype) > > + : nf_nat_alloc_null_binding(ct, hooknum); > > + if (err != NF_ACCEPT) > > + goto out; > > + } > > + break; > > + > > + case IP_CT_ESTABLISHED: > > + case IP_CT_ESTABLISHED_REPLY: > > + break; > > + > > + default: > > + err = NF_DROP; > > + goto out; > > + } > > + > > + err = nf_nat_packet(ct, ctinfo, hooknum, skb); > > + if (err == NF_ACCEPT) > > + *action |= (1 << maniptype); > > +out: > > + return err; > > +} > > + > > +int nf_ct_nat(struct sk_buff *skb, struct nf_conn *ct, > > + enum ip_conntrack_info ctinfo, int *action, > > + const struct nf_nat_range2 *range, bool commit) > > +{ > > + enum nf_nat_manip_type maniptype; > > + int err, ct_action = *action; > > + > > + *action = 0; > > + > > + /* Add NAT extension if not confirmed yet. */ > > + if (!nf_ct_is_confirmed(c
Re: [ovs-dev] [PATCH net-next 5/5] net: move the nat function to nf_nat_core for ovs and tc
On Wed, Nov 16, 2022 at 4:05 PM Aaron Conole wrote: > > Hi Xin, > > Xin Long writes: > > > There are two nat functions are nearly the same in both OVS and > > TC code, (ovs_)ct_nat_execute() and ovs_ct_nat/tcf_ct_act_nat(). > > > > This patch is to move them to netfilter nf_nat_core and export > > nf_ct_nat() so that it can be shared by both OVS and TC, and > > keep the nat (type) check and nat flag update in OVS and TC's > > own place, as these parts are different between OVS and TC. > > > > Note that in OVS nat function it was using skb->protocol to get > > the proto as it already skips vlans in key_extract(), while it > > doesn't in TC, and TC has to call skb_protocol() to get proto. > > So in nf_ct_nat_execute(), we keep using skb_protocol() which > > works for both OVS and TC. > > > > Signed-off-by: Xin Long > > --- > > include/net/netfilter/nf_nat.h | 4 + > > net/netfilter/nf_nat_core.c| 131 +++ > > net/openvswitch/conntrack.c| 137 +++-- > > net/sched/act_ct.c | 136 +++- > > 4 files changed, 156 insertions(+), 252 deletions(-) > > > > diff --git a/include/net/netfilter/nf_nat.h b/include/net/netfilter/nf_nat.h > > index e9eb01e99d2f..9877f064548a 100644 > > --- a/include/net/netfilter/nf_nat.h > > +++ b/include/net/netfilter/nf_nat.h > > @@ -104,6 +104,10 @@ unsigned int > > nf_nat_inet_fn(void *priv, struct sk_buff *skb, > > const struct nf_hook_state *state); > > > > +int nf_ct_nat(struct sk_buff *skb, struct nf_conn *ct, > > + enum ip_conntrack_info ctinfo, int *action, > > + const struct nf_nat_range2 *range, bool commit); > > + > > static inline int nf_nat_initialized(const struct nf_conn *ct, > >enum nf_nat_manip_type manip) > > { > > diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c > > index e29e4ccb5c5a..1c72b8caa24e 100644 > > --- a/net/netfilter/nf_nat_core.c > > +++ b/net/netfilter/nf_nat_core.c > > @@ -784,6 +784,137 @@ nf_nat_inet_fn(void *priv, struct sk_buff *skb, > > } > > EXPORT_SYMBOL_GPL(nf_nat_inet_fn); > > > > +/* Modelled after nf_nat_ipv[46]_fn(). > > + * range is only used for new, uninitialized NAT state. > > + * Returns either NF_ACCEPT or NF_DROP. > > + */ > > +static int nf_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct, > > + enum ip_conntrack_info ctinfo, int *action, > > + const struct nf_nat_range2 *range, > > + enum nf_nat_manip_type maniptype) > > +{ > > + __be16 proto = skb_protocol(skb, true); > > + int hooknum, err = NF_ACCEPT; > > + > > + /* See HOOK2MANIP(). */ > > + if (maniptype == NF_NAT_MANIP_SRC) > > + hooknum = NF_INET_LOCAL_IN; /* Source NAT */ > > + else > > + hooknum = NF_INET_LOCAL_OUT; /* Destination NAT */ > > + > > + switch (ctinfo) { > > + case IP_CT_RELATED: > > + case IP_CT_RELATED_REPLY: > > + if (proto == htons(ETH_P_IP) && > > + ip_hdr(skb)->protocol == IPPROTO_ICMP) { > > + if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo, > > +hooknum)) > > + err = NF_DROP; > > + goto out; > > + } else if (IS_ENABLED(CONFIG_IPV6) && proto == > > htons(ETH_P_IPV6)) { > > + __be16 frag_off; > > + u8 nexthdr = ipv6_hdr(skb)->nexthdr; > > + int hdrlen = ipv6_skip_exthdr(skb, > > + sizeof(struct ipv6hdr), > > + , _off); > > + > > + if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) { > > + if (!nf_nat_icmpv6_reply_translation(skb, ct, > > + ctinfo, > > + hooknum, > > + hdrlen)) > > + err = NF_DROP; > > + goto out; > > + } > > + } > &g
[ovs-dev] [PATCH net-next 5/5] net: move the nat function to nf_nat_core for ovs and tc
There are two nat functions are nearly the same in both OVS and TC code, (ovs_)ct_nat_execute() and ovs_ct_nat/tcf_ct_act_nat(). This patch is to move them to netfilter nf_nat_core and export nf_ct_nat() so that it can be shared by both OVS and TC, and keep the nat (type) check and nat flag update in OVS and TC's own place, as these parts are different between OVS and TC. Note that in OVS nat function it was using skb->protocol to get the proto as it already skips vlans in key_extract(), while it doesn't in TC, and TC has to call skb_protocol() to get proto. So in nf_ct_nat_execute(), we keep using skb_protocol() which works for both OVS and TC. Signed-off-by: Xin Long --- include/net/netfilter/nf_nat.h | 4 + net/netfilter/nf_nat_core.c| 131 +++ net/openvswitch/conntrack.c| 137 +++-- net/sched/act_ct.c | 136 +++- 4 files changed, 156 insertions(+), 252 deletions(-) diff --git a/include/net/netfilter/nf_nat.h b/include/net/netfilter/nf_nat.h index e9eb01e99d2f..9877f064548a 100644 --- a/include/net/netfilter/nf_nat.h +++ b/include/net/netfilter/nf_nat.h @@ -104,6 +104,10 @@ unsigned int nf_nat_inet_fn(void *priv, struct sk_buff *skb, const struct nf_hook_state *state); +int nf_ct_nat(struct sk_buff *skb, struct nf_conn *ct, + enum ip_conntrack_info ctinfo, int *action, + const struct nf_nat_range2 *range, bool commit); + static inline int nf_nat_initialized(const struct nf_conn *ct, enum nf_nat_manip_type manip) { diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c index e29e4ccb5c5a..1c72b8caa24e 100644 --- a/net/netfilter/nf_nat_core.c +++ b/net/netfilter/nf_nat_core.c @@ -784,6 +784,137 @@ nf_nat_inet_fn(void *priv, struct sk_buff *skb, } EXPORT_SYMBOL_GPL(nf_nat_inet_fn); +/* Modelled after nf_nat_ipv[46]_fn(). + * range is only used for new, uninitialized NAT state. + * Returns either NF_ACCEPT or NF_DROP. + */ +static int nf_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct, +enum ip_conntrack_info ctinfo, int *action, +const struct nf_nat_range2 *range, +enum nf_nat_manip_type maniptype) +{ + __be16 proto = skb_protocol(skb, true); + int hooknum, err = NF_ACCEPT; + + /* See HOOK2MANIP(). */ + if (maniptype == NF_NAT_MANIP_SRC) + hooknum = NF_INET_LOCAL_IN; /* Source NAT */ + else + hooknum = NF_INET_LOCAL_OUT; /* Destination NAT */ + + switch (ctinfo) { + case IP_CT_RELATED: + case IP_CT_RELATED_REPLY: + if (proto == htons(ETH_P_IP) && + ip_hdr(skb)->protocol == IPPROTO_ICMP) { + if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo, + hooknum)) + err = NF_DROP; + goto out; + } else if (IS_ENABLED(CONFIG_IPV6) && proto == htons(ETH_P_IPV6)) { + __be16 frag_off; + u8 nexthdr = ipv6_hdr(skb)->nexthdr; + int hdrlen = ipv6_skip_exthdr(skb, + sizeof(struct ipv6hdr), + , _off); + + if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) { + if (!nf_nat_icmpv6_reply_translation(skb, ct, +ctinfo, +hooknum, +hdrlen)) + err = NF_DROP; + goto out; + } + } + /* Non-ICMP, fall thru to initialize if needed. */ + fallthrough; + case IP_CT_NEW: + /* Seen it before? This can happen for loopback, retrans, +* or local packets. +*/ + if (!nf_nat_initialized(ct, maniptype)) { + /* Initialize according to the NAT action. */ + err = (range && range->flags & NF_NAT_RANGE_MAP_IPS) + /* Action is set up to establish a new +* mapping. +*/ + ? nf_nat_setup_info(ct, range, maniptype) + : nf_nat_alloc_null_binding(ct, hooknum); + if (err != NF_ACCEPT) + goto out; + } + break; + + case IP_CT_ESTABLISHED: + case IP_CT_ESTABLIS
[ovs-dev] [PATCH net-next 4/5] net: sched: update the nat flag for icmp error packets in ct_nat_execute
In ovs_ct_nat_execute(), the packet flow key nat flags are updated when it processes ICMP(v6) error packets translation successfully. In ct_nat_execute() when processing ICMP(v6) error packets translation successfully, it should have done the same in ct_nat_execute() to set post_ct_s/dnat flag, which will be used to update flow key nat flags in OVS module later. Signed-off-by: Xin Long --- net/sched/act_ct.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index 8869b3ef6642..c7782c9a6ab6 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -936,13 +936,13 @@ static int ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct, } err = nf_nat_packet(ct, ctinfo, hooknum, skb); +out: if (err == NF_ACCEPT) { if (maniptype == NF_NAT_MANIP_SRC) tc_skb_cb(skb)->post_ct_snat = 1; if (maniptype == NF_NAT_MANIP_DST) tc_skb_cb(skb)->post_ct_dnat = 1; } -out: return err; } #endif /* CONFIG_NF_NAT */ -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next 3/5] net: sched: return NF_ACCEPT when fails to add nat ext in tcf_ct_act_nat
This patch changes to return NF_ACCEPT when fails to add nat ext before doing NAT in tcf_ct_act_nat(), to keep consistent with OVS' processing in ovs_ct_nat(). Signed-off-by: Xin Long --- net/sched/act_ct.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index da0b7f665277..8869b3ef6642 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -994,7 +994,7 @@ static int tcf_ct_act_nat(struct sk_buff *skb, /* Add NAT extension if not confirmed yet. */ if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct)) - return NF_DROP; /* Can't NAT. */ + return NF_ACCEPT; /* Can't NAT. */ if (ctinfo != IP_CT_NEW && (ct->status & IPS_NAT_MASK) && (ctinfo != IP_CT_RELATED || commit)) { -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next 1/5] openvswitch: delete the unncessary skb_pull_rcsum call in ovs_ct_nat_execute
The calls to ovs_ct_nat_execute() are as below: ovs_ct_execute() ovs_ct_lookup() __ovs_ct_lookup() ovs_ct_nat() ovs_ct_nat_execute() ovs_ct_commit() __ovs_ct_lookup() ovs_ct_nat() ovs_ct_nat_execute() and since skb_pull_rcsum() and skb_push_rcsum() are already called in ovs_ct_execute(), there's no need to do it again in ovs_ct_nat_execute(). Signed-off-by: Xin Long --- net/openvswitch/conntrack.c | 17 ++--- 1 file changed, 6 insertions(+), 11 deletions(-) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 4348321856af..4c5e5a6475af 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -735,10 +735,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct, const struct nf_nat_range2 *range, enum nf_nat_manip_type maniptype, struct sw_flow_key *key) { - int hooknum, nh_off, err = NF_ACCEPT; - - nh_off = skb_network_offset(skb); - skb_pull_rcsum(skb, nh_off); + int hooknum, err = NF_ACCEPT; /* See HOOK2MANIP(). */ if (maniptype == NF_NAT_MANIP_SRC) @@ -755,7 +752,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct, if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo, hooknum)) err = NF_DROP; - goto push; + goto out; } else if (IS_ENABLED(CONFIG_IPV6) && skb->protocol == htons(ETH_P_IPV6)) { __be16 frag_off; @@ -770,7 +767,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct, hooknum, hdrlen)) err = NF_DROP; - goto push; + goto out; } } /* Non-ICMP, fall thru to initialize if needed. */ @@ -788,7 +785,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct, ? nf_nat_setup_info(ct, range, maniptype) : nf_nat_alloc_null_binding(ct, hooknum); if (err != NF_ACCEPT) - goto push; + goto out; } break; @@ -798,13 +795,11 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct, default: err = NF_DROP; - goto push; + goto out; } err = nf_nat_packet(ct, ctinfo, hooknum, skb); -push: - skb_push_rcsum(skb, nh_off); - +out: /* Update the flow key if NAT successful. */ if (err == NF_ACCEPT) ovs_nat_update_key(key, skb, maniptype); -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next 0/5] net: eliminate the duplicate code in the ct nat functions of ovs and tc
The changes in the patchset: "net: add helper support in tc act_ct for ovs offloading" had moved some common ct code used by both OVS and TC into netfilter. There are still some big functions pretty similar defined and used in each of OVS and TC. It is not good to maintain such similar code in 2 places. This patchset is to extract the functions for NAT processing from OVS and TC to netfilter. To make this change clear and safe, this patchset gets the common code out of OVS and TC step by step: The patch 1-4 make some minor changes in OVS and TC to make the NAT code of them completely the same, then the patch 5 moves the common code to the netfilter and exports one function called by each of OVS and TC. Xin Long (5): openvswitch: delete the unncessary skb_pull_rcsum call in ovs_ct_nat_execute openvswitch: return NF_ACCEPT when OVS_CT_NAT is net set in info nat net: sched: return NF_ACCEPT when fails to add nat ext in tcf_ct_act_nat net: sched: update the nat flag for icmp error packets in ct_nat_execute net: move the nat function to nf_nat_core for ovs and tc include/net/netfilter/nf_nat.h | 4 + net/netfilter/nf_nat_core.c| 131 + net/openvswitch/conntrack.c| 146 +++-- net/sched/act_ct.c | 136 +++--- 4 files changed, 159 insertions(+), 258 deletions(-) -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next 2/5] openvswitch: return NF_ACCEPT when OVS_CT_NAT is net set in info nat
Either OVS_CT_SRC_NAT or OVS_CT_DST_NAT is set, OVS_CT_NAT must be set in info->nat. Thus, if OVS_CT_NAT is not set in info->nat, it will definitely not do NAT but returns NF_ACCEPT in ovs_ct_nat(). This patch changes nothing funcational but only makes this return earlier in ovs_ct_nat() to keep consistent with TC's processing in tcf_ct_act_nat(). Signed-off-by: Xin Long --- net/openvswitch/conntrack.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 4c5e5a6475af..cc643a556ea1 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -816,6 +816,9 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key *key, enum nf_nat_manip_type maniptype; int err; + if (!(info->nat & OVS_CT_NAT)) + return NF_ACCEPT; + /* Add NAT extension if not confirmed yet. */ if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct)) return NF_ACCEPT; /* Can't NAT. */ @@ -825,8 +828,7 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key *key, * Make sure new expected connections (IP_CT_RELATED) are NATted only * when committing. */ - if (info->nat & OVS_CT_NAT && ctinfo != IP_CT_NEW && - ct->status & IPS_NAT_MASK && + if (ctinfo != IP_CT_NEW && ct->status & IPS_NAT_MASK && (ctinfo != IP_CT_RELATED || info->commit)) { /* NAT an established or related connection like before. */ if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY) -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCHv4 net-next 4/4] net: sched: add helper support in act_ct
This patch is to add helper support in act_ct for OVS actions=ct(alg=xxx) offloading, which is corresponding to Commit cae3a2627520 ("openvswitch: Allow attaching helpers to ct action") in OVS kernel part. The difference is when adding TC actions family and proto cannot be got from the filter/match, other than helper name in tb[TCA_CT_HELPER_NAME], we also need to send the family in tb[TCA_CT_HELPER_FAMILY] and the proto in tb[TCA_CT_HELPER_PROTO] to kernel. Acked-by: Marcelo Ricardo Leitner Signed-off-by: Xin Long --- include/net/tc_act/tc_ct.h| 1 + include/uapi/linux/tc_act/tc_ct.h | 3 ++ net/sched/act_ct.c| 89 --- 3 files changed, 85 insertions(+), 8 deletions(-) diff --git a/include/net/tc_act/tc_ct.h b/include/net/tc_act/tc_ct.h index 8250d6f0a462..b24ea2d9400b 100644 --- a/include/net/tc_act/tc_ct.h +++ b/include/net/tc_act/tc_ct.h @@ -10,6 +10,7 @@ #include struct tcf_ct_params { + struct nf_conntrack_helper *helper; struct nf_conn *tmpl; u16 zone; diff --git a/include/uapi/linux/tc_act/tc_ct.h b/include/uapi/linux/tc_act/tc_ct.h index 5fb1d7ac1027..6c5200f0ed38 100644 --- a/include/uapi/linux/tc_act/tc_ct.h +++ b/include/uapi/linux/tc_act/tc_ct.h @@ -22,6 +22,9 @@ enum { TCA_CT_NAT_PORT_MIN,/* be16 */ TCA_CT_NAT_PORT_MAX,/* be16 */ TCA_CT_PAD, + TCA_CT_HELPER_NAME, /* string */ + TCA_CT_HELPER_FAMILY, /* u8 */ + TCA_CT_HELPER_PROTO,/* u8 */ __TCA_CT_MAX }; diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index 193a460a9d7f..da0b7f665277 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -33,6 +33,7 @@ #include #include #include +#include #include static struct workqueue_struct *act_ct_wq; @@ -655,7 +656,7 @@ struct tc_ct_action_net { /* Determine whether skb->_nfct is equal to the result of conntrack lookup. */ static bool tcf_ct_skb_nfct_cached(struct net *net, struct sk_buff *skb, - u16 zone_id, bool force) + struct tcf_ct_params *p) { enum ip_conntrack_info ctinfo; struct nf_conn *ct; @@ -665,11 +666,19 @@ static bool tcf_ct_skb_nfct_cached(struct net *net, struct sk_buff *skb, return false; if (!net_eq(net, read_pnet(>ct_net))) goto drop_ct; - if (nf_ct_zone(ct)->id != zone_id) + if (nf_ct_zone(ct)->id != p->zone) goto drop_ct; + if (p->helper) { + struct nf_conn_help *help; + + help = nf_ct_ext_find(ct, NF_CT_EXT_HELPER); + if (help && rcu_access_pointer(help->helper) != p->helper) + goto drop_ct; + } /* Force conntrack entry direction. */ - if (force && CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) { + if ((p->ct_action & TCA_CT_ACT_FORCE) && + CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) { if (nf_ct_is_confirmed(ct)) nf_ct_kill(ct); @@ -832,6 +841,13 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, static void tcf_ct_params_free(struct tcf_ct_params *params) { + if (params->helper) { +#if IS_ENABLED(CONFIG_NF_NAT) + if (params->ct_action & TCA_CT_ACT_NAT) + nf_nat_helper_put(params->helper); +#endif + nf_conntrack_helper_put(params->helper); + } if (params->ct_ft) tcf_ct_flow_table_put(params->ct_ft); if (params->tmpl) @@ -1026,13 +1042,14 @@ static int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a, struct tcf_result *res) { struct net *net = dev_net(skb->dev); - bool cached, commit, clear, force; enum ip_conntrack_info ctinfo; struct tcf_ct *c = to_ct(a); struct nf_conn *tmpl = NULL; struct nf_hook_state state; + bool cached, commit, clear; int nh_ofs, err, retval; struct tcf_ct_params *p; + bool add_helper = false; bool skip_add = false; bool defrag = false; struct nf_conn *ct; @@ -1043,7 +1060,6 @@ static int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a, retval = READ_ONCE(c->tcf_action); commit = p->ct_action & TCA_CT_ACT_COMMIT; clear = p->ct_action & TCA_CT_ACT_CLEAR; - force = p->ct_action & TCA_CT_ACT_FORCE; tmpl = p->tmpl; tcf_lastuse_update(>tcf_tm); @@ -1086,7 +1102,7 @@ static int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a, * actually run the packet through conntrack twice unless it's for a * different zone. */ - cached = tcf_ct_skb_nfct_cached(net, skb, p->zone, force); + ca
[ovs-dev] [PATCHv4 net-next 3/4] net: sched: call tcf_ct_params_free to free params in tcf_ct_init
This patch is to make the err path simple by calling tcf_ct_params_free(), so that it won't cause problems when more members are added into param and need freeing on the err path. Acked-by: Marcelo Ricardo Leitner Signed-off-by: Xin Long --- net/sched/act_ct.c | 35 ++- 1 file changed, 18 insertions(+), 17 deletions(-) diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index b38d91d6b249..193a460a9d7f 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -345,11 +345,9 @@ static void tcf_ct_flow_table_cleanup_work(struct work_struct *work) module_put(THIS_MODULE); } -static void tcf_ct_flow_table_put(struct tcf_ct_params *params) +static void tcf_ct_flow_table_put(struct tcf_ct_flow_table *ct_ft) { - struct tcf_ct_flow_table *ct_ft = params->ct_ft; - - if (refcount_dec_and_test(>ct_ft->ref)) { + if (refcount_dec_and_test(_ft->ref)) { rhashtable_remove_fast(_ht, _ft->node, zones_params); INIT_RCU_WORK(_ft->rwork, tcf_ct_flow_table_cleanup_work); queue_rcu_work(act_ct_wq, _ft->rwork); @@ -832,18 +830,23 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, return err; } -static void tcf_ct_params_free(struct rcu_head *head) +static void tcf_ct_params_free(struct tcf_ct_params *params) { - struct tcf_ct_params *params = container_of(head, - struct tcf_ct_params, rcu); - - tcf_ct_flow_table_put(params); - + if (params->ct_ft) + tcf_ct_flow_table_put(params->ct_ft); if (params->tmpl) nf_ct_put(params->tmpl); kfree(params); } +static void tcf_ct_params_free_rcu(struct rcu_head *head) +{ + struct tcf_ct_params *params; + + params = container_of(head, struct tcf_ct_params, rcu); + tcf_ct_params_free(params); +} + #if IS_ENABLED(CONFIG_NF_NAT) /* Modelled after nf_nat_ipv[46]_fn(). * range is only used for new, uninitialized NAT state. @@ -1390,7 +1393,7 @@ static int tcf_ct_init(struct net *net, struct nlattr *nla, err = tcf_ct_flow_table_get(net, params); if (err) - goto cleanup_params; + goto cleanup; spin_lock_bh(>tcf_lock); goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch); @@ -1401,17 +1404,15 @@ static int tcf_ct_init(struct net *net, struct nlattr *nla, if (goto_ch) tcf_chain_put_by_act(goto_ch); if (params) - call_rcu(>rcu, tcf_ct_params_free); + call_rcu(>rcu, tcf_ct_params_free_rcu); return res; -cleanup_params: - if (params->tmpl) - nf_ct_put(params->tmpl); cleanup: if (goto_ch) tcf_chain_put_by_act(goto_ch); - kfree(params); + if (params) + tcf_ct_params_free(params); tcf_idr_release(*a, bind); return err; } @@ -1423,7 +1424,7 @@ static void tcf_ct_cleanup(struct tc_action *a) params = rcu_dereference_protected(c->params, 1); if (params) - call_rcu(>rcu, tcf_ct_params_free); + call_rcu(>rcu, tcf_ct_params_free_rcu); } static int tcf_ct_dump_key_val(struct sk_buff *skb, -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCHv4 net-next 2/4] net: move add ct helper function to nf_conntrack_helper for ovs and tc
Move ovs_ct_add_helper from openvswitch to nf_conntrack_helper and rename as nf_ct_add_helper, so that it can be used in TC act_ct in the next patch. Acked-by: Marcelo Ricardo Leitner Signed-off-by: Xin Long --- include/net/netfilter/nf_conntrack_helper.h | 2 + net/netfilter/nf_conntrack_helper.c | 31 +++ net/openvswitch/conntrack.c | 44 +++-- 3 files changed, 38 insertions(+), 39 deletions(-) diff --git a/include/net/netfilter/nf_conntrack_helper.h b/include/net/netfilter/nf_conntrack_helper.h index b6676249eeeb..f30b1694b690 100644 --- a/include/net/netfilter/nf_conntrack_helper.h +++ b/include/net/netfilter/nf_conntrack_helper.h @@ -117,6 +117,8 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl, int nf_ct_helper(struct sk_buff *skb, struct nf_conn *ct, enum ip_conntrack_info ctinfo, u16 proto); +int nf_ct_add_helper(struct nf_conn *ct, const char *name, u8 family, +u8 proto, bool nat, struct nf_conntrack_helper **hp); void nf_ct_helper_destroy(struct nf_conn *ct); diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c index 88039eedadea..48ea6d0264b5 100644 --- a/net/netfilter/nf_conntrack_helper.c +++ b/net/netfilter/nf_conntrack_helper.c @@ -309,6 +309,37 @@ int nf_ct_helper(struct sk_buff *skb, struct nf_conn *ct, } EXPORT_SYMBOL_GPL(nf_ct_helper); +int nf_ct_add_helper(struct nf_conn *ct, const char *name, u8 family, +u8 proto, bool nat, struct nf_conntrack_helper **hp) +{ + struct nf_conntrack_helper *helper; + struct nf_conn_help *help; + int ret = 0; + + helper = nf_conntrack_helper_try_module_get(name, family, proto); + if (!helper) + return -EINVAL; + + help = nf_ct_helper_ext_add(ct, GFP_KERNEL); + if (!help) { + nf_conntrack_helper_put(helper); + return -ENOMEM; + } +#if IS_ENABLED(CONFIG_NF_NAT) + if (nat) { + ret = nf_nat_helper_try_module_get(name, family, proto); + if (ret) { + nf_conntrack_helper_put(helper); + return ret; + } + } +#endif + rcu_assign_pointer(help->helper, helper); + *hp = helper; + return ret; +} +EXPORT_SYMBOL_GPL(nf_ct_add_helper); + /* appropriate ct lock protecting must be taken by caller */ static int unhelp(struct nf_conn *ct, void *me) { diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 18f54fa38e8f..4348321856af 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -1291,43 +1291,6 @@ int ovs_ct_clear(struct sk_buff *skb, struct sw_flow_key *key) return 0; } -static int ovs_ct_add_helper(struct ovs_conntrack_info *info, const char *name, -const struct sw_flow_key *key, bool log) -{ - struct nf_conntrack_helper *helper; - struct nf_conn_help *help; - int ret = 0; - - helper = nf_conntrack_helper_try_module_get(name, info->family, - key->ip.proto); - if (!helper) { - OVS_NLERR(log, "Unknown helper \"%s\"", name); - return -EINVAL; - } - - help = nf_ct_helper_ext_add(info->ct, GFP_KERNEL); - if (!help) { - nf_conntrack_helper_put(helper); - return -ENOMEM; - } - -#if IS_ENABLED(CONFIG_NF_NAT) - if (info->nat) { - ret = nf_nat_helper_try_module_get(name, info->family, - key->ip.proto); - if (ret) { - nf_conntrack_helper_put(helper); - OVS_NLERR(log, "Failed to load \"%s\" NAT helper, error: %d", - name, ret); - return ret; - } - } -#endif - rcu_assign_pointer(help->helper, helper); - info->helper = helper; - return ret; -} - #if IS_ENABLED(CONFIG_NF_NAT) static int parse_nat(const struct nlattr *attr, struct ovs_conntrack_info *info, bool log) @@ -1661,9 +1624,12 @@ int ovs_ct_copy_action(struct net *net, const struct nlattr *attr, } if (helper) { - err = ovs_ct_add_helper(_info, helper, key, log); - if (err) + err = nf_ct_add_helper(ct_info.ct, helper, ct_info.family, + key->ip.proto, ct_info.nat, _info.helper); + if (err) { + OVS_NLERR(log, "Failed to add %s helper %d", helper, err); goto err_free_ct; + } } err = ovs_nla_add_action(sfa, OVS_ACTION_ATTR_CT, _info, -- 2.31.1 __
[ovs-dev] [PATCHv4 net-next 1/4] net: move the ct helper function to nf_conntrack_helper for ovs and tc
Move ovs_ct_helper from openvswitch to nf_conntrack_helper and rename as nf_ct_helper so that it can be used in TC act_ct in the next patch. Note that it also adds the checks for the family and proto, as in TC act_ct, the packets with correct family and proto are not guaranteed. Acked-by: Marcelo Ricardo Leitner Signed-off-by: Xin Long --- include/net/netfilter/nf_conntrack_helper.h | 3 + net/netfilter/nf_conntrack_helper.c | 69 + net/openvswitch/conntrack.c | 61 +- 3 files changed, 73 insertions(+), 60 deletions(-) diff --git a/include/net/netfilter/nf_conntrack_helper.h b/include/net/netfilter/nf_conntrack_helper.h index 9939c366f720..b6676249eeeb 100644 --- a/include/net/netfilter/nf_conntrack_helper.h +++ b/include/net/netfilter/nf_conntrack_helper.h @@ -115,6 +115,9 @@ struct nf_conn_help *nf_ct_helper_ext_add(struct nf_conn *ct, gfp_t gfp); int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl, gfp_t flags); +int nf_ct_helper(struct sk_buff *skb, struct nf_conn *ct, +enum ip_conntrack_info ctinfo, u16 proto); + void nf_ct_helper_destroy(struct nf_conn *ct); static inline struct nf_conn_help *nfct_help(const struct nf_conn *ct) diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c index ff737a76052e..88039eedadea 100644 --- a/net/netfilter/nf_conntrack_helper.c +++ b/net/netfilter/nf_conntrack_helper.c @@ -26,7 +26,9 @@ #include #include #include +#include #include +#include static DEFINE_MUTEX(nf_ct_helper_mutex); struct hlist_head *nf_ct_helper_hash __read_mostly; @@ -240,6 +242,73 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl, } EXPORT_SYMBOL_GPL(__nf_ct_try_assign_helper); +/* 'skb' should already be pulled to nh_ofs. */ +int nf_ct_helper(struct sk_buff *skb, struct nf_conn *ct, +enum ip_conntrack_info ctinfo, u16 proto) +{ + const struct nf_conntrack_helper *helper; + const struct nf_conn_help *help; + unsigned int protoff; + int err; + + if (ctinfo == IP_CT_RELATED_REPLY) + return NF_ACCEPT; + + help = nfct_help(ct); + if (!help) + return NF_ACCEPT; + + helper = rcu_dereference(help->helper); + if (!helper) + return NF_ACCEPT; + + if (helper->tuple.src.l3num != NFPROTO_UNSPEC && + helper->tuple.src.l3num != proto) + return NF_ACCEPT; + + switch (proto) { + case NFPROTO_IPV4: + protoff = ip_hdrlen(skb); + proto = ip_hdr(skb)->protocol; + break; + case NFPROTO_IPV6: { + u8 nexthdr = ipv6_hdr(skb)->nexthdr; + __be16 frag_off; + int ofs; + + ofs = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), , + _off); + if (ofs < 0 || (frag_off & htons(~0x7)) != 0) { + pr_debug("proto header not found\n"); + return NF_ACCEPT; + } + protoff = ofs; + proto = nexthdr; + break; + } + default: + WARN_ONCE(1, "helper invoked on non-IP family!"); + return NF_DROP; + } + + if (helper->tuple.dst.protonum != proto) + return NF_ACCEPT; + + err = helper->help(skb, protoff, ct, ctinfo); + if (err != NF_ACCEPT) + return err; + + /* Adjust seqs after helper. This is needed due to some helpers (e.g., +* FTP with NAT) adusting the TCP payload size when mangling IP +* addresses and/or port numbers in the text-based control connection. +*/ + if (test_bit(IPS_SEQ_ADJUST_BIT, >status) && + !nf_ct_seq_adjust(skb, ct, ctinfo, protoff)) + return NF_DROP; + return NF_ACCEPT; +} +EXPORT_SYMBOL_GPL(nf_ct_helper); + /* appropriate ct lock protecting must be taken by caller */ static int unhelp(struct nf_conn *ct, void *me) { diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index c7b10234cf7c..18f54fa38e8f 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -434,65 +434,6 @@ static int ovs_ct_set_labels(struct nf_conn *ct, struct sw_flow_key *key, return 0; } -/* 'skb' should already be pulled to nh_ofs. */ -static int ovs_ct_helper(struct sk_buff *skb, u16 proto) -{ - const struct nf_conntrack_helper *helper; - const struct nf_conn_help *help; - enum ip_conntrack_info ctinfo; - unsigned int protoff; - struct nf_conn *ct; - int err; - - ct = nf_ct_get(skb, ); - if (!ct || ctinfo == IP_CT_RELATED_REPLY) - return NF_ACCEPT; - - help = nfct_help(ct); - if (!help) -
[ovs-dev] [PATCHv4 net-next 0/4] net: add helper support in tc act_ct for ovs offloading
Ilya reported an issue that FTP traffic would be broken when the OVS flow with ct(commit,alg=ftp) installed in the OVS kernel module, and it was caused by that TC didn't support the ftp helper offloaded from OVS. This patchset is to add the helper support in act_ct for OVS offloading in kernel net/sched. The 1st and 2nd patches move some common code into nf_conntrack_helper from openvswitch so that they could be used by net/sched in the 4th patch (Note there are still some other common code used in both OVS and TC, and I will extract it in other patches). The 3rd patch extracts another function in net/sched to make the 4th patch easier to write. The 4th patch adds this feature in net/sched. The user space part will be added in another patch, and with it these OVS flows (FTP over SNAT) can be used to test this feature: table=0, in_port=veth1,tcp,tcp_dst=2121,ct_state=-trk \ actions=ct(table=1, nat), normal table=0, in_port=veth2,tcp,ct_state=-trk actions=ct(table=1, nat) table=0, in_port=veth1,tcp,ct_state=-trk actions=ct(table=0, nat) table=0, in_port=veth1,tcp,ct_state=+trk+rel actions=ct(commit, nat),normal table=0, in_port=veth1,tcp,ct_state=+trk+est actions=veth2" table=1, in_port=veth1,tcp,tcp_dst=2121,ct_state=+trk+new \ actions=ct(commit, nat(src=7.7.16.1), alg=ftp),normal" table=1, in_port=veth1,tcp,tcp_dst=2121,ct_state=+trk+est actions=veth2" table=1, in_port=veth2,tcp,ct_state=+trk+est actions=veth1" v1->v2: - add the 2nd patch to extract nf_ct_add_helper from openvswitch for tc act_ct use. - go to drop instead of return -EINVAL when fails to add SEQADJ ext in tcf_ct_act() as Paolo noticed in the 4th patch. - add ct exts only when the ct is not confirmed as Pablo noticed in the 4th patch. v2->v3: - fix a warning of unused variable 'err' when CONFIG_NF_NAT is disabled in 2nd patch. v3->v4: - have the nf_conn and ip_conntrack_info passed into nf_ct_helper() as Aaron suggested in the 1st patch. - no need to pass the 'force' into tcf_ct_skb_nfct_cached() as Marcelo noticed, and remove the unnecessary variable 'force' in tcf_ct_act() in the 4th patch. - fix a typo err in the cover letter as Marcelo noticed. Xin Long (4): net: move the ct helper function to nf_conntrack_helper for ovs and tc net: move add ct helper function to nf_conntrack_helper for ovs and tc net: sched: call tcf_ct_params_free to free params in tcf_ct_init net: sched: add helper support in act_ct include/net/netfilter/nf_conntrack_helper.h | 5 + include/net/tc_act/tc_ct.h | 1 + include/uapi/linux/tc_act/tc_ct.h | 3 + net/netfilter/nf_conntrack_helper.c | 100 net/openvswitch/conntrack.c | 105 + net/sched/act_ct.c | 124 6 files changed, 214 insertions(+), 124 deletions(-) -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCHv3 net-next 4/4] net: sched: add helper support in act_ct
On Tue, Nov 1, 2022 at 11:00 AM Marcelo Ricardo Leitner wrote: > > On Mon, Oct 31, 2022 at 11:36:10AM -0400, Xin Long wrote: > ... > > --- a/net/sched/act_ct.c > > +++ b/net/sched/act_ct.c > > @@ -33,6 +33,7 @@ > > #include > > #include > > #include > > +#include > > #include > > > > static struct workqueue_struct *act_ct_wq; > > @@ -655,7 +656,7 @@ struct tc_ct_action_net { > > > > /* Determine whether skb->_nfct is equal to the result of conntrack > > lookup. */ > > static bool tcf_ct_skb_nfct_cached(struct net *net, struct sk_buff *skb, > > -u16 zone_id, bool force) > > +struct tcf_ct_params *p, bool force) > > Nit, it could have fetched 'force' from p->ct_action too then, as it > is only used in this function. right, we can save one variable here. > > There's a typo in Ilya's name in the cover letter. Ah sorry, it was a letter missed when copying from the last cover. > > Other than this, LGTM. > Acked-by: Marcelo Ricardo Leitner Thanks! ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCHv3 net-next 1/4] net: move the ct helper function to nf_conntrack_helper for ovs and tc
On Wed, Nov 2, 2022 at 3:21 PM Aaron Conole wrote: > > Xin Long writes: > > > Move ovs_ct_helper from openvswitch to nf_conntrack_helper and rename > > as nf_ct_helper so that it can be used in TC act_ct in the next patch. > > Note that it also adds the checks for the family and proto, as in TC > > act_ct, the packets with correct family and proto are not guaranteed. > > > > Signed-off-by: Xin Long > > --- > > Hi Xin, > > > include/net/netfilter/nf_conntrack_helper.h | 2 + > > net/netfilter/nf_conntrack_helper.c | 71 + > > net/openvswitch/conntrack.c | 61 +- > > 3 files changed, 74 insertions(+), 60 deletions(-) > > > > diff --git a/include/net/netfilter/nf_conntrack_helper.h > > b/include/net/netfilter/nf_conntrack_helper.h > > index 9939c366f720..6c32e59fc16f 100644 > > --- a/include/net/netfilter/nf_conntrack_helper.h > > +++ b/include/net/netfilter/nf_conntrack_helper.h > > @@ -115,6 +115,8 @@ struct nf_conn_help *nf_ct_helper_ext_add(struct > > nf_conn *ct, gfp_t gfp); > > int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl, > > gfp_t flags); > > > > +int nf_ct_helper(struct sk_buff *skb, u16 proto); > > + > > void nf_ct_helper_destroy(struct nf_conn *ct); > > > > static inline struct nf_conn_help *nfct_help(const struct nf_conn *ct) > > diff --git a/net/netfilter/nf_conntrack_helper.c > > b/net/netfilter/nf_conntrack_helper.c > > index ff737a76052e..83615e479f87 100644 > > --- a/net/netfilter/nf_conntrack_helper.c > > +++ b/net/netfilter/nf_conntrack_helper.c > > @@ -26,7 +26,9 @@ > > #include > > #include > > #include > > +#include > > #include > > +#include > > > > static DEFINE_MUTEX(nf_ct_helper_mutex); > > struct hlist_head *nf_ct_helper_hash __read_mostly; > > @@ -240,6 +242,75 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, > > struct nf_conn *tmpl, > > } > > EXPORT_SYMBOL_GPL(__nf_ct_try_assign_helper); > > > > +/* 'skb' should already be pulled to nh_ofs. */ > > +int nf_ct_helper(struct sk_buff *skb, u16 proto) > > AFAICT, in all the places we call this we will have the nf_conn and > ip_conntrack_info already. Maybe it makes sense to pass them here > rather than looking up again? Originally, that information wasn't > available in the calling function - over time this has been added so we > might as well reduce the amount of "extra lookups" performed. I just tried to keep nf_ct_helper() as similar as possible to the original ovs_ct_helper(). But sure, we can also pass ct and ctinfo as the arguments, like some other functions where it passes all of skb, ct and ctinfo. Thanks. > > > +{ > > + const struct nf_conntrack_helper *helper; > > + const struct nf_conn_help *help; > > + enum ip_conntrack_info ctinfo; > > + unsigned int protoff; > > + struct nf_conn *ct; > > + int err; > > + > > + ct = nf_ct_get(skb, ); > > + if (!ct || ctinfo == IP_CT_RELATED_REPLY) > > + return NF_ACCEPT; > > + > > + help = nfct_help(ct); > > + if (!help) > > + return NF_ACCEPT; > > + > > + helper = rcu_dereference(help->helper); > > + if (!helper) > > + return NF_ACCEPT; > > + > > + if (helper->tuple.src.l3num != NFPROTO_UNSPEC && > > + helper->tuple.src.l3num != proto) > > + return NF_ACCEPT; > > + > > + switch (proto) { > > + case NFPROTO_IPV4: > > + protoff = ip_hdrlen(skb); > > + proto = ip_hdr(skb)->protocol; > > + break; > > + case NFPROTO_IPV6: { > > + u8 nexthdr = ipv6_hdr(skb)->nexthdr; > > + __be16 frag_off; > > + int ofs; > > + > > + ofs = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), , > > +_off); > > + if (ofs < 0 || (frag_off & htons(~0x7)) != 0) { > > + pr_debug("proto header not found\n"); > > + return NF_ACCEPT; > > + } > > + protoff = ofs; > > + proto = nexthdr; > > + break; > > + } > > + default: > > + WARN_ONCE(1, "helper invoked on non-IP family!"); > > + return NF_DROP; > > + } > > + >
[ovs-dev] [PATCHv3 net-next 4/4] net: sched: add helper support in act_ct
This patch is to add helper support in act_ct for OVS actions=ct(alg=xxx) offloading, which is corresponding to Commit cae3a2627520 ("openvswitch: Allow attaching helpers to ct action") in OVS kernel part. The difference is when adding TC actions family and proto cannot be got from the filter/match, other than helper name in tb[TCA_CT_HELPER_NAME], we also need to send the family in tb[TCA_CT_HELPER_FAMILY] and the proto in tb[TCA_CT_HELPER_PROTO] to kernel. Signed-off-by: Xin Long --- include/net/tc_act/tc_ct.h| 1 + include/uapi/linux/tc_act/tc_ct.h | 3 ++ net/sched/act_ct.c| 83 +-- 3 files changed, 82 insertions(+), 5 deletions(-) diff --git a/include/net/tc_act/tc_ct.h b/include/net/tc_act/tc_ct.h index 8250d6f0a462..b24ea2d9400b 100644 --- a/include/net/tc_act/tc_ct.h +++ b/include/net/tc_act/tc_ct.h @@ -10,6 +10,7 @@ #include struct tcf_ct_params { + struct nf_conntrack_helper *helper; struct nf_conn *tmpl; u16 zone; diff --git a/include/uapi/linux/tc_act/tc_ct.h b/include/uapi/linux/tc_act/tc_ct.h index 5fb1d7ac1027..6c5200f0ed38 100644 --- a/include/uapi/linux/tc_act/tc_ct.h +++ b/include/uapi/linux/tc_act/tc_ct.h @@ -22,6 +22,9 @@ enum { TCA_CT_NAT_PORT_MIN,/* be16 */ TCA_CT_NAT_PORT_MAX,/* be16 */ TCA_CT_PAD, + TCA_CT_HELPER_NAME, /* string */ + TCA_CT_HELPER_FAMILY, /* u8 */ + TCA_CT_HELPER_PROTO,/* u8 */ __TCA_CT_MAX }; diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index 193a460a9d7f..85f4dc5650da 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -33,6 +33,7 @@ #include #include #include +#include #include static struct workqueue_struct *act_ct_wq; @@ -655,7 +656,7 @@ struct tc_ct_action_net { /* Determine whether skb->_nfct is equal to the result of conntrack lookup. */ static bool tcf_ct_skb_nfct_cached(struct net *net, struct sk_buff *skb, - u16 zone_id, bool force) + struct tcf_ct_params *p, bool force) { enum ip_conntrack_info ctinfo; struct nf_conn *ct; @@ -665,8 +666,15 @@ static bool tcf_ct_skb_nfct_cached(struct net *net, struct sk_buff *skb, return false; if (!net_eq(net, read_pnet(>ct_net))) goto drop_ct; - if (nf_ct_zone(ct)->id != zone_id) + if (nf_ct_zone(ct)->id != p->zone) goto drop_ct; + if (p->helper) { + struct nf_conn_help *help; + + help = nf_ct_ext_find(ct, NF_CT_EXT_HELPER); + if (help && rcu_access_pointer(help->helper) != p->helper) + goto drop_ct; + } /* Force conntrack entry direction. */ if (force && CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) { @@ -832,6 +840,13 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, static void tcf_ct_params_free(struct tcf_ct_params *params) { + if (params->helper) { +#if IS_ENABLED(CONFIG_NF_NAT) + if (params->ct_action & TCA_CT_ACT_NAT) + nf_nat_helper_put(params->helper); +#endif + nf_conntrack_helper_put(params->helper); + } if (params->ct_ft) tcf_ct_flow_table_put(params->ct_ft); if (params->tmpl) @@ -1033,6 +1048,7 @@ static int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a, struct nf_hook_state state; int nh_ofs, err, retval; struct tcf_ct_params *p; + bool add_helper = false; bool skip_add = false; bool defrag = false; struct nf_conn *ct; @@ -1086,7 +1102,7 @@ static int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a, * actually run the packet through conntrack twice unless it's for a * different zone. */ - cached = tcf_ct_skb_nfct_cached(net, skb, p->zone, force); + cached = tcf_ct_skb_nfct_cached(net, skb, p, force); if (!cached) { if (tcf_ct_flow_table_lookup(p, skb, family)) { skip_add = true; @@ -1119,6 +1135,22 @@ static int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a, if (err != NF_ACCEPT) goto drop; + if (!nf_ct_is_confirmed(ct) && commit && p->helper && !nfct_help(ct)) { + err = __nf_ct_try_assign_helper(ct, p->tmpl, GFP_ATOMIC); + if (err) + goto drop; + add_helper = true; + if (p->ct_action & TCA_CT_ACT_NAT && !nfct_seqadj(ct)) { + if (!nfct_seqadj_ext_add(ct)) + goto drop; + } + } + + if (nf_ct_is_confirmed(ct) ? ((!cached && !skip_add) || add_helper) : comm
[ovs-dev] [PATCHv3 net-next 3/4] net: sched: call tcf_ct_params_free to free params in tcf_ct_init
This patch is to make the err path simple by calling tcf_ct_params_free(), so that it won't cause problems when more members are added into param and need freeing on the err path. Signed-off-by: Xin Long --- net/sched/act_ct.c | 35 ++- 1 file changed, 18 insertions(+), 17 deletions(-) diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index b38d91d6b249..193a460a9d7f 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -345,11 +345,9 @@ static void tcf_ct_flow_table_cleanup_work(struct work_struct *work) module_put(THIS_MODULE); } -static void tcf_ct_flow_table_put(struct tcf_ct_params *params) +static void tcf_ct_flow_table_put(struct tcf_ct_flow_table *ct_ft) { - struct tcf_ct_flow_table *ct_ft = params->ct_ft; - - if (refcount_dec_and_test(>ct_ft->ref)) { + if (refcount_dec_and_test(_ft->ref)) { rhashtable_remove_fast(_ht, _ft->node, zones_params); INIT_RCU_WORK(_ft->rwork, tcf_ct_flow_table_cleanup_work); queue_rcu_work(act_ct_wq, _ft->rwork); @@ -832,18 +830,23 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, return err; } -static void tcf_ct_params_free(struct rcu_head *head) +static void tcf_ct_params_free(struct tcf_ct_params *params) { - struct tcf_ct_params *params = container_of(head, - struct tcf_ct_params, rcu); - - tcf_ct_flow_table_put(params); - + if (params->ct_ft) + tcf_ct_flow_table_put(params->ct_ft); if (params->tmpl) nf_ct_put(params->tmpl); kfree(params); } +static void tcf_ct_params_free_rcu(struct rcu_head *head) +{ + struct tcf_ct_params *params; + + params = container_of(head, struct tcf_ct_params, rcu); + tcf_ct_params_free(params); +} + #if IS_ENABLED(CONFIG_NF_NAT) /* Modelled after nf_nat_ipv[46]_fn(). * range is only used for new, uninitialized NAT state. @@ -1390,7 +1393,7 @@ static int tcf_ct_init(struct net *net, struct nlattr *nla, err = tcf_ct_flow_table_get(net, params); if (err) - goto cleanup_params; + goto cleanup; spin_lock_bh(>tcf_lock); goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch); @@ -1401,17 +1404,15 @@ static int tcf_ct_init(struct net *net, struct nlattr *nla, if (goto_ch) tcf_chain_put_by_act(goto_ch); if (params) - call_rcu(>rcu, tcf_ct_params_free); + call_rcu(>rcu, tcf_ct_params_free_rcu); return res; -cleanup_params: - if (params->tmpl) - nf_ct_put(params->tmpl); cleanup: if (goto_ch) tcf_chain_put_by_act(goto_ch); - kfree(params); + if (params) + tcf_ct_params_free(params); tcf_idr_release(*a, bind); return err; } @@ -1423,7 +1424,7 @@ static void tcf_ct_cleanup(struct tc_action *a) params = rcu_dereference_protected(c->params, 1); if (params) - call_rcu(>rcu, tcf_ct_params_free); + call_rcu(>rcu, tcf_ct_params_free_rcu); } static int tcf_ct_dump_key_val(struct sk_buff *skb, -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCHv3 net-next 2/4] net: move add ct helper function to nf_conntrack_helper for ovs and tc
Move ovs_ct_add_helper from openvswitch to nf_conntrack_helper and rename as nf_ct_add_helper, so that it can be used in TC act_ct in the next patch. Signed-off-by: Xin Long --- include/net/netfilter/nf_conntrack_helper.h | 2 + net/netfilter/nf_conntrack_helper.c | 31 +++ net/openvswitch/conntrack.c | 44 +++-- 3 files changed, 38 insertions(+), 39 deletions(-) diff --git a/include/net/netfilter/nf_conntrack_helper.h b/include/net/netfilter/nf_conntrack_helper.h index 6c32e59fc16f..ad1adbfbeee2 100644 --- a/include/net/netfilter/nf_conntrack_helper.h +++ b/include/net/netfilter/nf_conntrack_helper.h @@ -116,6 +116,8 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl, gfp_t flags); int nf_ct_helper(struct sk_buff *skb, u16 proto); +int nf_ct_add_helper(struct nf_conn *ct, const char *name, u8 family, +u8 proto, bool nat, struct nf_conntrack_helper **hp); void nf_ct_helper_destroy(struct nf_conn *ct); diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c index 83615e479f87..1a2ab77d1bd7 100644 --- a/net/netfilter/nf_conntrack_helper.c +++ b/net/netfilter/nf_conntrack_helper.c @@ -311,6 +311,37 @@ int nf_ct_helper(struct sk_buff *skb, u16 proto) } EXPORT_SYMBOL_GPL(nf_ct_helper); +int nf_ct_add_helper(struct nf_conn *ct, const char *name, u8 family, +u8 proto, bool nat, struct nf_conntrack_helper **hp) +{ + struct nf_conntrack_helper *helper; + struct nf_conn_help *help; + int ret = 0; + + helper = nf_conntrack_helper_try_module_get(name, family, proto); + if (!helper) + return -EINVAL; + + help = nf_ct_helper_ext_add(ct, GFP_KERNEL); + if (!help) { + nf_conntrack_helper_put(helper); + return -ENOMEM; + } +#if IS_ENABLED(CONFIG_NF_NAT) + if (nat) { + ret = nf_nat_helper_try_module_get(name, family, proto); + if (ret) { + nf_conntrack_helper_put(helper); + return ret; + } + } +#endif + rcu_assign_pointer(help->helper, helper); + *hp = helper; + return ret; +} +EXPORT_SYMBOL_GPL(nf_ct_add_helper); + /* appropriate ct lock protecting must be taken by caller */ static int unhelp(struct nf_conn *ct, void *me) { diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 19b5c54615c8..d37011e678c2 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -1291,43 +1291,6 @@ int ovs_ct_clear(struct sk_buff *skb, struct sw_flow_key *key) return 0; } -static int ovs_ct_add_helper(struct ovs_conntrack_info *info, const char *name, -const struct sw_flow_key *key, bool log) -{ - struct nf_conntrack_helper *helper; - struct nf_conn_help *help; - int ret = 0; - - helper = nf_conntrack_helper_try_module_get(name, info->family, - key->ip.proto); - if (!helper) { - OVS_NLERR(log, "Unknown helper \"%s\"", name); - return -EINVAL; - } - - help = nf_ct_helper_ext_add(info->ct, GFP_KERNEL); - if (!help) { - nf_conntrack_helper_put(helper); - return -ENOMEM; - } - -#if IS_ENABLED(CONFIG_NF_NAT) - if (info->nat) { - ret = nf_nat_helper_try_module_get(name, info->family, - key->ip.proto); - if (ret) { - nf_conntrack_helper_put(helper); - OVS_NLERR(log, "Failed to load \"%s\" NAT helper, error: %d", - name, ret); - return ret; - } - } -#endif - rcu_assign_pointer(help->helper, helper); - info->helper = helper; - return ret; -} - #if IS_ENABLED(CONFIG_NF_NAT) static int parse_nat(const struct nlattr *attr, struct ovs_conntrack_info *info, bool log) @@ -1661,9 +1624,12 @@ int ovs_ct_copy_action(struct net *net, const struct nlattr *attr, } if (helper) { - err = ovs_ct_add_helper(_info, helper, key, log); - if (err) + err = nf_ct_add_helper(ct_info.ct, helper, ct_info.family, + key->ip.proto, ct_info.nat, _info.helper); + if (err) { + OVS_NLERR(log, "Failed to add %s helper %d", helper, err); goto err_free_ct; + } } err = ovs_nla_add_action(sfa, OVS_ACTION_ATTR_CT, _info, -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCHv3 net-next 1/4] net: move the ct helper function to nf_conntrack_helper for ovs and tc
Move ovs_ct_helper from openvswitch to nf_conntrack_helper and rename as nf_ct_helper so that it can be used in TC act_ct in the next patch. Note that it also adds the checks for the family and proto, as in TC act_ct, the packets with correct family and proto are not guaranteed. Signed-off-by: Xin Long --- include/net/netfilter/nf_conntrack_helper.h | 2 + net/netfilter/nf_conntrack_helper.c | 71 + net/openvswitch/conntrack.c | 61 +- 3 files changed, 74 insertions(+), 60 deletions(-) diff --git a/include/net/netfilter/nf_conntrack_helper.h b/include/net/netfilter/nf_conntrack_helper.h index 9939c366f720..6c32e59fc16f 100644 --- a/include/net/netfilter/nf_conntrack_helper.h +++ b/include/net/netfilter/nf_conntrack_helper.h @@ -115,6 +115,8 @@ struct nf_conn_help *nf_ct_helper_ext_add(struct nf_conn *ct, gfp_t gfp); int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl, gfp_t flags); +int nf_ct_helper(struct sk_buff *skb, u16 proto); + void nf_ct_helper_destroy(struct nf_conn *ct); static inline struct nf_conn_help *nfct_help(const struct nf_conn *ct) diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c index ff737a76052e..83615e479f87 100644 --- a/net/netfilter/nf_conntrack_helper.c +++ b/net/netfilter/nf_conntrack_helper.c @@ -26,7 +26,9 @@ #include #include #include +#include #include +#include static DEFINE_MUTEX(nf_ct_helper_mutex); struct hlist_head *nf_ct_helper_hash __read_mostly; @@ -240,6 +242,75 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl, } EXPORT_SYMBOL_GPL(__nf_ct_try_assign_helper); +/* 'skb' should already be pulled to nh_ofs. */ +int nf_ct_helper(struct sk_buff *skb, u16 proto) +{ + const struct nf_conntrack_helper *helper; + const struct nf_conn_help *help; + enum ip_conntrack_info ctinfo; + unsigned int protoff; + struct nf_conn *ct; + int err; + + ct = nf_ct_get(skb, ); + if (!ct || ctinfo == IP_CT_RELATED_REPLY) + return NF_ACCEPT; + + help = nfct_help(ct); + if (!help) + return NF_ACCEPT; + + helper = rcu_dereference(help->helper); + if (!helper) + return NF_ACCEPT; + + if (helper->tuple.src.l3num != NFPROTO_UNSPEC && + helper->tuple.src.l3num != proto) + return NF_ACCEPT; + + switch (proto) { + case NFPROTO_IPV4: + protoff = ip_hdrlen(skb); + proto = ip_hdr(skb)->protocol; + break; + case NFPROTO_IPV6: { + u8 nexthdr = ipv6_hdr(skb)->nexthdr; + __be16 frag_off; + int ofs; + + ofs = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), , + _off); + if (ofs < 0 || (frag_off & htons(~0x7)) != 0) { + pr_debug("proto header not found\n"); + return NF_ACCEPT; + } + protoff = ofs; + proto = nexthdr; + break; + } + default: + WARN_ONCE(1, "helper invoked on non-IP family!"); + return NF_DROP; + } + + if (helper->tuple.dst.protonum != proto) + return NF_ACCEPT; + + err = helper->help(skb, protoff, ct, ctinfo); + if (err != NF_ACCEPT) + return err; + + /* Adjust seqs after helper. This is needed due to some helpers (e.g., +* FTP with NAT) adusting the TCP payload size when mangling IP +* addresses and/or port numbers in the text-based control connection. +*/ + if (test_bit(IPS_SEQ_ADJUST_BIT, >status) && + !nf_ct_seq_adjust(skb, ct, ctinfo, protoff)) + return NF_DROP; + return NF_ACCEPT; +} +EXPORT_SYMBOL_GPL(nf_ct_helper); + /* appropriate ct lock protecting must be taken by caller */ static int unhelp(struct nf_conn *ct, void *me) { diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index c7b10234cf7c..19b5c54615c8 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -434,65 +434,6 @@ static int ovs_ct_set_labels(struct nf_conn *ct, struct sw_flow_key *key, return 0; } -/* 'skb' should already be pulled to nh_ofs. */ -static int ovs_ct_helper(struct sk_buff *skb, u16 proto) -{ - const struct nf_conntrack_helper *helper; - const struct nf_conn_help *help; - enum ip_conntrack_info ctinfo; - unsigned int protoff; - struct nf_conn *ct; - int err; - - ct = nf_ct_get(skb, ); - if (!ct || ctinfo == IP_CT_RELATED_REPLY) - return NF_ACCEPT; - - help = nfct_help(ct); - if (!help) - return NF_ACCEPT; - - helper = rcu_dereference(h
[ovs-dev] [PATCHv3 net-next 0/4] net: add helper support in tc act_ct for ovs offloading
lya reported an issue that FTP traffic would be broken when the OVS flow with ct(commit,alg=ftp) installed in the OVS kernel module, and it was caused by that TC didn't support the ftp helper offloaded from OVS. This patchset is to add the helper support in act_ct for OVS offloading in kernel net/sched. The 1st and 2nd patches move some common code into nf_conntrack_helper from openvswitch so that they could be used by net/sched in the 4th patch (Note there are still some other common code used in both OVS and TC, and I will extract it in other patches). The 3rd patch extracts another function in net/sched to make the 4th patch easier to write. The 4th patch adds this feature in net/sched. The user space part will be added in another patch, and with it these OVS flows (FTP over SNAT) can be used to test this feature: table=0, in_port=veth1,tcp,tcp_dst=2121,ct_state=-trk \ actions=ct(table=1, nat), normal table=0, in_port=veth2,tcp,ct_state=-trk actions=ct(table=1, nat) table=0, in_port=veth1,tcp,ct_state=-trk actions=ct(table=0, nat) table=0, in_port=veth1,tcp,ct_state=+trk+rel actions=ct(commit, nat),normal table=0, in_port=veth1,tcp,ct_state=+trk+est actions=veth2" table=1, in_port=veth1,tcp,tcp_dst=2121,ct_state=+trk+new \ actions=ct(commit, nat(src=7.7.16.1), alg=ftp),normal" table=1, in_port=veth1,tcp,tcp_dst=2121,ct_state=+trk+est actions=veth2" table=1, in_port=veth2,tcp,ct_state=+trk+est actions=veth1" v1->v2: - go to drop instead of return -EINVAL when fails to add SEQADJ ext in tcf_ct_act() as Paolo noticed. - add the 2nd patch to extract nf_ct_add_helper from openvswitch for tc act_ct use. - add ct exts only when the ct is not confirmed as Pablo noticed. v2->v3: - fix a warning of unused variable 'err' when CONFIG_NF_NAT is disabled. Xin Long (4): net: move the ct helper function to nf_conntrack_helper for ovs and tc net: move add ct helper function to nf_conntrack_helper for ovs and tc net: sched: call tcf_ct_params_free to free params in tcf_ct_init net: sched: add helper support in act_ct include/net/netfilter/nf_conntrack_helper.h | 4 + include/net/tc_act/tc_ct.h | 1 + include/uapi/linux/tc_act/tc_ct.h | 3 + net/netfilter/nf_conntrack_helper.c | 102 + net/openvswitch/conntrack.c | 105 + net/sched/act_ct.c | 118 6 files changed, 212 insertions(+), 121 deletions(-) -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next 4/4] net: sched: add helper support in act_ct
This patch is to add helper support in act_ct for OVS actions=ct(alg=xxx) offloading, which is corresponding to Commit cae3a2627520 ("openvswitch: Allow attaching helpers to ct action") in OVS kernel part. The difference is when adding TC actions family and proto cannot be got from the filter/match, other than helper name in tb[TCA_CT_HELPER_NAME], we also need to send the family in tb[TCA_CT_HELPER_FAMILY] and the proto in tb[TCA_CT_HELPER_PROTO] to kernel. Signed-off-by: Xin Long --- include/net/tc_act/tc_ct.h| 1 + include/uapi/linux/tc_act/tc_ct.h | 3 ++ net/sched/act_ct.c| 83 +-- 3 files changed, 82 insertions(+), 5 deletions(-) diff --git a/include/net/tc_act/tc_ct.h b/include/net/tc_act/tc_ct.h index 8250d6f0a462..b24ea2d9400b 100644 --- a/include/net/tc_act/tc_ct.h +++ b/include/net/tc_act/tc_ct.h @@ -10,6 +10,7 @@ #include struct tcf_ct_params { + struct nf_conntrack_helper *helper; struct nf_conn *tmpl; u16 zone; diff --git a/include/uapi/linux/tc_act/tc_ct.h b/include/uapi/linux/tc_act/tc_ct.h index 5fb1d7ac1027..6c5200f0ed38 100644 --- a/include/uapi/linux/tc_act/tc_ct.h +++ b/include/uapi/linux/tc_act/tc_ct.h @@ -22,6 +22,9 @@ enum { TCA_CT_NAT_PORT_MIN,/* be16 */ TCA_CT_NAT_PORT_MAX,/* be16 */ TCA_CT_PAD, + TCA_CT_HELPER_NAME, /* string */ + TCA_CT_HELPER_FAMILY, /* u8 */ + TCA_CT_HELPER_PROTO,/* u8 */ __TCA_CT_MAX }; diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index 193a460a9d7f..85f4dc5650da 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -33,6 +33,7 @@ #include #include #include +#include #include static struct workqueue_struct *act_ct_wq; @@ -655,7 +656,7 @@ struct tc_ct_action_net { /* Determine whether skb->_nfct is equal to the result of conntrack lookup. */ static bool tcf_ct_skb_nfct_cached(struct net *net, struct sk_buff *skb, - u16 zone_id, bool force) + struct tcf_ct_params *p, bool force) { enum ip_conntrack_info ctinfo; struct nf_conn *ct; @@ -665,8 +666,15 @@ static bool tcf_ct_skb_nfct_cached(struct net *net, struct sk_buff *skb, return false; if (!net_eq(net, read_pnet(>ct_net))) goto drop_ct; - if (nf_ct_zone(ct)->id != zone_id) + if (nf_ct_zone(ct)->id != p->zone) goto drop_ct; + if (p->helper) { + struct nf_conn_help *help; + + help = nf_ct_ext_find(ct, NF_CT_EXT_HELPER); + if (help && rcu_access_pointer(help->helper) != p->helper) + goto drop_ct; + } /* Force conntrack entry direction. */ if (force && CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) { @@ -832,6 +840,13 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, static void tcf_ct_params_free(struct tcf_ct_params *params) { + if (params->helper) { +#if IS_ENABLED(CONFIG_NF_NAT) + if (params->ct_action & TCA_CT_ACT_NAT) + nf_nat_helper_put(params->helper); +#endif + nf_conntrack_helper_put(params->helper); + } if (params->ct_ft) tcf_ct_flow_table_put(params->ct_ft); if (params->tmpl) @@ -1033,6 +1048,7 @@ static int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a, struct nf_hook_state state; int nh_ofs, err, retval; struct tcf_ct_params *p; + bool add_helper = false; bool skip_add = false; bool defrag = false; struct nf_conn *ct; @@ -1086,7 +1102,7 @@ static int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a, * actually run the packet through conntrack twice unless it's for a * different zone. */ - cached = tcf_ct_skb_nfct_cached(net, skb, p->zone, force); + cached = tcf_ct_skb_nfct_cached(net, skb, p, force); if (!cached) { if (tcf_ct_flow_table_lookup(p, skb, family)) { skip_add = true; @@ -1119,6 +1135,22 @@ static int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a, if (err != NF_ACCEPT) goto drop; + if (!nf_ct_is_confirmed(ct) && commit && p->helper && !nfct_help(ct)) { + err = __nf_ct_try_assign_helper(ct, p->tmpl, GFP_ATOMIC); + if (err) + goto drop; + add_helper = true; + if (p->ct_action & TCA_CT_ACT_NAT && !nfct_seqadj(ct)) { + if (!nfct_seqadj_ext_add(ct)) + goto drop; + } + } + + if (nf_ct_is_confirmed(ct) ? ((!cached && !skip_add) || add_helper) : comm
[ovs-dev] [PATCH net-next 3/4] net: sched: call tcf_ct_params_free to free params in tcf_ct_init
This patch is to make the err path simple by calling tcf_ct_params_free(), so that it won't cause problems when more members are added into param and need freeing on the err path. Signed-off-by: Xin Long --- net/sched/act_ct.c | 35 ++- 1 file changed, 18 insertions(+), 17 deletions(-) diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index b38d91d6b249..193a460a9d7f 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -345,11 +345,9 @@ static void tcf_ct_flow_table_cleanup_work(struct work_struct *work) module_put(THIS_MODULE); } -static void tcf_ct_flow_table_put(struct tcf_ct_params *params) +static void tcf_ct_flow_table_put(struct tcf_ct_flow_table *ct_ft) { - struct tcf_ct_flow_table *ct_ft = params->ct_ft; - - if (refcount_dec_and_test(>ct_ft->ref)) { + if (refcount_dec_and_test(_ft->ref)) { rhashtable_remove_fast(_ht, _ft->node, zones_params); INIT_RCU_WORK(_ft->rwork, tcf_ct_flow_table_cleanup_work); queue_rcu_work(act_ct_wq, _ft->rwork); @@ -832,18 +830,23 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, return err; } -static void tcf_ct_params_free(struct rcu_head *head) +static void tcf_ct_params_free(struct tcf_ct_params *params) { - struct tcf_ct_params *params = container_of(head, - struct tcf_ct_params, rcu); - - tcf_ct_flow_table_put(params); - + if (params->ct_ft) + tcf_ct_flow_table_put(params->ct_ft); if (params->tmpl) nf_ct_put(params->tmpl); kfree(params); } +static void tcf_ct_params_free_rcu(struct rcu_head *head) +{ + struct tcf_ct_params *params; + + params = container_of(head, struct tcf_ct_params, rcu); + tcf_ct_params_free(params); +} + #if IS_ENABLED(CONFIG_NF_NAT) /* Modelled after nf_nat_ipv[46]_fn(). * range is only used for new, uninitialized NAT state. @@ -1390,7 +1393,7 @@ static int tcf_ct_init(struct net *net, struct nlattr *nla, err = tcf_ct_flow_table_get(net, params); if (err) - goto cleanup_params; + goto cleanup; spin_lock_bh(>tcf_lock); goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch); @@ -1401,17 +1404,15 @@ static int tcf_ct_init(struct net *net, struct nlattr *nla, if (goto_ch) tcf_chain_put_by_act(goto_ch); if (params) - call_rcu(>rcu, tcf_ct_params_free); + call_rcu(>rcu, tcf_ct_params_free_rcu); return res; -cleanup_params: - if (params->tmpl) - nf_ct_put(params->tmpl); cleanup: if (goto_ch) tcf_chain_put_by_act(goto_ch); - kfree(params); + if (params) + tcf_ct_params_free(params); tcf_idr_release(*a, bind); return err; } @@ -1423,7 +1424,7 @@ static void tcf_ct_cleanup(struct tc_action *a) params = rcu_dereference_protected(c->params, 1); if (params) - call_rcu(>rcu, tcf_ct_params_free); + call_rcu(>rcu, tcf_ct_params_free_rcu); } static int tcf_ct_dump_key_val(struct sk_buff *skb, -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next 2/4] net: move add ct helper function to nf_conntrack_helper for ovs and tc
Move ovs_ct_add_helper from openvswitch to nf_conntrack_helper and rename as nf_ct_add_helper, so that it can be used in TC act_ct in the next patch. Signed-off-by: Xin Long --- include/net/netfilter/nf_conntrack_helper.h | 2 + net/netfilter/nf_conntrack_helper.c | 31 +++ net/openvswitch/conntrack.c | 44 +++-- 3 files changed, 38 insertions(+), 39 deletions(-) diff --git a/include/net/netfilter/nf_conntrack_helper.h b/include/net/netfilter/nf_conntrack_helper.h index 6c32e59fc16f..ad1adbfbeee2 100644 --- a/include/net/netfilter/nf_conntrack_helper.h +++ b/include/net/netfilter/nf_conntrack_helper.h @@ -116,6 +116,8 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl, gfp_t flags); int nf_ct_helper(struct sk_buff *skb, u16 proto); +int nf_ct_add_helper(struct nf_conn *ct, const char *name, u8 family, +u8 proto, bool nat, struct nf_conntrack_helper **hp); void nf_ct_helper_destroy(struct nf_conn *ct); diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c index 83615e479f87..ee22b8a059cd 100644 --- a/net/netfilter/nf_conntrack_helper.c +++ b/net/netfilter/nf_conntrack_helper.c @@ -311,6 +311,37 @@ int nf_ct_helper(struct sk_buff *skb, u16 proto) } EXPORT_SYMBOL_GPL(nf_ct_helper); +int nf_ct_add_helper(struct nf_conn *ct, const char *name, u8 family, +u8 proto, bool nat, struct nf_conntrack_helper **hp) +{ + struct nf_conntrack_helper *helper; + struct nf_conn_help *help; + int err; + + helper = nf_conntrack_helper_try_module_get(name, family, proto); + if (!helper) + return -EINVAL; + + help = nf_ct_helper_ext_add(ct, GFP_KERNEL); + if (!help) { + nf_conntrack_helper_put(helper); + return -ENOMEM; + } +#if IS_ENABLED(CONFIG_NF_NAT) + if (nat) { + err = nf_nat_helper_try_module_get(name, family, proto); + if (err) { + nf_conntrack_helper_put(helper); + return err; + } + } +#endif + rcu_assign_pointer(help->helper, helper); + *hp = helper; + return 0; +} +EXPORT_SYMBOL_GPL(nf_ct_add_helper); + /* appropriate ct lock protecting must be taken by caller */ static int unhelp(struct nf_conn *ct, void *me) { diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 19b5c54615c8..d37011e678c2 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -1291,43 +1291,6 @@ int ovs_ct_clear(struct sk_buff *skb, struct sw_flow_key *key) return 0; } -static int ovs_ct_add_helper(struct ovs_conntrack_info *info, const char *name, -const struct sw_flow_key *key, bool log) -{ - struct nf_conntrack_helper *helper; - struct nf_conn_help *help; - int ret = 0; - - helper = nf_conntrack_helper_try_module_get(name, info->family, - key->ip.proto); - if (!helper) { - OVS_NLERR(log, "Unknown helper \"%s\"", name); - return -EINVAL; - } - - help = nf_ct_helper_ext_add(info->ct, GFP_KERNEL); - if (!help) { - nf_conntrack_helper_put(helper); - return -ENOMEM; - } - -#if IS_ENABLED(CONFIG_NF_NAT) - if (info->nat) { - ret = nf_nat_helper_try_module_get(name, info->family, - key->ip.proto); - if (ret) { - nf_conntrack_helper_put(helper); - OVS_NLERR(log, "Failed to load \"%s\" NAT helper, error: %d", - name, ret); - return ret; - } - } -#endif - rcu_assign_pointer(help->helper, helper); - info->helper = helper; - return ret; -} - #if IS_ENABLED(CONFIG_NF_NAT) static int parse_nat(const struct nlattr *attr, struct ovs_conntrack_info *info, bool log) @@ -1661,9 +1624,12 @@ int ovs_ct_copy_action(struct net *net, const struct nlattr *attr, } if (helper) { - err = ovs_ct_add_helper(_info, helper, key, log); - if (err) + err = nf_ct_add_helper(ct_info.ct, helper, ct_info.family, + key->ip.proto, ct_info.nat, _info.helper); + if (err) { + OVS_NLERR(log, "Failed to add %s helper %d", helper, err); goto err_free_ct; + } } err = ovs_nla_add_action(sfa, OVS_ACTION_ATTR_CT, _info, -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next 1/4] net: move the ct helper function to nf_conntrack_helper for ovs and tc
Move ovs_ct_helper from openvswitch to nf_conntrack_helper and rename as nf_ct_helper so that it can be used in TC act_ct in the next patch. Note that it also adds the checks for the family and proto, as in TC act_ct, the packets with correct family and proto are not guaranteed. Signed-off-by: Xin Long --- include/net/netfilter/nf_conntrack_helper.h | 2 + net/netfilter/nf_conntrack_helper.c | 71 + net/openvswitch/conntrack.c | 61 +- 3 files changed, 74 insertions(+), 60 deletions(-) diff --git a/include/net/netfilter/nf_conntrack_helper.h b/include/net/netfilter/nf_conntrack_helper.h index 9939c366f720..6c32e59fc16f 100644 --- a/include/net/netfilter/nf_conntrack_helper.h +++ b/include/net/netfilter/nf_conntrack_helper.h @@ -115,6 +115,8 @@ struct nf_conn_help *nf_ct_helper_ext_add(struct nf_conn *ct, gfp_t gfp); int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl, gfp_t flags); +int nf_ct_helper(struct sk_buff *skb, u16 proto); + void nf_ct_helper_destroy(struct nf_conn *ct); static inline struct nf_conn_help *nfct_help(const struct nf_conn *ct) diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c index ff737a76052e..83615e479f87 100644 --- a/net/netfilter/nf_conntrack_helper.c +++ b/net/netfilter/nf_conntrack_helper.c @@ -26,7 +26,9 @@ #include #include #include +#include #include +#include static DEFINE_MUTEX(nf_ct_helper_mutex); struct hlist_head *nf_ct_helper_hash __read_mostly; @@ -240,6 +242,75 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl, } EXPORT_SYMBOL_GPL(__nf_ct_try_assign_helper); +/* 'skb' should already be pulled to nh_ofs. */ +int nf_ct_helper(struct sk_buff *skb, u16 proto) +{ + const struct nf_conntrack_helper *helper; + const struct nf_conn_help *help; + enum ip_conntrack_info ctinfo; + unsigned int protoff; + struct nf_conn *ct; + int err; + + ct = nf_ct_get(skb, ); + if (!ct || ctinfo == IP_CT_RELATED_REPLY) + return NF_ACCEPT; + + help = nfct_help(ct); + if (!help) + return NF_ACCEPT; + + helper = rcu_dereference(help->helper); + if (!helper) + return NF_ACCEPT; + + if (helper->tuple.src.l3num != NFPROTO_UNSPEC && + helper->tuple.src.l3num != proto) + return NF_ACCEPT; + + switch (proto) { + case NFPROTO_IPV4: + protoff = ip_hdrlen(skb); + proto = ip_hdr(skb)->protocol; + break; + case NFPROTO_IPV6: { + u8 nexthdr = ipv6_hdr(skb)->nexthdr; + __be16 frag_off; + int ofs; + + ofs = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), , + _off); + if (ofs < 0 || (frag_off & htons(~0x7)) != 0) { + pr_debug("proto header not found\n"); + return NF_ACCEPT; + } + protoff = ofs; + proto = nexthdr; + break; + } + default: + WARN_ONCE(1, "helper invoked on non-IP family!"); + return NF_DROP; + } + + if (helper->tuple.dst.protonum != proto) + return NF_ACCEPT; + + err = helper->help(skb, protoff, ct, ctinfo); + if (err != NF_ACCEPT) + return err; + + /* Adjust seqs after helper. This is needed due to some helpers (e.g., +* FTP with NAT) adusting the TCP payload size when mangling IP +* addresses and/or port numbers in the text-based control connection. +*/ + if (test_bit(IPS_SEQ_ADJUST_BIT, >status) && + !nf_ct_seq_adjust(skb, ct, ctinfo, protoff)) + return NF_DROP; + return NF_ACCEPT; +} +EXPORT_SYMBOL_GPL(nf_ct_helper); + /* appropriate ct lock protecting must be taken by caller */ static int unhelp(struct nf_conn *ct, void *me) { diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index c7b10234cf7c..19b5c54615c8 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -434,65 +434,6 @@ static int ovs_ct_set_labels(struct nf_conn *ct, struct sw_flow_key *key, return 0; } -/* 'skb' should already be pulled to nh_ofs. */ -static int ovs_ct_helper(struct sk_buff *skb, u16 proto) -{ - const struct nf_conntrack_helper *helper; - const struct nf_conn_help *help; - enum ip_conntrack_info ctinfo; - unsigned int protoff; - struct nf_conn *ct; - int err; - - ct = nf_ct_get(skb, ); - if (!ct || ctinfo == IP_CT_RELATED_REPLY) - return NF_ACCEPT; - - help = nfct_help(ct); - if (!help) - return NF_ACCEPT; - - helper = rcu_dereference(h
[ovs-dev] [PATCH net-next 0/4] net: add helper support in tc act_ct for ovs offloading
Ilya reported an issue that FTP traffic would be broken when the OVS flow with ct(commit,alg=ftp) installed in the OVS kernel module, and it was caused by that TC didn't support the ftp helper offloaded from OVS. This patchset is to add the helper support in act_ct for OVS offloading in kernel net/sched. The 1st and 2nd patches move some common code into nf_conntrack_helper from openvswitch so that they could be used by net/sched in the 4th patch (Note there are still some other common code used in both OVS and TC, and I will extract it in other patches). The 3rd patch extracts another function in net/sched to make the 4th patch easier to write. The 4th patch adds this feature in net/sched. The user space part will be added in another patch, and with it these OVS flows (FTP over SNAT) can be used to test this feature: table=0, in_port=veth1,tcp,tcp_dst=2121,ct_state=-trk \ actions=ct(table=1, nat), normal table=0, in_port=veth2,tcp,ct_state=-trk actions=ct(table=1, nat) table=0, in_port=veth1,tcp,ct_state=-trk actions=ct(table=0, nat) table=0, in_port=veth1,tcp,ct_state=+trk+rel actions=ct(commit, nat),normal table=0, in_port=veth1,tcp,ct_state=+trk+est actions=veth2" table=1, in_port=veth1,tcp,tcp_dst=2121,ct_state=+trk+new \ actions=ct(commit, nat(src=7.7.16.1), alg=ftp),normal" table=1, in_port=veth1,tcp,tcp_dst=2121,ct_state=+trk+est actions=veth2" table=1, in_port=veth2,tcp,ct_state=+trk+est actions=veth1" v1->v2: - go to drop instead of return -EINVAL when fails to add SEQADJ ext in tcf_ct_act() as Paolo noticed. - add the 2nd patch to extract nf_ct_add_helper from openvswitch for tc act_ct use. - add ct exts only when the ct is not confirmed as Pablo noticed. Xin Long (4): net: move the ct helper function to nf_conntrack_helper for ovs and tc net: move add ct helper function to nf_conntrack_helper for ovs and tc net: sched: call tcf_ct_params_free to free params in tcf_ct_init net: sched: add helper support in act_ct include/net/netfilter/nf_conntrack_helper.h | 4 + include/net/tc_act/tc_ct.h | 1 + include/uapi/linux/tc_act/tc_ct.h | 3 + net/netfilter/nf_conntrack_helper.c | 102 + net/openvswitch/conntrack.c | 105 + net/sched/act_ct.c | 118 6 files changed, 212 insertions(+), 121 deletions(-) -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net] openvswitch: add nf_ct_is_confirmed check before assigning the helper
On Tue, Oct 11, 2022 at 10:06 AM Aaron Conole wrote: > > Aaron Conole writes: > > > Xin Long writes: > > > >> A WARN_ON call trace would be triggered when 'ct(commit, alg=helper)' > >> applies on a confirmed connection: > >> > >> WARNING: CPU: 0 PID: 1251 at net/netfilter/nf_conntrack_extend.c:98 > >> RIP: 0010:nf_ct_ext_add+0x12d/0x150 [nf_conntrack] > >> Call Trace: > >> > >>nf_ct_helper_ext_add+0x12/0x60 [nf_conntrack] > >>__nf_ct_try_assign_helper+0xc4/0x160 [nf_conntrack] > >>__ovs_ct_lookup+0x72e/0x780 [openvswitch] > >>ovs_ct_execute+0x1d8/0x920 [openvswitch] > >>do_execute_actions+0x4e6/0xb60 [openvswitch] > >>ovs_execute_actions+0x60/0x140 [openvswitch] > >>ovs_packet_cmd_execute+0x2ad/0x310 [openvswitch] > >>genl_family_rcv_msg_doit.isra.15+0x113/0x150 > >>genl_rcv_msg+0xef/0x1f0 > >> > >> which can be reproduced with these OVS flows: > >> > >> table=0, in_port=veth1,tcp,tcp_dst=2121,ct_state=-trk > >> actions=ct(commit, table=1) > >> table=1, in_port=veth1,tcp,tcp_dst=2121,ct_state=+trk+new > >> actions=ct(commit, alg=ftp),normal > >> > >> The issue was introduced by commit 248d45f1e193 ("openvswitch: Allow > >> attaching helper in later commit") where it somehow removed the check > >> of nf_ct_is_confirmed before asigning the helper. This patch is to fix > >> it by bringing it back. > >> > >> Fixes: 248d45f1e193 ("openvswitch: Allow attaching helper in later commit") > >> Reported-by: Pablo Neira Ayuso > >> Signed-off-by: Xin Long > >> --- > > > > Hi Xin, > > > > Looking at the original commit, I think this will read like a revert. I > > am doing some testing now, but I think we need input from Yi-Hung to > > find out what the use case is that the original fixed. > > I'm also not able to reproduce the WARN_ON. My env: > > kernel: 4c86114194e6 ("Merge tag 'iomap-6.1-merge-1' of > git://git.kernel.org/pub/scm/fs/xfs/xfs-linux") > > Using current upstream OVS > I used your flows (adjusting the port names): > > cookie=0x0, duration=246.240s, table=0, n_packets=17, n_bytes=1130, > ct_state=-trk,tcp,in_port=v0,tp_dst=2121 actions=ct(commit,table=1) > cookie=0x0, duration=246.240s, table=1, n_packets=1, n_bytes=74, > ct_state=+new+trk,tcp,in_port=v0,tp_dst=2121 actions=ct(commit,alg=ftp),NORMAL > > and ran: > > $ ip netns exec server python3 -m pyftpdlib -i 172.31.110.20 & > $ ip netns exec client curl ftp://172.31.110.20:2121 > > but no WARN_ON message got triggered. Are there additional flows you > used that I am missing, or perhaps this should be on a different kernel > commit? > > > -Aaron > Hi, Aaron, thanks for looking at this. Here is a script to reproduce it, you can try, and let me know if it's still not reproduced (it's also upstream ovs, for the first 2 lines, you may adjust on your env.) export PATH=$PATH:/usr/local/share/openvswitch/scripts ovs-ctl restart #systemctl restart openvswitch ip net add ns0 ip net add ns1 ip link add veth0 type veth peer name veth1 ip link add veth3 type veth peer name veth2 ip link set veth0 netns ns0 ip link set veth3 netns ns1 ip net exec ns0 ip addr add 7.7.7.1/24 dev veth0 mac1=`ip net exec ns1 cat /sys/class/net/veth3/address` mac2=`ip net exec ns0 cat /sys/class/net/veth0/address` ip net exec ns0 ip neigh add 7.7.16.2 dev veth0 lladdr $mac1 ip net exec ns1 ip addr add 7.7.16.2/24 dev veth3 ip net exec ns1 ip neigh add 7.7.16.1 dev veth3 lladdr $mac2 ip net exec ns0 ip link set veth0 up ip net exec ns1 ip link set veth3 up ip net exec ns0 ip route add 7.7.16.2 dev veth0 sleep 0.5 ovs-vsctl set Open_vSwitch . other_config:hw-offload=false ovs-vsctl set Open_vSwitch . other_config:tc-policy=skip_hw ovs-vsctl add-br br-ovs ovs-vsctl add-port br-ovs veth1 ovs-vsctl add-port br-ovs veth2 ip link set br-ovs up ip link set veth1 up ip link set veth2 up ovs-ofctl del-flows br-ovs ovs-ofctl add-flow br-ovs arp,actions=normal ovs-ofctl add-flow br-ovs icmp,actions=normal ovs-ofctl add-flow br-ovs "table=0, in_port=veth1,tcp,tcp_dst=2121,ct_state=-trk actions=ct(commit, table=1)" #ovs-ofctl add-flow br-ovs "table=0, in_port=veth1,tcp,tcp_dst=2121,ct_state=-trk actions=ct(table=1, nat)" ovs-ofctl add-flow br-ovs "table=0, in_port=veth2,tcp,ct_state=-trk actions=ct(table=1, nat)" ovs-ofctl add-flow br-ovs "table=0, in_port=veth1,tcp,ct_state=-trk actions=ct(table=0, nat)" ovs-ofctl add-flow br-ovs "table=0, in_port=veth1,tcp,ct_state=+trk+rel actions=ct(commit, nat),normal" ovs-o
[ovs-dev] [PATCH net] openvswitch: add nf_ct_is_confirmed check before assigning the helper
A WARN_ON call trace would be triggered when 'ct(commit, alg=helper)' applies on a confirmed connection: WARNING: CPU: 0 PID: 1251 at net/netfilter/nf_conntrack_extend.c:98 RIP: 0010:nf_ct_ext_add+0x12d/0x150 [nf_conntrack] Call Trace: nf_ct_helper_ext_add+0x12/0x60 [nf_conntrack] __nf_ct_try_assign_helper+0xc4/0x160 [nf_conntrack] __ovs_ct_lookup+0x72e/0x780 [openvswitch] ovs_ct_execute+0x1d8/0x920 [openvswitch] do_execute_actions+0x4e6/0xb60 [openvswitch] ovs_execute_actions+0x60/0x140 [openvswitch] ovs_packet_cmd_execute+0x2ad/0x310 [openvswitch] genl_family_rcv_msg_doit.isra.15+0x113/0x150 genl_rcv_msg+0xef/0x1f0 which can be reproduced with these OVS flows: table=0, in_port=veth1,tcp,tcp_dst=2121,ct_state=-trk actions=ct(commit, table=1) table=1, in_port=veth1,tcp,tcp_dst=2121,ct_state=+trk+new actions=ct(commit, alg=ftp),normal The issue was introduced by commit 248d45f1e193 ("openvswitch: Allow attaching helper in later commit") where it somehow removed the check of nf_ct_is_confirmed before asigning the helper. This patch is to fix it by bringing it back. Fixes: 248d45f1e193 ("openvswitch: Allow attaching helper in later commit") Reported-by: Pablo Neira Ayuso Signed-off-by: Xin Long --- net/openvswitch/conntrack.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 4e70df91d0f2..6862475f0f88 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -1015,7 +1015,8 @@ static int __ovs_ct_lookup(struct net *net, struct sw_flow_key *key, * connections which we will commit, we may need to attach * the helper here. */ - if (info->commit && info->helper && !nfct_help(ct)) { + if (!nf_ct_is_confirmed(ct) && info->commit && + info->helper && !nfct_help(ct)) { int err = __nf_ct_try_assign_helper(ct, info->ct, GFP_ATOMIC); if (err) -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next 3/3] net: sched: add helper support in act_ct
On Thu, Oct 6, 2022 at 10:11 AM Pablo Neira Ayuso wrote: > > On Tue, Oct 04, 2022 at 09:19:56PM -0400, Xin Long wrote: > [...] > > @@ -1119,6 +1135,22 @@ static int tcf_ct_act(struct sk_buff *skb, const > > struct tc_action *a, > > if (err != NF_ACCEPT) > > goto drop; > > > > + if (commit && p->helper && !nfct_help(ct)) { > > + err = __nf_ct_try_assign_helper(ct, p->tmpl, GFP_ATOMIC); > > + if (err) > > + goto drop; > > + add_helper = true; > > + if (p->ct_action & TCA_CT_ACT_NAT && !nfct_seqadj(ct)) { > > + if (!nfct_seqadj_ext_add(ct)) > > You can only add ct extensions if ct is !nf_ct_is_confirmed(ct)), is > this guaranteed in this codepath? This is a good catch, the same issue exists on __nf_ct_try_assign_helper(), and also in __ovs_ct_lookup(). I could trigger the warning on OVS conntrack with the flow: table=0, in_port=veth1,tcp,tcp_dst=2121,ct_state=-trk actions=ct(commit, table=1) table=1, in_port=veth1,tcp,tcp_dst=2121,ct_state=+trk+new actions=ct(commit, alg=ftp),normal I will prepare a fix for ovs conntrack first. Thanks. > > > + return -EINVAL; > > + } > > + } > > + > > + if (nf_ct_is_confirmed(ct) ? ((!cached && !skip_add) || add_helper) : > > commit) { > > + if (nf_ct_helper(skb, family) != NF_ACCEPT) > > + goto drop; > > + } > > + > > if (commit) { > > tcf_ct_act_set_mark(ct, p->mark, p->mark_mask); > > tcf_ct_act_set_labels(ct, p->labels, p->labels_mask); ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next 3/3] net: sched: add helper support in act_ct
On Thu, Oct 6, 2022 at 5:57 AM Paolo Abeni wrote: > > On Tue, 2022-10-04 at 21:19 -0400, Xin Long wrote: > > This patch is to add helper support in act_ct for OVS actions=ct(alg=xxx) > > offloading, which is corresponding to Commit cae3a2627520 ("openvswitch: > > Allow attaching helpers to ct action") in OVS kernel part. > > > > The difference is when adding TC actions family and proto cannot be got > > from the filter/match, other than helper name in tb[TCA_CT_HELPER_NAME], > > we also need to send the family in tb[TCA_CT_HELPER_FAMILY] and the > > proto in tb[TCA_CT_HELPER_PROTO] to kernel. > > > > Signed-off-by: Xin Long > > --- > > include/net/tc_act/tc_ct.h| 1 + > > include/uapi/linux/tc_act/tc_ct.h | 3 + > > net/sched/act_ct.c| 113 -- > > 3 files changed, 112 insertions(+), 5 deletions(-) > > > > diff --git a/include/net/tc_act/tc_ct.h b/include/net/tc_act/tc_ct.h > > index 8250d6f0a462..b24ea2d9400b 100644 > > --- a/include/net/tc_act/tc_ct.h > > +++ b/include/net/tc_act/tc_ct.h > > @@ -10,6 +10,7 @@ > > #include > > > > struct tcf_ct_params { > > + struct nf_conntrack_helper *helper; > > struct nf_conn *tmpl; > > u16 zone; > > > > diff --git a/include/uapi/linux/tc_act/tc_ct.h > > b/include/uapi/linux/tc_act/tc_ct.h > > index 5fb1d7ac1027..6c5200f0ed38 100644 > > --- a/include/uapi/linux/tc_act/tc_ct.h > > +++ b/include/uapi/linux/tc_act/tc_ct.h > > @@ -22,6 +22,9 @@ enum { > > TCA_CT_NAT_PORT_MIN,/* be16 */ > > TCA_CT_NAT_PORT_MAX,/* be16 */ > > TCA_CT_PAD, > > + TCA_CT_HELPER_NAME, /* string */ > > + TCA_CT_HELPER_FAMILY, /* u8 */ > > + TCA_CT_HELPER_PROTO,/* u8 */ > > __TCA_CT_MAX > > }; > > > > diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c > > index 193a460a9d7f..f237c27079db 100644 > > --- a/net/sched/act_ct.c > > +++ b/net/sched/act_ct.c > > @@ -33,6 +33,7 @@ > > #include > > #include > > #include > > +#include > > #include > > > > static struct workqueue_struct *act_ct_wq; > > @@ -655,7 +656,7 @@ struct tc_ct_action_net { > > > > /* Determine whether skb->_nfct is equal to the result of conntrack > > lookup. */ > > static bool tcf_ct_skb_nfct_cached(struct net *net, struct sk_buff *skb, > > -u16 zone_id, bool force) > > +struct tcf_ct_params *p, bool force) > > { > > enum ip_conntrack_info ctinfo; > > struct nf_conn *ct; > > @@ -665,8 +666,15 @@ static bool tcf_ct_skb_nfct_cached(struct net *net, > > struct sk_buff *skb, > > return false; > > if (!net_eq(net, read_pnet(>ct_net))) > > goto drop_ct; > > - if (nf_ct_zone(ct)->id != zone_id) > > + if (nf_ct_zone(ct)->id != p->zone) > > goto drop_ct; > > + if (p->helper) { > > + struct nf_conn_help *help; > > + > > + help = nf_ct_ext_find(ct, NF_CT_EXT_HELPER); > > + if (help && rcu_access_pointer(help->helper) != p->helper) > > + goto drop_ct; > > + } > > > > /* Force conntrack entry direction. */ > > if (force && CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) { > > @@ -832,6 +840,13 @@ static int tcf_ct_handle_fragments(struct net *net, > > struct sk_buff *skb, > > > > static void tcf_ct_params_free(struct tcf_ct_params *params) > > { > > + if (params->helper) { > > +#if IS_ENABLED(CONFIG_NF_NAT) > > + if (params->ct_action & TCA_CT_ACT_NAT) > > + nf_nat_helper_put(params->helper); > > +#endif > > + nf_conntrack_helper_put(params->helper); > > + } > > There is exactly the same code chunk in __ovs_ct_free_action(), I guess > you can extract a common helper here, too. > > > if (params->ct_ft) > > tcf_ct_flow_table_put(params->ct_ft); > > if (params->tmpl) > > @@ -1033,6 +1048,7 @@ static int tcf_ct_act(struct sk_buff *skb, const > > struct tc_action *a, > > struct nf_hook_state state; > > int nh_ofs, err, retval; > > struct tcf_ct_params *p; > > + bool add_helper = false; > > bool skip_add = false; > > bool defrag = false; > > struct nf_conn *c
[ovs-dev] [PATCH net-next 3/3] net: sched: add helper support in act_ct
This patch is to add helper support in act_ct for OVS actions=ct(alg=xxx) offloading, which is corresponding to Commit cae3a2627520 ("openvswitch: Allow attaching helpers to ct action") in OVS kernel part. The difference is when adding TC actions family and proto cannot be got from the filter/match, other than helper name in tb[TCA_CT_HELPER_NAME], we also need to send the family in tb[TCA_CT_HELPER_FAMILY] and the proto in tb[TCA_CT_HELPER_PROTO] to kernel. Signed-off-by: Xin Long --- include/net/tc_act/tc_ct.h| 1 + include/uapi/linux/tc_act/tc_ct.h | 3 + net/sched/act_ct.c| 113 -- 3 files changed, 112 insertions(+), 5 deletions(-) diff --git a/include/net/tc_act/tc_ct.h b/include/net/tc_act/tc_ct.h index 8250d6f0a462..b24ea2d9400b 100644 --- a/include/net/tc_act/tc_ct.h +++ b/include/net/tc_act/tc_ct.h @@ -10,6 +10,7 @@ #include struct tcf_ct_params { + struct nf_conntrack_helper *helper; struct nf_conn *tmpl; u16 zone; diff --git a/include/uapi/linux/tc_act/tc_ct.h b/include/uapi/linux/tc_act/tc_ct.h index 5fb1d7ac1027..6c5200f0ed38 100644 --- a/include/uapi/linux/tc_act/tc_ct.h +++ b/include/uapi/linux/tc_act/tc_ct.h @@ -22,6 +22,9 @@ enum { TCA_CT_NAT_PORT_MIN,/* be16 */ TCA_CT_NAT_PORT_MAX,/* be16 */ TCA_CT_PAD, + TCA_CT_HELPER_NAME, /* string */ + TCA_CT_HELPER_FAMILY, /* u8 */ + TCA_CT_HELPER_PROTO,/* u8 */ __TCA_CT_MAX }; diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index 193a460a9d7f..f237c27079db 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -33,6 +33,7 @@ #include #include #include +#include #include static struct workqueue_struct *act_ct_wq; @@ -655,7 +656,7 @@ struct tc_ct_action_net { /* Determine whether skb->_nfct is equal to the result of conntrack lookup. */ static bool tcf_ct_skb_nfct_cached(struct net *net, struct sk_buff *skb, - u16 zone_id, bool force) + struct tcf_ct_params *p, bool force) { enum ip_conntrack_info ctinfo; struct nf_conn *ct; @@ -665,8 +666,15 @@ static bool tcf_ct_skb_nfct_cached(struct net *net, struct sk_buff *skb, return false; if (!net_eq(net, read_pnet(>ct_net))) goto drop_ct; - if (nf_ct_zone(ct)->id != zone_id) + if (nf_ct_zone(ct)->id != p->zone) goto drop_ct; + if (p->helper) { + struct nf_conn_help *help; + + help = nf_ct_ext_find(ct, NF_CT_EXT_HELPER); + if (help && rcu_access_pointer(help->helper) != p->helper) + goto drop_ct; + } /* Force conntrack entry direction. */ if (force && CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) { @@ -832,6 +840,13 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, static void tcf_ct_params_free(struct tcf_ct_params *params) { + if (params->helper) { +#if IS_ENABLED(CONFIG_NF_NAT) + if (params->ct_action & TCA_CT_ACT_NAT) + nf_nat_helper_put(params->helper); +#endif + nf_conntrack_helper_put(params->helper); + } if (params->ct_ft) tcf_ct_flow_table_put(params->ct_ft); if (params->tmpl) @@ -1033,6 +1048,7 @@ static int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a, struct nf_hook_state state; int nh_ofs, err, retval; struct tcf_ct_params *p; + bool add_helper = false; bool skip_add = false; bool defrag = false; struct nf_conn *ct; @@ -1086,7 +1102,7 @@ static int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a, * actually run the packet through conntrack twice unless it's for a * different zone. */ - cached = tcf_ct_skb_nfct_cached(net, skb, p->zone, force); + cached = tcf_ct_skb_nfct_cached(net, skb, p, force); if (!cached) { if (tcf_ct_flow_table_lookup(p, skb, family)) { skip_add = true; @@ -1119,6 +1135,22 @@ static int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a, if (err != NF_ACCEPT) goto drop; + if (commit && p->helper && !nfct_help(ct)) { + err = __nf_ct_try_assign_helper(ct, p->tmpl, GFP_ATOMIC); + if (err) + goto drop; + add_helper = true; + if (p->ct_action & TCA_CT_ACT_NAT && !nfct_seqadj(ct)) { + if (!nfct_seqadj_ext_add(ct)) + return -EINVAL; + } + } + + if (nf_ct_is_confirmed(ct) ? ((!cached && !skip_add) || add_helper) : commit) { +
[ovs-dev] [PATCH net-next 2/3] net: sched: call tcf_ct_params_free to free params in tcf_ct_init
This patch is to make the err path simple by calling tcf_ct_params_free(), so that it won't cause problems when more members are added into param and need freeing on the err path. Signed-off-by: Xin Long --- net/sched/act_ct.c | 35 ++- 1 file changed, 18 insertions(+), 17 deletions(-) diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index b38d91d6b249..193a460a9d7f 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -345,11 +345,9 @@ static void tcf_ct_flow_table_cleanup_work(struct work_struct *work) module_put(THIS_MODULE); } -static void tcf_ct_flow_table_put(struct tcf_ct_params *params) +static void tcf_ct_flow_table_put(struct tcf_ct_flow_table *ct_ft) { - struct tcf_ct_flow_table *ct_ft = params->ct_ft; - - if (refcount_dec_and_test(>ct_ft->ref)) { + if (refcount_dec_and_test(_ft->ref)) { rhashtable_remove_fast(_ht, _ft->node, zones_params); INIT_RCU_WORK(_ft->rwork, tcf_ct_flow_table_cleanup_work); queue_rcu_work(act_ct_wq, _ft->rwork); @@ -832,18 +830,23 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, return err; } -static void tcf_ct_params_free(struct rcu_head *head) +static void tcf_ct_params_free(struct tcf_ct_params *params) { - struct tcf_ct_params *params = container_of(head, - struct tcf_ct_params, rcu); - - tcf_ct_flow_table_put(params); - + if (params->ct_ft) + tcf_ct_flow_table_put(params->ct_ft); if (params->tmpl) nf_ct_put(params->tmpl); kfree(params); } +static void tcf_ct_params_free_rcu(struct rcu_head *head) +{ + struct tcf_ct_params *params; + + params = container_of(head, struct tcf_ct_params, rcu); + tcf_ct_params_free(params); +} + #if IS_ENABLED(CONFIG_NF_NAT) /* Modelled after nf_nat_ipv[46]_fn(). * range is only used for new, uninitialized NAT state. @@ -1390,7 +1393,7 @@ static int tcf_ct_init(struct net *net, struct nlattr *nla, err = tcf_ct_flow_table_get(net, params); if (err) - goto cleanup_params; + goto cleanup; spin_lock_bh(>tcf_lock); goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch); @@ -1401,17 +1404,15 @@ static int tcf_ct_init(struct net *net, struct nlattr *nla, if (goto_ch) tcf_chain_put_by_act(goto_ch); if (params) - call_rcu(>rcu, tcf_ct_params_free); + call_rcu(>rcu, tcf_ct_params_free_rcu); return res; -cleanup_params: - if (params->tmpl) - nf_ct_put(params->tmpl); cleanup: if (goto_ch) tcf_chain_put_by_act(goto_ch); - kfree(params); + if (params) + tcf_ct_params_free(params); tcf_idr_release(*a, bind); return err; } @@ -1423,7 +1424,7 @@ static void tcf_ct_cleanup(struct tc_action *a) params = rcu_dereference_protected(c->params, 1); if (params) - call_rcu(>rcu, tcf_ct_params_free); + call_rcu(>rcu, tcf_ct_params_free_rcu); } static int tcf_ct_dump_key_val(struct sk_buff *skb, -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next 1/3] net: move the helper function to nf_conntrack_helper for ovs and tc
Move ovs_ct_helper from openvswitch to nf_conntrack_helper and rename as nf_ct_helper so that it can be used in TC act_ct in the next patch. Note that it also adds the checks for the family and proto, as in TC act_ct, the packets with correct family and proto are not guaranteed. Signed-off-by: Xin Long --- include/net/netfilter/nf_conntrack_helper.h | 2 + net/netfilter/nf_conntrack_helper.c | 71 + net/openvswitch/conntrack.c | 61 +- 3 files changed, 74 insertions(+), 60 deletions(-) diff --git a/include/net/netfilter/nf_conntrack_helper.h b/include/net/netfilter/nf_conntrack_helper.h index 9939c366f720..6c32e59fc16f 100644 --- a/include/net/netfilter/nf_conntrack_helper.h +++ b/include/net/netfilter/nf_conntrack_helper.h @@ -115,6 +115,8 @@ struct nf_conn_help *nf_ct_helper_ext_add(struct nf_conn *ct, gfp_t gfp); int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl, gfp_t flags); +int nf_ct_helper(struct sk_buff *skb, u16 proto); + void nf_ct_helper_destroy(struct nf_conn *ct); static inline struct nf_conn_help *nfct_help(const struct nf_conn *ct) diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c index ff737a76052e..83615e479f87 100644 --- a/net/netfilter/nf_conntrack_helper.c +++ b/net/netfilter/nf_conntrack_helper.c @@ -26,7 +26,9 @@ #include #include #include +#include #include +#include static DEFINE_MUTEX(nf_ct_helper_mutex); struct hlist_head *nf_ct_helper_hash __read_mostly; @@ -240,6 +242,75 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl, } EXPORT_SYMBOL_GPL(__nf_ct_try_assign_helper); +/* 'skb' should already be pulled to nh_ofs. */ +int nf_ct_helper(struct sk_buff *skb, u16 proto) +{ + const struct nf_conntrack_helper *helper; + const struct nf_conn_help *help; + enum ip_conntrack_info ctinfo; + unsigned int protoff; + struct nf_conn *ct; + int err; + + ct = nf_ct_get(skb, ); + if (!ct || ctinfo == IP_CT_RELATED_REPLY) + return NF_ACCEPT; + + help = nfct_help(ct); + if (!help) + return NF_ACCEPT; + + helper = rcu_dereference(help->helper); + if (!helper) + return NF_ACCEPT; + + if (helper->tuple.src.l3num != NFPROTO_UNSPEC && + helper->tuple.src.l3num != proto) + return NF_ACCEPT; + + switch (proto) { + case NFPROTO_IPV4: + protoff = ip_hdrlen(skb); + proto = ip_hdr(skb)->protocol; + break; + case NFPROTO_IPV6: { + u8 nexthdr = ipv6_hdr(skb)->nexthdr; + __be16 frag_off; + int ofs; + + ofs = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), , + _off); + if (ofs < 0 || (frag_off & htons(~0x7)) != 0) { + pr_debug("proto header not found\n"); + return NF_ACCEPT; + } + protoff = ofs; + proto = nexthdr; + break; + } + default: + WARN_ONCE(1, "helper invoked on non-IP family!"); + return NF_DROP; + } + + if (helper->tuple.dst.protonum != proto) + return NF_ACCEPT; + + err = helper->help(skb, protoff, ct, ctinfo); + if (err != NF_ACCEPT) + return err; + + /* Adjust seqs after helper. This is needed due to some helpers (e.g., +* FTP with NAT) adusting the TCP payload size when mangling IP +* addresses and/or port numbers in the text-based control connection. +*/ + if (test_bit(IPS_SEQ_ADJUST_BIT, >status) && + !nf_ct_seq_adjust(skb, ct, ctinfo, protoff)) + return NF_DROP; + return NF_ACCEPT; +} +EXPORT_SYMBOL_GPL(nf_ct_helper); + /* appropriate ct lock protecting must be taken by caller */ static int unhelp(struct nf_conn *ct, void *me) { diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index cb255d8ed99a..75e709303815 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -434,65 +434,6 @@ static int ovs_ct_set_labels(struct nf_conn *ct, struct sw_flow_key *key, return 0; } -/* 'skb' should already be pulled to nh_ofs. */ -static int ovs_ct_helper(struct sk_buff *skb, u16 proto) -{ - const struct nf_conntrack_helper *helper; - const struct nf_conn_help *help; - enum ip_conntrack_info ctinfo; - unsigned int protoff; - struct nf_conn *ct; - int err; - - ct = nf_ct_get(skb, ); - if (!ct || ctinfo == IP_CT_RELATED_REPLY) - return NF_ACCEPT; - - help = nfct_help(ct); - if (!help) - return NF_ACCEPT; - - helper = rcu_dereference(h
[ovs-dev] [PATCH net-next 0/3] net: add helper support in tc act_ct for ovs offloading
Ilya reported an issue that FTP traffic would be broken when the OVS flow with ct(commit,alg=ftp) installed in the OVS kernel module, and it was caused by that TC didn't support the ftp helper offloaded from OVS. This patchset is to add the helper support in act_ct for ovs offloading in kernel net/sched. The 1st patch moves some code from openvswitch into nf_conntrack_helper so that it can be used by net/sched in the 3rd patch (Note there are still some other common code used in both OVS and TC and I will extract it in other patches). The 2nd patch extracts another function in net/ sched to make the 3rd patch easier to write. The user space part will be added in another patch, and with it these OVS flows (FTP over SNAT) can be used to test this feature: table=0, in_port=veth1,tcp,tcp_dst=2121,ct_state=-trk \ actions=ct(table=1, nat), normal table=0, in_port=veth2,tcp,ct_state=-trk actions=ct(table=1, nat) table=0, in_port=veth1,tcp,ct_state=-trk actions=ct(table=0, nat) table=0, in_port=veth1,tcp,ct_state=+trk+rel actions=ct(commit, nat),normal table=0, in_port=veth1,tcp,ct_state=+trk+est actions=veth2" table=1, in_port=veth1,tcp,tcp_dst=2121,ct_state=+trk+new \ actions=ct(commit, nat(src=7.7.16.1), alg=ftp),normal" table=1, in_port=veth1,tcp,tcp_dst=2121,ct_state=+trk+est actions=veth2" table=1, in_port=veth2,tcp,ct_state=+trk+est actions=veth1" Xin Long (3): net: move the helper function to nf_conntrack_helper for ovs and tc net: sched: call tcf_ct_params_free to free params in tcf_ct_init net: sched: add helper support in act_ct include/net/netfilter/nf_conntrack_helper.h | 2 + include/net/tc_act/tc_ct.h | 1 + include/uapi/linux/tc_act/tc_ct.h | 3 + net/netfilter/nf_conntrack_helper.c | 71 ++ net/openvswitch/conntrack.c | 61 +--- net/sched/act_ct.c | 148 +--- 6 files changed, 204 insertions(+), 82 deletions(-) -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev