On Fri, Apr 12, 2019 at 10:23 AM Yifeng Sun <[email protected]> wrote:
>
> This patch introduces changes needed by OVS to support latest
> Linux kernels (4.19.x and 4.20.x). Recent kernels changed many
> APIs that are being used by OVS. One major change is that
> struct nf_conntrack_l3proto became invisible outside of kernel, so
> get_l4proto function is added in file compact/nf_conntrack_core.c to
> accommodate this issue.
>
> In addition, if kernel is not compiled with CONFIG_NF_NAT_IPV4
> or CONFIG_NF_NAT_IPV6, flow action 'ct(nat)' can cause kernel
> to crash. This patch handles this condition.
>
> This patch doesn't introduce new failed tests when running
> 'make check-kmod' for kernels listed below:
>     3.10.0-957.5.1.el7.x86_64
>     4.4.0-142-generic
>     4.17.14
>     4.18.0-16-generic
>     4.19.34
>     4.20.17
>
> Travis passed at
> https://travis-ci.org/yifsun/ovs-travis/builds/519011670
>
> Signed-off-by: Yifeng Sun <[email protected]>
> v1->v2: Fixed the CONFIG_NF_NAT_IPV4 bug by using Greg's config
>         file. Thanks Greg!
> ---
Hi Yifeng,

Thanks for the patch.

I think this patch mixes a couple upstream patches backport, 4.19,
4.20 compilation issues, and the nf_nat issue together so that it may
be hard to keep track of the kernel backport.  IMHO, it would be
easier to break this patch down to a couple of them, so that it would
be easier to maintain and review. My detailed comments are as below.

> diff --git a/datapath/conntrack.c b/datapath/conntrack.c
> index 52208bad3029..ce36a8ddea50 100644
> --- a/datapath/conntrack.c
> +++ b/datapath/conntrack.c
> @@ -38,6 +38,10 @@
>  #include <net/netfilter/nf_nat_l3proto.h>
>  #endif
>
> +#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6) && defined(HAVE_IPV6_FRAG_H)
> +#include <net/ipv6_frag.h>
> +#endif
> +
I think this is related to an upstream change 70b095c843266 ("ipv6:
remove dependency of nf_defrag_ipv6 on ipv6 module"). Should we split
this out in anther patch? We may be able to hide the following in the
compat layer.
+#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6) && defined(HAVE_IPV6_FRAG_H)
+#endif


>  #include "datapath.h"
>  #include "conntrack.h"
>  #include "flow.h"
> @@ -645,32 +649,62 @@ static struct nf_conn *
>  ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
>                      u8 l3num, struct sk_buff *skb, bool natted)
>  {
> -       const struct nf_conntrack_l3proto *l3proto;
>         const struct nf_conntrack_l4proto *l4proto;
>         struct nf_conntrack_tuple tuple;
>         struct nf_conntrack_tuple_hash *h;
>         struct nf_conn *ct;
> -       unsigned int dataoff;
>         u8 protonum;
>
> +#ifdef HAVE_NF_CT_INVERT_TUPLE_TAKES_L3PROTO
> +       const struct nf_conntrack_l3proto *l3proto;
> +       unsigned int dataoff;
> +
>         l3proto = __nf_ct_l3proto_find(l3num);
>         if (l3proto->get_l4proto(skb, skb_network_offset(skb), &dataoff,
>                                  &protonum) <= 0) {
>                 pr_debug("ovs_ct_find_existing: Can't get protonum\n");
>                 return NULL;
>         }
> +#else
> +       int protooff;
> +
> +       protooff = get_l4proto(skb, skb_network_offset(skb),
> +                              l3num, &protonum);
> +       if (protooff <= 0) {
> +               pr_warn("ovs_ct_find_existing: Can't get protonum\n");
> +               return NULL;
> +       }
> +#endif
> +
> +#ifdef HAVE_NF_CT_L4PROTO_FIND_TAKES_L3PROTO
>         l4proto = __nf_ct_l4proto_find(l3num, protonum);
> -       if (!nf_ct_get_tuple(skb, skb_network_offset(skb), dataoff, l3num,
> -                            protonum, net, &tuple, l3proto, l4proto)) {
> +#else
> +       l4proto = __nf_ct_l4proto_find(protonum);
> +#endif
> +
> +#ifdef HAVE_NF_CT_GET_TUPLE
> +       if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
> +                                      l3num, net, &tuple)) {
> +               pr_debug("ovs_ct_find_existing: Can't get tuple\n");
> +               return NULL;
> +       }
> +#else
> +       if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
> +                                      l3num, net, &tuple)) {
>                 pr_debug("ovs_ct_find_existing: Can't get tuple\n");
>                 return NULL;
>         }
> +#endif
>
>         /* Must invert the tuple if skb has been transformed by NAT. */
>         if (natted) {
>                 struct nf_conntrack_tuple inverse;
>
> +#ifdef HAVE_NF_CT_INVERT_TUPLE_TAKES_L3PROTO
>                 if (!nf_ct_invert_tuple(&inverse, &tuple, l3proto, l4proto)) {
> +#else
> +               if (!nf_ct_invert_tuple(&inverse, &tuple, l4proto)) {
> +#endif
>                         pr_debug("ovs_ct_find_existing: Inversion failed!\n");
>                         return NULL;
>                 }
The changes in ovs_ct_find_existing() is due to upstream commit
60e3be94e6a ("openvswitch: use nf_ct_get_tuplepr, invert_tuplepr").
Can we split it out as a separate patch?

>From the upstream commit 60e3be94e6a, instead of using
nf_ct_get_tuple() it invokes nf_ct_get_tuplepr(), and it looks like
nf_ct_get_tuplepr() was available in quite old kernel (at least
2.6.26), and it gets updated to add network namespace support in
a31f1adc09489 ("netfilter: nf_conntrack: Add a struct net parameter to
l4_pkt_to_tuple"). Can we see if we can replace nf_ct_get_tuple() to
nf_ct_get_tuplepr() to avoid the above #ifde #else #endif logic.


> @@ -989,6 +1023,9 @@ static int __ovs_ct_lookup(struct net *net, struct 
> sw_flow_key *key,
>         if (!cached) {
>                 struct nf_conn *tmpl = info->ct;
>                 int err;
> +#ifndef HAVE_NF_CONNTRACK_IN_TAKES_NET
> +               struct nf_hook_state state = {};
> +#endif
>
>                 /* Associate skb with specified zone. */
>                 if (tmpl) {
> @@ -998,8 +1035,15 @@ static int __ovs_ct_lookup(struct net *net, struct 
> sw_flow_key *key,
>                         nf_ct_set(skb, tmpl, IP_CT_NEW);
>                 }
>
> +#ifdef HAVE_NF_CONNTRACK_IN_TAKES_NET
>                 err = nf_conntrack_in(net, info->family,
>                                       NF_INET_PRE_ROUTING, skb);
> +#else
> +               state.hook = NF_INET_PRE_ROUTING,
> +               state.pf = info->family,
> +               state.net = net,
> +               err = nf_conntrack_in(skb, &state);
> +#endif
>                 if (err != NF_ACCEPT)
>                         return -ENOENT;
>

The changes in __ovs_ct_lookup() is related to 93e66024b024
("netfilter: conntrack: pass nf_hook_state to packet and error
handlers").  In general, we would like to sychronize our
./datapatch/*.c code to be as similar as to the upstream
./net/openvswitch/*.c. In this case, We can try to hide the #if #else
#endif in the compat layer in ./datapath/linux/compat/

Here is an example.  I only tested it on 4.4 kernel, it may need to be
tested on other kernels.

diff --git a/acinclude.m4 b/acinclude.m4
index 3cd6ea7302d5..d6cfbd54e357 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -675,6 +675,9 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [
                   [nf_ct_set])
   OVS_GREP_IFELSE([$KSRC/include/net/netfilter/nf_conntrack.h],
                   [nf_ct_is_untracked])
+  OVS_FIND_PARAM_IFELSE([$KSRC/include/net/netfilter/nf_conntrack_core.h],
+                  [nf_conntrack_in], [u_int8_t pf],
+                  [OVS_DEFINE([HAVE_NF_CONNTRACK_IN_PF])])
   OVS_GREP_IFELSE([$KSRC/include/net/netfilter/nf_conntrack_zones.h],
                   [nf_ct_zone_init])
   OVS_GREP_IFELSE([$KSRC/include/net/netfilter/nf_conntrack_l3proto.h],
diff --git a/datapath/conntrack.c b/datapath/conntrack.c
index a7dc9e0c3513..5a97f913f0b2 100644
--- a/datapath/conntrack.c
+++ b/datapath/conntrack.c
@@ -987,6 +987,11 @@ static int __ovs_ct_lookup(struct net *net,
struct sw_flow_key *key,
        struct nf_conn *ct;

        if (!cached) {
+               struct nf_hook_state state = {
+                       .hook = NF_INET_PRE_ROUTING,
+                       .pf = info->family,
+                       .net = net,
+               };
                struct nf_conn *tmpl = info->ct;
                int err;

@@ -998,8 +1003,7 @@ static int __ovs_ct_lookup(struct net *net,
struct sw_flow_key *key,
                        nf_ct_set(skb, tmpl, IP_CT_NEW);
                }

-               err = nf_conntrack_in(net, info->family,
-                                     NF_INET_PRE_ROUTING, skb);
+               err = nf_conntrack_in(skb, &state);
                if (err != NF_ACCEPT)
                        return -ENOENT;

diff --git a/datapath/linux/compat/include/net/netfilter/nf_conntrack_core.h
b/datapath/linux/compat/include/net/netfilter/nf_conntrack_core.h
index 7834c8c25f79..b05a5beda3cc 100644
--- a/datapath/linux/compat/include/net/netfilter/nf_conntrack_core.h
+++ b/datapath/linux/compat/include/net/netfilter/nf_conntrack_core.h
@@ -104,4 +104,14 @@ static inline bool rpl_nf_ct_delete(struct
nf_conn *ct, u32 portid, int report)
 #define nf_ct_delete rpl_nf_ct_delete
 #endif /* HAVE_NF_CONN_TIMER */

+#ifdef HAVE_NF_CONNTRACK_IN_PF
+
+static inline bool rpl_nf_conntrack_in(struct sk_buff *skb,
+                                       struct nf_hook_state *state)
+{
+    return nf_conntrack_in(state->net, state->pf, state->hook, skb);
+}
+#define nf_conntrack_in rpl_nf_conntrack_in
+#endif /* HAVE_NF_CONNTRACK_IN_PF */
+
 #endif /* _NF_CONNTRACK_CORE_WRAPPER_H */


> @@ -1307,9 +1351,17 @@ int ovs_ct_execute(struct net *net, struct sk_buff 
> *skb,
>  {
>         int nh_ofs;
>         int err;
> +       /* From kernel 4.19.0+, Function handle_fragments may shrink skb's
> +        * headroom, which will result in loss of ethernet header data.
> +        * This buf is used to backup the header data before calling
> +        * handle_fragments. */
> +       char buf[32];
>
>         /* The conntrack module expects to be working at L3. */
>         nh_ofs = skb_network_offset(skb);
> +       if (nh_ofs > sizeof(buf))
> +               return -EINVAL;
> +       memcpy(buf, skb->data, nh_ofs);
>         skb_pull_rcsum(skb, nh_ofs);
>
>         err = ovs_skb_network_trim(skb);
> @@ -1326,8 +1378,16 @@ int ovs_ct_execute(struct net *net, struct sk_buff 
> *skb,
>                 err = ovs_ct_commit(net, key, info, skb);
>         else
>                 err = ovs_ct_lookup(net, key, info, skb);
> +       if (err)
> +               return err;
>
> +       if (skb_headroom(skb) < nh_ofs) {
> +               err = pskb_expand_head(skb, nh_ofs, 0, GFP_ATOMIC);
> +               if (err)
> +                       return err;
> +       }
>         skb_push(skb, nh_ofs);
> +       memcpy(skb->data, buf, nh_ofs);
>         skb_postpush_rcsum(skb, skb->data, nh_ofs);
>         if (err)
>                 kfree_skb(skb);
The change in ovs_ct_execute() looks like a bug fix in upstream
kernel. According to our backport policy,
http://docs.openvswitch.org/en/latest/internals/contributing/backporting-patches/
 Please upstream it to net-next before bring it back to datapath.



> @@ -1362,7 +1422,11 @@ static int ovs_ct_add_helper(struct ovs_conntrack_info 
> *info, const char *name,
>                 return -EINVAL;
>         }
>
> +#ifdef HAVE_NF_CT_HELPER_EXT_ADD_TAKES_HELPER
>         help = nf_ct_helper_ext_add(info->ct, helper, GFP_KERNEL);
> +#else
> +       help = nf_ct_helper_ext_add(info->ct, GFP_KERNEL);
> +#endif
>         if (!help) {
>                 nf_conntrack_helper_put(helper);
>                 return -ENOMEM;
The change here is related to upstream patch 440534d3c56b ("netfilter:
Remove useless param helper of nf_ct_helper_ext_add"). Can you try to
hide the #if #else #endif logic in the compat layer as the example in
__ovs_ct_lookup().


> @@ -1387,6 +1451,20 @@ static int parse_nat(const struct nlattr *attr,
>         bool have_proto_max = false;
>         bool ip_vers = (info->family == NFPROTO_IPV6);
>
> +#ifndef CONFIG_NF_NAT_IPV4
> +       if (info->family == NFPROTO_IPV4) {
> +               OVS_NLERR(log, "Flow action ct(nat) not supported without 
> nf_nat_ipv4");
> +               return -ENOTSUPP;
> +       }
> +#endif
> +
> +#ifndef CONFIG_NF_NAT_IPV6
> +       if (info->family == NFPROTO_IPV6) {
> +               OVS_NLERR(log, "Flow action ct(nat) not supported without 
> nf_nat_ipv6");
> +               return -ENOTSUPP;
> +        }
> +#endif
> +
>         nla_for_each_nested(a, attr, rem) {
>                 static const int ovs_nat_attr_lens[OVS_NAT_ATTR_MAX + 1][2] = 
> {
>                         [OVS_NAT_ATTR_SRC] = {0, 0},
Is this something that would happen in the upstream kernel? If this is
the case, we should upstream that before backport it to datapath.


I did not review the following compat code since they may need to
change accroding with different backport approach.

> diff --git a/datapath/linux/compat/include/net/netfilter/nf_conntrack_core.h 
> b/datapath/linux/compat/include/net/netfilter/nf_conntrack_core.h
> index 7834c8c25f79..7fca7dc551c8 100644

Thanks,

-Yi-Hung
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to