Re: [PATCH V4 net 2/3] tuntap: disable preemption during XDP processing
From: Jason Wang
Date: Sat, 24 Feb 2018 11:32:25 +0800

> Except for tuntap, all other drivers' XDP was implemented at NAPI
> poll() routine in a bh. This guarantees all XDP operation were done at
> the same CPU which is required by e.g BFP_MAP_TYPE_PERCPU_ARRAY. But
> for tuntap, we do it in process context and we try to protect XDP
> processing by RCU reader lock. This is insufficient since
> CONFIG_PREEMPT_RCU can preempt the RCU reader critical section which
> breaks the assumption that all XDP were processed in the same CPU.
>
> Fixing this by simply disabling preemption during XDP processing.
>
> Fixes: 761876c857cb ("tap: XDP support")
> Signed-off-by: Jason Wang

Applied.
Re: [PATCH V4 net 2/3] tuntap: disable preemption during XDP processing
On 2018-02-26 19:02, Jesper Dangaard Brouer wrote:
> On Sat, 24 Feb 2018 11:32:25 +0800 Jason Wang wrote:
>
>> Except for tuntap, all other drivers' XDP was implemented at NAPI
>> poll() routine in a bh. This guarantees all XDP operation were done at
>> the same CPU which is required by e.g BFP_MAP_TYPE_PERCPU_ARRAY. But
>
> There is a typo in the defined name "BFP_MAP_TYPE_PERCPU_ARRAY".
> Besides it is NOT a requirement that comes from the map type
> BPF_MAP_TYPE_PERCPU_ARRAY.

But it looks to me that bpf_array uses a percpu structure, e.g. in
percpu_array_map_lookup_elem() it tries to access it through
this_cpu_ptr().

> The requirement comes from the bpf_redirect_map helper (and only
> partly devmap + cpumap types), as the BPF helper/program stores
> information in the per-cpu redirect_info struct (see filter.c), that
> is used by xdp_do_redirect() and xdp_do_flush_map().
>
>  struct redirect_info {
>          u32 ifindex;
>          u32 flags;
>          struct bpf_map *map;
>          struct bpf_map *map_to_flush;
>          unsigned long map_owner;
>  };
>  static DEFINE_PER_CPU(struct redirect_info, redirect_info);
>  [...]
>  void xdp_do_flush_map(void)
>  {
>          struct redirect_info *ri = this_cpu_ptr(&redirect_info);
>          struct bpf_map *map = ri->map_to_flush;
>  [...]
>
> Notice the same redirect_info is used by the TC clsbpf system...
>
>> for tuntap, we do it in process context and we try to protect XDP
>> processing by RCU reader lock. This is insufficient since
>> CONFIG_PREEMPT_RCU can preempt the RCU reader critical section which
>> breaks the assumption that all XDP were processed in the same CPU.
>>
>> Fixing this by simply disabling preemption during XDP processing.
>
> I guess this could paper over the problem... But I generally find it
> problematic that tuntap is not invoking XDP from the NAPI poll()
> routine in BH-context, as that context provides us with some
> protection that allows certain kinds of optimizations (like this
> flush API). I hope this will not limit us in the future, if the
> tuntap driver violates the XDP call context.
Good to see tuntap is on the radar :), it is easily forgotten. I would
be glad to test anything new in XDP for tuntap.

But I do not see anything that prevents us from having an environment
similar to the one NAPI poll() provides. I admit the flush is
inefficient now, but that does not mean we can't solve it in the
future. E.g. for the flush, I plan to introduce a batching API which
can accept an array of skb/XDP pointers in its sendmsg(). Then we can
do a more efficient flush.

Note, tuntap supports NAPI (IFF_NAPI), but the main use case is kernel
rx path hardening. Unless rx batching is enabled, it would be slower
than non-NAPI mode. Technically, it can support XDP but needs more
work (e.g. working at the level of XDP buffers instead of skbs).

I tend to do fixes or optimizations on the current code unless we find
a real blocker.

>> Fixes: 761876c857cb ("tap: XDP support")
>
> $ git describe --contains 761876c857cb
> v4.14-rc1~130^2~270^2

So please let me know if you're ok with the fix.

Thanks
Re: [PATCH V4 net 2/3] tuntap: disable preemption during XDP processing
On Sat, 24 Feb 2018 11:32:25 +0800 Jason Wang wrote:

> Except for tuntap, all other drivers' XDP was implemented at NAPI
> poll() routine in a bh. This guarantees all XDP operation were done at
> the same CPU which is required by e.g BFP_MAP_TYPE_PERCPU_ARRAY. But

There is a typo in the defined name "BFP_MAP_TYPE_PERCPU_ARRAY".

Besides it is NOT a requirement that comes from the map type
BPF_MAP_TYPE_PERCPU_ARRAY. The requirement comes from the
bpf_redirect_map helper (and only partly devmap + cpumap types), as
the BPF helper/program stores information in the per-cpu redirect_info
struct (see filter.c), that is used by xdp_do_redirect() and
xdp_do_flush_map().

 struct redirect_info {
         u32 ifindex;
         u32 flags;
         struct bpf_map *map;
         struct bpf_map *map_to_flush;
         unsigned long map_owner;
 };
 static DEFINE_PER_CPU(struct redirect_info, redirect_info);
 [...]
 void xdp_do_flush_map(void)
 {
         struct redirect_info *ri = this_cpu_ptr(&redirect_info);
         struct bpf_map *map = ri->map_to_flush;
 [...]

Notice the same redirect_info is used by the TC clsbpf system...

> for tuntap, we do it in process context and we try to protect XDP
> processing by RCU reader lock. This is insufficient since
> CONFIG_PREEMPT_RCU can preempt the RCU reader critical section which
> breaks the assumption that all XDP were processed in the same CPU.
>
> Fixing this by simply disabling preemption during XDP processing.

I guess this could paper over the problem... But I generally find it
problematic that tuntap is not invoking XDP from the NAPI poll()
routine in BH-context, as that context provides us with some
protection that allows certain kinds of optimizations (like this
flush API). I hope this will not limit us in the future, if the
tuntap driver violates the XDP call context.

> Fixes: 761876c857cb ("tap: XDP support")

$ git describe --contains 761876c857cb
v4.14-rc1~130^2~270^2

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
[PATCH V4 net 2/3] tuntap: disable preemption during XDP processing
Except for tuntap, all other drivers' XDP was implemented at NAPI
poll() routine in a bh. This guarantees all XDP operation were done at
the same CPU which is required by e.g BFP_MAP_TYPE_PERCPU_ARRAY. But
for tuntap, we do it in process context and we try to protect XDP
processing by RCU reader lock. This is insufficient since
CONFIG_PREEMPT_RCU can preempt the RCU reader critical section which
breaks the assumption that all XDP were processed in the same CPU.

Fixing this by simply disabling preemption during XDP processing.

Fixes: 761876c857cb ("tap: XDP support")
Signed-off-by: Jason Wang
---
 drivers/net/tun.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 2823a4a..63d39fe6 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1642,6 +1642,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 	else
 		*skb_xdp = 0;
 
+	preempt_disable();
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(tun->xdp_prog);
 	if (xdp_prog && !*skb_xdp) {
@@ -1665,6 +1666,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 			if (err)
 				goto err_redirect;
 			rcu_read_unlock();
+			preempt_enable();
 			return NULL;
 		case XDP_TX:
 			xdp_xmit = true;
@@ -1686,6 +1688,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 	skb = build_skb(buf, buflen);
 	if (!skb) {
 		rcu_read_unlock();
+		preempt_enable();
 		return ERR_PTR(-ENOMEM);
 	}
 
@@ -1698,10 +1701,12 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 		skb->dev = tun->dev;
 		generic_xdp_tx(skb, xdp_prog);
 		rcu_read_unlock();
+		preempt_enable();
 		return NULL;
 	}
 
 	rcu_read_unlock();
+	preempt_enable();
 
 	return skb;
 
@@ -1709,6 +1714,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 	put_page(alloc_frag->page);
 err_xdp:
 	rcu_read_unlock();
+	preempt_enable();
 	this_cpu_inc(tun->pcpu_stats->rx_dropped);
 	return NULL;
 }
-- 
2.7.4