Re: [PATCH net-next 16/24] net: netkit, veth, tun, virt*: Use nested-BH locking for XDP redirect.
On 2023-12-18 09:52:05 [+0100], Daniel Borkmann wrote: > Hi Sebastian, Hi Daniel, > Please exclude netkit from this set given it does not support XDP, but > instead only accepts tc BPF typed programs. okay, thank you. > Thanks, > Daniel Sebastian
[PATCH net-next 16/24] net: netkit, veth, tun, virt*: Use nested-BH locking for XDP redirect.
The per-CPU variables used during bpf_prog_run_xdp() invocation and later during xdp_do_redirect() rely on disabled BH for their protection. Without locking in local_bh_disable() on PREEMPT_RT these data structure require explicit locking. This is a follow-up on the previous change which introduced bpf_run_lock.redirect_lock and uses it now within drivers. The simple way is to acquire the lock before bpf_prog_run_xdp() is invoked and hold it until the end of function. This does not always work because some drivers (cpsw, atlantic) invoke xdp_do_flush() in the same context. Acquiring the lock in bpf_prog_run_xdp() and dropping in xdp_do_redirect() (without touching drivers) does not work because not all driver, which use bpf_prog_run_xdp(), do support XDP_REDIRECT (and invoke xdp_do_redirect()). Ideally the minimal locking scope would be bpf_prog_run_xdp() + xdp_do_redirect() and everything else (error recovery, DMA unmapping, free/ alloc of memory, …) would happen outside of the locked section. Cc: "K. Y. Srinivasan" Cc: "Michael S. Tsirkin" Cc: Alexei Starovoitov Cc: Andrii Nakryiko Cc: Dexuan Cui Cc: Haiyang Zhang Cc: Hao Luo Cc: Jesper Dangaard Brouer Cc: Jiri Olsa Cc: John Fastabend Cc: Juergen Gross Cc: KP Singh Cc: Martin KaFai Lau Cc: Nikolay Aleksandrov Cc: Song Liu Cc: Stanislav Fomichev Cc: Stefano Stabellini Cc: Wei Liu Cc: Willem de Bruijn Cc: Xuan Zhuo Cc: Yonghong Song Cc: b...@vger.kernel.org Cc: virtualizat...@lists.linux.dev Cc: xen-devel@lists.xenproject.org Signed-off-by: Sebastian Andrzej Siewior --- drivers/net/hyperv/netvsc_bpf.c | 1 + drivers/net/netkit.c| 13 +++ drivers/net/tun.c | 28 +-- drivers/net/veth.c | 40 - drivers/net/virtio_net.c| 1 + drivers/net/xen-netfront.c | 1 + 6 files changed, 52 insertions(+), 32 deletions(-) diff --git a/drivers/net/hyperv/netvsc_bpf.c b/drivers/net/hyperv/netvsc_bpf.c index 4a9522689fa4f..55f8ca92ca199 100644 --- a/drivers/net/hyperv/netvsc_bpf.c +++ b/drivers/net/hyperv/netvsc_bpf.c @@ -58,6 +58,7 @@ u32 netvsc_run_xdp(struct net_device *ndev, struct netvsc_channel *nvchan, memcpy(xdp->data, data, len); + guard(local_lock_nested_bh)(_run_lock.redirect_lock); act = bpf_prog_run_xdp(prog, xdp); switch (act) { diff --git a/drivers/net/netkit.c b/drivers/net/netkit.c index 39171380ccf29..fbcf78477bda8 100644 --- a/drivers/net/netkit.c +++ b/drivers/net/netkit.c @@ -80,8 +80,15 @@ static netdev_tx_t netkit_xmit(struct sk_buff *skb, struct net_device *dev) netkit_prep_forward(skb, !net_eq(dev_net(dev), dev_net(peer))); skb->dev = peer; entry = rcu_dereference(nk->active); - if (entry) - ret = netkit_run(entry, skb, ret); + if (entry) { + scoped_guard(local_lock_nested_bh, _run_lock.redirect_lock) { + ret = netkit_run(entry, skb, ret); + if (ret == NETKIT_REDIRECT) { + dev_sw_netstats_tx_add(dev, 1, len); + skb_do_redirect(skb); + } + } + } switch (ret) { case NETKIT_NEXT: case NETKIT_PASS: @@ -95,8 +102,6 @@ static netdev_tx_t netkit_xmit(struct sk_buff *skb, struct net_device *dev) } break; case NETKIT_REDIRECT: - dev_sw_netstats_tx_add(dev, 1, len); - skb_do_redirect(skb); break; case NETKIT_DROP: default: diff --git a/drivers/net/tun.c b/drivers/net/tun.c index afa5497f7c35c..fe0d31f11e4b6 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1708,16 +1708,18 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, xdp_init_buff(, buflen, >xdp_rxq); xdp_prepare_buff(, buf, pad, len, false); - act = bpf_prog_run_xdp(xdp_prog, ); - if (act == XDP_REDIRECT || act == XDP_TX) { - get_page(alloc_frag->page); - alloc_frag->offset += buflen; - } - err = tun_xdp_act(tun, xdp_prog, , act); - if (err < 0) { - if (act == XDP_REDIRECT || act == XDP_TX) - put_page(alloc_frag->page); - goto out; + scoped_guard(local_lock_nested_bh, _run_lock.redirect_lock) { + act = bpf_prog_run_xdp(xdp_prog, ); + if (act == XDP_REDIRECT || act == XDP_TX) { + get_page(alloc_frag->page); + alloc_frag->offset += buflen; + } + err = tun_xdp_act(tun, xdp_prog, , act); + if (err < 0) { +
[PATCH REPOST v2 1/2] x86/xen: Allow to retry if cpu_initialize_context() failed.
From: Boris Ostrovsky If memory allocation in cpu_initialize_context() fails then it will bring up the VCPU and leave with the corresponding CPU bit set in xen_cpu_initialized_map. The following (presumably successful) CPU bring up will BUG in xen_pv_cpu_up() because nothing for that VCPU would be initialized. Clear the CPU bits, that were set in cpu_initialize_context() in case the memory allocation fails. [ bigeasy: Creating a patch from Boris' email. ] Signed-off-by: Boris Ostrovsky Signed-off-by: Sebastian Andrzej Siewior Link: https://lore.kernel.org/r/20211206152034.2150770-2-bige...@linutronix.de --- arch/x86/xen/smp_pv.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c index 4a6019238ee7d..57c3f9332ec94 100644 --- a/arch/x86/xen/smp_pv.c +++ b/arch/x86/xen/smp_pv.c @@ -260,8 +260,11 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle) return 0; ctxt = kzalloc(sizeof(*ctxt), GFP_KERNEL); - if (ctxt == NULL) + if (ctxt == NULL) { + cpumask_clear_cpu(cpu, xen_cpu_initialized_map); + cpumask_clear_cpu(cpu, cpu_callout_mask); return -ENOMEM; + } gdt = get_cpu_gdt_rw(cpu); -- 2.34.1
[PATCH REPOST v2 2/2] cpu/hotplug: Allow the CPU in CPU_UP_PREPARE state to be brought up again.
From: "Longpeng(Mike)" A CPU will not show up in virtualized environment which includes an Enclave. The VM splits its resources into a primary VM and a Enclave VM. While the Enclave is active, the hypervisor will ignore all requests to bring up a CPU and this CPU will remain in CPU_UP_PREPARE state. The kernel will wait up to ten seconds for CPU to show up (do_boot_cpu()) and then rollback the hotplug state back to CPUHP_OFFLINE leaving the CPU state in CPU_UP_PREPARE. The CPU state is set back to CPUHP_TEARDOWN_CPU during the CPU_POST_DEAD stage. After the Enclave VM terminates, the primary VM can bring up the CPU again. Allow to bring up the CPU if it is in the CPU_UP_PREPARE state. [bigeasy: Rewrite commit description.] Signed-off-by: Longpeng(Mike) Signed-off-by: Sebastian Andrzej Siewior Link: https://lore.kernel.org/r/20210901051143.2752-1-longpe...@huawei.com Reviewed-by: Valentin Schneider Tested-by: Dongli Zhang Reviewed-by: Henry Wang Link: https://lore.kernel.org/r/20211122154714.xaoxok3fpk5bg...@linutronix.de Link: https://lore.kernel.org/r/20211206152034.2150770-3-bige...@linutronix.de --- kernel/smpboot.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/kernel/smpboot.c b/kernel/smpboot.c index f6bc0bc8a2aab..b9f54544e7499 100644 --- a/kernel/smpboot.c +++ b/kernel/smpboot.c @@ -392,6 +392,13 @@ int cpu_check_up_prepare(int cpu) */ return -EAGAIN; + case CPU_UP_PREPARE: + /* +* Timeout while waiting for the CPU to show up. Allow to try +* again later. +*/ + return 0; + default: /* Should not happen. Famous last words. */ -- 2.34.1
[PATCH REPOST v2 0/2 v2] cpu/hotplug: Allow the CPU in CPU_UP_PREPARE state to be brought up again.
This is a another repost of the previous patch (#2) and adding Boris (Ostrovsky)'s suggestion regarding the XEN bits. The previous posts can be found at https://lore.kernel.org/all/20211206152034.2150770-1-bige...@linutronix.de/ https://lore.kernel.org/all/20211122154714.xaoxok3fpk5bg...@linutronix.de/ Sebastian
Re: [PATCH 0/2 v2] cpu/hotplug: Allow the CPU in CPU_UP_PREPARE state to be brought up again.
On 2021-12-06 16:20:32 [+0100], To linux-ker...@vger.kernel.org wrote: > This is a repost of the previous patch (#2) and adding Boris > (Ostrovsky)'s suggestion regarding the XEN bits. > The previous post can be found at >https://lore.kernel.org/all/20211122154714.xaoxok3fpk5bg...@linutronix.de/ ping. Sebastian
[PATCH 2/2] cpu/hotplug: Allow the CPU in CPU_UP_PREPARE state to be brought up again.
From: "Longpeng(Mike)" A CPU will not show up in virtualized environment which includes an Enclave. The VM splits its resources into a primary VM and a Enclave VM. While the Enclave is active, the hypervisor will ignore all requests to bring up a CPU and this CPU will remain in CPU_UP_PREPARE state. The kernel will wait up to ten seconds for CPU to show up (do_boot_cpu()) and then rollback the hotplug state back to CPUHP_OFFLINE leaving the CPU state in CPU_UP_PREPARE. The CPU state is set back to CPUHP_TEARDOWN_CPU during the CPU_POST_DEAD stage. After the Enclave VM terminates, the primary VM can bring up the CPU again. Allow to bring up the CPU if it is in the CPU_UP_PREPARE state. [bigeasy: Rewrite commit description.] Signed-off-by: Longpeng(Mike) Signed-off-by: Sebastian Andrzej Siewior Link: https://lore.kernel.org/r/20210901051143.2752-1-longpe...@huawei.com Reviewed-by: Valentin Schneider Tested-by: Dongli Zhang Reviewed-by: Henry Wang Link: https://lore.kernel.org/r/20211122154714.xaoxok3fpk5bg...@linutronix.de --- kernel/smpboot.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/kernel/smpboot.c b/kernel/smpboot.c index f6bc0bc8a2aab..b9f54544e7499 100644 --- a/kernel/smpboot.c +++ b/kernel/smpboot.c @@ -392,6 +392,13 @@ int cpu_check_up_prepare(int cpu) */ return -EAGAIN; + case CPU_UP_PREPARE: + /* +* Timeout while waiting for the CPU to show up. Allow to try +* again later. +*/ + return 0; + default: /* Should not happen. Famous last words. */ -- 2.34.1
[PATCH 0/2 v2] cpu/hotplug: Allow the CPU in CPU_UP_PREPARE state to be brought up again.
This is a repost of the previous patch (#2) and adding Boris (Ostrovsky)'s suggestion regarding the XEN bits. The previous post can be found at https://lore.kernel.org/all/20211122154714.xaoxok3fpk5bg...@linutronix.de/ Sebastian
[PATCH 1/2] x86/xen: Allow to retry if cpu_initialize_context() failed.
From: Boris Ostrovsky If memory allocation in cpu_initialize_context() fails then it will bring up the VCPU and leave with the corresponding CPU bit set in xen_cpu_initialized_map. The following (presumably successful) CPU bring up will BUG in xen_pv_cpu_up() because nothing for that VCPU would be initialized. Clear the CPU bits, that were set in cpu_initialize_context() in case the memory allocation fails. [ bigeasy: Creating a patch from Boris' email. ] Signed-off-by: Boris Ostrovsky Signed-off-by: Sebastian Andrzej Siewior --- arch/x86/xen/smp_pv.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c index 6a8f3b53ab834..86368fcef4667 100644 --- a/arch/x86/xen/smp_pv.c +++ b/arch/x86/xen/smp_pv.c @@ -277,8 +277,11 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle) return 0; ctxt = kzalloc(sizeof(*ctxt), GFP_KERNEL); - if (ctxt == NULL) + if (ctxt == NULL) { + cpumask_clear_cpu(cpu, xen_cpu_initialized_map); + cpumask_clear_cpu(cpu, cpu_callout_mask); return -ENOMEM; + } gdt = get_cpu_gdt_rw(cpu); -- 2.34.1
Re: [PATCH] cpu/hotplug: Allow the CPU in CPU_UP_PREPARE state to be brought up again.
On 2021-11-24 21:17:34 [-0500], Boris Ostrovsky wrote: > > On 11/24/21 5:54 PM, Thomas Gleixner wrote: > > Any comment from XEN folks? > > > If memory allocation in cpu_initialize_context() fails we will not be > able to bring up the VCPU because xen_cpu_initialized_map bit at the > top of that routine will already have been set. We will BUG in > xen_pv_cpu_up() on second (presumably successful) attempt because > nothing for that VCPU would be initialized. This can in principle be > fixed by moving allocation to the top of the routine and freeing > context if the bit in the bitmap is already set. > > > Having said that, allocation really should not fail: for PV guests we > first bring max number of VCPUs up and then offline them down to > however many need to run. I think if we fail allocation during boot we > are going to have a really bad day anyway. > So can we keep the patch as-is or are some changes needed? > -boris Sebastian
[PATCH] cpu/hotplug: Allow the CPU in CPU_UP_PREPARE state to be brought up again.
From: "Longpeng(Mike)" A CPU will not show up in virtualized environment which includes an Enclave. The VM splits its resources into a primary VM and a Enclave VM. While the Enclave is active, the hypervisor will ignore all requests to bring up a CPU and this CPU will remain in CPU_UP_PREPARE state. The kernel will wait up to ten seconds for CPU to show up (do_boot_cpu()) and then rollback the hotplug state back to CPUHP_OFFLINE leaving the CPU state in CPU_UP_PREPARE. The CPU state is set back to CPUHP_TEARDOWN_CPU during the CPU_POST_DEAD stage. After the Enclave VM terminates, the primary VM can bring up the CPU again. Allow to bring up the CPU if it is in the CPU_UP_PREPARE state. [bigeasy: Rewrite commit description.] Signed-off-by: Longpeng(Mike) Signed-off-by: Sebastian Andrzej Siewior Link: https://lore.kernel.org/r/20210901051143.2752-1-longpe...@huawei.com --- For XEN: this changes the behaviour as it allows to invoke cpu_initialize_context() again should it have have earlier. I *think* this is okay and would to bring up the CPU again should the memory allocation in cpu_initialize_context() fail. kernel/smpboot.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/kernel/smpboot.c b/kernel/smpboot.c index f6bc0bc8a2aab..34958d7fe2c1c 100644 --- a/kernel/smpboot.c +++ b/kernel/smpboot.c @@ -392,6 +392,13 @@ int cpu_check_up_prepare(int cpu) */ return -EAGAIN; + case CPU_UP_PREPARE: + /* +* Timeout while waiting for the CPU to show up. Allow to try +* again later. +*/ + return 0; + default: /* Should not happen. Famous last words. */ -- 2.33.1
[Xen-devel] [PATCH 22/34] xen: Use CONFIG_PREEMPTION
From: Thomas Gleixner CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT. Both PREEMPT and PREEMPT_RT require the same functionality which today depends on CONFIG_PREEMPT. Switch the preempt anand xen-ops code over to use CONFIG_PREEMPTION. Cc: Boris Ostrovsky Cc: Juergen Gross Cc: Stefano Stabellini Cc: xen-devel@lists.xenproject.org Signed-off-by: Thomas Gleixner Signed-off-by: Sebastian Andrzej Siewior --- drivers/xen/preempt.c | 4 ++-- include/xen/xen-ops.h | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/xen/preempt.c b/drivers/xen/preempt.c index 8b9919c26095d..70650b248de5d 100644 --- a/drivers/xen/preempt.c +++ b/drivers/xen/preempt.c @@ -8,7 +8,7 @@ #include #include -#ifndef CONFIG_PREEMPT +#ifndef CONFIG_PREEMPTION /* * Some hypercalls issued by the toolstack can take many 10s of @@ -37,4 +37,4 @@ asmlinkage __visible void xen_maybe_preempt_hcall(void) __this_cpu_write(xen_in_preemptible_hcall, true); } } -#endif /* CONFIG_PREEMPT */ +#endif /* CONFIG_PREEMPTION */ diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h index d89969aa9942c..095be1d66f31c 100644 --- a/include/xen/xen-ops.h +++ b/include/xen/xen-ops.h @@ -215,7 +215,7 @@ bool xen_running_on_version_or_later(unsigned int major, unsigned int minor); void xen_efi_runtime_setup(void); -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION static inline void xen_preemptible_hcall_begin(void) { @@ -239,6 +239,6 @@ static inline void xen_preemptible_hcall_end(void) __this_cpu_write(xen_in_preemptible_hcall, false); } -#endif /* CONFIG_PREEMPT */ +#endif /* CONFIG_PREEMPTION */ #endif /* INCLUDE_XEN_OPS_H */ -- 2.23.0 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel