Re: [PATCH net-next] fix unsafe set_memory_rw from softirq
On Wed, Oct 2, 2013 at 9:57 PM, Eric Dumazet wrote:
> On Wed, 2013-10-02 at 21:53 -0700, Eric Dumazet wrote:
>> On Wed, 2013-10-02 at 21:44 -0700, Alexei Starovoitov wrote:
>>
>> > I think #ifdef CONFIG_X86 is a bit ugly inside struct sk_filter, but
>> > don't mind whichever way.
>>
>> It's not fair to make sk_filter bigger, because it means that simple (non
>> JIT) filter might need an extra cache line.
>>
>> You could presumably use the following layout instead :
>>
>> struct sk_filter
>> {
>> 	atomic_t		refcnt;
>> 	struct rcu_head		rcu;
>> 	struct work_struct	work;
>>
>> 	unsigned int		len ____cacheline_aligned;	/* Number of filter blocks */
>> 	unsigned int		(*bpf_func)(const struct sk_buff *skb,
>> 					    const struct sock_filter *filter);
>> 	struct sock_filter	insns[0];
>> };
>
> And since @len is not used by sk_run_filter() use :
>
> struct sk_filter {
> 	atomic_t		refcnt;
> 	int			len;	/* number of filter blocks */
> 	struct rcu_head		rcu;
> 	struct work_struct	work;
>
> 	unsigned int		(*bpf_func)(const struct sk_buff *skb,
> 					    const struct sock_filter *filter) ____cacheline_aligned;
> 	struct sock_filter	insns[0];
> };

yes. makes sense to avoid the first insn cache miss inside sk_run_filter()
at the expense of an 8-byte gap between work and bpf_func (on x86_64 w/o lockdep)

Probably even better to overlap work and insns fields.
Pro: sk_filter size the same, no impact on non-jit case
Con: would be harder to understand the code

another problem is that kfree(sk_filter) inside sk_filter_release_rcu()
needs to move inside bpf_jit_free(), since the work item embedded in
sk_filter must stay alive until the deferred free has run.

so self nack. Let me fix these issues and respin

Thanks
Alexei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
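The "overlap work and insns" idea can be sketched in userspace C. The types below (`mock_filter`, `mock_work`, `mock_rcu_head`, `mock_insn`) are hypothetical stand-ins shaped roughly like the x86_64 kernel types, not the real definitions. The work item shares storage with `insns[]` through an anonymous union, which is only safe because the instructions are dead once the filter is being freed. `filter_alloc_size()` is a hypothetical helper showing that the allocation must then cover whichever of the two is larger.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins (assumed shapes): rcu_head is two pointers,
 * work_struct is four words as on x86_64 without lockdep, and an
 * instruction is 8 bytes like struct sock_filter. */
struct mock_rcu_head { void *next; void (*func)(struct mock_rcu_head *); };
struct mock_work { long data; void *next, *prev; void (*func)(struct mock_work *); };
struct mock_insn { unsigned short code; unsigned char jt, jf; unsigned int k; };

struct mock_filter {
	int refcnt;
	unsigned int len;                 /* number of filter blocks */
	struct mock_rcu_head rcu;
	unsigned int (*bpf_func)(const void *skb, const struct mock_insn *insns);
	union {
		struct mock_insn insns[0];    /* GNU zero-length array, kernel style */
		struct mock_work work;        /* only used once the filter is dead */
	};
};

/* The work item starts exactly where the instructions start. */
static_assert(offsetof(struct mock_filter, work) == offsetof(struct mock_filter, insns),
	      "work overlays insns");

/* Hypothetical helper: bytes to allocate so that either the instructions
 * or the work item fits, whichever needs more room. */
static size_t filter_alloc_size(unsigned int len)
{
	size_t with_insns = offsetof(struct mock_filter, insns) +
			    (size_t)len * sizeof(struct mock_insn);

	return with_insns > sizeof(struct mock_filter) ? with_insns
							: sizeof(struct mock_filter);
}
```

For short programs the header size is set by the work item; for longer ones the instructions dominate and no space is added, which is the "no impact on non-jit case" trade-off listed in the Pro/Con lines.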
Re: [PATCH net-next] fix unsafe set_memory_rw from softirq
On Wed, 2013-10-02 at 21:53 -0700, Eric Dumazet wrote:
> On Wed, 2013-10-02 at 21:44 -0700, Alexei Starovoitov wrote:
>
> > I think #ifdef CONFIG_X86 is a bit ugly inside struct sk_filter, but
> > don't mind whichever way.
>
> It's not fair to make sk_filter bigger, because it means that simple (non
> JIT) filter might need an extra cache line.
>
> You could presumably use the following layout instead :
>
> struct sk_filter
> {
> 	atomic_t		refcnt;
> 	struct rcu_head		rcu;
> 	struct work_struct	work;
>
> 	unsigned int		len ____cacheline_aligned;	/* Number of filter blocks */
> 	unsigned int		(*bpf_func)(const struct sk_buff *skb,
> 					    const struct sock_filter *filter);
> 	struct sock_filter	insns[0];
> };

And since @len is not used by sk_run_filter() use :

struct sk_filter {
	atomic_t		refcnt;
	int			len;	/* number of filter blocks */
	struct rcu_head		rcu;
	struct work_struct	work;

	unsigned int		(*bpf_func)(const struct sk_buff *skb,
					    const struct sock_filter *filter) ____cacheline_aligned;
	struct sock_filter	insns[0];
};
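A userspace approximation of this second layout, assuming a 64-byte cache line and hypothetical mock types in place of the kernel ones. `_Alignas` stands in for the kernel's cacheline-alignment attribute. The static assertions check that `bpf_func` opens a cache line and that `insns[0]` lands in that same line, which is the point of moving `len` out of the hot area.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_CACHE_BYTES 64   /* assumed line size; the kernel uses SMP_CACHE_BYTES */

/* Hypothetical stand-ins, not the real kernel definitions. */
struct mock_rcu_head { void *next; void (*func)(struct mock_rcu_head *); };
struct mock_work { long data; void *next, *prev; void (*func)(struct mock_work *); };
struct mock_insn { unsigned short code; unsigned char jt, jf; unsigned int k; };

struct mock_filter {
	int refcnt;
	int len;                          /* not read by the interpreter */
	struct mock_rcu_head rcu;
	struct mock_work work;
	/* hot part: starts on a fresh cache line */
	_Alignas(MOCK_CACHE_BYTES)
	unsigned int (*bpf_func)(const void *skb, const struct mock_insn *insns);
	struct mock_insn insns[];         /* C99 flexible array member */
};

/* bpf_func opens a cache line ... */
static_assert(offsetof(struct mock_filter, bpf_func) % MOCK_CACHE_BYTES == 0,
	      "bpf_func is line aligned");
/* ... and the first instruction shares that line. */
static_assert(offsetof(struct mock_filter, insns) / MOCK_CACHE_BYTES ==
	      offsetof(struct mock_filter, bpf_func) / MOCK_CACHE_BYTES,
	      "insns[0] shares bpf_func's line");
```

With LP64 sizes, `work` ends at byte 56 and `bpf_func` starts at byte 64, which reproduces the 8-byte hole between `work` and `bpf_func` mentioned in the reply to this message.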
Re: [PATCH net-next] fix unsafe set_memory_rw from softirq
On Wed, 2013-10-02 at 21:44 -0700, Alexei Starovoitov wrote:
> I think #ifdef CONFIG_X86 is a bit ugly inside struct sk_filter, but
> don't mind whichever way.

It's not fair to make sk_filter bigger, because it means that simple (non
JIT) filter might need an extra cache line.

You could presumably use the following layout instead :

struct sk_filter
{
	atomic_t		refcnt;
	struct rcu_head		rcu;
	struct work_struct	work;

	unsigned int		len ____cacheline_aligned;	/* Number of filter blocks */
	unsigned int		(*bpf_func)(const struct sk_buff *skb,
					    const struct sock_filter *filter);
	struct sock_filter	insns[0];
};
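One rough way to see the extra-cache-line concern: the sketch below compares the header before the patch with the header after the patch appends a `work` member, and counts the 64-byte lines that the header plus the first `n` instructions touch. The mock types are hypothetical stand-ins, LP64 sizes are assumed for the concrete numbers, the allocation is assumed to start on a 64-byte boundary, and `lines_touched()` is an illustrative helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real kernel definitions. */
struct mock_rcu_head { void *next; void (*func)(struct mock_rcu_head *); };
struct mock_work { long data; void *next, *prev; void (*func)(struct mock_work *); };
struct mock_insn { unsigned short code; unsigned char jt, jf; unsigned int k; };
typedef unsigned int (*mock_bpf_func)(const void *skb, const struct mock_insn *insns);

static_assert(sizeof(struct mock_insn) == 8, "same size as struct sock_filter");

/* Header layout before the patch. */
struct filter_before {
	int refcnt;
	unsigned int len;
	mock_bpf_func bpf_func;
	struct mock_rcu_head rcu;
	struct mock_insn insns[];
};

/* Header layout with the patch's work member added before rcu. */
struct filter_after {
	int refcnt;
	unsigned int len;
	mock_bpf_func bpf_func;
	struct mock_work work;
	struct mock_rcu_head rcu;
	struct mock_insn insns[];
};

/* 64-byte lines covered by a line-aligned header plus n instructions. */
static size_t lines_touched(size_t insns_offset, size_t n)
{
	size_t end = insns_offset + n * sizeof(struct mock_insn);

	return (end + 63) / 64;
}
```

On LP64 the instructions move from offset 32 to offset 64, so a four-instruction program that fit in one line before the patch touches two afterwards; every program is pushed back by the size of the work item.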
Re: [PATCH net-next] fix unsafe set_memory_rw from softirq
On Wed, Oct 2, 2013 at 9:23 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Wed, 2013-10-02 at 20:50 -0700, Alexei Starovoitov wrote:
>> on x86 system with net.core.bpf_jit_enable = 1
>
>> diff --git a/include/linux/filter.h b/include/linux/filter.h
>> index a6ac848..378fa03 100644
>> --- a/include/linux/filter.h
>> +++ b/include/linux/filter.h
>> @@ -27,6 +27,7 @@ struct sk_filter
>>  	unsigned int		len;	/* Number of filter blocks */
>>  	unsigned int		(*bpf_func)(const struct sk_buff *skb,
>>  					    const struct sock_filter *filter);
>> +	struct work_struct	work;
>>  	struct rcu_head		rcu;
>>  	struct sock_filter	insns[0];
>>  };
>
> Nice catch !
>
> It seems only x86 and s390 need this work_struct.

I think #ifdef CONFIG_X86 is a bit ugly inside struct sk_filter, but
don't mind whichever way.

> (and you might CC Heiko Carstens <heiko.carst...@de.ibm.com> to ask him
> to make the s390 part, or Ack it if you plan to do it)

set_memory_rw() on s390 is a simple page table walker that doesn't do
any IPI, unlike x86.
Heiko, please confirm that it's not an issue there.

Thanks
Alexei
Re: [PATCH net-next] fix unsafe set_memory_rw from softirq
On Wed, 2013-10-02 at 20:50 -0700, Alexei Starovoitov wrote:
> on x86 system with net.core.bpf_jit_enable = 1

> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index a6ac848..378fa03 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -27,6 +27,7 @@ struct sk_filter
>  	unsigned int		len;	/* Number of filter blocks */
>  	unsigned int		(*bpf_func)(const struct sk_buff *skb,
>  					    const struct sock_filter *filter);
> +	struct work_struct	work;
>  	struct rcu_head		rcu;
>  	struct sock_filter	insns[0];
>  };

Nice catch !

It seems only x86 and s390 need this work_struct.

(and you might CC Heiko Carstens <heiko.carst...@de.ibm.com> to ask him
to make the s390 part, or Ack it if you plan to do it)

Thanks
[PATCH net-next] fix unsafe set_memory_rw from softirq
on x86 system with net.core.bpf_jit_enable = 1

sudo tcpdump -i eth1 'tcp port 22'

causes the warning:
[   56.766097]  Possible unsafe locking scenario:
[   56.766097]
[   56.780146]        CPU0
[   56.786807]
[   56.793188]   lock(&(&vb->lock)->rlock);
[   56.799593]   <Interrupt>
[   56.805889]     lock(&(&vb->lock)->rlock);
[   56.812266]
[   56.812266]  *** DEADLOCK ***
[   56.812266]
[   56.830670] 1 lock held by ksoftirqd/1/13:
[   56.836838]  #0:  (rcu_read_lock){.+.+..}, at: [8118f44c] vm_unmap_aliases+0x8c/0x380
[   56.849757]
[   56.849757] stack backtrace:
[   56.862194] CPU: 1 PID: 13 Comm: ksoftirqd/1 Not tainted 3.12.0-rc3+ #45
[   56.868721] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[   56.882004]  821944c0 88080bbdb8c8 8175a145 0007
[   56.895630]  88080bbd5f40 88080bbdb928 81755b14 0001
[   56.909313]  88080001 8808 8101178f 0001
[   56.923006] Call Trace:
[   56.929532]  [8175a145] dump_stack+0x55/0x76
[   56.936067]  [81755b14] print_usage_bug+0x1f7/0x208
[   56.942445]  [8101178f] ? save_stack_trace+0x2f/0x50
[   56.948932]  [810cc0a0] ? check_usage_backwards+0x150/0x150
[   56.955470]  [810ccb52] mark_lock+0x282/0x2c0
[   56.961945]  [810ccfed] __lock_acquire+0x45d/0x1d50
[   56.968474]  [810cce6e] ? __lock_acquire+0x2de/0x1d50
[   56.975140]  [81393bf5] ? cpumask_next_and+0x55/0x90
[   56.981942]  [810cef72] lock_acquire+0x92/0x1d0
[   56.988745]  [8118f52a] ? vm_unmap_aliases+0x16a/0x380
[   56.995619]  [817628f1] _raw_spin_lock+0x41/0x50
[   57.002493]  [8118f52a] ? vm_unmap_aliases+0x16a/0x380
[   57.009447]  [8118f52a] vm_unmap_aliases+0x16a/0x380
[   57.016477]  [8118f44c] ? vm_unmap_aliases+0x8c/0x380
[   57.023607]  [810436b0] change_page_attr_set_clr+0xc0/0x460
[   57.030818]  [810cfb8d] ? trace_hardirqs_on+0xd/0x10
[   57.037896]  [811a8330] ? kmem_cache_free+0xb0/0x2b0
[   57.044789]  [811b59c3] ? free_object_rcu+0x93/0xa0
[   57.051720]  [81043d9f] set_memory_rw+0x2f/0x40
[   57.058727]  [8104e17c] bpf_jit_free+0x2c/0x40
[   57.065577]  [81642cba] sk_filter_release_rcu+0x1a/0x30
[   57.072338]  [811108e2] rcu_process_callbacks+0x202/0x7c0
[   57.078962]  [81057f17] __do_softirq+0xf7/0x3f0
[   57.085373]  [81058245] run_ksoftirqd+0x35/0x70

The JIT-ed filter memory cannot be reused to hold the work item, since it's
read-only, so sk_filter has to be extended with a work_struct.

Signed-off-by: Alexei Starovoitov <a...@plumgrid.com>
---
 arch/x86/net/bpf_jit_comp.c |   17 ++++++++++++-----
 include/linux/filter.h      |    1 +
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 79c216a..89a43df 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -772,13 +772,20 @@ out:
 	return;
 }
 
+static void bpf_jit_free_deferred(struct work_struct *work)
+{
+	struct sk_filter *fp = container_of(work, struct sk_filter, work);
+	unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
+	struct bpf_binary_header *header = (void *)addr;
+
+	set_memory_rw(addr, header->pages);
+	module_free(NULL, header);
+}
+
 void bpf_jit_free(struct sk_filter *fp)
 {
 	if (fp->bpf_func != sk_run_filter) {
-		unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
-		struct bpf_binary_header *header = (void *)addr;
-
-		set_memory_rw(addr, header->pages);
-		module_free(NULL, header);
+		INIT_WORK(&fp->work, bpf_jit_free_deferred);
+		schedule_work(&fp->work);
 	}
 }
diff --git a/include/linux/filter.h b/include/linux/filter.h
index a6ac848..378fa03 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -27,6 +27,7 @@ struct sk_filter
 	unsigned int		len;	/* Number of filter blocks */
 	unsigned int		(*bpf_func)(const struct sk_buff *skb,
 					    const struct sock_filter *filter);
+	struct work_struct	work;
 	struct rcu_head		rcu;
 	struct sock_filter	insns[0];
 };
--
1.7.9.5
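The deferral pattern in this patch can be modelled in userspace C to show how `container_of()` recovers the filter from its embedded work item. Everything here is a hypothetical stand-in: `mock_schedule_work()` only queues, and `run_pending_work()` plays the part of the kworker that later runs the callback in process context, which is where the kernel version calls `set_memory_rw()` and `module_free()` instead of doing so from the softirq shown in the trace. Unlike the patch as posted, the deferred callback also frees the filter itself, which is the `kfree()` move raised in the follow-up in this thread. The `container_of` macro is simplified and omits the kernel's type check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct work_struct. */
struct mock_work {
	void (*func)(struct mock_work *work);
	struct mock_work *next;
};

/* Simplified container_of(): pointer arithmetic only, no type check. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Single-threaded pending list; ordering does not matter for this sketch. */
static struct mock_work *pending;

static void mock_init_work(struct mock_work *work, void (*func)(struct mock_work *))
{
	work->func = func;
	work->next = NULL;
}

static void mock_schedule_work(struct mock_work *work)
{
	assert(work->func != NULL);     /* mock_init_work() must run first */
	work->next = pending;
	pending = work;
}

/* Plays the kworker: runs queued callbacks later, outside the "softirq". */
static int run_pending_work(void)
{
	int ran = 0;

	while (pending != NULL) {
		struct mock_work *work = pending;

		pending = work->next;   /* read before the callback frees work */
		work->func(work);
		ran++;
	}
	return ran;
}

struct mock_filter {
	int refcnt;
	void *image;                    /* stands in for the JIT-ed code pages */
	struct mock_work work;
};

static int images_freed;

static void filter_free_deferred(struct mock_work *work)
{
	struct mock_filter *fp = container_of(work, struct mock_filter, work);

	/* kernel: set_memory_rw(addr, header->pages); module_free(NULL, header); */
	free(fp->image);
	images_freed++;
	/* work is embedded in fp, so fp may only be freed after its last use */
	free(fp);
}

/* Softirq-side part: no page-attribute changes, just queue the work. */
static void filter_release(struct mock_filter *fp)
{
	mock_init_work(&fp->work, filter_free_deferred);
	mock_schedule_work(&fp->work);
}
```

Nothing is freed when `filter_release()` returns; the image and the filter go away only when the queued callback runs, so the RCU callback side never has to touch page attributes.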