Re: [PATCH RFC 08/26] locking: Remove spin_unlock_wait() generic definitions
On Fri, Jun 30, 2017 at 02:13:39PM +0100, Will Deacon wrote: > On Fri, Jun 30, 2017 at 05:38:15AM -0700, Paul E. McKenney wrote: > > On Fri, Jun 30, 2017 at 10:19:29AM +0100, Will Deacon wrote: > > > On Thu, Jun 29, 2017 at 05:01:16PM -0700, Paul E. McKenney wrote: > > > > There is no agreed-upon definition of spin_unlock_wait()'s semantics, > > > > and it appears that all callers could do just as well with a lock/unlock > > > > pair. This commit therefore removes spin_unlock_wait() and related > > > > definitions from core code. > > > > > > > > Signed-off-by: Paul E. McKenney> > > > Cc: Arnd Bergmann > > > > Cc: Ingo Molnar > > > > Cc: Will Deacon > > > > Cc: Peter Zijlstra > > > > Cc: Alan Stern > > > > Cc: Andrea Parri > > > > Cc: Linus Torvalds > > > > --- > > > > include/asm-generic/qspinlock.h | 14 - > > > > include/linux/spinlock.h| 31 --- > > > > include/linux/spinlock_up.h | 6 --- > > > > kernel/locking/qspinlock.c | 117 > > > > > > > > 4 files changed, 168 deletions(-) > > > > > > [...] > > > > > > > diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c > > > > index b2caec7315af..64a9051e4c2c 100644 > > > > --- a/kernel/locking/qspinlock.c > > > > +++ b/kernel/locking/qspinlock.c > > > > @@ -267,123 +267,6 @@ static __always_inline u32 > > > > __pv_wait_head_or_lock(struct qspinlock *lock, > > > > #define queued_spin_lock_slowpath native_queued_spin_lock_slowpath > > > > #endif > > > > > > > > -/* > > > > - * Various notes on spin_is_locked() and spin_unlock_wait(), which are > > > > - * 'interesting' functions: > > > > - * > > > > - * PROBLEM: some architectures have an interesting issue with atomic > > > > ACQUIRE > > > > - * operations in that the ACQUIRE applies to the LOAD _not_ the STORE > > > > (ARM64, > > > > - * PPC). Also qspinlock has a similar issue per construction, the > > > > setting of > > > > - * the locked byte can be unordered acquiring the lock proper. > > > > - * > > > > - * This gets to be 'interesting' in the following cases, where the > > > > /should/s > > > > - * end up false because of this issue. > > > > - * > > > > - * > > > > - * CASE 1: > > > > - * > > > > - * So the spin_is_locked() correctness issue comes from something like: > > > > - * > > > > - * CPU0 CPU1 > > > > - * > > > > - * global_lock();local_lock(i) > > > > - * spin_lock() spin_lock([i]) > > > > - * for (i) if (!spin_is_locked()) { > > > > - * spin_unlock_wait([i]); > > > > smp_acquire__after_ctrl_dep(); > > > > - * return; > > > > - * } > > > > - * // deal with fail > > > > - * > > > > - * Where it is important CPU1 sees G locked or CPU0 sees L[i] locked > > > > such > > > > - * that there is exclusion between the two critical sections. > > > > - * > > > > - * The load from spin_is_locked() /should/ be constrained by the > > > > ACQUIRE from > > > > - * spin_lock([i]), and similarly the load(s) from > > > > spin_unlock_wait([i]) > > > > - * /should/ be constrained by the ACQUIRE from spin_lock(). > > > > - * > > > > - * Similarly, later stuff is constrained by the ACQUIRE from CTRL+RMB. > > > > > > Might be worth keeping this comment about spin_is_locked, since we're not > > > removing that guy just yet! > > > > Ah, all the examples had spin_unlock_wait() in them. So what I need to > > do is to create a spin_unlock_wait()-free example to illustrate the > > text starting with "The load from spin_is_locked(", correct? > > Yeah, I think so. > > > I also need to check all uses of spin_is_locked(). There might no > > longer be any that rely on any particular ordering... > > Right. I think we're looking for the "insane case" as per 38b850a73034 > (which was apparently used by ipc/sem.c at the time, but no longer). > > There's a usage in kernel/debug/debug_core.c, but it doesn't fill me with > joy. That is indeed an interesting one... But my first round will be what semantics the implementations seem to provide: Acquire courtesy of TSO: s390, sparc, x86. Acquire: ia64 (in reality fully ordered). Control dependency: alpha, arc, arm, blackfin, hexagon, m32r, mn10300, tile, xtensa. Control dependency plus leading full barrier: arm64, powerpc. UP-only: c6x, cris, frv, h8300, m68k, microblaze nios2, openrisc, um, unicore32. Special cases: metag: Acquire if !CONFIG_METAG_SMP_WRITE_REORDERING. Otherwise control dependency? mips: Control dependency, acquire if CONFIG_CPU_CAVIUM_OCTEON. parisc: Acquire courtesy of TSO,
Re: nf_conntrack: Infoleak via CTA_ID and CTA_EXPECT_ID
Florian, Am 30.06.2017 um 21:55 schrieb Florian Westphal: >>> Why not use a hash of the address? >> >> Would also work. Or xor it with a random number. >> >> On the other hand, for user space it would be more useful when the conntrack >> id >> does not repeat that often. That's why I favor the good old counter method. >> Currently the conntrack id is reused very fast. >> e.g. in one of our applications we use the conntack id via NFQUEUE and watch >> the >> destroy events via conntrack. It happens regularly that a new connection has >> the >> same id than a different connection we saw some moments before, before we >> receive >> the destroy event from the conntrack socket. > > Perhaps we can place that in a new extension (its not needed in any > fastpath ops)? To get rid of the infoleak we have to re-introduce the id field in struct nf_conn and struct nf_conntrack_expect. Otherwise have nothing to compare against in the conntrack/expect remove case. So the only question is what to store, IMHO a counter that can wrap around is the cheapest method and would also not harm the fast-path. Thanks, //richard -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 02/26] task_work: Replace spin_unlock_wait() with lock/unlock pair
On Fri, Jun 30, 2017 at 01:02:48PM -0700, Paul E. McKenney wrote: > On Fri, Jun 30, 2017 at 09:21:23PM +0200, Oleg Nesterov wrote: > > On 06/30, Paul E. McKenney wrote: > > > > > > On Fri, Jun 30, 2017 at 05:20:10PM +0200, Oleg Nesterov wrote: > > > > > > > > I do not think the overhead will be noticeable in this particular case. > > > > > > > > But I am not sure I understand why do we want to unlock_wait. Yes I > > > > agree, > >^ > > > > if it was not clear, I tried to say "why do we want to _remove_ > > unlock_wait". > > > > > > it has some problems, but still... > > > > > > > > The code above looks strange for me. If we are going to repeat this > > > > pattern > > > > the perhaps we should add a helper for lock+unlock and name it > > > > unlock_wait2 ;) > > > > > > > > If not, we should probably change this code more: > > > > > > This looks -much- better than my patch! May I have your Signed-off-by? > > > > Only if you promise to replace all RCU flavors with a single simple > > implementation > > based on rwlock ;) > > ;-) ;-) ;-) > > Here you go: > > https://github.com/pramalhe/ConcurrencyFreaks/blob/master/papers/poormanurcu-2015.pdf > > > Seriously, of course I won't argue, and it seems that nobody except me likes > > this primitive, but to me spin_unlock_wait() looks like synchronize_rcu(() > > and > > sometimes it makes sense. > > Well, that analogy was what led me to propose that its semantics be > defined as spin_lock() immediately followed by spin_unlock(). But that > didn't go over well. > > > Including this particular case. task_work_run() is going to flush/destroy > > the > > ->task_works list, so it needs to wait until all currently executing > > "readers" > > (task_work_cancel()'s which have started before ->task_works was updated) > > have > > completed. > > Understood! And please see below for the resulting patch and commit log. Please let me know if I broke something. Thanx, Paul commit 6c0801c9ab19fc2f4c1e2436eb1b72e0af9a317b Author: Oleg NesterovDate: Fri Jun 30 13:13:59 2017 -0700 task_work: Replace spin_unlock_wait() with lock/unlock pair There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore replaces the spin_unlock_wait() call in task_work_run() with a spin_lock_irq() and a spin_unlock_irq() aruond the cmpxchg() dequeue loop. This should be safe from a performance perspective because ->pi_lock is local to the task and because calls to the other side of the race, task_work_cancel(), should be rare. Signed-off-by: Oleg Nesterov Signed-off-by: Paul E. McKenney diff --git a/kernel/task_work.c b/kernel/task_work.c index d513051fcca2..836a72a66fba 100644 --- a/kernel/task_work.c +++ b/kernel/task_work.c @@ -96,20 +96,16 @@ void task_work_run(void) * work->func() can do task_work_add(), do not set * work_exited unless the list is empty. */ + raw_spin_lock_irq(>pi_lock); do { work = READ_ONCE(task->task_works); head = !work && (task->flags & PF_EXITING) ? _exited : NULL; } while (cmpxchg(>task_works, work, head) != work); + raw_spin_unlock_irq(>pi_lock); if (!work) break; - /* -* Synchronize with task_work_cancel(). It can't remove -* the first entry == work, cmpxchg(task_works) should -* fail, but it can play with *work and other entries. -*/ - raw_spin_unlock_wait(>pi_lock); do { next = work->next; -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 02/26] task_work: Replace spin_unlock_wait() with lock/unlock pair
On Fri, Jun 30, 2017 at 03:50:33PM -0400, Alan Stern wrote: > On Fri, 30 Jun 2017, Oleg Nesterov wrote: > > > On 06/30, Paul E. McKenney wrote: > > > > > > On Fri, Jun 30, 2017 at 05:20:10PM +0200, Oleg Nesterov wrote: > > > > > > > > I do not think the overhead will be noticeable in this particular case. > > > > > > > > But I am not sure I understand why do we want to unlock_wait. Yes I > > > > agree, > >^ > > > > if it was not clear, I tried to say "why do we want to _remove_ > > unlock_wait". > > > > > > it has some problems, but still... > > > > > > > > The code above looks strange for me. If we are going to repeat this > > > > pattern > > > > the perhaps we should add a helper for lock+unlock and name it > > > > unlock_wait2 ;) > > > > > > > > If not, we should probably change this code more: > > > > > > This looks -much- better than my patch! May I have your Signed-off-by? > > > > Only if you promise to replace all RCU flavors with a single simple > > implementation > > based on rwlock ;) > > > > Seriously, of course I won't argue, and it seems that nobody except me likes > > this primitive, but to me spin_unlock_wait() looks like synchronize_rcu(() > > and > > sometimes it makes sense. > > If it looks like synchronize_rcu(), why not actually use > synchronize_rcu()? My guess is that the latencies of synchronize_rcu() don't suit his needs. When the lock is not held, spin_unlock_wait() is quite fast, even compared to expedited grace periods. Thanx, Paul > Alan Stern > > > Including this particular case. task_work_run() is going to flush/destroy > > the > > ->task_works list, so it needs to wait until all currently executing > > "readers" > > (task_work_cancel()'s which have started before ->task_works was updated) > > have > > completed. > -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 02/26] task_work: Replace spin_unlock_wait() with lock/unlock pair
On Fri, Jun 30, 2017 at 09:21:23PM +0200, Oleg Nesterov wrote: > On 06/30, Paul E. McKenney wrote: > > > > On Fri, Jun 30, 2017 at 05:20:10PM +0200, Oleg Nesterov wrote: > > > > > > I do not think the overhead will be noticeable in this particular case. > > > > > > But I am not sure I understand why do we want to unlock_wait. Yes I agree, >^ > > if it was not clear, I tried to say "why do we want to _remove_ unlock_wait". > > > > it has some problems, but still... > > > > > > The code above looks strange for me. If we are going to repeat this > > > pattern > > > the perhaps we should add a helper for lock+unlock and name it > > > unlock_wait2 ;) > > > > > > If not, we should probably change this code more: > > > > This looks -much- better than my patch! May I have your Signed-off-by? > > Only if you promise to replace all RCU flavors with a single simple > implementation > based on rwlock ;) ;-) ;-) ;-) Here you go: https://github.com/pramalhe/ConcurrencyFreaks/blob/master/papers/poormanurcu-2015.pdf > Seriously, of course I won't argue, and it seems that nobody except me likes > this primitive, but to me spin_unlock_wait() looks like synchronize_rcu(() and > sometimes it makes sense. Well, that analogy was what led me to propose that its semantics be defined as spin_lock() immediately followed by spin_unlock(). But that didn't go over well. > Including this particular case. task_work_run() is going to flush/destroy the > ->task_works list, so it needs to wait until all currently executing "readers" > (task_work_cancel()'s which have started before ->task_works was updated) have > completed. Understood! Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: nf_conntrack: Infoleak via CTA_ID and CTA_EXPECT_ID
Richard Weinbergerwrote: > Florian, > > Am 30.06.2017 um 21:35 schrieb Florian Westphal: > > Richard Weinberger wrote: > >> Hi! > >> > >> I noticed that nf_conntrack leaks kernel addresses, it uses the memory > >> address > >> as identifier used for generating conntrack and expect ids.. > >> Since these ids are also visible to unprivileged users via network > >> namespaces > >> I suggest reverting these commits: > > > > Why not use a hash of the address? > > Would also work. Or xor it with a random number. > > On the other hand, for user space it would be more useful when the conntrack > id > does not repeat that often. That's why I favor the good old counter method. > Currently the conntrack id is reused very fast. > e.g. in one of our applications we use the conntack id via NFQUEUE and watch > the > destroy events via conntrack. It happens regularly that a new connection has > the > same id than a different connection we saw some moments before, before we > receive > the destroy event from the conntrack socket. Perhaps we can place that in a new extension (its not needed in any fastpath ops)? -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 02/26] task_work: Replace spin_unlock_wait() with lock/unlock pair
On Fri, 30 Jun 2017, Oleg Nesterov wrote: > On 06/30, Paul E. McKenney wrote: > > > > On Fri, Jun 30, 2017 at 05:20:10PM +0200, Oleg Nesterov wrote: > > > > > > I do not think the overhead will be noticeable in this particular case. > > > > > > But I am not sure I understand why do we want to unlock_wait. Yes I agree, >^ > > if it was not clear, I tried to say "why do we want to _remove_ unlock_wait". > > > > it has some problems, but still... > > > > > > The code above looks strange for me. If we are going to repeat this > > > pattern > > > the perhaps we should add a helper for lock+unlock and name it > > > unlock_wait2 ;) > > > > > > If not, we should probably change this code more: > > > > This looks -much- better than my patch! May I have your Signed-off-by? > > Only if you promise to replace all RCU flavors with a single simple > implementation > based on rwlock ;) > > Seriously, of course I won't argue, and it seems that nobody except me likes > this primitive, but to me spin_unlock_wait() looks like synchronize_rcu(() and > sometimes it makes sense. If it looks like synchronize_rcu(), why not actually use synchronize_rcu()? Alan Stern > Including this particular case. task_work_run() is going to flush/destroy the > ->task_works list, so it needs to wait until all currently executing "readers" > (task_work_cancel()'s which have started before ->task_works was updated) have > completed. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: nf_conntrack: Infoleak via CTA_ID and CTA_EXPECT_ID
Florian, Am 30.06.2017 um 21:35 schrieb Florian Westphal: > Richard Weinbergerwrote: >> Hi! >> >> I noticed that nf_conntrack leaks kernel addresses, it uses the memory >> address >> as identifier used for generating conntrack and expect ids.. >> Since these ids are also visible to unprivileged users via network namespaces >> I suggest reverting these commits: > > Why not use a hash of the address? Would also work. Or xor it with a random number. On the other hand, for user space it would be more useful when the conntrack id does not repeat that often. That's why I favor the good old counter method. Currently the conntrack id is reused very fast. e.g. in one of our applications we use the conntack id via NFQUEUE and watch the destroy events via conntrack. It happens regularly that a new connection has the same id than a different connection we saw some moments before, before we receive the destroy event from the conntrack socket. Thanks, //richard -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
nf_conntrack: Infoleak via CTA_ID and CTA_EXPECT_ID
Hi! I noticed that nf_conntrack leaks kernel addresses, it uses the memory address as identifier used for generating conntrack and expect ids.. Since these ids are also visible to unprivileged users via network namespaces I suggest reverting these commits: commit 7f85f914721ffcef382a57995182916bd43d8a65 Author: Patrick McHardyDate: Fri Sep 28 14:41:27 2007 -0700 [NETFILTER]: nf_conntrack: kill unique ID Remove the per-conntrack ID, its not necessary anymore for dumping. For compatiblity reasons we send the address of the conntrack to userspace as ID. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller commit 3583240249ef354760e04ae49bd7b462a638f40c Author: Patrick McHardy Date: Fri Sep 28 14:41:50 2007 -0700 [NETFILTER]: nf_conntrack_expect: kill unique ID Similar to the conntrack ID, the per-expectation ID is not needed anymore, kill it. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller Thanks, //richard -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 02/26] task_work: Replace spin_unlock_wait() with lock/unlock pair
On 06/30, Paul E. McKenney wrote: > > On Fri, Jun 30, 2017 at 05:20:10PM +0200, Oleg Nesterov wrote: > > > > I do not think the overhead will be noticeable in this particular case. > > > > But I am not sure I understand why do we want to unlock_wait. Yes I agree, ^ if it was not clear, I tried to say "why do we want to _remove_ unlock_wait". > > it has some problems, but still... > > > > The code above looks strange for me. If we are going to repeat this pattern > > the perhaps we should add a helper for lock+unlock and name it unlock_wait2 > > ;) > > > > If not, we should probably change this code more: > > This looks -much- better than my patch! May I have your Signed-off-by? Only if you promise to replace all RCU flavors with a single simple implementation based on rwlock ;) Seriously, of course I won't argue, and it seems that nobody except me likes this primitive, but to me spin_unlock_wait() looks like synchronize_rcu(() and sometimes it makes sense. Including this particular case. task_work_run() is going to flush/destroy the ->task_works list, so it needs to wait until all currently executing "readers" (task_work_cancel()'s which have started before ->task_works was updated) have completed. Oleg. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC PATCH nf-next 3/3] netfilter: exthdr: early set support
This allows setting 2 and 4 byte quantities in the tcp option space. This allows native replacement for xt_TCPMSS in nf_tables. Todo: -TCPOPTSTRIP equivalent -possibly restrict write support so we can't generate invalid options (e.g., don't allow write to kind,len). Signed-off-by: Florian Westphal--- include/uapi/linux/netfilter/nf_tables.h | 4 +- net/netfilter/nft_exthdr.c | 161 ++- 2 files changed, 162 insertions(+), 3 deletions(-) diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h index 683f6f88fcac..4766f5091b5f 100644 --- a/include/uapi/linux/netfilter/nf_tables.h +++ b/include/uapi/linux/netfilter/nf_tables.h @@ -731,7 +731,8 @@ enum nft_exthdr_op { * @NFTA_EXTHDR_OFFSET: extension header offset (NLA_U32) * @NFTA_EXTHDR_LEN: extension header length (NLA_U32) * @NFTA_EXTHDR_FLAGS: extension header flags (NLA_U32) - * @NFTA_EXTHDR_OP: option match type (NLA_U8) + * @NFTA_EXTHDR_OP: option match type (NLA_U32) + * @NFTA_EXTHDR_SREG: option match type (NLA_U32) */ enum nft_exthdr_attributes { NFTA_EXTHDR_UNSPEC, @@ -741,6 +742,7 @@ enum nft_exthdr_attributes { NFTA_EXTHDR_LEN, NFTA_EXTHDR_FLAGS, NFTA_EXTHDR_OP, + NFTA_EXTHDR_SREG, __NFTA_EXTHDR_MAX }; #define NFTA_EXTHDR_MAX(__NFTA_EXTHDR_MAX - 1) diff --git a/net/netfilter/nft_exthdr.c b/net/netfilter/nft_exthdr.c index e3a6eebe7e0c..214211699f8d 100644 --- a/net/netfilter/nft_exthdr.c +++ b/net/netfilter/nft_exthdr.c @@ -8,6 +8,7 @@ * Development of this code funded by Astaro AG (http://www.astaro.com/) */ +#include #include #include #include @@ -23,6 +24,7 @@ struct nft_exthdr { u8 len; u8 op; enum nft_registers dreg:8; + enum nft_registers sreg:8; u8 flags; }; @@ -124,6 +126,88 @@ static void nft_exthdr_tcp_eval(const struct nft_expr *expr, regs->verdict.code = NFT_BREAK; } +static void nft_exthdr_tcp_set_eval(const struct nft_expr *expr, + struct nft_regs *regs, + const struct nft_pktinfo *pkt) +{ + u8 buff[sizeof(struct tcphdr) + MAX_TCP_OPTION_SPACE]; + struct nft_exthdr *priv = nft_expr_priv(expr); + unsigned int i, optl, tcphdr_len, offset; + struct tcphdr *tcph; + u8 *opt; + u32 src; + + tcph = nft_tcp_header_pointer(pkt, sizeof(buff), buff, _len); + if (!tcph) + return; + + opt = (u8 *)tcph; + for (i = sizeof(*tcph); i < tcphdr_len - 1; i += optl) { + union { + u8 octet; + __be16 v16; + __be32 v32; + } old, new; + + optl = optlen(opt, i); + + if (priv->type != opt[i]) + continue; + + if (i + optl > tcphdr_len || priv->len + priv->offset > optl) + return; + + if (!skb_make_writable(pkt->skb, pkt->xt.thoff + i + priv->len)) + return; + + tcph = nft_tcp_header_pointer(pkt, sizeof(buff), buff, + _len); + if (!tcph) + return; + + src = regs->data[priv->sreg]; + offset = i + priv->offset; + + switch (priv->len) { + case 2: + old.v16 = get_unaligned((u16 *)(opt + offset)); + new.v16 = src; + + if (old.v16 == new.v16) + return; + + switch (priv->type) { + case TCPOPT_MSS: + /* increase can cause connection to stall */ + if (ntohs(old.v16) <= ntohs(new.v16)) + return; + break; + } + + put_unaligned(new.v16, (u16*)(opt + offset)); + inet_proto_csum_replace2(>check, pkt->skb, +old.v16, new.v16, false); + break; + case 4: + new.v32 = src; + old.v32 = get_unaligned((u32 *)(opt + offset)); + + if (old.v32 == new.v32) + return; + + put_unaligned(new.v32, (u32*)(opt + offset)); + inet_proto_csum_replace4(>check, pkt->skb, +old.v32, new.v32, false); + break; + default: + WARN_ON_ONCE(1); + break; + } + + return; + } +} + static const
Re: [PATCH RFC 02/26] task_work: Replace spin_unlock_wait() with lock/unlock pair
On Fri, Jun 30, 2017 at 09:16:07AM -0700, Paul E. McKenney wrote: > On Fri, Jun 30, 2017 at 05:20:10PM +0200, Oleg Nesterov wrote: > > On 06/30, Paul E. McKenney wrote: > > > > > > > > + raw_spin_lock_irq(>pi_lock); > > > > > + raw_spin_unlock_irq(>pi_lock); > > > > > > I agree that the spin_unlock_wait() implementations would avoid the > > > deadlock with an acquisition from an interrupt handler, while also > > > avoiding the need to momentarily disable interrupts. The ->pi_lock is > > > a per-task lock, so I am assuming (perhaps naively) that contention is > > > not a problem. So is the overhead of interrupt disabling likely to be > > > noticeable here? > > > > I do not think the overhead will be noticeable in this particular case. > > > > But I am not sure I understand why do we want to unlock_wait. Yes I agree, > > it has some problems, but still... Well, I tried documenting exactly what it did and did not do, which got an ack from Peter. https://marc.info/?l=linux-kernel=149575078313105 However, my later pull request spawned a bit of discussion: https://marc.info/?l=linux-kernel=149730349001044 This discussion led me to propose strengthening spin_unlock_wait() to act as a lock/unlock pair. This can be implemented on x86 as an smp_mb() followed by a read-only spinloop, as shown on branch spin_unlock_wait.2017.06.23a on my -rcu tree. Linus was not amused, and said that if we were going to make spin_unlock_wait() have the semantics of lock+unlock, we should just open-code that, especially given that there are way more definitions of spin_unlock_wait() than there are uses. He also suggested making spin_unlock_wait() have only acquire semantics (x86 spin loop with no memory-barrier instructions) and add explicit barriers where required. https://marc.info/?l=linux-kernel=149860012913036 I did a series for this which may be found on branch spin_unlock_wait.2017.06.27a on my -rcu tree. This approach was not loved by others (see later on the above thread), and Linus's reply (which reiterated his opposition to lock+unlock semantics) suggested the possibility of removing spin_unlock_wait() entirely. https://marc.info/?l=linux-kernel=149869476911620 So I figured, in for a penny, in for a pound, and therefore did the series that includes this patch. The most recent update (which does not yet include your improved version) is on branch spin_unlock_wait.2017.06.30b of my -rcu tree. Hey, you asked! ;-) Thanx, Paul > > The code above looks strange for me. If we are going to repeat this pattern > > the perhaps we should add a helper for lock+unlock and name it unlock_wait2 > > ;) > > > > If not, we should probably change this code more: > > This looks -much- better than my patch! May I have your Signed-off-by? > > Thanx, Paul > > > --- a/kernel/task_work.c > > +++ b/kernel/task_work.c > > @@ -96,20 +96,16 @@ void task_work_run(void) > > * work->func() can do task_work_add(), do not set > > * work_exited unless the list is empty. > > */ > > + raw_spin_lock_irq(>pi_lock); > > do { > > work = READ_ONCE(task->task_works); > > head = !work && (task->flags & PF_EXITING) ? > > _exited : NULL; > > } while (cmpxchg(>task_works, work, head) != work); > > + raw_spin_unlock_irq(>pi_lock); > > > > if (!work) > > break; > > - /* > > -* Synchronize with task_work_cancel(). It can't remove > > -* the first entry == work, cmpxchg(task_works) should > > -* fail, but it can play with *work and other entries. > > -*/ > > - raw_spin_unlock_wait(>pi_lock); > > > > do { > > next = work->next; > > > > performance-wise this is almost the same, and if we do not really care about > > overhead we can simplify the code: this way it is obvious that we can't race > > with task_work_cancel(). > > > > Oleg. > > -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 02/26] task_work: Replace spin_unlock_wait() with lock/unlock pair
On Fri, Jun 30, 2017 at 05:20:10PM +0200, Oleg Nesterov wrote: > On 06/30, Paul E. McKenney wrote: > > > > > > + raw_spin_lock_irq(>pi_lock); > > > > + raw_spin_unlock_irq(>pi_lock); > > > > I agree that the spin_unlock_wait() implementations would avoid the > > deadlock with an acquisition from an interrupt handler, while also > > avoiding the need to momentarily disable interrupts. The ->pi_lock is > > a per-task lock, so I am assuming (perhaps naively) that contention is > > not a problem. So is the overhead of interrupt disabling likely to be > > noticeable here? > > I do not think the overhead will be noticeable in this particular case. > > But I am not sure I understand why do we want to unlock_wait. Yes I agree, > it has some problems, but still... > > The code above looks strange for me. If we are going to repeat this pattern > the perhaps we should add a helper for lock+unlock and name it unlock_wait2 ;) > > If not, we should probably change this code more: This looks -much- better than my patch! May I have your Signed-off-by? Thanx, Paul > --- a/kernel/task_work.c > +++ b/kernel/task_work.c > @@ -96,20 +96,16 @@ void task_work_run(void) >* work->func() can do task_work_add(), do not set >* work_exited unless the list is empty. >*/ > + raw_spin_lock_irq(>pi_lock); > do { > work = READ_ONCE(task->task_works); > head = !work && (task->flags & PF_EXITING) ? > _exited : NULL; > } while (cmpxchg(>task_works, work, head) != work); > + raw_spin_unlock_irq(>pi_lock); > > if (!work) > break; > - /* > - * Synchronize with task_work_cancel(). It can't remove > - * the first entry == work, cmpxchg(task_works) should > - * fail, but it can play with *work and other entries. > - */ > - raw_spin_unlock_wait(>pi_lock); > > do { > next = work->next; > > performance-wise this is almost the same, and if we do not really care about > overhead we can simplify the code: this way it is obvious that we can't race > with task_work_cancel(). > > Oleg. > -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] netfilter: nfnetlink: Improve input length sanitization in nfnetlink_rcv
On Thu, Jun 29, 2017 at 6:22 PM, Pablo Neira Ayusowrote: > On Tue, Jun 27, 2017 at 07:05:27PM +0200, Pablo Neira Ayuso wrote: >> On Tue, Jun 27, 2017 at 05:58:25PM +0200, Pablo Neira Ayuso wrote: >> > On Wed, Jun 07, 2017 at 03:50:38PM +0200, Mateusz Jurczyk wrote: >> > > Verify that the length of the socket buffer is sufficient to cover the >> > > nlmsghdr structure before accessing the nlh->nlmsg_len field for further >> > > input sanitization. If the client only supplies 1-3 bytes of data in >> > > sk_buff, then nlh->nlmsg_len remains partially uninitialized and >> > > contains leftover memory from the corresponding kernel allocation. >> > > Operating on such data may result in indeterminate evaluation of the >> > > nlmsg_len < NLMSG_HDRLEN expression. >> > > >> > > The bug was discovered by a runtime instrumentation designed to detect >> > > use of uninitialized memory in the kernel. The patch prevents this and >> > > other similar tools (e.g. KMSAN) from flagging this behavior in the >> > > future. >> > >> > Applied, thanks. >> >> Wait, I keeping this back after closer look. >> >> I think we have to remove this: >> >> if (nlh->nlmsg_len < NLMSG_HDRLEN || <--- >> skb->len < NLMSG_HDRLEN + sizeof(struct nfgenmsg)) >> return; >> >> in nfnetlink_rcv_skb_batch() >> >> now that we make this unfront check from nfnetlink_rcv(). > > BTW, I can just mangle your patch here to delete such line to speed up > things. See the mangled patch that is attached to this email. Sure, I think the condition in nfnetlink_rcv_skb_batch() can be now safely removed. Feel free to proceed with the mangled patch. Thanks. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 08/26] locking: Remove spin_unlock_wait() generic definitions
On Fri, Jun 30, 2017 at 05:38:15AM -0700, Paul E. McKenney wrote: > On Fri, Jun 30, 2017 at 10:19:29AM +0100, Will Deacon wrote: > > On Thu, Jun 29, 2017 at 05:01:16PM -0700, Paul E. McKenney wrote: > > > There is no agreed-upon definition of spin_unlock_wait()'s semantics, > > > and it appears that all callers could do just as well with a lock/unlock > > > pair. This commit therefore removes spin_unlock_wait() and related > > > definitions from core code. > > > > > > Signed-off-by: Paul E. McKenney> > > Cc: Arnd Bergmann > > > Cc: Ingo Molnar > > > Cc: Will Deacon > > > Cc: Peter Zijlstra > > > Cc: Alan Stern > > > Cc: Andrea Parri > > > Cc: Linus Torvalds > > > --- > > > include/asm-generic/qspinlock.h | 14 - > > > include/linux/spinlock.h| 31 --- > > > include/linux/spinlock_up.h | 6 --- > > > kernel/locking/qspinlock.c | 117 > > > > > > 4 files changed, 168 deletions(-) > > > > [...] > > > > > diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c > > > index b2caec7315af..64a9051e4c2c 100644 > > > --- a/kernel/locking/qspinlock.c > > > +++ b/kernel/locking/qspinlock.c > > > @@ -267,123 +267,6 @@ static __always_inline u32 > > > __pv_wait_head_or_lock(struct qspinlock *lock, > > > #define queued_spin_lock_slowpathnative_queued_spin_lock_slowpath > > > #endif > > > > > > -/* > > > - * Various notes on spin_is_locked() and spin_unlock_wait(), which are > > > - * 'interesting' functions: > > > - * > > > - * PROBLEM: some architectures have an interesting issue with atomic > > > ACQUIRE > > > - * operations in that the ACQUIRE applies to the LOAD _not_ the STORE > > > (ARM64, > > > - * PPC). Also qspinlock has a similar issue per construction, the > > > setting of > > > - * the locked byte can be unordered acquiring the lock proper. > > > - * > > > - * This gets to be 'interesting' in the following cases, where the > > > /should/s > > > - * end up false because of this issue. > > > - * > > > - * > > > - * CASE 1: > > > - * > > > - * So the spin_is_locked() correctness issue comes from something like: > > > - * > > > - * CPU0CPU1 > > > - * > > > - * global_lock(); local_lock(i) > > > - * spin_lock() spin_lock([i]) > > > - * for (i) if (!spin_is_locked()) { > > > - * spin_unlock_wait([i]); > > > smp_acquire__after_ctrl_dep(); > > > - * return; > > > - * } > > > - * // deal with fail > > > - * > > > - * Where it is important CPU1 sees G locked or CPU0 sees L[i] locked such > > > - * that there is exclusion between the two critical sections. > > > - * > > > - * The load from spin_is_locked() /should/ be constrained by the > > > ACQUIRE from > > > - * spin_lock([i]), and similarly the load(s) from > > > spin_unlock_wait([i]) > > > - * /should/ be constrained by the ACQUIRE from spin_lock(). > > > - * > > > - * Similarly, later stuff is constrained by the ACQUIRE from CTRL+RMB. > > > > Might be worth keeping this comment about spin_is_locked, since we're not > > removing that guy just yet! > > Ah, all the examples had spin_unlock_wait() in them. So what I need to > do is to create a spin_unlock_wait()-free example to illustrate the > text starting with "The load from spin_is_locked(", correct? Yeah, I think so. > I also need to check all uses of spin_is_locked(). There might no > longer be any that rely on any particular ordering... Right. I think we're looking for the "insane case" as per 38b850a73034 (which was apparently used by ipc/sem.c at the time, but no longer). There's a usage in kernel/debug/debug_core.c, but it doesn't fill me with joy. Will -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 02/26] task_work: Replace spin_unlock_wait() with lock/unlock pair
On Fri, Jun 30, 2017 at 01:04:45PM +0200, Oleg Nesterov wrote: > On 06/29, Paul E. McKenney wrote: > > > > --- a/kernel/task_work.c > > +++ b/kernel/task_work.c > > @@ -109,7 +109,8 @@ void task_work_run(void) > > * the first entry == work, cmpxchg(task_works) should > > * fail, but it can play with *work and other entries. > > */ > > - raw_spin_unlock_wait(>pi_lock); > > + raw_spin_lock(>pi_lock); > > + raw_spin_unlock(>pi_lock); > > Well, bit the you need spin_lock_irq(). And this is one of the reasons > why I personally think unlock_wait have some sense... Good catch, and I clearly need to double-check the other commits for any need to disable interrupts. Anyway, like this, with the addition of a flags variable, correct? > > + raw_spin_lock_irq(>pi_lock); > > + raw_spin_unlock_irq(>pi_lock); I agree that the spin_unlock_wait() implementations would avoid the deadlock with an acquisition from an interrupt handler, while also avoiding the need to momentarily disable interrupts. The ->pi_lock is a per-task lock, so I am assuming (perhaps naively) that contention is not a problem. So is the overhead of interrupt disabling likely to be noticeable here? Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 08/26] locking: Remove spin_unlock_wait() generic definitions
On Fri, Jun 30, 2017 at 10:19:29AM +0100, Will Deacon wrote: > On Thu, Jun 29, 2017 at 05:01:16PM -0700, Paul E. McKenney wrote: > > There is no agreed-upon definition of spin_unlock_wait()'s semantics, > > and it appears that all callers could do just as well with a lock/unlock > > pair. This commit therefore removes spin_unlock_wait() and related > > definitions from core code. > > > > Signed-off-by: Paul E. McKenney> > Cc: Arnd Bergmann > > Cc: Ingo Molnar > > Cc: Will Deacon > > Cc: Peter Zijlstra > > Cc: Alan Stern > > Cc: Andrea Parri > > Cc: Linus Torvalds > > --- > > include/asm-generic/qspinlock.h | 14 - > > include/linux/spinlock.h| 31 --- > > include/linux/spinlock_up.h | 6 --- > > kernel/locking/qspinlock.c | 117 > > > > 4 files changed, 168 deletions(-) > > [...] > > > diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c > > index b2caec7315af..64a9051e4c2c 100644 > > --- a/kernel/locking/qspinlock.c > > +++ b/kernel/locking/qspinlock.c > > @@ -267,123 +267,6 @@ static __always_inline u32 > > __pv_wait_head_or_lock(struct qspinlock *lock, > > #define queued_spin_lock_slowpath native_queued_spin_lock_slowpath > > #endif > > > > -/* > > - * Various notes on spin_is_locked() and spin_unlock_wait(), which are > > - * 'interesting' functions: > > - * > > - * PROBLEM: some architectures have an interesting issue with atomic > > ACQUIRE > > - * operations in that the ACQUIRE applies to the LOAD _not_ the STORE > > (ARM64, > > - * PPC). Also qspinlock has a similar issue per construction, the setting > > of > > - * the locked byte can be unordered acquiring the lock proper. > > - * > > - * This gets to be 'interesting' in the following cases, where the > > /should/s > > - * end up false because of this issue. > > - * > > - * > > - * CASE 1: > > - * > > - * So the spin_is_locked() correctness issue comes from something like: > > - * > > - * CPU0 CPU1 > > - * > > - * global_lock();local_lock(i) > > - * spin_lock() spin_lock([i]) > > - * for (i) if (!spin_is_locked()) { > > - * spin_unlock_wait([i]); smp_acquire__after_ctrl_dep(); > > - * return; > > - * } > > - * // deal with fail > > - * > > - * Where it is important CPU1 sees G locked or CPU0 sees L[i] locked such > > - * that there is exclusion between the two critical sections. > > - * > > - * The load from spin_is_locked() /should/ be constrained by the ACQUIRE > > from > > - * spin_lock([i]), and similarly the load(s) from spin_unlock_wait([i]) > > - * /should/ be constrained by the ACQUIRE from spin_lock(). > > - * > > - * Similarly, later stuff is constrained by the ACQUIRE from CTRL+RMB. > > Might be worth keeping this comment about spin_is_locked, since we're not > removing that guy just yet! Ah, all the examples had spin_unlock_wait() in them. So what I need to do is to create a spin_unlock_wait()-free example to illustrate the text starting with "The load from spin_is_locked(", correct? I also need to check all uses of spin_is_locked(). There might no longer be any that rely on any particular ordering... Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 03/26] sched: Replace spin_unlock_wait() with lock/unlock pair
On Fri, Jun 30, 2017 at 12:31:50PM +0200, Arnd Bergmann wrote: > On Fri, Jun 30, 2017 at 2:01 AM, Paul E. McKenney >wrote: > > There is no agreed-upon definition of spin_unlock_wait()'s semantics, > > and it appears that all callers could do just as well with a lock/unlock > > pair. This commit therefore replaces the spin_unlock_wait() call in > > do_task_dead() with spin_lock() followed immediately by spin_unlock(). > > This should be safe from a performance perspective because the lock is > > this tasks ->pi_lock, and this is called only after the task exits. > > > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > > index e91138fcde86..6dea3d9728c8 100644 > > --- a/kernel/sched/core.c > > +++ b/kernel/sched/core.c > > @@ -3461,7 +3461,8 @@ void __noreturn do_task_dead(void) > > * is held by try_to_wake_up() > > */ > > smp_mb(); > > - raw_spin_unlock_wait(>pi_lock); > > + raw_spin_lock(>pi_lock); > > + raw_spin_unlock(>pi_lock); > > Does the raw_spin_lock()/raw_spin_unlock() imply an smp_mb() or stronger? > Maybe it would be clearer to remove the extra barrier if so. No, it does not in general, but it does on most architectures, and there are ways to allow those architectures to gain the benefit of their stronger locks. For example, would this work? > > * is held by try_to_wake_up() > > */ > > - smp_mb(); > > - raw_spin_unlock_wait(>pi_lock); > > + smp_mb__before_spinlock(); > > + raw_spin_lock(>pi_lock); > > + raw_spin_unlock(>pi_lock); Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ANNOUNCE] Netfilter (+folks) userday in Faro, Portugal (Monday, July 3th)
Hi! We're releasing today the planning for our Netfilter (and folks) userday in Faro, Portugal. This is the pre-event that traditionally come before the Netfilter developer Workshop. This is the planning: 9h: Welcoming (Pablo Neira) 9h15: Suricata and its interaction with Netfilter (Eric Leblond) 10h:Coffe break 10h15: OVS and OVN (Joe Stringer) 10h50: Introduction to kasan and syzkaller (Florian Westphal) 11h25: Netfilter logging (Eric Leblond) 12h:Lunch 14: Linux network stack walkthrough (Florian Westphal) 14h50: Hands-on: migrate your laptop to nftables (Netfilter coreteam) 15h:Coffe break 15h10: TCP/IP is Boring: A tour of cellular protocol stacks (Harald Welte) 16h:Closing This event will take place in the Campus da Penha, in the University of Algarve, Building 9, room 2.25 (second floor). This event is public, free (no cost) and it requires no previous invitation. If you're in the area, you are welcome to join us. Thanks! P.S: Please, note that the Netfilter developer workshop starts on July 4th and it requires an invitation to attend indeed. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 25/29] netfilter, kbuild: use canonical method to specify objs.
From: Pablo Neira Ayuso > Sent: 29 June 2017 23:53 > Should use ":=" instead of "+=". ... > diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile > index c9b78e7b342f..913380919301 100644 > --- a/net/netfilter/Makefile > +++ b/net/netfilter/Makefile > @@ -70,10 +70,9 @@ obj-$(CONFIG_NETFILTER_SYNPROXY) += nf_synproxy_core.o > obj-$(CONFIG_NF_DUP_NETDEV) += nf_dup_netdev.o > > # nf_tables > -nf_tables-objs += nf_tables_core.o nf_tables_api.o nf_tables_trace.o > -nf_tables-objs += nft_immediate.o nft_cmp.o nft_range.o > -nf_tables-objs += nft_bitwise.o nft_byteorder.o nft_payload.o > -nf_tables-objs += nft_lookup.o nft_dynset.o > +nf_tables-objs := nf_tables_core.o nf_tables_api.o nf_tables_trace.o \ > + nft_immediate.o nft_cmp.o nft_range.o nft_bitwise.o \ > + nft_byteorder.o nft_payload.o nft_lookup.o nft_dynset.o Why? My preference is to add each file on its own line. Generates much less churn when files are added or removed. David -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 02/26] task_work: Replace spin_unlock_wait() with lock/unlock pair
On 06/29, Paul E. McKenney wrote: > > --- a/kernel/task_work.c > +++ b/kernel/task_work.c > @@ -109,7 +109,8 @@ void task_work_run(void) >* the first entry == work, cmpxchg(task_works) should >* fail, but it can play with *work and other entries. >*/ > - raw_spin_unlock_wait(>pi_lock); > + raw_spin_lock(>pi_lock); > + raw_spin_unlock(>pi_lock); Well, bit the you need spin_lock_irq(). And this is one of the reasons why I personally think unlock_wait have some sense... Oleg. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH nft 2/3] include: Remove __init macro definition.
On Fri, Jun 30, 2017 at 02:57:12PM +0530, Varsha Rao wrote: > Add nft_init function, which calls _init functions in main.c file. > Remove __init macro definition as libnftables library will be created > soon. Rename realm_table_init() function to avoid ambiguity as > realm_table_rt_init() and realm_table_meta_init() in rt.c and meta.c > files. > > Signed-off-by: Varsha Rao> --- > include/nftables.h | 9 + > include/utils.h| 1 - > src/ct.c | 2 +- > src/datatype.c | 2 +- > src/gmputil.c | 2 +- > src/main.c | 15 +++ > src/meta.c | 4 ++-- > src/netlink.c | 2 +- > src/rt.c | 2 +- > src/xt.c | 2 +- > 10 files changed, 32 insertions(+), 9 deletions(-) > > diff --git a/include/nftables.h b/include/nftables.h > index 26fd344..b188b9e 100644 > --- a/include/nftables.h > +++ b/include/nftables.h > @@ -117,5 +117,14 @@ struct parser_state; > > int nft_run(struct nft_ctx *nft, void *scanner, struct parser_state *state, > struct list_head *msgs); > +void ct_label_table_init(void); > +void mark_table_init(void); > +void gmp_init(void); > +void realm_table_rt_init(void); > +void devgroup_table_init(void); > +void netlink_open_sock(void); OK, so before I apply this, I would like that we remove netlink_open_sock() from nft_init(). Could you make a patch that does the following (in steps): 1) Add a new struct mnl_socket * field to struct nft_ctx, you can name this new field as 'nf_sock'. 2) Call netlink_open_sock() from nft_netlink(), at the very beginning, so you set ctx->nf_sock. 3) Use ctx->nf_sock everywhere in src/netlink.c and src/mnl.c, so this is not global anymore. All this in one single patch. As a result, we don't need to call netlink_open_sock() from nft_init() anymore. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH nft 1/3] include: Remove datatype_register().
On Fri, Jun 30, 2017 at 02:56:19PM +0530, Varsha Rao wrote: > Remove datatype_register() function and its calling __init functions. > Add arguments of datatype_register() function to datatype array. Applied, thanks Varsha. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 03/26] sched: Replace spin_unlock_wait() with lock/unlock pair
On Fri, Jun 30, 2017 at 2:01 AM, Paul E. McKenneywrote: > There is no agreed-upon definition of spin_unlock_wait()'s semantics, > and it appears that all callers could do just as well with a lock/unlock > pair. This commit therefore replaces the spin_unlock_wait() call in > do_task_dead() with spin_lock() followed immediately by spin_unlock(). > This should be safe from a performance perspective because the lock is > this tasks ->pi_lock, and this is called only after the task exits. > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index e91138fcde86..6dea3d9728c8 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -3461,7 +3461,8 @@ void __noreturn do_task_dead(void) > * is held by try_to_wake_up() > */ > smp_mb(); > - raw_spin_unlock_wait(>pi_lock); > + raw_spin_lock(>pi_lock); > + raw_spin_unlock(>pi_lock); Does the raw_spin_lock()/raw_spin_unlock() imply an smp_mb() or stronger? Maybe it would be clearer to remove the extra barrier if so. Arnd -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH nft 3/3] include: Remove __exit macro definition.
Add nft_exit function, which calls _exit functions in main.c file. Remove __exit macro definition as libnftables library will be created soon. Rename realm_table_exit() function to avoid ambiguity as realm_table_rt_exit() and realm_table_meta_exit() in rt.c and meta.c files. Signed-off-by: Varsha Rao--- include/nftables.h | 9 + include/utils.h| 1 - src/ct.c | 2 +- src/datatype.c | 2 +- src/main.c | 11 +++ src/meta.c | 4 ++-- src/netlink.c | 2 +- src/rt.c | 2 +- 8 files changed, 26 insertions(+), 7 deletions(-) diff --git a/include/nftables.h b/include/nftables.h index b188b9e..5d1a783 100644 --- a/include/nftables.h +++ b/include/nftables.h @@ -117,6 +117,7 @@ struct parser_state; int nft_run(struct nft_ctx *nft, void *scanner, struct parser_state *state, struct list_head *msgs); + void ct_label_table_init(void); void mark_table_init(void); void gmp_init(void); @@ -127,4 +128,12 @@ void realm_table_meta_init(void); void xt_init(void); void nft_init(void); +void ct_label_table_exit(void); +void mark_table_exit(void); +void realm_table_meta_exit(void); +void devgroup_table_exit(void); +void netlink_close_sock(void); +void realm_table_rt_exit(void); +void nft_exit(void); + #endif /* NFTABLES_NFTABLES_H */ diff --git a/include/utils.h b/include/utils.h index 0c3341b..0605eee 100644 --- a/include/utils.h +++ b/include/utils.h @@ -32,7 +32,6 @@ #define __gmp_fmtstring(x, y) #endif -#define __exit __attribute__((destructor)) #define __must_check __attribute__((warn_unused_result)) #define __noreturn __attribute__((__noreturn__)) diff --git a/src/ct.c b/src/ct.c index 25efc70..d64f467 100644 --- a/src/ct.c +++ b/src/ct.c @@ -210,7 +210,7 @@ void ct_label_table_init(void) ct_label_tbl = rt_symbol_table_init(CONNLABEL_CONF); } -static void __exit ct_label_table_exit(void) +void ct_label_table_exit(void) { rt_symbol_table_free(ct_label_tbl); } diff --git a/src/datatype.c b/src/datatype.c index 4f74c06..0d5ba51 100644 --- a/src/datatype.c +++ b/src/datatype.c @@ -722,7 +722,7 @@ void mark_table_init(void) mark_tbl = rt_symbol_table_init("/etc/iproute2/rt_marks"); } -static void __exit mark_table_exit(void) +void mark_table_exit(void) { rt_symbol_table_free(mark_tbl); } diff --git a/src/main.c b/src/main.c index 301bce0..5da8bee 100644 --- a/src/main.c +++ b/src/main.c @@ -276,6 +276,16 @@ void nft_init(void) #endif } +void nft_exit(void) +{ + netlink_close_sock(); + ct_label_table_exit(); + realm_table_rt_exit(); + devgroup_table_exit(); + realm_table_meta_exit(); + mark_table_exit(); +} + int main(int argc, char * const *argv) { struct parser_state state; @@ -412,6 +422,7 @@ out: xfree(buf); cache_release(); iface_cache_release(); + nft_exit(); return rc; } diff --git a/src/meta.c b/src/meta.c index 4fb26d5..9c80893 100644 --- a/src/meta.c +++ b/src/meta.c @@ -42,7 +42,7 @@ void realm_table_meta_init(void) realm_tbl = rt_symbol_table_init("/etc/iproute2/rt_realms"); } -static void __exit realm_table_exit(void) +void realm_table_meta_exit(void) { rt_symbol_table_free(realm_tbl); } @@ -338,7 +338,7 @@ void devgroup_table_init(void) devgroup_tbl = rt_symbol_table_init("/etc/iproute2/group"); } -static void __exit devgroup_table_exit(void) +void devgroup_table_exit(void) { rt_symbol_table_free(devgroup_tbl); } diff --git a/src/netlink.c b/src/netlink.c index 3993aa1..6d466c0 100644 --- a/src/netlink.c +++ b/src/netlink.c @@ -67,7 +67,7 @@ void netlink_open_sock(void) fcntl(mnl_socket_get_fd(nf_sock), F_SETFL, O_NONBLOCK); } -static void __exit netlink_close_sock(void) +void netlink_close_sock(void) { if (nf_sock) mnl_socket_close(nf_sock); diff --git a/src/rt.c b/src/rt.c index 5f57cf0..cd2d5a4 100644 --- a/src/rt.c +++ b/src/rt.c @@ -29,7 +29,7 @@ void realm_table_rt_init(void) realm_tbl = rt_symbol_table_init("/etc/iproute2/rt_realms"); } -static void __exit realm_table_exit(void) +void realm_table_rt_exit(void) { rt_symbol_table_free(realm_tbl); } -- 2.9.4 -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH nft 2/3] include: Remove __init macro definition.
Add nft_init function, which calls _init functions in main.c file. Remove __init macro definition as libnftables library will be created soon. Rename realm_table_init() function to avoid ambiguity as realm_table_rt_init() and realm_table_meta_init() in rt.c and meta.c files. Signed-off-by: Varsha Rao--- include/nftables.h | 9 + include/utils.h| 1 - src/ct.c | 2 +- src/datatype.c | 2 +- src/gmputil.c | 2 +- src/main.c | 15 +++ src/meta.c | 4 ++-- src/netlink.c | 2 +- src/rt.c | 2 +- src/xt.c | 2 +- 10 files changed, 32 insertions(+), 9 deletions(-) diff --git a/include/nftables.h b/include/nftables.h index 26fd344..b188b9e 100644 --- a/include/nftables.h +++ b/include/nftables.h @@ -117,5 +117,14 @@ struct parser_state; int nft_run(struct nft_ctx *nft, void *scanner, struct parser_state *state, struct list_head *msgs); +void ct_label_table_init(void); +void mark_table_init(void); +void gmp_init(void); +void realm_table_rt_init(void); +void devgroup_table_init(void); +void netlink_open_sock(void); +void realm_table_meta_init(void); +void xt_init(void); +void nft_init(void); #endif /* NFTABLES_NFTABLES_H */ diff --git a/include/utils.h b/include/utils.h index 3199388..0c3341b 100644 --- a/include/utils.h +++ b/include/utils.h @@ -32,7 +32,6 @@ #define __gmp_fmtstring(x, y) #endif -#define __init __attribute__((constructor)) #define __exit __attribute__((destructor)) #define __must_check __attribute__((warn_unused_result)) #define __noreturn __attribute__((__noreturn__)) diff --git a/src/ct.c b/src/ct.c index 9b7140b..25efc70 100644 --- a/src/ct.c +++ b/src/ct.c @@ -205,7 +205,7 @@ static const struct datatype ct_label_type = { .parse = ct_label_type_parse, }; -static void __init ct_label_table_init(void) +void ct_label_table_init(void) { ct_label_tbl = rt_symbol_table_init(CONNLABEL_CONF); } diff --git a/src/datatype.c b/src/datatype.c index b3c8f66..4f74c06 100644 --- a/src/datatype.c +++ b/src/datatype.c @@ -717,7 +717,7 @@ void rt_symbol_table_free(struct symbol_table *tbl) } static struct symbol_table *mark_tbl; -static void __init mark_table_init(void) +void mark_table_init(void) { mark_tbl = rt_symbol_table_init("/etc/iproute2/rt_marks"); } diff --git a/src/gmputil.c b/src/gmputil.c index c763792..844ea61 100644 --- a/src/gmputil.c +++ b/src/gmputil.c @@ -207,7 +207,7 @@ static void *gmp_xrealloc(void *ptr, size_t old_size, size_t new_size) return xrealloc(ptr, new_size); } -static void __init gmp_init(void) +void gmp_init(void) { mp_set_memory_functions(xmalloc, gmp_xrealloc, NULL); } diff --git a/src/main.c b/src/main.c index 7fbf00a..301bce0 100644 --- a/src/main.c +++ b/src/main.c @@ -262,6 +262,20 @@ err1: return ret; } +void nft_init(void) +{ +mark_table_init(); +realm_table_rt_init(); +devgroup_table_init(); +realm_table_meta_init(); +ct_label_table_init(); +netlink_open_sock(); +gmp_init(); +#ifdef HAVE_LIBXTABLES + xt_init(); +#endif +} + int main(int argc, char * const *argv) { struct parser_state state; @@ -272,6 +286,7 @@ int main(int argc, char * const *argv) bool interactive = false; int i, val, rc = NFT_EXIT_SUCCESS; + nft_init(); while (1) { val = getopt_long(argc, argv, OPTSTRING, options, NULL); if (val == -1) diff --git a/src/meta.c b/src/meta.c index e9334b8..4fb26d5 100644 --- a/src/meta.c +++ b/src/meta.c @@ -37,7 +37,7 @@ #include static struct symbol_table *realm_tbl; -static void __init realm_table_init(void) +void realm_table_meta_init(void) { realm_tbl = rt_symbol_table_init("/etc/iproute2/rt_realms"); } @@ -333,7 +333,7 @@ const struct datatype pkttype_type = { }; static struct symbol_table *devgroup_tbl; -static void __init devgroup_table_init(void) +void devgroup_table_init(void) { devgroup_tbl = rt_symbol_table_init("/etc/iproute2/group"); } diff --git a/src/netlink.c b/src/netlink.c index 880502c..3993aa1 100644 --- a/src/netlink.c +++ b/src/netlink.c @@ -61,7 +61,7 @@ static struct mnl_socket *nfsock_open(void) return s; } -static void __init netlink_open_sock(void) +void netlink_open_sock(void) { nf_sock = nfsock_open(); fcntl(mnl_socket_get_fd(nf_sock), F_SETFL, O_NONBLOCK); diff --git a/src/rt.c b/src/rt.c index 530ebe6..5f57cf0 100644 --- a/src/rt.c +++ b/src/rt.c @@ -24,7 +24,7 @@ #include static struct symbol_table *realm_tbl; -static void __init realm_table_init(void) +void realm_table_rt_init(void) { realm_tbl = rt_symbol_table_init("/etc/iproute2/rt_realms"); } diff --git a/src/xt.c b/src/xt.c index e24b0af..9680f8e 100644 --- a/src/xt.c +++ b/src/xt.c @@ -351,7 +351,7 @@
[PATCH nft 1/3] include: Remove datatype_register().
Remove datatype_register() function and its calling __init functions. Add arguments of datatype_register() function to datatype array. Signed-off-by: Varsha Rao--- include/ct.h | 3 +++ include/datatype.h | 1 - include/exthdr.h | 1 + include/fib.h | 2 ++ include/meta.h | 7 ++- include/proto.h| 8 include/rt.h | 1 + src/ct.c | 13 +++-- src/datatype.c | 25 +++-- src/exthdr.c | 7 +-- src/fib.c | 7 +-- src/meta.c | 44 +--- src/proto.c| 24 ++-- src/rt.c | 7 +-- 14 files changed, 57 insertions(+), 93 deletions(-) diff --git a/include/ct.h b/include/ct.h index ae900ee..b8cd7bf 100644 --- a/include/ct.h +++ b/include/ct.h @@ -23,6 +23,9 @@ struct ct_template { .len= (__len), \ } +extern const struct datatype ct_dir_type; +extern const struct datatype ct_state_type; +extern const struct datatype ct_status_type; extern struct expr *ct_expr_alloc(const struct location *loc, enum nft_ct_keys key, int8_t direction); extern void ct_expr_update_type(struct proto_ctx *ctx, struct expr *expr); diff --git a/include/datatype.h b/include/datatype.h index 58c4d3e..2e34591 100644 --- a/include/datatype.h +++ b/include/datatype.h @@ -152,7 +152,6 @@ struct datatype { const struct symbol_table *sym_tbl; }; -extern void datatype_register(const struct datatype *dtype); extern const struct datatype *datatype_lookup(enum datatypes type); extern const struct datatype *datatype_lookup_byname(const char *name); diff --git a/include/exthdr.h b/include/exthdr.h index a2647ee..97ccc38 100644 --- a/include/exthdr.h +++ b/include/exthdr.h @@ -89,5 +89,6 @@ extern const struct exthdr_desc exthdr_rt2; extern const struct exthdr_desc exthdr_frag; extern const struct exthdr_desc exthdr_dst; extern const struct exthdr_desc exthdr_mh; +extern const struct datatype mh_type_type; #endif /* NFTABLES_EXTHDR_H */ diff --git a/include/fib.h b/include/fib.h index 3a019e6..9ce681c 100644 --- a/include/fib.h +++ b/include/fib.h @@ -4,4 +4,6 @@ extern struct expr *fib_expr_alloc(const struct location *loc, unsigned int flags, unsigned int result); +extern const struct datatype fib_addr_type; + #endif /* NFTABLES_FIB_H */ diff --git a/include/meta.h b/include/meta.h index 5578460..ebef7d8 100644 --- a/include/meta.h +++ b/include/meta.h @@ -28,7 +28,12 @@ extern struct expr *meta_expr_alloc(const struct location *loc, struct stmt *meta_stmt_meta_iiftype(const struct location *loc, uint16_t type); -const struct datatype ifindex_type; +extern const struct datatype ifindex_type; +extern const struct datatype tchandle_type; +extern const struct datatype gid_type; +extern const struct datatype uid_type; +extern const struct datatype devgroup_type; +extern const struct datatype pkttype_type; struct error_record *meta_key_parse(const struct location *loc, const char *name, diff --git a/include/proto.h b/include/proto.h index 01188ab..39aa485 100644 --- a/include/proto.h +++ b/include/proto.h @@ -322,4 +322,12 @@ extern const struct proto_desc proto_netdev; extern const struct proto_desc proto_unknown; extern const struct proto_hdr_template proto_unknown_template; +extern const struct datatype icmp_type_type; +extern const struct datatype tcp_flag_type; +extern const struct datatype dccp_pkttype_type; +extern const struct datatype arpop_type; +extern const struct datatype icmp6_type_type; +extern const struct datatype dscp_type; +extern const struct datatype ecn_type; + #endif /* NFTABLES_PROTO_H */ diff --git a/include/rt.h b/include/rt.h index 728cf5f..bbffa74 100644 --- a/include/rt.h +++ b/include/rt.h @@ -26,6 +26,7 @@ struct rt_template { .invalid= (__invalid), \ } +extern const struct datatype realm_type; extern struct expr *rt_expr_alloc(const struct location *loc, enum nft_rt_keys key, bool invalid); extern void rt_expr_update_type(struct proto_ctx *ctx, struct expr *expr); diff --git a/src/ct.c b/src/ct.c index c705750..9b7140b 100644 --- a/src/ct.c +++ b/src/ct.c @@ -44,7 +44,7 @@ static const struct symbol_table ct_state_tbl = { } }; -static const struct datatype ct_state_type = { +const struct datatype ct_state_type = { .type = TYPE_CT_STATE, .name = "ct_state", .desc = "conntrack state", @@ -63,7 +63,7 @@ static const struct symbol_table ct_dir_tbl = { } }; -static const struct datatype ct_dir_type = { +const struct datatype ct_dir_type = { .type = TYPE_CT_DIR, .name = "ct_dir",
[PATCH nft 0/3] include: Remove datatype_register(), __init and __exit macros.
This patchset removes datatype_register() function, __init and __exit macro definitions. Varsha Rao (3): include: Remove datatype_register(). include: Remove __init macro definition. include: Remove __exit macro definition. include/ct.h | 3 +++ include/datatype.h | 1 - include/exthdr.h | 1 + include/fib.h | 2 ++ include/meta.h | 7 ++- include/nftables.h | 18 ++ include/proto.h| 8 include/rt.h | 1 + include/utils.h| 2 -- src/ct.c | 17 + src/datatype.c | 29 + src/exthdr.c | 7 +-- src/fib.c | 7 +-- src/gmputil.c | 2 +- src/main.c | 26 ++ src/meta.c | 52 +--- src/netlink.c | 4 ++-- src/proto.c| 24 ++-- src/rt.c | 11 +++ src/xt.c | 2 +- 20 files changed, 115 insertions(+), 109 deletions(-) -- 2.9.4 -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 12/26] arm64: Remove spin_unlock_wait() arch-specific definitions
On Thu, Jun 29, 2017 at 05:01:20PM -0700, Paul E. McKenney wrote: > There is no agreed-upon definition of spin_unlock_wait()'s semantics, > and it appears that all callers could do just as well with a lock/unlock > pair. This commit therefore removes the underlying arch-specific > arch_spin_unlock_wait(). > > Signed-off-by: Paul E. McKenney> Cc: Catalin Marinas > Cc: Will Deacon > Cc: > Cc: Peter Zijlstra > Cc: Alan Stern > Cc: Andrea Parri > Cc: Linus Torvalds > --- > arch/arm64/include/asm/spinlock.h | 58 > --- > 1 file changed, 5 insertions(+), 53 deletions(-) I'm going to miss this code. Acked-by: Will Deacon Will -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 08/26] locking: Remove spin_unlock_wait() generic definitions
On Thu, Jun 29, 2017 at 05:01:16PM -0700, Paul E. McKenney wrote: > There is no agreed-upon definition of spin_unlock_wait()'s semantics, > and it appears that all callers could do just as well with a lock/unlock > pair. This commit therefore removes spin_unlock_wait() and related > definitions from core code. > > Signed-off-by: Paul E. McKenney> Cc: Arnd Bergmann > Cc: Ingo Molnar > Cc: Will Deacon > Cc: Peter Zijlstra > Cc: Alan Stern > Cc: Andrea Parri > Cc: Linus Torvalds > --- > include/asm-generic/qspinlock.h | 14 - > include/linux/spinlock.h| 31 --- > include/linux/spinlock_up.h | 6 --- > kernel/locking/qspinlock.c | 117 > > 4 files changed, 168 deletions(-) [...] > diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c > index b2caec7315af..64a9051e4c2c 100644 > --- a/kernel/locking/qspinlock.c > +++ b/kernel/locking/qspinlock.c > @@ -267,123 +267,6 @@ static __always_inline u32 > __pv_wait_head_or_lock(struct qspinlock *lock, > #define queued_spin_lock_slowpathnative_queued_spin_lock_slowpath > #endif > > -/* > - * Various notes on spin_is_locked() and spin_unlock_wait(), which are > - * 'interesting' functions: > - * > - * PROBLEM: some architectures have an interesting issue with atomic ACQUIRE > - * operations in that the ACQUIRE applies to the LOAD _not_ the STORE (ARM64, > - * PPC). Also qspinlock has a similar issue per construction, the setting of > - * the locked byte can be unordered acquiring the lock proper. > - * > - * This gets to be 'interesting' in the following cases, where the /should/s > - * end up false because of this issue. > - * > - * > - * CASE 1: > - * > - * So the spin_is_locked() correctness issue comes from something like: > - * > - * CPU0CPU1 > - * > - * global_lock(); local_lock(i) > - * spin_lock() spin_lock([i]) > - * for (i) if (!spin_is_locked()) { > - * spin_unlock_wait([i]);smp_acquire__after_ctrl_dep(); > - * return; > - * } > - * // deal with fail > - * > - * Where it is important CPU1 sees G locked or CPU0 sees L[i] locked such > - * that there is exclusion between the two critical sections. > - * > - * The load from spin_is_locked() /should/ be constrained by the ACQUIRE > from > - * spin_lock([i]), and similarly the load(s) from spin_unlock_wait([i]) > - * /should/ be constrained by the ACQUIRE from spin_lock(). > - * > - * Similarly, later stuff is constrained by the ACQUIRE from CTRL+RMB. Might be worth keeping this comment about spin_is_locked, since we're not removing that guy just yet! Will -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html