Re: [PATCH RFC 08/26] locking: Remove spin_unlock_wait() generic definitions

2017-06-30 Thread Paul E. McKenney
On Fri, Jun 30, 2017 at 02:13:39PM +0100, Will Deacon wrote:
> On Fri, Jun 30, 2017 at 05:38:15AM -0700, Paul E. McKenney wrote:
> > On Fri, Jun 30, 2017 at 10:19:29AM +0100, Will Deacon wrote:
> > > On Thu, Jun 29, 2017 at 05:01:16PM -0700, Paul E. McKenney wrote:
> > > > There is no agreed-upon definition of spin_unlock_wait()'s semantics,
> > > > and it appears that all callers could do just as well with a lock/unlock
> > > > pair.  This commit therefore removes spin_unlock_wait() and related
> > > > definitions from core code.
> > > > 
> > > > Signed-off-by: Paul E. McKenney 
> > > > Cc: Arnd Bergmann 
> > > > Cc: Ingo Molnar 
> > > > Cc: Will Deacon 
> > > > Cc: Peter Zijlstra 
> > > > Cc: Alan Stern 
> > > > Cc: Andrea Parri 
> > > > Cc: Linus Torvalds 
> > > > ---
> > > >  include/asm-generic/qspinlock.h |  14 ----
> > > >  include/linux/spinlock.h        |  31 -----
> > > >  include/linux/spinlock_up.h     |   6 ---
> > > >  kernel/locking/qspinlock.c      | 117 ----------
> > > >  4 files changed, 168 deletions(-)
> > > 
> > > [...]
> > > 
> > > > diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> > > > index b2caec7315af..64a9051e4c2c 100644
> > > > --- a/kernel/locking/qspinlock.c
> > > > +++ b/kernel/locking/qspinlock.c
> > > > @@ -267,123 +267,6 @@ static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock,
> > > >  #define queued_spin_lock_slowpath  native_queued_spin_lock_slowpath
> > > >  #endif
> > > >  
> > > > -/*
> > > > - * Various notes on spin_is_locked() and spin_unlock_wait(), which are
> > > > - * 'interesting' functions:
> > > > - *
> > > > - * PROBLEM: some architectures have an interesting issue with atomic ACQUIRE
> > > > - * operations in that the ACQUIRE applies to the LOAD _not_ the STORE (ARM64,
> > > > - * PPC). Also qspinlock has a similar issue per construction, the setting of
> > > > - * the locked byte can be unordered acquiring the lock proper.
> > > > - *
> > > > - * This gets to be 'interesting' in the following cases, where the /should/s
> > > > - * end up false because of this issue.
> > > > - *
> > > > - *
> > > > - * CASE 1:
> > > > - *
> > > > - * So the spin_is_locked() correctness issue comes from something like:
> > > > - *
> > > > - *   CPU0                          CPU1
> > > > - *
> > > > - *   global_lock();                local_lock(i)
> > > > - *     spin_lock(&G)                 spin_lock(&L[i])
> > > > - *     for (i)                       if (!spin_is_locked(&G)) {
> > > > - *       spin_unlock_wait(&L[i]);      smp_acquire__after_ctrl_dep();
> > > > - *                                     return;
> > > > - *                                   }
> > > > - *                                   // deal with fail
> > > > - *
> > > > - * Where it is important CPU1 sees G locked or CPU0 sees L[i] locked such
> > > > - * that there is exclusion between the two critical sections.
> > > > - *
> > > > - * The load from spin_is_locked(&G) /should/ be constrained by the ACQUIRE from
> > > > - * spin_lock(&L[i]), and similarly the load(s) from spin_unlock_wait(&L[i])
> > > > - * /should/ be constrained by the ACQUIRE from spin_lock(&G).
> > > > - *
> > > > - * Similarly, later stuff is constrained by the ACQUIRE from CTRL+RMB.
> > > 
> > > Might be worth keeping this comment about spin_is_locked, since we're not
> > > removing that guy just yet!
> > 
> > Ah, all the examples had spin_unlock_wait() in them.  So what I need to
> > do is to create a spin_unlock_wait()-free example to illustrate the
> > text starting with "The load from spin_is_locked(", correct?
> 
> Yeah, I think so.
> 
> > I also need to check all uses of spin_is_locked().  There might no
> > longer be any that rely on any particular ordering...
> 
> Right. I think we're looking for the "insane case" as per 38b850a73034
> (which was apparently used by ipc/sem.c at the time, but no longer).
> 
> There's a usage in kernel/debug/debug_core.c, but it doesn't fill me with
> joy.

That is indeed an interesting one...  But my first round will be what
semantics the implementations seem to provide:

Acquire courtesy of TSO: s390, sparc, x86.
Acquire: ia64 (in reality fully ordered).
Control dependency: alpha, arc, arm, blackfin, hexagon, m32r, mn10300, tile,
xtensa.
Control dependency plus leading full barrier: arm64, powerpc.
UP-only: c6x, cris, frv, h8300, m68k, microblaze, nios2, openrisc, um, unicore32.

Special cases:
metag: Acquire if !CONFIG_METAG_SMP_WRITE_REORDERING, otherwise control dependency?
mips: Control dependency, acquire if CONFIG_CPU_CAVIUM_OCTEON.
parisc: Acquire courtesy of TSO, 

Re: nf_conntrack: Infoleak via CTA_ID and CTA_EXPECT_ID

2017-06-30 Thread Richard Weinberger
Florian,

Am 30.06.2017 um 21:55 schrieb Florian Westphal:
>>> Why not use a hash of the address?
>>
>> Would also work. Or xor it with a random number.
>>
>> On the other hand, for user space it would be more useful when the conntrack id
>> does not repeat that often. That's why I favor the good old counter method.
>> Currently the conntrack id is reused very fast.
>> e.g. in one of our applications we use the conntrack id via NFQUEUE and watch the
>> destroy events via conntrack. It happens regularly that a new connection has the
>> same id as a different connection we saw some moments before, before we receive
>> the destroy event from the conntrack socket.
> 
> Perhaps we can place that in a new extension (it's not needed in any
> fastpath ops)?

To get rid of the infoleak we have to re-introduce the id field in struct nf_conn
and struct nf_conntrack_expect.
Otherwise we have nothing to compare against in the conntrack/expect remove case.

So the only question is what to store. IMHO a counter that can wrap around is the
cheapest method and would also not harm the fast path.
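
A minimal sketch of the counter idea (hypothetical names, not a patch from
this thread): a global counter that skips 0 on wrap-around, sampled once
per conntrack at allocation time:

	static atomic_t ct_id_seq;

	static u32 nf_ct_alloc_id(void)
	{
		u32 id;

		/* wrap-around is fine, but reserve 0 as "no id" */
		do {
			id = (u32)atomic_inc_return(&ct_id_seq);
		} while (!id);

		return id;
	}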

Thanks,
//richard


Re: [PATCH RFC 02/26] task_work: Replace spin_unlock_wait() with lock/unlock pair

2017-06-30 Thread Paul E. McKenney
On Fri, Jun 30, 2017 at 01:02:48PM -0700, Paul E. McKenney wrote:
> On Fri, Jun 30, 2017 at 09:21:23PM +0200, Oleg Nesterov wrote:
> > On 06/30, Paul E. McKenney wrote:
> > >
> > > On Fri, Jun 30, 2017 at 05:20:10PM +0200, Oleg Nesterov wrote:
> > > >
> > > > I do not think the overhead will be noticeable in this particular case.
> > > >
> > > > But I am not sure I understand why do we want to unlock_wait. Yes I agree,
> >                                                          ^
> > 
> > if it was not clear, I tried to say "why do we want to _remove_ unlock_wait".
> > 
> > > > it has some problems, but still...
> > > >
> > > > The code above looks strange for me. If we are going to repeat this pattern
> > > > then perhaps we should add a helper for lock+unlock and name it unlock_wait2 ;)
> > > >
> > > > If not, we should probably change this code more:
> > >
> > > This looks -much- better than my patch!  May I have your Signed-off-by?
> > 
> > Only if you promise to replace all RCU flavors with a single simple
> > implementation based on rwlock ;)
> 
> ;-) ;-) ;-)
> 
> Here you go:
> 
> https://github.com/pramalhe/ConcurrencyFreaks/blob/master/papers/poormanurcu-2015.pdf
> 
> > Seriously, of course I won't argue, and it seems that nobody except me likes
> > this primitive, but to me spin_unlock_wait() looks like synchronize_rcu() and
> > sometimes it makes sense.
> 
> Well, that analogy was what led me to propose that its semantics be
> defined as spin_lock() immediately followed by spin_unlock().  But that
> didn't go over well.
> 
> > Including this particular case. task_work_run() is going to flush/destroy the
> > ->task_works list, so it needs to wait until all currently executing "readers"
> > (task_work_cancel()'s which have started before ->task_works was updated) have
> > completed.
> 
> Understood!

And please see below for the resulting patch and commit log.  Please let
me know if I broke something.

Thanx, Paul



commit 6c0801c9ab19fc2f4c1e2436eb1b72e0af9a317b
Author: Oleg Nesterov 
Date:   Fri Jun 30 13:13:59 2017 -0700

task_work: Replace spin_unlock_wait() with lock/unlock pair

There is no agreed-upon definition of spin_unlock_wait()'s semantics,
and it appears that all callers could do just as well with a lock/unlock
pair.  This commit therefore replaces the spin_unlock_wait() call in
task_work_run() with a spin_lock_irq() and a spin_unlock_irq() around
the cmpxchg() dequeue loop.  This should be safe from a performance
perspective because ->pi_lock is local to the task and because calls to
the other side of the race, task_work_cancel(), should be rare.

Signed-off-by: Oleg Nesterov 
Signed-off-by: Paul E. McKenney 

diff --git a/kernel/task_work.c b/kernel/task_work.c
index d513051fcca2..836a72a66fba 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -96,20 +96,16 @@ void task_work_run(void)
 * work->func() can do task_work_add(), do not set
 * work_exited unless the list is empty.
 */
+	raw_spin_lock_irq(&task->pi_lock);
 	do {
 		work = READ_ONCE(task->task_works);
 		head = !work && (task->flags & PF_EXITING) ?
 			&work_exited : NULL;
 	} while (cmpxchg(&task->task_works, work, head) != work);
+	raw_spin_unlock_irq(&task->pi_lock);
 
 	if (!work)
 		break;
-	/*
-	 * Synchronize with task_work_cancel(). It can't remove
-	 * the first entry == work, cmpxchg(task_works) should
-	 * fail, but it can play with *work and other entries.
-	 */
-	raw_spin_unlock_wait(&task->pi_lock);
 
 	do {
 		next = work->next;
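
For reference, the other side of the race: task_work_cancel() takes the same
->pi_lock around its list walk, which is what makes the lock/unlock pair above
act as a wait for all in-flight cancellations. Abridged sketch of that era's
kernel/task_work.c (lightly trimmed):

	struct callback_head *
	task_work_cancel(struct task_struct *task, task_work_func_t func)
	{
		struct callback_head **pprev = &task->task_works;
		struct callback_head *work;
		unsigned long flags;

		/* Holding ->pi_lock here is what task_work_run() waits on. */
		raw_spin_lock_irqsave(&task->pi_lock, flags);
		while ((work = READ_ONCE(*pprev))) {
			if (work->func != func)
				pprev = &work->next;
			else if (cmpxchg(pprev, work, work->next) == work)
				break;
		}
		raw_spin_unlock_irqrestore(&task->pi_lock, flags);

		return work;
	}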



Re: [PATCH RFC 02/26] task_work: Replace spin_unlock_wait() with lock/unlock pair

2017-06-30 Thread Paul E. McKenney
On Fri, Jun 30, 2017 at 03:50:33PM -0400, Alan Stern wrote:
> On Fri, 30 Jun 2017, Oleg Nesterov wrote:
> 
> > On 06/30, Paul E. McKenney wrote:
> > >
> > > On Fri, Jun 30, 2017 at 05:20:10PM +0200, Oleg Nesterov wrote:
> > > >
> > > > I do not think the overhead will be noticeable in this particular case.
> > > >
> > > > But I am not sure I understand why do we want to unlock_wait. Yes I agree,
> >                                                          ^
> > 
> > if it was not clear, I tried to say "why do we want to _remove_ unlock_wait".
> > 
> > > > it has some problems, but still...
> > > >
> > > > The code above looks strange for me. If we are going to repeat this pattern
> > > > then perhaps we should add a helper for lock+unlock and name it unlock_wait2 ;)
> > > >
> > > > If not, we should probably change this code more:
> > >
> > > This looks -much- better than my patch!  May I have your Signed-off-by?
> > 
> > Only if you promise to replace all RCU flavors with a single simple
> > implementation based on rwlock ;)
> > 
> > Seriously, of course I won't argue, and it seems that nobody except me likes
> > this primitive, but to me spin_unlock_wait() looks like synchronize_rcu() and
> > sometimes it makes sense.
> 
> If it looks like synchronize_rcu(), why not actually use synchronize_rcu()?

My guess is that the latencies of synchronize_rcu() don't suit his needs.
When the lock is not held, spin_unlock_wait() is quite fast, even
compared to expedited grace periods.

Thanx, Paul

> Alan Stern
> 
> > Including this particular case. task_work_run() is going to flush/destroy the
> > ->task_works list, so it needs to wait until all currently executing "readers"
> > (task_work_cancel()'s which have started before ->task_works was updated) have
> > completed.
> 



Re: [PATCH RFC 02/26] task_work: Replace spin_unlock_wait() with lock/unlock pair

2017-06-30 Thread Paul E. McKenney
On Fri, Jun 30, 2017 at 09:21:23PM +0200, Oleg Nesterov wrote:
> On 06/30, Paul E. McKenney wrote:
> >
> > On Fri, Jun 30, 2017 at 05:20:10PM +0200, Oleg Nesterov wrote:
> > >
> > > I do not think the overhead will be noticeable in this particular case.
> > >
> > > But I am not sure I understand why do we want to unlock_wait. Yes I agree,
>                                                       ^
> 
> if it was not clear, I tried to say "why do we want to _remove_ unlock_wait".
> 
> > > it has some problems, but still...
> > >
> > > The code above looks strange for me. If we are going to repeat this pattern
> > > then perhaps we should add a helper for lock+unlock and name it unlock_wait2 ;)
> > >
> > > If not, we should probably change this code more:
> >
> > This looks -much- better than my patch!  May I have your Signed-off-by?
> 
> Only if you promise to replace all RCU flavors with a single simple
> implementation based on rwlock ;)

;-) ;-) ;-)

Here you go:

https://github.com/pramalhe/ConcurrencyFreaks/blob/master/papers/poormanurcu-2015.pdf
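
The paper's trick in a nutshell (a userspace sketch, not kernel RCU):
rcu_read_lock() maps to a reader lock, and synchronize_rcu() to a write
lock/unlock pair, which is exactly the lock+unlock shape discussed here:

	#include <pthread.h>

	static pthread_rwlock_t urcu_lock = PTHREAD_RWLOCK_INITIALIZER;

	static void poorman_rcu_read_lock(void)
	{
		pthread_rwlock_rdlock(&urcu_lock);
	}

	static void poorman_rcu_read_unlock(void)
	{
		pthread_rwlock_unlock(&urcu_lock);
	}

	/* Waits for all pre-existing readers to drain, then lets new
	 * readers in again -- a grace period, rwlock style. */
	static void poorman_synchronize_rcu(void)
	{
		pthread_rwlock_wrlock(&urcu_lock);
		pthread_rwlock_unlock(&urcu_lock);
	}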

> Seriously, of course I won't argue, and it seems that nobody except me likes
> this primitive, but to me spin_unlock_wait() looks like synchronize_rcu() and
> sometimes it makes sense.

Well, that analogy was what led me to propose that its semantics be
defined as spin_lock() immediately followed by spin_unlock().  But that
didn't go over well.
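
Concretely, the proposal would have made the following definition the
required semantics (a sketch of what was floated, not merged code):

	static inline void spin_unlock_wait(spinlock_t *lock)
	{
		spin_lock(lock);	/* waits for any current holder... */
		spin_unlock(lock);	/* ...and orders like a full critical section */
	}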

> Including this particular case. task_work_run() is going to flush/destroy the
> ->task_works list, so it needs to wait until all currently executing "readers"
> (task_work_cancel()'s which have started before ->task_works was updated) have
> completed.

Understood!

Thanx, Paul



Re: nf_conntrack: Infoleak via CTA_ID and CTA_EXPECT_ID

2017-06-30 Thread Florian Westphal
Richard Weinberger  wrote:
> Florian,
> 
> Am 30.06.2017 um 21:35 schrieb Florian Westphal:
> > Richard Weinberger  wrote:
> >> Hi!
> >>
> >> I noticed that nf_conntrack leaks kernel addresses: it uses the memory address
> >> as the identifier for generating conntrack and expect ids.
> >> Since these ids are also visible to unprivileged users via network namespaces,
> >> I suggest reverting these commits:
> > 
> > Why not use a hash of the address?
> 
> Would also work. Or xor it with a random number.
> 
> On the other hand, for user space it would be more useful when the conntrack id
> does not repeat that often. That's why I favor the good old counter method.
> Currently the conntrack id is reused very fast.
> e.g. in one of our applications we use the conntrack id via NFQUEUE and watch the
> destroy events via conntrack. It happens regularly that a new connection has the
> same id as a different connection we saw some moments before, before we receive
> the destroy event from the conntrack socket.

Perhaps we can place that in a new extension (it's not needed in any
fastpath ops)?


Re: [PATCH RFC 02/26] task_work: Replace spin_unlock_wait() with lock/unlock pair

2017-06-30 Thread Alan Stern
On Fri, 30 Jun 2017, Oleg Nesterov wrote:

> On 06/30, Paul E. McKenney wrote:
> >
> > On Fri, Jun 30, 2017 at 05:20:10PM +0200, Oleg Nesterov wrote:
> > >
> > > I do not think the overhead will be noticeable in this particular case.
> > >
> > > But I am not sure I understand why do we want to unlock_wait. Yes I agree,
>                                                       ^
> 
> if it was not clear, I tried to say "why do we want to _remove_ unlock_wait".
> 
> > > it has some problems, but still...
> > >
> > > The code above looks strange for me. If we are going to repeat this pattern
> > > then perhaps we should add a helper for lock+unlock and name it unlock_wait2 ;)
> > >
> > > If not, we should probably change this code more:
> >
> > This looks -much- better than my patch!  May I have your Signed-off-by?
> 
> Only if you promise to replace all RCU flavors with a single simple
> implementation based on rwlock ;)
> 
> Seriously, of course I won't argue, and it seems that nobody except me likes
> this primitive, but to me spin_unlock_wait() looks like synchronize_rcu() and
> sometimes it makes sense.

If it looks like synchronize_rcu(), why not actually use synchronize_rcu()?

Alan Stern

> Including this particular case. task_work_run() is going to flush/destroy the
> ->task_works list, so it needs to wait until all currently executing "readers"
> (task_work_cancel()'s which have started before ->task_works was updated) have
> completed.



Re: nf_conntrack: Infoleak via CTA_ID and CTA_EXPECT_ID

2017-06-30 Thread Richard Weinberger
Florian,

Am 30.06.2017 um 21:35 schrieb Florian Westphal:
> Richard Weinberger  wrote:
>> Hi!
>>
>> I noticed that nf_conntrack leaks kernel addresses: it uses the memory address
>> as the identifier for generating conntrack and expect ids.
>> Since these ids are also visible to unprivileged users via network namespaces
>> I suggest reverting these commits:
> 
> Why not use a hash of the address?

Would also work. Or xor it with a random number.

On the other hand, for user space it would be more useful when the conntrack id
does not repeat that often. That's why I favor the good old counter method.
Currently the conntrack id is reused very fast.
e.g. in one of our applications we use the conntrack id via NFQUEUE and watch the
destroy events via conntrack. It happens regularly that a new connection has the
same id as a different connection we saw some moments before, before we receive
the destroy event from the conntrack socket.

Thanks,
//richard


nf_conntrack: Infoleak via CTA_ID and CTA_EXPECT_ID

2017-06-30 Thread Richard Weinberger
Hi!

I noticed that nf_conntrack leaks kernel addresses: it uses the memory address
as the identifier for generating conntrack and expect ids.
Since these ids are also visible to unprivileged users via network namespaces,
I suggest reverting these commits:

commit 7f85f914721ffcef382a57995182916bd43d8a65
Author: Patrick McHardy 
Date:   Fri Sep 28 14:41:27 2007 -0700

[NETFILTER]: nf_conntrack: kill unique ID

Remove the per-conntrack ID, it's not necessary anymore for dumping.
For compatibility reasons we send the address of the conntrack to
userspace as ID.

Signed-off-by: Patrick McHardy 
Signed-off-by: David S. Miller 

commit 3583240249ef354760e04ae49bd7b462a638f40c
Author: Patrick McHardy 
Date:   Fri Sep 28 14:41:50 2007 -0700

[NETFILTER]: nf_conntrack_expect: kill unique ID

Similar to the conntrack ID, the per-expectation ID is not needed
anymore, kill it.

Signed-off-by: Patrick McHardy 
Signed-off-by: David S. Miller 

Thanks,
//richard


Re: [PATCH RFC 02/26] task_work: Replace spin_unlock_wait() with lock/unlock pair

2017-06-30 Thread Oleg Nesterov
On 06/30, Paul E. McKenney wrote:
>
> On Fri, Jun 30, 2017 at 05:20:10PM +0200, Oleg Nesterov wrote:
> >
> > I do not think the overhead will be noticeable in this particular case.
> >
> > But I am not sure I understand why do we want to unlock_wait. Yes I agree,
                                                     ^

if it was not clear, I tried to say "why do we want to _remove_ unlock_wait".

> > it has some problems, but still...
> >
> > The code above looks strange for me. If we are going to repeat this pattern
> > then perhaps we should add a helper for lock+unlock and name it unlock_wait2 ;)
> >
> > If not, we should probably change this code more:
>
> This looks -much- better than my patch!  May I have your Signed-off-by?

Only if you promise to replace all RCU flavors with a single simple
implementation based on rwlock ;)

Seriously, of course I won't argue, and it seems that nobody except me likes
this primitive, but to me spin_unlock_wait() looks like synchronize_rcu() and
sometimes it makes sense.

Including this particular case. task_work_run() is going to flush/destroy the
->task_works list, so it needs to wait until all currently executing "readers"
(task_work_cancel()'s which have started before ->task_works was updated) have
completed.

Oleg.



[RFC PATCH nf-next 3/3] netfilter: exthdr: early set support

2017-06-30 Thread Florian Westphal
This allows setting 2- and 4-byte quantities in the TCP option space,
which provides a native replacement for xt_TCPMSS in nf_tables.

Todo:
-TCPOPTSTRIP equivalent
-possibly restrict write support so we can't generate
 invalid options (e.g., don't allow writes to kind/len).

Signed-off-by: Florian Westphal 
---
 include/uapi/linux/netfilter/nf_tables.h |   4 +-
 net/netfilter/nft_exthdr.c               | 161 +++++++++++++++++++++-
 2 files changed, 162 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 683f6f88fcac..4766f5091b5f 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -731,7 +731,8 @@ enum nft_exthdr_op {
  * @NFTA_EXTHDR_OFFSET: extension header offset (NLA_U32)
  * @NFTA_EXTHDR_LEN: extension header length (NLA_U32)
  * @NFTA_EXTHDR_FLAGS: extension header flags (NLA_U32)
- * @NFTA_EXTHDR_OP: option match type (NLA_U8)
+ * @NFTA_EXTHDR_OP: option match type (NLA_U32)
+ * @NFTA_EXTHDR_SREG: source register (NLA_U32)
  */
 enum nft_exthdr_attributes {
NFTA_EXTHDR_UNSPEC,
@@ -741,6 +742,7 @@ enum nft_exthdr_attributes {
NFTA_EXTHDR_LEN,
NFTA_EXTHDR_FLAGS,
NFTA_EXTHDR_OP,
+   NFTA_EXTHDR_SREG,
__NFTA_EXTHDR_MAX
 };
 #define NFTA_EXTHDR_MAX(__NFTA_EXTHDR_MAX - 1)
diff --git a/net/netfilter/nft_exthdr.c b/net/netfilter/nft_exthdr.c
index e3a6eebe7e0c..214211699f8d 100644
--- a/net/netfilter/nft_exthdr.c
+++ b/net/netfilter/nft_exthdr.c
@@ -8,6 +8,7 @@
  * Development of this code funded by Astaro AG (http://www.astaro.com/)
  */
 
+#include <asm/unaligned.h>
 #include <linux/kernel.h>
 #include <linux/init.h>
 #include <linux/module.h>
@@ -23,6 +24,7 @@ struct nft_exthdr {
u8  len;
u8  op;
enum nft_registers  dreg:8;
+   enum nft_registers  sreg:8;
u8  flags;
 };
 
@@ -124,6 +126,88 @@ static void nft_exthdr_tcp_eval(const struct nft_expr 
*expr,
regs->verdict.code = NFT_BREAK;
 }
 
+static void nft_exthdr_tcp_set_eval(const struct nft_expr *expr,
+   struct nft_regs *regs,
+   const struct nft_pktinfo *pkt)
+{
+   u8 buff[sizeof(struct tcphdr) + MAX_TCP_OPTION_SPACE];
+   struct nft_exthdr *priv = nft_expr_priv(expr);
+   unsigned int i, optl, tcphdr_len, offset;
+   struct tcphdr *tcph;
+   u8 *opt;
+   u32 src;
+
+	tcph = nft_tcp_header_pointer(pkt, sizeof(buff), buff, &tcphdr_len);
+   if (!tcph)
+   return;
+
+   opt = (u8 *)tcph;
+   for (i = sizeof(*tcph); i < tcphdr_len - 1; i += optl) {
+   union {
+   u8 octet;
+   __be16 v16;
+   __be32 v32;
+   } old, new;
+
+   optl = optlen(opt, i);
+
+   if (priv->type != opt[i])
+   continue;
+
+   if (i + optl > tcphdr_len || priv->len + priv->offset > optl)
+   return;
+
+   if (!skb_make_writable(pkt->skb, pkt->xt.thoff + i + priv->len))
+   return;
+
+		tcph = nft_tcp_header_pointer(pkt, sizeof(buff), buff,
+					      &tcphdr_len);
+   if (!tcph)
+   return;
+
+   src = regs->data[priv->sreg];
+   offset = i + priv->offset;
+
+   switch (priv->len) {
+   case 2:
+   old.v16 = get_unaligned((u16 *)(opt + offset));
+   new.v16 = src;
+
+   if (old.v16 == new.v16)
+   return;
+
+   switch (priv->type) {
+   case TCPOPT_MSS:
+   /* increase can cause connection to stall */
+   if (ntohs(old.v16) <= ntohs(new.v16))
+   return;
+   break;
+   }
+
+   put_unaligned(new.v16, (u16*)(opt + offset));
+			inet_proto_csum_replace2(&tcph->check, pkt->skb,
+old.v16, new.v16, false);
+   break;
+   case 4:
+   new.v32 = src;
+   old.v32 = get_unaligned((u32 *)(opt + offset));
+
+   if (old.v32 == new.v32)
+   return;
+
+   put_unaligned(new.v32, (u32*)(opt + offset));
+			inet_proto_csum_replace4(&tcph->check, pkt->skb,
+old.v32, new.v32, false);
+   break;
+   default:
+   WARN_ON_ONCE(1);
+   break;
+   }
+
+   return;
+   }
+}
+
 static const 
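
The option walk above leans on an optlen() helper; for readers without the
tree at hand, it behaves roughly like this (a sketch matching the file's
existing helper):

	static unsigned int optlen(const u8 *opt, unsigned int offset)
	{
		/* Beware zero-length options: make finite progress */
		if (opt[offset] <= TCPOPT_NOP || opt[offset + 1] == 0)
			return 1;
		return opt[offset + 1];
	}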

Re: [PATCH RFC 02/26] task_work: Replace spin_unlock_wait() with lock/unlock pair

2017-06-30 Thread Paul E. McKenney
On Fri, Jun 30, 2017 at 09:16:07AM -0700, Paul E. McKenney wrote:
> On Fri, Jun 30, 2017 at 05:20:10PM +0200, Oleg Nesterov wrote:
> > On 06/30, Paul E. McKenney wrote:
> > >
> > > > > +	raw_spin_lock_irq(&task->pi_lock);
> > > > > +	raw_spin_unlock_irq(&task->pi_lock);
> > >
> > > I agree that the spin_unlock_wait() implementations would avoid the
> > > deadlock with an acquisition from an interrupt handler, while also
> > > avoiding the need to momentarily disable interrupts.  The ->pi_lock is
> > > a per-task lock, so I am assuming (perhaps naively) that contention is
> > > not a problem.  So is the overhead of interrupt disabling likely to be
> > > noticeable here?
> > 
> > I do not think the overhead will be noticeable in this particular case.
> > 
> > But I am not sure I understand why do we want to unlock_wait. Yes I agree,
> > it has some problems, but still...

Well, I tried documenting exactly what it did and did not do, which got
an ack from Peter.

https://marc.info/?l=linux-kernel=149575078313105

However, my later pull request spawned a bit of discussion:

https://marc.info/?l=linux-kernel=149730349001044

This discussion led me to propose strengthening spin_unlock_wait()
to act as a lock/unlock pair.  This can be implemented on x86 as
an smp_mb() followed by a read-only spinloop, as shown on branch
spin_unlock_wait.2017.06.23a on my -rcu tree.
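
Roughly, that strengthened form amounts to the following sketch (the
lock-word test is illustrative, assuming a qspinlock-style layout; the real
code is on the branch cited above):

	static inline void queued_spin_unlock_wait(struct qspinlock *lock)
	{
		smp_mb();			/* full barrier up front */
		while (atomic_read(&lock->val) & _Q_LOCKED_MASK)
			cpu_relax();		/* read-only spinloop */
		smp_acquire__after_ctrl_dep();	/* promote ctrl dep to ACQUIRE */
	}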

Linus was not amused, and said that if we were going to make
spin_unlock_wait() have the semantics of lock+unlock, we should just
open-code that, especially given that there are way more definitions
of spin_unlock_wait() than there are uses.  He also suggested making
spin_unlock_wait() have only acquire semantics (x86 spin loop with
no memory-barrier instructions) and add explicit barriers where
required.

https://marc.info/?l=linux-kernel=149860012913036

I did a series for this which may be found on branch
spin_unlock_wait.2017.06.27a on my -rcu tree.

This approach was not loved by others (see later on the above thread), and
Linus's reply (which reiterated his opposition to lock+unlock semantics)
suggested the possibility of removing spin_unlock_wait() entirely.

https://marc.info/?l=linux-kernel=149869476911620

So I figured, in for a penny, in for a pound, and therefore did the series
that includes this patch.  The most recent update (which does not yet
include your improved version) is on branch spin_unlock_wait.2017.06.30b
of my -rcu tree.

Hey, you asked!  ;-)

Thanx, Paul

> > The code above looks strange for me. If we are going to repeat this pattern
> > then perhaps we should add a helper for lock+unlock and name it unlock_wait2 ;)
> > 
> > If not, we should probably change this code more:
> 
> This looks -much- better than my patch!  May I have your Signed-off-by?
> 
>   Thanx, Paul
> 
> > --- a/kernel/task_work.c
> > +++ b/kernel/task_work.c
> > @@ -96,20 +96,16 @@ void task_work_run(void)
> >  * work->func() can do task_work_add(), do not set
> >  * work_exited unless the list is empty.
> >  */
> > +	raw_spin_lock_irq(&task->pi_lock);
> > 	do {
> > 		work = READ_ONCE(task->task_works);
> > 		head = !work && (task->flags & PF_EXITING) ?
> > 			&work_exited : NULL;
> > 	} while (cmpxchg(&task->task_works, work, head) != work);
> > +	raw_spin_unlock_irq(&task->pi_lock);
> > 
> > if (!work)
> > break;
> > -	/*
> > -	 * Synchronize with task_work_cancel(). It can't remove
> > -	 * the first entry == work, cmpxchg(task_works) should
> > -	 * fail, but it can play with *work and other entries.
> > -	 */
> > -	raw_spin_unlock_wait(&task->pi_lock);
> > 
> > do {
> > next = work->next;
> > 
> > performance-wise this is almost the same, and if we do not really care about
> > overhead we can simplify the code: this way it is obvious that we can't race
> > with task_work_cancel().
> > 
> > Oleg.
> > 



Re: [PATCH RFC 02/26] task_work: Replace spin_unlock_wait() with lock/unlock pair

2017-06-30 Thread Paul E. McKenney
On Fri, Jun 30, 2017 at 05:20:10PM +0200, Oleg Nesterov wrote:
> On 06/30, Paul E. McKenney wrote:
> >
> > > > +	raw_spin_lock_irq(&task->pi_lock);
> > > > +	raw_spin_unlock_irq(&task->pi_lock);
> >
> > I agree that the spin_unlock_wait() implementations would avoid the
> > deadlock with an acquisition from an interrupt handler, while also
> > avoiding the need to momentarily disable interrupts.  The ->pi_lock is
> > a per-task lock, so I am assuming (perhaps naively) that contention is
> > not a problem.  So is the overhead of interrupt disabling likely to be
> > noticeable here?
> 
> I do not think the overhead will be noticeable in this particular case.
> 
> But I am not sure I understand why do we want to unlock_wait. Yes I agree,
> it has some problems, but still...
> 
> The code above looks strange for me. If we are going to repeat this pattern
> then perhaps we should add a helper for lock+unlock and name it unlock_wait2 ;)
> 
> If not, we should probably change this code more:

This looks -much- better than my patch!  May I have your Signed-off-by?

Thanx, Paul

> --- a/kernel/task_work.c
> +++ b/kernel/task_work.c
> @@ -96,20 +96,16 @@ void task_work_run(void)
>* work->func() can do task_work_add(), do not set
>* work_exited unless the list is empty.
>*/
> +	raw_spin_lock_irq(&task->pi_lock);
> 	do {
> 		work = READ_ONCE(task->task_works);
> 		head = !work && (task->flags & PF_EXITING) ?
> 			&work_exited : NULL;
> 	} while (cmpxchg(&task->task_works, work, head) != work);
> +	raw_spin_unlock_irq(&task->pi_lock);
> 
>   if (!work)
>   break;
> -	/*
> -	 * Synchronize with task_work_cancel(). It can't remove
> -	 * the first entry == work, cmpxchg(task_works) should
> -	 * fail, but it can play with *work and other entries.
> -	 */
> -	raw_spin_unlock_wait(&task->pi_lock);
> 
>   do {
>   next = work->next;
> 
> performance-wise this is almost the same, and if we do not really care about
> overhead we can simplify the code: this way it is obvious that we can't race
> with task_work_cancel().
> 
> Oleg.
> 



Re: [PATCH v2] netfilter: nfnetlink: Improve input length sanitization in nfnetlink_rcv

2017-06-30 Thread Mateusz Jurczyk
On Thu, Jun 29, 2017 at 6:22 PM, Pablo Neira Ayuso  wrote:
> On Tue, Jun 27, 2017 at 07:05:27PM +0200, Pablo Neira Ayuso wrote:
>> On Tue, Jun 27, 2017 at 05:58:25PM +0200, Pablo Neira Ayuso wrote:
>> > On Wed, Jun 07, 2017 at 03:50:38PM +0200, Mateusz Jurczyk wrote:
>> > > Verify that the length of the socket buffer is sufficient to cover the
>> > > nlmsghdr structure before accessing the nlh->nlmsg_len field for further
>> > > input sanitization. If the client only supplies 1-3 bytes of data in
>> > > sk_buff, then nlh->nlmsg_len remains partially uninitialized and
>> > > contains leftover memory from the corresponding kernel allocation.
>> > > Operating on such data may result in indeterminate evaluation of the
>> > > nlmsg_len < NLMSG_HDRLEN expression.
>> > >
>> > > The bug was discovered by a runtime instrumentation designed to detect
>> > > use of uninitialized memory in the kernel. The patch prevents this and
>> > > other similar tools (e.g. KMSAN) from flagging this behavior in the 
>> > > future.
>> >
>> > Applied, thanks.
>>
>> Wait, I'm keeping this back after a closer look.
>>
>> I think we have to remove this:
>>
>> if (nlh->nlmsg_len < NLMSG_HDRLEN || <---
>> skb->len < NLMSG_HDRLEN + sizeof(struct nfgenmsg))
>> return;
>>
>> in nfnetlink_rcv_skb_batch()
>>
>> now that we make this unfront check from nfnetlink_rcv().
>
> BTW, I can just mangle your patch here to delete that line, to speed
> things up. See the mangled patch that is attached to this email.

Sure, I think the condition in nfnetlink_rcv_skb_batch() can be now
safely removed. Feel free to proceed with the mangled patch. Thanks.
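
For context, the upfront check in nfnetlink_rcv() that makes the batch-path
test redundant looks roughly like this (a sketch of the mangled patch's
effect, not its verbatim text):

	static void nfnetlink_rcv(struct sk_buff *skb)
	{
		struct nlmsghdr *nlh = nlmsg_hdr(skb);

		/* Reject before trusting nlh->nlmsg_len: with fewer than
		 * NLMSG_HDRLEN bytes in the skb, that field is partially
		 * uninitialized kernel memory. */
		if (skb->len < NLMSG_HDRLEN ||
		    nlh->nlmsg_len < NLMSG_HDRLEN ||
		    skb->len < nlh->nlmsg_len)
			return;

		/* ... normal dispatch continues here ... */
	}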


Re: [PATCH RFC 08/26] locking: Remove spin_unlock_wait() generic definitions

2017-06-30 Thread Will Deacon
On Fri, Jun 30, 2017 at 05:38:15AM -0700, Paul E. McKenney wrote:
> On Fri, Jun 30, 2017 at 10:19:29AM +0100, Will Deacon wrote:
> > On Thu, Jun 29, 2017 at 05:01:16PM -0700, Paul E. McKenney wrote:
> > > There is no agreed-upon definition of spin_unlock_wait()'s semantics,
> > > and it appears that all callers could do just as well with a lock/unlock
> > > pair.  This commit therefore removes spin_unlock_wait() and related
> > > definitions from core code.
> > > 
> > > Signed-off-by: Paul E. McKenney 
> > > Cc: Arnd Bergmann 
> > > Cc: Ingo Molnar 
> > > Cc: Will Deacon 
> > > Cc: Peter Zijlstra 
> > > Cc: Alan Stern 
> > > Cc: Andrea Parri 
> > > Cc: Linus Torvalds 
> > > ---
> >  include/asm-generic/qspinlock.h |  14 ----
> >  include/linux/spinlock.h        |  31 -----
> >  include/linux/spinlock_up.h     |   6 ---
> >  kernel/locking/qspinlock.c      | 117 ----------
> >  4 files changed, 168 deletions(-)
> > 
> > [...]
> > 
> > > diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> > > index b2caec7315af..64a9051e4c2c 100644
> > > --- a/kernel/locking/qspinlock.c
> > > +++ b/kernel/locking/qspinlock.c
> > > @@ -267,123 +267,6 @@ static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock,
> > >  #define queued_spin_lock_slowpath	native_queued_spin_lock_slowpath
> > >  #endif
> > >  
> > > -/*
> > > - * Various notes on spin_is_locked() and spin_unlock_wait(), which are
> > > - * 'interesting' functions:
> > > - *
> > > - * PROBLEM: some architectures have an interesting issue with atomic ACQUIRE
> > > - * operations in that the ACQUIRE applies to the LOAD _not_ the STORE (ARM64,
> > > - * PPC). Also qspinlock has a similar issue per construction, the setting of
> > > - * the locked byte can be unordered acquiring the lock proper.
> > > - *
> > > - * This gets to be 'interesting' in the following cases, where the /should/s
> > > - * end up false because of this issue.
> > > - *
> > > - *
> > > - * CASE 1:
> > > - *
> > > - * So the spin_is_locked() correctness issue comes from something like:
> > > - *
> > > - *   CPU0                          CPU1
> > > - *
> > > - *   global_lock();                local_lock(i)
> > > - *     spin_lock(&G)                 spin_lock(&L[i])
> > > - *     for (i)                       if (!spin_is_locked(&G)) {
> > > - *       spin_unlock_wait(&L[i]);      smp_acquire__after_ctrl_dep();
> > > - *                                     return;
> > > - *                                   }
> > > - *                                   // deal with fail
> > > - *
> > > - * Where it is important CPU1 sees G locked or CPU0 sees L[i] locked such
> > > - * that there is exclusion between the two critical sections.
> > > - *
> > > - * The load from spin_is_locked(&G) /should/ be constrained by the ACQUIRE from
> > > - * spin_lock(&L[i]), and similarly the load(s) from spin_unlock_wait(&L[i])
> > > - * /should/ be constrained by the ACQUIRE from spin_lock(&G).
> > > - *
> > > - * Similarly, later stuff is constrained by the ACQUIRE from CTRL+RMB.
> > 
> > Might be worth keeping this comment about spin_is_locked, since we're not
> > removing that guy just yet!
> 
> Ah, all the examples had spin_unlock_wait() in them.  So what I need to
> do is to create a spin_unlock_wait()-free example to illustrate the
> text starting with "The load from spin_is_locked(", correct?

Yeah, I think so.

> I also need to check all uses of spin_is_locked().  There might no
> longer be any that rely on any particular ordering...

Right. I think we're looking for the "insane case" as per 38b850a73034
(which was apparently used by ipc/sem.c at the time, but no longer).

There's a usage in kernel/debug/debug_core.c, but it doesn't fill me with
joy.

Will


Re: [PATCH RFC 02/26] task_work: Replace spin_unlock_wait() with lock/unlock pair

2017-06-30 Thread Paul E. McKenney
On Fri, Jun 30, 2017 at 01:04:45PM +0200, Oleg Nesterov wrote:
> On 06/29, Paul E. McKenney wrote:
> >
> > --- a/kernel/task_work.c
> > +++ b/kernel/task_work.c
> > @@ -109,7 +109,8 @@ void task_work_run(void)
> >  * the first entry == work, cmpxchg(task_works) should
> >  * fail, but it can play with *work and other entries.
> >  */
> > -	raw_spin_unlock_wait(&task->pi_lock);
> > +	raw_spin_lock(&task->pi_lock);
> > +	raw_spin_unlock(&task->pi_lock);
> 
> Well, but then you need spin_lock_irq(). And this is one of the reasons
> why I personally think unlock_wait has some sense...

Good catch, and I clearly need to double-check the other commits for
any need to disable interrupts.  Anyway, like this, with the addition
of a flags variable, correct?

> > +	raw_spin_lock_irq(&task->pi_lock);
> > +	raw_spin_unlock_irq(&task->pi_lock);

I agree that the spin_unlock_wait() implementations would avoid the
deadlock with an acquisition from an interrupt handler, while also
avoiding the need to momentarily disable interrupts.  The ->pi_lock is
a per-task lock, so I am assuming (perhaps naively) that contention is
not a problem.  So is the overhead of interrupt disabling likely to be
noticeable here?
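
For completeness, the flags-variable form mentioned above would be the usual
irqsave pattern (a sketch):

	unsigned long flags;

	/* Same lock/unlock pair, but preserves the caller's interrupt
	 * state instead of assuming interrupts were enabled. */
	raw_spin_lock_irqsave(&task->pi_lock, flags);
	raw_spin_unlock_irqrestore(&task->pi_lock, flags);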

Thanx, Paul



Re: [PATCH RFC 08/26] locking: Remove spin_unlock_wait() generic definitions

2017-06-30 Thread Paul E. McKenney
On Fri, Jun 30, 2017 at 10:19:29AM +0100, Will Deacon wrote:
> On Thu, Jun 29, 2017 at 05:01:16PM -0700, Paul E. McKenney wrote:
> > There is no agreed-upon definition of spin_unlock_wait()'s semantics,
> > and it appears that all callers could do just as well with a lock/unlock
> > pair.  This commit therefore removes spin_unlock_wait() and related
> > definitions from core code.
> > 
> > Signed-off-by: Paul E. McKenney 
> > Cc: Arnd Bergmann 
> > Cc: Ingo Molnar 
> > Cc: Will Deacon 
> > Cc: Peter Zijlstra 
> > Cc: Alan Stern 
> > Cc: Andrea Parri 
> > Cc: Linus Torvalds 
> > ---
> >  include/asm-generic/qspinlock.h |  14 ----
> >  include/linux/spinlock.h        |  31 -----
> >  include/linux/spinlock_up.h     |   6 ---
> >  kernel/locking/qspinlock.c      | 117 ----------
> >  4 files changed, 168 deletions(-)
> 
> [...]
> 
> > diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> > index b2caec7315af..64a9051e4c2c 100644
> > --- a/kernel/locking/qspinlock.c
> > +++ b/kernel/locking/qspinlock.c
> > @@ -267,123 +267,6 @@ static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock,
> >  #define queued_spin_lock_slowpath	native_queued_spin_lock_slowpath
> >  #endif
> >  
> > -/*
> > - * Various notes on spin_is_locked() and spin_unlock_wait(), which are
> > - * 'interesting' functions:
> > - *
> > - * PROBLEM: some architectures have an interesting issue with atomic ACQUIRE
> > - * operations in that the ACQUIRE applies to the LOAD _not_ the STORE (ARM64,
> > - * PPC). Also qspinlock has a similar issue per construction, the setting of
> > - * the locked byte can be unordered acquiring the lock proper.
> > - *
> > - * This gets to be 'interesting' in the following cases, where the /should/s
> > - * end up false because of this issue.
> > - *
> > - *
> > - * CASE 1:
> > - *
> > - * So the spin_is_locked() correctness issue comes from something like:
> > - *
> > - *   CPU0                          CPU1
> > - *
> > - *   global_lock();                local_lock(i)
> > - *     spin_lock(&G)                 spin_lock(&L[i])
> > - *     for (i)                       if (!spin_is_locked(&G)) {
> > - *       spin_unlock_wait(&L[i]);      smp_acquire__after_ctrl_dep();
> > - *                                     return;
> > - *                                   }
> > - *                                   // deal with fail
> > - *
> > - * Where it is important CPU1 sees G locked or CPU0 sees L[i] locked such
> > - * that there is exclusion between the two critical sections.
> > - *
> > - * The load from spin_is_locked(&G) /should/ be constrained by the ACQUIRE from
> > - * spin_lock(&L[i]), and similarly the load(s) from spin_unlock_wait(&L[i])
> > - * /should/ be constrained by the ACQUIRE from spin_lock(&G).
> > - *
> > - * Similarly, later stuff is constrained by the ACQUIRE from CTRL+RMB.
> 
> Might be worth keeping this comment about spin_is_locked, since we're not
> removing that guy just yet!

Ah, all the examples had spin_unlock_wait() in them.  So what I need to
do is to create a spin_unlock_wait()-free example to illustrate the
text starting with "The load from spin_is_locked(", correct?
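
One possible shape for such an example (a sketch, not from the thread) keeps
only the spin_is_locked() legs of CASE 1:

	 *   CPU0                        CPU1
	 *
	 *   spin_lock(&G);              spin_lock(&L);
	 *   r0 = spin_is_locked(&L);    r1 = spin_is_locked(&G);
	 *
	 * Both CPUs hold their own lock, so r0 == 0 && r1 == 0 must be
	 * forbidden; without ACQUIRE ordering on the lock-word store, each
	 * load can read a stale "unlocked" value.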

I also need to check all uses of spin_is_locked().  There might no
longer be any that rely on any particular ordering...

Thanx, Paul



Re: [PATCH RFC 03/26] sched: Replace spin_unlock_wait() with lock/unlock pair

2017-06-30 Thread Paul E. McKenney
On Fri, Jun 30, 2017 at 12:31:50PM +0200, Arnd Bergmann wrote:
> On Fri, Jun 30, 2017 at 2:01 AM, Paul E. McKenney
>  wrote:
> > There is no agreed-upon definition of spin_unlock_wait()'s semantics,
> > and it appears that all callers could do just as well with a lock/unlock
> > pair.  This commit therefore replaces the spin_unlock_wait() call in
> > do_task_dead() with spin_lock() followed immediately by spin_unlock().
> > This should be safe from a performance perspective because the lock is
> > this tasks ->pi_lock, and this is called only after the task exits.
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index e91138fcde86..6dea3d9728c8 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -3461,7 +3461,8 @@ void __noreturn do_task_dead(void)
> >  * is held by try_to_wake_up()
> >  */
> > smp_mb();
> > -	raw_spin_unlock_wait(&current->pi_lock);
> > +	raw_spin_lock(&current->pi_lock);
> > +	raw_spin_unlock(&current->pi_lock);
> 
> Does the raw_spin_lock()/raw_spin_unlock() imply an smp_mb() or stronger?
> Maybe it would be clearer to remove the extra barrier if so.

No, it does not in general, but it does on most architectures, and
there are ways to allow those architectures to gain the benefit of their
stronger locks.  For example, would this work?  

> >  * is held by try_to_wake_up()
> >  */
> > -   smp_mb();
> > -	raw_spin_unlock_wait(&current->pi_lock);
> > +	smp_mb__before_spinlock();
> > +	raw_spin_lock(&current->pi_lock);
> > +	raw_spin_unlock(&current->pi_lock);

Thanx, Paul



[ANNOUNCE] Netfilter (+folks) userday in Faro, Portugal (Monday, July 3rd)

2017-06-30 Thread Pablo Neira Ayuso
Hi!

We're releasing today the planning for our Netfilter (and folks)
userday in Faro, Portugal. This is the pre-event that traditionally
come before the Netfilter developer Workshop.

This is the planning:

9h:     Welcoming (Pablo Neira)
9h15:   Suricata and its interaction with Netfilter (Eric Leblond)
10h:    Coffee break
10h15:  OVS and OVN (Joe Stringer)
10h50:  Introduction to kasan and syzkaller (Florian Westphal)
11h25:  Netfilter logging (Eric Leblond)
12h:    Lunch
14h:    Linux network stack walkthrough (Florian Westphal)
14h50:  Hands-on: migrate your laptop to nftables (Netfilter coreteam)
15h:    Coffee break
15h10:  TCP/IP is Boring: A tour of cellular protocol stacks (Harald Welte)
16h:    Closing

This event will take place in the Campus da Penha, in the University
of Algarve, Building 9, room 2.25 (second floor).

This event is public, free (no cost) and it requires no previous
invitation. If you're in the area, you are welcome to join us.

Thanks!

P.S: Please note that the Netfilter developer workshop starts on July
4th and it does require an invitation to attend.


RE: [PATCH 25/29] netfilter, kbuild: use canonical method to specify objs.

2017-06-30 Thread David Laight
From: Pablo Neira Ayuso
> Sent: 29 June 2017 23:53
> Should use ":=" instead of "+=".
...
> diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
> index c9b78e7b342f..913380919301 100644
> --- a/net/netfilter/Makefile
> +++ b/net/netfilter/Makefile
> @@ -70,10 +70,9 @@ obj-$(CONFIG_NETFILTER_SYNPROXY) += nf_synproxy_core.o
>  obj-$(CONFIG_NF_DUP_NETDEV)  += nf_dup_netdev.o
> 
>  # nf_tables
> -nf_tables-objs += nf_tables_core.o nf_tables_api.o nf_tables_trace.o
> -nf_tables-objs += nft_immediate.o nft_cmp.o nft_range.o
> -nf_tables-objs += nft_bitwise.o nft_byteorder.o nft_payload.o
> -nf_tables-objs += nft_lookup.o nft_dynset.o
> +nf_tables-objs := nf_tables_core.o nf_tables_api.o nf_tables_trace.o \
> +   nft_immediate.o nft_cmp.o nft_range.o nft_bitwise.o \
> +   nft_byteorder.o nft_payload.o nft_lookup.o nft_dynset.o

Why?
My preference is to add each file on its own line.
Generates much less churn when files are added or removed.
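
For illustration, the layout David describes would be (a sketch, not part of
the posted patch):

	# one object per line: adding or removing a file is a one-line diff
	nf_tables-objs := \
		nf_tables_core.o \
		nf_tables_api.o \
		nf_tables_trace.o \
		nft_immediate.o \
		nft_cmp.o \
		nft_range.o \
		nft_bitwise.o \
		nft_byteorder.o \
		nft_payload.o \
		nft_lookup.o \
		nft_dynset.o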

David



Re: [PATCH RFC 02/26] task_work: Replace spin_unlock_wait() with lock/unlock pair

2017-06-30 Thread Oleg Nesterov
On 06/29, Paul E. McKenney wrote:
>
> --- a/kernel/task_work.c
> +++ b/kernel/task_work.c
> @@ -109,7 +109,8 @@ void task_work_run(void)
>* the first entry == work, cmpxchg(task_works) should
>* fail, but it can play with *work and other entries.
>*/
> -	raw_spin_unlock_wait(&task->pi_lock);
> +	raw_spin_lock(&task->pi_lock);
> +	raw_spin_unlock(&task->pi_lock);

Well, but then you need spin_lock_irq(). And this is one of the reasons
why I personally think unlock_wait has some sense...

Oleg.



Re: [PATCH nft 2/3] include: Remove __init macro definition.

2017-06-30 Thread Pablo Neira Ayuso
On Fri, Jun 30, 2017 at 02:57:12PM +0530, Varsha Rao wrote:
> Add nft_init function, which calls _init functions in main.c file.
> Remove __init macro definition as libnftables library will be created
> soon. Rename realm_table_init() function to avoid ambiguity as
> realm_table_rt_init() and realm_table_meta_init() in rt.c and meta.c
> files.
> 
> Signed-off-by: Varsha Rao 
> ---
>  include/nftables.h |  9 +
>  include/utils.h|  1 -
>  src/ct.c   |  2 +-
>  src/datatype.c |  2 +-
>  src/gmputil.c  |  2 +-
>  src/main.c | 15 +++
>  src/meta.c |  4 ++--
>  src/netlink.c  |  2 +-
>  src/rt.c   |  2 +-
>  src/xt.c   |  2 +-
>  10 files changed, 32 insertions(+), 9 deletions(-)
> 
> diff --git a/include/nftables.h b/include/nftables.h
> index 26fd344..b188b9e 100644
> --- a/include/nftables.h
> +++ b/include/nftables.h
> @@ -117,5 +117,14 @@ struct parser_state;
>  
>  int nft_run(struct nft_ctx *nft, void *scanner, struct parser_state *state,
>   struct list_head *msgs);
> +void ct_label_table_init(void);
> +void mark_table_init(void);
> +void gmp_init(void);
> +void realm_table_rt_init(void);
> +void devgroup_table_init(void);
> +void netlink_open_sock(void);

OK, so before I apply this, I would like us to remove
netlink_open_sock() from nft_init().

Could you make a patch that does the following (in steps):

1) Add a new struct mnl_socket * field to struct nft_ctx, you can name
   this new field as 'nf_sock'.
2) Call netlink_open_sock() from nft_netlink(), at the very beginning,
   so you set ctx->nf_sock.
3) Use ctx->nf_sock everywhere in src/netlink.c and src/mnl.c,
   so this is not global anymore.

All this in one single patch. As a result, we don't need to call
netlink_open_sock() from nft_init() anymore.
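
A minimal sketch of steps 1 and 2 (names follow the mail; the surrounding
code is abridged and hypothetical):

	struct nft_ctx {
		/* ... existing fields ... */
		struct mnl_socket	*nf_sock;	/* step 1 */
	};

	static int nft_netlink(struct nft_ctx *nft, struct parser_state *state,
			       struct list_head *msgs)
	{
		/* step 2: open the socket per context, at the very beginning */
		nft->nf_sock = nfsock_open();

		/* step 3: netlink.c and mnl.c then take nft->nf_sock as an
		 * argument instead of reading the global nf_sock */
		return 0;	/* remainder of nft_netlink() unchanged */
	}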


Re: [PATCH nft 1/3] include: Remove datatype_register().

2017-06-30 Thread Pablo Neira Ayuso
On Fri, Jun 30, 2017 at 02:56:19PM +0530, Varsha Rao wrote:
> Remove datatype_register() function and its calling __init functions.
> Add arguments of datatype_register() function to datatype array.

Applied, thanks Varsha.


Re: [PATCH RFC 03/26] sched: Replace spin_unlock_wait() with lock/unlock pair

2017-06-30 Thread Arnd Bergmann
On Fri, Jun 30, 2017 at 2:01 AM, Paul E. McKenney
 wrote:
> There is no agreed-upon definition of spin_unlock_wait()'s semantics,
> and it appears that all callers could do just as well with a lock/unlock
> pair.  This commit therefore replaces the spin_unlock_wait() call in
> do_task_dead() with spin_lock() followed immediately by spin_unlock().
> This should be safe from a performance perspective because the lock is
> this tasks ->pi_lock, and this is called only after the task exits.
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index e91138fcde86..6dea3d9728c8 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3461,7 +3461,8 @@ void __noreturn do_task_dead(void)
>  * is held by try_to_wake_up()
>  */
> smp_mb();
> -	raw_spin_unlock_wait(&current->pi_lock);
> +	raw_spin_lock(&current->pi_lock);
> +	raw_spin_unlock(&current->pi_lock);

Does the raw_spin_lock()/raw_spin_unlock() imply an smp_mb() or stronger?
Maybe it would be clearer to remove the extra barrier if so.

 Arnd


[PATCH nft 3/3] include: Remove __exit macro definition.

2017-06-30 Thread Varsha Rao
Add nft_exit function, which calls _exit functions in main.c file.
Remove __exit macro definition as libnftables library will be created
soon. Rename realm_table_exit() function to avoid ambiguity as
realm_table_rt_exit() and realm_table_meta_exit() in rt.c and meta.c
files.

Signed-off-by: Varsha Rao 
---
 include/nftables.h |  9 +
 include/utils.h|  1 -
 src/ct.c   |  2 +-
 src/datatype.c |  2 +-
 src/main.c | 11 +++
 src/meta.c |  4 ++--
 src/netlink.c  |  2 +-
 src/rt.c   |  2 +-
 8 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/include/nftables.h b/include/nftables.h
index b188b9e..5d1a783 100644
--- a/include/nftables.h
+++ b/include/nftables.h
@@ -117,6 +117,7 @@ struct parser_state;
 
 int nft_run(struct nft_ctx *nft, void *scanner, struct parser_state *state,
struct list_head *msgs);
+
 void ct_label_table_init(void);
 void mark_table_init(void);
 void gmp_init(void);
@@ -127,4 +128,12 @@ void realm_table_meta_init(void);
 void xt_init(void);
 void nft_init(void);
 
+void ct_label_table_exit(void);
+void mark_table_exit(void);
+void realm_table_meta_exit(void);
+void devgroup_table_exit(void);
+void netlink_close_sock(void);
+void realm_table_rt_exit(void);
+void nft_exit(void);
+
 #endif /* NFTABLES_NFTABLES_H */
diff --git a/include/utils.h b/include/utils.h
index 0c3341b..0605eee 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -32,7 +32,6 @@
 #define __gmp_fmtstring(x, y)
 #endif
 
-#define __exit __attribute__((destructor))
 #define __must_check   __attribute__((warn_unused_result))
 #define __noreturn __attribute__((__noreturn__))
 
diff --git a/src/ct.c b/src/ct.c
index 25efc70..d64f467 100644
--- a/src/ct.c
+++ b/src/ct.c
@@ -210,7 +210,7 @@ void ct_label_table_init(void)
ct_label_tbl = rt_symbol_table_init(CONNLABEL_CONF);
 }
 
-static void __exit ct_label_table_exit(void)
+void ct_label_table_exit(void)
 {
rt_symbol_table_free(ct_label_tbl);
 }
diff --git a/src/datatype.c b/src/datatype.c
index 4f74c06..0d5ba51 100644
--- a/src/datatype.c
+++ b/src/datatype.c
@@ -722,7 +722,7 @@ void mark_table_init(void)
mark_tbl = rt_symbol_table_init("/etc/iproute2/rt_marks");
 }
 
-static void __exit mark_table_exit(void)
+void mark_table_exit(void)
 {
rt_symbol_table_free(mark_tbl);
 }
diff --git a/src/main.c b/src/main.c
index 301bce0..5da8bee 100644
--- a/src/main.c
+++ b/src/main.c
@@ -276,6 +276,16 @@ void nft_init(void)
 #endif
 }
 
+void nft_exit(void)
+{
+   netlink_close_sock();
+   ct_label_table_exit();
+   realm_table_rt_exit();
+   devgroup_table_exit();
+   realm_table_meta_exit();
+   mark_table_exit();
+}
+
 int main(int argc, char * const *argv)
 {
struct parser_state state;
@@ -412,6 +422,7 @@ out:
xfree(buf);
cache_release();
iface_cache_release();
+   nft_exit();
 
return rc;
 }
diff --git a/src/meta.c b/src/meta.c
index 4fb26d5..9c80893 100644
--- a/src/meta.c
+++ b/src/meta.c
@@ -42,7 +42,7 @@ void realm_table_meta_init(void)
realm_tbl = rt_symbol_table_init("/etc/iproute2/rt_realms");
 }
 
-static void __exit realm_table_exit(void)
+void realm_table_meta_exit(void)
 {
rt_symbol_table_free(realm_tbl);
 }
@@ -338,7 +338,7 @@ void devgroup_table_init(void)
devgroup_tbl = rt_symbol_table_init("/etc/iproute2/group");
 }
 
-static void __exit devgroup_table_exit(void)
+void devgroup_table_exit(void)
 {
rt_symbol_table_free(devgroup_tbl);
 }
diff --git a/src/netlink.c b/src/netlink.c
index 3993aa1..6d466c0 100644
--- a/src/netlink.c
+++ b/src/netlink.c
@@ -67,7 +67,7 @@ void netlink_open_sock(void)
fcntl(mnl_socket_get_fd(nf_sock), F_SETFL, O_NONBLOCK);
 }
 
-static void __exit netlink_close_sock(void)
+void netlink_close_sock(void)
 {
if (nf_sock)
mnl_socket_close(nf_sock);
diff --git a/src/rt.c b/src/rt.c
index 5f57cf0..cd2d5a4 100644
--- a/src/rt.c
+++ b/src/rt.c
@@ -29,7 +29,7 @@ void realm_table_rt_init(void)
realm_tbl = rt_symbol_table_init("/etc/iproute2/rt_realms");
 }
 
-static void __exit realm_table_exit(void)
+void realm_table_rt_exit(void)
 {
rt_symbol_table_free(realm_tbl);
 }
-- 
2.9.4



[PATCH nft 2/3] include: Remove __init macro definition.

2017-06-30 Thread Varsha Rao
Add nft_init function, which calls _init functions in main.c file.
Remove __init macro definition as libnftables library will be created
soon. Rename realm_table_init() function to avoid ambiguity as
realm_table_rt_init() and realm_table_meta_init() in rt.c and meta.c
files.

Signed-off-by: Varsha Rao 
---
 include/nftables.h |  9 +
 include/utils.h|  1 -
 src/ct.c   |  2 +-
 src/datatype.c |  2 +-
 src/gmputil.c  |  2 +-
 src/main.c | 15 +++
 src/meta.c |  4 ++--
 src/netlink.c  |  2 +-
 src/rt.c   |  2 +-
 src/xt.c   |  2 +-
 10 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/include/nftables.h b/include/nftables.h
index 26fd344..b188b9e 100644
--- a/include/nftables.h
+++ b/include/nftables.h
@@ -117,5 +117,14 @@ struct parser_state;
 
 int nft_run(struct nft_ctx *nft, void *scanner, struct parser_state *state,
struct list_head *msgs);
+void ct_label_table_init(void);
+void mark_table_init(void);
+void gmp_init(void);
+void realm_table_rt_init(void);
+void devgroup_table_init(void);
+void netlink_open_sock(void);
+void realm_table_meta_init(void);
+void xt_init(void);
+void nft_init(void);
 
 #endif /* NFTABLES_NFTABLES_H */
diff --git a/include/utils.h b/include/utils.h
index 3199388..0c3341b 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -32,7 +32,6 @@
 #define __gmp_fmtstring(x, y)
 #endif
 
-#define __init __attribute__((constructor))
 #define __exit __attribute__((destructor))
 #define __must_check   __attribute__((warn_unused_result))
 #define __noreturn __attribute__((__noreturn__))
diff --git a/src/ct.c b/src/ct.c
index 9b7140b..25efc70 100644
--- a/src/ct.c
+++ b/src/ct.c
@@ -205,7 +205,7 @@ static const struct datatype ct_label_type = {
.parse  = ct_label_type_parse,
 };
 
-static void __init ct_label_table_init(void)
+void ct_label_table_init(void)
 {
ct_label_tbl = rt_symbol_table_init(CONNLABEL_CONF);
 }
diff --git a/src/datatype.c b/src/datatype.c
index b3c8f66..4f74c06 100644
--- a/src/datatype.c
+++ b/src/datatype.c
@@ -717,7 +717,7 @@ void rt_symbol_table_free(struct symbol_table *tbl)
 }
 
 static struct symbol_table *mark_tbl;
-static void __init mark_table_init(void)
+void mark_table_init(void)
 {
mark_tbl = rt_symbol_table_init("/etc/iproute2/rt_marks");
 }
diff --git a/src/gmputil.c b/src/gmputil.c
index c763792..844ea61 100644
--- a/src/gmputil.c
+++ b/src/gmputil.c
@@ -207,7 +207,7 @@ static void *gmp_xrealloc(void *ptr, size_t old_size, size_t new_size)
return xrealloc(ptr, new_size);
 }
 
-static void __init gmp_init(void)
+void gmp_init(void)
 {
mp_set_memory_functions(xmalloc, gmp_xrealloc, NULL);
 }
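
As a self-contained sketch of the hook installed in the hunk above:
GMP lets an application supply its own allocators, and passing NULL
keeps the default deallocator. The xmalloc and gmp_xrealloc below are
simplified stand-ins for nftables' abort-on-failure helpers:

    #include <stdlib.h>
    #include <gmp.h>

    static void *xmalloc(size_t size)
    {
            void *p = malloc(size);
            if (!p)
                    abort();        /* out of memory: fail hard */
            return p;
    }

    static void *gmp_xrealloc(void *ptr, size_t old_size, size_t new_size)
    {
            void *p;

            (void)old_size;         /* GMP passes the old size; realloc() doesn't need it */
            p = realloc(ptr, new_size);
            if (!p)
                    abort();
            return p;
    }

    int main(void)
    {
            mp_set_memory_functions(xmalloc, gmp_xrealloc, NULL);
            mpz_t n;
            mpz_init_set_ui(n, 42); /* now allocates through xmalloc() */
            gmp_printf("%Zd\n", n);
            mpz_clear(n);
            return 0;
    }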
diff --git a/src/main.c b/src/main.c
index 7fbf00a..301bce0 100644
--- a/src/main.c
+++ b/src/main.c
@@ -262,6 +262,20 @@ err1:
return ret;
 }
 
+void nft_init(void)
+{
+   mark_table_init();
+   realm_table_rt_init();
+   devgroup_table_init();
+   realm_table_meta_init();
+   ct_label_table_init();
+   netlink_open_sock();
+   gmp_init();
+#ifdef HAVE_LIBXTABLES
+   xt_init();
+#endif
+}
+
 int main(int argc, char * const *argv)
 {
struct parser_state state;
@@ -272,6 +286,7 @@ int main(int argc, char * const *argv)
bool interactive = false;
int i, val, rc = NFT_EXIT_SUCCESS;
 
+   nft_init();
while (1) {
val = getopt_long(argc, argv, OPTSTRING, options, NULL);
if (val == -1)
diff --git a/src/meta.c b/src/meta.c
index e9334b8..4fb26d5 100644
--- a/src/meta.c
+++ b/src/meta.c
@@ -37,7 +37,7 @@
 #include 
 
 static struct symbol_table *realm_tbl;
-static void __init realm_table_init(void)
+void realm_table_meta_init(void)
 {
realm_tbl = rt_symbol_table_init("/etc/iproute2/rt_realms");
 }
@@ -333,7 +333,7 @@ const struct datatype pkttype_type = {
 };
 
 static struct symbol_table *devgroup_tbl;
-static void __init devgroup_table_init(void)
+void devgroup_table_init(void)
 {
devgroup_tbl = rt_symbol_table_init("/etc/iproute2/group");
 }
diff --git a/src/netlink.c b/src/netlink.c
index 880502c..3993aa1 100644
--- a/src/netlink.c
+++ b/src/netlink.c
@@ -61,7 +61,7 @@ static struct mnl_socket *nfsock_open(void)
return s;
 }
 
-static void __init netlink_open_sock(void)
+void netlink_open_sock(void)
 {
nf_sock = nfsock_open();
fcntl(mnl_socket_get_fd(nf_sock), F_SETFL, O_NONBLOCK);
diff --git a/src/rt.c b/src/rt.c
index 530ebe6..5f57cf0 100644
--- a/src/rt.c
+++ b/src/rt.c
@@ -24,7 +24,7 @@
 #include 
 
 static struct symbol_table *realm_tbl;
-static void __init realm_table_init(void)
+void realm_table_rt_init(void)
 {
realm_tbl = rt_symbol_table_init("/etc/iproute2/rt_realms");
 }
diff --git a/src/xt.c b/src/xt.c
index e24b0af..9680f8e 100644
--- a/src/xt.c
+++ b/src/xt.c
@@ -351,7 +351,7 @@ 

[PATCH nft 1/3] include: Remove datatype_register().

2017-06-30 Thread Varsha Rao
Remove the datatype_register() function together with the __init
functions that called it. Instead, add the datatypes previously
passed to datatype_register() directly to the datatypes array,
declaring them extern in the headers where needed.

Signed-off-by: Varsha Rao 
---
 include/ct.h   |  3 +++
 include/datatype.h |  1 -
 include/exthdr.h   |  1 +
 include/fib.h  |  2 ++
 include/meta.h |  7 ++-
 include/proto.h|  8 
 include/rt.h   |  1 +
 src/ct.c   | 13 +++--
 src/datatype.c | 25 +++--
 src/exthdr.c   |  7 +--
 src/fib.c  |  7 +--
 src/meta.c | 44 +---
 src/proto.c| 24 ++--
 src/rt.c   |  7 +--
 14 files changed, 57 insertions(+), 93 deletions(-)
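
The pattern this patch switches to, as a minimal sketch (names are
simplified stand-ins; in the real tree the datatype definitions stay
in their own files and are declared extern in the headers):

    #include <stddef.h>

    enum datatypes { TYPE_INVALID = 0, TYPE_CT_STATE, TYPE_CT_DIR, TYPE_MAX };

    struct datatype {
            enum datatypes type;
            const char *name;
    };

    static const struct datatype ct_state_type = { TYPE_CT_STATE, "ct_state" };
    static const struct datatype ct_dir_type   = { TYPE_CT_DIR,   "ct_dir"   };

    /* Build-time table: designated initializers replace the per-type
     * __init constructors that called datatype_register(). */
    static const struct datatype *datatypes[TYPE_MAX] = {
            [TYPE_CT_STATE] = &ct_state_type,
            [TYPE_CT_DIR]   = &ct_dir_type,
    };

    const struct datatype *datatype_lookup(enum datatypes type)
    {
            if (type >= TYPE_MAX)
                    return NULL;
            return datatypes[type];
    }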

diff --git a/include/ct.h b/include/ct.h
index ae900ee..b8cd7bf 100644
--- a/include/ct.h
+++ b/include/ct.h
@@ -23,6 +23,9 @@ struct ct_template {
.len= (__len),  \
 }
 
+extern const struct datatype ct_dir_type;
+extern const struct datatype ct_state_type;
+extern const struct datatype ct_status_type;
 extern struct expr *ct_expr_alloc(const struct location *loc,
  enum nft_ct_keys key, int8_t direction);
 extern void ct_expr_update_type(struct proto_ctx *ctx, struct expr *expr);
diff --git a/include/datatype.h b/include/datatype.h
index 58c4d3e..2e34591 100644
--- a/include/datatype.h
+++ b/include/datatype.h
@@ -152,7 +152,6 @@ struct datatype {
const struct symbol_table   *sym_tbl;
 };
 
-extern void datatype_register(const struct datatype *dtype);
 extern const struct datatype *datatype_lookup(enum datatypes type);
 extern const struct datatype *datatype_lookup_byname(const char *name);
 
diff --git a/include/exthdr.h b/include/exthdr.h
index a2647ee..97ccc38 100644
--- a/include/exthdr.h
+++ b/include/exthdr.h
@@ -89,5 +89,6 @@ extern const struct exthdr_desc exthdr_rt2;
 extern const struct exthdr_desc exthdr_frag;
 extern const struct exthdr_desc exthdr_dst;
 extern const struct exthdr_desc exthdr_mh;
+extern const struct datatype mh_type_type;
 
 #endif /* NFTABLES_EXTHDR_H */
diff --git a/include/fib.h b/include/fib.h
index 3a019e6..9ce681c 100644
--- a/include/fib.h
+++ b/include/fib.h
@@ -4,4 +4,6 @@
 extern struct expr *fib_expr_alloc(const struct location *loc,
   unsigned int flags,
   unsigned int result);
+extern const struct datatype fib_addr_type;
+
 #endif /* NFTABLES_FIB_H */
diff --git a/include/meta.h b/include/meta.h
index 5578460..ebef7d8 100644
--- a/include/meta.h
+++ b/include/meta.h
@@ -28,7 +28,12 @@ extern struct expr *meta_expr_alloc(const struct location *loc,
 
 struct stmt *meta_stmt_meta_iiftype(const struct location *loc, uint16_t type);
 
-const struct datatype ifindex_type;
+extern const struct datatype ifindex_type;
+extern const struct datatype tchandle_type;
+extern const struct datatype gid_type;
+extern const struct datatype uid_type;
+extern const struct datatype devgroup_type;
+extern const struct datatype pkttype_type;
 
 struct error_record *meta_key_parse(const struct location *loc,
const char *name,
diff --git a/include/proto.h b/include/proto.h
index 01188ab..39aa485 100644
--- a/include/proto.h
+++ b/include/proto.h
@@ -322,4 +322,12 @@ extern const struct proto_desc proto_netdev;
 extern const struct proto_desc proto_unknown;
 extern const struct proto_hdr_template proto_unknown_template;
 
+extern const struct datatype icmp_type_type;
+extern const struct datatype tcp_flag_type;
+extern const struct datatype dccp_pkttype_type;
+extern const struct datatype arpop_type;
+extern const struct datatype icmp6_type_type;
+extern const struct datatype dscp_type;
+extern const struct datatype ecn_type;
+
 #endif /* NFTABLES_PROTO_H */
diff --git a/include/rt.h b/include/rt.h
index 728cf5f..bbffa74 100644
--- a/include/rt.h
+++ b/include/rt.h
@@ -26,6 +26,7 @@ struct rt_template {
.invalid= (__invalid),  \
 }
 
+extern const struct datatype realm_type;
 extern struct expr *rt_expr_alloc(const struct location *loc,
  enum nft_rt_keys key, bool invalid);
 extern void rt_expr_update_type(struct proto_ctx *ctx, struct expr *expr);
diff --git a/src/ct.c b/src/ct.c
index c705750..9b7140b 100644
--- a/src/ct.c
+++ b/src/ct.c
@@ -44,7 +44,7 @@ static const struct symbol_table ct_state_tbl = {
}
 };
 
-static const struct datatype ct_state_type = {
+const struct datatype ct_state_type = {
.type   = TYPE_CT_STATE,
.name   = "ct_state",
.desc   = "conntrack state",
@@ -63,7 +63,7 @@ static const struct symbol_table ct_dir_tbl = {
}
 };
 
-static const struct datatype ct_dir_type = {
+const struct datatype ct_dir_type = {
.type   = TYPE_CT_DIR,
.name   = "ct_dir",
 

[PATCH nft 0/3] include: Remove datatype_register(), __init and __exit macros.

2017-06-30 Thread Varsha Rao
This patchset removes the datatype_register() function and the
__init and __exit macro definitions.

Varsha Rao (3):
  include: Remove datatype_register().
  include: Remove __init macro definition.
  include: Remove __exit macro definition.

 include/ct.h   |  3 +++
 include/datatype.h |  1 -
 include/exthdr.h   |  1 +
 include/fib.h  |  2 ++
 include/meta.h |  7 ++-
 include/nftables.h | 18 ++
 include/proto.h|  8 
 include/rt.h   |  1 +
 include/utils.h|  2 --
 src/ct.c   | 17 +
 src/datatype.c | 29 +
 src/exthdr.c   |  7 +--
 src/fib.c  |  7 +--
 src/gmputil.c  |  2 +-
 src/main.c | 26 ++
 src/meta.c | 52 +---
 src/netlink.c  |  4 ++--
 src/proto.c| 24 ++--
 src/rt.c   | 11 +++
 src/xt.c   |  2 +-
 20 files changed, 115 insertions(+), 109 deletions(-)

-- 
2.9.4



Re: [PATCH RFC 12/26] arm64: Remove spin_unlock_wait() arch-specific definitions

2017-06-30 Thread Will Deacon
On Thu, Jun 29, 2017 at 05:01:20PM -0700, Paul E. McKenney wrote:
> There is no agreed-upon definition of spin_unlock_wait()'s semantics,
> and it appears that all callers could do just as well with a lock/unlock
> pair.  This commit therefore removes the underlying arch-specific
> arch_spin_unlock_wait().
> 
> Signed-off-by: Paul E. McKenney 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: 
> Cc: Peter Zijlstra 
> Cc: Alan Stern 
> Cc: Andrea Parri 
> Cc: Linus Torvalds 
> ---
>  arch/arm64/include/asm/spinlock.h | 58 ---
>  1 file changed, 5 insertions(+), 53 deletions(-)

I'm going to miss this code.

Acked-by: Will Deacon 

Will


Re: [PATCH RFC 08/26] locking: Remove spin_unlock_wait() generic definitions

2017-06-30 Thread Will Deacon
On Thu, Jun 29, 2017 at 05:01:16PM -0700, Paul E. McKenney wrote:
> There is no agreed-upon definition of spin_unlock_wait()'s semantics,
> and it appears that all callers could do just as well with a lock/unlock
> pair.  This commit therefore removes spin_unlock_wait() and related
> definitions from core code.
> 
> Signed-off-by: Paul E. McKenney 
> Cc: Arnd Bergmann 
> Cc: Ingo Molnar 
> Cc: Will Deacon 
> Cc: Peter Zijlstra 
> Cc: Alan Stern 
> Cc: Andrea Parri 
> Cc: Linus Torvalds 
> ---
>  include/asm-generic/qspinlock.h |  14 -
>  include/linux/spinlock.h|  31 ---
>  include/linux/spinlock_up.h |   6 ---
>  kernel/locking/qspinlock.c  | 117 
>  4 files changed, 168 deletions(-)

[...]

> diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> index b2caec7315af..64a9051e4c2c 100644
> --- a/kernel/locking/qspinlock.c
> +++ b/kernel/locking/qspinlock.c
> @@ -267,123 +267,6 @@ static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock,
>  #define queued_spin_lock_slowpath  native_queued_spin_lock_slowpath
>  #endif
>  
> -/*
> - * Various notes on spin_is_locked() and spin_unlock_wait(), which are
> - * 'interesting' functions:
> - *
> - * PROBLEM: some architectures have an interesting issue with atomic ACQUIRE
> - * operations in that the ACQUIRE applies to the LOAD _not_ the STORE (ARM64,
> - * PPC). Also qspinlock has a similar issue per construction, the setting of
> - * the locked byte can be unordered acquiring the lock proper.
> - *
> - * This gets to be 'interesting' in the following cases, where the /should/s
> - * end up false because of this issue.
> - *
> - *
> - * CASE 1:
> - *
> - * So the spin_is_locked() correctness issue comes from something like:
> - *
> - *   CPU0                              CPU1
> - *
> - *   global_lock();                    local_lock(i)
> - *     spin_lock(&G)                     spin_lock(&L[i])
> - *     for (i)                           if (!spin_is_locked(&G)) {
> - *       spin_unlock_wait(&L[i]);          smp_acquire__after_ctrl_dep();
> - *                                         return;
> - *                                       }
> - *                                       // deal with fail
> - *
> - * Where it is important CPU1 sees G locked or CPU0 sees L[i] locked such
> - * that there is exclusion between the two critical sections.
> - *
> - * The load from spin_is_locked(&G) /should/ be constrained by the ACQUIRE from
> - * spin_lock(&L[i]), and similarly the load(s) from spin_unlock_wait(&L[i])
> - * /should/ be constrained by the ACQUIRE from spin_lock(&G).
> - *
> - * Similarly, later stuff is constrained by the ACQUIRE from CTRL+RMB.

Might be worth keeping this comment about spin_is_locked, since we're not
removing that guy just yet!

Will
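
For completeness, a kernel-style sketch of the lock/unlock
substitution the changelog describes (editorial illustration, not
code from the series):

    #include <linux/spinlock.h>

    /* Equivalent of the removed spin_unlock_wait(lock): wait until the
     * current holder (if any) releases the lock. Taking and dropping
     * the lock gives the same effect with unambiguous ordering, since
     * both a real ACQUIRE and a real RELEASE are performed. */
    static inline void wait_for_unlocked(spinlock_t *lock)
    {
            spin_lock(lock);
            spin_unlock(lock);      /* no critical section is needed */
    }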