Re: [PATCH -v2 4/6] locking, arch: Update spin_unlock_wait()

2016-05-27 Thread Chris Metcalf

On 5/27/2016 5:05 AM, Peter Zijlstra wrote:
> On Thu, May 26, 2016 at 05:10:36PM -0400, Chris Metcalf wrote:
> > On 5/26/2016 10:19 AM, Peter Zijlstra wrote:
> > > --- a/arch/tile/lib/spinlock_32.c
> > > +++ b/arch/tile/lib/spinlock_32.c
> > > @@ -72,10 +72,14 @@ void arch_spin_unlock_wait(arch_spinlock
> > >  	if (next == curr)
> > >  		return;
> > >  
> > > +	smp_rmb();
> > > +
> > >  	/* Wait until the current locker has released the lock. */
> > >  	do {
> > >  		delay_backoff(iterations++);
> > >  	} while (READ_ONCE(lock->current_ticket) == curr);
> > > +
> > > +	smp_acquire__after_ctrl_dep();
> > >  }
> > >  EXPORT_SYMBOL(arch_spin_unlock_wait);
> > > --- a/arch/tile/lib/spinlock_64.c
> > > +++ b/arch/tile/lib/spinlock_64.c
> > > @@ -72,10 +72,14 @@ void arch_spin_unlock_wait(arch_spinlock
> > >  	if (arch_spin_next(val) == curr)
> > >  		return;
> > >  
> > > +	smp_rmb();
> > > +
> > >  	/* Wait until the current locker has released the lock. */
> > >  	do {
> > >  		delay_backoff(iterations++);
> > >  	} while (arch_spin_current(READ_ONCE(lock->lock)) == curr);
> > > +
> > > +	smp_acquire__after_ctrl_dep();
> > >  }
> > >  EXPORT_SYMBOL(arch_spin_unlock_wait);
> >
> > The smp_rmb()s are unnecessary for tile.  We READ_ONCE next/curr from the
> > lock and compare them, so we know the load(s) are complete.  There's no
> > microarchitectural speculation going on, so that's that.  Then we READ_ONCE
> > the next load on the lock from within the wait loop, so our load/load
> > ordering is guaranteed.
>
> Does TILE never speculate reads? Because in that case the control
> dependency already provides a full load->load,store barrier and you'd
> want smp_acquire__after_ctrl_dep() to be a barrier() instead of
> smp_rmb().


Yes, that's a good point.  I didn't look at the definition of
smp_acquire__after_ctrl_dep(), but it certainly sounds like that's exactly
a compiler barrier for tile.  There is no load speculation performed.  The
only out-of-order stuff that happens is in the memory subsystem: stores
will become visible in arbitrary order, and loads will arrive in arbitrary
order, but as soon as the result of a load is used in any other kind of
instruction, instruction issue will halt until the pending load(s) for the
instruction operands are available.
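
To make that concrete, here is a minimal sketch, not from the posted
series, of what a tile-specific override could look like, assuming the
generic smp_acquire__after_ctrl_dep() falls back to smp_rmb():

/*
 * Sketch only: on an architecture that never speculates loads, the
 * control dependency from the loop's READ_ONCE() already orders later
 * loads and stores, so only the compiler needs to be restrained.
 */
#ifndef smp_acquire__after_ctrl_dep
#define smp_acquire__after_ctrl_dep()	barrier()
#endif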

--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com



[PATCH nf] netfilter: nf_tables: validate NFTA_SET_TABLE parameter

2016-05-27 Thread Phil Turnbull
If the NFTA_SET_TABLE parameter is missing and the NLM_F_DUMP flag is
not set, then a NULL pointer dereference is triggered in
nf_tables_set_lookup because ctx.table is NULL.

Signed-off-by: Phil Turnbull 
---
 net/netfilter/nf_tables_api.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index c444929..569ba6a 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -2647,6 +2647,8 @@ static int nf_tables_getset(struct net *net, struct sock *nlsk,
 	/* Only accept unspec with dump */
 	if (nfmsg->nfgen_family == NFPROTO_UNSPEC)
 		return -EAFNOSUPPORT;
+	if (!nla[NFTA_SET_TABLE])
+		return -EINVAL;
 
 	set = nf_tables_set_lookup(ctx.table, nla[NFTA_SET_NAME]);
 	if (IS_ERR(set))
-- 
2.8.0
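
To make the failure mode concrete, here is a self-contained userspace
reduction (hypothetical names, not kernel code) of the path the commit
message describes:

#include <errno.h>
#include <stdio.h>

struct table { const char *name; };

/* 'table_attr' plays the role of ctx.table, which is left NULL when
 * the NFTA_SET_TABLE attribute is missing. */
static int getset(const struct table *table_attr)
{
	if (!table_attr)	/* the check the patch adds */
		return -EINVAL;
	printf("looking up sets in table %s\n", table_attr->name);
	return 0;
}

int main(void)
{
	struct table t = { .name = "filter" };

	getset(&t);	/* normal case */
	getset(NULL);	/* now rejected with -EINVAL instead of crashing */
	return 0;
}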



[PATCH 1/2 libnftnl] set_elem: Copy user data memory

2016-05-27 Thread Carlos Falgueras García
All attributes are passed by copy, so user data should be copied too.

Signed-off-by: Carlos Falgueras García 
---
 src/set_elem.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/set_elem.c b/src/set_elem.c
index 990be24..b9c7e1e 100644
--- a/src/set_elem.c
+++ b/src/set_elem.c
@@ -125,8 +125,14 @@ void nftnl_set_elem_set(struct nftnl_set_elem *s, uint16_t attr,
 		s->timeout = *((uint64_t *)data);
 		break;
 	case NFTNL_SET_ELEM_USERDATA: /* NFTA_SET_ELEM_USERDATA */
-		s->user.data = (void *)data;
-		s->user.len  = data_len;
+		if (s->user.data != NULL)
+			xfree(s->user.data);
+
+		s->user.data = malloc(data_len);
+		if (!s->user.data)
+			return;
+		memcpy(s->user.data, data, data_len);
+		s->user.len = data_len;
 		break;
 	default:
 		return;
-- 
2.8.2
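
A short usage sketch of the new copy semantics, assuming the element
alloc/free API used elsewhere in libnftnl: the caller keeps ownership of
its buffer and may reuse or free it immediately after the call.

#include <string.h>
#include <libnftnl/set.h>

int main(void)
{
	struct nftnl_set_elem *elem = nftnl_set_elem_alloc();
	char note[] = "scratch comment";

	if (!elem)
		return 1;
	/* The element now stores its own copy of 'note'... */
	nftnl_set_elem_set(elem, NFTNL_SET_ELEM_USERDATA, note, sizeof(note));
	/* ...so clobbering the caller's buffer is harmless. */
	memset(note, 0, sizeof(note));
	nftnl_set_elem_free(elem);
	return 0;
}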



[PATCH 2/2 nft] set_elem: Use libnftnl/udata to store set element comment

2016-05-27 Thread Carlos Falgueras García
The set element comment is stored in nftnl_set_elem->user.data using the
libnftnl/udata infrastructure. This allows storing multiple variable-length
user data items in a set element.

Signed-off-by: Carlos Falgueras García 
---
 src/netlink.c | 50 ++
 1 file changed, 46 insertions(+), 4 deletions(-)

diff --git a/src/netlink.c b/src/netlink.c
index b0dcb90..f82d4fa 100644
--- a/src/netlink.c
+++ b/src/netlink.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include <libnftnl/udata.h>
 #include 
 #include 
 #include 
@@ -208,6 +209,7 @@ static struct nftnl_set_elem *alloc_nftnl_setelem(const struct expr *expr)
 	const struct expr *elem, *key, *data;
 	struct nftnl_set_elem *nlse;
 	struct nft_data_linearize nld;
+	struct nftnl_udata_buf *udbuf;
 
 	nlse = nftnl_set_elem_alloc();
 	if (nlse == NULL)
@@ -228,9 +230,18 @@ static struct nftnl_set_elem *alloc_nftnl_setelem(const struct expr *expr)
 	if (elem->timeout)
 		nftnl_set_elem_set_u64(nlse, NFTNL_SET_ELEM_TIMEOUT,
 				       elem->timeout);
-	if (elem->comment)
+	if (elem->comment) {
+		udbuf = nftnl_udata_buf_alloc(NFT_USERDATA_MAXLEN);
+		if (!udbuf)
+			memory_allocation_error();
+		if (!nftnl_udata_put_strz(udbuf, UDATA_TYPE_COMMENT,
+					  elem->comment))
+			memory_allocation_error();
 		nftnl_set_elem_set(nlse, NFTNL_SET_ELEM_USERDATA,
-				   elem->comment, strlen(elem->comment) + 1);
+				   nftnl_udata_buf_data(udbuf),
+				   nftnl_udata_buf_len(udbuf));
+		nftnl_udata_buf_free(udbuf);
+	}
 
 	if (data != NULL) {
 		netlink_gen_data(data, &nld);
@@ -1421,6 +1432,38 @@ static struct expr *netlink_parse_concat_elem(const struct datatype *dtype,
 	return concat;
 }
 
+static int parse_udata_cb(const struct nftnl_udata *attr, void *data)
+{
+	unsigned char *value = nftnl_udata_get(attr);
+	uint8_t type = nftnl_udata_type(attr);
+	uint8_t len = nftnl_udata_len(attr);
+	const struct nftnl_udata **tb = data;
+
+	switch (type) {
+	case UDATA_TYPE_COMMENT:
+		if (value[len - 1] != '\0')
+			return -1;
+		break;
+	default:
+		return 0;
+	}
+	tb[type] = attr;
+	return 0;
+}
+
+static char *udata_get_comment(const void *data, uint32_t data_len)
+{
+	const struct nftnl_udata *tb[UDATA_TYPE_MAX + 1] = {};
+
+	if (nftnl_udata_parse(data, data_len, parse_udata_cb, tb) < 0)
+		return NULL;
+
+	if (!tb[UDATA_TYPE_COMMENT])
+		return NULL;
+
+	return xstrdup(nftnl_udata_get(tb[UDATA_TYPE_COMMENT]));
+}
+
 static int netlink_delinearize_setelem(struct nftnl_set_elem *nlse,
 				       const struct set *set)
 {
@@ -1457,8 +1500,7 @@ static int netlink_delinearize_setelem(struct nftnl_set_elem *nlse,
 		uint32_t len;
 
 		data = nftnl_set_elem_get(nlse, NFTNL_SET_ELEM_USERDATA, &len);
-		expr->comment = xmalloc(len);
-		memcpy((char *)expr->comment, data, len);
+		expr->comment = udata_get_comment(data, len);
 	}
 	if (nftnl_set_elem_is_set(nlse, NFT_SET_ELEM_ATTR_EXPR)) {
 		const struct nftnl_expr *nle;
-- 
2.8.2
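
The udata buffer is a flat type-length-value container. The sketch below
exercises the same libnftnl calls the patch uses; UDATA_TYPE_COMMENT is
nft's own enum, so a hypothetical stand-in constant is defined here:

#include <stdio.h>
#include <libnftnl/udata.h>

#define MY_TYPE_COMMENT	0	/* stand-in for nft's UDATA_TYPE_COMMENT */

static int print_attr(const struct nftnl_udata *attr, void *data)
{
	(void)data;
	printf("type=%u len=%u value=%s\n",
	       (unsigned int)nftnl_udata_type(attr),
	       (unsigned int)nftnl_udata_len(attr),
	       (const char *)nftnl_udata_get(attr));
	return 0;	/* keep iterating */
}

int main(void)
{
	struct nftnl_udata_buf *buf = nftnl_udata_buf_alloc(256);

	if (!buf)
		return 1;
	if (!nftnl_udata_put_strz(buf, MY_TYPE_COMMENT, "my comment"))
		return 1;
	/* Walk the TLV attributes just as udata_get_comment() does above. */
	nftnl_udata_parse(nftnl_udata_buf_data(buf), nftnl_udata_buf_len(buf),
			  print_attr, NULL);
	nftnl_udata_buf_free(buf);
	return 0;
}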



Re: [PATCH -v2 4/6] locking, arch: Update spin_unlock_wait()

2016-05-27 Thread Peter Zijlstra
On Thu, May 26, 2016 at 05:10:36PM -0400, Chris Metcalf wrote:
> On 5/26/2016 10:19 AM, Peter Zijlstra wrote:
> >--- a/arch/tile/lib/spinlock_32.c
> >+++ b/arch/tile/lib/spinlock_32.c
> >@@ -72,10 +72,14 @@ void arch_spin_unlock_wait(arch_spinlock
> > 	if (next == curr)
> > 		return;
> > 
> >+	smp_rmb();
> >+
> > 	/* Wait until the current locker has released the lock. */
> > 	do {
> > 		delay_backoff(iterations++);
> > 	} while (READ_ONCE(lock->current_ticket) == curr);
> >+
> >+	smp_acquire__after_ctrl_dep();
> > }
> > EXPORT_SYMBOL(arch_spin_unlock_wait);
> >--- a/arch/tile/lib/spinlock_64.c
> >+++ b/arch/tile/lib/spinlock_64.c
> >@@ -72,10 +72,14 @@ void arch_spin_unlock_wait(arch_spinlock
> > 	if (arch_spin_next(val) == curr)
> > 		return;
> > 
> >+	smp_rmb();
> >+
> > 	/* Wait until the current locker has released the lock. */
> > 	do {
> > 		delay_backoff(iterations++);
> > 	} while (arch_spin_current(READ_ONCE(lock->lock)) == curr);
> >+
> >+	smp_acquire__after_ctrl_dep();
> > }
> > EXPORT_SYMBOL(arch_spin_unlock_wait);
> 
> The smp_rmb()s are unnecessary for tile.  We READ_ONCE next/curr from the
> lock and compare them, so we know the load(s) are complete.  There's no
> microarchitectural speculation going on, so that's that.  Then we READ_ONCE
> the next load on the lock from within the wait loop, so our load/load
> ordering is guaranteed.

Does TILE never speculate reads? Because in that case the control
dependency already provides a full load->load,store barrier and you'd
want smp_acquire__after_ctrl_dep() to be a barrier() instead of
smp_rmb().



Re: [PATCH -v2 4/6] locking, arch: Update spin_unlock_wait()

2016-05-27 Thread Peter Zijlstra
On Fri, May 27, 2016 at 08:46:49AM +0200, Martin Schwidefsky wrote:
> > This fixes a number of spin_unlock_wait() users that (not
> > unreasonably) rely on this.
> 
> All that is missing is an smp_rmb(), no? 

Indeed.

> > --- a/arch/s390/include/asm/spinlock.h
> > +++ b/arch/s390/include/asm/spinlock.h
> > @@ -95,8 +95,7 @@ static inline void arch_spin_unlock(arch
> > 
> >  static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
> >  {
> > -	while (arch_spin_is_locked(lock))
> > -		arch_spin_relax(lock);
> > +	smp_cond_load_acquire(&lock->lock, !VAL);
> >  }
> > 
> >  /*
> 
> This change adds the smp_rmb() at the end of the waiting loop, but
> it also replaces arch_spin_relax(), an alias for arch_lock_relax(), with
> a cpu_relax(). This is not good; these two functions do *very* different
> things. cpu_relax() does an undirected yield with diagnose 0x44, but
> only if the system is non-SMT. arch_lock_relax() does an additional
> cpu_is_preempted() test to check whether the target cpu is running, and
> does a directed yield with diagnose 0x9c.
> 
> Why can't we just add the smp_rmb() to the arch_spin_unlock_wait()?

We can; I forgot about the special cpu_relax on s390, will fix.
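
Presumably the fix will keep the directed yield and only add the ordering,
along the lines of this sketch (an assumption based on Martin's suggestion,
not the posted v3):

static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
{
	while (arch_spin_is_locked(lock))
		arch_spin_relax(lock);	/* keep the directed yield */
	smp_acquire__after_ctrl_dep();	/* upgrade the control dependency */
}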



Re: [PATCH -v2 4/6] locking, arch: Update spin_unlock_wait()

2016-05-27 Thread Martin Schwidefsky
On Thu, 26 May 2016 16:19:26 +0200
Peter Zijlstra  wrote:

> This patch updates/fixes all spin_unlock_wait() implementations.
> 
> The update is in semantics; where it previously was only a control
> dependency, we now upgrade to a full load-acquire to match the
> store-release from the spin_unlock() we waited on. This ensures that
> when spin_unlock_wait() returns, we're guaranteed to observe the full
> critical section we waited on.
> 
> This fixes a number of spin_unlock_wait() users that (not
> unreasonably) rely on this.

All that is missing is an smp_rmb(), no? 

> --- a/arch/s390/include/asm/spinlock.h
> +++ b/arch/s390/include/asm/spinlock.h
> @@ -95,8 +95,7 @@ static inline void arch_spin_unlock(arch
> 
>  static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
>  {
> -	while (arch_spin_is_locked(lock))
> -		arch_spin_relax(lock);
> +	smp_cond_load_acquire(&lock->lock, !VAL);
>  }
> 
>  /*

This change adds the smp_rmb() at the end of the waiting loop, but
it also replaces arch_spin_relax(), an alias for arch_lock_relax(), with
a cpu_relax(). This is not good; these two functions do *very* different
things. cpu_relax() does an undirected yield with diagnose 0x44, but
only if the system is non-SMT. arch_lock_relax() does an additional
cpu_is_preempted() test to check whether the target cpu is running, and
does a directed yield with diagnose 0x9c.

Why can't we just add the smp_rmb() to the arch_spin_unlock_wait()?
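
For context: the generic smp_cond_load_acquire() proposed in this series
spins with a plain cpu_relax(), roughly as reconstructed below (a sketch
from the series' description, not quoted from the patch), which is where
the arch_spin_relax() call gets lost:

#define smp_cond_load_acquire(ptr, cond_expr) ({	\
	typeof(ptr) __PTR = (ptr);			\
	typeof(*ptr) VAL;				\
	for (;;) {					\
		VAL = READ_ONCE(*__PTR);		\
		if (cond_expr)				\
			break;				\
		cpu_relax();	/* undirected */	\
	}						\
	smp_acquire__after_ctrl_dep();			\
	VAL;						\
})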

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.
