Re: [PATCH net] Revert "neighbour: force neigh_invalidate when NUD_FAILED update is from admin"

2018-10-20 Thread David Miller
From: Roopa Prabhu 
Date: Sat, 20 Oct 2018 18:09:31 -0700

> From: Roopa Prabhu 
> 
> This reverts commit 8e326289e3069dfc9fa9c209924668dd031ab8ef.
> 
> This patch results in unnecessary netlink notification when one
> tries to delete a neigh entry already in NUD_FAILED state. Found
> this with a buggy app that tries to delete a NUD_FAILED entry
> repeatedly. While the notification issue can be fixed with more
> checks, adding more complexity here seems unnecessary. Also,
> recent tests with other changes in the neighbour code have
> shown that the INCOMPLETE and PROBE checks are good enough for
> the original issue.
> 
> Signed-off-by: Roopa Prabhu 

Applied, thanks.


[PATCH net-next 3/3] sctp: process sk_reuseport in sctp_get_port_local

2018-10-20 Thread Xin Long
When sk_reuseport is set on socks, those socks that have the same uid
are allowed to bind to the same port and address.

Note that the difference from sk_reuse is that it allows multiple socks
to listen on the same port and address.

Signed-off-by: Xin Long 
---
 include/net/sctp/structs.h |  4 +++-
 net/sctp/socket.c  | 46 +-
 2 files changed, 36 insertions(+), 14 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 15d017f..af9d494 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -96,7 +96,9 @@ struct sctp_stream;
 
 struct sctp_bind_bucket {
unsigned short  port;
-   unsigned short  fastreuse;
+   signed char fastreuse;
+   signed char fastreuseport;
+   kuid_t  fastuid;
struct hlist_node   node;
struct hlist_head   owner;
struct net  *net;
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 44e7d8c..8605705 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -7642,8 +7642,10 @@ static struct sctp_bind_bucket *sctp_bucket_create(
 
 static long sctp_get_port_local(struct sock *sk, union sctp_addr *addr)
 {
-   bool reuse = (sk->sk_reuse || sctp_sk(sk)->reuse);
+   struct sctp_sock *sp = sctp_sk(sk);
+   bool reuse = (sk->sk_reuse || sp->reuse);
struct sctp_bind_hashbucket *head; /* hash list */
+   kuid_t uid = sock_i_uid(sk);
struct sctp_bind_bucket *pp;
unsigned short snum;
int ret;
@@ -7719,7 +7721,10 @@ static long sctp_get_port_local(struct sock *sk, union sctp_addr *addr)
 
pr_debug("%s: found a possible match\n", __func__);
 
-   if (pp->fastreuse && reuse && sk->sk_state != SCTP_SS_LISTENING)
+   if ((pp->fastreuse && reuse &&
+sk->sk_state != SCTP_SS_LISTENING) ||
+   (pp->fastreuseport && sk->sk_reuseport &&
+uid_eq(pp->fastuid, uid)))
goto success;
 
/* Run through the list of sockets bound to the port
@@ -7733,16 +7738,18 @@ static long sctp_get_port_local(struct sock *sk, union sctp_addr *addr)
 * in an endpoint.
 */
sk_for_each_bound(sk2, &pp->owner) {
-   struct sctp_endpoint *ep2;
-   ep2 = sctp_sk(sk2)->ep;
+   struct sctp_sock *sp2 = sctp_sk(sk2);
+   struct sctp_endpoint *ep2 = sp2->ep;
 
if (sk == sk2 ||
-   (reuse && (sk2->sk_reuse || sctp_sk(sk2)->reuse) &&
-sk2->sk_state != SCTP_SS_LISTENING))
+   (reuse && (sk2->sk_reuse || sp2->reuse) &&
+sk2->sk_state != SCTP_SS_LISTENING) ||
+   (sk->sk_reuseport && sk2->sk_reuseport &&
+uid_eq(uid, sock_i_uid(sk2))))
continue;
 
-   if (sctp_bind_addr_conflict(&ep2->base.bind_addr, addr,
-sctp_sk(sk2), sctp_sk(sk))) {
+   if (sctp_bind_addr_conflict(&ep2->base.bind_addr,
+   addr, sp2, sp)) {
ret = (long)sk2;
goto fail_unlock;
}
@@ -7765,19 +7772,32 @@ static long sctp_get_port_local(struct sock *sk, union sctp_addr *addr)
pp->fastreuse = 1;
else
pp->fastreuse = 0;
-   } else if (pp->fastreuse &&
-  (!reuse || sk->sk_state == SCTP_SS_LISTENING))
-   pp->fastreuse = 0;
+
+   if (sk->sk_reuseport) {
+   pp->fastreuseport = 1;
+   pp->fastuid = uid;
+   } else {
+   pp->fastreuseport = 0;
+   }
+   } else {
+   if (pp->fastreuse &&
+   (!reuse || sk->sk_state == SCTP_SS_LISTENING))
+   pp->fastreuse = 0;
+
+   if (pp->fastreuseport &&
+   (!sk->sk_reuseport || !uid_eq(pp->fastuid, uid)))
+   pp->fastreuseport = 0;
+   }
 
/* We are set, so fill up all the data in the hash table
 * entry, tie the socket list information with the rest of the
 * sockets FIXME: Blurry, NPI (ipg).
 */
 success:
-   if (!sctp_sk(sk)->bind_hash) {
+   if (!sp->bind_hash) {
inet_sk(sk)->inet_num = snum;
sk_add_bind_node(sk, &pp->owner);
-   sctp_sk(sk)->bind_hash = pp;
+   sp->bind_hash = pp;
}
ret = 0;
 
-- 
2.1.0



[PATCH net-next 0/3] sctp: add support for sk_reuseport

2018-10-20 Thread Xin Long
sctp sk_reuseport allows multiple socks to listen on the same port and
addresses, as long as these socks have the same uid. This works pretty
much as TCP/UDP does; the only difference is that sctp is multi-homed,
so all the bind_addrs in these socks have to match completely,
otherwise listen() will return an error.

The output below shows 5 sockets listening on 172.16.254.254:6400 on a
server. 26 sockets on a client connect to 172.16.254.254:6400, and each
may be processed by a different socket on the server, which is selected
by hash(lport, pport, paddr) in reuseport_select_sock():

 # ss --sctp -nn
   State  Recv-Q Send-QLocal Address:Port Peer Address:Port
   LISTEN 0  10   172.16.254.254:6400*:*
   `- ESTAB   0  0   172.16.254.254%eth1:6400   172.16.2.1:1234
   `- ESTAB   0  0   172.16.254.254%eth1:6400   172.16.2.4:1234
   `- ESTAB   0  0   172.16.254.254%eth1:6400   172.16.3.3:1234
   `- ESTAB   0  0   172.16.254.254%eth1:6400   172.16.3.4:1234
   `- ESTAB   0  0   172.16.254.254%eth1:6400   172.16.5.2:1234
   `- ESTAB   0  0   172.16.254.254%eth1:6400   172.16.5.3:1234
   LISTEN 0  10   172.16.254.254:6400*:*
   `- ESTAB   0  0   172.16.254.254%eth1:6400   172.16.1.3:1234
   `- ESTAB   0  0   172.16.254.254%eth1:6400   172.16.1.4:1234
   `- ESTAB   0  0   172.16.254.254%eth1:6400   172.16.3.2:1234
   `- ESTAB   0  0   172.16.254.254%eth1:6400   172.16.4.1:1234
   `- ESTAB   0  0   172.16.254.254%eth1:6400   172.16.4.2:1234
   `- ESTAB   0  0   172.16.254.254%eth1:6400   172.16.4.3:1234
   `- ESTAB   0  0   172.16.254.254%eth1:6400   172.16.4.4:1234
   LISTEN 0  10   172.16.254.254:6400*:*
   `- ESTAB   0  0   172.16.254.254%eth1:6400   172.16.1.2:1234
   `- ESTAB   0  0   172.16.254.254%eth1:6400   172.16.3.5:1234
   `- ESTAB   0  0   172.16.254.254%eth1:6400   172.16.4.5:1234
   `- ESTAB   0  0   172.16.254.254%eth1:6400   172.16.253.253:1234
   LISTEN 0  10   172.16.254.254:6400*:*
   `- ESTAB   0  0   172.16.254.254%eth1:6400   172.16.2.2:1234
   `- ESTAB   0  0   172.16.254.254%eth1:6400   172.16.2.3:1234
   `- ESTAB   0  0   172.16.254.254%eth1:6400   172.16.5.4:1234
   `- ESTAB   0  0   172.16.254.254%eth1:6400   172.16.5.5:1234
   LISTEN 0  10   172.16.254.254:6400*:*
   `- ESTAB   0  0   172.16.254.254%eth1:6400   172.16.1.1:1234
   `- ESTAB   0  0   172.16.254.254%eth1:6400   172.16.1.5:1234
   `- ESTAB   0  0   172.16.254.254%eth1:6400   172.16.2.5:1234
   `- ESTAB   0  0   172.16.254.254%eth1:6400   172.16.3.1:1234
   `- ESTAB   0  0   172.16.254.254%eth1:6400   172.16.5.1:1234

Xin Long (3):
  sctp: do reuseport_select_sock in __sctp_rcv_lookup_endpoint
  sctp: add sock_reuseport for the sock in __sctp_hash_endpoint
  sctp: process sk_reuseport in sctp_get_port_local

 include/net/sctp/sctp.h|   2 +-
 include/net/sctp/structs.h |   6 ++-
 net/core/sock_reuseport.c  |   1 +
 net/sctp/bind_addr.c   |  28 ++
 net/sctp/input.c   | 129 -
 net/sctp/socket.c  |  49 +++--
 6 files changed, 162 insertions(+), 53 deletions(-)

-- 
2.1.0



[PATCH net-next 1/3] sctp: do reuseport_select_sock in __sctp_rcv_lookup_endpoint

2018-10-20 Thread Xin Long
This is a part of sk_reuseport support for sctp, and it selects a
sock by the hashkey of lport, paddr and dport by default. It will
not take effect until sk_reuseport support is added in
sctp_get_port_local() in the next patch.
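
To illustrate how a hash can spread incoming associations across a
reuseport group, here is a minimal userspace sketch. It is not the
kernel code: toy_hash() is a hypothetical stand-in for the jhash-based
sctp_hashfn() in this patch, and pick_sock() only models the
multiply-and-shift scaling idea for mapping a 32-bit hash onto a group
index.

```c
#include <stdint.h>

/* Hypothetical stand-in for sctp_hashfn(): mixes the local port, peer
 * address and peer port into one 32-bit value. The kernel uses jhash;
 * this toy mixer only keeps the sketch self-contained.
 */
static uint32_t toy_hash(uint16_t lport, uint32_t paddr, uint16_t pport)
{
	uint32_t h = paddr ^ (((uint32_t)pport << 16) | lport);

	h ^= h >> 16;
	h *= 0x45d9f3bU;
	h ^= h >> 16;
	return h;
}

/* Map a 32-bit hash onto [0, nsocks) without a modulo, using
 * multiply-and-shift scaling. nsocks must be non-zero.
 */
static uint32_t pick_sock(uint32_t hash, uint32_t nsocks)
{
	return (uint32_t)(((uint64_t)hash * nsocks) >> 32);
}
```

With five listeners, pick_sock(toy_hash(6400, peer_addr, 1234), 5)
returns the same index every time for a given peer, so packets from one
peer keep landing on the same listening socket.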

Signed-off-by: Xin Long 
---
 net/sctp/input.c | 69 +---
 1 file changed, 41 insertions(+), 28 deletions(-)

diff --git a/net/sctp/input.c b/net/sctp/input.c
index 5c36a99..60ede89 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -57,6 +57,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* Forward declarations for internal helpers. */
 static int sctp_rcv_ootb(struct sk_buff *);
@@ -65,8 +66,10 @@ static struct sctp_association *__sctp_rcv_lookup(struct net *net,
  const union sctp_addr *paddr,
  const union sctp_addr *laddr,
  struct sctp_transport **transportp);
-static struct sctp_endpoint *__sctp_rcv_lookup_endpoint(struct net *net,
-   const union sctp_addr *laddr);
+static struct sctp_endpoint *__sctp_rcv_lookup_endpoint(
+   struct net *net, struct sk_buff *skb,
+   const union sctp_addr *laddr,
+   const union sctp_addr *daddr);
 static struct sctp_association *__sctp_lookup_association(
struct net *net,
const union sctp_addr *local,
@@ -171,7 +174,7 @@ int sctp_rcv(struct sk_buff *skb)
asoc = __sctp_rcv_lookup(net, skb, , , );
 
if (!asoc)
-   ep = __sctp_rcv_lookup_endpoint(net, );
+   ep = __sctp_rcv_lookup_endpoint(net, skb, , );
 
/* Retrieve the common input handling substructure. */
rcvr = asoc ? &asoc->base : &ep->base;
@@ -770,16 +773,35 @@ void sctp_unhash_endpoint(struct sctp_endpoint *ep)
local_bh_enable();
 }
 
+static inline __u32 sctp_hashfn(const struct net *net, __be16 lport,
+   const union sctp_addr *paddr, __u32 seed)
+{
+   __u32 addr;
+
+   if (paddr->sa.sa_family == AF_INET6)
+   addr = jhash(&paddr->v6.sin6_addr, 16, seed);
+   else
+   addr = (__force __u32)paddr->v4.sin_addr.s_addr;
+
+   return  jhash_3words(addr, ((__force __u32)paddr->v4.sin_port) << 16 |
+(__force __u32)lport, net_hash_mix(net), seed);
+}
+
 /* Look up an endpoint. */
-static struct sctp_endpoint *__sctp_rcv_lookup_endpoint(struct net *net,
-   const union sctp_addr *laddr)
+static struct sctp_endpoint *__sctp_rcv_lookup_endpoint(
+   struct net *net, struct sk_buff *skb,
+   const union sctp_addr *laddr,
+   const union sctp_addr *paddr)
 {
struct sctp_hashbucket *head;
struct sctp_ep_common *epb;
struct sctp_endpoint *ep;
+   struct sock *sk;
+   __be32 lport;
int hash;
 
-   hash = sctp_ep_hashfn(net, ntohs(laddr->v4.sin_port));
+   lport = laddr->v4.sin_port;
+   hash = sctp_ep_hashfn(net, ntohs(lport));
head = &sctp_ep_hashtable[hash];
read_lock(&head->lock);
sctp_for_each_hentry(epb, &head->chain) {
@@ -791,6 +813,15 @@ static struct sctp_endpoint *__sctp_rcv_lookup_endpoint(struct net *net,
ep = sctp_sk(net->sctp.ctl_sock)->ep;
 
 hit:
+   sk = ep->base.sk;
+   if (sk->sk_reuseport) {
+   __u32 phash = sctp_hashfn(net, lport, paddr, 0);
+
+   sk = reuseport_select_sock(sk, phash, skb,
+  sizeof(struct sctphdr));
+   if (sk)
+   ep = sctp_sk(sk)->ep;
+   }
sctp_endpoint_hold(ep);
read_unlock(&head->lock);
return ep;
@@ -829,35 +860,17 @@ static inline int sctp_hash_cmp(struct rhashtable_compare_arg *arg,
 static inline __u32 sctp_hash_obj(const void *data, u32 len, u32 seed)
 {
const struct sctp_transport *t = data;
-   const union sctp_addr *paddr = &t->ipaddr;
-   const struct net *net = sock_net(t->asoc->base.sk);
-   __be16 lport = htons(t->asoc->base.bind_addr.port);
-   __u32 addr;
-
-   if (paddr->sa.sa_family == AF_INET6)
-   addr = jhash(&paddr->v6.sin6_addr, 16, seed);
-   else
-   addr = (__force __u32)paddr->v4.sin_addr.s_addr;
 
-   return  jhash_3words(addr, ((__force __u32)paddr->v4.sin_port) << 16 |
-(__force __u32)lport, net_hash_mix(net), seed);
+   return sctp_hashfn(sock_net(t->asoc->base.sk),
+  htons(t->asoc->base.bind_addr.port),
+  &t->ipaddr, seed);
 }
 
 static inline __u32 sctp_hash_key(const void *data, u32 len, u32 seed)
 {
const struct 

[PATCH net-next 2/3] sctp: add sock_reuseport for the sock in __sctp_hash_endpoint

2018-10-20 Thread Xin Long
This is a part of sk_reuseport support for sctp. It defines a helper
sctp_bind_addrs_check() to check whether the bind_addrs in two socks
match. It adds the sock to the existing sock_reuseport if they match
completely, returns an error if they match only partly, and allocates
a new sock_reuseport if no sock matches at all.

It will not take effect until sk_reuseport support is added in
sctp_get_port_local() in the next patch.
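
The three outcomes described above can be modelled in userspace with a
short sketch. It assumes addresses are plain 32-bit values held in
arrays, not struct sctp_sockaddr_entry on RCU lists, and addrs_check()
is a hypothetical name. The control flow follows the helper added in
bind_addr.c below, including the detail that the first miss resets the
running count.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 0 when every address in a[] is found in b[] and the number
 * of matches equals nb, -EEXIST when at least one address matched
 * but the sets are not identical, and 1 when the walk hits a miss
 * before finding any common address.
 */
static int addrs_check(const uint32_t *a, size_t na,
		       const uint32_t *b, size_t nb)
{
	bool exist = false;
	size_t cnt = 0;
	size_t i, j;

	for (i = 0; i < na; i++) {
		bool found = false;

		for (j = 0; j < nb; j++) {
			if (a[i] == b[j]) {
				found = true;
				break;
			}
		}
		if (!found) {
			cnt = 0;	/* a miss voids the running count */
			break;
		}
		exist = true;
		cnt++;
	}

	return (cnt == nb) ? 0 : (exist ? -EEXIST : 1);
}
```

A caller would then join the existing group on 0, fail on a negative
value, and create a new group if every comparison returned 1.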

Signed-off-by: Xin Long 
---
 include/net/sctp/sctp.h|  2 +-
 include/net/sctp/structs.h |  2 ++
 net/core/sock_reuseport.c  |  1 +
 net/sctp/bind_addr.c   | 28 ++
 net/sctp/input.c   | 60 +++---
 net/sctp/socket.c  |  3 +--
 6 files changed, 85 insertions(+), 11 deletions(-)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 8c2caa3..b8cd58d 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -152,7 +152,7 @@ int sctp_primitive_RECONF(struct net *net, struct sctp_association *asoc,
  */
 int sctp_rcv(struct sk_buff *skb);
 void sctp_v4_err(struct sk_buff *skb, u32 info);
-void sctp_hash_endpoint(struct sctp_endpoint *);
+int sctp_hash_endpoint(struct sctp_endpoint *ep);
 void sctp_unhash_endpoint(struct sctp_endpoint *);
 struct sock *sctp_err_lookup(struct net *net, int family, struct sk_buff *,
 struct sctphdr *, struct sctp_association **,
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index a11f937..15d017f 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -1190,6 +1190,8 @@ int sctp_bind_addr_conflict(struct sctp_bind_addr *, const union sctp_addr *,
 struct sctp_sock *, struct sctp_sock *);
 int sctp_bind_addr_state(const struct sctp_bind_addr *bp,
 const union sctp_addr *addr);
+int sctp_bind_addrs_check(struct sctp_sock *sp,
+ struct sctp_sock *sp2, int cnt2);
 union sctp_addr *sctp_find_unmatch_addr(struct sctp_bind_addr  *bp,
const union sctp_addr   *addrs,
int addrcnt,
diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c
index ba5cba5..d8fe3e5 100644
--- a/net/core/sock_reuseport.c
+++ b/net/core/sock_reuseport.c
@@ -187,6 +187,7 @@ int reuseport_add_sock(struct sock *sk, struct sock *sk2, bool bind_inany)
call_rcu(_reuse->rcu, reuseport_free_rcu);
return 0;
 }
+EXPORT_SYMBOL(reuseport_add_sock);
 
 void reuseport_detach_sock(struct sock *sk)
 {
diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c
index 7df3704..78d0d93 100644
--- a/net/sctp/bind_addr.c
+++ b/net/sctp/bind_addr.c
@@ -337,6 +337,34 @@ int sctp_bind_addr_match(struct sctp_bind_addr *bp,
return match;
 }
 
+int sctp_bind_addrs_check(struct sctp_sock *sp,
+ struct sctp_sock *sp2, int cnt2)
+{
+   struct sctp_bind_addr *bp2 = &sp2->ep->base.bind_addr;
+   struct sctp_bind_addr *bp = &sp->ep->base.bind_addr;
+   struct sctp_sockaddr_entry *laddr, *laddr2;
+   bool exist = false;
+   int cnt = 0;
+
+   rcu_read_lock();
+   list_for_each_entry_rcu(laddr, &bp->address_list, list) {
+   list_for_each_entry_rcu(laddr2, &bp2->address_list, list) {
+   if (sp->pf->af->cmp_addr(&laddr->a, &laddr2->a) &&
+   laddr->valid == laddr2->valid) {
+   exist = true;
+   goto next;
+   }
+   }
+   cnt = 0;
+   break;
+next:
+   cnt++;
+   }
+   rcu_read_unlock();
+
+   return (cnt == cnt2) ? 0 : (exist ? -EEXIST : 1);
+}
+
 /* Does the address 'addr' conflict with any addresses in
  * the bp.
  */
diff --git a/net/sctp/input.c b/net/sctp/input.c
index 60ede89..6bfeb10 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -723,43 +723,87 @@ static int sctp_rcv_ootb(struct sk_buff *skb)
 }
 
 /* Insert endpoint into the hash table.  */
-static void __sctp_hash_endpoint(struct sctp_endpoint *ep)
+static int __sctp_hash_endpoint(struct sctp_endpoint *ep)
 {
-   struct net *net = sock_net(ep->base.sk);
-   struct sctp_ep_common *epb;
+   struct sock *sk = ep->base.sk;
+   struct net *net = sock_net(sk);
struct sctp_hashbucket *head;
+   struct sctp_ep_common *epb;
 
epb = &ep->base;
-
epb->hashent = sctp_ep_hashfn(net, epb->bind_addr.port);
head = &sctp_ep_hashtable[epb->hashent];
 
+   if (sk->sk_reuseport) {
+   bool any = sctp_is_ep_boundall(sk);
+   struct sctp_ep_common *epb2;
+   struct list_head *list;
+   int cnt = 0, err = 1;
+
+   list_for_each(list, &ep->base.bind_addr.address_list)
+   cnt++;
+
+   sctp_for_each_hentry(epb2, &head->chain) {
+   struct sock *sk2 = epb2->sk;
+
+  

[PATCH net] Revert "neighbour: force neigh_invalidate when NUD_FAILED update is from admin"

2018-10-20 Thread Roopa Prabhu
From: Roopa Prabhu 

This reverts commit 8e326289e3069dfc9fa9c209924668dd031ab8ef.

This patch results in unnecessary netlink notification when one
tries to delete a neigh entry already in NUD_FAILED state. Found
this with a buggy app that tries to delete a NUD_FAILED entry
repeatedly. While the notification issue can be fixed with more
checks, adding more complexity here seems unnecessary. Also,
recent tests with other changes in the neighbour code have
shown that the INCOMPLETE and PROBE checks are good enough for
the original issue.
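
As a rough illustration of the condition that remains after the
revert, here is a small userspace sketch. needs_invalidate() is a
hypothetical helper that models only the if-condition visible in the
diff below. The NUD_* values are copied from
include/uapi/linux/neighbour.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Neighbour state bits, values as in include/uapi/linux/neighbour.h. */
#define NUD_INCOMPLETE	0x01
#define NUD_REACHABLE	0x02
#define NUD_PROBE	0x10
#define NUD_FAILED	0x20

/* Hypothetical helper: invalidate only when an entry that was being
 * resolved or probed moves to NUD_FAILED. The admin flag no longer
 * plays a role.
 */
static bool needs_invalidate(uint8_t old, uint8_t new_state)
{
	return (old & (NUD_INCOMPLETE | NUD_PROBE)) &&
	       (new_state & NUD_FAILED);
}
```

Under this model, a repeated delete of an entry that is already in
NUD_FAILED no longer satisfies the condition, so it does not trigger
another invalidate and notification.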

Signed-off-by: Roopa Prabhu 
---
Dave, sorry about the revert so late in the release. The issue
is not serious, but I think it's better to revert before
it gets into a released kernel. I am happy to fix the
notification issue, but that seems unnecessary at this point.
Thanks.

 net/core/neighbour.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 91592fc..4e07824 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1148,8 +1148,7 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new,
neigh->nud_state = new;
err = 0;
notify = old & NUD_VALID;
-   if (((old & (NUD_INCOMPLETE | NUD_PROBE)) ||
-(flags & NEIGH_UPDATE_F_ADMIN)) &&
+   if ((old & (NUD_INCOMPLETE | NUD_PROBE)) &&
(new & NUD_FAILED)) {
neigh_invalidate(neigh);
notify = 1;
-- 
2.1.4



[PATCH bpf-next 3/6] bpf, verifier: reject xadd on flow key memory

2018-10-20 Thread Daniel Borkmann
We should not enable xadd operation for flow key memory if not
needed there anyway. There is no such issue as described in the
commit f37a8cb84cce ("bpf: reject stores into ctx via st and xadd")
since there's no context rewriter for flow keys today, but it
also shouldn't become part of the user facing behavior to allow
for it. After patch:

  0: (79) r7 = *(u64 *)(r1 +144)
  1: (b7) r3 = 4096
  2: (db) lock *(u64 *)(r7 +0) += r3
  BPF_XADD stores into R7 flow_keys is not allowed

Signed-off-by: Daniel Borkmann 
---
 kernel/bpf/verifier.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 64e0981..0450ffc 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1553,6 +1553,14 @@ static bool is_pkt_reg(struct bpf_verifier_env *env, int regno)
return type_is_pkt_pointer(reg->type);
 }
 
+static bool is_flow_key_reg(struct bpf_verifier_env *env, int regno)
+{
+   const struct bpf_reg_state *reg = reg_state(env, regno);
+
+   /* Separate to is_ctx_reg() since we still want to allow BPF_ST here. */
+   return reg->type == PTR_TO_FLOW_KEYS;
+}
+
 static int check_pkt_ptr_alignment(struct bpf_verifier_env *env,
   const struct bpf_reg_state *reg,
   int off, int size, bool strict)
@@ -1961,7 +1969,8 @@ static int check_xadd(struct bpf_verifier_env *env, int insn_idx, struct bpf_ins
}
 
if (is_ctx_reg(env, insn->dst_reg) ||
-   is_pkt_reg(env, insn->dst_reg)) {
+   is_pkt_reg(env, insn->dst_reg) ||
+   is_flow_key_reg(env, insn->dst_reg)) {
verbose(env, "BPF_XADD stores into R%d %s is not allowed\n",
insn->dst_reg,
reg_type_str[reg_state(env, insn->dst_reg)->type]);
-- 
2.9.5



[PATCH bpf-next 4/6] bpf, verifier: remove unneeded flow key in check_helper_mem_access

2018-10-20 Thread Daniel Borkmann
The PTR_TO_FLOW_KEYS type is not passed into any helper as memory
today, so it can be removed from check_helper_mem_access().

Signed-off-by: Daniel Borkmann 
---
 kernel/bpf/verifier.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0450ffc..4f727c9 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2077,8 +2077,6 @@ static int check_helper_mem_access(struct bpf_verifier_env *env, int regno,
case PTR_TO_PACKET_META:
return check_packet_access(env, regno, reg->off, access_size,
   zero_size_allowed);
-   case PTR_TO_FLOW_KEYS:
-   return check_flow_keys_access(env, reg->off, access_size);
case PTR_TO_MAP_VALUE:
return check_map_access(env, regno, reg->off, access_size,
zero_size_allowed);
-- 
2.9.5



[PATCH bpf-next 2/6] bpf, verifier: fix register type dump in xadd and st

2018-10-20 Thread Daniel Borkmann
Using reg_type_str[insn->dst_reg] is incorrect since insn->dst_reg
contains the register number but not the actual register type. Add
a small reg_state() helper and use it to get to the type. Also fix
up the test_verifier test cases that have an incorrect errstr.
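
The difference between the two lookups can be shown with a small
userspace sketch. The enum, struct and table here are simplified
stand-ins for the verifier's own definitions and model only three
register types. reg_type_name() and reg_type_name_buggy() are
hypothetical names.

```c
/* Simplified stand-ins for the verifier's types. */
enum reg_type { NOT_INIT, SCALAR_VALUE, PTR_TO_CTX };

struct reg_state {
	enum reg_type type;
};

static const char * const reg_type_str[] = {
	[NOT_INIT]	= "?",
	[SCALAR_VALUE]	= "inv",
	[PTR_TO_CTX]	= "ctx",
};

/* Correct lookup: register number -> register state -> type -> string. */
static const char *reg_type_name(const struct reg_state *regs, int regno)
{
	return reg_type_str[regs[regno].type];
}

/* The bug being fixed: the register number itself indexes the table. */
static const char *reg_type_name_buggy(int regno)
{
	return reg_type_str[regno];
}
```

With R1 holding a context pointer, the correct lookup yields "ctx",
while the buggy one indexes the table with 1 and yields "inv", which is
the mismatch visible in the old test_verifier error strings.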

Fixes: 9d2be44a7f33 ("bpf: Reuse canonical string formatter for ctx errs")
Signed-off-by: Daniel Borkmann 
---
 kernel/bpf/verifier.c   | 19 +--
 tools/testing/selftests/bpf/test_verifier.c | 10 +-
 2 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 7d6d9cf..64e0981 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1528,14 +1528,19 @@ static bool __is_pointer_value(bool allow_ptr_leaks,
return reg->type != SCALAR_VALUE;
 }
 
+static struct bpf_reg_state *reg_state(struct bpf_verifier_env *env, int regno)
+{
+   return cur_regs(env) + regno;
+}
+
 static bool is_pointer_value(struct bpf_verifier_env *env, int regno)
 {
-   return __is_pointer_value(env->allow_ptr_leaks, cur_regs(env) + regno);
+   return __is_pointer_value(env->allow_ptr_leaks, reg_state(env, regno));
 }
 
 static bool is_ctx_reg(struct bpf_verifier_env *env, int regno)
 {
-   const struct bpf_reg_state *reg = cur_regs(env) + regno;
+   const struct bpf_reg_state *reg = reg_state(env, regno);
 
return reg->type == PTR_TO_CTX ||
   reg->type == PTR_TO_SOCKET;
@@ -1543,7 +1548,7 @@ static bool is_ctx_reg(struct bpf_verifier_env *env, int regno)
 
 static bool is_pkt_reg(struct bpf_verifier_env *env, int regno)
 {
-   const struct bpf_reg_state *reg = cur_regs(env) + regno;
+   const struct bpf_reg_state *reg = reg_state(env, regno);
 
return type_is_pkt_pointer(reg->type);
 }
@@ -1958,7 +1963,8 @@ static int check_xadd(struct bpf_verifier_env *env, int insn_idx, struct bpf_ins
if (is_ctx_reg(env, insn->dst_reg) ||
is_pkt_reg(env, insn->dst_reg)) {
verbose(env, "BPF_XADD stores into R%d %s is not allowed\n",
-   insn->dst_reg, reg_type_str[insn->dst_reg]);
+   insn->dst_reg,
+   reg_type_str[reg_state(env, insn->dst_reg)->type]);
return -EACCES;
}
 
@@ -1983,7 +1989,7 @@ static int check_stack_boundary(struct bpf_verifier_env *env, int regno,
int access_size, bool zero_size_allowed,
struct bpf_call_arg_meta *meta)
 {
-   struct bpf_reg_state *reg = cur_regs(env) + regno;
+   struct bpf_reg_state *reg = reg_state(env, regno);
struct bpf_func_state *state = func(env, reg);
int off, i, slot, spi;
 
@@ -5264,7 +5270,8 @@ static int do_check(struct bpf_verifier_env *env)
 
if (is_ctx_reg(env, insn->dst_reg)) {
verbose(env, "BPF_ST stores into R%d %s is not allowed\n",
-   insn->dst_reg, reg_type_str[insn->dst_reg]);
+   insn->dst_reg,
+   reg_type_str[reg_state(env, insn->dst_reg)->type]);
return -EACCES;
}
 
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index f1ae8d0..769d68a 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -3430,7 +3430,7 @@ static struct bpf_test tests[] = {
BPF_ST_MEM(BPF_DW, BPF_REG_1, offsetof(struct __sk_buff, mark), 0),
BPF_EXIT_INSN(),
},
-   .errstr = "BPF_ST stores into R1 inv is not allowed",
+   .errstr = "BPF_ST stores into R1 ctx is not allowed",
.result = REJECT,
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
},
@@ -3442,7 +3442,7 @@ static struct bpf_test tests[] = {
 BPF_REG_0, offsetof(struct __sk_buff, mark), 0),
BPF_EXIT_INSN(),
},
-   .errstr = "BPF_XADD stores into R1 inv is not allowed",
+   .errstr = "BPF_XADD stores into R1 ctx is not allowed",
.result = REJECT,
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
},
@@ -5670,7 +5670,7 @@ static struct bpf_test tests[] = {
.errstr_unpriv = "R2 leaks addr into mem",
.result_unpriv = REJECT,
.result = REJECT,
-   .errstr = "BPF_XADD stores into R1 inv is not allowed",
+   .errstr = "BPF_XADD stores into R1 ctx is not allowed",
},
{
"leak pointer into ctx 2",
@@ -5685,7 +5685,7 @@ static struct bpf_test tests[] = {
.errstr_unpriv = "R10 leaks addr into mem",
.result_unpriv = REJECT,

[PATCH bpf-next 6/6] bpf, libbpf: simplify and cleanup perf ring buffer walk

2018-10-20 Thread Daniel Borkmann
Simplify bpf_perf_event_read_simple() a bit and fix up some minor
things along the way: the return code in the header is not of type
int but enum bpf_perf_event_ret instead. Once the callback indicates
that the loop walking event data should stop, that event also needs
to be consumed in data_tail since it has been processed already.

Moreover, the bpf_perf_event_print_t callback should avoid void * as
we actually get a pointer to struct perf_event_header and thus
applications can make use of container_of() to have type checks.
The walk also doesn't have to use the modulo op since the ring size
is required to be a power of two.
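
The masking and the wrap-around copy can be sketched in isolation as
follows. This is a simplified userspace model, not the libbpf code:
ring_copy() is a hypothetical helper, and the ring is a plain byte
array without the perf mmap header page.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy len bytes that start at logical offset tail out of a ring of
 * size bytes. size must be a power of two so that masking with
 * (size - 1) is equivalent to tail % size. len must not exceed size.
 */
static void ring_copy(const uint8_t *base, size_t size, uint64_t tail,
		      uint8_t *out, size_t len)
{
	size_t off = (size_t)(tail & (size - 1));
	size_t len_first = size - off;

	if (len <= len_first) {
		memcpy(out, base + off, len);	/* record does not wrap */
		return;
	}

	memcpy(out, base + off, len_first);		/* up to end of ring */
	memcpy(out + len_first, base, len - len_first);	/* wrapped part */
}
```

Because tail only ever grows, the mask is applied on each access and
the counter itself is never reduced.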

Signed-off-by: Daniel Borkmann 
---
 tools/bpf/bpftool/map_perf_ring.c   | 10 +++--
 tools/lib/bpf/libbpf.c  | 67 +
 tools/lib/bpf/libbpf.h  | 15 ---
 tools/testing/selftests/bpf/trace_helpers.c |  7 +--
 4 files changed, 47 insertions(+), 52 deletions(-)

diff --git a/tools/bpf/bpftool/map_perf_ring.c b/tools/bpf/bpftool/map_perf_ring.c
index 6d41323..bdaf406 100644
--- a/tools/bpf/bpftool/map_perf_ring.c
+++ b/tools/bpf/bpftool/map_perf_ring.c
@@ -50,15 +50,17 @@ static void int_exit(int signo)
stop = true;
 }
 
-static enum bpf_perf_event_ret print_bpf_output(void *event, void *priv)
+static enum bpf_perf_event_ret
+print_bpf_output(struct perf_event_header *event, void *private_data)
 {
-   struct event_ring_info *ring = priv;
-   struct perf_event_sample *e = event;
+   struct perf_event_sample *e = container_of(event, struct perf_event_sample,
+  header);
+   struct event_ring_info *ring = private_data;
struct {
struct perf_event_header header;
__u64 id;
__u64 lost;
-   } *lost = event;
+   } *lost = (typeof(lost))event;
 
if (json_output) {
jsonw_start_object(json_wtr);
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 0c21355..b607be7 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -2415,56 +2415,47 @@ int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
 }
 
 enum bpf_perf_event_ret
-bpf_perf_event_read_simple(void *mem, unsigned long size,
-  unsigned long page_size, void **buf, size_t *buf_len,
-  bpf_perf_event_print_t fn, void *priv)
+bpf_perf_event_read_simple(void *mmap_mem, size_t mmap_size, size_t page_size,
+  void **copy_mem, size_t *copy_size,
+  bpf_perf_event_print_t fn, void *private_data)
 {
-   struct perf_event_mmap_page *header = mem;
+   struct perf_event_mmap_page *header = mmap_mem;
__u64 data_head = ring_buffer_read_head(header);
__u64 data_tail = header->data_tail;
-   int ret = LIBBPF_PERF_EVENT_ERROR;
-   void *base, *begin, *end;
-
-   if (data_head == data_tail)
-   return LIBBPF_PERF_EVENT_CONT;
-
-   base = ((char *)header) + page_size;
-
-   begin = base + data_tail % size;
-   end = base + data_head % size;
-
-   while (begin != end) {
-   struct perf_event_header *ehdr;
-
-   ehdr = begin;
-   if (begin + ehdr->size > base + size) {
-   long len = base + size - begin;
-
-   if (*buf_len < ehdr->size) {
-   free(*buf);
-   *buf = malloc(ehdr->size);
-   if (!*buf) {
+   void *base = ((__u8 *)header) + page_size;
+   int ret = LIBBPF_PERF_EVENT_CONT;
+   struct perf_event_header *ehdr;
+   size_t ehdr_size;
+
+   while (data_head != data_tail) {
+   ehdr = base + (data_tail & (mmap_size - 1));
+   ehdr_size = ehdr->size;
+
+   if (((void *)ehdr) + ehdr_size > base + mmap_size) {
+   void *copy_start = ehdr;
+   size_t len_first = base + mmap_size - copy_start;
+   size_t len_secnd = ehdr_size - len_first;
+
+   if (*copy_size < ehdr_size) {
+   free(*copy_mem);
+   *copy_mem = malloc(ehdr_size);
+   if (!*copy_mem) {
+   *copy_size = 0;
ret = LIBBPF_PERF_EVENT_ERROR;
break;
}
-   *buf_len = ehdr->size;
+   *copy_size = ehdr_size;
}
 
-   memcpy(*buf, begin, len);
-   memcpy(*buf + len, base, ehdr->size - len);
-   ehdr = (void *)*buf;
-   begin = base + ehdr->size - len;
-   } else if (begin + ehdr->size == base + size) {
-   

[PATCH bpf-next 0/6] Misc improvements and few minor fixes

2018-10-20 Thread Daniel Borkmann
Last batch of misc patches I had in queue: first one removes some left-over
bits from ULP, second is a fix in the verifier where we wrongly use register
number as type to fetch the string for the dump, third disables xadd on flow
keys and subsequent one removes the flow key type from check_helper_mem_access()
as they cannot be passed into any helper as of today. Next one lets map push,
pop, peek avoid having to go through retpoline, and last one has a couple of
minor fixes and cleanups for the ring buffer walk.

Thanks!

Daniel Borkmann (6):
  ulp: remove uid and user_visible members
  bpf, verifier: fix register type dump in xadd and st
  bpf, verifier: reject xadd on flow key memory
  bpf, verifier: remove unneeded flow key in check_helper_mem_access
  bpf, verifier: avoid retpoline for map push/pop/peek operation
  bpf, libbpf: simplify and cleanup perf ring buffer walk

 include/net/tcp.h   |  7 ---
 kernel/bpf/verifier.c   | 57 +++-
 net/tls/tls_main.c  |  2 -
 tools/bpf/bpftool/map_perf_ring.c   | 10 +++--
 tools/lib/bpf/libbpf.c  | 67 +
 tools/lib/bpf/libbpf.h  | 15 ---
 tools/testing/selftests/bpf/test_verifier.c | 10 ++---
 tools/testing/selftests/bpf/trace_helpers.c |  7 +--
 8 files changed, 99 insertions(+), 76 deletions(-)

-- 
2.9.5



[PATCH bpf-next 1/6] ulp: remove uid and user_visible members

2018-10-20 Thread Daniel Borkmann
They are not used anymore and therefore should be removed.

Signed-off-by: Daniel Borkmann 
---
 include/net/tcp.h  | 7 ---
 net/tls/tls_main.c | 2 --
 2 files changed, 9 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 14fdd7c..8a61c3e 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2051,11 +2051,6 @@ enum hrtimer_restart tcp_pace_kick(struct hrtimer *timer);
 #define TCP_ULP_MAX128
 #define TCP_ULP_BUF_MAX(TCP_ULP_NAME_MAX*TCP_ULP_MAX)
 
-enum {
-   TCP_ULP_TLS,
-   TCP_ULP_BPF,
-};
-
 struct tcp_ulp_ops {
struct list_headlist;
 
@@ -2064,9 +2059,7 @@ struct tcp_ulp_ops {
/* cleanup ulp */
void (*release)(struct sock *sk);
 
-   int uid;
charname[TCP_ULP_NAME_MAX];
-   booluser_visible;
struct module   *owner;
 };
 int tcp_register_ulp(struct tcp_ulp_ops *type);
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index e90b6d5..311cec8 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -715,8 +715,6 @@ EXPORT_SYMBOL(tls_unregister_device);
 
 static struct tcp_ulp_ops tcp_tls_ulp_ops __read_mostly = {
.name   = "tls",
-   .uid= TCP_ULP_TLS,
-   .user_visible   = true,
.owner  = THIS_MODULE,
.init   = tls_init,
 };
-- 
2.9.5



[PATCH bpf-next 5/6] bpf, verifier: avoid retpoline for map push/pop/peek operation

2018-10-20 Thread Daniel Borkmann
Extend prior work from 09772d92cd5a ("bpf: avoid retpoline for
lookup/update/delete calls on maps") to also apply to the recently
added map helpers that perform push/pop/peek operations so that
the indirect call can be avoided.

Signed-off-by: Daniel Borkmann 
---
 kernel/bpf/verifier.c | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 4f727c9..98fa0be 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6178,7 +6178,10 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
if (prog->jit_requested && BITS_PER_LONG == 64 &&
(insn->imm == BPF_FUNC_map_lookup_elem ||
 insn->imm == BPF_FUNC_map_update_elem ||
-insn->imm == BPF_FUNC_map_delete_elem)) {
+insn->imm == BPF_FUNC_map_delete_elem ||
+insn->imm == BPF_FUNC_map_push_elem   ||
+insn->imm == BPF_FUNC_map_pop_elem||
+insn->imm == BPF_FUNC_map_peek_elem)) {
aux = &env->insn_aux_data[i + delta];
if (bpf_map_ptr_poisoned(aux))
goto patch_call_imm;
@@ -6211,6 +6214,14 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
BUILD_BUG_ON(!__same_type(ops->map_update_elem,
 (int (*)(struct bpf_map *map, void *key, void *value,
  u64 flags))NULL));
+   BUILD_BUG_ON(!__same_type(ops->map_push_elem,
+(int (*)(struct bpf_map *map, void *value,
+ u64 flags))NULL));
+   BUILD_BUG_ON(!__same_type(ops->map_pop_elem,
+(int (*)(struct bpf_map *map, void *value))NULL));
+   BUILD_BUG_ON(!__same_type(ops->map_peek_elem,
+(int (*)(struct bpf_map *map, void *value))NULL));
+
switch (insn->imm) {
case BPF_FUNC_map_lookup_elem:
insn->imm = BPF_CAST_CALL(ops->map_lookup_elem) -
@@ -6224,6 +6235,18 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
insn->imm = BPF_CAST_CALL(ops->map_delete_elem) -
__bpf_call_base;
continue;
+   case BPF_FUNC_map_push_elem:
+   insn->imm = BPF_CAST_CALL(ops->map_push_elem) -
+   __bpf_call_base;
+   continue;
+   case BPF_FUNC_map_pop_elem:
+   insn->imm = BPF_CAST_CALL(ops->map_pop_elem) -
+   __bpf_call_base;
+   continue;
+   case BPF_FUNC_map_peek_elem:
+   insn->imm = BPF_CAST_CALL(ops->map_peek_elem) -
+   __bpf_call_base;
+   continue;
}
 
goto patch_call_imm;
-- 
2.9.5



Re: [PATCH net] net/ipv6: Fix index counter for unicast addresses in in6_dump_addrs

2018-10-20 Thread David Miller
From: David Ahern 
Date: Fri, 19 Oct 2018 10:00:19 -0700

> From: David Ahern 
> 
> The loop wants to skip previously dumped addresses, so loops until
> current index >= saved index. If the message fills it wants to save
> the index for the next address to dump - ie., the one that did not
> fit in the current message.
> 
> Currently, it is incrementing the index counter before comparing to the
> saved index, and then the saved index is off by 1 - it assumes the
> current address is going to fit in the message.
> 
> Change the index handling to increment only after a successful dump.
> 
> Fixes: 502a2ffd7376a ("ipv6: convert idev_list to list macros")
> Signed-off-by: David Ahern 

Applied and queued up for -stable, thanks David.
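
The index handling described in the commit message can be sketched in user
space. This is a minimal model, not the kernel's in6_dump_addrs():
dump_addrs(), struct dump_state, and the fixed-capacity output array that
stands in for a netlink message are all hypothetical names chosen for
illustration.

```c
#include <stddef.h>

/* Hypothetical user-space model of a resumable dump; not kernel code. */
struct dump_state {
	int s_idx;	/* index of the next entry to dump, saved across calls */
};

/*
 * Copy entries into out[] (room for at most 'cap' entries per call),
 * skipping entries already emitted by earlier calls.
 * Returns the number of entries written in this call.
 */
size_t dump_addrs(const int *addrs, size_t n, int *out, size_t cap,
		  struct dump_state *st)
{
	size_t written = 0;
	int idx = 0;

	for (size_t i = 0; i < n; i++) {
		if (idx < st->s_idx)
			goto next;	/* dumped in a previous call */
		if (written == cap)
			break;		/* full: idx names the entry that did not fit */
		out[written++] = addrs[i];
next:
		idx++;			/* advance only after a skip or a successful dump */
	}
	st->s_idx = idx;
	return written;
}
```

With five entries and room for two per call, three successive calls emit
entries 0-1, 2-3, and 4, and the saved index after each call points at the
first entry that has not been emitted yet.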


[PATCH bpf-next 3/3] tools: bpftool: fix completion for "bpftool map update"

2018-10-20 Thread Quentin Monnet
When trying to complete "bpftool map update" commands, the call to
printf would print an error message that would show on the command line
if no map is found to complete the command line.

Fix it by making sure we have map ids to complete the line with, before
we try to print something.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 tools/bpf/bpftool/bash-completion/bpftool | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/bash-completion/bpftool 
b/tools/bpf/bpftool/bash-completion/bpftool
index c56545e87b0d..3f78e6404589 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -143,7 +143,7 @@ _bpftool_map_update_map_type()
 local type
 type=$(bpftool -jp map show $keyword $ref | \
 command sed -n 's/.*"type": "\(.*\)",$/\1/p')
-printf $type
+[[ -n $type ]] && printf $type
 }
 
 _bpftool_map_update_get_id()
-- 
2.7.4



[PATCH bpf-next 2/3] tools: bpftool: print nb of cmds to stdout (not stderr) for batch mode

2018-10-20 Thread Quentin Monnet
When batch mode is used and all commands succeed, bpftool prints the
number of commands processed to stderr. There is no particular reason to
use stderr for this; we could just as well use stdout. It would avoid
getting unnecessary output on stderr if the standard output is redirected,
for example.

Reported-by: David Beckett 
Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 tools/bpf/bpftool/main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
index 828dde30e9ec..75a3296dc0bc 100644
--- a/tools/bpf/bpftool/main.c
+++ b/tools/bpf/bpftool/main.c
@@ -321,7 +321,8 @@ static int do_batch(int argc, char **argv)
p_err("reading batch file failed: %s", strerror(errno));
err = -1;
} else {
-   p_info("processed %d commands", lines);
+   if (!json_output)
+   printf("processed %d commands\n", lines);
err = 0;
}
 err_close:
-- 
2.7.4
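
A small user-space sketch of the behaviour the hunk introduces. It assumes
the purpose of the json_output check is to keep the plain-text summary out
of JSON output, which the commit message does not state explicitly.
report_batch() is a hypothetical helper, not a bpftool function.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Write the batch summary to 'out' unless JSON output was requested.
 * Returns the number of characters written (0 when suppressed).
 */
int report_batch(FILE *out, bool json_output, int lines)
{
	if (json_output)
		return 0;
	return fprintf(out, "processed %d commands\n", lines);
}
```

Passing stdout as the stream matches the patch; passing stderr would
reproduce the behaviour before the change.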



[PATCH bpf-next 1/3] tools: bpftool: document restriction on '.' in names to pin in bpffs

2018-10-20 Thread Quentin Monnet
Names used to pin eBPF programs and maps under the eBPF virtual file
system cannot contain a dot character, which is reserved for future
extensions of this file system.

Document this in bpftool man pages to avoid users getting confused if
pinning fails because of a dot.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 tools/bpf/bpftool/Documentation/bpftool-map.rst  | 4 +++-
 tools/bpf/bpftool/Documentation/bpftool-prog.rst | 8 ++--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst 
b/tools/bpf/bpftool/Documentation/bpftool-map.rst
index 3497f2d80328..f55a2daed59b 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
@@ -86,7 +86,9 @@ DESCRIPTION
**bpftool map pin** *MAP*  *FILE*
  Pin map *MAP* as *FILE*.
 
- Note: *FILE* must be located in *bpffs* mount.
+ Note: *FILE* must be located in *bpffs* mount. It must not
+ contain a dot character ('.'), which is reserved for future
+ extensions of *bpffs*.
 
**bpftool** **map event_pipe** *MAP* [**cpu** *N* **index** *M*]
  Read events from a BPF_MAP_TYPE_PERF_EVENT_ARRAY map.
diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst 
b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
index 12c803003ab2..ac4e904b10fb 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
@@ -75,7 +75,9 @@ DESCRIPTION
**bpftool prog pin** *PROG* *FILE*
  Pin program *PROG* as *FILE*.
 
- Note: *FILE* must be located in *bpffs* mount.
+ Note: *FILE* must be located in *bpffs* mount. It must not
+ contain a dot character ('.'), which is reserved for future
+ extensions of *bpffs*.
 
**bpftool prog load** *OBJ* *FILE* [**type** *TYPE*] [**map** {**idx** 
*IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
  Load bpf program from binary *OBJ* and pin as *FILE*.
@@ -91,7 +93,9 @@ DESCRIPTION
  If **dev** *NAME* is specified program will be loaded onto
  given networking device (offload).
 
- Note: *FILE* must be located in *bpffs* mount.
+ Note: *FILE* must be located in *bpffs* mount. It must not
+ contain a dot character ('.'), which is reserved for future
+ extensions of *bpffs*.
 
 **bpftool prog attach** *PROG* *ATTACH_TYPE* *MAP*
   Attach bpf program *PROG* (with type specified by 
*ATTACH_TYPE*)
-- 
2.7.4



[PATCH bpf-next 0/3] tools: bpftool: bring minor fixes to bpftool

2018-10-20 Thread Quentin Monnet
Hi,
These are three minor fixes for bpftool, its documentation and its bash
completion function. Please refer to individual patches for details.

Quentin Monnet (3):
  tools: bpftool: document restriction on '.' in names to pin in bpffs
  tools: bpftool: print nb of cmds to stdout (not stderr) for batch mode
  tools: bpftool: fix completion for "bpftool map update"

 tools/bpf/bpftool/Documentation/bpftool-map.rst  | 4 +++-
 tools/bpf/bpftool/Documentation/bpftool-prog.rst | 8 ++--
 tools/bpf/bpftool/bash-completion/bpftool| 2 +-
 tools/bpf/bpftool/main.c | 3 ++-
 4 files changed, 12 insertions(+), 5 deletions(-)

-- 
2.7.4



[PATCH bpf-next] selftests/bpf: enable (uncomment) all tests in test_libbpf.sh

2018-10-20 Thread Quentin Monnet
libbpf is now able to load test_l4lb_noinline.o and
samples/bpf/tracex3_kern.o successfully, so we can uncomment the related
tests in test_libbpf.sh and remove the associated "TODO"s.

It is also trivial to fix test_xdp_meta.o so that it provides a
version and can be loaded. Fix it and uncomment this test as well.

For the record, the error message obtained with tracex3_kern.o was
fixed by commit e3d91b0ca523 ("tools/libbpf: handle issues with bpf ELF
objects containing .eh_frames")

I have not been able to reproduce the "libbpf: incorrect bpf_call
opcode" error for test_l4lb_noinline.o, even with the version of libbpf
present at the time when test_libbpf.sh and test_libbpf_open.c were
created.

Cc: Jesper Dangaard Brouer 
Signed-off-by: Quentin Monnet 
---
 tools/testing/selftests/bpf/test_libbpf.sh  | 12 +++-
 tools/testing/selftests/bpf/test_xdp_meta.c |  2 ++
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_libbpf.sh 
b/tools/testing/selftests/bpf/test_libbpf.sh
index 156d89f1edcc..a426f28163a5 100755
--- a/tools/testing/selftests/bpf/test_libbpf.sh
+++ b/tools/testing/selftests/bpf/test_libbpf.sh
@@ -33,17 +33,11 @@ trap exit_handler 0 2 3 6 9
 
 libbpf_open_file test_l4lb.o
 
-# TODO: fix libbpf to load noinline functions
-# [warning] libbpf: incorrect bpf_call opcode
-#libbpf_open_file test_l4lb_noinline.o
+libbpf_open_file test_l4lb_noinline.o
 
-# TODO: fix test_xdp_meta.c to load with libbpf
-# [warning] libbpf: test_xdp_meta.o doesn't provide kernel version
-#libbpf_open_file test_xdp_meta.o
+libbpf_open_file test_xdp_meta.o
 
-# TODO: fix libbpf to handle .eh_frame
-# [warning] libbpf: relocation failed: no section(10)
-#libbpf_open_file ../../../../samples/bpf/tracex3_kern.o
+libbpf_open_file ../../../../samples/bpf/tracex3_kern.o
 
 # Success
 exit 0
diff --git a/tools/testing/selftests/bpf/test_xdp_meta.c 
b/tools/testing/selftests/bpf/test_xdp_meta.c
index 8d0182650653..2f42de66e2bb 100644
--- a/tools/testing/selftests/bpf/test_xdp_meta.c
+++ b/tools/testing/selftests/bpf/test_xdp_meta.c
@@ -8,6 +8,8 @@
 #define round_up(x, y) ((((x) - 1) | __round_mask(x, y)) + 1)
 #define ctx_ptr(ctx, mem) (void *)(unsigned long)ctx->mem
 
+int _version SEC("version") = 1;
+
 SEC("t")
 int ing_cls(struct __sk_buff *ctx)
 {
-- 
2.7.4



[PATCH bpf-next] selftests/bpf: fix return value comparison for tests in test_libbpf.sh

2018-10-20 Thread Quentin Monnet
The return value for each test in test_libbpf.sh is compared with

if (( $? == 0 )) ; then ...

This works well with bash, but not with dash, which /bin/sh is aliased to
on some systems (such as Ubuntu).

Let's replace this comparison by something that works on both shells.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 tools/testing/selftests/bpf/test_libbpf.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/test_libbpf.sh 
b/tools/testing/selftests/bpf/test_libbpf.sh
index d97dc914cd49..156d89f1edcc 100755
--- a/tools/testing/selftests/bpf/test_libbpf.sh
+++ b/tools/testing/selftests/bpf/test_libbpf.sh
@@ -6,7 +6,7 @@ export TESTNAME=test_libbpf
 # Determine selftest success via shell exit code
 exit_handler()
 {
-   if (( $? == 0 )); then
+   if [ $? -eq 0 ]; then
echo "selftests: $TESTNAME [PASS]";
else
echo "$TESTNAME: failed at file $LAST_LOADED" 1>&2
-- 
2.7.4



Re: [PATCH net v2] net/sched: act_gact: properly init 'goto chain'

2018-10-20 Thread Davide Caratti
hello Cong and Jamal,

On Fri, 2018-10-19 at 13:40 -0700, Cong Wang wrote:
> On Thu, Oct 18, 2018 at 8:30 AM Davide Caratti  wrote:
> > The alternative is, we systematically forbid usage of 'goto chain' in
> > tcfg_paction, so that:
> > 
> > # tc f a dev v0 egress matchall action  random determ goto chain 4 5
> > 
> > is systematically rejected with -EINVAL. This command never worked, so we
> > are not breaking anything in userspace.

> This is exactly why I asked you if we really need to support it. :)
> 
> If no one finds it useful, disallowing it is a good solution here, as
> we don't need
> to introduce any additional code to handle filter chains.

On Thu, 2018-10-18 at 08:52 -0400, Jamal Hadi Salim wrote:

> Rejection is a good solution[1].
> Would be helpful to set an ext_ack to something like
> "only one goto chain is supported currently"

OK to forbid 'goto chain' on fallback actions for gact and police: I just
sent out a small series for that, feedback is welcome.

@David: this patch is no longer needed, it can be dropped from patchwork.

thanks!
regards,
-- 
davide




[PATCH net 1/4] net/sched: act_gact: disallow 'goto chain' on fallback control action

2018-10-20 Thread Davide Caratti
in the following command:

 # tc action add action <c1> random <rand_type> <c2> <rand_param>

'goto chain x' is allowed only for c1: setting it for c2 makes the kernel
crash with NULL pointer dereference, since TC core doesn't initialize the
chain handle.

Signed-off-by: Davide Caratti 
---
 net/sched/act_gact.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
index cd1d9bd32ef9..505138047e5c 100644
--- a/net/sched/act_gact.c
+++ b/net/sched/act_gact.c
@@ -88,6 +88,11 @@ static int tcf_gact_init(struct net *net, struct nlattr *nla,
p_parm = nla_data(tb[TCA_GACT_PROB]);
if (p_parm->ptype >= MAX_RAND)
return -EINVAL;
+   if (TC_ACT_EXT_CMP(p_parm->paction, TC_ACT_GOTO_CHAIN)) {
+   NL_SET_ERR_MSG(extack,
+  "goto chain not allowed on fallback");
+   return -EINVAL;
+   }
}
 #endif
 
-- 
2.17.1
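
The check added above compares only the opcode part of the fallback control
action. A user-space sketch of that comparison follows. The ACT_* macros are
local stand-ins written on the assumption that they mirror the TC_ACT_EXT
encoding in include/uapi/linux/pkt_cls.h (opcode in the top bits, value in
the low 28 bits); check the header of the kernel you build against.
fallback_action_rejected() is a hypothetical helper, not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: opcode above bit 28, value (e.g. chain index) below. */
#define ACT_EXT_SHIFT		28
#define ACT_EXT(local)		((uint32_t)(local) << ACT_EXT_SHIFT)
#define ACT_EXT_VAL_MASK	((1u << ACT_EXT_SHIFT) - 1)
#define ACT_EXT_OPCODE(c)	((c) & ~ACT_EXT_VAL_MASK)
#define ACT_JUMP		ACT_EXT(1)
#define ACT_GOTO_CHAIN		ACT_EXT(2)

/* True when a fallback control action carries the 'goto chain' opcode. */
bool fallback_action_rejected(uint32_t paction)
{
	return ACT_EXT_OPCODE(paction) == ACT_GOTO_CHAIN;
}
```

Under these assumptions, 'goto chain 42' (ACT_GOTO_CHAIN | 42) is rejected
whatever the chain index is, while a plain action or a jump is accepted.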



[PATCH net 4/4] tc-tests: test denial of 'goto chain' for exceed traffic in police.json

2018-10-20 Thread Davide Caratti
add test to verify if act_police forbids 'goto chain' control actions for
'exceed' traffic.

Signed-off-by: Davide Caratti 
---
 .../tc-testing/tc-tests/actions/police.json   | 24 +++
 1 file changed, 24 insertions(+)

diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/police.json 
b/tools/testing/selftests/tc-testing/tc-tests/actions/police.json
index 30f9b54bd666..4086a50a670e 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/actions/police.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/actions/police.json
@@ -715,5 +715,29 @@
 "teardown": [
 "$TC actions flush action police"
 ]
+},
+{
+"id": "b48b",
+"name": "Add police action with exceed goto chain control action",
+"category": [
+"actions",
+"police"
+],
+"setup": [
+[
+"$TC actions flush action police",
+0,
+1,
+255
+]
+],
+"cmdUnderTest": "$TC actions add action police rate 1mbit burst 1k conform-exceed pass / goto chain 42",
+"expExitCode": "255",
+"verifyCmd": "$TC actions ls action police",
+"matchPattern": "action order [0-9]*:  police 0x1 rate 1Mbit burst 1Kb mtu 2Kb action pass/goto chain 42",
+"matchCount": "0",
+"teardown": [
+"$TC actions flush action police"
+]
 }
 ]
-- 
2.17.1



[PATCH net 2/4] net/sched: act_police: disallow 'goto chain' on fallback control action

2018-10-20 Thread Davide Caratti
in the following command:

 # tc action add action police rate <r> burst <b> conform-exceed <c1>/<c2>

'goto chain x' is allowed only for c1: setting it for c2 makes the kernel
crash with NULL pointer dereference, since TC core doesn't initialize the
chain handle.

Signed-off-by: Davide Caratti 
---
 net/sched/act_police.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 5d8bfa878477..3b793393efd1 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -150,6 +150,16 @@ static int tcf_police_init(struct net *net, struct nlattr 
*nla,
goto failure;
}
 
+   if (tb[TCA_POLICE_RESULT]) {
+   police->tcfp_result = nla_get_u32(tb[TCA_POLICE_RESULT]);
+   if (TC_ACT_EXT_CMP(police->tcfp_result, TC_ACT_GOTO_CHAIN)) {
+   NL_SET_ERR_MSG(extack,
+  "goto chain not allowed on fallback");
+   err = -EINVAL;
+   goto failure;
+   }
+   }
+
spin_lock_bh(&police->tcf_lock);
/* No failure allowed after this point */
police->tcfp_mtu = parm->mtu;
@@ -173,8 +183,6 @@ static int tcf_police_init(struct net *net, struct nlattr 
*nla,
police->peak_present = false;
}
 
-   if (tb[TCA_POLICE_RESULT])
-   police->tcfp_result = nla_get_u32(tb[TCA_POLICE_RESULT]);
police->tcfp_burst = PSCHED_TICKS2NS(parm->burst);
police->tcfp_toks = police->tcfp_burst;
if (police->peak_present) {
-- 
2.17.1



[PATCH net 3/4] tc-tests: test denial of 'goto chain' on 'random' traffic in gact.json

2018-10-20 Thread Davide Caratti
add test to verify if act_gact forbids 'goto chain' control actions on
'random' traffic in gact.json.

Signed-off-by: Davide Caratti 
---
 .../tc-testing/tc-tests/actions/gact.json | 24 +++
 1 file changed, 24 insertions(+)

diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/gact.json 
b/tools/testing/selftests/tc-testing/tc-tests/actions/gact.json
index 68c91023cdb9..89189a03ce3d 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/actions/gact.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/actions/gact.json
@@ -536,5 +536,29 @@
 "matchPattern": "^[ \t]+index [0-9]+ ref",
 "matchCount": "0",
 "teardown": []
+},
+{
+"id": "8e47",
+"name": "Add gact action with random determ goto chain control action",
+"category": [
+"actions",
+"gact"
+],
+"setup": [
+[
+"$TC actions flush action gact",
+0,
+1,
+255
+]
+],
+"cmdUnderTest": "$TC actions add action pass random determ goto chain 1 2 index 90",
+"expExitCode": "255",
+"verifyCmd": "$TC actions list action gact",
+"matchPattern": "action order [0-9]*: gact action pass random type determ goto chain 1 val 2.*index 90 ref",
+"matchCount": "0",
+"teardown": [
+"$TC actions flush action gact"
+]
 }
 ]
-- 
2.17.1



[PATCH net 0/4] net/sched: forbid 'goto_chain' on fallback actions

2018-10-20 Thread Davide Caratti
the following command:

 # tc actions add action police rate 1mbit burst 1k conform-exceed \
 > pass / goto chain 42

generates a NULL pointer dereference when packets exceed the configured
rate. Similarly, the following command:

 # tc actions add action pass random determ goto chain 42 2

makes the kernel crash with NULL dereference when the first packet does
not match the 'pass' action.

gact and police allow users to specify a fallback control action, which is
stored in the action private data. 'goto chain x' never worked for these
cases, since a->goto_chain handle was never initialized. There is only one
goto_chain handle per TC action, and it is designed to be non-NULL only if
tcf_action contains a 'goto chain' command. So, let's forbid 'goto chain'
on fallback actions.

Patch 1/4 and 2/4 change the .init() functions of police and gact, to let
them return an error when users try to set 'goto chain x' in the fallback
action. Patch 3/4 and 4/4 add TDC selftest coverage to this new behavior. 

Davide Caratti (4):
  net/sched: act_gact: disallow 'goto chain' on fallback control action
  net/sched: act_police: disallow 'goto chain' on fallback control
action
  tc-tests: test denial of 'goto chain' on 'random' traffic in gact.json
  tc-tests: test denial of 'goto chain' for exceed traffic in
police.json

 net/sched/act_gact.c  |  5 
 net/sched/act_police.c| 12 --
 .../tc-testing/tc-tests/actions/gact.json | 24 +++
 .../tc-testing/tc-tests/actions/police.json   | 24 +++
 4 files changed, 63 insertions(+), 2 deletions(-)

-- 
2.17.1



[PATCH] net: ethernet:fec: Consistently use SPEED_ prefix

2018-10-20 Thread Andrew Lunn
All other calls to phy_set_max_speed() use the SPEED_ prefix. Make the
FEC driver follow this common pattern. This makes no difference to the
generated code since SPEED_1000 is 1000, and SPEED_100 is 100.

Reported-by: Corentin Labbe 
Signed-off-by: Andrew Lunn 
---
 drivers/net/ethernet/freescale/fec_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c 
b/drivers/net/ethernet/freescale/fec_main.c
index 6db69ba30dcd..b067eaf8b792 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -1948,7 +1948,7 @@ static int fec_enet_mii_probe(struct net_device *ndev)
 
/* mask with MAC supported features */
if (fep->quirks & FEC_QUIRK_HAS_GBIT) {
-   phy_set_max_speed(phy_dev, 1000);
+   phy_set_max_speed(phy_dev, SPEED_1000);
phy_remove_link_mode(phy_dev,
 ETHTOOL_LINK_MODE_1000baseT_Half_BIT);
 #if !defined(CONFIG_M5272)
@@ -1956,7 +1956,7 @@ static int fec_enet_mii_probe(struct net_device *ndev)
 #endif
}
else
-   phy_set_max_speed(phy_dev, 100);
+   phy_set_max_speed(phy_dev, SPEED_100);
 
fep->link = 0;
fep->full_duplex = 0;
-- 
2.19.0



[PATCH net-next] net: phy: phy_support_sym_pause: Clear Asym Pause

2018-10-20 Thread Andrew Lunn
When indicating the MAC supports Symmetric Pause, clear the Asymmetric
Pause bit, which could have already been set if the PHY supports it.

Reported-by: Labbe Corentin 
Fixes: c306ad36184f ("net: ethernet: Add helper for MACs which support pause")
Signed-off-by: Andrew Lunn 
---
 drivers/net/phy/phy_device.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 43cb08dcce81..ab33d1777132 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1940,6 +1940,7 @@ EXPORT_SYMBOL(phy_remove_link_mode);
  */
 void phy_support_sym_pause(struct phy_device *phydev)
 {
+   phydev->supported &= ~SUPPORTED_Asym_Pause;
phydev->supported |= SUPPORTED_Pause;
phydev->advertising = phydev->supported;
 }
-- 
2.19.0
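
The effect of the added line can be shown with plain bit operations. This is
a user-space sketch: SUP_PAUSE and SUP_ASYM_PAUSE are hypothetical bit
values chosen for illustration, not the kernel's SUPPORTED_* constants, and
support_sym_pause() only models the update of the 'supported' mask.

```c
#include <stdint.h>

/* Hypothetical bit positions; the kernel's SUPPORTED_* values differ. */
#define SUP_PAUSE	(1u << 0)
#define SUP_ASYM_PAUSE	(1u << 1)

/* Return the mask for a MAC that supports symmetric pause only. */
uint32_t support_sym_pause(uint32_t supported)
{
	supported &= ~SUP_ASYM_PAUSE;	/* drop a bit inherited from the PHY */
	supported |= SUP_PAUSE;
	return supported;
}
```

Without the first statement, a mask that already had the asymmetric bit set
would keep it, and both pause modes would end up advertised.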



Re: [PATCH] net/mlx5: allocate enough space in

2018-10-20 Thread Or Gerlitz
On Fri, Oct 19, 2018 at 11:08 PM Dan Carpenter  wrote:
>
> FDB_MAX_CHAIN is 3.  We wanted to allocate enough memory to hold four
> structs but there are missing parentheses so we only allocate enough
> memory for three structs and the first byte of the fourth one.

yeah, it seems we were wrong here and the fix is correct. At some point I
saw KASAN complaints but they went away later; let me look. Thanks for
pointing it out.



> Fixes: 328edb499f99 ("net/mlx5: Split FDB fast path prio to multiple 
> namespaces")
> Signed-off-by: Dan Carpenter 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
> index 67ba4c975d81..9d73eb955f75 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
> @@ -2470,7 +2470,7 @@ static int init_fdb_root_ns(struct mlx5_flow_steering 
> *steering)
> return -ENOMEM;
>
> steering->fdb_sub_ns = kzalloc(sizeof(steering->fdb_sub_ns) *
> -  FDB_MAX_CHAIN + 1, GFP_KERNEL);
> +  (FDB_MAX_CHAIN + 1), GFP_KERNEL);
> if (!steering->fdb_sub_ns)
> return -ENOMEM;
>
> --
> 2.11.0
>
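
The precedence issue in the quoted patch can be reproduced with ordinary
arithmetic. In this sketch MAX_CHAIN stands in for FDB_MAX_CHAIN (3,
according to the commit message), and buggy_alloc_size() and
fixed_alloc_size() are hypothetical helpers that only compute the byte
count that would be passed to the allocator.

```c
#include <stddef.h>

#define MAX_CHAIN 3	/* stands in for FDB_MAX_CHAIN */

/* Without parentheses: (elem_size * MAX_CHAIN) + 1 bytes. */
size_t buggy_alloc_size(size_t elem_size)
{
	return elem_size * MAX_CHAIN + 1;
}

/* With parentheses: elem_size * (MAX_CHAIN + 1) bytes. */
size_t fixed_alloc_size(size_t elem_size)
{
	return elem_size * (MAX_CHAIN + 1);
}
```

For an 8-byte element the first form yields 25 bytes, enough for three
elements plus one byte, while the second yields the intended 32 bytes for
four elements.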


Re: [bpf-next v2 0/3] sockmap, bpf_msg_push_data helper

2018-10-20 Thread Daniel Borkmann
On 10/20/2018 04:56 AM, John Fastabend wrote:
> This series adds a new helper bpf_msg_push_data to be used by
> sk_msg programs. The helper can be used to insert extra bytes into
> the message that can then be used by the program as metadata tags
> among other things.
> 
> The first patch adds the helper, second patch the libbpf support,
> and last patch updates test_sockmap to run msg_push_data tests.
> 
> v2: rebase after queue map and in filter.c convert int -> u32
> 
> John Fastabend (3):
>   bpf: sk_msg program helper bpf_msg_push_data
>   bpf: libbpf support for msg_push_data
>   bpf: test_sockmap add options to use msg_push_data
> 
>  include/linux/skmsg.h   |   5 +
>  include/uapi/linux/bpf.h|  20 +++-
>  net/core/filter.c   | 134 
> 
>  tools/include/uapi/linux/bpf.h  |  20 +++-
>  tools/testing/selftests/bpf/bpf_helpers.h   |   2 +
>  tools/testing/selftests/bpf/test_sockmap.c  |  58 +-
>  tools/testing/selftests/bpf/test_sockmap_kern.h |  97 +
>  7 files changed, 308 insertions(+), 28 deletions(-)
> 

Applied, thanks!


Re: [PATCH net-next] r8169: add support for Byte Queue Limits

2018-10-20 Thread David Miller
From: Heiner Kallweit 
Date: Sat, 20 Oct 2018 12:25:27 +0200

> From: Florian Westphal 
> This patch is basically a resubmit of 1e918876853a ("r8169: add support
> for Byte Queue Limits") which was reverted later. The problems causing
> the revert seem to have been fixed in the meantime.
> Only change to the original patch is that the call to
> netdev_reset_queue was moved to rtl8169_tx_clear.
> 
> The Tested-by refers to a system using the RTL8168evl chip version.
> 
> Signed-off-by: Florian Westphal 
> Signed-off-by: Heiner Kallweit 
> Tested-by: Holger Hoffstätte 

Applied.

Heiner, I just want to say how happy I am with the work you've done on
the r8169 driver.  So many long term issues have been resolved, the
driver uses more and more generic facilities instead of reimplementing
things, and it's much cleaner and easier to maintain now.

Thank you!


Re: [PATCH net-next] r8169: handle all interrupt events in the hard irq handler

2018-10-20 Thread David Miller
From: Heiner Kallweit 
Date: Thu, 18 Oct 2018 22:19:28 +0200

> Having a separate "slow event" handler isn't needed because all
> interrupt events trigger asynchronous activity. And in case of SYSErr
> we have bigger problems than performance anyway.
> This patch also allows to get rid of acking interrupt events in the
> NAPI poll callback.
> 
> Signed-off-by: Heiner Kallweit 
> ---
> This patch will apply to net-next only after 6b839b6cf9ea ("r8169: fix
> NAPI handling under high load") was merged from net to net-next.

Applied.


Re: pull request: bluetooth-next 2018-10-20

2018-10-20 Thread David Miller
From: Johan Hedberg 
Date: Sat, 20 Oct 2018 12:03:42 +0300

> Here's one more bluetooth-next pull request for the 4.20 kernel.
> 
>  - Added new USB ID for QCA_ROME controller
>  - Added debug trace support from QCA wcn3990 controllers
>  - Updated L2CAP to conform to latest Errata Service Release
>  - Fix binding to non-removable BCM43430 devices
> 
> Please let me know if there are any issues pulling. Thanks.

Pulled, thanks Johan.


Re: [PATCH iproute2 1/1] DEBUG: Fix make check when need build generate_nlmsg

2018-10-20 Thread Petr Vorel
Hi Stephen,

> make check from top level Makefile defines several flags which break
> building generate_nlmsg:

> $ make check
> make -C tools
> gcc  -Wall -Wstrict-prototypes  -Wmissing-prototypes -Wmissing-declarations 
> -Wold-style-definition -Wformat=2 -O2 -I../include -I../include/uapi 
> -DRESOLVE_HOSTNAMES -DLIBDIR=\"/usr/lib\" -DCONFDIR=\"/etc/iproute2\" 
> -DNETNS_RUN_DIR=\"/var/run/netns\" -DNETNS_ETC_DIR=\"/etc/netns\" 
> -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE 
> -D_LARGEFILE64_SOURCE  -DHAVE_SETNS -DHAVE_SELINUX -DHAVE_ELF -DHAVE_LIBMNL 
> -I/usr/include/libmnl -DNEED_STRLCPY -DHAVE_LIBCAP ../lib/libutil.a 
> ../lib/libnetlink.a -lselinux -lelf -lmnl -lcap  -I../../include 
> -include../../include/uapi/linux/netlink.h -o generate_nlmsg generate_nlmsg.c 
> ../../lib/libnetlink.c -lmnl
> gcc: error: ../lib/libutil.a: No such file or directory
> gcc: error: ../lib/libnetlink.a: No such file or directory
> make[2]: *** [Makefile:5: generate_nlmsg] Error 1
> make[1]: *** [Makefile:40: generate_nlmsg] Error 2

> To fix it, reset CFLAGS in the sub Makefile and remove LDLIBS entirely (as the
> required -lmnl flag was specified in 5dc2204c ("testsuite: add libmnl")).

> Fixes: 8804a8c0 ("Makefile: Add check target")

> Signed-off-by: Petr Vorel 
> ---
> Hi Stephen,

> I'm sorry for this regression.

> Kind regards,
> Petr
> ---
>  testsuite/tools/Makefile | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

> diff --git a/testsuite/tools/Makefile b/testsuite/tools/Makefile
> index 7d53d226..e1d9bfef 100644
> --- a/testsuite/tools/Makefile
> +++ b/testsuite/tools/Makefile
> @@ -1,8 +1,9 @@
>  # SPDX-License-Identifier: GPL-2.0
> +CFLAGS=
>  include ../../config.mk

>  generate_nlmsg: generate_nlmsg.c ../../lib/libnetlink.c
> - $(CC) $(CPPFLAGS) $(CFLAGS) $(LDLIBS) $(EXTRA_CFLAGS) -I../../include 
> -include../../include/uapi/linux/netlink.h -o $@ $^ -lmnl
> + $(CC) $(CPPFLAGS) $(CFLAGS) $(EXTRA_CFLAGS) -I../../include 
> -include../../include/uapi/linux/netlink.h -o $@ $^ -lmnl

>  clean:
>   rm -f generate_nlmsg

ping, please. Patch is in state "accepted" in patchwork [1], but not merged.
Subject should be "testsuite: Fix make check when need build generate_nlmsg".


Kind regards,
Petr

[1] https://patchwork.ozlabs.org/patch/974391/


[iproute2 PATCH] bridge: fix vlan show stats formatting

2018-10-20 Thread Tobias Jungel
The output of -statistics vlan show was broken by a previous change for json
output. This aligns the format to vlan show.

Signed-off-by: Tobias Jungel 
---
 bridge/vlan.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/bridge/vlan.c b/bridge/vlan.c
index bdce55ae..85f4a539 100644
--- a/bridge/vlan.c
+++ b/bridge/vlan.c
@@ -487,8 +487,7 @@ static void print_vlan_stats_attr(struct rtattr *attr, int 
ifindex)
list = brtb[LINK_XSTATS_TYPE_BRIDGE];
rem = RTA_PAYLOAD(list);

-   ifname = ll_index_to_name(ifindex);
-   open_json_object(ifname);
+   open_vlan_port(ifindex);

print_color_string(PRINT_FP, COLOR_IFNAME,
   NULL, "%-16s", ifname);
@@ -509,8 +508,7 @@ static void print_vlan_stats_attr(struct rtattr *attr, int 
ifindex)

print_one_vlan_stats(vstats);
}
-   close_json_object();
-
+   close_vlan_port();
 }

 static int print_vlan_stats(const struct sockaddr_nl *who,
--
2.17.2



Re: [PATCH net-next v2] netpoll: allow cleanup to be synchronous

2018-10-20 Thread Neil Horman
On Fri, Oct 19, 2018 at 08:46:45PM +, Banerjee, Debabrata wrote:
> > From: Neil Horman 
> 
> > I presume you've tested this with some of the stacked devices?  I think I'm
> > ok with this change, but I'd like confirmation that it worked.
> > 
> > Neil
> 
> Yes I've tested this on a bond device with vlan stacked on top.
> 
> -Deb
> 

Cool, thanks
Acked-by: Neil Horman 

> > 
> > > CC: Neil Horman 
> > > CC: "David S. Miller" 
> > > Signed-off-by: Debabrata Banerjee 
> > > ---
> > >  drivers/net/bonding/bond_main.c |  3 ++-
> > >  drivers/net/macvlan.c   |  2 +-
> > >  drivers/net/team/team.c |  5 +
> > >  include/linux/netpoll.h |  4 +---
> > >  net/8021q/vlan_dev.c|  3 +--
> > >  net/bridge/br_device.c  |  2 +-
> > >  net/core/netpoll.c  | 20 +---
> > >  net/dsa/slave.c |  2 +-
> > >  8 files changed, 13 insertions(+), 28 deletions(-)
> > >
> > > diff --git a/drivers/net/bonding/bond_main.c
> > > b/drivers/net/bonding/bond_main.c index ee28ec9e0aba..ffa37adb7681
> > > 100644
> > > --- a/drivers/net/bonding/bond_main.c
> > > +++ b/drivers/net/bonding/bond_main.c
> > > @@ -963,7 +963,8 @@ static inline void slave_disable_netpoll(struct slave
> > *slave)
> > >   return;
> > >
> > >   slave->np = NULL;
> > > - __netpoll_free_async(np);
> > > +
> > > + __netpoll_free(np);
> > >  }
> > >
> > >  static void bond_poll_controller(struct net_device *bond_dev) diff
> > > --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c index
> > > cfda146f3b3b..fc8d5f1ee1ad 100644
> > > --- a/drivers/net/macvlan.c
> > > +++ b/drivers/net/macvlan.c
> > > @@ -1077,7 +1077,7 @@ static void macvlan_dev_netpoll_cleanup(struct
> > > net_device *dev)
> > >
> > >   vlan->netpoll = NULL;
> > >
> > > - __netpoll_free_async(netpoll);
> > > + __netpoll_free(netpoll);
> > >  }
> > >  #endif   /* CONFIG_NET_POLL_CONTROLLER */
> > >
> > > diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c index
> > > d887016e54b6..db633ae9f784 100644
> > > --- a/drivers/net/team/team.c
> > > +++ b/drivers/net/team/team.c
> > > @@ -1104,10 +1104,7 @@ static void team_port_disable_netpoll(struct
> > team_port *port)
> > >   return;
> > >   port->np = NULL;
> > >
> > > - /* Wait for transmitting packets to finish before freeing. */
> > > - synchronize_rcu_bh();
> > > - __netpoll_cleanup(np);
> > > - kfree(np);
> > > + __netpoll_free(np);
> > >  }
> > >  #else
> > >  static int team_port_enable_netpoll(struct team_port *port) diff
> > > --git a/include/linux/netpoll.h b/include/linux/netpoll.h index
> > > 3ef82d3a78db..676f1ff161a9 100644
> > > --- a/include/linux/netpoll.h
> > > +++ b/include/linux/netpoll.h
> > > @@ -31,8 +31,6 @@ struct netpoll {
> > >   bool ipv6;
> > >   u16 local_port, remote_port;
> > >   u8 remote_mac[ETH_ALEN];
> > > -
> > > - struct work_struct cleanup_work;
> > >  };
> > >
> > >  struct netpoll_info {
> > > @@ -63,7 +61,7 @@ int netpoll_parse_options(struct netpoll *np, char *opt);
> > >  int __netpoll_setup(struct netpoll *np, struct net_device *ndev);
> > >  int netpoll_setup(struct netpoll *np);
> > >  void __netpoll_cleanup(struct netpoll *np);
> > > -void __netpoll_free_async(struct netpoll *np);
> > > +void __netpoll_free(struct netpoll *np);
> > >  void netpoll_cleanup(struct netpoll *np);
> > >  void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
> > >                               struct net_device *dev);
> > > diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
> > > index 546af0e73ac3..ff720f1ebf73 100644
> > > --- a/net/8021q/vlan_dev.c
> > > +++ b/net/8021q/vlan_dev.c
> > > @@ -756,8 +756,7 @@ static void vlan_dev_netpoll_cleanup(struct net_device *dev)
> > >   return;
> > >
> > >   vlan->netpoll = NULL;
> > > -
> > > - __netpoll_free_async(netpoll);
> > > + __netpoll_free(netpoll);
> > >  }
> > >  #endif /* CONFIG_NET_POLL_CONTROLLER */
> > >
> > > diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
> > > index e053a4e43758..c6abf927f0c9 100644
> > > --- a/net/bridge/br_device.c
> > > +++ b/net/bridge/br_device.c
> > > @@ -344,7 +344,7 @@ void br_netpoll_disable(struct net_bridge_port *p)
> > >
> > >   p->np = NULL;
> > >
> > > - __netpoll_free_async(np);
> > > + __netpoll_free(np);
> > >  }
> > >
> > >  #endif
> > > diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> > > index de1d1ba92f2d..6ac71624ead4 100644
> > > --- a/net/core/netpoll.c
> > > +++ b/net/core/netpoll.c
> > > @@ -591,7 +591,6 @@ int __netpoll_setup(struct netpoll *np, struct net_device *ndev)
> > >
> > >   np->dev = ndev;
> > >   strlcpy(np->dev_name, ndev->name, IFNAMSIZ);
> > > - INIT_WORK(&np->cleanup_work, netpoll_async_cleanup);
> > >
> > >   if (ndev->priv_flags & IFF_DISABLE_NETPOLL) {
> > >   np_err(np, "%s doesn't support polling, aborting\n",
> > > @@ -790,10 +789,6 @@ void __netpoll_cleanup(struct netpoll *np)
> > >  {
> > >   struct netpoll_info *npinfo;
> > >
> > > - /* 

[PATCH net-next] r8169: add support for Byte Queue Limits

2018-10-20 Thread Heiner Kallweit
From: Florian Westphal 
This patch is basically a resubmit of 1e918876853a ("r8169: add support
for Byte Queue Limits") which was reverted later. The problems causing
the revert seem to have been fixed in the meantime.
Only change to the original patch is that the call to
netdev_reset_queue was moved to rtl8169_tx_clear.

The Tested-by refers to a system using the RTL8168evl chip version.

Signed-off-by: Florian Westphal 
Signed-off-by: Heiner Kallweit 
Tested-by: Holger Hoffstätte 
---
 drivers/net/ethernet/realtek/r8169.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 114bd9e54..006b0aa8c 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -5851,6 +5851,7 @@ static void rtl8169_tx_clear(struct rtl8169_private *tp)
 {
rtl8169_tx_clear_range(tp, tp->dirty_tx, NUM_TX_DESC);
tp->cur_tx = tp->dirty_tx = 0;
+   netdev_reset_queue(tp->dev);
 }
 
 static void rtl_reset_work(struct rtl8169_private *tp)
@@ -6153,6 +6154,8 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 
txd->opts2 = cpu_to_le32(opts[1]);
 
+   netdev_sent_queue(dev, skb->len);
+
skb_tx_timestamp(skb);
 
/* Force memory writes to complete before releasing descriptor */
@@ -6251,7 +6254,7 @@ static void rtl8169_pcierr_interrupt(struct net_device *dev)
 
 static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp)
 {
-   unsigned int dirty_tx, tx_left;
+   unsigned int dirty_tx, tx_left, bytes_compl = 0, pkts_compl = 0;
 
dirty_tx = tp->dirty_tx;
smp_rmb();
@@ -6275,10 +6278,8 @@ static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp)
rtl8169_unmap_tx_skb(tp_to_dev(tp), tx_skb,
 tp->TxDescArray + entry);
if (status & LastFrag) {
-   u64_stats_update_begin(&tp->tx_stats.syncp);
-   tp->tx_stats.packets++;
-   tp->tx_stats.bytes += tx_skb->skb->len;
-   u64_stats_update_end(&tp->tx_stats.syncp);
+   pkts_compl++;
+   bytes_compl += tx_skb->skb->len;
dev_consume_skb_any(tx_skb->skb);
tx_skb->skb = NULL;
}
@@ -6287,6 +6288,13 @@ static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp)
}
 
if (tp->dirty_tx != dirty_tx) {
+   netdev_completed_queue(dev, pkts_compl, bytes_compl);
+
+   u64_stats_update_begin(&tp->tx_stats.syncp);
+   tp->tx_stats.packets += pkts_compl;
+   tp->tx_stats.bytes += bytes_compl;
+   u64_stats_update_end(&tp->tx_stats.syncp);
+
tp->dirty_tx = dirty_tx;
/* Sync with rtl8169_start_xmit:
 * - publish dirty_tx ring index (write barrier)
-- 
2.19.1
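
For readers unfamiliar with BQL, the three calls added by this patch can be pictured with a small userspace model. This is only a sketch under stated assumptions: `struct bql_model` and the `model_*` helpers are hypothetical names, not kernel API, and the limit is fixed here whereas the kernel adjusts its limit dynamically. `model_sent()` stands in for netdev_sent_queue() in rtl8169_start_xmit(), `model_completed()` for the single batched netdev_completed_queue() call in rtl_tx(), and `model_reset()` for netdev_reset_queue() in rtl8169_tx_clear().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of byte queue accounting; not the kernel's implementation. */
struct bql_model {
	size_t limit;    /* fixed in this sketch; the kernel adapts it at runtime */
	size_t inflight; /* bytes handed to the NIC and not yet completed */
	bool stopped;    /* true while the stack should hold back new packets */
};

/* Transmit path: account the bytes just queued to the hardware. */
static void model_sent(struct bql_model *q, size_t bytes)
{
	q->inflight += bytes;
	if (q->inflight > q->limit)
		q->stopped = true;
}

/* Completion path: report one batch of completed bytes per cleanup pass. */
static void model_completed(struct bql_model *q, size_t bytes)
{
	assert(bytes <= q->inflight);
	q->inflight -= bytes;
	if (q->inflight <= q->limit)
		q->stopped = false;
}

/* Ring teardown: drop all accounting so stale bytes cannot stall the queue. */
static void model_reset(struct bql_model *q)
{
	q->inflight = 0;
	q->stopped = false;
}
```

In this model, forgetting the reset after the ring is cleared would leave `inflight` nonzero with no completions left to drain it, which is the kind of stall the placement in rtl8169_tx_clear() guards against.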



Re: [PATCH net] r8169: fix NAPI handling under high load

2018-10-20 Thread Holger Hoffstätte

On 10/17/18 22:07, Holger Hoffstätte wrote:
> On 10/17/18 21:27, Heiner Kallweit wrote:
> (snip)
> > Good to know. What's your kernel version and RTL8168 chip version?
> > Regarding the chip version the dmesg line with the XID would be relevant.
>
> 4.18.15 + PDS (custom CPU scheduler) + cherry pickings from mainline.
> Applied both the original patch in this thread & bql, built fine.


Good news everyone! Been running with the new BQL patch for three days
now on 2 machines and not a single hang/reset, regardless of load.
Coupled with the original patch in this thread (already in mainline)
this looks pretty good!

So while I can only speak for myself & my hardware, here's a

Tested-by: Holger Hoffstätte 

Thanks Heiner!

-h


pull request: bluetooth-next 2018-10-20

2018-10-20 Thread Johan Hedberg
Hi Dave,

Here's one more bluetooth-next pull request for the 4.20 kernel.

 - Added new USB ID for QCA_ROME controller
 - Added debug trace support from QCA wcn3990 controllers
 - Updated L2CAP to conform to latest Errata Service Release
 - Fix binding to non-removable BCM43430 devices

Please let me know if there are any issues pulling. Thanks.

Johan

---
The following changes since commit d864991b220b7c62e81d21209e1fd978fd67352c:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2018-10-12 21:38:46 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git for-upstream

for you to fetch changes up to c614ca3f7476934de54dd731e09d094ad822696c:

  Bluetooth: hci_qca: Add support for controller debug logs. (2018-10-18 09:55:16 +0200)


Balakrishna Godavarthi (1):
  Bluetooth: hci_qca: Add support for controller debug logs.

Cho, Yu-Chen (1):
  Bluetooth: btsdio: Do not bind to non-removable BCM43430

Colin Ian King (1):
  Bluetooth: Remove redundant check on status

Mallikarjun Phulari (2):
  Bluetooth: Use separate L2CAP LE credit based connection result values
  Bluetooth: Errata Service Release 8, Erratum 3253

Owen Lin (1):
  Bluetooth: btusb: Add support for 0cf3:535b QCA_ROME device

 drivers/bluetooth/btsdio.c| 14 +-
 drivers/bluetooth/btusb.c |  1 +
 drivers/bluetooth/hci_qca.c   | 19 ++-
 include/net/bluetooth/l2cap.h | 19 +--
 net/bluetooth/hci_event.c | 38 +-
 net/bluetooth/l2cap_core.c| 36 ++--
 6 files changed, 80 insertions(+), 47 deletions(-)



Re: [PATCH net] net: fix pskb_trim_rcsum_slow() with odd trim offset

2018-10-20 Thread David Miller
From: Dimitris Michailidis 
Date: Fri, 19 Oct 2018 17:07:13 -0700

> We've been getting checksum errors involving small UDP packets, usually
> 59B packets with 1 extra non-zero padding byte. netdev_rx_csum_fault()
> has been complaining that HW is providing bad checksums. Turns out the
> problem is in pskb_trim_rcsum_slow(), introduced in commit 88078d98d1bb
> ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends").
> 
> The source of the problem is that when the bytes we are trimming start
> at an odd address, as in the case of the 1 padding byte above,
> skb_checksum() returns a byte-swapped value. We cannot just combine this
> with skb->csum using csum_sub(). We need to use csum_block_sub() here
> that takes into account the parity of the start address and handles the
> swapping.
> 
> Matches existing code in __skb_postpull_rcsum() and esp_remove_trailer().
> 
> Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
> Signed-off-by: Dimitris Michailidis 

Applied and queued up for -stable, thanks!
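
The odd-offset effect described in the commit message can be reproduced in userspace. The following is a minimal sketch, not the kernel code: `fold16`, `sum16`, `sub16`, `block_sub16` and `demo_odd_trim` are hypothetical helpers that model the one's-complement arithmetic with big-endian byte pairing. `sub16` plays the role of csum_sub() and `block_sub16` the role of csum_block_sub(), which swaps the bytes of the partial sum when the block starts at an odd offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Sum len bytes as 16-bit words; pairing always starts at buf[0]. */
static uint16_t sum16(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < len)
		sum += (uint32_t)buf[i] << 8; /* trailing byte is a high byte */
	return fold16(sum);
}

/* Plain one's-complement subtraction (models csum_sub()). */
static uint16_t sub16(uint16_t total, uint16_t part)
{
	return fold16((uint32_t)total + (uint16_t)~part);
}

/* Parity-aware subtraction (models csum_block_sub()). */
static uint16_t block_sub16(uint16_t total, uint16_t part, size_t offset)
{
	if (offset & 1)
		part = (uint16_t)((part << 8) | (part >> 8));
	return sub16(total, part);
}

/* 59 payload bytes plus 1 non-zero padding byte, as in the report. */
static int demo_odd_trim(void)
{
	uint8_t pkt[60];
	uint16_t full, want, tail;
	size_t i;

	for (i = 0; i < 59; i++)
		pkt[i] = (uint8_t)(i + 1);
	pkt[59] = 0xab;

	full = sum16(pkt, 60);     /* checksum over the untrimmed data */
	want = sum16(pkt, 59);     /* checksum the trimmed data should have */
	tail = sum16(pkt + 59, 1); /* trimmed byte, summed from its own start */

	assert(block_sub16(full, tail, 59) == want);
	return sub16(full, tail) != want; /* 1: the plain subtraction is wrong */
}
```

Summed on its own, the padding byte lands in the high half of a word, while inside the full packet it sits in the low half of the last word. The plain subtraction therefore removes the wrong quantity, and only the parity-aware variant recovers the checksum of the first 59 bytes.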


Re: [PATCH net-next] net: loopback: clear skb->tstamp before netif_rx()

2018-10-20 Thread David Miller
From: Eric Dumazet 
Date: Fri, 19 Oct 2018 19:11:26 -0700

> At least UDP / TCP stacks can now cook skbs with a tstamp using
> MONOTONIC base (or arbitrary values with SCM_TXTIME)
> 
> Since loopback driver does not call (directly or indirectly)
> skb_scrub_packet(), we need to clear skb->tstamp so that
> net_timestamp_check() can eventually resample the time,
> using ktime_get_real().
> 
> Fixes: 80b14dee2bea ("net: Add a new socket option for a future transmit time.")
> Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC")
> Signed-off-by: Eric Dumazet 

Applied, thanks Eric.