date:20160906

[PATCH v4 6/6] samples: bpf: add userspace example for attaching eBPF programs to cgroups

2016-09-06 Thread Daniel Mack

Add a simple userspace program to demonstrate the new API to attach eBPF
programs to cgroups. This is what it does:

 * Create arraymap in kernel with 4 byte keys and 8 byte values

 * Load eBPF program

   The eBPF program accesses the map passed in to store two pieces of
   information. The number of invocations of the program, which maps
   to the number of packets received, is stored to key 0. Key 1 is
   incremented on each iteration by the number of bytes stored in
   the skb.

 * Detach any eBPF program previously attached to the cgroup

 * Attach the new program to the cgroup using BPF_PROG_ATTACH

 * Once a second, read map[0] and map[1] to see how many packets and
   bytes were seen on any socket of tasks in the given cgroup.
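
As an illustration only (not part of the patch), the accounting described
in the list above can be modelled in plain userspace C. The array stands in
for the kernel arraymap; sketch_account() and MAP_NR_KEYS are hypothetical
names, and in the real sample the work is done by generated eBPF
instructions rather than by a C function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Key layout as in the sample; MAP_NR_KEYS is added here for sizing only. */
enum {
	MAP_KEY_PACKETS,
	MAP_KEY_BYTES,
	MAP_NR_KEYS,
};

/*
 * Model of one program invocation: count the packet, add its length to
 * the byte counter, and hand back the configured verdict unchanged.
 */
static int sketch_account(uint64_t map[MAP_NR_KEYS], uint32_t skb_len,
			  int verdict)
{
	assert(map != NULL);

	map[MAP_KEY_PACKETS] += 1;
	map[MAP_KEY_BYTES] += skb_len;

	return verdict;
}
```

Reading map[MAP_KEY_PACKETS] and map[MAP_KEY_BYTES] once a second then
corresponds to the polling step in the last list item.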

The program takes a cgroup path as 1st argument, and either "ingress"
or "egress" as 2nd. Optionally, "drop" can be passed as 3rd argument,
which will make the generated eBPF program return 0 instead of 1, so
the kernel will drop the packet.
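
A minimal sketch of how the three arguments could be mapped to an attach
direction and a verdict. All names below (sketch_opts, sketch_parse_args,
SKETCH_INGRESS, SKETCH_EGRESS) are hypothetical stand-ins; the sample itself
uses the attach types from linux/bpf.h, and rejecting an unknown third
argument is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-ins; the real attach types live in linux/bpf.h. */
enum sketch_attach_type { SKETCH_INGRESS, SKETCH_EGRESS };

struct sketch_opts {
	const char *cgroup_path;
	enum sketch_attach_type type;
	int verdict;		/* 1 = let the packet pass, 0 = drop it */
};

/* Returns 0 on success, -1 on a usage error. */
static int sketch_parse_args(int argc, char **argv, struct sketch_opts *o)
{
	assert(argv != NULL && o != NULL);

	if (argc < 3)
		return -1;

	o->cgroup_path = argv[1];

	if (strcmp(argv[2], "ingress") == 0)
		o->type = SKETCH_INGRESS;
	else if (strcmp(argv[2], "egress") == 0)
		o->type = SKETCH_EGRESS;
	else
		return -1;

	/* Default is to pass packets; "drop" flips the verdict to 0. */
	o->verdict = 1;
	if (argc > 3) {
		if (strcmp(argv[3], "drop") != 0)
			return -1;
		o->verdict = 0;
	}

	return 0;
}
```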

libbpf gained two new wrappers for the new syscall commands.

Signed-off-by: Daniel Mack 
---
 samples/bpf/Makefile|   2 +
 samples/bpf/libbpf.c|  21 ++
 samples/bpf/libbpf.h|   3 +
 samples/bpf/test_cgrp2_attach.c | 147 
 4 files changed, 173 insertions(+)
 create mode 100644 samples/bpf/test_cgrp2_attach.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 12b7304..e4cdc74 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -22,6 +22,7 @@ hostprogs-y += spintest
 hostprogs-y += map_perf_test
 hostprogs-y += test_overhead
 hostprogs-y += test_cgrp2_array_pin
+hostprogs-y += test_cgrp2_attach
 hostprogs-y += xdp1
 hostprogs-y += xdp2
 hostprogs-y += test_current_task_under_cgroup
@@ -49,6 +50,7 @@ spintest-objs := bpf_load.o libbpf.o spintest_user.o
 map_perf_test-objs := bpf_load.o libbpf.o map_perf_test_user.o
 test_overhead-objs := bpf_load.o libbpf.o test_overhead_user.o
 test_cgrp2_array_pin-objs := libbpf.o test_cgrp2_array_pin.o
+test_cgrp2_attach-objs := libbpf.o test_cgrp2_attach.o
 xdp1-objs := bpf_load.o libbpf.o xdp1_user.o
 # reuse xdp1 source intentionally
 xdp2-objs := bpf_load.o libbpf.o xdp1_user.o
diff --git a/samples/bpf/libbpf.c b/samples/bpf/libbpf.c
index 9969e35..9ce707b 100644
--- a/samples/bpf/libbpf.c
+++ b/samples/bpf/libbpf.c
@@ -104,6 +104,27 @@ int bpf_prog_load(enum bpf_prog_type prog_type,
return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
 }
 
+int bpf_prog_attach(int prog_fd, int target_fd, enum bpf_attach_type type)
+{
+   union bpf_attr attr = {
+   .target_fd = target_fd,
+   .attach_bpf_fd = prog_fd,
+   .attach_type = type,
+   };
+
+   return syscall(__NR_bpf, BPF_PROG_ATTACH, &attr, sizeof(attr));
+}
+
+int bpf_prog_detach(int target_fd, enum bpf_attach_type type)
+{
+   union bpf_attr attr = {
+   .target_fd = target_fd,
+   .attach_type = type,
+   };
+
+   return syscall(__NR_bpf, BPF_PROG_DETACH, &attr, sizeof(attr));
+}
+
 int bpf_obj_pin(int fd, const char *pathname)
 {
union bpf_attr attr = {
diff --git a/samples/bpf/libbpf.h b/samples/bpf/libbpf.h
index 364582b..f973241 100644
--- a/samples/bpf/libbpf.h
+++ b/samples/bpf/libbpf.h
@@ -15,6 +15,9 @@ int bpf_prog_load(enum bpf_prog_type prog_type,
  const struct bpf_insn *insns, int insn_len,
  const char *license, int kern_version);
 
+int bpf_prog_attach(int prog_fd, int attachable_fd, enum bpf_attach_type type);
+int bpf_prog_detach(int attachable_fd, enum bpf_attach_type type);
+
 int bpf_obj_pin(int fd, const char *pathname);
 int bpf_obj_get(const char *pathname);
 
diff --git a/samples/bpf/test_cgrp2_attach.c b/samples/bpf/test_cgrp2_attach.c
new file mode 100644
index 000..19e4ec0
--- /dev/null
+++ b/samples/bpf/test_cgrp2_attach.c
@@ -0,0 +1,147 @@
+/* eBPF example program:
+ *
+ * - Creates arraymap in kernel with 4 byte keys and 8 byte values
+ *
+ * - Loads eBPF program
+ *
+ *   The eBPF program accesses the map passed in to store two pieces of
+ *   information. The number of invocations of the program, which maps
+ *   to the number of packets received, is stored to key 0. Key 1 is
+ *   incremented on each iteration by the number of bytes stored in
+ *   the skb.
+ *
+ * - Detaches any eBPF program previously attached to the cgroup
+ *
+ * - Attaches the new program to a cgroup using BPF_PROG_ATTACH
+ *
+ * - Every second, reads map[0] and map[1] to see how many packets and
+ *   bytes were seen on any socket of tasks in the given cgroup.
+ */
+
+#define _GNU_SOURCE
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "libbpf.h"
+
+enum {
+   MAP_KEY_PACKETS,
+   MAP_KEY_BYTES,
+};
+
+static int prog_load(int map_fd, int verdict)
+{
+   struct bpf_insn prog[] = {
+

[PATCH v4 0/6] Add eBPF hooks for cgroups

2016-09-06 Thread Daniel Mack

This is v4 of the patch set to allow eBPF programs for network
filtering and accounting to be attached to cgroups, so that they apply
to all sockets of all tasks placed in that cgroup. The logic can also
be extended to other cgroup-based eBPF logic.

All the comments I got since v3 were addressed. FWIW, I left the
egress hook in __dev_queue_xmit() for now, as I don't currently see
any better place to put it. If we find one, we can still move the
hook around, and relax the !sk and sk->sk_family checks.


Changes from v3:

* Dropped the _FILTER suffix from BPF_PROG_TYPE_CGROUP_SOCKET_FILTER,
  renamed BPF_ATTACH_TYPE_CGROUP_INET_{E,IN}GRESS to
  BPF_CGROUP_INET_{IN,E}GRESS and alias BPF_MAX_ATTACH_TYPE to
  __BPF_MAX_ATTACH_TYPE, as suggested by Daniel Borkmann.

* Dropped the attach_flags member from the anonymous struct for BPF
  attach operations in union bpf_attr. They can be added later on via
  CHECK_ATTR. Requested by Daniel Borkmann and Alexei.

* Release old_prog at the end of __cgroup_bpf_update rather than at
  the beginning to fix a race gap between program updates and their
  users. Spotted by Daniel Borkmann.

* Plugged an skb leak when dropping packets on the egress path.
  Spotted by Daniel Borkmann.

* Add cgro...@vger.kernel.org to the loop, as suggested by Rami Rosen.

* Some minor coding style adaptations not worth mentioning in particular.


Changes from v2:

* Fixed the RCU locking details Tejun pointed out.

* Assert bpf_attr.flags == 0 in BPF_PROG_DETACH syscall handler.


Changes from v1:

* Moved all bpf specific cgroup code into its own file, and stub
  out related functions for !CONFIG_CGROUP_BPF as static inline nops.
  This way, the call sites are not cluttered with #ifdef guards while
  the feature remains compile-time configurable.

* Implemented the new scheme proposed by Tejun. Per cgroup, store one
  set of pointers that are pinned to the cgroup, and one for the
  programs that are effective. When a program is attached or detached,
  the change is propagated to all the cgroup's descendants. If a
  subcgroup has its own pinned program, skip the whole subbranch in
  order to allow delegation models.

* The hookup for egress packets is now done from __dev_queue_xmit().

* A static key is now used in both the ingress and egress fast paths
  to keep performance penalties close to zero if the feature is
  not in use.

* Overall cleanup to make the accessors use the program arrays.
  This should make it much easier to add new program types, which
  will then automatically follow the pinned vs. effective logic.

* Fixed locking issues, as pointed out by Eric Dumazet and Alexei
  Starovoitov. Changes to the program array are now done with
  xchg() and are protected by cgroup_mutex.

* eBPF programs are now expected to return 1 to let the packet pass,
  not >= 0. Pointed out by Alexei.

* Operation is now limited to INET sockets, so local AF_UNIX sockets
  are not affected. The enum members are renamed accordingly. In case
  other socket families should be supported, this can be extended in
  the future.

* The sample program learned to support both ingress and egress, and
  can now optionally make the eBPF program drop packets by making it
  return 0.


As always, feedback is much appreciated.

Thanks,
Daniel

Daniel Mack (6):
  bpf: add new prog type for cgroup socket filtering
  cgroup: add support for eBPF programs
  bpf: add BPF_PROG_ATTACH and BPF_PROG_DETACH commands
  net: filter: run cgroup eBPF ingress programs
  net: core: run cgroup eBPF egress programs
  samples: bpf: add userspace example for attaching eBPF programs to
cgroups

 include/linux/bpf-cgroup.h  |  70 +
 include/linux/cgroup-defs.h |   4 +
 include/uapi/linux/bpf.h|  17 +
 init/Kconfig|  12 +++
 kernel/bpf/Makefile |   1 +
 kernel/bpf/cgroup.c | 165 
 kernel/bpf/syscall.c|  81 
 kernel/bpf/verifier.c   |   1 +
 kernel/cgroup.c |  18 +
 net/core/dev.c  |   7 +-
 net/core/filter.c   |  10 +++
 samples/bpf/Makefile|   2 +
 samples/bpf/libbpf.c|  21 +
 samples/bpf/libbpf.h|   3 +
 samples/bpf/test_cgrp2_attach.c | 147 +++
 15 files changed, 558 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/bpf-cgroup.h
 create mode 100644 kernel/bpf/cgroup.c
 create mode 100644 samples/bpf/test_cgrp2_attach.c

-- 
2.5.5

Re: [PATCH v5 nf] netfilter: seqadj: Drop the packet directly when fail to add seqadj extension to avoid dereference NULL pointer later

2016-09-06 Thread Gao Feng

inline

On Tue, Sep 6, 2016 at 10:51 PM, Florian Westphal  wrote:
> f...@ikuai8.com  wrote:
>> From: Gao Feng 
>>
>> When memory is exhausted, nfct_seqadj_ext_add may fail to add the seqadj
>> extension. But the function nf_ct_seqadj_init doesn't check if get valid
>> seqadj pointer by the nfct_seqadj.
>>
>> Now drop the packet directly when fail to add seqadj extension to avoid
>> dereference NULL pointer in nf_ct_seqadj_init.
>>
>> Signed-off-by: Gao Feng 
>> ---
>>  v5: Return NF_ACCEPT instead of NF_DROP when nfct_seqadj_ext_add failed in 
>> nf_nat_setup_info
>>  v4: Drop the packet directly when fail to add seqadj extension;
>>  v3: Remove the warning log when seqadj is null;
>>  v2: Remove the unnessary seqadj check in nf_ct_seq_adjust
>>  v1: Initial patch
>>
>>  net/netfilter/nf_conntrack_core.c | 6 +-
>>  net/netfilter/nf_nat_core.c   | 3 ++-
>>  2 files changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/netfilter/nf_conntrack_core.c 
>> b/net/netfilter/nf_conntrack_core.c
>> index dd2c43a..dfa76ce 100644
>> --- a/net/netfilter/nf_conntrack_core.c
>> +++ b/net/netfilter/nf_conntrack_core.c
>> @@ -1036,7 +1036,11 @@ init_conntrack(struct net *net, struct nf_conn *tmpl,
>>   return (struct nf_conntrack_tuple_hash *)ct;
>>
>>   if (tmpl && nfct_synproxy(tmpl)) {
>> - nfct_seqadj_ext_add(ct);
>> + if (!nfct_seqadj_ext_add(ct)) {
>> + nf_conntrack_free(ct);
>> + pr_debug("Can't add seqadj extension\n");
>> + return NULL;
>> + }
>
> if (!nf_ct_add_synproxy(ct, tmpl)) {
> nf_conntrack_free(ct);
> return NULL;
> }
>
> static bool nf_ct_add_synproxy(struct nf_conn *ct, const struct nf_conn *tmpl)
> {
> if (tmpl && nfct_synproxy(tmpl)) {
> if (!nfct_seqadj_ext_add(ct))
> return false;
>
> if (!nfct_synproxy_ext_add(ct))
> return false;
> }
>
> return true;
> }
>
>> diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
>> index de31818..f8b916a 100644
>> --- a/net/netfilter/nf_nat_core.c
>> +++ b/net/netfilter/nf_nat_core.c
>> @@ -441,7 +441,8 @@ nf_nat_setup_info(struct nf_conn *ct,
>>   ct->status |= IPS_DST_NAT;
>>
>>   if (nfct_help(ct))
>> - nfct_seqadj_ext_add(ct);
>> + if (!nfct_seqadj_ext_add(ct))
>> + return NF_ACCEPT;
>>   }
>
> Hmm, why accept?
>
> We are asked to add extension to rewrite sequence numbers, but
> we cannot.  How can the connection work if we cannot munge/track
> seqno rewrites?

I originally thought we should drop the packet in that case.
But Pablo pointed out that ctnetlink can also add the seqadj
extension, namely in the synchronization case.

I think he is right, so I modified the patch according to his advice.

Best Regards
Feng

Re: [PATCH, net-next] perf, bpf: fix conditional call to bpf_overflow_handler

2016-09-06 Thread Alexei Starovoitov

On Tue, Sep 06, 2016 at 03:10:22PM +0200, Arnd Bergmann wrote:
> The newly added bpf_overflow_handler function is only built if both
> CONFIG_EVENT_TRACING and CONFIG_BPF_SYSCALL are enabled, but the caller
> only checks the latter:
> 
> kernel/events/core.c: In function 'perf_event_alloc':
> kernel/events/core.c:9106:27: error: 'bpf_overflow_handler' undeclared (first 
> use in this function)
> 
> This changes the caller so we also skip this call if CONFIG_EVENT_TRACING
> is disabled entirely.
> 
> Signed-off-by: Arnd Bergmann 
> Fixes: aa6a5f3cb2b2 ("perf, bpf: add perf events core support for 
> BPF_PROG_TYPE_PERF_EVENT programs")
> ---
>  kernel/events/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> I'm not entirely sure if this is the correct solution, please check before 
> applying

Acked-by: Alexei Starovoitov 

Thanks for the fix. Just saw build bot complaining last night and
by the morning your fix is already here. Thanks!

Re: [patch net-next RFC 1/2] fib: introduce fib notification infrastructure

2016-09-06 Thread David Ahern

On 9/6/16 6:01 AM, Jiri Pirko wrote:
> From: Jiri Pirko 
> 
> This allows to pass information about added/deleted fib entries to
> whoever is interested. This is done in a very similar way as devinet
> notifies address additions/removals.
> 
> Signed-off-by: Jiri Pirko 
> ---
>  include/net/ip_fib.h | 19 +++
>  net/ipv4/fib_trie.c  | 43 +++
>  2 files changed, 62 insertions(+)
> 

The notifier infrastructure should be generalized for use with IPv4 and IPv6. 
While the data will be family based, the infra can be generic.

Re: [RFC Patch net-next 5/6] net_sched: use rcu in fast path

2016-09-06 Thread Eric Dumazet

On Thu, 2016-09-01 at 22:57 -0700, Cong Wang wrote:



Missing changelog ?

Here I have no idea what you want to fix, since John already took care
of all this infra.

Adding extra rcu_dereference() and rcu_read_lock() while the critical
RCU dereferences already happen in callers is not needed.

> Signed-off-by: Cong Wang 
> ---
>  net/sched/act_api.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
> index 2f8db3c..fb6ff52 100644
> --- a/net/sched/act_api.c
> +++ b/net/sched/act_api.c
> @@ -470,10 +470,14 @@ int tcf_action_exec(struct sk_buff *skb, struct 
> tc_action **actions,
>   goto exec_done;
>   }
>   for (i = 0; i < nr_actions; i++) {
> - const struct tc_action *a = actions[i];
> + const struct tc_action *a;
>  
> + rcu_read_lock();

But the caller already has rcu_read_lock() or rcu_read_lock_bh()

This was done in commit 25d8c0d55f241ce2 ("net: rcu-ify tcf_proto")

> + a = rcu_dereference(actions[i]);


Add in your .config :
CONFIG_SPARSE_RCU_POINTER=y
make C=2 M=net/sched

>  repeat:
>   ret = a->ops->act(skb, a, res);
> + rcu_read_unlock();
> +
>   if (ret == TC_ACT_REPEAT)
>   goto repeat;/* we need a ttl - JHS */
>   if (ret != TC_ACT_PIPE)



I do not believe this patch is necessary.

Please add John as reviewer next time.

Thanks.

Re: [PATCH] bonding: Prevent deletion of a bond, or the last slave from a bond, with active usage.

2016-09-06 Thread Jay Vosburgh

Kaur, Jasminder  wrote:

>From: "Kaur, Jasminder" 
>
>If a bond is in use such as with IP address configured, removing it
>can result in application disruptions. If bond is used for cluster
>communication or network file system interfaces, removing it can cause
>system down time.
>
>An additional write option “?-” is added to sysfs bond interfaces as
>below, in order to prevent accidental deletions while bond is in use.
>In the absence of any usage, the below option proceeds with bond deletion.
>“ echo "?-bondX" > /sys/class/net/bonding_masters “ .
>If usage is detected such as an IP address configured, deletion is
>prevented with appropriate message logged to syslog.

The issue of interfaces being arbitrarily changed or deleted is
not specific to bonding, and could affect any networking device
(physical or virtual).  Thus, if a facility such as this is to be
provided, it should be generic, not specific to bonding.

Separately, I'm not sure I see the value of such an option.
Other than administrator error, I'm not sure when bonds (or other
interfaces) would be randomly deleted.  Are you seeing that happening?

Also, this patch does not prevent other errors or malicious
change, e.g., "ip link set bondX down" or "ip addr del 1.2.3.4/24" would
still cause the service disruption you're trying to avoid.

And, lastly, what Jiri said: use netlink for new bonding
functionality, not sysfs.

-J

>In the absence of any usage, the below option proceeds with deletion of
>slaves from a bond.
>“ echo "?-enoX" > /sys/class/net/bondX/bonding/slaves “ .
>If usage is detected such as an IP address configured on bond, deletion
>is prevented if the last slave is being removed from bond.
>An appropriate message is logged to syslog.
>
>Signed-off-by: Jasminder Kaur 
>---
> drivers/net/bonding/bond_options.c | 24 ++--
> drivers/net/bonding/bond_sysfs.c   | 35 +--
> 2 files changed, 55 insertions(+), 4 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_options.c 
>b/drivers/net/bonding/bond_options.c
>index 577e57c..e7640ea 100644
>--- a/drivers/net/bonding/bond_options.c
>+++ b/drivers/net/bonding/bond_options.c
>@@ -1335,9 +1335,15 @@ static int bond_option_slaves_set(struct bonding *bond,
>   struct net_device *dev;
>   char *ifname;
>   int ret;
>+  struct in_device *in_dev;
> 
>   sscanf(newval->string, "%16s", command); /* IFNAMSIZ*/
>-  ifname = command + 1;
>+
>+  if ((command[0] == '?') && (command[1] == '-'))
>+  ifname = command + 2;
>+  else
>+  ifname = command + 1;
>+
>   if ((strlen(command) <= 1) ||
>   !dev_valid_name(ifname))
>   goto err_no_cmd;
>@@ -1356,6 +1362,20 @@ static int bond_option_slaves_set(struct bonding *bond,
>   ret = bond_enslave(bond->dev, dev);
>   break;
> 
>+  case '?':
>+  if (command[1] == '-') {
>+  in_dev = __in_dev_get_rtnl(bond->dev);
>+  if ((bond->slave_cnt == 1) &&
>+  ((in_dev->ifa_list) != NULL)) {
>+  netdev_info(bond->dev, "attempt to remove last 
>slave %s from bond.\n",
>+  dev->name);
>+  ret = -EBUSY;
>+  break;
>+  }
>+  } else {
>+  goto err_no_cmd;
>+  }
>+
>   case '-':
>   netdev_info(bond->dev, "Removing slave %s\n", dev->name);
>   ret = bond_release(bond->dev, dev);
>@@ -1369,7 +1389,7 @@ out:
>   return ret;
> 
> err_no_cmd:
>-  netdev_err(bond->dev, "no command found in slaves file - use +ifname or 
>-ifname\n");
>+  netdev_err(bond->dev, "no command found in slaves file - use +ifname or 
>-ifname or ?-ifname\n");
>   ret = -EPERM;
>   goto out;
> }
>diff --git a/drivers/net/bonding/bond_sysfs.c 
>b/drivers/net/bonding/bond_sysfs.c
>index e23c3ed..7c2ef64 100644
>--- a/drivers/net/bonding/bond_sysfs.c
>+++ b/drivers/net/bonding/bond_sysfs.c
>@@ -102,7 +102,12 @@ static ssize_t bonding_store_bonds(struct class *cls,
>   int rv, res = count;
> 
>   sscanf(buffer, "%16s", command); /* IFNAMSIZ*/
>-  ifname = command + 1;
>+
>+  if ((command[0] == '?') && (command[1] == '-'))
>+  ifname = command + 2;
>+  else
>+  ifname = command + 1;
>+
>   if ((strlen(command) <= 1) ||
>   !dev_valid_name(ifname))
>   goto err_no_cmd;
>@@ -130,6 +135,32 @@ static ssize_t bonding_store_bonds(struct class *cls,
>   res = -ENODEV;
>   }
>   rtnl_unlock();
>+  } else if ((command[0] == '?') && (command[1] == '-')) {
>+  struct net_device *bond_dev;
>+
>+  rtnl_lock();
>+

Re: [patch net-next RFC 1/2] fib: introduce fib notification infrastructure

2016-09-06 Thread David Ahern

On 9/6/16 6:01 AM, Jiri Pirko wrote:
> From: Jiri Pirko 
> 
> This allows to pass information about added/deleted fib entries to
> whoever is interested. This is done in a very similar way as devinet
> notifies address additions/removals.
> 
> Signed-off-by: Jiri Pirko 
> ---
>  include/net/ip_fib.h | 19 +++
>  net/ipv4/fib_trie.c  | 43 +++
>  2 files changed, 62 insertions(+)

Do you intend for this set of notifiers to work with policy routing (FIB rules) 
as well?

[PATCH v4 5/6] net: core: run cgroup eBPF egress programs

2016-09-06 Thread Daniel Mack

If the cgroup associated with the receiving socket has eBPF
programs installed, run them from __dev_queue_xmit().

eBPF programs used in this context are expected to either return 1 to
let the packet pass, or != 1 to drop them. The programs have access to
the full skb, including the MAC headers.
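
The return convention can be sketched in userspace as follows. This is a
model under stated assumptions, not the kernel code: sketch_run_filter(),
sketch_prog_fn and the two toy programs are hypothetical, and -EPERM is
merely one plausible error value for a dropped packet. The only properties
taken from the text are that a return value of 1 lets the packet pass, that
anything else drops it, and that the caller treats a non-zero result as
"drop".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for an attached program: inspects a packet, returns a verdict. */
typedef int (*sketch_prog_fn)(const void *pkt, size_t len);

/*
 * Returns 0 if the packet may continue, non-zero if it must be dropped.
 * With no program attached, every packet passes.
 */
static int sketch_run_filter(sketch_prog_fn prog, const void *pkt, size_t len)
{
	assert(len == 0 || pkt != NULL);

	if (!prog)
		return 0;

	return prog(pkt, len) == 1 ? 0 : -EPERM;
}

/* Two toy programs: one passes everything, one drops everything. */
static int sketch_pass_all(const void *pkt, size_t len)
{
	(void)pkt;
	(void)len;
	return 1;
}

static int sketch_drop_all(const void *pkt, size_t len)
{
	(void)pkt;
	(void)len;
	return 0;
}
```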

Note that cgroup_bpf_run_filter() is stubbed out as static inline nop
for !CONFIG_CGROUP_BPF, and is otherwise guarded by a static key if
the feature is unused.

Signed-off-by: Daniel Mack 
---
 net/core/dev.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 34b5322..eb2bd20 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -141,6 +141,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "net-sysfs.h"
 
@@ -3329,6 +3330,10 @@ static int __dev_queue_xmit(struct sk_buff *skb, void *accel_priv)
if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP))
__skb_tstamp_tx(skb, NULL, skb->sk, SCM_TSTAMP_SCHED);
 
+   rc = cgroup_bpf_run_filter(skb->sk, skb, BPF_CGROUP_INET_EGRESS);
+   if (rc)
+   goto free_skb_list;
+
/* Disable soft irqs for various locks below. Also
 * stops preemption for RCU.
 */
@@ -3414,8 +3419,8 @@ recursion_alert:
 
rc = -ENETDOWN;
rcu_read_unlock_bh();
-
atomic_long_inc(&dev->tx_dropped);
+free_skb_list:
kfree_skb_list(skb);
return rc;
 out:
-- 
2.5.5

[PATCH] ptp: ixp46x: remove NO_IRQ handling

2016-09-06 Thread Arnd Bergmann

gpio_to_irq does not return NO_IRQ but instead returns a negative
error code on failure. Returning NO_IRQ from the function has no
negative effects as we only compare the result to the expected
interrupt number, but it's better to return a proper failure
code for consistency, and we should remove NO_IRQ from the kernel
entirely.

Signed-off-by: Arnd Bergmann 
---
 drivers/ptp/ptp_ixp46x.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/ptp/ptp_ixp46x.c b/drivers/ptp/ptp_ixp46x.c
index ee4f183ef9ee..fbcb940b6348 100644
--- a/drivers/ptp/ptp_ixp46x.c
+++ b/drivers/ptp/ptp_ixp46x.c
@@ -268,18 +268,19 @@ static int setup_interrupt(int gpio)
return err;
 
irq = gpio_to_irq(gpio);
+   if (irq < 0)
+   return irq;
 
-   if (NO_IRQ == irq)
-   return NO_IRQ;
-
-   if (irq_set_irq_type(irq, IRQF_TRIGGER_FALLING)) {
+   err = irq_set_irq_type(irq, IRQF_TRIGGER_FALLING);
+   if (err) {
pr_err("cannot set trigger type for irq %d\n", irq);
-   return NO_IRQ;
+   return err;
}
 
-   if (request_irq(irq, isr, 0, DRIVER, &ixp_clock)) {
+   err = request_irq(irq, isr, 0, DRIVER, &ixp_clock);
+   if (err) {
pr_err("request_irq failed for irq %d\n", irq);
-   return NO_IRQ;
+   return err;
}
 
return irq;
-- 
2.9.0

[PATCH v4 2/6] cgroup: add support for eBPF programs

2016-09-06 Thread Daniel Mack

This patch adds two sets of eBPF program pointers to struct cgroup:
one for programs that are directly pinned to a cgroup, and one for
programs that are effective for it.

To illustrate the logic behind that, assume the following example
cgroup hierarchy.

  A - B - C
\ D - E

If only B has a program attached, it will be effective for B, C, D
and E. If D then attaches a program itself, that will be effective for
both D and E, and the program in B will only affect B and C. Only one
program of a given type is effective for a cgroup.
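
The pinned vs. effective rule from the example can be modelled with a
short userspace sketch. Note the difference in technique: the patch
propagates changes down to descendants when a program is attached or
detached, whereas this sketch resolves the effective program lazily by
walking up towards the root, which gives the same answer for the hierarchy
above. struct sketch_cgroup and sketch_effective() are hypothetical, and
programs are represented by made-up integer ids.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_cgroup {
	struct sketch_cgroup *parent;	/* NULL for the root */
	int pinned;			/* 0 = nothing pinned, else a program id */
};

/*
 * The effective program is the one pinned to the nearest ancestor,
 * the cgroup itself included. Returns 0 if none is pinned on the path.
 */
static int sketch_effective(const struct sketch_cgroup *cg)
{
	assert(cg != NULL);

	for (; cg; cg = cg->parent)
		if (cg->pinned)
			return cg->pinned;

	return 0;
}
```

With program 1 pinned to B, sketch_effective() yields 1 for B, C, D and E
and 0 for A. Pinning program 2 to D afterwards changes the result to 2 for
D and E only.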

Attaching and detaching programs will be done through the bpf(2)
syscall. For now, ingress and egress inet socket filtering are the
only supported use-cases.

Signed-off-by: Daniel Mack 
---
 include/linux/bpf-cgroup.h  |  70 +++
 include/linux/cgroup-defs.h |   4 ++
 init/Kconfig|  12 
 kernel/bpf/Makefile |   1 +
 kernel/bpf/cgroup.c | 165 
 kernel/cgroup.c |  18 +
 6 files changed, 270 insertions(+)
 create mode 100644 include/linux/bpf-cgroup.h
 create mode 100644 kernel/bpf/cgroup.c

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
new file mode 100644
index 000..eac0957
--- /dev/null
+++ b/include/linux/bpf-cgroup.h
@@ -0,0 +1,70 @@
+#ifndef _BPF_CGROUP_H
+#define _BPF_CGROUP_H
+
+#include 
+#include 
+
+struct sock;
+struct cgroup;
+struct sk_buff;
+
+#ifdef CONFIG_CGROUP_BPF
+
+extern struct static_key_false cgroup_bpf_enabled_key;
+#define cgroup_bpf_enabled static_branch_unlikely(&cgroup_bpf_enabled_key)
+
+struct cgroup_bpf {
+   /*
+* Store two sets of bpf_prog pointers, one for programs that are
+* pinned directly to this cgroup, and one for those that are effective
+* when this cgroup is accessed.
+*/
+   struct bpf_prog *prog[MAX_BPF_ATTACH_TYPE];
+   struct bpf_prog *effective[MAX_BPF_ATTACH_TYPE];
+};
+
+void cgroup_bpf_put(struct cgroup *cgrp);
+void cgroup_bpf_inherit(struct cgroup *cgrp, struct cgroup *parent);
+
+void __cgroup_bpf_update(struct cgroup *cgrp,
+struct cgroup *parent,
+struct bpf_prog *prog,
+enum bpf_attach_type type);
+
+/* Wrapper for __cgroup_bpf_update() protected by cgroup_mutex */
+void cgroup_bpf_update(struct cgroup *cgrp,
+  struct bpf_prog *prog,
+  enum bpf_attach_type type);
+
+int __cgroup_bpf_run_filter(struct sock *sk,
+   struct sk_buff *skb,
+   enum bpf_attach_type type);
+
+/* Wrapper for __cgroup_bpf_run_filter() guarded by cgroup_bpf_enabled */
+static inline int cgroup_bpf_run_filter(struct sock *sk,
+   struct sk_buff *skb,
+   enum bpf_attach_type type)
+{
+   if (cgroup_bpf_enabled)
+   return __cgroup_bpf_run_filter(sk, skb, type);
+
+   return 0;
+}
+
+#else
+
+struct cgroup_bpf {};
+static inline void cgroup_bpf_put(struct cgroup *cgrp) {}
+static inline void cgroup_bpf_inherit(struct cgroup *cgrp,
+ struct cgroup *parent) {}
+
+static inline int cgroup_bpf_run_filter(struct sock *sk,
+   struct sk_buff *skb,
+   enum bpf_attach_type type)
+{
+   return 0;
+}
+
+#endif /* CONFIG_CGROUP_BPF */
+
+#endif /* _BPF_CGROUP_H */
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 5b17de6..861b467 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_CGROUPS
 
@@ -300,6 +301,9 @@ struct cgroup {
/* used to schedule release agent */
struct work_struct release_agent_work;
 
+   /* used to store eBPF programs */
+   struct cgroup_bpf bpf;
+
/* ids of the ancestors at each level including self */
int ancestor_ids[];
 };
diff --git a/init/Kconfig b/init/Kconfig
index cac3f09..71c71b0 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1144,6 +1144,18 @@ config CGROUP_PERF
 
  Say N if unsure.
 
+config CGROUP_BPF
+   bool "Support for eBPF programs attached to cgroups"
+   depends on BPF_SYSCALL && SOCK_CGROUP_DATA
+   help
+ Allow attaching eBPF programs to a cgroup using the bpf(2)
+ syscall command BPF_PROG_ATTACH.
+
+ In which context these programs are accessed depends on the type
+ of attachment. For instance, programs that are attached using
+ BPF_CGROUP_INET_INGRESS will be executed on the ingress path of
+ inet sockets.
+
 config CGROUP_DEBUG
bool "Example controller"
default n
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index eed911d..b22256b 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -5,3 +5,4 @@

[PATCH v4 3/6] bpf: add BPF_PROG_ATTACH and BPF_PROG_DETACH commands

2016-09-06 Thread Daniel Mack

Extend the bpf(2) syscall by two new commands, BPF_PROG_ATTACH and
BPF_PROG_DETACH which allow attaching and detaching eBPF programs
to a target.

On the API level, the target could be anything that has an fd in
userspace, hence the name of the field in union bpf_attr is called
'target_fd'.

When called with BPF_CGROUP_INET_{IN,E}GRESS, the target is
expected to be a valid file descriptor of a cgroup v2 directory which
has the bpf controller enabled. These are the only use-cases
implemented by this patch at this point, but more can be added.

If a program of the given type already exists in the given cgroup,
the program is swapped atomically, so userspace does not have to drop
an existing program first before installing a new one, which would
otherwise leave a gap in which no program is attached.
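
The "swap instead of detach-then-attach" idea can be illustrated in
userspace with C11 atomics. This is only a sketch: the kernel side uses its
own primitives, and struct sketch_prog, struct sketch_slot and
sketch_swap() are hypothetical names. The point shown is that a single
exchange installs the new program and returns the old one, so a reader
never observes an empty slot in between.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct sketch_prog {
	int id;
};

/* One attachment slot, e.g. "ingress program of this cgroup". */
struct sketch_slot {
	_Atomic(struct sketch_prog *) prog;
};

/*
 * Install new_prog (NULL means detach) and return the previous program
 * so the caller can release it once no user can still be running it.
 */
static struct sketch_prog *sketch_swap(struct sketch_slot *slot,
				       struct sketch_prog *new_prog)
{
	assert(slot != NULL);

	return atomic_exchange(&slot->prog, new_prog);
}
```

Releasing the returned program only after the exchange mirrors the
ordering concern that motivates swapping in one step.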

For more information on the propagation logic to subcgroups, please
refer to the bpf cgroup controller implementation.

The API is guarded by CAP_NET_ADMIN.

Signed-off-by: Daniel Mack 
---
 include/uapi/linux/bpf.h |  8 +
 kernel/bpf/syscall.c | 81 
 2 files changed, 89 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 55f815e..7cd3616 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -73,6 +73,8 @@ enum bpf_cmd {
BPF_PROG_LOAD,
BPF_OBJ_PIN,
BPF_OBJ_GET,
+   BPF_PROG_ATTACH,
+   BPF_PROG_DETACH,
 };
 
 enum bpf_map_type {
@@ -150,6 +152,12 @@ union bpf_attr {
__aligned_u64   pathname;
__u32   bpf_fd;
};
+
+   struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */
+   __u32   target_fd;  /* container object to attach to */
+   __u32   attach_bpf_fd;  /* eBPF program to attach */
+   __u32   attach_type;
+   };
 } __attribute__((aligned(8)));
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 228f962..1a8592a 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -822,6 +822,77 @@ static int bpf_obj_get(const union bpf_attr *attr)
return bpf_obj_get_user(u64_to_ptr(attr->pathname));
 }
 
+#ifdef CONFIG_CGROUP_BPF
+
+#define BPF_PROG_ATTACH_LAST_FIELD attach_type
+
+static int bpf_prog_attach(const union bpf_attr *attr)
+{
+   struct bpf_prog *prog;
+   struct cgroup *cgrp;
+
+   if (!capable(CAP_NET_ADMIN))
+   return -EPERM;
+
+   if (CHECK_ATTR(BPF_PROG_ATTACH))
+   return -EINVAL;
+
+   switch (attr->attach_type) {
+   case BPF_CGROUP_INET_INGRESS:
+   case BPF_CGROUP_INET_EGRESS:
+   prog = bpf_prog_get_type(attr->attach_bpf_fd,
+BPF_PROG_TYPE_CGROUP_SOCKET);
+   if (IS_ERR(prog))
+   return PTR_ERR(prog);
+
+   cgrp = cgroup_get_from_fd(attr->target_fd);
+   if (IS_ERR(cgrp)) {
+   bpf_prog_put(prog);
+   return PTR_ERR(cgrp);
+   }
+
+   cgroup_bpf_update(cgrp, prog, attr->attach_type);
+   cgroup_put(cgrp);
+   break;
+
+   default:
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
+#define BPF_PROG_DETACH_LAST_FIELD attach_type
+
+static int bpf_prog_detach(const union bpf_attr *attr)
+{
+   struct cgroup *cgrp;
+
+   if (!capable(CAP_NET_ADMIN))
+   return -EPERM;
+
+   if (CHECK_ATTR(BPF_PROG_DETACH))
+   return -EINVAL;
+
+   switch (attr->attach_type) {
+   case BPF_CGROUP_INET_INGRESS:
+   case BPF_CGROUP_INET_EGRESS:
+   cgrp = cgroup_get_from_fd(attr->target_fd);
+   if (IS_ERR(cgrp))
+   return PTR_ERR(cgrp);
+
+   cgroup_bpf_update(cgrp, NULL, attr->attach_type);
+   cgroup_put(cgrp);
+   break;
+
+   default:
+   return -EINVAL;
+   }
+
+   return 0;
+}
+#endif /* CONFIG_CGROUP_BPF */
+
 SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
 {
union bpf_attr attr = {};
@@ -888,6 +959,16 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
case BPF_OBJ_GET:
err = bpf_obj_get(&attr);
break;
+
+#ifdef CONFIG_CGROUP_BPF
+   case BPF_PROG_ATTACH:
+   err = bpf_prog_attach(&attr);
+   break;
+   case BPF_PROG_DETACH:
+   err = bpf_prog_detach(&attr);
+   break;
+#endif
+
default:
err = -EINVAL;
break;
-- 
2.5.5

Re: [PATCH v4 net-next 1/1] net_sched: Introduce skbmod action

2016-09-06 Thread Jamal Hadi Salim


On 16-09-06 09:37 AM, Jamal Hadi Salim wrote:

From: Jamal Hadi Salim 

This action is intended to be an upgrade from a usability perspective
from pedit (as well as operational debugability).
Compare this:



Dave,
I will have to send a new version of this action, so please
don't apply.
As it stands right now, the per-cpu stats don't work, so the code uses
global stats (which work).
Hoping to get some clarity from other people, then some more testing.

cheers,
jamal

Re: [PATCH v4 net-next 1/1] net_sched: Introduce skbmod action

2016-09-06 Thread Eric Dumazet

On Tue, 2016-09-06 at 09:37 -0400, Jamal Hadi Salim wrote:
> From: Jamal Hadi Salim 
> +
> +struct tcf_skbmod_params {
> + struct rcu_head rcu;
> + u64 flags; /*up to 64 types of operations; extend if needed */
> + u8  eth_dst[ETH_ALEN];
> + u16 eth_type;
> + u8  eth_src[ETH_ALEN];
> +};
> +
> +struct tcf_skbmod {
> + struct tc_action common;
> + struct tcf_skbmod_params  *skbmod_p;

struct tcf_skbmod_params __rcu *skbmod_p;

> +};

Then add CONFIG_SPARSE_RCU_POINTER=y to your .config
and build/check with sparse:

make C=2 M=net/sched

Thanks.

Re: [PATCH] ptp: ixp46x: remove NO_IRQ handling

2016-09-06 Thread Richard Cochran

On Tue, Sep 06, 2016 at 04:28:30PM +0200, Arnd Bergmann wrote:
> gpio_to_irq does not return NO_IRQ but instead returns a negative
> error code on failure. Returning NO_IRQ from the function has no
> negative effects as we only compare the result to the expected
> interrupt number, but it's better to return a proper failure
> code for consistency, and we should remove NO_IRQ from the kernel
> entirely.
> 
> Signed-off-by: Arnd Bergmann 

Acked-by: Richard Cochran

Re: [RFC PATCH 2/2] macb: Enable 1588 support in SAMA5D2 platform.

2016-09-06 Thread Richard Cochran

On Fri, Sep 02, 2016 at 02:53:37PM +0200, Andrei Pistirica wrote:
> Hardware time stamp on the PTP Ethernet packets are received using the
> SO_TIMESTAMPING API. Timers are obtained from the PTP event/peer
> gem registers.
> 
> Signed-off-by: Andrei Pistirica 
> ---
> Integration with SAMA5D2 only. This feature wasn't tested on any
> other platform that might use cadence/gem.

What does that mean?  I didn't see any references to SAMA5D2 anywhere
in your patch.

The driver needs to positively identify the HW that supports this
feature.

> diff --git a/drivers/net/ethernet/cadence/macb.c 
> b/drivers/net/ethernet/cadence/macb.c
> index 8d54e7b..18f0715 100644
> --- a/drivers/net/ethernet/cadence/macb.c
> +++ b/drivers/net/ethernet/cadence/macb.c
> @@ -697,6 +697,11 @@ static void macb_tx_interrupt(struct macb_queue *queue)
>  
>   /* First, update TX stats if needed */
>   if (skb) {
> +/* guard the hot-path */
> +#ifdef CONFIG_MACB_USE_HWSTAMP
> + if (bp->hwts_tx_en)
> + macb_ptp_do_txstamp(bp, skb);
> +#endif

Pull the test into the helper function, and then you can drop the
ifdef and the funny comment.
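
A minimal userspace sketch of that suggestion, assuming stand-in types (struct demo_macb and struct demo_skb are hypothetical, not the real macb structures). The enable test moves into the helper, so the hot path calls it unconditionally and needs neither the #ifdef nor the comment:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver types; not the real macb structs. */
struct demo_skb { bool stamped; };
struct demo_macb { bool hwts_tx_en; };

/* The enable check lives inside the helper, so callers need no guard. */
static void demo_ptp_do_txstamp(struct demo_macb *bp, struct demo_skb *skb)
{
	assert(bp != NULL && skb != NULL);

	if (!bp->hwts_tx_en)
		return;

	skb->stamped = true; /* placeholder for the real timestamping work */
}

/* Hot path: one unconditional call, no #ifdef and no comment. */
static void demo_tx_complete(struct demo_macb *bp, struct demo_skb *skb)
{
	if (skb)
		demo_ptp_do_txstamp(bp, skb);
}
```

With the feature compiled out, the header would supply an empty helper of the same name, as the macb.h hunk in this patch already does for the other functions.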

>   netdev_vdbg(bp->dev, "skb %u (data %p) TX 
> complete\n",
>   macb_tx_ring_wrap(tail), skb->data);
>   bp->stats.tx_packets++;
> @@ -853,6 +858,11 @@ static int gem_rx(struct macb *bp, int budget)
>   GEM_BFEXT(RX_CSUM, ctrl) & GEM_RX_CSUM_CHECKED_MASK)
>   skb->ip_summed = CHECKSUM_UNNECESSARY;
>  
> +/* guard the hot-path */
> +#ifdef CONFIG_MACB_USE_HWSTAMP
> + if (bp->hwts_rx_en)
> + macb_ptp_do_rxstamp(bp, skb);
> +#endif

Same here.

>   bp->stats.rx_packets++;
>   bp->stats.rx_bytes += skb->len;
>  
> @@ -1723,6 +1733,11 @@ static void macb_init_hw(struct macb *bp)
>  
>   /* Enable TX and RX */
>   macb_writel(bp, NCR, MACB_BIT(RE) | MACB_BIT(TE) | MACB_BIT(MPE));
> +
> +#ifdef CONFIG_MACB_USE_HWSTAMP
> + bp->hwts_tx_en = 0;
> + bp->hwts_rx_en = 0;
> +#endif

We don't initialize to zero unless we have to.

>  }
>  
>  /*
> @@ -1885,6 +1900,8 @@ static int macb_open(struct net_device *dev)
>  
>   netif_tx_start_all_queues(dev);
>  
> + macb_ptp_init(dev);
> +
>   return 0;
>  }
>  
> @@ -2143,7 +2160,7 @@ static const struct ethtool_ops gem_ethtool_ops = {
>   .get_regs_len   = macb_get_regs_len,
>   .get_regs   = macb_get_regs,
>   .get_link   = ethtool_op_get_link,
> - .get_ts_info= ethtool_op_get_ts_info,
> + .get_ts_info= macb_get_ts_info,
>   .get_ethtool_stats  = gem_get_ethtool_stats,
>   .get_strings= gem_get_ethtool_strings,
>   .get_sset_count = gem_get_sset_count,
> @@ -2157,6 +2174,12 @@ static int macb_ioctl(struct net_device *dev, struct 
> ifreq *rq, int cmd)
>   if (!netif_running(dev))
>   return -EINVAL;
>  
> + if (cmd == SIOCSHWTSTAMP)
> + return macb_hwtst_set(dev, rq, cmd);
> +
> + if (cmd == SIOCGHWTSTAMP)
> + return macb_hwtst_get(dev, rq);

switch/case?
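
Roughly what the switch form could look like, as a userspace sketch. The command numbers and handler return values below are hypothetical stand-ins for SIOCSHWTSTAMP, SIOCGHWTSTAMP and the two macb_hwtst_* helpers:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical command numbers standing in for SIOCSHWTSTAMP/SIOCGHWTSTAMP. */
enum demo_cmd {
	DEMO_SET_HWTSTAMP = 1,
	DEMO_GET_HWTSTAMP = 2,
};

/* Dummy handlers; the return values only make the dispatch observable. */
static int demo_hwtst_set(void) { return 10; }
static int demo_hwtst_get(void) { return 20; }

static int demo_ioctl(int cmd)
{
	switch (cmd) {
	case DEMO_SET_HWTSTAMP:
		return demo_hwtst_set();
	case DEMO_GET_HWTSTAMP:
		return demo_hwtst_get();
	default:
		break;
	}

	/* Placeholder for the remaining (PHY) ioctl handling. */
	return -EOPNOTSUPP;
}
```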

> +
>   if (!phydev)
>   return -ENODEV;
>  
> diff --git a/drivers/net/ethernet/cadence/macb.h 
> b/drivers/net/ethernet/cadence/macb.h
> index 8c3779d..555316a 100644
> --- a/drivers/net/ethernet/cadence/macb.h
> +++ b/drivers/net/ethernet/cadence/macb.h
> @@ -920,8 +920,21 @@ struct macb {
>  
>  #ifdef CONFIG_MACB_USE_HWSTAMP
>  void macb_ptp_init(struct net_device *ndev);
> +void macb_ptp_do_rxstamp(struct macb *bp, struct sk_buff *skb);
> +void macb_ptp_do_txstamp(struct macb *bp, struct sk_buff *skb);
> +int macb_ptp_get_ts_info(struct net_device *dev, struct ethtool_ts_info 
> *info);
> +#define macb_get_ts_info macb_ptp_get_ts_info
> +int macb_hwtst_set(struct net_device *netdev, struct ifreq *ifr, int cmd);
> +int macb_hwtst_get(struct net_device *netdev, struct ifreq *ifr);
>  #else
>  void macb_ptp_init(struct net_device *ndev) { }
> +void macb_ptp_do_rxstamp(struct macb *bp, struct sk_buff *skb) { }
> +void macb_ptp_do_txstamp(struct macb *bp, struct sk_buff *skb) { }
> +#define macb_get_ts_info ethtool_op_get_ts_info
> +int macb_hwtst_set(struct net_device *netdev, struct ifreq *ifr, int cmd)
> + { return -1; }
> +int macb_hwtst_get(struct net_device *netdev, struct ifreq *ifr)
> + { return -1; }

Use a proper return code please.
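
A sketch of stubs that return an errno value instead of a bare -1 (which callers would read as -EPERM on Linux). The choice of -EOPNOTSUPP is an assumption here, since no specific code was named in the review; the demo_ types are hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical opaque stand-ins for struct net_device and struct ifreq. */
struct demo_netdev;
struct demo_ifreq;

/* Stubs for the "feature compiled out" case. */
static inline int demo_hwtst_set(struct demo_netdev *dev,
				 struct demo_ifreq *ifr, int cmd)
{
	(void)dev;
	(void)ifr;
	(void)cmd;
	return -EOPNOTSUPP;
}

static inline int demo_hwtst_get(struct demo_netdev *dev,
				 struct demo_ifreq *ifr)
{
	(void)dev;
	(void)ifr;
	return -EOPNOTSUPP;
}
```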

>  #endif
>  
>  static inline bool macb_is_gem(struct macb *bp)
> diff --git a/drivers/net/ethernet/cadence/macb_ptp.c 
> b/drivers/net/ethernet/cadence/macb_ptp.c
> index 6d6a6ec..e3f784a 100644
> --- a/drivers/net/ethernet/cadence/macb_ptp.c
> +++ b/drivers/net/ethernet/cadence/macb_ptp.c
> @@ -5,6 +5,7 @@
>   * Copyright (C) 2016 Microchip

Re: [PATCH net-next V2 6/6] net/mlx5: Add handling for port module event

2016-09-06 Thread Joe Perches

On Tue, 2016-09-06 at 19:04 +0300, Saeed Mahameed wrote:
> From: Huy Nguyen 

[]

> +void mlx5_port_module_event(struct mlx5_core_dev *dev, struct mlx5_eqe *eqe)
> +{
> +   struct mlx5_eqe_port_module *module_event_eqe;
> +   u8 module_status;
> +   u8 module_num;
> +   u8 error_type;
> +
> +   module_event_eqe = &eqe->data.port_module;
> +   module_num = module_event_eqe->module;
> +   module_status = module_event_eqe->module_status &
> +                   PORT_MODULE_EVENT_MODULE_STATUS_MASK;
> +   error_type = module_event_eqe->error_type &
> +                PORT_MODULE_EVENT_ERROR_TYPE_MASK;
> +
> +   switch (module_status) {
> +   case MLX5_MODULE_STATUS_PLUGGED:
> +           mlx5_core_info(dev, "Module %u, status: plugged", module_num);


Missing format '\n' line terminations
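
For illustration only, a userspace sketch of the same message with the terminator in place. snprintf() stands in for mlx5_core_info(), and both demo_ helpers are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Format the "plugged" message with the trailing '\n' the review asks for. */
static int demo_format_plugged(char *buf, size_t len, unsigned int module_num)
{
	return snprintf(buf, len, "Module %u, status: plugged\n", module_num);
}

/* Small check that a formatted message is newline-terminated. */
static bool demo_ends_with_newline(const char *s)
{
	size_t n = strlen(s);

	return n > 0 && s[n - 1] == '\n';
}
```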

> +           break;
> +
> +   case MLX5_MODULE_STATUS_UNPLUGGED:
> +           mlx5_core_info(dev, "Module %u, status: unplugged", module_num);
> +           break;
> +
> +   case MLX5_MODULE_STATUS_ERROR:
> +           mlx5_core_info(dev, "Module %u, status: error, %s", module_num,
> +                          mlx5_port_event_error_type_to_string(error_type));
> +           break;
> +
> +   default:
> +           mlx5_core_info(dev, "Module %u, unknown module status %x",
> +                          module_num, module_status);
> +   }

Should any of these be ratelimited?
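
To make the question concrete, here is a toy userspace rate limiter: at most `burst` messages per `interval` ticks. This is only a sketch of the idea with hypothetical names; it is not the kernel's ratelimit implementation, and in the driver one would reach for the existing kernel helpers instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state: allow at most `burst` messages per `interval` ticks. */
struct demo_ratelimit {
	unsigned long interval;
	unsigned int burst;
	unsigned long window_start;
	unsigned int printed;
};

/* Returns true if a message may be emitted at time `now`. */
static bool demo_ratelimit_ok(struct demo_ratelimit *rl, unsigned long now)
{
	assert(rl != NULL);

	/* Start a new window once the current one has expired. */
	if (now - rl->window_start >= rl->interval) {
		rl->window_start = now;
		rl->printed = 0;
	}

	if (rl->printed >= rl->burst)
		return false;

	rl->printed++;
	return true;
}
```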

Re: [PATCH v3 5/5] net: asix: autoneg will set WRITE_MEDIUM reg

2016-09-06 Thread Grant Grundler

On Thu, Sep 1, 2016 at 10:02 AM, Eric Dumazet  wrote:
> On Thu, 2016-09-01 at 12:47 -0400, Robert Foss wrote:
>
>> I'm not quite sure how the first From line was added, it
>> should not have been.
>> Grant Grundler is most definitely the author.
>>
>> Would you like me to resubmit in v++ and make sure that it has been
>> corrected?
>
> This is too late, patches are already merged in David Miller net-next
> tree.
>
> These kinds of errors can not be fixed, we have to be very careful at
> submission/review time.
>
> I guess Grant does not care, but some contributors, especially new ones
> would like to get proper attribution.

I do not mind. Robert will get email about bugs instead of me. :D

cheers,
grant

>
> Thanks.
>
>

[PATCH net-next V2 3/6] net/mlx5e: Read ETS settings directly from firmware

2016-09-06 Thread Saeed Mahameed

From: Huy Nguyen 

Issue description:
The current implementation saves the ETS settings from the user in
a temporary soft copy and returns these settings when the user
queries the ETS settings.

With the new DCBX firmware, the ETS settings can be changed
by firmware when DCBX is in firmware controlled mode. Therefore,
the user will obtain wrong values from the temporary soft copy.

Solution:
1. Read the ETS settings directly from firmware.
2. For tc_tsa:
   a. Initialize tc_tsa to vendor IEEE_8021QAZ_TSA_VENDOR at netdev
  creation.
   b. When reading the ETS settings from FW, if the traffic class
      bandwidth is less than 100, set tc_tsa to IEEE_8021QAZ_TSA_ETS.
      This handles the scenario where DCBX is under FW control and the
      willing bit is on, which means the ETS settings are dictated by
      the remote switch.
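
The tc_tsa handling in step 2 can be sketched in plain C as follows. This is a userspace illustration under stated assumptions: the DEMO_ constants are hypothetical stand-ins for IEEE_8021QAZ_MAX_TCS, MLX5E_MAX_BW_ALLOC and the TSA values, and bw_from_fw[] stands in for the per-class bandwidth queried from firmware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_TCS      8    /* stand-in for IEEE_8021QAZ_MAX_TCS */
#define DEMO_MAX_BW_ALLOC 100  /* percent; stand-in for MLX5E_MAX_BW_ALLOC */

enum { DEMO_TSA_ETS = 2, DEMO_TSA_VENDOR = 255 };

/* Step 2a: at netdev creation every traffic class starts as "vendor". */
static void demo_tsa_init(uint8_t tc_tsa[DEMO_MAX_TCS])
{
	for (int i = 0; i < DEMO_MAX_TCS; i++)
		tc_tsa[i] = DEMO_TSA_VENDOR;
}

/* Step 2b: a class with less than 100% bandwidth is reported as ETS. */
static void demo_tsa_refresh(uint8_t tc_tsa[DEMO_MAX_TCS],
			     const uint8_t bw_from_fw[DEMO_MAX_TCS],
			     int num_tc)
{
	assert(num_tc >= 0 && num_tc <= DEMO_MAX_TCS);

	for (int i = 0; i < num_tc; i++)
		if (bw_from_fw[i] < DEMO_MAX_BW_ALLOC)
			tc_tsa[i] = DEMO_TSA_ETS;
}
```

Note that a class is only ever moved towards ETS here; nothing in this step moves it back to vendor, which matches the getets hunk below.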

Signed-off-by: Huy Nguyen 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  6 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 35 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 26 
 3 files changed, 46 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 6919e3c..0d41287 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -203,9 +203,6 @@ struct mlx5e_params {
u8  toeplitz_hash_key[40];
u32 indirection_rqt[MLX5E_INDIR_RQT_SIZE];
bool vlan_strip_disable;
-#ifdef CONFIG_MLX5_CORE_EN_DCB
-   struct ieee_ets ets;
-#endif
bool rx_am_enabled;
 };
 
@@ -226,6 +223,9 @@ enum {
 
 struct mlx5e_dcbx {
struct mlx5e_cee_config cee_cfg; /* pending configuration */
+
+   /* The only setting that cannot be read from FW */
+   u8 tc_tsa[IEEE_8021QAZ_MAX_TCS];
 };
 #endif
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
index 0595243..8f6b5a7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
@@ -45,12 +45,31 @@ static int mlx5e_dcbnl_ieee_getets(struct net_device 
*netdev,
   struct ieee_ets *ets)
 {
struct mlx5e_priv *priv = netdev_priv(netdev);
+   struct mlx5_core_dev *mdev = priv->mdev;
+   int i;
+   int err = 0;
 
if (!MLX5_CAP_GEN(priv->mdev, ets))
return -ENOTSUPP;
 
-   memcpy(ets, &priv->params.ets, sizeof(*ets));
-   return 0;
+   ets->ets_cap = mlx5_max_tc(priv->mdev) + 1;
+   for (i = 0; i < ets->ets_cap; i++) {
+   err = mlx5_query_port_prio_tc(mdev, i, &ets->prio_tc[i]);
+   if (err)
+   return err;
+   }
+
+   for (i = 0; i < ets->ets_cap; i++) {
+   err = mlx5_query_port_tc_bw_alloc(mdev, i, &ets->tc_tx_bw[i]);
+   if (err)
+   return err;
+   if (ets->tc_tx_bw[i] < MLX5E_MAX_BW_ALLOC)
+   priv->dcbx.tc_tsa[i] = IEEE_8021QAZ_TSA_ETS;
+   }
+
+   memcpy(ets->tc_tsa, priv->dcbx.tc_tsa, sizeof(ets->tc_tsa));
+
+   return err;
 }
 
 enum {
@@ -127,7 +146,14 @@ int mlx5e_dcbnl_ieee_setets_core(struct mlx5e_priv *priv, 
struct ieee_ets *ets)
if (err)
return err;
 
-   return mlx5_set_port_tc_bw_alloc(mdev, tc_tx_bw);
+   err = mlx5_set_port_tc_bw_alloc(mdev, tc_tx_bw);
+
+   if (err)
+   return err;
+
+   memcpy(priv->dcbx.tc_tsa, ets->tc_tsa, sizeof(ets->tc_tsa));
+
+   return err;
 }
 
 static int mlx5e_dbcnl_validate_ets(struct net_device *netdev,
@@ -181,9 +207,6 @@ static int mlx5e_dcbnl_ieee_setets(struct net_device 
*netdev,
if (err)
return err;
 
-   memcpy(&priv->params.ets, ets, sizeof(*ets));
-   priv->params.ets.ets_cap = mlx5_max_tc(priv->mdev) + 1;
-
return 0;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 3b05810..da70ef3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2875,17 +2875,23 @@ u16 mlx5e_get_max_inline_cap(struct mlx5_core_dev *mdev)
 static void mlx5e_ets_init(struct mlx5e_priv *priv)
 {
int i;
+   struct ieee_ets ets;
 
-   priv->params.ets.ets_cap = mlx5_max_tc(priv->mdev) + 1;
-   for (i = 0; i < priv->params.ets.ets_cap; i++) {
-   priv->params.ets.tc_tx_bw[i] = MLX5E_MAX_BW_ALLOC;
-   priv->params.ets.tc_tsa[i] = IEEE_8021QAZ_TSA_VENDOR;
-   priv->params.ets.prio_tc[i] = i;
+   memset(&ets, 0, sizeof(ets));
+   ets.ets_cap = mlx5_max_tc(priv->mdev) + 1;
+   for (i = 0; i < ets.ets_cap; i++) {
+   ets.tc_tx_bw[i] =

Re: [patch net-next v7 1/3] netdevice: Add offload statistics ndo

2016-09-06 Thread Jiri Pirko

Tue, Sep 06, 2016 at 05:08:25PM CEST, ro...@cumulusnetworks.com wrote:
>On 9/5/16, 10:18 AM, Jiri Pirko wrote:
>> From: Nogah Frankel 
>>
>> Add a new ndo to return statistics for offloaded operation.
>> Since there can be many different offloaded operation with many
>> stats types, the ndo gets an attribute id by which it knows which
>> stats are wanted. The ndo also gets a void pointer to be cast according
>> to the attribute id.
>>
>> Signed-off-by: Nogah Frankel 
>> Signed-off-by: Jiri Pirko 
>> ---
>>  include/linux/netdevice.h | 12 
>>  1 file changed, 12 insertions(+)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 67bb978..2d2c09b 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -924,6 +924,14 @@ struct netdev_xdp {
>>   *  3. Update dev->stats asynchronously and atomically, and define
>>   * neither operation.
>>   *
>> + * bool (*ndo_has_offload_stats)(int attr_id)
>> + *  Return true if this device supports offload stats of this attr_id.
>> + *
>> + * int (*ndo_get_offload_stats)(int attr_id, const struct net_device *dev,
>> + *  void *attr_data)
>> + *  Get statistics for offload operations by attr_id. Write it into the
>> + *  attr_data pointer.
>> + *
>
>this could have been a single ndo_get_offload_stats like the others.
>and possibly new ndo_get_offload_stats_size.

Size is determined by the attribute.
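
A userspace sketch of what "size is determined by the attribute" means in practice: the attribute id alone fixes both the payload type and its size, so the callee can cast the void pointer safely. The attribute id, the stats struct and the numbers filled in are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute id and payload type, for illustration only. */
enum { DEMO_ATTR_CPU_HIT = 1 };

struct demo_cpu_hit_stats {
	uint64_t rx_packets;
	uint64_t tx_packets;
};

/* The caller sizes its buffer from the attribute id. */
static size_t demo_offload_stats_size(int attr_id)
{
	switch (attr_id) {
	case DEMO_ATTR_CPU_HIT:
		return sizeof(struct demo_cpu_hit_stats);
	default:
		return 0;
	}
}

/* The callee casts attr_data according to the same attribute id. */
static int demo_get_offload_stats(int attr_id, void *attr_data)
{
	assert(attr_data != NULL);

	switch (attr_id) {
	case DEMO_ATTR_CPU_HIT: {
		struct demo_cpu_hit_stats *s = attr_data;

		s->rx_packets = 7; /* dummy values */
		s->tx_packets = 3;
		return 0;
	}
	default:
		return -EINVAL;
	}
}
```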


>Ideally the driver could do the nest.
>But, this can be changed if needed in the future.
>
>>   * int (*ndo_vlan_rx_add_vid)(struct net_device *dev, __be16 proto, u16 
>> vid);
>>   *  If device supports VLAN filtering this function is called when a
>>   *  VLAN id is registered.
>> @@ -1155,6 +1163,10 @@ struct net_device_ops {
>>  
>>  struct rtnl_link_stats64* (*ndo_get_stats64)(struct net_device *dev,
>>   struct rtnl_link_stats64 
>> *storage);
>> +bool(*ndo_has_offload_stats)(int attr_id);
>> +int (*ndo_get_offload_stats)(int attr_id,
>> + const struct 
>> net_device *dev,
>> + void *attr_data);
>>  struct net_device_stats* (*ndo_get_stats)(struct net_device *dev);
>>  
>>  int (*ndo_vlan_rx_add_vid)(struct net_device *dev,
>

[PATCH net-next V2 1/6] net/mlx5e: Add qos capability check

2016-09-06 Thread Saeed Mahameed

From: Huy Nguyen 

Make sure firmware supports qos before exposing the dcb api.

Signed-off-by: Huy Nguyen 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 03586ee..3b05810 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3111,7 +3111,8 @@ static void mlx5e_build_nic_netdev(struct net_device 
*netdev)
if (MLX5_CAP_GEN(mdev, vport_group_manager)) {
netdev->netdev_ops = &mlx5e_netdev_ops_sriov;
 #ifdef CONFIG_MLX5_CORE_EN_DCB
-   netdev->dcbnl_ops = &mlx5e_dcbnl_ops;
+   if (MLX5_CAP_GEN(mdev, qos))
+   netdev->dcbnl_ops = &mlx5e_dcbnl_ops;
 #endif
} else {
netdev->netdev_ops = &mlx5e_netdev_ops_basic;
-- 
2.7.4

[PATCH net-next V2 2/6] net/mlx5e: Support DCBX CEE API

2016-09-06 Thread Saeed Mahameed

From: Huy Nguyen 

Add DCBX CEE API interface for ConnectX-4. Configurations are stored in
a temporary structure and are applied to the card's firmware when
the CEE's setall callback function is called.

Note:
  priority group in CEE is equivalent to traffic class in ConnectX-4
  hardware spec.

  bw allocation per priority in CEE is not supported because ConnectX-4
  only supports bw allocation per traffic class.

  user priority in CEE does not have an equivalent term in ConnectX-4.
  Therefore, user priority to priority mapping in CEE is not supported.
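
The mapping described in the note can be sketched as below. This is a userspace illustration with hypothetical demo_ types; it mirrors the loop in mlx5e_dcbnl_setall() from this patch, where each CEE priority group becomes one traffic class and its bandwidth percentage is copied over as is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_PGS 8 /* stand-in for CEE_DCBX_MAX_PGS */

/* Hypothetical, trimmed-down versions of the CEE and IEEE ETS structures. */
struct demo_cee_config {
	uint8_t pg_bw_pct[DEMO_MAX_PGS];
	uint8_t prio_to_pg_map[DEMO_MAX_PGS];
};

struct demo_ets {
	uint8_t tc_tx_bw[DEMO_MAX_PGS];
	uint8_t tc_rx_bw[DEMO_MAX_PGS];
	uint8_t prio_tc[DEMO_MAX_PGS];
};

/* Priority group i maps to traffic class i; bandwidth is per class. */
static void demo_cee_to_ets(const struct demo_cee_config *cee,
			    struct demo_ets *ets)
{
	assert(cee != NULL && ets != NULL);

	for (int i = 0; i < DEMO_MAX_PGS; i++) {
		ets->tc_tx_bw[i] = cee->pg_bw_pct[i];
		ets->tc_rx_bw[i] = cee->pg_bw_pct[i];
		ets->prio_tc[i]  = cee->prio_to_pg_map[i];
	}
}
```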

Signed-off-by: Huy Nguyen 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  24 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 301 -
 drivers/net/ethernet/mellanox/mlx5/core/port.c |  43 +++
 include/linux/mlx5/port.h  |   4 +
 4 files changed, 370 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 9699560..6919e3c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -209,6 +209,26 @@ struct mlx5e_params {
bool rx_am_enabled;
 };
 
+#ifdef CONFIG_MLX5_CORE_EN_DCB
+struct mlx5e_cee_config {
+   /* bw pct for priority group */
+   u8 pg_bw_pct[CEE_DCBX_MAX_PGS];
+   u8 prio_to_pg_map[CEE_DCBX_MAX_PRIO];
+   bool   pfc_setting[CEE_DCBX_MAX_PRIO];
+   bool   pfc_enable;
+};
+
+enum {
+   MLX5_DCB_CHG_RESET,
+   MLX5_DCB_NO_CHG,
+   MLX5_DCB_CHG_NO_RESET,
+};
+
+struct mlx5e_dcbx {
+   struct mlx5e_cee_config cee_cfg; /* pending configuration */
+};
+#endif
+
 struct mlx5e_tstamp {
rwlock_t   lock;
struct cyclecounter cycles;
@@ -650,6 +670,10 @@ struct mlx5e_priv {
struct mlx5e_stats stats;
struct mlx5e_tstamp tstamp;
u16 q_counter;
+#ifdef CONFIG_MLX5_CORE_EN_DCB
+   struct mlx5e_dcbx  dcbx;
+#endif
+
const struct mlx5e_profile *profile;
void  *ppriv;
 };
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
index 762af16..0595243 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
@@ -38,6 +38,9 @@
 #define MLX5E_100MB (10)
 #define MLX5E_1GB   (100)
 
+#define MLX5E_CEE_STATE_UP    1
+#define MLX5E_CEE_STATE_DOWN  0
+
 static int mlx5e_dcbnl_ieee_getets(struct net_device *netdev,
   struct ieee_ets *ets)
 {
@@ -222,13 +225,15 @@ static int mlx5e_dcbnl_ieee_setpfc(struct net_device *dev,
 
 static u8 mlx5e_dcbnl_getdcbx(struct net_device *dev)
 {
-   return DCB_CAP_DCBX_HOST | DCB_CAP_DCBX_VER_IEEE;
+   return DCB_CAP_DCBX_HOST |
+  DCB_CAP_DCBX_VER_IEEE |
+  DCB_CAP_DCBX_VER_CEE;
 }
 
 static u8 mlx5e_dcbnl_setdcbx(struct net_device *dev, u8 mode)
 {
if ((mode & DCB_CAP_DCBX_LLD_MANAGED) ||
-   (mode & DCB_CAP_DCBX_VER_CEE) ||
+   !(mode & DCB_CAP_DCBX_VER_CEE) ||
!(mode & DCB_CAP_DCBX_VER_IEEE) ||
!(mode & DCB_CAP_DCBX_HOST))
return 1;
@@ -304,6 +309,281 @@ static int mlx5e_dcbnl_ieee_setmaxrate(struct net_device 
*netdev,
return mlx5_modify_port_ets_rate_limit(mdev, max_bw_value, max_bw_unit);
 }
 
+static u8 mlx5e_dcbnl_setall(struct net_device *netdev)
+{
+   struct mlx5e_priv *priv = netdev_priv(netdev);
+   struct mlx5e_cee_config *cee_cfg = &priv->dcbx.cee_cfg;
+   struct mlx5_core_dev *mdev = priv->mdev;
+   struct ieee_ets ets;
+   struct ieee_pfc pfc;
+   int err;
+   int i;
+
+   memset(&ets, 0, sizeof(ets));
+   memset(&pfc, 0, sizeof(pfc));
+
+   ets.ets_cap = IEEE_8021QAZ_MAX_TCS;
+   for (i = 0; i < CEE_DCBX_MAX_PGS; i++) {
+   ets.tc_tx_bw[i] = cee_cfg->pg_bw_pct[i];
+   ets.tc_rx_bw[i] = cee_cfg->pg_bw_pct[i];
+   ets.tc_tsa[i]   = IEEE_8021QAZ_TSA_ETS;
+   ets.prio_tc[i]  = cee_cfg->prio_to_pg_map[i];
+   }
+
+   err = mlx5e_dbcnl_validate_ets(netdev, &ets);
+   if (err) {
+   netdev_err(netdev,
+  "%s, Failed to validate ETS: %d\n", __func__, err);
+   goto out;
+   }
+
+   err = mlx5e_dcbnl_ieee_setets_core(priv, &ets);
+   if (err) {
+   netdev_err(netdev,
+  "%s, Failed to set ETS: %d\n", __func__, err);
+   goto out;
+   }
+
+   /* Set PFC */
+   pfc.pfc_cap = mlx5_max_tc(mdev) + 1;
+   if (!cee_cfg->pfc_enable)
+   pfc.pfc_en = 0;
+   else
+   for (i = 0; i <

[PATCH net-next V2 0/6] Mellanox 100G mlx5 DCBX CEE and firmware support

2016-09-06 Thread Saeed Mahameed

Hi Dave,

This series from Huy provides mlx5 DCBX updates to support DCBX CEE
API and DCBX firmware/host modes support.

1st patch ensures the dcbnl_rtnl_ops is published only when the qos
capability bit is on.

2nd patch adds support for the CEE interfaces to mlx5 dcbnl_rtnl_ops.

3rd patch refactors the ETS query to read the ETS configuration directly
from firmware rather than keeping a software shadow of it. The existing
IEEE interfaces stay the same.

4th patch adds support for the MLX5_REG_DCBX_PARAM and MLX5_REG_DCBX_APP
firmware commands to manipulate the mlx5 DCBX mode.

5th patch adds driver support for the new DCBX firmware.
It keeps the driver compatible with both old and new firmware.
With the new DCBX firmware, qos settings can be controlled by either firmware
or software depending on the DCBX mode.

6th patch adds support for logging port module events.

Changes since V1:
1. Add qos capability check
2. In port module events eqe structure, change rsvd_n to reserved_at_n to be
   consistent with mlx5_ifc.h
3. Improve commit messages
4. Drop DCBX private flags patch
5. Add patch to check the qos capability bit before exposing dcb
   interfaces
6. Replace memset with static array initialization

Thanks,
Saeed.

Huy Nguyen (6):
  net/mlx5e: Add qos capability check
  net/mlx5e: Support DCBX CEE API
  net/mlx5e: Read ETS settings directly from firmware
  net/mlx5: Add DCBX firmware commands support
  net/mlx5e: ConnectX-4 firmware support for DCBX
  net/mlx5: Add handling for port module event

 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  36 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 483 -
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  27 +-
 drivers/net/ethernet/mellanox/mlx5/core/eq.c   |  12 +
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|   1 +
 drivers/net/ethernet/mellanox/mlx5/core/port.c | 148 +++
 include/linux/mlx5/device.h|  11 +
 include/linux/mlx5/driver.h|   7 +
 include/linux/mlx5/mlx5_ifc.h  |   3 +-
 include/linux/mlx5/port.h  |   6 +
 10 files changed, 698 insertions(+), 36 deletions(-)

-- 
2.7.4

[PATCH net-next V2 4/6] net/mlx5: Add DCBX firmware commands support

2016-09-06 Thread Saeed Mahameed

From: Huy Nguyen 

Add set/query commands for DCBX_PARAM register

Signed-off-by: Huy Nguyen 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/port.c | 20 
 include/linux/mlx5/driver.h|  7 +++
 include/linux/mlx5/port.h  |  2 ++
 3 files changed, 29 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c 
b/drivers/net/ethernet/mellanox/mlx5/core/port.c
index 2f75f86..8d409b2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c
@@ -548,6 +548,26 @@ int mlx5_max_tc(struct mlx5_core_dev *mdev)
return num_tc - 1;
 }
 
+int mlx5_query_port_dcbx_param(struct mlx5_core_dev *mdev, u32 *out)
+{
+   u32 in[MLX5_ST_SZ_DW(dcbx_param)] = {0};
+
+   MLX5_SET(dcbx_param, in, port_number, 1);
+
+   return  mlx5_core_access_reg(mdev, in, sizeof(in), out,
+   sizeof(in), MLX5_REG_DCBX_PARAM, 0, 0);
+}
+
+int mlx5_set_port_dcbx_param(struct mlx5_core_dev *mdev, u32 *in)
+{
+   u32 out[MLX5_ST_SZ_DW(dcbx_param)];
+
+   MLX5_SET(dcbx_param, in, port_number, 1);
+
+   return mlx5_core_access_reg(mdev, in, sizeof(out), out,
+   sizeof(out), MLX5_REG_DCBX_PARAM, 0, 1);
+}
+
 int mlx5_set_port_prio_tc(struct mlx5_core_dev *mdev, u8 *prio_tc)
 {
u32 in[MLX5_ST_SZ_DW(qtct_reg)] = {0};
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 5cb9fa7..b53f19c 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -104,6 +104,8 @@ enum {
 enum {
MLX5_REG_QETCR   = 0x4005,
MLX5_REG_QTCT= 0x400a,
+   MLX5_REG_DCBX_PARAM  = 0x4020,
+   MLX5_REG_DCBX_APP= 0x4021,
MLX5_REG_PCAP= 0x5001,
MLX5_REG_PMTU= 0x5003,
MLX5_REG_PTYS= 0x5004,
@@ -123,6 +125,11 @@ enum {
MLX5_REG_MLCR= 0x902b,
 };
 
+enum mlx5_dcbx_oper_mode {
+   MLX5E_DCBX_PARAM_VER_OPER_HOST  = 0x0,
+   MLX5E_DCBX_PARAM_VER_OPER_AUTO  = 0x3,
+};
+
 enum {
MLX5_ATOMIC_OPS_CMP_SWAP= 1 << 0,
MLX5_ATOMIC_OPS_FETCH_ADD   = 1 << 1,
diff --git a/include/linux/mlx5/port.h b/include/linux/mlx5/port.h
index ddad24d..62e2259 100644
--- a/include/linux/mlx5/port.h
+++ b/include/linux/mlx5/port.h
@@ -159,4 +159,6 @@ void mlx5_query_port_fcs(struct mlx5_core_dev *mdev, bool 
*supported,
 int mlx5_query_module_eeprom(struct mlx5_core_dev *dev,
 u16 offset, u16 size, u8 *data);
 
+int mlx5_query_port_dcbx_param(struct mlx5_core_dev *mdev, u32 *out);
+int mlx5_set_port_dcbx_param(struct mlx5_core_dev *mdev, u32 *in);
 #endif /* __MLX5_PORT_H__ */
-- 
2.7.4

[PATCH net-next V2 5/6] net/mlx5e: ConnectX-4 firmware support for DCBX

2016-09-06 Thread Saeed Mahameed

From: Huy Nguyen 

DCBX is controlled by firmware by default. In this mode, firmware is
responsible for reading/sending the TLV packets from/to the remote
partner. When the driver is loaded, it can either leave DCBX in
firmware controlled mode or switch DCBX back to host controlled mode.

This patch sets up the infrastructure to support changing
the DCBX control mode.
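
A userspace sketch of the mode handling this patch introduces, modelled on mlx5e_dcbnl_is_allowed() further down: a host dcbnl request is allowed if the firmware has no DCBX capability, if DCBX is already host controlled, or if the switch back to host mode succeeds. The demo_ struct and its two boolean fields are hypothetical stand-ins for the capability bit and for the firmware command result.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_dcbx_mode { DEMO_DCBX_HOST = 0x0, DEMO_DCBX_AUTO = 0x3 };

struct demo_priv {
	bool fw_has_dcbx;       /* stands in for MLX5_CAP_GEN(mdev, dcbx) */
	bool fw_accepts_switch; /* whether the mode-change command succeeds */
	enum demo_dcbx_mode mode;
};

/* Stand-in for the firmware command that changes the DCBX mode. */
static int demo_set_dcbx_mode(struct demo_priv *priv, enum demo_dcbx_mode mode)
{
	(void)mode;
	return priv->fw_accepts_switch ? 0 : -EIO;
}

static bool demo_dcbnl_is_allowed(struct demo_priv *priv)
{
	assert(priv != NULL);

	if (!priv->fw_has_dcbx)
		return true;

	if (priv->mode == DEMO_DCBX_HOST)
		return true;

	/* Firmware controlled: try to hand control back to the host. */
	if (demo_set_dcbx_mode(priv, DEMO_DCBX_HOST))
		return false;

	priv->mode = DEMO_DCBX_HOST;
	return true;
}
```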

Signed-off-by: Huy Nguyen 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   6 +
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 147 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  26 +---
 3 files changed, 154 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 0d41287..806f5e8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -222,6 +222,7 @@ enum {
 };
 
 struct mlx5e_dcbx {
+   enum mlx5_dcbx_oper_mode   mode;
struct mlx5e_cee_config cee_cfg; /* pending configuration */
 
/* The only setting that cannot be read from FW */
@@ -810,6 +811,11 @@ extern const struct ethtool_ops mlx5e_ethtool_ops;
 #ifdef CONFIG_MLX5_CORE_EN_DCB
 extern const struct dcbnl_rtnl_ops mlx5e_dcbnl_ops;
 int mlx5e_dcbnl_ieee_setets_core(struct mlx5e_priv *priv, struct ieee_ets 
*ets);
+int mlx5e_dcbnl_set_dcbx_mode(struct mlx5e_priv *priv,
+ enum mlx5_dcbx_oper_mode mode);
+void mlx5e_dcbnl_query_dcbx_mode(struct mlx5e_priv *priv,
+enum mlx5_dcbx_oper_mode *mode);
+void mlx5e_dcbnl_initialize(struct mlx5e_priv *priv);
 #endif
 
 #ifndef CONFIG_RFS_ACCEL
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
index 8f6b5a7..c33cdba 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
@@ -41,6 +41,26 @@
 #define MLX5E_CEE_STATE_UP    1
 #define MLX5E_CEE_STATE_DOWN  0
 
+/* If dcbx mode is non-host and qos_with_dcbx_by_fw is off, set the
+ * dcbx mode to host.
+ */
+static inline bool mlx5e_dcbnl_is_allowed(struct mlx5e_priv *priv)
+{
+   struct mlx5e_dcbx *dcbx = &priv->dcbx;
+
+   if (!MLX5_CAP_GEN(priv->mdev, dcbx))
+   return true;
+
+   if (dcbx->mode == MLX5E_DCBX_PARAM_VER_OPER_HOST)
+   return true;
+
+   if (mlx5e_dcbnl_set_dcbx_mode(priv, MLX5E_DCBX_PARAM_VER_OPER_HOST))
+   return false;
+
+   dcbx->mode = MLX5E_DCBX_PARAM_VER_OPER_HOST;
+   return true;
+}
+
 static int mlx5e_dcbnl_ieee_getets(struct net_device *netdev,
   struct ieee_ets *ets)
 {
@@ -52,6 +72,9 @@ static int mlx5e_dcbnl_ieee_getets(struct net_device *netdev,
if (!MLX5_CAP_GEN(priv->mdev, ets))
return -ENOTSUPP;
 
+   if (!mlx5e_dcbnl_is_allowed(priv))
+   return -EPERM;
+
ets->ets_cap = mlx5_max_tc(priv->mdev) + 1;
for (i = 0; i < ets->ets_cap; i++) {
err = mlx5_query_port_prio_tc(mdev, i, &ets->prio_tc[i]);
@@ -199,6 +222,12 @@ static int mlx5e_dcbnl_ieee_setets(struct net_device 
*netdev,
struct mlx5e_priv *priv = netdev_priv(netdev);
int err;
 
+   if (!MLX5_CAP_GEN(priv->mdev, ets))
+   return -ENOTSUPP;
+
+   if (!mlx5e_dcbnl_is_allowed(priv))
+   return -EPERM;
+
err = mlx5e_dbcnl_validate_ets(netdev, ets);
if (err)
return err;
@@ -218,6 +247,9 @@ static int mlx5e_dcbnl_ieee_getpfc(struct net_device *dev,
struct mlx5e_pport_stats *pstats = &priv->stats.pport;
int i;
 
+   if (!mlx5e_dcbnl_is_allowed(priv))
+   return -EPERM;
+
pfc->pfc_cap = mlx5_max_tc(mdev) + 1;
for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
pfc->requests[i]= PPORT_PER_PRIO_GET(pstats, i, tx_pause);
@@ -235,6 +267,9 @@ static int mlx5e_dcbnl_ieee_setpfc(struct net_device *dev,
u8 curr_pfc_en;
int ret;
 
+   if (!mlx5e_dcbnl_is_allowed(priv))
+   return -EPERM;
+
mlx5_query_port_pfc(mdev, &curr_pfc_en, NULL);
 
if (pfc->pfc_en == curr_pfc_en)
@@ -255,6 +290,9 @@ static u8 mlx5e_dcbnl_getdcbx(struct net_device *dev)
 
 static u8 mlx5e_dcbnl_setdcbx(struct net_device *dev, u8 mode)
 {
+   if (!mlx5e_dcbnl_is_allowed(netdev_priv(dev)))
+   return 1;
+
if ((mode & DCB_CAP_DCBX_LLD_MANAGED) ||
!(mode & DCB_CAP_DCBX_VER_CEE) ||
!(mode & DCB_CAP_DCBX_VER_IEEE) ||
@@ -274,6 +312,9 @@ static int mlx5e_dcbnl_ieee_getmaxrate(struct net_device 
*netdev,
int err;
int i;
 
+   if (!mlx5e_dcbnl_is_allowed(priv))
+   return -EPERM;
+
err = mlx5_query_port_ets_rate_limit(mdev, max_bw_value, max_bw_unit);
if

[PATCH net-next V2 6/6] net/mlx5: Add handling for port module event

2016-09-06 Thread Saeed Mahameed

From: Huy Nguyen 

Add dmesg log for asynchronous port module event.

Signed-off-by: Huy Nguyen 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c   | 12 +++
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|  1 +
 drivers/net/ethernet/mellanox/mlx5/core/port.c | 85 ++
 include/linux/mlx5/device.h| 11 +++
 include/linux/mlx5/mlx5_ifc.h  |  3 +-
 5 files changed, 111 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index aaca090..d775fea 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -139,6 +139,8 @@ static const char *eqe_type_str(u8 type)
return "MLX5_EVENT_TYPE_PORT_CHANGE";
case MLX5_EVENT_TYPE_GPIO_EVENT:
return "MLX5_EVENT_TYPE_GPIO_EVENT";
+   case MLX5_EVENT_TYPE_PORT_MODULE_EVENT:
+   return "MLX5_EVENT_TYPE_PORT_MODULE_EVENT";
case MLX5_EVENT_TYPE_REMOTE_CONFIG:
return "MLX5_EVENT_TYPE_REMOTE_CONFIG";
case MLX5_EVENT_TYPE_DB_BF_CONGESTION:
@@ -285,6 +287,11 @@ static int mlx5_eq_int(struct mlx5_core_dev *dev, struct 
mlx5_eq *eq)
mlx5_eswitch_vport_event(dev->priv.eswitch, eqe);
break;
 #endif
+
+   case MLX5_EVENT_TYPE_PORT_MODULE_EVENT:
+   mlx5_port_module_event(dev, eqe);
+   break;
+
default:
mlx5_core_warn(dev, "Unhandled event 0x%x on EQ 0x%x\n",
   eqe->type, eq->eqn);
@@ -480,6 +487,11 @@ int mlx5_start_eqs(struct mlx5_core_dev *dev)
mlx5_core_is_pf(dev))
async_event_mask |= (1ull << MLX5_EVENT_TYPE_NIC_VPORT_CHANGE);
 
+   if (MLX5_CAP_GEN(dev, port_module_event))
+   async_event_mask |= (1ull << MLX5_EVENT_TYPE_PORT_MODULE_EVENT);
+   else
+   mlx5_core_dbg(dev, "port_module_event is not set\n");
+
err = mlx5_create_map_eq(dev, >cmd_eq, MLX5_EQ_VEC_CMD,
 MLX5_NUM_CMD_EQE, 1ull << MLX5_EVENT_TYPE_CMD,
 "mlx5_cmd_eq", >priv.uuari.uars[0]);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h 
b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 714b71b..d023d05 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -87,6 +87,7 @@ int mlx5_cmd_init_hca(struct mlx5_core_dev *dev);
 int mlx5_cmd_teardown_hca(struct mlx5_core_dev *dev);
 void mlx5_core_event(struct mlx5_core_dev *dev, enum mlx5_dev_event event,
 unsigned long param);
+void mlx5_port_module_event(struct mlx5_core_dev *dev, struct mlx5_eqe *eqe);
 void mlx5_enter_error_state(struct mlx5_core_dev *dev);
 void mlx5_disable_device(struct mlx5_core_dev *dev);
 int mlx5_core_sriov_configure(struct pci_dev *dev, int num_vfs);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c 
b/drivers/net/ethernet/mellanox/mlx5/core/port.c
index 8d409b2..e6d49d5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c
@@ -36,6 +36,25 @@
 #include 
 #include "mlx5_core.h"
 
+#define PORT_MODULE_EVENT_MODULE_STATUS_MASK 0xF
+#define PORT_MODULE_EVENT_ERROR_TYPE_MASK 0xF
+enum {
+   MLX5_MODULE_STATUS_PLUGGED  = 0x1,
+   MLX5_MODULE_STATUS_UNPLUGGED  = 0x2,
+   MLX5_MODULE_STATUS_ERROR  = 0x3,
+};
+
+enum {
+   MLX5_MODULE_EVENT_ERROR_POWER_BUDGET_EXCEEDED  = 0x0,
+   MLX5_MODULE_EVENT_ERROR_LONG_RANGE_FOR_NON_MLNX_CABLE_MODULE  = 0x1,
+   MLX5_MODULE_EVENT_ERROR_BUS_STUCK  = 0x2,
+   MLX5_MODULE_EVENT_ERROR_NO_EEPROM_RETRY_TIMEOUT  = 0x3,
+   MLX5_MODULE_EVENT_ERROR_ENFORCE_PART_NUMBER_LIST  = 0x4,
+   MLX5_MODULE_EVENT_ERROR_UNKNOWN_IDENTIFIER  = 0x5,
+   MLX5_MODULE_EVENT_ERROR_HIGH_TEMPERATURE  = 0x6,
+   MLX5_MODULE_EVENT_ERROR_BAD_CABLE = 0x7,
+};
+
 int mlx5_core_access_reg(struct mlx5_core_dev *dev, void *data_in,
 int size_in, void *data_out, int size_out,
 u16 reg_id, int arg, int write)
@@ -809,3 +828,69 @@ void mlx5_query_port_fcs(struct mlx5_core_dev *mdev, bool 
*supported,
*supported = !!(MLX5_GET(pcmr_reg, out, fcs_cap));
*enabled = !!(MLX5_GET(pcmr_reg, out, fcs_chk));
 }
+
+static const char *mlx5_port_event_error_type_to_string(u8 error_type)
+{
+   switch (error_type) {
+   case MLX5_MODULE_EVENT_ERROR_POWER_BUDGET_EXCEEDED:
+   return "Power Budget Exceeded";
+
+   case MLX5_MODULE_EVENT_ERROR_LONG_RANGE_FOR_NON_MLNX_CABLE_MODULE:
+   return "Long Range for non MLNX cable/module";
+
+   case

Re: PROBLEM: TPROXY and DNAT broken (bisected to 079096f103fa)

2016-09-06 Thread Brandon Cazander

Sorry to resurrect this so much later—I just got back from holidays and this 
was still on my desk.

Will anyone have another chance to look at this? It appears that the DIVERT 
rule is not working in our case, and I wonder if it is possible to fix the 
TPROXY target as well as the socket target fix that Florian provided.

It appears as though nobody else has encountered this regression, so I can 
appreciate that it comes up pretty low on the priority list. If it is not 
realistic that this will be looked at further, then we will have to look at 
replacing TPROXY.

Thanks for your time.


From: Brandon Cazander
Sent: Monday, August 15, 2016 9:28 AM
To: Florian Westphal
Cc: netdev@vger.kernel.org; Eric Dumazet
Subject: Re: PROBLEM: TPROXY and DNAT broken (bisected to 079096f103fa)
    
I can recreate the issue with these rules:

ip rule add fwmark 1 lookup 100
ip route add local 0.0.0.0/0 dev lo table 100
iptables -t mangle -A PREROUTING -p tcp -m tcp --dport 8080 -j TPROXY --on-port 
9876 --on-ip 0.0.0.0 --tproxy-mark 0x1/0x1
iptables -t nat -A PREROUTING -d 192.168.7.20/32 -i eth0 -j DNAT 
--to-destination 192.168.8.1

If I add in the DIVERT chain it works:

iptables -t mangle -N DIVERT
iptables -t mangle -A PREROUTING -p tcp -m socket -j DIVERT
iptables -t mangle -A DIVERT -j MARK --set-mark 1
iptables -t mangle -A DIVERT -j ACCEPT
iptables -t mangle -A PREROUTING -p tcp -m tcp --dport 8080 -j TPROXY --on-port 
9876 --on-ip 0.0.0.0 --tproxy-mark 0x1/0x1

But that's still a regression in my opinion.

Re: [PATCH net-next 1/3] smsc95xx: Add maintainer

2016-09-06 Thread Steve Glendinning

On 2 September 2016 at 21:34,   wrote:
> From: Woojung Huh 
>
> Add Microchip Linux Driver Support as maintainer
> because this driver is maintained by Microchip.
>
> Signed-off-by: Woojung Huh 
> ---

Acked-by: Steve Glendinning

Re: [patch net-next RFC 1/2] fib: introduce fib notification infrastructure

2016-09-06 Thread Jiri Pirko

Tue, Sep 06, 2016 at 05:11:11PM CEST, d...@cumulusnetworks.com wrote:
>On 9/6/16 8:44 AM, Jiri Pirko wrote:
>> Tue, Sep 06, 2016 at 04:32:12PM CEST, d...@cumulusnetworks.com wrote:
>>> On 9/6/16 6:01 AM, Jiri Pirko wrote:
>>>> From: Jiri Pirko
>>>>
>>>> This allows to pass information about added/deleted fib entries to
>>>> whoever is interested. This is done in a very similar way as devinet
>>>> notifies address additions/removals.
>>>>
>>>> Signed-off-by: Jiri Pirko
>>>> ---
>>>>  include/net/ip_fib.h | 19 +++
>>>>  net/ipv4/fib_trie.c  | 43 +++
>>>>  2 files changed, 62 insertions(+)
>>>>
>>>
>>> The notifier infrastructure should be generalized for use with IPv4 and 
>>> IPv6. While the data will be family based, the infra can be generic.
>>>
>> 
>> Yeah, that I thought about as well. Thing is, ipv6 notifier has to be
>> atomic. That is the reason we have:
>> inetaddr_chain and register_inetaddr_notifier (blocking notifier)
>> inet6addr_chain and register_inet6addr_notifier (atomic notifier)
>> 
>
>Why is IPv6 atomic? Looking at code paths for adding addresses seems like all 
>of the locks are dropped before the notifier is called and adding and deleting 
>ipv6 addresses does not show a hit with this WARN_ON:


Maybe historic reasons. Would be good to unite the notifiers then. I'll
look at it.


>
>
>diff --git a/net/ipv6/addrconf_core.c b/net/ipv6/addrconf_core.c
>index bfa941fc1165..4f9f964d95e5 100644
>--- a/net/ipv6/addrconf_core.c
>+++ b/net/ipv6/addrconf_core.c
>@@ -103,6 +103,7 @@ EXPORT_SYMBOL(unregister_inet6addr_notifier);
>
> int inet6addr_notifier_call_chain(unsigned long val, void *v)
> {
>+WARN_ON(in_atomic());
>return atomic_notifier_call_chain(&inet6addr_chain, val, v);
> }
> EXPORT_SYMBOL(inet6addr_notifier_call_chain);

Re: [RFC PATCH 1/2] macb: Add 1588 support in Cadence GEM.

2016-09-06 Thread Richard Cochran


I have some issues with this patch...

On Fri, Sep 02, 2016 at 02:53:36PM +0200, Andrei Pistirica wrote:

> - Frequency adjustment is not directly supported by this IP.
>   addend is the initial value ns increment and similarly addendesub.
>   The ppb (parts per billion) provided is used as
>   ns_incr = addend +/- (ppb/rate).
>   Similarly the remainder of the above is used to populate subns increment.
>   In case the ppb requested is negative AND subns adjustment greater than
>   the addendsub, ns_incr is reduced by 1 and subns_incr is adjusted in
>   positive accordingly.

This makes no sense.  If you cannot adjust the frequency, then you
must implement a timecounter/cyclecounter and do in software.
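
Whichever layer ends up applying the adjustment, the ppb arithmetic itself can be sketched in plain C. This is a userspace illustration only: the 16-bit sub-nanosecond fraction and the helper name scaled_increment() are assumptions, not taken from the patch or the TRM.

```c
#include <assert.h>
#include <stdint.h>

#define SUBNS_BITS 16  /* assumed width of the sub-ns fraction */

/*
 * Scale a nominal per-tick increment by ppb parts per billion.
 * incr is fixed point: (ns << SUBNS_BITS) | subns.
 */
uint64_t scaled_increment(uint64_t incr, int32_t ppb)
{
    int neg = ppb < 0;
    uint64_t mag = neg ? (uint64_t)(-(int64_t)ppb) : (uint64_t)ppb;
    uint64_t delta = incr * mag / 1000000000ULL;

    /* A negative adjustment must not underflow the increment. */
    assert(!neg || delta <= incr);
    return neg ? incr - delta : incr + delta;
}
```

Keeping the nanosecond and sub-nanosecond parts in one fixed-point word avoids the separate borrow handling that the patch description spells out for negative ppb values.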

> diff --git a/drivers/net/ethernet/cadence/macb.h 
> b/drivers/net/ethernet/cadence/macb.h
> index 3f385ab..8c3779d 100644
> --- a/drivers/net/ethernet/cadence/macb.h
> +++ b/drivers/net/ethernet/cadence/macb.h
> @@ -10,6 +10,12 @@
>  #ifndef _MACB_H
>  #define _MACB_H
>  
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 

Here in the header file, you only need ptp_clock_kernel.h.  You don't
need all the others here.  Move them to the .c file, but only if
really needed.  (timecounter.h isn't used, now is it?)

> @@ -129,6 +135,20 @@
>  #define GEM_RXIPCCNT 0x01a8 /* IP header Checksum Error Counter */
>  #define GEM_RXTCPCCNT0x01ac /* TCP Checksum Error Counter */
>  #define GEM_RXUDPCCNT0x01b0 /* UDP Checksum Error Counter */

> +#define GEM_TISUBN   0x01bc /* 1588 Timer Increment Sub-ns */

This register does not exist.  Looking at
 
   Zynq-7000 AP SoC Technical Reference Manual
   UG585 (v1.10) February 23, 2015

starting on page 1273 we see:

udp_csum_errors  0x01B0  32  ro     0x  UDP checksum error
timer_strobe_s   0x01C8  32  rw     0x  1588 timer sync strobe seconds
timer_strobe_ns  0x01CC  32  mixed  0x  1588 timer sync strobe nanoseconds
timer_s          0x01D0  32  rw     0x  1588 timer seconds
timer_ns         0x01D4  32  mixed  0x  1588 timer nanoseconds
timer_adjust     0x01D8  32  mixed  0x  1588 timer adjust
timer_incr       0x01DC  32  mixed  0x  1588 timer increment
There is no register at 0x1BC.

> +#define GEM_TSH  0x01c0 /* 1588 Timer Seconds High */

This one doesn't exist either.  What is going on here?

> +#define GEM_TSL  0x01d0 /* 1588 Timer Seconds Low */
> +#define GEM_TN   0x01d4 /* 1588 Timer Nanoseconds */
> +#define GEM_TA   0x01d8 /* 1588 Timer Adjust */
> +#define GEM_TI   0x01dc /* 1588 Timer Increment */
> +#define GEM_EFTSL0x01e0 /* PTP Event Frame Tx Seconds Low */
> +#define GEM_EFTN 0x01e4 /* PTP Event Frame Tx Nanoseconds */
> +#define GEM_EFRSL0x01e8 /* PTP Event Frame Rx Seconds Low */
> +#define GEM_EFRN 0x01ec /* PTP Event Frame Rx Nanoseconds */
> +#define GEM_PEFTSL   0x01f0 /* PTP Peer Event Frame Tx Secs Low */
> +#define GEM_PEFTN0x01f4 /* PTP Peer Event Frame Tx Ns */
> +#define GEM_PEFRSL   0x01f8 /* PTP Peer Event Frame Rx Sec Low */
> +#define GEM_PEFRN0x01fc /* PTP Peer Event Frame Rx Ns */

BTW, it is really annoying that you invent new register names.  Why
can't you use the names from the TRM?

> +#ifdef CONFIG_MACB_USE_HWSTAMP
> +void macb_ptp_init(struct net_device *ndev);
> +#else
> +void macb_ptp_init(struct net_device *ndev) { }

This should be static inline.

> +#endif

> diff --git a/drivers/net/ethernet/cadence/macb_ptp.c 
> b/drivers/net/ethernet/cadence/macb_ptp.c
> new file mode 100644
> index 000..6d6a6ec
> --- /dev/null
> +++ b/drivers/net/ethernet/cadence/macb_ptp.c
> @@ -0,0 +1,224 @@
> +/*
> + * PTP 1588 clock for SAMA5D2 platform.
> + *
> + * Copyright (C) 2015 Xilinx Inc.
> + * Copyright (C) 2016 Microchip Technology
> + *
> + * Authors: Harini Katakam 
> + *
> + * This file is licensed under the terms of the GNU General Public
> + * License version 2. This program is licensed "as is" without any
> + * warranty of any kind, whether express or implied.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "macb.h"
> +
> +#define  GMAC_TIMER_NAME "gmac-ptp"
> +
> +static inline void macb_tsu_get_time(struct macb *bp, struct timespec64 *ts)
> +{
> + u64 sech, secl;
> +
> + /* get GEM internal time */
> + sech = gem_readl(bp, TSH);
> + secl = gem_readl(bp, TSL);

Does reading TSH latch the time?  The TRM is silent about that, and
most other designs latch on reading the LSB.
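
When the latching behaviour is unknown, one common defensive pattern is to read the high word, then the low word, then the high word again, and retry if the high word changed. The sketch below models that in userspace; struct fake_timer and its accessors are test doubles invented for this example, not driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the hardware: a free-running seconds counter that
 * advances by `tick` every time the low word is read. */
struct fake_timer {
    uint64_t now;
    uint64_t tick;
};

static uint32_t fake_read_hi(struct fake_timer *t)
{
    return (uint32_t)(t->now >> 32);
}

static uint32_t fake_read_lo(struct fake_timer *t)
{
    uint32_t lo = (uint32_t)t->now;

    t->now += t->tick;
    return lo;
}

/* Read high, low, high again; retry if the high word changed. */
uint64_t read_seconds(struct fake_timer *t)
{
    uint32_t hi, lo;

    assert(t != NULL);
    do {
        hi = fake_read_hi(t);
        lo = fake_read_lo(t);
    } while (hi != fake_read_hi(t));

    return ((uint64_t)hi << 32) | lo;
}
```

If the hardware does latch on one of the reads, the retry loop is redundant but harmless.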

> + ts->tv_sec = (sech << 32) | secl;
> + ts->tv_nsec = gem_readl(bp, TN);
> +}
> +
> +static inline void macb_tsu_set_time(struct macb *bp,
> +  const struct timespec64 *ts)
> +{
> + u32 ns, sech, secl;
> +

Re: [PATCH v3 5/6] net: core: run cgroup eBPF egress programs

2016-09-06 Thread Daniel Borkmann


On 09/05/2016 04:22 PM, Daniel Mack wrote:
> On 08/30/2016 12:03 AM, Daniel Borkmann wrote:
>> On 08/26/2016 09:58 PM, Daniel Mack wrote:
>>>
>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>> index a75df86..17484e6 100644
>>> --- a/net/core/dev.c
>>> +++ b/net/core/dev.c
>>> @@ -141,6 +141,7 @@
>>>    #include 
>>>    #include 
>>>    #include 
>>> +#include 
>>>
>>>    #include "net-sysfs.h"
>>>
>>> @@ -3329,6 +3330,11 @@ static int __dev_queue_xmit(struct sk_buff *skb, void *accel_priv)
>>> if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP))
>>> __skb_tstamp_tx(skb, NULL, skb->sk, SCM_TSTAMP_SCHED);
>>>
>>> +   rc = cgroup_bpf_run_filter(skb->sk, skb,
>>> +  BPF_ATTACH_TYPE_CGROUP_INET_EGRESS);
>>> +   if (rc)
>>> +   return rc;
>>
>> This would leak the whole skb by the way.
>
> Ah, right.
>
>> Apart from that, could this be modeled w/o affecting the forwarding
>> path (at some local output point where we know to have a valid
>> socket)? Then you could also drop the !sk and sk->sk_family tests,
>> and we wouldn't need to replicate parts of what clsact is doing as
>> well. Hmm, maybe access to src/dst mac could be handled to be just
>> zeroes since not available at that point?
>
> Hmm, I wonder where this hook could be put instead then. When placed in
> ip_output() and ip6_output(), the mac headers cannot be pushed before
> running the program, resulting in bogus skb data from the eBPF program.


But as it stands right now, RX will only see a subset of packets in sk_filter()
layer (depending on where it's called in the proto handler implementation,
so might not even include all control messages, for example) as opposed to
the TX hook going that far even 'seeing' everything incl. forwarded packets
in the sense that we know a priori that these kind of skbs going through the
cgroup_bpf_run_filter() handler when the hook is enabled will just skip this
hook eventually anyway. What about letting such progs see /only/ local skbs
for RX and TX, with skb->data from L3 onwards (iirc, that would be similar
to what current sk_filter() programs see)?

Re: [patch net-next RFC 1/2] fib: introduce fib notification infrastructure

2016-09-06 Thread Hannes Frederic Sowa

On Tue, Sep 6, 2016, at 17:49, Jiri Pirko wrote:
> Tue, Sep 06, 2016 at 05:11:11PM CEST, d...@cumulusnetworks.com wrote:
> >On 9/6/16 8:44 AM, Jiri Pirko wrote:
> >> Tue, Sep 06, 2016 at 04:32:12PM CEST, d...@cumulusnetworks.com wrote:
> >>> On 9/6/16 6:01 AM, Jiri Pirko wrote:
> >>>> From: Jiri Pirko
> >>>>
> >>>> This allows to pass information about added/deleted fib entries to
> >>>> whoever is interested. This is done in a very similar way as devinet
> >>>> notifies address additions/removals.
> >>>>
> >>>> Signed-off-by: Jiri Pirko
> >>>> ---
> >>>>  include/net/ip_fib.h | 19 +++
> >>>>  net/ipv4/fib_trie.c  | 43 +++
> >>>>  2 files changed, 62 insertions(+)
> >>>>
> >>>
> >>> The notifier infrastructure should be generalized for use with IPv4 and 
> >>> IPv6. While the data will be family based, the infra can be generic.
> >>>
> >> 
> >> Yeah, that I thought about as well. Thing is, ipv6 notifier has to be
> >> atomic. That is the reason we have:
> >> inetaddr_chain and register_inetaddr_notifier (blocking notifier)
> >> inet6addr_chain and register_inet6addr_notifier (atomic notifier)
> >> 
> >
> >Why is IPv6 atomic? Looking at code paths for adding addresses seems like 
> >all of the locks are dropped before the notifier is called and adding and 
> >deleting ipv6 addresses does not show a hit with this WARN_ON:
> 
> 
> Maybe historic reasons. Would be good to unite the notifiers then. I'll
> look at it.

We add IPs and routes from bottom half layer because of neighbour
discovery router advertisements. They need to run in atomic context
without sleeping.

Bye,
Hannes

Re: [PATCH v4 2/6] cgroup: add support for eBPF programs

2016-09-06 Thread Daniel Borkmann


On 09/06/2016 03:46 PM, Daniel Mack wrote:

This patch adds two sets of eBPF program pointers to struct cgroup.
One for such that are directly pinned to a cgroup, and one for such
that are effective for it.

To illustrate the logic behind that, assume the following example
cgroup hierarchy.

   A - B - C
         \ D - E

If only B has a program attached, it will be effective for B, C, D
and E. If D then attaches a program itself, that will be effective for
both D and E, and the program in B will only affect B and C. Only one
program of a given type is effective for a cgroup.
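
The inheritance rule in this paragraph can be modelled as "nearest ancestor with a program wins". The following userspace sketch is only an illustration: struct cg and effective_prog() are hypothetical names, and it recomputes the answer by walking towards the root, whereas the patch description says the effective pointers are stored per cgroup.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a cgroup node; prog == 0 means "nothing attached". */
struct cg {
    struct cg *parent;
    int prog;
};

/* Walk towards the root and return the nearest attached program,
 * or 0 if neither the node nor any ancestor has one. */
int effective_prog(const struct cg *c)
{
    for (; c != NULL; c = c->parent)
        if (c->prog != 0)
            return c->prog;
    return 0;
}
```

With B carrying program 1, C and E both resolve to 1; once D attaches program 2, E resolves to 2 while C still resolves to 1.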

Attaching and detaching programs will be done through the bpf(2)
syscall. For now, ingress and egress inet socket filtering are the
only supported use-cases.

Signed-off-by: Daniel Mack 

[...]

+/**
+ * __cgroup_bpf_run_filter() - Run a program for packet filtering
+ * @sk: The socket sending or receiving traffic
+ * @skb: The skb that is being sent or received
+ * @type: The type of program to be executed
+ *
+ * If no socket is passed, or the socket is not of type INET or INET6,
+ * this function does nothing and returns 0.
+ *
+ * The program type passed in via @type must be suitable for network
+ * filtering. No further check is performed to assert that.
+ *
+ * This function will return %-EPERM if an attached program was found
+ * and if it returned != 1 during execution. In all other cases, 0 is returned.
+ */
+int __cgroup_bpf_run_filter(struct sock *sk,
+   struct sk_buff *skb,
+   enum bpf_attach_type type)
+{
+   struct bpf_prog *prog;
+   struct cgroup *cgrp;
+   int ret = 0;
+
+   if (!sk)
+   return 0;


Doesn't this also need to check || !sk_fullsock(sk)?


+
+   if (sk->sk_family != AF_INET &&
+   sk->sk_family != AF_INET6)
+   return 0;
+
+   cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
+
+   rcu_read_lock();
+
+   prog = rcu_dereference(cgrp->bpf.effective[type]);
+   if (prog) {
+   unsigned int offset = skb->data - skb_mac_header(skb);
+
+   __skb_push(skb, offset);
+   ret = bpf_prog_run_clear_cb(prog, skb) == 1 ? 0 : -EPERM;
+   __skb_pull(skb, offset);
+   }
+
+   rcu_read_unlock();
+
+   return ret;
+}
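
The push/run/pull sequence and the verdict mapping in the quoted function can be tried out in userspace with a toy packet. Everything below (struct pkt, prog_fn, run_filter() and the two sample programs) is hypothetical scaffolding for illustration, not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy packet: `mac` is the offset of the link-layer header inside buf,
 * `data` the offset of the current payload view (mac <= data). */
struct pkt {
    unsigned char buf[64];
    size_t mac;
    size_t data;
};

typedef int (*prog_fn)(const struct pkt *p);

/* Sample programs: 1 means "pass", anything else means "drop". */
static int prog_pass(const struct pkt *p) { (void)p; return 1; }
static int prog_drop(const struct pkt *p) { (void)p; return 0; }

int run_filter(struct pkt *p, prog_fn prog)
{
    size_t offset;
    int ret;

    if (prog == NULL)
        return 0;               /* nothing attached: allow */

    assert(p->mac <= p->data);
    offset = p->data - p->mac;
    p->data -= offset;          /* "push": expose the MAC header */
    ret = prog(p) == 1 ? 0 : -EPERM;
    p->data += offset;          /* "pull": restore the original view */
    return ret;
}
```

The packet view is restored regardless of the verdict, so the caller sees the same data offset it passed in.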

[PATCH net-next] sfc: check MTU against minimum threshold

2016-09-06 Thread Bert Kenward

Reported-by: Ma Yuying 
Suggested-by: Jarod Wilson 
Signed-off-by: Bert Kenward 
---
 drivers/net/ethernet/sfc/efx.c| 12 +++-
 drivers/net/ethernet/sfc/net_driver.h |  3 +++
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index f3826ae..3cf3557 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -2263,8 +2263,18 @@ static int efx_change_mtu(struct net_device *net_dev, 
int new_mtu)
rc = efx_check_disabled(efx);
if (rc)
return rc;
-   if (new_mtu > EFX_MAX_MTU)
+   if (new_mtu > EFX_MAX_MTU) {
+   netif_err(efx, drv, efx->net_dev,
+ "Requested MTU of %d too big (max: %d)\n",
+ new_mtu, EFX_MAX_MTU);
return -EINVAL;
+   }
+   if (new_mtu < EFX_MIN_MTU) {
+   netif_err(efx, drv, efx->net_dev,
+ "Requested MTU of %d too small (min: %d)\n",
+ new_mtu, EFX_MIN_MTU);
+   return -EINVAL;
+   }
 
netif_dbg(efx, drv, efx->net_dev, "changing MTU to %d\n", new_mtu);
 
diff --git a/drivers/net/ethernet/sfc/net_driver.h 
b/drivers/net/ethernet/sfc/net_driver.h
index 0a2504b..99d8c82 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -76,6 +76,9 @@
 /* Maximum possible MTU the driver supports */
 #define EFX_MAX_MTU (9 * 1024)
 
+/* Minimum MTU, from RFC791 (IP) */
+#define EFX_MIN_MTU 68
+
 /* Size of an RX scatter buffer.  Small enough to pack 2 into a 4K page,
  * and should be a multiple of the cache line size.
  */
-- 
2.7.4
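
The bounds check added by this patch reduces to a simple range test. A standalone sketch, with MIN_MTU and MAX_MTU mirroring the values of EFX_MIN_MTU and EFX_MAX_MTU from the diff and check_mtu() being a name made up for this example:

```c
#include <assert.h>
#include <errno.h>

#define MAX_MTU (9 * 1024)  /* same value as EFX_MAX_MTU in the diff */
#define MIN_MTU 68          /* same value as EFX_MIN_MTU in the diff */

/* Return 0 if new_mtu is within [MIN_MTU, MAX_MTU], -EINVAL otherwise. */
int check_mtu(int new_mtu)
{
    if (new_mtu > MAX_MTU || new_mtu < MIN_MTU)
        return -EINVAL;
    return 0;
}
```

Both limits are inclusive: 68 and 9216 are accepted, 67 and 9217 are rejected.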

[PATCH net-next] netlink: don't forget to release a rhashtable_iter structure

2016-09-06 Thread Andrei Vagin

This bug was detected by kmemleak:
unreferenced object 0x8804269cc3c0 (size 64):
  comm "criu", pid 1042, jiffies 4294907360 (age 13.713s)
  hex dump (first 32 bytes):
a0 32 cc 2c 04 88 ff ff 00 00 00 00 00 00 00 00  .2.,
00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de  
  backtrace:
[] kmemleak_alloc+0x4a/0xa0
[] kmem_cache_alloc_trace+0x10f/0x280
[] __netlink_diag_dump+0x26c/0x290 [netlink_diag]

Cc: Herbert Xu 
Fixes: ad202074320c ("netlink: Use rhashtable walk interface in diag dump")
Signed-off-by: Andrei Vagin 
---
 net/netlink/diag.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/netlink/diag.c b/net/netlink/diag.c
index 3e3e253..951670c 100644
--- a/net/netlink/diag.c
+++ b/net/netlink/diag.c
@@ -127,6 +127,7 @@ stop:
goto done;
 
rhashtable_walk_exit(hti);
+   kfree(hti);
cb->args[2] = 0;
num++;
 
-- 
2.5.5
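
The fix pairs the walker teardown with freeing the iterator allocation. The same ownership rule can be shown with a toy iterator in userspace; struct iter and the iter_*() helpers are hypothetical and only model the allocate, exit, free sequence.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical iterator; stands in for a heap-allocated walker. */
struct iter {
    int pos;
    int active;
};

static int live_iters;  /* allocations not yet freed */

struct iter *iter_start(void)
{
    struct iter *it = malloc(sizeof(*it));

    if (it == NULL)
        return NULL;
    it->pos = 0;
    it->active = 1;
    live_iters++;
    return it;
}

/* Tear down walker state; this alone does not free the memory. */
void iter_exit(struct iter *it)
{
    it->active = 0;
}

/* Finish a walk: exit the walker, then release the allocation. */
void iter_finish(struct iter *it)
{
    iter_exit(it);
    free(it);
    live_iters--;
}

int iter_live_count(void)
{
    return live_iters;
}
```

Calling only iter_exit() at the end of a walk would leave the live count at one, which is the kind of leak kmemleak reported above.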

Re: [RFC PATCH kernel] PCI: Enable access to custom VPD for Chelsio devices (cxgb3)

2016-09-06 Thread Alexander Duyck

On Tue, Sep 6, 2016 at 8:48 AM, Bjorn Helgaas  wrote:
> Hi Alexey,
>
> On Thu, Aug 11, 2016 at 08:03:29PM +1000, Alexey Kardashevskiy wrote:
>> There is at least one Chelsio 10Gb card which uses VPD area to store
>> some custom blocks (example below). However pci_vpd_size() returns
>> the length of the first block only assuming that there can be only
>> one VPD "End Tag" and VFIO blocks access beyond that offset
>> (since 4e1a63555) which leads to the situation when the guest "cxgb3"
>> driver fails to probe the device. The host system does not have this
>> problem as the driver accesses the config space directly without
>> pci_read_vpd()/...
>>
>> This adds a quirk to override the VPD size to a bigger value.
>>
>> This is the controller:
>> Ethernet controller [0200]: Chelsio Communications Inc T310 10GbE Single 
>> Port Adapter [1425:0030]
>>
>> This is its VPD:
>> # Large item 42 bytes; name 0x2 Identifier String
>>   b'10 Gigabit Ethernet-SR PCI Express Adapter'
>> #002d Large item 74 bytes; name 0x10
>>   #00 [EC] len=7: b'D76809 '
>>   #0a [FN] len=7: b'46K7897'
>>   #14 [PN] len=7: b'46K7897'
>>   #1e [MN] len=4: b'1037'
>>   #25 [FC] len=4: b'5769'
>>   #2c [SN] len=12: b'YL102035603V'
>>   #3b [NA] len=12: b'00145E992ED1'
>> #007a Small item 1 bytes; name 0xf End Tag
>> ---
>> #0c00 Large item 16 bytes; name 0x2 Identifier String
>>   b'S310E-SR-X  '
>> #0c13 Large item 234 bytes; name 0x10
>>   #00 [PN] len=16: b'TBD '
>>   #13 [EC] len=16: b'110107730D2 '
>>   #26 [SN] len=16: b'97YL102035603V  '
>>   #39 [NA] len=12: b'00145E992ED1'
>>   #48 [V0] len=6: b'175000'
>>   #51 [V1] len=6: b'26'
>>   #5a [V2] len=6: b'26'
>>   #63 [V3] len=6: b'2000  '
>>   #6c [V4] len=2: b'1 '
>>   #71 [V5] len=6: b'c2'
>>   #7a [V6] len=6: b'0 '
>>   #83 [V7] len=2: b'1 '
>>   #88 [V8] len=2: b'0 '
>>   #8d [V9] len=2: b'0 '
>>   #92 [VA] len=2: b'0 '
>>   #97 [RV] len=80: 
>> b's\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'...
>> #0d00 Large item 252 bytes; name 0x11
>>   #00 [VC] len=16: b'122310_1222 dp  '
>>   #13 [VD] len=16: b'610-0001-00 H1\x00\x00'
>>   #26 [VE] len=16: b'122310_1353 fp  '
>>   #39 [VF] len=16: b'610-0001-00 H1\x00\x00'
>>   #4c [RW] len=173: 
>> b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'...
>> #0dff Small item 0 bytes; name 0xf End Tag
>>
>> Signed-off-by: Alexey Kardashevskiy 
>> ---
>>  drivers/pci/quirks.c | 12 
>>  1 file changed, 12 insertions(+)
>>
>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>> index ee72ebe1..94d3fb5 100644
>> --- a/drivers/pci/quirks.c
>> +++ b/drivers/pci/quirks.c
>> @@ -3241,6 +3241,18 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 
>> PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C
>>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_PORT_RIDGE,
>>   quirk_thunderbolt_hotplug_msi);
>>
>> +static void quirk_chelsio_extend_vpd(struct pci_dev *dev)
>> +{
>> + if (!dev->vpd || !dev->vpd->ops || !dev->vpd->ops->set_size)
>> + return;
>> +
>> + dev->vpd->ops->set_size(dev, max_t(unsigned int, dev->vpd->len, 
>> 0xe00));
>> +}
>> +

So one thing you might want to look at doing is actually validating
there is something there before increasing the size.  If you look at
the get_vpd_params function from the cxgb3 driver you will see what
they do is verify the first tag located at 0xC00 is 0x82 before they
do any further reads.  You might do something similar just to verify
there is something there before you open it up to access by anyone.

One option would be to modify pci_vpd_size so that you can use it
outside of access.c and can pass it an offset.  Then you could update
your quirk so that you call pci_vpd_size and pass it the offset of
0xC00.  It should then be able to walk from that starting point and
reach the end of the list.  If you do then pci_vpd_size will return
the total size, else it returns 0.  So if size comes back as a
non-zero value then you could pass that into pci_set_vpd_size,
otherwise we can assume the starting offset is 0 and let the existing
code run its course.
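
To make the suggestion concrete, here is a rough userspace sketch of walking VPD items from a given offset. It is not the kernel's pci_vpd_size(); the function name vpd_size_from() and the macro names are made up for this example, and it assumes the usual encoding of large items (tag byte plus 2-byte little-endian length) and small items (name in bits 6:3, length in bits 2:0).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VPD_LARGE_FLAG  0x80  /* bit 7 set: large resource item */
#define VPD_TAG_ID_STR  0x82  /* large item, name 0x2: Identifier String */
#define VPD_SMALL_END   0x0f  /* small item name 0xf: End Tag */

/*
 * Walk VPD items in buf starting at `start`. Return the offset just
 * past the End Tag, or 0 if the data at `start` does not begin with
 * an Identifier String or no End Tag is found within len bytes.
 */
size_t vpd_size_from(const uint8_t *buf, size_t len, size_t start)
{
    size_t off = start;

    if (start >= len || buf[start] != VPD_TAG_ID_STR)
        return 0;

    while (off < len) {
        uint8_t hdr = buf[off];

        if (hdr & VPD_LARGE_FLAG) {
            /* 1 tag byte + 2-byte little-endian length */
            if (len - off < 3)
                return 0;
            off += 3 + (size_t)(buf[off + 1] | (buf[off + 2] << 8));
        } else {
            /* tag byte: name in bits 6:3, length in bits 2:0 */
            off += 1 + (size_t)(hdr & 0x07);
            if (((hdr >> 3) & 0x0f) == VPD_SMALL_END)
                return off <= len ? off : 0;
        }
    }
    return 0;
}
```

Applied to the dump quoted above with a start offset of 0xc00, the items at 0xc00, 0xc13, 0xd00 and the End Tag at 0xdff would add up to 0xe00, which matches the size used in the proposed quirk.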

>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO,
>> + PCI_ANY_ID,
>> + quirk_chelsio_extend_vpd);
>
> Do you really want this for *all* Chelsio devices?  If you only need
> it for certain devices, the quirk could probably go in the driver.
>
> Can you use pci_set_vpd_size() instead?  There's already a use of that
> in cxgb4.
>

I would assume this quirk needs to support the same device IDs as
supported by the cxgb3 driver.  If so you might just clone the ID list
from cxgb3_pci_tbl for this quirk.

Also from the looks of it the cxgb3 driver probably needs to be
updated to use the VPD accessor functions instead of just open coding
it themselves.

Re: ipv6: release dst in ping_v6_sendmsg

2016-09-06 Thread Eric Dumazet

On Tue, 2016-09-06 at 10:36 -0700, Martin KaFai Lau wrote:
> On Fri, Sep 02, 2016 at 02:39:50PM -0400, Dave Jones wrote:
> > Neither the failure nor success paths of ping_v6_sendmsg release
> > the dst it acquires.  This leads to a flood of warnings from
> > "net/core/dst.c:288 dst_release" on older kernels that
> > don't have 8bf4ada2e21378816b28205427ee6b0e1ca4c5f1 backported.
> >
> > That patch optimistically hoped this had been fixed post 3.10, but
> > it seems at least one case wasn't, where I've seen this triggered
> > a lot from machines doing unprivileged icmp sockets.
> >
> > Cc: Martin Lau 
> > Signed-off-by: Dave Jones 
> >
> > diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c
> > index 0900352c924c..0e983b694ee8 100644
> > --- a/net/ipv6/ping.c
> > +++ b/net/ipv6/ping.c
> > @@ -126,8 +126,10 @@ static int ping_v6_sendmsg(struct sock *sk, struct 
> > msghdr *msg, size_t len)
> > rt = (struct rt6_info *) dst;
> >
> > np = inet6_sk(sk);
> > -   if (!np)
> > -   return -EBADF;
> > +   if (!np) {
> > +   err = -EBADF;
> > +   goto dst_err_out;
> > +   }
> >
> > if (!fl6.flowi6_oif && ipv6_addr_is_multicast(&fl6.daddr))
> > fl6.flowi6_oif = np->mcast_oif;
> > @@ -163,6 +165,9 @@ static int ping_v6_sendmsg(struct sock *sk, struct 
> > msghdr *msg, size_t len)
> > }
> > release_sock(sk);
> >
> > +dst_err_out:
> > +   dst_release(dst);
> > +
> > if (err)
> > return err;
> >
> 
> Acked-by: Martin KaFai Lau 

This really does not make sense to me.

If np was NULL, we should have a crash before.

So we should remove this test, since it is absolutely useless.

Re: ipv6: release dst in ping_v6_sendmsg

2016-09-06 Thread Dave Jones

On Tue, Sep 06, 2016 at 10:52:43AM -0700, Eric Dumazet wrote:
 
 > > > @@ -126,8 +126,10 @@ static int ping_v6_sendmsg(struct sock *sk, struct 
 > > > msghdr *msg, size_t len)
 > > >  rt = (struct rt6_info *) dst;
 > > >
 > > >  np = inet6_sk(sk);
 > > > -if (!np)
 > > > -return -EBADF;
 > > > +if (!np) {
 > > > +err = -EBADF;
 > > > +goto dst_err_out;
 > > > +}
 > > >
 > > >  if (!fl6.flowi6_oif && ipv6_addr_is_multicast(&fl6.daddr))
 > > >  fl6.flowi6_oif = np->mcast_oif;
 > > > @@ -163,6 +165,9 @@ static int ping_v6_sendmsg(struct sock *sk, struct 
 > > > msghdr *msg, size_t len)
 > > >  }
 > > >  release_sock(sk);
 > > >
 > > > +dst_err_out:
 > > > +dst_release(dst);
 > > > +
 > > >  if (err)
 > > >  return err;
 > > >
 > > 
 > > Acked-by: Martin KaFai Lau 
 > 
 > This really does not make sense to me.
 > 
 > If np was NULL, we should have a crash before.

In the case where I was seeing the traces, we were taking the 'success'
path through the function, so sk was non-null.

 > So we should remove this test, since it is absolutely useless.

Looking closer, it seems the assignment of np is duplicated also,
so that can also go.   This is orthogonal to the dst leak though.
I'll submit a follow-up cleaning that up.

Dave
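
The leak pattern discussed in this thread, an early return that skips the release of a held reference, can be reproduced in a few lines of userspace C. struct ref and send_with_ref() are invented for this sketch; only the single-exit goto structure corresponds to the patch.

```c
#include <assert.h>
#include <errno.h>

/* Toy reference-counted object standing in for a dst entry. */
struct ref {
    int count;
};

static void ref_get(struct ref *r) { r->count++; }
static void ref_put(struct ref *r) { assert(r->count > 0); r->count--; }

/* Every path after ref_get() leaves through `out`, so the reference
 * is dropped on both the error path and the success path. */
int send_with_ref(struct ref *r, int fail_early)
{
    int err = 0;

    ref_get(r);

    if (fail_early) {
        err = -EBADF;
        goto out;
    }

    /* ... the actual work would happen here ... */

out:
    ref_put(r);
    return err;
}
```

Replacing the goto with a plain return would leave the count one higher after every failed call.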

Re: ipv6: release dst in ping_v6_sendmsg

2016-09-06 Thread Martin KaFai Lau

On Fri, Sep 02, 2016 at 02:39:50PM -0400, Dave Jones wrote:
> Neither the failure nor success paths of ping_v6_sendmsg release
> the dst it acquires.  This leads to a flood of warnings from
> "net/core/dst.c:288 dst_release" on older kernels that
> don't have 8bf4ada2e21378816b28205427ee6b0e1ca4c5f1 backported.
>
> That patch optimistically hoped this had been fixed post 3.10, but
> it seems at least one case wasn't, where I've seen this triggered
> a lot from machines doing unprivileged icmp sockets.
>
> Cc: Martin Lau 
> Signed-off-by: Dave Jones 
>
> diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c
> index 0900352c924c..0e983b694ee8 100644
> --- a/net/ipv6/ping.c
> +++ b/net/ipv6/ping.c
> @@ -126,8 +126,10 @@ static int ping_v6_sendmsg(struct sock *sk, struct 
> msghdr *msg, size_t len)
>   rt = (struct rt6_info *) dst;
>
>   np = inet6_sk(sk);
> - if (!np)
> - return -EBADF;
> + if (!np) {
> + err = -EBADF;
> + goto dst_err_out;
> + }
>
>   if (!fl6.flowi6_oif && ipv6_addr_is_multicast(&fl6.daddr))
>   fl6.flowi6_oif = np->mcast_oif;
> @@ -163,6 +165,9 @@ static int ping_v6_sendmsg(struct sock *sk, struct msghdr 
> *msg, size_t len)
>   }
>   release_sock(sk);
>
> +dst_err_out:
> + dst_release(dst);
> +
>   if (err)
>   return err;
>

Acked-by: Martin KaFai Lau

XPS configuration question (on tg3)

2016-09-06 Thread Michal Soltys

Hi,

I've been testing different configurations and I didn't manage to get XPS to 
"behave" correctly - so I'm probably misunderstanding or forgetting something. 
The nic in question (under tg3 driver - BCM5720 and BCM5719 models) was 
configured to 3 tx and 4 rx queues. 3 irqs were shared (tx and rx), 1 was 
unused (this got me scratching my head a bit) and the remaining one was for the 
last rx (though due to another bug recently fixed the 4th rx queue was 
inconfigurable on receive side). The names were: eth1b-0, eth1b-txrx-1, 
eth1b-txrx-2, eth1b-txrx-3, eth1b-rx-4.

The XPS was configured as:

echo f >/sys/class/net/eth1b/queues/tx-0/xps_cpus
echo f0 >/sys/class/net/eth1b/queues/tx-1/xps_cpus
echo ff00 >/sys/class/net/eth1b/queues/tx-2/xps_cpus

So as far as I understand - cpus 0-3 should be allowed to use tx-0 queue only, 
4-7 tx-1 and 8-15 tx-2.
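
The intended CPU-to-queue mapping can be written down as a small lookup, which makes the expectation easy to check by hand. This is a simplified model, not the kernel's XPS selection code; queue_for_cpu() is a hypothetical helper that returns the first queue whose mask contains the CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the index of the first queue whose xps_cpus mask contains
 * `cpu`, or -1 if none does.
 */
int queue_for_cpu(unsigned int cpu, const uint64_t *masks, size_t nqueues)
{
    size_t q;

    assert(cpu < 64);
    for (q = 0; q < nqueues; q++)
        if (masks[q] & (UINT64_C(1) << cpu))
            return (int)q;
    return -1;
}
```

With the masks f, f0 and ff00 from above, CPU 10 (an affinity mask of 0x400) falls into the third mask, so under this model its traffic would be expected on tx-2.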

Just in case rx side could get in the way as far as flows go, relevant irqs 
were pinned to specific cpus - txrx-1 to 2, txrx-2 to 4, txrx-3 to 10 - falling 
into groups defined by the above masks.

I tested both with mq and multiq scheduler, essentially either this:

qdisc mq 2: root
qdisc pfifo_fast 0: parent 2:1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent 2:2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent 2:3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 

or this (for the record, skbaction queue_mapping was behaving correctly with 
the one below):

qdisc multiq 3: root refcnt 6 bands 3/5
qdisc pfifo_fast 31: parent 3:1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 32: parent 3:2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 33: parent 3:3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 
1 

Now, do I understand correctly, that under the above setup - commands such as

taskset 400 nc -p $prt host_ip 12345

Re: [RFC PATCH kernel] PCI: Enable access to custom VPD for Chelsio devices (cxgb3)

2016-09-06 Thread Bjorn Helgaas

Hi Alexey,

On Thu, Aug 11, 2016 at 08:03:29PM +1000, Alexey Kardashevskiy wrote:
> There is at least one Chelsio 10Gb card which uses VPD area to store
> some custom blocks (example below). However pci_vpd_size() returns
> the length of the first block only assuming that there can be only
> one VPD "End Tag" and VFIO blocks access beyond that offset
> (since 4e1a63555) which leads to the situation when the guest "cxgb3"
> driver fails to probe the device. The host system does not have this
> problem as the driver accesses the config space directly without
> pci_read_vpd()/...
> 
> This adds a quirk to override the VPD size to a bigger value.
> 
> This is the controller:
> Ethernet controller [0200]: Chelsio Communications Inc T310 10GbE Single Port 
> Adapter [1425:0030]
> 
> This is its VPD:
> # Large item 42 bytes; name 0x2 Identifier String
>   b'10 Gigabit Ethernet-SR PCI Express Adapter'
> #002d Large item 74 bytes; name 0x10
>   #00 [EC] len=7: b'D76809 '
>   #0a [FN] len=7: b'46K7897'
>   #14 [PN] len=7: b'46K7897'
>   #1e [MN] len=4: b'1037'
>   #25 [FC] len=4: b'5769'
>   #2c [SN] len=12: b'YL102035603V'
>   #3b [NA] len=12: b'00145E992ED1'
> #007a Small item 1 bytes; name 0xf End Tag
> ---
> #0c00 Large item 16 bytes; name 0x2 Identifier String
>   b'S310E-SR-X  '
> #0c13 Large item 234 bytes; name 0x10
>   #00 [PN] len=16: b'TBD '
>   #13 [EC] len=16: b'110107730D2 '
>   #26 [SN] len=16: b'97YL102035603V  '
>   #39 [NA] len=12: b'00145E992ED1'
>   #48 [V0] len=6: b'175000'
>   #51 [V1] len=6: b'26'
>   #5a [V2] len=6: b'26'
>   #63 [V3] len=6: b'2000  '
>   #6c [V4] len=2: b'1 '
>   #71 [V5] len=6: b'c2'
>   #7a [V6] len=6: b'0 '
>   #83 [V7] len=2: b'1 '
>   #88 [V8] len=2: b'0 '
>   #8d [V9] len=2: b'0 '
>   #92 [VA] len=2: b'0 '
>   #97 [RV] len=80: 
> b's\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'...
> #0d00 Large item 252 bytes; name 0x11
>   #00 [VC] len=16: b'122310_1222 dp  '
>   #13 [VD] len=16: b'610-0001-00 H1\x00\x00'
>   #26 [VE] len=16: b'122310_1353 fp  '
>   #39 [VF] len=16: b'610-0001-00 H1\x00\x00'
>   #4c [RW] len=173: 
> b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'...
> #0dff Small item 0 bytes; name 0xf End Tag
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
>  drivers/pci/quirks.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index ee72ebe1..94d3fb5 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3241,6 +3241,18 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 
> PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_PORT_RIDGE,
>   quirk_thunderbolt_hotplug_msi);
>  
> +static void quirk_chelsio_extend_vpd(struct pci_dev *dev)
> +{
> + if (!dev->vpd || !dev->vpd->ops || !dev->vpd->ops->set_size)
> + return;
> +
> + dev->vpd->ops->set_size(dev, max_t(unsigned int, dev->vpd->len, 0xe00));
> +}
> +
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO,
> + PCI_ANY_ID,
> + quirk_chelsio_extend_vpd);

Do you really want this for *all* Chelsio devices?  If you only need
it for certain devices, the quirk could probably go in the driver.

Can you use pci_set_vpd_size() instead?  There's already a use of that
in cxgb4.

> +
>  #ifdef CONFIG_ACPI
>  /*
>   * Apple: Shutdown Cactus Ridge Thunderbolt controller.
> -- 
> 2.5.0.rc3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
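
For readers following the VPD dump quoted above, here is a minimal userspace sketch of how a walker that stops at the first End Tag arrives at a short length. It is not the kernel's implementation: `vpd_first_block_size` is a made-up helper, and the only thing assumed is the usual PCI resource tag layout (small items carry a 3-bit length in the tag byte, large items carry a 16-bit little-endian length after it).

```c
#include <stddef.h>
#include <stdint.h>

#define VPD_LARGE_FLAG 0x80 /* bit 7 set: large resource item */
#define VPD_SMALL_END  0x0f /* small item name for "End Tag" */

/*
 * Walk resource items from offset 0 and return the number of bytes up to
 * and including the first End Tag, or 0 if the data is malformed.
 * Hypothetical helper, for illustration only.
 */
size_t vpd_first_block_size(const uint8_t *buf, size_t len)
{
	size_t off = 0;

	while (off < len) {
		uint8_t tag = buf[off];

		if (tag & VPD_LARGE_FLAG) {
			/* Large item: 1 tag byte + 2 length bytes (LE) + data */
			if (off + 3 > len)
				return 0;
			off += 3 + (size_t)(buf[off + 1] | (buf[off + 2] << 8));
		} else {
			/* Small item: name in bits 6:3, length in bits 2:0 */
			off += 1 + (size_t)(tag & 0x07);
			if (((tag >> 3) & 0x0f) == VPD_SMALL_END)
				return off <= len ? off : 0;
		}
	}
	return 0;
}
```

Fed the layout from the dump, such a walk stops just past the End Tag at 0x7a, so the second block starting at 0xc00 is never counted. That is the gap the proposed quirk works around by raising the size to 0xe00.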

Re: [PATCH net-next V2 3/6] net/mlx5e: Read ETS settings directly from firmware

2016-09-06 Thread Or Gerlitz

On Tue, Sep 6, 2016 at 7:04 PM, Saeed Mahameed  wrote:
> From: Huy Nguyen 
>
> Issue description:
> Current implementation saves the ETS settings from user in
> a temporary soft copy and returns these settings when user
> queries the ETS settings.
>
> With the new DCBX firmware, the ETS settings can be changed
> by firmware when the DCBX is in firmware controlled mode. Therefore,
> user will obtain wrong values from the temporary soft copy.

Can this also happen with V2? I thought that dropping the private flag
meant this hybrid mode is no longer exposed. Please clarify.

Re: XPS configuration question (on tg3)

2016-09-06 Thread Alexander Duyck

On Tue, Sep 6, 2016 at 11:46 AM, Michal Soltys  wrote:
> Hi,
>
> I've been testing different configurations and I didn't manage to get XPS to 
> "behave" correctly - so I'm probably misunderstanding or forgetting 
> something. The nic in question (under tg3 driver - BCM5720 and BCM5719 
> models) was configured to 3 tx and 4 rx queues. 3 irqs were shared (tx and 
> rx), 1 was unused (this got me scratching my head a bit) and the remaining 
> one was for the last rx (though due to another bug recently fixed the 4th rx 
> queue was not configurable on receive side). The names were: eth1b-0, 
> eth1b-txrx-1, eth1b-txrx-2, eth1b-txrx-3, eth1b-rx-4.
>
> The XPS was configured as:
>
> echo f >/sys/class/net/eth1b/queues/tx-0/xps_cpus
> echo f0 >/sys/class/net/eth1b/queues/tx-1/xps_cpus
> echo ff00 >/sys/class/net/eth1b/queues/tx-2/xps_cpus
>
> So as far as I understand - cpus 0-3 should be allowed to use tx-0 queue 
> only, 4-7 tx-1 and 8-15 tx-2.
>
> Just in case rx side could get in the way as far as flows go, relevant irqs 
> were pinned to specific cpus - txrx-1 to 2, txrx-2 to 4, txrx-3 to 10 - 
> falling into groups defined by the above masks.
>
> I tested both with mq and multiq scheduler, essentially either this:
>
> qdisc mq 2: root
> qdisc pfifo_fast 0: parent 2:1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc pfifo_fast 0: parent 2:2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc pfifo_fast 0: parent 2:3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
>
> or this (for the record, skbaction queue_mapping was behaving correctly with 
> the one below):
>
> qdisc multiq 3: root refcnt 6 bands 3/5
> qdisc pfifo_fast 31: parent 3:1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc pfifo_fast 32: parent 3:2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc pfifo_fast 33: parent 3:3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
>
> Now, do I understand correctly, that under the above setup - commands such as
>
> taskset 400 nc -p $prt host_ip 12345  or
> yancat -i /dev/zero -o t:host_ip:12345 -u 10 -U 10
>
> IOW, pinning a simple nc command on cpu #10 (or using a tool that supports 
> affinity by itself) and sending data to some other host on the net should 
> *always* use the tx-2 queue?
> I also tested variation such as: taskset 400 nc -l -p host_ip 12345 
> 
> In my case, the queue it used was basically random (on top of that it 
> sometimes changed the used queue mid-transfer), which could be easily confirmed 
> through both /proc/interrupts and tc -s qdisc show. And I'm a bit at a loss 
> now, as I thought the xps configuration should be absolute.
>
> Well, I'd be grateful for some pointers / hints.

So it sounds like you have everything configured correctly.  The one
question I would have is if we are certain the CPU pinning is working
for the application.  You might try using something like perf to
verify what is running on CPU 10, and what is running on the CPUs that
the queues are associated with.

Also after you have configured things you may want to double check and
verify the xps_cpus value is still set.  I know under some
circumstances the value can be reset by a device driver if the number
of queues changes, or if the interface toggles between being
administratively up/down.

Thanks.

- Alex
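
As a quick sanity check of the masks discussed in this thread, the following sketch models the xps_cpus values as plain bitmasks and looks up which tx queue covers a given CPU. It is a simplified model only: both helper names are hypothetical, and it ignores everything else the transmit path may take into account when picking a queue.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the index of the first tx queue whose xps_cpus mask contains
 * `cpu`, or -1 if no mask covers it. Hypothetical helper.
 */
int xps_queue_for_cpu(const uint64_t *masks, size_t nqueues, unsigned int cpu)
{
	size_t q;

	if (cpu >= 64)
		return -1;
	for (q = 0; q < nqueues; q++)
		if (masks[q] & (UINT64_C(1) << cpu))
			return (int)q;
	return -1;
}

/* CPU index selected by a single-bit taskset mask, or -1 otherwise. */
int cpu_from_single_bit_mask(uint64_t mask)
{
	int cpu = 0;

	/* Reject empty masks and masks with more than one bit set */
	if (mask == 0 || (mask & (mask - 1)) != 0)
		return -1;
	while (!(mask & 1)) {
		mask >>= 1;
		cpu++;
	}
	return cpu;
}
```

With the masks f, f0 and ff00 for tx-0 to tx-2, the taskset mask 400 selects CPU 10, which only the tx-2 mask covers. That matches the expectation stated in the original question.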

[PATCH v2] ptp: ixp46x: remove NO_IRQ handling

2016-09-06 Thread Arnd Bergmann

gpio_to_irq does not return NO_IRQ but instead returns a negative
error code on failure. Returning NO_IRQ from the function has no
negative effects as we only compare the result to the expected
interrupt number, but it's better to return a proper failure
code for consistency, and we should remove NO_IRQ from the kernel
entirely.

Signed-off-by: Arnd Bergmann 
Acked-by: Richard Cochran 
---
 drivers/ptp/ptp_ixp46x.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

v2: fix trivial typo

diff --git a/drivers/ptp/ptp_ixp46x.c b/drivers/ptp/ptp_ixp46x.c
index ee4f183ef9ee..344a3bac210b 100644
--- a/drivers/ptp/ptp_ixp46x.c
+++ b/drivers/ptp/ptp_ixp46x.c
@@ -268,18 +268,19 @@ static int setup_interrupt(int gpio)
return err;
 
irq = gpio_to_irq(gpio);
+   if (irq < 0)
+   return irq;
 
-   if (NO_IRQ == irq)
-   return NO_IRQ;
-
-   if (irq_set_irq_type(irq, IRQF_TRIGGER_FALLING)) {
+   err = irq_set_irq_type(irq, IRQF_TRIGGER_FALLING);
+   if (err) {
pr_err("cannot set trigger type for irq %d\n", irq);
-   return NO_IRQ;
+   return err;
}
 
-   if (request_irq(irq, isr, 0, DRIVER, &ixp_clock)) {
+   err = request_irq(irq, isr, 0, DRIVER, &ixp_clock);
+   if (err) {
pr_err("request_irq failed for irq %d\n", irq);
-   return NO_IRQ;
+   return err;
}
 
return irq;
-- 
2.9.0
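
The error handling pattern the patch moves to can be sketched in userspace as follows. The two `fake_*` functions are hypothetical stand-ins for the kernel calls, and their failure conditions are invented purely so that each path can be exercised.

```c
#include <errno.h>

/* Hypothetical stand-in: a valid gpio maps to an irq, otherwise -EINVAL. */
static int fake_gpio_to_irq(int gpio)
{
	return gpio >= 0 ? 100 + gpio : -EINVAL;
}

/* Hypothetical stand-in: pretend irq 107 cannot be configured. */
static int fake_set_irq_type(int irq)
{
	return irq == 107 ? -EBUSY : 0;
}

/*
 * Return the irq number on success or a negative errno on failure,
 * passing each callee's error code through unchanged.
 */
int setup_interrupt_sketch(int gpio)
{
	int irq, err;

	irq = fake_gpio_to_irq(gpio);
	if (irq < 0)
		return irq;

	err = fake_set_irq_type(irq);
	if (err)
		return err;

	return irq;
}
```

A caller that only compares the result against the expected interrupt number behaves the same as before, since a negative errno never equals a valid irq.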

Re: ipv6: release dst in ping_v6_sendmsg

2016-09-06 Thread David Miller

From: Dave Jones 
Date: Fri, 2 Sep 2016 14:39:50 -0400

> Neither the failure nor the success paths of ping_v6_sendmsg release
> the dst it acquires.  This leads to a flood of warnings from
> "net/core/dst.c:288 dst_release" on older kernels that
> don't have 8bf4ada2e21378816b28205427ee6b0e1ca4c5f1 backported.
> 
> That patch optimistically hoped this had been fixed post 3.10, but
> it seems at least one case wasn't, where I've seen this triggered
> a lot from machines doing unprivileged icmp sockets.
> 
> Cc: Martin Lau 
> Signed-off-by: Dave Jones 

Applied and queued up for -stable, thanks Dave.

Re: [PATCH net-next v2 0/3] net: dsa: mv88e6xxx: isolate Global2 support

2016-09-06 Thread David Miller

From: Vivien Didelot 
Date: Fri,  2 Sep 2016 14:45:31 -0400

> Registers of Marvell chips are organized in internal SMI devices.
> 
> One of them at address 0x1C is called Global2. It provides an extended
> set of registers, used for interrupt control, EEPROM access, indirect
> PHY access (to bypass the PHY Polling Unit) and cross-chip setup.
> 
> Most chips have it, but some others don't (older ones such as 6060).
> 
> Now that its related code is isolated in mv88e6xxx_g2_* functions, move
> it to its own global2.c file, making most of its setup code static.
> 
> Then make its compilation optional, which allows to reduce the size of
> the mv88e6xxx driver for devices such as home routers embedding Ethernet
> chips without Global2 support.
> 
> It is present on most recent chips, thus enable its support by default.
> 
> Changes in v2: fail probe if GLOBAL2 is required but not enabled.

Series applied, thanks.

Re: [PATCH v3 5/5] net: asix: autoneg will set WRITE_MEDIUM reg

2016-09-06 Thread Robert Foss




On 2016-09-06 12:41 PM, Grant Grundler wrote:

On Thu, Sep 1, 2016 at 10:02 AM, Eric Dumazet  wrote:

On Thu, 2016-09-01 at 12:47 -0400, Robert Foss wrote:


I'm not quite sure how the first From line was added, it
should not have been.
Grant Grundler is most definitely the author.

Would you like me to resubmit in v++ and make sure that it has been
corrected?


This is too late, patches are already merged in David Miller net-next
tree.

These kinds of errors can not be fixed, we have to be very careful at
submission/review time.

I guess Grant does not care, but some contributors, especially new ones
would like to get proper attribution.


I do not mind. Robert will get email about bugs instead of me. :D


Thanks Grant, sorry about the mixup!


Rob.

vlan aware bridge doesn't propagate mac changes to vlans on top of it

2016-09-06 Thread Michal Soltys

Consider the following scenario:

- create vlan aware bridge (say br0)
- setup br0's vlans, e.g.

bridge vlan add dev br0 vid 10 self

This will add necessary fdb entries directing appropriate traffic to the
bridge itself.

- create appropriate vlan interfaces on top of it, for example:

ip li add link br0 name br0.10 type vlan id 10
ip add add 10.0.0.1/8 dev br0.10 

This will add vlan devices on top of br0 and *inherit br0's mac address*.

- now after all of the above is done

ip li set eth0 master br0

This will attach interface eth0 to the bridge. With this being the first
interface attached, br0 will take eth0's mac address as its own. Any
further changes to br0's ports may cause the same, with the lowest mac
address of some port becoming br0's mac.

This will update fdb entries as well, but all vlan interfaces on top of
br0 (e.g. br0.10) will be using the old mac address from the time when the
vlan was created.

The side effect of it is that any traffic addressed to such interface
will be flooded to all ports (and br0 itself).

The only workaround I found is to either manually update mac addresses
with 'ip' or recreate vlans (bridge fdb refused to update relevant entries).

But if br0's mac changes due to some port changes - shouldn't it be
somehow propagated automatically to vlans created on top of it ?
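
To make the scenario concrete, here is a small userspace model of the behaviour described: the bridge ends up with the lowest MAC among its ports, while a vlan device keeps whatever it copied at creation time. Both helper names are hypothetical and nothing here is taken from the bridge code itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/*
 * Copy the numerically lowest MAC among `ports` into `out`.
 * Returns 0 on success, -1 if there are no ports.
 */
int lowest_port_mac(const uint8_t ports[][MAC_LEN], size_t nports,
		    uint8_t out[MAC_LEN])
{
	size_t i, best = 0;

	if (nports == 0)
		return -1;
	/* memcmp orders byte strings the same way MACs compare numerically */
	for (i = 1; i < nports; i++)
		if (memcmp(ports[i], ports[best], MAC_LEN) < 0)
			best = i;
	memcpy(out, ports[best], MAC_LEN);
	return 0;
}

/* A vlan device is stale once its copied MAC differs from the bridge's. */
int vlan_mac_is_stale(const uint8_t vlan_mac[MAC_LEN],
		      const uint8_t bridge_mac[MAC_LEN])
{
	return memcmp(vlan_mac, bridge_mac, MAC_LEN) != 0;
}
```

In this model, br0.10 becomes stale as soon as the first port is attached, which is the point where frames addressed to it would no longer match an fdb entry.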

[PATCH v4 4/4] wcn36xx: Implement print_reg indication

2016-09-06 Thread Bjorn Andersson

Some firmware versions send a "print register indication"; handle this
by printing out the content.

Cc: Nicolas Dechesne 
Signed-off-by: Bjorn Andersson 
---

Changes since v3:
- Rebased separate patch onto this series

 drivers/net/wireless/ath/wcn36xx/hal.h | 16 
 drivers/net/wireless/ath/wcn36xx/smd.c | 30 ++
 2 files changed, 46 insertions(+)

diff --git a/drivers/net/wireless/ath/wcn36xx/hal.h 
b/drivers/net/wireless/ath/wcn36xx/hal.h
index 4f87ef1e1eb8..b765c647319d 100644
--- a/drivers/net/wireless/ath/wcn36xx/hal.h
+++ b/drivers/net/wireless/ath/wcn36xx/hal.h
@@ -350,6 +350,8 @@ enum wcn36xx_hal_host_msg_type {
 
WCN36XX_HAL_AVOID_FREQ_RANGE_IND = 233,
 
+   WCN36XX_HAL_PRINT_REG_INFO_IND = 259,
+
WCN36XX_HAL_MSG_MAX = WCN36XX_HAL_MSG_TYPE_MAX_ENUM_SIZE
 };
 
@@ -4703,4 +4705,18 @@ struct stats_class_b_ind {
u32 rx_time_total;
 };
 
+/* WCN36XX_HAL_PRINT_REG_INFO_IND */
+struct wcn36xx_hal_print_reg_info_ind {
+   struct wcn36xx_hal_msg_header header;
+
+   u32 count;
+   u32 scenario;
+   u32 reason;
+
+   struct {
+   u32 addr;
+   u32 value;
+   } regs[];
+} __packed;
+
 #endif /* _HAL_H_ */
diff --git a/drivers/net/wireless/ath/wcn36xx/smd.c 
b/drivers/net/wireless/ath/wcn36xx/smd.c
index be5e5ea1e5c3..1c2966f7db7a 100644
--- a/drivers/net/wireless/ath/wcn36xx/smd.c
+++ b/drivers/net/wireless/ath/wcn36xx/smd.c
@@ -2109,6 +2109,30 @@ static int wcn36xx_smd_delete_sta_context_ind(struct 
wcn36xx *wcn,
return -ENOENT;
 }
 
+static int wcn36xx_smd_print_reg_info_ind(struct wcn36xx *wcn,
+ void *buf,
+ size_t len)
+{
+   struct wcn36xx_hal_print_reg_info_ind *rsp = buf;
+   int i;
+
+   if (len < sizeof(*rsp)) {
+   wcn36xx_warn("Corrupted print reg info indication\n");
+   return -EIO;
+   }
+
+   wcn36xx_dbg(WCN36XX_DBG_HAL,
+   "reginfo indication, scenario: 0x%x reason: 0x%x\n",
+   rsp->scenario, rsp->reason);
+
+   for (i = 0; i < rsp->count; i++) {
+   wcn36xx_dbg(WCN36XX_DBG_HAL, "\t0x%x: 0x%x\n",
+   rsp->regs[i].addr, rsp->regs[i].value);
+   }
+
+   return 0;
+}
+
 int wcn36xx_smd_update_cfg(struct wcn36xx *wcn, u32 cfg_id, u32 value)
 {
struct wcn36xx_hal_update_cfg_req_msg msg_body, *body;
@@ -2237,6 +2261,7 @@ int wcn36xx_smd_rsp_process(struct qcom_smd_channel 
*channel,
case WCN36XX_HAL_OTA_TX_COMPL_IND:
case WCN36XX_HAL_MISSED_BEACON_IND:
case WCN36XX_HAL_DELETE_STA_CONTEXT_IND:
+   case WCN36XX_HAL_PRINT_REG_INFO_IND:
msg_ind = kmalloc(sizeof(*msg_ind) + len, GFP_ATOMIC);
if (!msg_ind) {
wcn36xx_err("Run out of memory while handling SMD_EVENT 
(%d)\n",
@@ -2296,6 +2321,11 @@ static void wcn36xx_ind_smd_work(struct work_struct 
*work)
   hal_ind_msg->msg,
   hal_ind_msg->msg_len);
break;
+   case WCN36XX_HAL_PRINT_REG_INFO_IND:
+   wcn36xx_smd_print_reg_info_ind(wcn,
+  hal_ind_msg->msg,
+  hal_ind_msg->msg_len);
+   break;
default:
wcn36xx_err("SMD_EVENT (%d) not supported\n",
  msg_header->msg_type);
-- 
2.5.0
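
The handler in this patch checks `len` against the fixed part of the message before iterating over `count` entries. As a sketch of how a receiver could additionally bound `count` by the payload length, consider the following. It is not part of the patch: the struct is a simplified userspace stand-in without the HAL message header, and `reg_info_safe_count` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

struct reg_pair {
	uint32_t addr;
	uint32_t value;
};

/* Simplified stand-in for the indication layout (no message header). */
struct reg_info_ind {
	uint32_t count;
	uint32_t scenario;
	uint32_t reason;
	struct reg_pair regs[];
};

/*
 * Return how many register entries can safely be read from a message of
 * `len` bytes, or -1 if even the fixed part is truncated.
 */
long reg_info_safe_count(const struct reg_info_ind *ind, size_t len)
{
	size_t avail;

	if (len < sizeof(*ind))
		return -1;

	/* Entries that actually fit in the bytes following the fixed part */
	avail = (len - sizeof(*ind)) / sizeof(ind->regs[0]);

	return (long)(ind->count < avail ? ind->count : avail);
}
```

A loop bounded by this value never reads past the received buffer, even if the firmware reports a larger `count` than the message carries.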

[PATCH v4 2/4] wcn36xx: Transition driver to SMD client

2016-09-06 Thread Bjorn Andersson

The wcn36xx wifi driver follows the life cycle of the WLAN_CTRL SMD
channel; as such it should be an SMD client. This patch makes this
transition, now that we have the necessary frameworks available.

Signed-off-by: Bjorn Andersson 
---

Changes since v3:
- Made msg_header const in wcn36xx_smd_rsp_process()

Changes since v2:
- Correct the call to the new ieee80211_scan_completed()

 drivers/net/wireless/ath/wcn36xx/dxe.c | 16 +++---
 drivers/net/wireless/ath/wcn36xx/main.c| 79 --
 drivers/net/wireless/ath/wcn36xx/smd.c | 31 +---
 drivers/net/wireless/ath/wcn36xx/smd.h |  5 ++
 drivers/net/wireless/ath/wcn36xx/wcn36xx.h | 21 +++-
 5 files changed, 86 insertions(+), 66 deletions(-)

diff --git a/drivers/net/wireless/ath/wcn36xx/dxe.c 
b/drivers/net/wireless/ath/wcn36xx/dxe.c
index 231fd022f0f5..87dfdaf9044c 100644
--- a/drivers/net/wireless/ath/wcn36xx/dxe.c
+++ b/drivers/net/wireless/ath/wcn36xx/dxe.c
@@ -23,6 +23,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include 
+#include 
 #include "wcn36xx.h"
 #include "txrx.h"
 
@@ -151,9 +152,12 @@ int wcn36xx_dxe_alloc_ctl_blks(struct wcn36xx *wcn)
goto out_err;
 
/* Initialize SMSM state  Clear TX Enable RING EMPTY STATE */
-   ret = wcn->ctrl_ops->smsm_change_state(
-   WCN36XX_SMSM_WLAN_TX_ENABLE,
-   WCN36XX_SMSM_WLAN_TX_RINGS_EMPTY);
+   ret = qcom_smem_state_update_bits(wcn->tx_enable_state,
+ WCN36XX_SMSM_WLAN_TX_ENABLE |
+ WCN36XX_SMSM_WLAN_TX_RINGS_EMPTY,
+ WCN36XX_SMSM_WLAN_TX_RINGS_EMPTY);
+   if (ret)
+   goto out_err;
 
return 0;
 
@@ -678,9 +682,9 @@ int wcn36xx_dxe_tx_frame(struct wcn36xx *wcn,
 * notify chip about new frame through SMSM bus.
 */
if (is_low &&  vif_priv->pw_state == WCN36XX_BMPS) {
-   wcn->ctrl_ops->smsm_change_state(
- 0,
- WCN36XX_SMSM_WLAN_TX_ENABLE);
+   qcom_smem_state_update_bits(wcn->tx_rings_empty_state,
+   WCN36XX_SMSM_WLAN_TX_ENABLE,
+   WCN36XX_SMSM_WLAN_TX_ENABLE);
} else {
/* indicate End Of Packet and generate interrupt on descriptor
 * done.
diff --git a/drivers/net/wireless/ath/wcn36xx/main.c 
b/drivers/net/wireless/ath/wcn36xx/main.c
index e1d59da2ad20..3c2522b07c90 100644
--- a/drivers/net/wireless/ath/wcn36xx/main.c
+++ b/drivers/net/wireless/ath/wcn36xx/main.c
@@ -21,6 +21,10 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
 #include "wcn36xx.h"
 
 unsigned int wcn36xx_dbg_mask;
@@ -1058,8 +1062,7 @@ static int wcn36xx_platform_get_resources(struct wcn36xx 
*wcn,
int ret;
 
/* Set TX IRQ */
-   res = platform_get_resource_byname(pdev, IORESOURCE_IRQ,
-  "wcnss_wlantx_irq");
+   res = platform_get_resource_byname(pdev, IORESOURCE_IRQ, "tx");
if (!res) {
wcn36xx_err("failed to get tx_irq\n");
return -ENOENT;
@@ -1067,14 +1070,29 @@ static int wcn36xx_platform_get_resources(struct 
wcn36xx *wcn,
wcn->tx_irq = res->start;
 
/* Set RX IRQ */
-   res = platform_get_resource_byname(pdev, IORESOURCE_IRQ,
-  "wcnss_wlanrx_irq");
+   res = platform_get_resource_byname(pdev, IORESOURCE_IRQ, "rx");
if (!res) {
wcn36xx_err("failed to get rx_irq\n");
return -ENOENT;
}
wcn->rx_irq = res->start;
 
+   /* Acquire SMSM tx enable handle */
+   wcn->tx_enable_state = qcom_smem_state_get(&pdev->dev,
+   "tx-enable", &wcn->tx_enable_state_bit);
+   if (IS_ERR(wcn->tx_enable_state)) {
+   wcn36xx_err("failed to get tx-enable state\n");
+   return PTR_ERR(wcn->tx_enable_state);
+   }
+
+   /* Acquire SMSM tx rings empty handle */
+   wcn->tx_rings_empty_state = qcom_smem_state_get(&pdev->dev,
+   "tx-rings-empty", &wcn->tx_rings_empty_state_bit);
+   if (IS_ERR(wcn->tx_rings_empty_state)) {
+   wcn36xx_err("failed to get tx-rings-empty state\n");
+   return PTR_ERR(wcn->tx_rings_empty_state);
+   }
+
mmio_node = of_parse_phandle(pdev->dev.parent->of_node, "qcom,mmio", 0);
if (!mmio_node) {
wcn36xx_err("failed to acquire qcom,mmio reference\n");
@@ -1115,11 +1133,14 @@ static int wcn36xx_probe(struct platform_device *pdev)
 {
struct ieee80211_hw *hw;
struct wcn36xx *wcn;
+   void *wcnss;
int ret;
-   u8 addr[ETH_ALEN];
+   const u8 *addr;
 
wcn36xx_dbg(WCN36XX_DBG_MAC, "platform probe\n");
 
+
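
The dxe.c hunks in this patch replace a "clear these bits, set those bits" call with a mask/value style update. The sketch below shows why the two forms agree for the first hunk. The bit positions are hypothetical, chosen only for the example, and `change_state` assumes the old callback cleared first and then set, which the patch itself does not spell out.

```c
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this example. */
#define TX_RINGS_EMPTY (UINT32_C(1) << 9)
#define TX_ENABLE      (UINT32_C(1) << 10)

/* Old style (assumed semantics): clear some bits, then set others. */
uint32_t change_state(uint32_t state, uint32_t clear, uint32_t set)
{
	return (state & ~clear) | set;
}

/* Mask/value style: only bits in `mask` change, taking their new
 * value from the corresponding bits of `value`. */
uint32_t update_bits(uint32_t state, uint32_t mask, uint32_t value)
{
	return (state & ~mask) | (value & mask);
}
```

Clearing TX_ENABLE while setting TX_RINGS_EMPTY corresponds to a mask covering both bits with only TX_RINGS_EMPTY present in the value, which is the shape of the first converted call.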

[PATCH v4 3/4] wcn36xx: Implement firmware assisted scan

2016-09-06 Thread Bjorn Andersson

Using the software based channel scan mechanism from mac80211 keeps us
offline for 10-15 seconds; we should instead issue a start_scan/end_scan
on each channel, reducing this time.

Signed-off-by: Bjorn Andersson 
---

Changes since v3:
- None
Changes since v2:
- Match prototype change of ieee80211_scan_completed()

 drivers/net/wireless/ath/wcn36xx/main.c| 64 +-
 drivers/net/wireless/ath/wcn36xx/smd.c |  8 ++--
 drivers/net/wireless/ath/wcn36xx/smd.h |  4 +-
 drivers/net/wireless/ath/wcn36xx/txrx.c| 19 ++---
 drivers/net/wireless/ath/wcn36xx/wcn36xx.h |  9 +
 5 files changed, 81 insertions(+), 23 deletions(-)

diff --git a/drivers/net/wireless/ath/wcn36xx/main.c 
b/drivers/net/wireless/ath/wcn36xx/main.c
index 3c2522b07c90..96a9584edcbb 100644
--- a/drivers/net/wireless/ath/wcn36xx/main.c
+++ b/drivers/net/wireless/ath/wcn36xx/main.c
@@ -568,23 +568,59 @@ out:
return ret;
 }
 
-static void wcn36xx_sw_scan_start(struct ieee80211_hw *hw,
- struct ieee80211_vif *vif,
- const u8 *mac_addr)
+static void wcn36xx_hw_scan_worker(struct work_struct *work)
 {
-   struct wcn36xx *wcn = hw->priv;
+   struct wcn36xx *wcn = container_of(work, struct wcn36xx, scan_work);
+   struct cfg80211_scan_request *req = wcn->scan_req;
+   u8 channels[WCN36XX_HAL_PNO_MAX_NETW_CHANNELS_EX];
+   struct cfg80211_scan_info scan_info = {};
+   int i;
+
+   wcn36xx_dbg(WCN36XX_DBG_MAC, "mac80211 scan %d channels worker\n", 
req->n_channels);
+
+   for (i = 0; i < req->n_channels; i++)
+   channels[i] = req->channels[i]->hw_value;
+
+   wcn36xx_smd_update_scan_params(wcn, channels, req->n_channels);
 
wcn36xx_smd_init_scan(wcn, HAL_SYS_MODE_SCAN);
-   wcn36xx_smd_start_scan(wcn);
+   for (i = 0; i < req->n_channels; i++) {
+   wcn->scan_freq = req->channels[i]->center_freq;
+   wcn->scan_band = req->channels[i]->band;
+
+   wcn36xx_smd_start_scan(wcn, req->channels[i]->hw_value);
+   msleep(30);
+   wcn36xx_smd_end_scan(wcn, req->channels[i]->hw_value);
+
+   wcn->scan_freq = 0;
+   }
+   wcn36xx_smd_finish_scan(wcn, HAL_SYS_MODE_SCAN);
+
+   scan_info.aborted = false;
+   ieee80211_scan_completed(wcn->hw, &scan_info);
+
+   mutex_lock(&wcn->scan_lock);
+   wcn->scan_req = NULL;
+   mutex_unlock(&wcn->scan_lock);
 }
 
-static void wcn36xx_sw_scan_complete(struct ieee80211_hw *hw,
-struct ieee80211_vif *vif)
+static int wcn36xx_hw_scan(struct ieee80211_hw *hw,
+  struct ieee80211_vif *vif,
+  struct ieee80211_scan_request *hw_req)
 {
struct wcn36xx *wcn = hw->priv;
 
-   wcn36xx_smd_end_scan(wcn);
-   wcn36xx_smd_finish_scan(wcn, HAL_SYS_MODE_SCAN);
+   mutex_lock(&wcn->scan_lock);
+   if (wcn->scan_req) {
+   mutex_unlock(&wcn->scan_lock);
+   return -EBUSY;
+   }
+   wcn->scan_req = &hw_req->req;
+   mutex_unlock(&wcn->scan_lock);
+
+   schedule_work(&wcn->scan_work);
+
+   return 0;
 }
 
 static void wcn36xx_update_allowed_rates(struct ieee80211_sta *sta,
@@ -997,8 +1033,7 @@ static const struct ieee80211_ops wcn36xx_ops = {
.configure_filter   = wcn36xx_configure_filter,
.tx = wcn36xx_tx,
.set_key= wcn36xx_set_key,
-   .sw_scan_start  = wcn36xx_sw_scan_start,
-   .sw_scan_complete   = wcn36xx_sw_scan_complete,
+   .hw_scan= wcn36xx_hw_scan,
.bss_info_changed   = wcn36xx_bss_info_changed,
.set_rts_threshold  = wcn36xx_set_rts_threshold,
.sta_add= wcn36xx_sta_add,
@@ -1023,6 +1058,7 @@ static int wcn36xx_init_ieee80211(struct wcn36xx *wcn)
ieee80211_hw_set(wcn->hw, SUPPORTS_PS);
ieee80211_hw_set(wcn->hw, SIGNAL_DBM);
ieee80211_hw_set(wcn->hw, HAS_RATE_CONTROL);
+   ieee80211_hw_set(wcn->hw, SINGLE_SCAN_ON_ALL_BANDS);
 
wcn->hw->wiphy->interface_modes = BIT(NL80211_IFTYPE_STATION) |
BIT(NL80211_IFTYPE_AP) |
@@ -1032,6 +1068,9 @@ static int wcn36xx_init_ieee80211(struct wcn36xx *wcn)
wcn->hw->wiphy->bands[NL80211_BAND_2GHZ] = _band_2ghz;
wcn->hw->wiphy->bands[NL80211_BAND_5GHZ] = _band_5ghz;
 
+   wcn->hw->wiphy->max_scan_ssids = WCN36XX_MAX_SCAN_SSIDS;
+   wcn->hw->wiphy->max_scan_ie_len = WCN36XX_MAX_SCAN_IE_LEN;
+
wcn->hw->wiphy->cipher_suites = cipher_suites;
wcn->hw->wiphy->n_cipher_suites = ARRAY_SIZE(cipher_suites);
 
@@ -1152,6 +1191,9 @@ static int wcn36xx_probe(struct platform_device *pdev)
wcn->hw = hw;
wcn->dev = &pdev->dev;
mutex_init(&wcn->hal_mutex);
+   mutex_init(&wcn->scan_lock);
+
+   INIT_WORK(&wcn->scan_work, wcn36xx_hw_scan_worker);

[PATCH v4 1/4] soc: qcom: wcnss_ctrl: Stub wcnss_ctrl API

2016-09-06 Thread Bjorn Andersson

Stub the wcnss_ctrl API to allow compile testing wcnss function drivers.

Cc: Marcel Holtmann 
Signed-off-by: Bjorn Andersson 
---

There are no other pending changes colliding with this, so if Andy is okay with
this it could be merged through Kalle's tree - together with the other patches.

Marcel, with this applied we can drop the depends on QCOM_SMD from the
btqcomsmd driver as well.

Changes since v3:
- Added this patch to allow compile testing without SMD support after patch 2

 include/linux/soc/qcom/wcnss_ctrl.h | 13 +
 1 file changed, 13 insertions(+)

diff --git a/include/linux/soc/qcom/wcnss_ctrl.h 
b/include/linux/soc/qcom/wcnss_ctrl.h
index a37bc5538f19..eab64976a73b 100644
--- a/include/linux/soc/qcom/wcnss_ctrl.h
+++ b/include/linux/soc/qcom/wcnss_ctrl.h
@@ -3,6 +3,19 @@
 
 #include 
 
+#if IS_ENABLED(CONFIG_QCOM_WCNSS_CTRL)
+
 struct qcom_smd_channel *qcom_wcnss_open_channel(void *wcnss, const char 
*name, qcom_smd_cb_t cb);
 
+#else
+
+static inline struct qcom_smd_channel*
+qcom_wcnss_open_channel(void *wcnss, const char *name, qcom_smd_cb_t cb)
+{
+   WARN_ON(1);
+   return ERR_PTR(-ENXIO);
+}
+
+#endif
+
 #endif
-- 
2.5.0
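
The stubbing pattern used by this patch can be illustrated outside the kernel roughly as follows. Everything prefixed `sketch_` or `SKETCH_` is a hypothetical stand-in: the error pointer helpers are simplified userspace versions of the kernel idiom, and `SKETCH_HAVE_CTRL` plays the role of the config option.

```c
#include <errno.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's error pointer helpers. */
#define SKETCH_MAX_ERRNO 4095

static inline void *sketch_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline long sketch_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static inline int sketch_is_err(const void *p)
{
	/* Error codes occupy the top SKETCH_MAX_ERRNO addresses */
	return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

struct channel; /* opaque, hypothetical */

#ifdef SKETCH_HAVE_CTRL
/* Feature built in: the real implementation lives elsewhere. */
struct channel *sketch_open_channel(const char *name);
#else
/* Feature compiled out: callers still build, but always get an error. */
static inline struct channel *sketch_open_channel(const char *name)
{
	(void)name;
	return sketch_err_ptr(-ENXIO);
}
#endif
```

With the option disabled, a caller compiles unchanged and sees `-ENXIO` at run time, which is what makes compile testing possible without the real dependency.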

Re: [PATCH net-next v5 0/2] net: ethernet: mediatek: add enhancements to RX path

2016-09-06 Thread David Miller

From: 
Date: Sat, 3 Sep 2016 17:59:25 +0800

> Changes since v1:
> - fix message typos and add coverletter
>   
> Changes since v2:
> - split from the previous series for submitting add enhancements as 
> a series targeting 'net-next' and add indents before comments.
> 
> Changes since v3:
> - merge the patch using PDMA RX path
> - fixed the input of mtk_poll_rx is with the remaining budget
> 
> Changes since v4:
> - save one wmb and register update when no packet is being handled
> inside mtk_poll_rx call
> - fixed incorrect return packet count from mtk_napi_rx

Series applied.

Re: [PATCH v2] net: smsc: remove build warning of duplicate definition

2016-09-06 Thread David Miller

From: Sudip Mukherjee 
Date: Sun,  4 Sep 2016 23:02:21 +0530

> The build of m32r was giving warning:
> 
> In file included from drivers/net/ethernet/smsc/smc91x.c:92:0:
> drivers/net/ethernet/smsc/smc91x.h:448:0: warning: "SMC_inb" redefined
>  #define SMC_inb(ioaddr, reg)  ({ BUG(); 0; })
>  
> drivers/net/ethernet/smsc/smc91x.h:106:0:
>   note: this is the location of the previous definition
>  #define SMC_inb(a, r)  inb(((u32)a) + (r))
>  
> drivers/net/ethernet/smsc/smc91x.h:449:0: warning: "SMC_outb" redefined
>  #define SMC_outb(x, ioaddr, reg) BUG()
>  
> drivers/net/ethernet/smsc/smc91x.h:108:0:
>   note: this is the location of the previous definition
>  #define SMC_outb(v, a, r) outb(v, ((u32)a) + (r))
> 
> Signed-off-by: Sudip Mukherjee 

Applied.

Re: [PATCH net-next 0/9] rxrpc: Small fixes

2016-09-06 Thread David Miller

From: David Howells 
Date: Sun, 04 Sep 2016 22:02:24 +0100

> 
> Here's a set of small fix patches:
> 
>  (1) Fix some uninitialised variables.
> 
>  (2) Set the client call state before making it live by attaching it to the
>  conn struct.
> 
>  (3) Randomise the epoch and starting client conn ID values, and don't
>  change the epoch when the client conn ID rolls round.
> 
>  (4) Replace deprecated create_singlethread_workqueue() calls.
> 
> The patches can be found here also (non-terminally on the branch):
> 
>   
> http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-rewrite
> 
> Tagged thusly:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
>   rxrpc-rewrite-20160904-1

Both rxrpc-rewrite-20160904-1 and rxrpc-rewrite-20160904-2 pulled, thanks.

Re: [PATCH] tcp: cwnd does not increase in TCP YeAH

2016-09-06 Thread David Miller

From: Artem Germanov 
Date: Sun, 4 Sep 2016 21:03:27 -0700

> Commit 76174004a0f19785a328f40388e87e982bbf69b9
> (tcp: do not slow start when cwnd equals ssthresh )
> introduced regression in TCP YeAH. Using 100ms delay 1% loss virtual
> ethernet link kernel 4.2 shows bandwidth ~500KB/s for single TCP
> connection and kernel 4.3 and above (including 4.8-rc4) shows
> bandwidth ~100KB/s.
>  That is caused by stalled cwnd when cwnd equals ssthresh. This patch
>  fixes it by proper increasing cwnd in this case.
> 
> Signed-off-by: Artem Germanov 

This patch won't apply properly.

First, it isn't rooted properly.

Second, all the TAB characters have been mangled into spaces.

Please fix these problems, send a test patch to yourself, and only
resubmit this patch when you can successfully apply that test patch.

Thank you.

Re: [PATCH v2] net: Don't delete routes in different VRFs

2016-09-06 Thread David Miller

From: Mark Tomlinson 
Date: Mon,  5 Sep 2016 10:20:20 +1200

> When deleting an IP address from an interface, there is a clean-up of
> routes which refer to this local address. However, there was no check to
> see that the VRF matched. This meant that deletion wasn't confined to
> the VRF it should have been.
> 
> To solve this, a new field has been added to fib_info to hold a table
> id. When removing fib entries corresponding to a local ip address, this
> table id is also used in the comparison.
> 
> The table id is populated when the fib_info is created. This was already
> done in some places, but not in ip_rt_ioctl(). This has now been fixed.
> 
> Fixes: 021dd3b8a142 ("net: Add routes to the table associated with the 
> device")
> Acked-by: David Ahern 
> Tested-by: David Ahern 
> Signed-off-by: Mark Tomlinson 

Applied and queued up for -stable, thanks.
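
The fix described in the quoted commit message boils down to comparing the table id in addition to the address. A minimal model of that comparison is sketched below. The struct and function names are hypothetical and only loosely mirror the fields named in the commit message.

```c
#include <stddef.h>
#include <stdint.h>

struct fib_info_sketch {
	uint32_t prefsrc; /* preferred source address of the route */
	uint32_t tb_id;   /* routing table (VRF) the entry belongs to */
	int dead;
};

/*
 * Mark entries dead when they use `local` as source address AND live in
 * table `tb_id`. Returns the number of entries marked.
 */
size_t sync_down_addr_sketch(struct fib_info_sketch *fi, size_t n,
			     uint32_t tb_id, uint32_t local)
{
	size_t i, marked = 0;

	for (i = 0; i < n; i++) {
		/* Without this check, routes in other VRFs would match too */
		if (fi[i].tb_id != tb_id)
			continue;
		if (fi[i].prefsrc == local) {
			fi[i].dead = 1;
			marked++;
		}
	}
	return marked;
}
```

Two entries with the same source address in different tables are then handled independently, so removing the address in one VRF leaves the other VRF's route alone.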

Re: [PATCH net-next 0/3] qed*: Debug data collection

2016-09-06 Thread David Miller

From: Tomer Tayar 
Date: Mon, 5 Sep 2016 14:35:09 +0300

> This patch series adds the support of debug data collection in the qed driver,
> and the means to extract it in the qede driver via the get_regs operation.
 ...
> Please consider applying this to 'net-next'.

There are conflicts due to another qed patch I applied to net-next,
please respin.

Thanks.

[PATCH v5 1/6] net: dt-bindings: Document the new Meson8b and GXBB DWMAC bindings

2016-09-06 Thread Martin Blumenstingl

This patch adds the documentation for the DWMAC ethernet controller
found in Amlogic Meson 8b (S805) and GXBB (S905) SoCs.
The main difference from the Meson6 glue is that different registers
(with a different layout) are used.

Signed-off-by: Martin Blumenstingl 
Acked-by: Rob Herring 
Acked-by: David S. Miller 
---
 .../devicetree/bindings/net/meson-dwmac.txt| 45 ++
 1 file changed, 37 insertions(+), 8 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/meson-dwmac.txt 
b/Documentation/devicetree/bindings/net/meson-dwmac.txt
index ec633d7..89e62dd 100644
--- a/Documentation/devicetree/bindings/net/meson-dwmac.txt
+++ b/Documentation/devicetree/bindings/net/meson-dwmac.txt
@@ -1,18 +1,32 @@
 * Amlogic Meson DWMAC Ethernet controller
 
 The device inherits all the properties of the dwmac/stmmac devices
-described in the file net/stmmac.txt with the following changes.
+described in the file stmmac.txt in the current directory with the
+following changes.
 
-Required properties:
+Required properties on all platforms:
 
-- compatible: should be "amlogic,meson6-dwmac" along with "snps,dwmac"
- and any applicable more detailed version number
- described in net/stmmac.txt
+- compatible:  Depending on the platform this should be one of:
+   - "amlogic,meson6-dwmac"
+   - "amlogic,meson8b-dwmac"
+   - "amlogic,meson-gxbb-dwmac"
+   Additionally "snps,dwmac" and any applicable more
+   detailed version number described in net/stmmac.txt
+   should be used.
 
-- reg: should contain a register range for the dwmac controller and
-   another one for the Amlogic specific configuration
+- reg: The first register range should be the one of the DWMAC
+   controller. The second range is is for the Amlogic specific
+   configuration (for example the PRG_ETHERNET register range
+   on Meson8b and newer)
 
-Example:
+Required properties on Meson8b and newer:
+- clock-names: Should contain the following:
+   - "stmmaceth" - see stmmac.txt
+   - "clkin0" - first parent clock of the internal mux
+   - "clkin1" - second parent clock of the internal mux
+
+
+Example for Meson6:
 
ethmac: ethernet@c941 {
compatible = "amlogic,meson6-dwmac", "snps,dwmac";
@@ -23,3 +37,18 @@ Example:
clocks = <>;
clock-names = "stmmaceth";
}
+
+Example for GXBB:
+   ethmac: ethernet@c941 {
+   compatible = "amlogic,meson-gxbb-dwmac", "snps,dwmac";
+   reg = <0x0 0xc941 0x0 0x1>,
+   <0x0 0xc8834540 0x0 0x8>;
+   interrupts = <0 8 1>;
+   interrupt-names = "macirq";
+   clocks = < CLKID_ETH>,
+   < CLKID_FCLK_DIV2>,
+   < CLKID_MPLL2>;
+   clock-names = "stmmaceth", "clkin0", "clkin1";
+   phy-mode = "rgmii";
+   status = "disabled";
+   };
-- 
2.9.3

[PATCH v5 4/6] net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC

2016-09-06 Thread Martin Blumenstingl

The Ethernet controller available in Meson8b and GXBB SoCs is a Synopsys
DesignWare MAC IP core which is already supported by the stmmac driver.

In addition to the standard stmmac driver some Meson8b / GXBB specific
registers have to be configured for the PHY clocks. These SoC specific
registers are called PRG_ETHERNET_ADDR0 and PRG_ETHERNET_ADDR1 in the
datasheet.
These registers are not backwards compatible with those on Meson 6b,
which is why a new glue driver is introduced. The existing meson6-dwmac
driver nevertheless worked on many boards because the bootloader programs
the PRG_ETHERNET registers correctly. Additionally, the meson6-dwmac driver
only sets bit 1 of PRG_ETHERNET_ADDR0, which (according to the datasheet) is
only used during reset.

Currently all configuration values can be determined automatically,
based on the configured phy-mode (which is mandatory for the stmmac
driver). If required the tx-delay and the mux clock (so it supports
the MPLL2 clock as well) can be made configurable in the future.
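
As a rough illustration of how a phy-mode could be mapped onto the
PRG_ETH0 bit fields, here is a minimal user-space sketch. The field
names and bit positions are the ones defined in this patch; the BIT()
and GENMASK() stand-ins, the helpers field_update() and
prg_eth0_for_rgmii(), and the TX delay chosen by the caller are
assumptions made for this example and are not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* User-space stand-ins for the kernel's BIT() and GENMASK() (32-bit only). */
#define BIT(n)         (1u << (n))
#define GENMASK(h, l)  (((~0u) << (l)) & (~0u >> (31 - (h))))

/* Field definitions copied from the patch. */
#define PRG_ETH0_RGMII_MODE     BIT(0)
#define PRG_ETH0_TXDLY_SHIFT    5
#define PRG_ETH0_TXDLY_MASK     GENMASK(6, 5)
#define PRG_ETH0_TXDLY_OFF      (0x0u << PRG_ETH0_TXDLY_SHIFT)
#define PRG_ETH0_TXDLY_QUARTER  (0x1u << PRG_ETH0_TXDLY_SHIFT)

/* Read-modify-write of one field: clear the masked bits, then set the value. */
static uint32_t field_update(uint32_t reg, uint32_t mask, uint32_t val)
{
    assert((val & ~mask) == 0);    /* the value must fit inside the field */
    return (reg & ~mask) | val;
}

/*
 * Hypothetical helper: compute the PRG_ETH0 value for an RGMII PHY with
 * the given TX delay, leaving all other bits of 'reg' untouched.
 */
static uint32_t prg_eth0_for_rgmii(uint32_t reg, uint32_t txdly)
{
    reg = field_update(reg, PRG_ETH0_RGMII_MODE, PRG_ETH0_RGMII_MODE);
    return field_update(reg, PRG_ETH0_TXDLY_MASK, txdly);
}
```

Doing the update per field keeps unrelated bits, such as the clock mux
and divider settings, as they were programmed before.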

Signed-off-by: Martin Blumenstingl 
Tested-by: Kevin Hilman 
Acked-by: David S. Miller 
---
 drivers/net/ethernet/stmicro/stmmac/Kconfig|   6 +-
 drivers/net/ethernet/stmicro/stmmac/Makefile   |   2 +-
 .../net/ethernet/stmicro/stmmac/dwmac-meson8b.c| 324 +
 3 files changed, 328 insertions(+), 4 deletions(-)
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c

diff --git a/drivers/net/ethernet/stmicro/stmmac/Kconfig 
b/drivers/net/ethernet/stmicro/stmmac/Kconfig
index 8f06a66..54de175 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Kconfig
+++ b/drivers/net/ethernet/stmicro/stmmac/Kconfig
@@ -61,13 +61,13 @@ config DWMAC_LPC18XX
 config DWMAC_MESON
tristate "Amlogic Meson dwmac support"
default ARCH_MESON
-   depends on OF && (ARCH_MESON || COMPILE_TEST)
+   depends on OF && COMMON_CLK && (ARCH_MESON || COMPILE_TEST)
help
  Support for Ethernet controller on Amlogic Meson SoCs.
 
  This selects the Amlogic Meson SoC glue layer support for
- the stmmac device driver. This driver is used for Meson6 and
- Meson8 SoCs.
+ the stmmac device driver. This driver is used for Meson6,
+ Meson8, Meson8b and GXBB SoCs.
 
 config DWMAC_ROCKCHIP
tristate "Rockchip dwmac support"
diff --git a/drivers/net/ethernet/stmicro/stmmac/Makefile 
b/drivers/net/ethernet/stmicro/stmmac/Makefile
index 44b630c..f77edb9 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Makefile
+++ b/drivers/net/ethernet/stmicro/stmmac/Makefile
@@ -9,7 +9,7 @@ stmmac-objs:= stmmac_main.o stmmac_ethtool.o stmmac_mdio.o 
ring_mode.o  \
 obj-$(CONFIG_STMMAC_PLATFORM)  += stmmac-platform.o
 obj-$(CONFIG_DWMAC_IPQ806X)+= dwmac-ipq806x.o
 obj-$(CONFIG_DWMAC_LPC18XX)+= dwmac-lpc18xx.o
-obj-$(CONFIG_DWMAC_MESON)  += dwmac-meson.o
+obj-$(CONFIG_DWMAC_MESON)  += dwmac-meson.o dwmac-meson8b.o
 obj-$(CONFIG_DWMAC_ROCKCHIP)   += dwmac-rk.o
 obj-$(CONFIG_DWMAC_SOCFPGA)+= dwmac-altr-socfpga.o
 obj-$(CONFIG_DWMAC_STI)+= dwmac-sti.o
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
new file mode 100644
index 000..250e4ce
--- /dev/null
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
@@ -0,0 +1,324 @@
+/*
+ * Amlogic Meson8b and GXBB DWMAC glue layer
+ *
+ * Copyright (C) 2016 Martin Blumenstingl 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see .
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "stmmac_platform.h"
+
+#define PRG_ETH0   0x0
+
+#define PRG_ETH0_RGMII_MODEBIT(0)
+
+/* mux to choose between fclk_div2 (bit unset) and mpll2 (bit set) */
+#define PRG_ETH0_CLK_M250_SEL_SHIFT4
+#define PRG_ETH0_CLK_M250_SEL_MASK GENMASK(4, 4)
+
+#define PRG_ETH0_TXDLY_SHIFT   5
+#define PRG_ETH0_TXDLY_MASKGENMASK(6, 5)
+#define PRG_ETH0_TXDLY_OFF (0x0 << PRG_ETH0_TXDLY_SHIFT)
+#define PRG_ETH0_TXDLY_QUARTER (0x1 << PRG_ETH0_TXDLY_SHIFT)
+#define PRG_ETH0_TXDLY_HALF(0x2 << PRG_ETH0_TXDLY_SHIFT)
+#define PRG_ETH0_TXDLY_THREE_QUARTERS  (0x3 << PRG_ETH0_TXDLY_SHIFT)
+
+/* divider for the result of m250_sel */
+#define PRG_ETH0_CLK_M250_DIV_SHIFT7
+#define PRG_ETH0_CLK_M250_DIV_WIDTH3
+
+/* divides the result of m25_sel by either 5 (bit unset) or 10 (bit set) */
+#define PRG_ETH0_CLK_M25_DIV_SHIFT 10
+#define PRG_ETH0_CLK_M25_DIV_WIDTH 1
+
+#define

[PATCH v5 3/6] stmmac: introduce get_stmmac_bsp_priv() helper

2016-09-06 Thread Martin Blumenstingl

From: Joachim Eastwood 

Create a helper to retrieve dwmac private data from a dev
pointer. This is useful in PM callbacks and driver remove.
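
To make the pointer chain behind the helper easier to follow, here is a
small user-space sketch. All mock_* types and functions are simplified
stand-ins invented for this example (the real kernel structures have
many more fields), and the remove-style callback only hints at how a
glue driver might use the helper.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified mock types; hypothetical stand-ins for the kernel structures. */
struct mock_plat_data   { void *bsp_priv; };
struct mock_stmmac_priv { struct mock_plat_data *plat; };
struct mock_net_device  { struct mock_stmmac_priv priv; };
struct mock_device      { void *driver_data; };

/* Mirrors the pointer chain followed by get_stmmac_bsp_priv(). */
static void *mock_get_bsp_priv(struct mock_device *dev)
{
    struct mock_net_device *ndev;
    struct mock_stmmac_priv *priv;

    assert(dev != NULL && dev->driver_data != NULL);
    ndev = dev->driver_data;    /* stands in for dev_get_drvdata() */
    priv = &ndev->priv;         /* stands in for netdev_priv() */

    return priv->plat->bsp_priv;
}

/* Hypothetical glue-driver private data and a remove-style callback. */
struct mock_glue { int clk_enabled; };

static int mock_glue_remove(struct mock_device *dev)
{
    struct mock_glue *glue = mock_get_bsp_priv(dev);

    glue->clk_enabled = 0;      /* e.g. release glue-specific resources */
    return 0;
}
```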

Signed-off-by: Joachim Eastwood 
Tested-by: Martin Blumenstingl 
Acked-by: David S. Miller 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_platform.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.h 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.h
index ffeb8d9..64e147f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.h
@@ -30,4 +30,12 @@ int stmmac_get_platform_resources(struct platform_device 
*pdev,
 int stmmac_pltfr_remove(struct platform_device *pdev);
 extern const struct dev_pm_ops stmmac_pltfr_pm_ops;
 
+static inline void *get_stmmac_bsp_priv(struct device *dev)
+{
+   struct net_device *ndev = dev_get_drvdata(dev);
+   struct stmmac_priv *priv = netdev_priv(ndev);
+
+   return priv->plat->bsp_priv;
+}
+
 #endif /* __STMMAC_PLATFORM_H__ */
-- 
2.9.3

[PATCH v5 0/6] meson: Meson8b and GXBB DWMAC glue driver

2016-09-06 Thread Martin Blumenstingl

This adds a DWMAC glue driver for the PRG_ETHERNET registers found in
Meson8b and GXBB SoCs. Compared to the "old" meson6b-dwmac glue driver,
the register layout is completely different, so I introduced a separate
driver.


Changes since v4:
- DWMAC_MESON now depends on COMMON_CLK because the new glue driver is
  also a clock provider (which requires COMMON_CLK)
- use Meson8b and GXBB in the module description (instead of the
  marketing names S805 and S905)
- fixed a trivial typo (retrive -> retrieve) in the
  get_stmmac_bsp_priv() helper patch
- added a new patch to update the module description of the dwmac-meson
  driver to indicate which SoCs are supported exactly (this patch is
  optional and does not affect the rest of the series)


Joachim Eastwood (1):
  stmmac: introduce get_stmmac_bsp_priv() helper

Martin Blumenstingl (5):
  net: dt-bindings: Document the new Meson8b and GXBB DWMAC bindings
  clk: gxbb: expose MPLL2 clock for use by DT
  net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC
  ARM64: dts: meson-gxbb: use the new GXBB DWMAC glue driver
  net: stmmac: update the module description of the dwmac-meson driver

 .../devicetree/bindings/net/meson-dwmac.txt|  45 ++-
 arch/arm64/boot/dts/amlogic/meson-gxbb.dtsi|   8 +-
 drivers/clk/meson/gxbb.h   |   2 +-
 drivers/net/ethernet/stmicro/stmmac/Kconfig|   6 +-
 drivers/net/ethernet/stmicro/stmmac/Makefile   |   2 +-
 drivers/net/ethernet/stmicro/stmmac/dwmac-meson.c  |   4 +-
 .../net/ethernet/stmicro/stmmac/dwmac-meson8b.c| 324 +
 .../net/ethernet/stmicro/stmmac/stmmac_platform.h  |   8 +
 include/dt-bindings/clock/gxbb-clkc.h  |   1 +
 9 files changed, 382 insertions(+), 18 deletions(-)
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c

-- 
2.9.3

[PATCH v5 5/6] ARM64: dts: meson-gxbb: use the new GXBB DWMAC glue driver

2016-09-06 Thread Martin Blumenstingl

The Amlogic reference driver uses the "mc_val" devicetree property to
configure the PRG_ETHERNET_ADDR0 register. Unfortunately it uses magic
values for this configuration.
According to the datasheet the PRG_ETHERNET_ADDR0 register is at address
0xc8834108. However, the reference driver uses 0xc8834540 instead.
According to my tests, the value from the reference driver is correct.

No changes are required to the board dts files because the only
required configuration option is the phy-mode, which had to be
configured correctly before as well.

Signed-off-by: Martin Blumenstingl 
---
 arch/arm64/boot/dts/amlogic/meson-gxbb.dtsi | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/boot/dts/amlogic/meson-gxbb.dtsi 
b/arch/arm64/boot/dts/amlogic/meson-gxbb.dtsi
index 2b47415..2e8a3d9 100644
--- a/arch/arm64/boot/dts/amlogic/meson-gxbb.dtsi
+++ b/arch/arm64/boot/dts/amlogic/meson-gxbb.dtsi
@@ -497,13 +497,15 @@
};
 
ethmac: ethernet@c941 {
-   compatible = "amlogic,meson6-dwmac", "snps,dwmac";
+   compatible = "amlogic,meson-gxbb-dwmac", "snps,dwmac";
reg = <0x0 0xc941 0x0 0x1
   0x0 0xc8834540 0x0 0x4>;
interrupts = <0 8 1>;
interrupt-names = "macirq";
-   clocks = < CLKID_ETH>;
-   clock-names = "stmmaceth";
+   clocks = < CLKID_ETH>,
+< CLKID_FCLK_DIV2>,
+< CLKID_MPLL2>;
+   clock-names = "stmmaceth", "clkin0", "clkin1";
phy-mode = "rgmii";
status = "disabled";
};
-- 
2.9.3

Re: [PATCH] vmxnet3: mark vmxnet3_rq_destroy_all_rxdataring() static

2016-09-06 Thread Arnd Bergmann

On Tuesday, September 6, 2016 4:11:59 PM CEST Baoyou Xie wrote:
> We get 1 warning when building kernel with W=1:
> drivers/net/vmxnet3/vmxnet3_drv.c:1643:1: warning: no previous prototype for 
> 'vmxnet3_rq_destroy_all_rxdataring' [-Wmissing-prototypes]
> 
> In fact, this function is only used in the file in which it is
> declared and doesn't need a declaration, but can be made static.
> So this patch marks this function with 'static'.
> 
> Signed-off-by: Baoyou Xie 
> 

Acked-by: Arnd Bergmann

[PATCH] bonding: Prevent deletion of a bond, or the last slave from a bond, with active usage.

2016-09-06 Thread Kaur, Jasminder

From: "Kaur, Jasminder" 

If a bond is in use, for example with an IP address configured,
removing it can result in application disruptions. If the bond is used
for cluster communication or network file system interfaces, removing
it can cause system downtime.

An additional write option "?-" is added to the sysfs bond interfaces,
as shown below, to prevent accidental deletions while a bond is in use.

In the absence of any usage, the following command proceeds with bond
deletion:
    echo "?-bondX" > /sys/class/net/bonding_masters
If usage is detected, such as a configured IP address, deletion is
prevented and an appropriate message is logged to syslog.

In the absence of any usage, the following command proceeds with
deletion of a slave from a bond:
    echo "?-enoX" > /sys/class/net/bondX/bonding/slaves
If usage is detected, such as an IP address configured on the bond,
deletion is prevented if the last slave is being removed from the bond.
An appropriate message is logged to syslog.
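
The command syntax described above ("+ifname", "-ifname" and the new
"?-ifname") can be sketched as a small user-space parser. The enum and
the function parse_bond_cmd() are hypothetical names for this example
and do not appear in the patch, which performs the equivalent checks
inline.

```c
#include <assert.h>
#include <stddef.h>

enum bond_cmd {
    BOND_CMD_INVALID,
    BOND_CMD_ADD,             /* "+ifname"  */
    BOND_CMD_REMOVE,          /* "-ifname"  */
    BOND_CMD_CHECKED_REMOVE   /* "?-ifname": remove only if not in use */
};

/*
 * Split a command string into an operation and an interface name.
 * On success *ifname points into 'command'; otherwise it is NULL.
 */
static enum bond_cmd parse_bond_cmd(const char *command, const char **ifname)
{
    assert(command != NULL && ifname != NULL);
    *ifname = NULL;

    /* The two-character prefix has to be tested before the single '-'. */
    if (command[0] == '?' && command[1] == '-') {
        if (command[2] == '\0')
            return BOND_CMD_INVALID;
        *ifname = command + 2;
        return BOND_CMD_CHECKED_REMOVE;
    }

    if ((command[0] == '+' || command[0] == '-') && command[1] != '\0') {
        *ifname = command + 1;
        return command[0] == '+' ? BOND_CMD_ADD : BOND_CMD_REMOVE;
    }

    return BOND_CMD_INVALID;
}
```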

Signed-off-by: Jasminder Kaur 
---
 drivers/net/bonding/bond_options.c | 24 ++--
 drivers/net/bonding/bond_sysfs.c   | 35 +--
 2 files changed, 55 insertions(+), 4 deletions(-)

diff --git a/drivers/net/bonding/bond_options.c 
b/drivers/net/bonding/bond_options.c
index 577e57c..e7640ea 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -1335,9 +1335,15 @@ static int bond_option_slaves_set(struct bonding *bond,
struct net_device *dev;
char *ifname;
int ret;
+   struct in_device *in_dev;
 
sscanf(newval->string, "%16s", command); /* IFNAMSIZ*/
-   ifname = command + 1;
+
+   if ((command[0] == '?') && (command[1] == '-'))
+   ifname = command + 2;
+   else
+   ifname = command + 1;
+
if ((strlen(command) <= 1) ||
!dev_valid_name(ifname))
goto err_no_cmd;
@@ -1356,6 +1362,20 @@ static int bond_option_slaves_set(struct bonding *bond,
ret = bond_enslave(bond->dev, dev);
break;
 
+   case '?':
+   if (command[1] == '-') {
+   in_dev = __in_dev_get_rtnl(bond->dev);
+   if ((bond->slave_cnt == 1) &&
+   ((in_dev->ifa_list) != NULL)) {
+   netdev_info(bond->dev, "attempt to remove last 
slave %s from bond.\n",
+   dev->name);
+   ret = -EBUSY;
+   break;
+   }
+   } else {
+   goto err_no_cmd;
+   }
+
case '-':
netdev_info(bond->dev, "Removing slave %s\n", dev->name);
ret = bond_release(bond->dev, dev);
@@ -1369,7 +1389,7 @@ out:
return ret;
 
 err_no_cmd:
-   netdev_err(bond->dev, "no command found in slaves file - use +ifname or 
-ifname\n");
+   netdev_err(bond->dev, "no command found in slaves file - use +ifname or 
-ifname or ?-ifname\n");
ret = -EPERM;
goto out;
 }
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index e23c3ed..7c2ef64 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -102,7 +102,12 @@ static ssize_t bonding_store_bonds(struct class *cls,
int rv, res = count;
 
sscanf(buffer, "%16s", command); /* IFNAMSIZ*/
-   ifname = command + 1;
+
+   if ((command[0] == '?') && (command[1] == '-'))
+   ifname = command + 2;
+   else
+   ifname = command + 1;
+
if ((strlen(command) <= 1) ||
!dev_valid_name(ifname))
goto err_no_cmd;
@@ -130,6 +135,32 @@ static ssize_t bonding_store_bonds(struct class *cls,
res = -ENODEV;
}
rtnl_unlock();
+   } else if ((command[0] == '?') && (command[1] == '-')) {
+   struct net_device *bond_dev;
+
+   rtnl_lock();
+   bond_dev = bond_get_by_name(bn, ifname);
+
+   if (bond_dev) {
+   struct in_device *in_dev;
+   struct bonding *bond = netdev_priv(bond_dev);
+
+   in_dev = __in_dev_get_rtnl(bond_dev);
+
+   if (((in_dev->ifa_list) != NULL) &&
+   (bond->slave_cnt > 0)) {
+   pr_err("%s is in use. Unconfigure IP %pI4 
before deletion.\n",
+  ifname, &in_dev->ifa_list->ifa_local);
+   rtnl_unlock();
+   return -EBUSY;
+   }
+   pr_info("%s is being deleted...\n", ifname);
+   unregister_netdevice(bond_dev);
+   } else {
+   pr_err("unable to delete

Re: [PATCH v4 4/5] net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC

2016-09-06 Thread Arnd Bergmann

On Monday, September 5, 2016 9:07:03 PM CEST Martin Blumenstingl wrote:
> On Mon, Sep 5, 2016 at 12:53 PM, Arnd Bergmann  wrote:
> > On Monday, September 5, 2016 9:37:29 AM CEST kbuild test robot wrote:
> >> All error/warnings (new ones prefixed by >>):
> >>
> >> >> drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:63:18: error: field 
> >> >> 'm250_mux' has incomplete type
> >>  struct clk_mux  m250_mux;
> >>  ^
> >> >> drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:67:21: error: field 
> >> >> 'm250_div' has incomplete type
> >>  struct clk_divider m250_div;
> >> ^
> >>
> >
> > I think this needs a compile-time dependency on COMMON_CLK
> indeed, since we are also a clock provider we have to depend on
> CONFIG_COMMON_CLK.
> 
> That brings up a question though:
> so far the new driver uses the same Kconfig symbol as the "old" driver
> (CONFIG_DWMAC_MESON).
> The "old" driver does not need CONFIG_COMMON_CLK while the new one does.
> I see a few options here:
> 1. simply adding the dependency (as most configurations will have
> CONFIG_COMMON_CLK enabled anyways)

I think that's fine. At least on both ARM multiplatform and ARM64 it is
always defined by definition, and when build testing, it should be possible
to enable it on other architectures as well.

> 2. add some depends on COMMON_CLK || MACH_MESON6 || MACH_MESON8 foo

That doesn't work unless you also put the calls into the clk interface
inside of ugly #ifdef

> 3. use a new Kconfig symbol for the new driver (CONFIG_DWMAC_MESON8B?)

That would be ok as well, probably not necessary.

> And finally regarding your other mail: I have already changed
> WARN_ON(PTR_ERR_OR_ZERO(...)) to WARN_ON(IS_ERR(...)) in v4

Ok, thanks.

Arnd

Re: [PATCH iproute2] ip route: check ftell, fseek return value

2016-09-06 Thread Phil Sutter

On Tue, Sep 06, 2016 at 02:39:50PM +0800, Hangbin Liu wrote:
> ftell() may return -1 on error, which is not handled, so a negative offset
> may be passed to fseek(). The return code of fseek() is also not checked.
> 
> Reported-by: Phil Sutter 
> Signed-off-by: Hangbin Liu 

Acked-by: Phil Sutter
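
For readers who do not have the iproute2 code at hand, the pattern
being fixed looks roughly like the following sketch. The helper
peek_byte() is a hypothetical example and is not taken from iproute2;
it only shows both return values being checked.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: read the next byte of a stream without consuming
 * it. Returns 0 on success and -1 on any failure.
 */
static int peek_byte(FILE *fp, int *out)
{
    long pos;
    int c;

    assert(fp != NULL && out != NULL);

    pos = ftell(fp);
    if (pos == -1L)
        return -1;      /* e.g. the stream is not seekable */

    c = fgetc(fp);
    if (c == EOF)
        return -1;

    /* Restore the saved position; fseek() can fail as well. */
    if (fseek(fp, pos, SEEK_SET) != 0)
        return -1;

    *out = c;
    return 0;
}
```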

Re: [PATCH v4 nf] netfilter: seqadj: Drop the packet directly when fail to add seqadj extension to avoid dereference NULL pointer later

2016-09-06 Thread Pablo Neira Ayuso

On Tue, Sep 06, 2016 at 09:57:23AM +0800, f...@ikuai8.com wrote:
> From: Gao Feng 
> 
> When memory is exhausted, nfct_seqadj_ext_add may fail to add the seqadj
> extension. But the function nf_ct_seqadj_init doesn't check whether
> nfct_seqadj returned a valid seqadj pointer.
> 
> Now drop the packet directly when adding the seqadj extension fails, to
> avoid dereferencing a NULL pointer in nf_ct_seqadj_init.
> 
> Signed-off-by: Gao Feng 
> ---
>  v4: Drop the packet directly when fail to add seqadj extension;
>  v3: Remove the warning log when seqadj is null;
>  v2: Remove the unnecessary seqadj check in nf_ct_seq_adjust
>  v1: Initial patch
> 
>  net/netfilter/nf_conntrack_core.c | 6 +-
>  net/netfilter/nf_nat_core.c   | 3 ++-
>  2 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/net/netfilter/nf_conntrack_core.c 
> b/net/netfilter/nf_conntrack_core.c
> index dd2c43a..dfa76ce 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -1036,7 +1036,11 @@ init_conntrack(struct net *net, struct nf_conn *tmpl,
>   return (struct nf_conntrack_tuple_hash *)ct;
>  
>   if (tmpl && nfct_synproxy(tmpl)) {
> - nfct_seqadj_ext_add(ct);
> + if (!nfct_seqadj_ext_add(ct)) {
> + nf_conntrack_free(ct);
> + pr_debug("Can't add seqadj extension\n");
> + return NULL;
> + }
>   nfct_synproxy_ext_add(ct);

I think this is part of the same logical change, ie. nf_ct_ext_add()
returns NULL, then I would also fix nfct_synproxy_ext_add() in this
go.

>   }
>  
> diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
> index de31818..b82282a 100644
> --- a/net/netfilter/nf_nat_core.c
> +++ b/net/netfilter/nf_nat_core.c
> @@ -441,7 +441,8 @@ nf_nat_setup_info(struct nf_conn *ct,
>   ct->status |= IPS_DST_NAT;
>  
>   if (nfct_help(ct))
> - nfct_seqadj_ext_add(ct);
> + if (!nfct_seqadj_ext_add(ct))
> + return NF_DROP;

ctnetlink may have created a conntrack with seqadj in place by the time we
call nf_nat_setup_info(), e.g. via conntrackd state synchronization, so
NF_ACCEPT would be more conservative.

Actually, after a quick look at ctnetlink, I don't see any call to
nfct_seqadj_ext_add() from there, so I suspect this is broken since
SYNPROXY was introduced. It would be great if you can review this and
send us patches to fix this, if indeed needed.

Thanks!

Re: [PATCH net-next] netlink: don't forget to release a rhashtable_iter structure

2016-09-06 Thread David Miller

From: Andrei Vagin 
Date: Tue,  6 Sep 2016 11:23:39 -0700

> This bug was detected by kmemleak:
> unreferenced object 0x8804269cc3c0 (size 64):
>   comm "criu", pid 1042, jiffies 4294907360 (age 13.713s)
>   hex dump (first 32 bytes):
> a0 32 cc 2c 04 88 ff ff 00 00 00 00 00 00 00 00  .2.,
> 00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de  
>   backtrace:
> [] kmemleak_alloc+0x4a/0xa0
> [] kmem_cache_alloc_trace+0x10f/0x280
> [] __netlink_diag_dump+0x26c/0x290 [netlink_diag]
> 
> Cc: Herbert Xu 
> Fixes: ad202074320c ("netlink: Use rhashtable walk interface in diag dump")
> Signed-off-by: Andrei Vagin 

Hmmm, why isn't this handled by netlink_diag_dump_done()?

It seems like the intent is to have the hashtable iter be cached
across multiple __netlink_diag_dump() calls within a single
netlink_diag_dump invocation.

Re: vlan aware bridge doesn't propagate mac changes to vlans on top of it

2016-09-06 Thread Toshiaki Makita

On 2016/09/07 6:59, Michal Soltys wrote:
> Consider following scenario:
> 
> - create vlan aware bridge (say br0)
> - setup br0's vlans, e.g.
> 
> bridge vlan add dev br0 vid 10 self
> 
> This will add necessary fdb entries directing appropriate traffic to the
> bridge itself.
> 
> - create appropriate vlan interfaces on top of it, for example:
> 
> ip li add link br0 name br0.10 type vlan id 10
> ip add add 10.0.0.1/8 dev br0.10 
> 
> This will add vlan devices on top of br0 and *inherit br0's mac address*.
> 
> - now after all of the above is done
> 
> ip li set eth0 master br0
> 
> This will attach interface eth0 to the bridge. With this being the first
> interface attached, br0 will take its mac address as its own. Any
> further changes to br0's ports may cause the same, with the lowest mac
> address of some port becoming br0's mac.
> 
> This will update fdb entries as well, but all vlan interfaces on top of
> br0 (e.g. br0.10) will be using old mac address from the time when vlan
> was created.
> 
> The side effect of it is that any traffic addressed to such interface
> will be flooded to all ports (and br0 itself).
> 
> The only workaround I found is to either manually update mac addresses
> with 'ip' or recreate vlans (bridge fdb refused to update relevant entries).
> 
> But if br0's mac changes due to some port changes - shouldn't it be
> somehow propagated automatically to vlans created on top of it ?

This should have been addressed at least in kernel 4.7...
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=308453aa9156a3b8ee382c0949befb507a32b0c1

Which kernel version do you use?

-- 
Toshiaki Makita

Re: [PATCH 0/2] lan78xx: Remove trailing underscores from macros

2016-09-06 Thread Joe Perches

On Tue, 2016-09-06 at 23:19 +, woojung@microchip.com wrote:
> > Joe Perches (2):
> >   lan78xx: Remove locally defined trailing underscores from defines and uses
> >   microchipphy.h and uses: Remove trailing underscores from defines and
> > uses
> > 
> >  drivers/net/phy/microchip.c  |4 +-
> >  drivers/net/usb/lan78xx.c|  368 +++
> >  drivers/net/usb/lan78xx.h| 1068 +-
> > 
> >  include/linux/microchipphy.h |   72 +--
> >  4 files changed, 756 insertions(+), 756 deletions(-)
> 
> 
> Because there is no specific rule for how to name defines, I'm not sure it is
> worth changing 1000+ lines.
> It may be better to set a guideline for new submissions.
> 
> Welcome any comments.

Generally, more conforming to norms is better.
These FOO_ uses are non-conforming.

Is there anything other than a one-time cost
to apply these?  Is the same code used for
other platforms?

[PATCH net-next 2/2] tcp: put a TLV list of TCP stats in error queue

2016-09-06 Thread Francis Y. Yan

To export useful TCP statistics along with timestamps, such as
rwnd-limited time and min RTT, we enqueue a TLV list in error queue
immediately when a timestamp is generated.

Specifically, if user space requests SOF_TIMESTAMPING_TX_* timestamps
and sets SOF_TIMESTAMPING_OPT_STATS, the kernel will create a list of
TLVs (struct nlattr) containing all the statistics and store the list
in payload of the skb that is going to be enqueued into error queue.
Notice that SOF_TIMESTAMPING_OPT_STATS can only be set together with
SOF_TIMESTAMPING_OPT_TSONLY.

In addition, if the application in user space also enables receiving
timestamp (e.g. by SOF_TIMESTAMPING_SOFTWARE), calling recvfrom() on
error queue will return one more control message with a cmsg_type of
SCM_OPT_STATS containing the list of TLVs in its cmsg_data.
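
On the receiving side, user space would have to walk the TLV list found
in the control message. The sketch below only shows that walk over a
plain byte buffer. struct tlv_hdr mirrors the layout of struct nlattr
(16-bit length including the header, 16-bit type, payload padded to 4
bytes); the function tlv_find() and any attribute type numbers passed
to it are placeholders for this example, since the excerpt does not
show the actual type values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as struct nlattr: length (header included) and type. */
struct tlv_hdr {
    uint16_t len;
    uint16_t type;
};

/* Attributes are padded so that each one starts on a 4-byte boundary. */
#define TLV_ALIGN(n) (((size_t)(n) + 3u) & ~(size_t)3u)

/*
 * Find the first attribute of the given type. Returns a pointer to its
 * payload and stores the payload size in *payload_len, or returns NULL
 * if the attribute is absent or the list is malformed.
 */
static const unsigned char *tlv_find(const unsigned char *buf, size_t buflen,
                                     uint16_t type, size_t *payload_len)
{
    size_t off = 0;

    assert(buf != NULL && payload_len != NULL);

    while (buflen - off >= sizeof(struct tlv_hdr)) {
        struct tlv_hdr hdr;

        memcpy(&hdr, buf + off, sizeof(hdr));   /* avoid unaligned access */
        if (hdr.len < sizeof(hdr) || hdr.len > buflen - off)
            return NULL;                        /* malformed attribute */

        if (hdr.type == type) {
            *payload_len = hdr.len - sizeof(hdr);
            return buf + off + sizeof(hdr);
        }

        if (TLV_ALIGN(hdr.len) > buflen - off)
            break;                              /* no room for another one */
        off += TLV_ALIGN(hdr.len);
    }

    return NULL;
}
```

In a real program the buffer would be the cmsg_data of the
SCM_OPT_STATS control message rather than a hand-built array.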

Signed-off-by: Francis Y. Yan 
Signed-off-by: Yuchung Cheng 
---
 arch/alpha/include/uapi/asm/socket.h   |  2 ++
 arch/avr32/include/uapi/asm/socket.h   |  2 ++
 arch/frv/include/uapi/asm/socket.h |  2 ++
 arch/ia64/include/uapi/asm/socket.h|  2 ++
 arch/m32r/include/uapi/asm/socket.h|  2 ++
 arch/mips/include/uapi/asm/socket.h|  2 ++
 arch/mn10300/include/uapi/asm/socket.h |  2 ++
 arch/parisc/include/uapi/asm/socket.h  |  2 ++
 arch/powerpc/include/uapi/asm/socket.h |  2 ++
 arch/s390/include/uapi/asm/socket.h|  2 ++
 arch/sparc/include/uapi/asm/socket.h   |  2 ++
 arch/xtensa/include/uapi/asm/socket.h  |  2 ++
 include/linux/tcp.h|  3 +++
 include/uapi/asm-generic/socket.h  |  2 ++
 include/uapi/linux/net_tstamp.h|  3 ++-
 include/uapi/linux/tcp.h   |  7 +++
 net/core/skbuff.c  | 10 --
 net/core/sock.c|  7 +++
 net/ipv4/tcp.c | 23 +++
 net/socket.c   |  7 ++-
 20 files changed, 82 insertions(+), 4 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/socket.h 
b/arch/alpha/include/uapi/asm/socket.h
index 9e46d6e..c9f30508b 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -97,4 +97,6 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SCM_OPT_STATS  54
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/avr32/include/uapi/asm/socket.h 
b/arch/avr32/include/uapi/asm/socket.h
index 1fd147f..e937d3f 100644
--- a/arch/avr32/include/uapi/asm/socket.h
+++ b/arch/avr32/include/uapi/asm/socket.h
@@ -90,4 +90,6 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SCM_OPT_STATS  54
+
 #endif /* _UAPI__ASM_AVR32_SOCKET_H */
diff --git a/arch/frv/include/uapi/asm/socket.h 
b/arch/frv/include/uapi/asm/socket.h
index afbc98f0..f17c124 100644
--- a/arch/frv/include/uapi/asm/socket.h
+++ b/arch/frv/include/uapi/asm/socket.h
@@ -90,5 +90,7 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SCM_OPT_STATS  54
+
 #endif /* _ASM_SOCKET_H */
 
diff --git a/arch/ia64/include/uapi/asm/socket.h 
b/arch/ia64/include/uapi/asm/socket.h
index 0018fad..bdd72e2 100644
--- a/arch/ia64/include/uapi/asm/socket.h
+++ b/arch/ia64/include/uapi/asm/socket.h
@@ -99,4 +99,6 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SCM_OPT_STATS  54
+
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/m32r/include/uapi/asm/socket.h 
b/arch/m32r/include/uapi/asm/socket.h
index 5fe42fc..217e311 100644
--- a/arch/m32r/include/uapi/asm/socket.h
+++ b/arch/m32r/include/uapi/asm/socket.h
@@ -90,4 +90,6 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SCM_OPT_STATS  54
+
 #endif /* _ASM_M32R_SOCKET_H */
diff --git a/arch/mips/include/uapi/asm/socket.h 
b/arch/mips/include/uapi/asm/socket.h
index 2027240a..76ea723 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -108,4 +108,6 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SCM_OPT_STATS  54
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/mn10300/include/uapi/asm/socket.h 
b/arch/mn10300/include/uapi/asm/socket.h
index 5129f23..a4c0295 100644
--- a/arch/mn10300/include/uapi/asm/socket.h
+++ b/arch/mn10300/include/uapi/asm/socket.h
@@ -90,4 +90,6 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SCM_OPT_STATS  54
+
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/parisc/include/uapi/asm/socket.h 
b/arch/parisc/include/uapi/asm/socket.h
index 9c935d7..05ef7c0 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -89,4 +89,6 @@
 
 #define SO_CNX_ADVICE  0x402E
 
+#define SCM_OPT_STATS  0x402F
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/powerpc/include/uapi/asm/socket.h 
b/arch/powerpc/include/uapi/asm/socket.h
index 1672e33..fe2dea1 100644
--- a/arch/powerpc/include/uapi/asm/socket.h
+++ b/arch/powerpc/include/uapi/asm/socket.h
@@ -97,4 +97,6 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SCM_OPT_STATS  54
+
 #endif /* _ASM_POWERPC_SOCKET_H

[PATCH net-next 1/2] tcp: measure rwnd-limited time

2016-09-06 Thread Francis Y. Yan

This patch measures the total time when TCP transmission is limited
by receiver's advertised window (rwnd), and exports it in tcp_info as
tcpi_rwnd_limited.

The rwnd-limited time is defined as the period when the next segment
to send by TCP cannot fit into rwnd. To measure it, we record the last
timestamp when limited by rwnd (rwnd_limited_ts) and the total
rwnd-limited time (rwnd_limited) in tcp_sock.

Then we export the total rwnd-limited time so far in tcp_info, where
by so far, we mean that if TCP transmission is still being limited by
rwnd, the time interval since rwnd_limited_ts needs to be counted as
well; otherwise, we simply export rwnd_limited.

It is worth noting that we also have to add a new sequence counter
(seqcnt) in tcp_sock to carefully handle tcp_info's reading of
rwnd_limited_ts and rwnd_limited in order to get a consistent snapshot
of both variables together.
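
The bookkeeping described above can be sketched in user space as
follows. This is not the kernel code: C11 atomics stand in for
seqcount_t, timestamps are plain microsecond counters passed in by the
caller, a single writer is assumed, and all rwnd_* names here are
invented for the example. In particular rwnd_stop() is an assumption,
since the corresponding function is cut off in the excerpt below.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct rwnd_stats {
    atomic_uint seq;              /* even: stable, odd: update in progress */
    _Atomic uint64_t limited_ts;  /* start of current limited period, 0 if none */
    _Atomic uint64_t limited_us;  /* total limited time of finished periods */
};

static void rwnd_init(struct rwnd_stats *s)
{
    atomic_init(&s->seq, 0);
    atomic_init(&s->limited_ts, 0);
    atomic_init(&s->limited_us, 0);
}

/* Writer: transmission has just become limited by rwnd. */
static void rwnd_start(struct rwnd_stats *s, uint64_t now_us)
{
    assert(now_us != 0);              /* 0 is reserved for "not limited" */
    if (atomic_load(&s->limited_ts) != 0)
        return;                       /* already inside a limited period */
    atomic_fetch_add(&s->seq, 1);     /* begin update (seq becomes odd) */
    atomic_store(&s->limited_ts, now_us);
    atomic_fetch_add(&s->seq, 1);     /* end update (seq becomes even) */
}

/* Writer: rwnd no longer limits transmission; fold the period into the total. */
static void rwnd_stop(struct rwnd_stats *s, uint64_t now_us)
{
    uint64_t ts = atomic_load(&s->limited_ts);

    if (ts == 0)
        return;
    atomic_fetch_add(&s->seq, 1);
    atomic_fetch_add(&s->limited_us, now_us - ts);
    atomic_store(&s->limited_ts, 0);
    atomic_fetch_add(&s->seq, 1);
}

/* Reader: total limited time so far, including a period still in progress. */
static uint64_t rwnd_snapshot(struct rwnd_stats *s, uint64_t now_us)
{
    unsigned int start;
    uint64_t total, ts;

    do {
        do {
            start = atomic_load(&s->seq);
        } while (start & 1u);         /* wait until no update is in progress */
        total = atomic_load(&s->limited_us);
        ts = atomic_load(&s->limited_ts);
    } while (atomic_load(&s->seq) != start);    /* retry if a writer ran */

    return ts ? total + (now_us - ts) : total;
}
```

The retry loop is what guarantees that the reader never combines a
timestamp from one period with a total from another.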

Signed-off-by: Francis Y. Yan 
Signed-off-by: Yuchung Cheng 
---
 include/linux/tcp.h  |  5 +
 include/uapi/linux/tcp.h |  1 +
 net/ipv4/tcp.c   |  9 -
 net/ipv4/tcp_output.c| 39 ++-
 4 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 7be9b12..f5b588e 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -176,6 +176,7 @@ struct tcp_sock {
 * were acked.
 */
struct u64_stats_sync syncp; /* protects 64bit vars (cf tcp_get_info()) 
*/
+   seqcount_t seqcnt;  /* protects rwnd-limited-related vars, etc. */
 
u32 snd_una;/* First byte we want an ack for*/
u32 snd_sml;/* Last byte of the most recently transmitted 
small packet */
@@ -204,6 +205,8 @@ struct tcp_sock {
 
u32 window_clamp;   /* Maximal window to advertise  */
u32 rcv_ssthresh;   /* Current window clamp */
+   struct skb_mstamp rwnd_limited_ts; /* Last timestamp limited by rwnd */
+   u64 rwnd_limited;   /* Total time (us) limited by rwnd */
 
/* Information of the most recently (s)acked skb */
struct tcp_rack {
@@ -422,4 +425,6 @@ static inline void tcp_saved_syn_free(struct tcp_sock *tp)
tp->saved_syn = NULL;
 }
 
+u32 tcp_rwnd_limited_delta(const struct tcp_sock *tp);
+
 #endif /* _LINUX_TCP_H */
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 482898f..f1e2de4 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -211,6 +211,7 @@ struct tcp_info {
__u32   tcpi_min_rtt;
__u32   tcpi_data_segs_in;  /* RFC4898 tcpEStatsDataSegsIn */
__u32   tcpi_data_segs_out; /* RFC4898 tcpEStatsDataSegsOut */
+   __u64   tcpi_rwnd_limited;  /* total time (us) limited by rwnd */
 };
 
 /* for TCP_MD5SIG socket option */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 77311a9..ed77f2c 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -380,6 +380,7 @@ void tcp_init_sock(struct sock *sk)
struct inet_connection_sock *icsk = inet_csk(sk);
struct tcp_sock *tp = tcp_sk(sk);
 
+   seqcount_init(&tp->seqcnt);
__skb_queue_head_init(&tp->out_of_order_queue);
tcp_init_xmit_timers(sk);
tcp_prequeue_init(tp);
@@ -2690,7 +2691,7 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
u32 now = tcp_time_stamp;
unsigned int start;
int notsent_bytes;
-   u64 rate64;
+   u64 rate64, rwnd_limited;
u32 rate;
 
memset(info, 0, sizeof(*info));
@@ -2777,6 +2778,12 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
info->tcpi_min_rtt = tcp_min_rtt(tp);
info->tcpi_data_segs_in = tp->data_segs_in;
info->tcpi_data_segs_out = tp->data_segs_out;
+
+   do {
+   start = read_seqcount_begin(&tp->seqcnt);
+   rwnd_limited = tp->rwnd_limited + tcp_rwnd_limited_delta(tp);
+   } while (read_seqcount_retry(&tp->seqcnt, start));
+   put_unaligned(rwnd_limited, &info->tcpi_rwnd_limited);
 }
 EXPORT_SYMBOL_GPL(tcp_get_info);
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 8b45794..dab0883 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2020,6 +2020,39 @@ static int tcp_mtu_probe(struct sock *sk)
return -1;
 }
 
+u32 tcp_rwnd_limited_delta(const struct tcp_sock *tp)
+{
+   if (tp->rwnd_limited_ts.v64) {
+   struct skb_mstamp now;
+
+   skb_mstamp_get(&now);
+   return skb_mstamp_us_delta(&now, &tp->rwnd_limited_ts);
+   }
+
+   return 0;
+}
+
+static void tcp_start_rwnd_limited(struct tcp_sock *tp)
+{
+   if (!tp->rwnd_limited_ts.v64) {
+   write_seqcount_begin(&tp->seqcnt);
+   skb_mstamp_get(&tp->rwnd_limited_ts);
+   write_seqcount_end(&tp->seqcnt);
+   }
+}
+
+static void tcp_stop_rwnd_limited(struct

Re: [PATCH v4 1/4] soc: qcom: wcnss_ctrl: Stub wcnss_ctrl API

2016-09-06 Thread Andy Gross

On Tue, Sep 06, 2016 at 03:18:29PM -0700, Bjorn Andersson wrote:
> Stub the wcnss_ctrl API to allow compile testing wcnss function drivers.
> 
> Cc: Marcel Holtmann 
> Signed-off-by: Bjorn Andersson 
> ---
> 
> There are no other pending changes colliding with this, so if Andy is okay 
> with
> this it could be merged through Kalle's tree - together with the other 
> patches.
> 
> Marcel, with this applied we can drop the depends on QCOM_SMD from the
> btqcomsmd driver as well.
> 
> Changes since v3:
> - Added this patch to allow compile testing without SMD support after patch 2
> 
>  include/linux/soc/qcom/wcnss_ctrl.h | 13 +
>  1 file changed, 13 insertions(+)
> 

This is fine.

Acked-by: Andy Gross

Re: [PATCH] net: hns: declare function as static

2016-09-06 Thread Kefeng Wang



On 2016/9/7 7:20, David Miller wrote:
> From: Kefeng Wang 
> Date: Tue, 6 Sep 2016 19:53:11 +0800
> 
>> Declare function as static to kill warning about missing-prototypes.
>>
>> Cc: Yisen Zhuang 
>> Cc: Kejian Yan 
>> Signed-off-by: Kefeng Wang 
>  ...
>> @@ -73,7 +73,7 @@ static struct ring_pair_cb *hns_ae_get_ring_pair(struct 
>> hnae_queue *q)
>>  return container_of(q, struct ring_pair_cb, q);
>>  }
>>  
>> -struct hnae_handle *hns_ae_get_handle(struct hnae_ae_dev *dev,
>> +static struct hnae_handle *hns_ae_get_handle(struct hnae_ae_dev *dev,
>>u32 port_id)
> 
> You have to adjust the indentation of the arguments on the following lines
> if you change where the opening parenthesis is.  They must start exactly
> at the column following that opening parenthesis.

Hi David,

Talked with Yisen, please ignore it for now, will resend it after some hns/roce 
functional changes.

Thanks,
Kefeng




RE: [PATCH 0/2] lan78xx: Remove trailing underscores from macros

2016-09-06 Thread Ronnie.Kunin

Microchip's internal convention is for register (offset) definitions to be 
capitalized (i.e.: MY_REGISTER). Our convention for bits (position) definitions 
within a register is to carry as a prefix the name of the register and suffix 
it with the bit name and adding a trailing underscore (i.e. 
MY_REGISTER_MY_BIT_). The trailing underscore is what easily lets us 
distinguish a bit from a register definition when reading code. We have been 
using this convention for many years and it has worked very well for us across all 
projects (by now hundreds).
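
A minimal sketch of the convention described above. The names MY_REGISTER and MY_REGISTER_MY_BIT_ come from the description; the offset, the bit value, and the toy register file are assumptions made only for illustration and do not correspond to any real device.

```c
#include <stdint.h>

/* Register offset: capitalized, no trailing underscore (value is hypothetical). */
#define MY_REGISTER          0x0010u

/* Bit within MY_REGISTER: register name, bit name, trailing underscore
 * (bit value is hypothetical). */
#define MY_REGISTER_MY_BIT_  0x00000004u

/* Toy register file standing in for device memory. */
static uint32_t regs[16];

static uint32_t reg_read(uint32_t offset)
{
	return regs[offset / 4];
}

static void reg_write(uint32_t offset, uint32_t val)
{
	regs[offset / 4] = val;
}

/* Set one bit with a read-modify-write; the trailing underscore marks
 * which operand is a bit mask and which is a register offset. */
static void my_bit_enable(void)
{
	reg_write(MY_REGISTER, reg_read(MY_REGISTER) | MY_REGISTER_MY_BIT_);
}
```

In the call to reg_write(), the argument without the underscore is the offset and the one with it is the mask, which is the distinction the convention is meant to make visible.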

>Is there anything other than a one-time cost
>to apply these?  Is the same code used for
>other platforms?
Yes, a single header file with the definition of registers and bits is shared 
(either as a standalone file or with its contents pasted into a native 
environment "carrier" header file) across all drivers (and other non driver 
software projects as well) for the same device. So a change like this indeed 
has a high cost for Microchip and we'd rather not do this unless it is an 
absolutely mandated requirement. 

Thanks,
Ronnie

From: Joe Perches [j...@perches.com]
Sent: Tuesday, September 06, 2016 9:18 PM
To: Woojung Huh - C21699; netdev@vger.kernel.org; linux-...@vger.kernel.org
Cc: f.faine...@gmail.com; UNGLinuxDriver; linux-ker...@vger.kernel.org
Subject: Re: [PATCH 0/2] lan78xx: Remove trailing underscores from macros

On Tue, 2016-09-06 at 23:19 +, woojung@microchip.com wrote:
> > Joe Perches (2):
> >   lan78xx: Remove locally defined trailing underscores from defines and uses
> >   microchipphy.h and uses: Remove trailing underscores from defines and
> > uses
> >
> >  drivers/net/phy/microchip.c  |4 +-
> >  drivers/net/usb/lan78xx.c|  368 +++
> >  drivers/net/usb/lan78xx.h| 1068 +-
> > 
> >  include/linux/microchipphy.h |   72 +--
> >  4 files changed, 756 insertions(+), 756 deletions(-)
>
>
> Because there is no specific rule how to name defines, I'm not sure it is 
> worth changing 1000+ lines.
> It may be better to set a guideline for new submissions.
>
> Welcome any comments.

Generally, more conforming to norms is better.
These FOO_ uses are non-conforming.

Is there anything other than a one-time cost
to apply these?  Is the same code used for
other platforms?

[PATCH net-next] MAINTAINERS: Update CPMAC email address

2016-09-06 Thread Florian Fainelli

Signed-off-by: Florian Fainelli 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 9a1783547baf..5ec858e61ddb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3277,7 +3277,7 @@ S:Maintained
 F: drivers/net/wan/cosa*
 
 CPMAC ETHERNET DRIVER
-M: Florian Fainelli 
+M: Florian Fainelli 
 L: netdev@vger.kernel.org
 S: Maintained
 F: drivers/net/ethernet/ti/cpmac.c
-- 
2.7.4

Re: [net:master 29/33] drivers/net/ethernet/cadence/macb.c:1385:2-8: preceding lock on line 1372

2016-09-06 Thread Julia Lawall

The lock acquired on line 1372 is still held when leaving the function 
at line 1385.  Is this intentional?


thanks,
julia



Le 07.09.2016 11:11, kbuild test robot a écrit :

CC: kbuild-...@01.org
CC: netdev@vger.kernel.org
TO: Helmut Buchsbaum 

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 
master

head:   751eb6b6042a596b0080967c1a529a9fe98dac1d
commit: 007e4ba3ee137f4700f39aa6dbaf01a71047c5f6 [29/33] net: macb:
initialize checksum when using checksum offloading
:: branch date: 6 hours ago
:: commit date: 6 hours ago

drivers/net/ethernet/cadence/macb.c:1385:2-8: preceding lock on line 
1372


git remote add net 
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git

git remote update net
git checkout 007e4ba3ee137f4700f39aa6dbaf01a71047c5f6
vim +1385 drivers/net/ethernet/cadence/macb.c

a4c35ed3f drivers/net/ethernet/cadence/macb.c Cyrille Pitchen
2014-07-24  1366nr_frags = skb_shinfo(skb)->nr_frags;
a4c35ed3f drivers/net/ethernet/cadence/macb.c Cyrille Pitchen
2014-07-24  1367for (f = 0; f < nr_frags; f++) {
a4c35ed3f drivers/net/ethernet/cadence/macb.c Cyrille Pitchen
2014-07-24  1368frag_size =
skb_frag_size(&skb_shinfo(skb)->frags[f]);
94b295edc drivers/net/ethernet/cadence/macb.c Andy Shevchenko
2015-07-24  1369count += DIV_ROUND_UP(frag_size,
bp->max_tx_length);
a4c35ed3f drivers/net/ethernet/cadence/macb.c Cyrille Pitchen
2014-07-24  1370}
a4c35ed3f drivers/net/ethernet/cadence/macb.c Cyrille Pitchen
2014-07-24  1371
4871953c0 drivers/net/macb.c  Dongdong Deng
2009-08-23 @1372spin_lock_irqsave(&bp->lock, flags);
89e5785fc drivers/net/macb.c  Haavard Skinnemoen
2006-11-09  1373
89e5785fc drivers/net/macb.c  Haavard Skinnemoen
2006-11-09  1374/* This is a hard error, log it. */
02c958dd3 drivers/net/ethernet/cadence/macb.c Cyrille Pitchen
2014-12-12  1375if (CIRC_SPACE(queue->tx_head, queue->tx_tail,
TX_RING_SIZE) < count) {
02c958dd3 drivers/net/ethernet/cadence/macb.c Cyrille Pitchen
2014-12-12  1376netif_stop_subqueue(dev, queue_index);
4871953c0 drivers/net/macb.c  Dongdong Deng
2009-08-23  1377spin_unlock_irqrestore(&bp->lock, flags);
c220f8cd0 drivers/net/ethernet/cadence/macb.c Jamie Iles
2011-03-08  1378netdev_dbg(bp->dev, "tx_head = %u, tx_tail =
%u\n",
02c958dd3 drivers/net/ethernet/cadence/macb.c Cyrille Pitchen
2014-12-12  1379   queue->tx_head, queue->tx_tail);
5b5481402 drivers/net/macb.c  Patrick McHardy
2009-06-12  1380return NETDEV_TX_BUSY;
89e5785fc drivers/net/macb.c  Haavard Skinnemoen
2006-11-09  1381}
89e5785fc drivers/net/macb.c  Haavard Skinnemoen
2006-11-09  1382
007e4ba3e drivers/net/ethernet/cadence/macb.c Helmut Buchsbaum
2016-09-04  1383if (macb_clear_csum(skb)) {
007e4ba3e drivers/net/ethernet/cadence/macb.c Helmut Buchsbaum
2016-09-04  1384dev_kfree_skb_any(skb);
007e4ba3e drivers/net/ethernet/cadence/macb.c Helmut Buchsbaum
2016-09-04 @1385return NETDEV_TX_OK;
007e4ba3e drivers/net/ethernet/cadence/macb.c Helmut Buchsbaum
2016-09-04  1386}
007e4ba3e drivers/net/ethernet/cadence/macb.c Helmut Buchsbaum
2016-09-04  1387
a4c35ed3f drivers/net/ethernet/cadence/macb.c Cyrille Pitchen
2014-07-24  1388/* Map socket buffer for DMA transfer */
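
The pattern flagged in the listing above (a lock taken on one line, then an early return that skips the unlock) can be modeled in isolation. This is a sketch with a toy depth-counting lock and hypothetical function names; it is not the driver code and not necessarily the fix that was applied.

```c
#include <stdbool.h>

/* Toy lock that only counts its depth, so an imbalance is easy to observe. */
struct toy_lock {
	int depth;
};

static void toy_lock_acquire(struct toy_lock *l) { l->depth++; }
static void toy_lock_release(struct toy_lock *l) { l->depth--; }

enum { TX_OK = 0, TX_BUSY = 1 };

/* Pattern from the report: the second early return leaves the lock held. */
static int xmit_leaky(struct toy_lock *l, bool ring_full, bool csum_failed)
{
	toy_lock_acquire(l);
	if (ring_full) {
		toy_lock_release(l);
		return TX_BUSY;
	}
	if (csum_failed)
		return TX_OK;	/* lock is still held here */
	toy_lock_release(l);
	return TX_OK;
}

/* One possible shape of a fix: every exit goes through a single unlock label. */
static int xmit_balanced(struct toy_lock *l, bool ring_full, bool csum_failed)
{
	int ret = TX_OK;

	toy_lock_acquire(l);
	if (ring_full) {
		ret = TX_BUSY;
		goto unlock;
	}
	if (csum_failed)
		goto unlock;
	/* ... queue the packet ... */
unlock:
	toy_lock_release(l);
	return ret;
}
```

Another option would be to perform the checksum step before taking the lock at all; which is preferable depends on what the step needs to have protected.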

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[PATCH net-next v2] netlink: don't forget to release a rhashtable_iter structure

2016-09-06 Thread Andrei Vagin

This bug was detected by kmemleak:
unreferenced object 0x8804269cc3c0 (size 64):
  comm "criu", pid 1042, jiffies 4294907360 (age 13.713s)
  hex dump (first 32 bytes):
a0 32 cc 2c 04 88 ff ff 00 00 00 00 00 00 00 00  .2.,
00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de  
  backtrace:
[] kmemleak_alloc+0x4a/0xa0
[] kmem_cache_alloc_trace+0x10f/0x280
[] __netlink_diag_dump+0x26c/0x290 [netlink_diag]

v2: don't remove a reference on a rhashtable_iter structure to
release it from netlink_diag_dump_done

Cc: Herbert Xu 
Fixes: ad202074320c ("netlink: Use rhashtable walk interface in diag dump")
Signed-off-by: Andrei Vagin 
---
 net/netlink/diag.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/netlink/diag.c b/net/netlink/diag.c
index 3e3e253..b2f0e98 100644
--- a/net/netlink/diag.c
+++ b/net/netlink/diag.c
@@ -127,7 +127,6 @@ stop:
goto done;
 
rhashtable_walk_exit(hti);
-   cb->args[2] = 0;
num++;
 
 mc_list:
-- 
2.5.5
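
The leak can be illustrated with a simplified model: a dump step stores an iterator in a callback slot, and a done step frees whatever the slot still references. If the dump step clears the slot first, the done step has nothing to free. All names below (toy_cb, toy_dump, toy_dump_done) are hypothetical, and a static buffer stands in for the allocator; this is a sketch of the idea, not the netlink_diag code.

```c
#include <stddef.h>

/* Simplified stand-in for the dump callback state. */
struct toy_cb {
	void *args[3];
};

static char iter_storage[64];
static int iter_in_use;	/* 1 while the toy iterator is allocated */

static void *toy_iter_alloc(void)
{
	iter_in_use = 1;
	return iter_storage;
}

static void toy_iter_free(void *p)
{
	if (p)
		iter_in_use = 0;
}

/* Dump step: allocate the iterator on first use and keep it in args[2].
 * clear_slot models the assignment that the patch deletes. */
static void toy_dump(struct toy_cb *cb, int clear_slot)
{
	if (!cb->args[2])
		cb->args[2] = toy_iter_alloc();
	/* ... walk the table ... */
	if (clear_slot)
		cb->args[2] = NULL;	/* reference lost, nothing can free it later */
}

/* Done step: release the iterator that args[2] still points to. */
static void toy_dump_done(struct toy_cb *cb)
{
	toy_iter_free(cb->args[2]);
	cb->args[2] = NULL;
}
```

With clear_slot set, iter_in_use stays at 1 after toy_dump_done(); without it, the done step releases the iterator.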

Re: [PATCH v4 2/6] cgroup: add support for eBPF programs

2016-09-06 Thread Rami Rosen

Hi,


+ * __cgroup_bpf_update() - Update the pinned program of a cgroup, and
+ * propagate the change to descendants
+ * @cgrp: The cgroup which descendants to traverse

Missing here is @parent

+ * @prog: A new program to pin
+ * @type: Type of pinning operation (ingress/egress)

...

> +void __cgroup_bpf_update(struct cgroup *cgrp,
> +struct cgroup *parent,
> +struct bpf_prog *prog,
> +enum bpf_attach_type type)
> +{

Regards,
Rami Rosen

Re: [PATCH net-next 2/2] tcp: put a TLV list of TCP stats in error queue

2016-09-06 Thread Soheil Hassas Yeganeh

On Tue, Sep 6, 2016 at 9:32 PM, Francis Y. Yan  wrote:
>
> To export useful TCP statistics along with timestamps, such as
> rwnd-limited time and min RTT, we enqueue a TLV list in error queue
> immediately when a timestamp is generated.
>
> Specifically, if user space requests SOF_TIMESTAMPING_TX_* timestamps
> and sets SOF_TIMESTAMPING_OPT_STATS, the kernel will create a list of
> TLVs (struct nlattr) containing all the statistics and store the list
> in payload of the skb that is going to be enqueued into error queue.
> Notice that SOF_TIMESTAMPING_OPT_STATS can only be set together with
> SOF_TIMESTAMPING_OPT_TSONLY.
>
> In addition, if the application in user space also enables receiving
> timestamp (e.g. by SOF_TIMESTAMPING_SOFTWARE), calling recvfrom() on
> error queue will return one more control message with a cmsg_type of
> SCM_OPT_STATS containing the list of TLVs in its cmsg_data.
>
> Signed-off-by: Francis Y. Yan 
> Signed-off-by: Yuchung Cheng 

Acked-by: Soheil Hassas Yeganeh

Re: [PATCH v4 2/6] cgroup: add support for eBPF programs

2016-09-06 Thread kbuild test robot

Hi Daniel,

[auto build test ERROR on net-next/master]
[also build test ERROR on next-20160906]
[cannot apply to linus/master linux/master v4.8-rc5]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]
[Suggest to use git(>=2.9.0) format-patch --base= (or --base=auto for 
convenience) to record what (public, well-known) commit your patch series was 
built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:
https://github.com/0day-ci/linux/commits/Daniel-Mack/Add-eBPF-hooks-for-cgroups/20160907-110357
config: xtensa-allmodconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 4.9.0
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=xtensa 

All errors (new ones prefixed by >>):

   In file included from include/linux/cgroup-defs.h:19:0,
from include/linux/sched.h:61,
from include/linux/ptrace.h:5,
from arch/xtensa/kernel/asm-offsets.c:21:
   include/linux/bpf-cgroup.h: In function 'cgroup_bpf_run_filter':
>> include/linux/bpf-cgroup.h:48:2: error: implicit declaration of function 
>> 'static_branch_unlikely' [-Werror=implicit-function-declaration]
 if (cgroup_bpf_enabled)
 ^
   cc1: some warnings being treated as errors
   make[2]: *** [arch/xtensa/kernel/asm-offsets.s] Error 1
   make[2]: Target '__build' not remade because of errors.
   make[1]: *** [prepare0] Error 2
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [sub-make] Error 2

vim +/static_branch_unlikely +48 include/linux/bpf-cgroup.h

42  
43  /* Wrapper for __cgroup_bpf_run_filter() guarded by cgroup_bpf_enabled 
*/
44  static inline int cgroup_bpf_run_filter(struct sock *sk,
45  struct sk_buff *skb,
46  enum bpf_attach_type type)
47  {
  > 48  if (cgroup_bpf_enabled)
49  return __cgroup_bpf_run_filter(sk, skb, type);
50  
51  return 0;

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation



Re: [PATCH net-next 1/2] tcp: measure rwnd-limited time

2016-09-06 Thread Soheil Hassas Yeganeh

On Tue, Sep 6, 2016 at 9:32 PM, Francis Y. Yan  wrote:
> This patch measures the total time when TCP transmission is limited
> by receiver's advertised window (rwnd), and exports it in tcp_info as
> tcpi_rwnd_limited.
>
> The rwnd-limited time is defined as the period when the next segment
> to send by TCP cannot fit into rwnd. To measure it, we record the last
> timestamp when limited by rwnd (rwnd_limited_ts) and the total
> rwnd-limited time (rwnd_limited) in tcp_sock.
>
> Then we export the total rwnd-limited time so far in tcp_info, where
> by so far, we mean that if TCP transmission is still being limited by
> rwnd, the time interval since rwnd_limited_ts needs to be counted as
> well; otherwise, we simply export rwnd_limited.
>
> It is worth noting that we also have to add a new sequence counter
> (seqcnt) in tcp_sock to carefully handle tcp_info's reading of
> rwnd_limited_ts and rwnd_limited in order to get a consistent snapshot
> of both variables together.
>
> Signed-off-by: Francis Y. Yan 
> Signed-off-by: Yuchung Cheng 

Acked-by: Soheil Hassas Yeganeh
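
The accounting described in the quoted commit message can be sketched in user space. The struct and function names are hypothetical, timestamps are plain microsecond counters where 0 means "not currently limited", and the sequence counter here has no memory barriers, so this only shows the logic for a single thread; the kernel seqcount primitives add the ordering needed for concurrent readers.

```c
#include <stdint.h>

struct toy_stats {
	unsigned int seq;	/* odd while an update is in progress */
	uint64_t limited_ts;	/* start of the current limited period, 0 if none */
	uint64_t limited_total;	/* accumulated limited time so far */
};

/* Begin a limited period unless one is already running. */
static void stats_start(struct toy_stats *s, uint64_t now)
{
	if (!s->limited_ts) {
		s->seq++;
		s->limited_ts = now;
		s->seq++;
	}
}

/* End the running period and fold it into the total. */
static void stats_stop(struct toy_stats *s, uint64_t now)
{
	if (s->limited_ts) {
		s->seq++;
		s->limited_total += now - s->limited_ts;
		s->limited_ts = 0;
		s->seq++;
	}
}

/* Reader: retry until both fields were read under the same even sequence
 * value, then add the still-running interval if there is one. */
static uint64_t stats_read(const struct toy_stats *s, uint64_t now)
{
	unsigned int start;
	uint64_t ts, total;

	do {
		start = s->seq;
		ts = s->limited_ts;
		total = s->limited_total;
	} while ((start & 1) || start != s->seq);

	return ts ? total + (now - ts) : total;
}
```

The retry loop is what gives the reader a consistent pair: without it, a reader could see the total from before a stop and the timestamp from after it.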

linux-next: manual merge of the net-next tree with the net tree

2016-09-06 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  drivers/net/ethernet/qlogic/qed/qed_dcbx.c

between commit:

  561ed23331df ("qed: fix kzalloc-simple.cocci warnings")

from the net tree and commit:

  2591c280c375 ("qed: Remove OOM messages")

from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/net/ethernet/qlogic/qed/qed_dcbx.c
index 3656d2fd673d,be7b3dc7c9a7..
--- a/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
@@@ -1189,11 -1172,9 +1186,9 @@@ int qed_dcbx_get_config_params(struct q
return 0;
}
  
 -  dcbx_info = kmalloc(sizeof(*dcbx_info), GFP_KERNEL);
 +  dcbx_info = kzalloc(sizeof(*dcbx_info), GFP_KERNEL);
-   if (!dcbx_info) {
-   DP_ERR(p_hwfn, "Failed to allocate struct qed_dcbx_info\n");
+   if (!dcbx_info)
return -ENOMEM;
-   }
  
rc = qed_dcbx_query_params(p_hwfn, dcbx_info, QED_DCBX_OPERATIONAL_MIB);
if (rc) {
@@@ -1226,11 -1207,9 +1221,9 @@@ static struct qed_dcbx_get *qed_dcbnl_g
  {
struct qed_dcbx_get *dcbx_info;
  
 -  dcbx_info = kmalloc(sizeof(*dcbx_info), GFP_KERNEL);
 +  dcbx_info = kzalloc(sizeof(*dcbx_info), GFP_KERNEL);
-   if (!dcbx_info) {
-   DP_ERR(hwfn->cdev, "Failed to allocate memory for dcbx_info\n");
+   if (!dcbx_info)
return NULL;
-   }
  
if (qed_dcbx_query_params(hwfn, dcbx_info, type)) {
kfree(dcbx_info);

Re: [PATCH v4 net-next 1/1] net_sched: Introduce skbmod action

2016-09-06 Thread kbuild test robot

Hi Jamal,

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Jamal-Hadi-Salim/net_sched-Introduce-skbmod-action/20160907-095338
reproduce:
# apt-get install sparse
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

   include/linux/compiler.h:230:8: sparse: attribute 'no_sanitize_address': 
unknown attribute
>> net/sched/act_skbmod.c:58:13: sparse: incompatible types in comparison 
>> expression (different address spaces)
   net/sched/act_skbmod.c:165:17: sparse: incompatible types in comparison 
expression (different address spaces)
   net/sched/act_skbmod.c:194:40: sparse: incompatible types in comparison 
expression (different address spaces)

vim +58 net/sched/act_skbmod.c

42   * then MAX_EDIT_LEN needs to change appropriately
43  */
44  err = skb_ensure_writable(skb, ETH_HLEN);
45  if (unlikely(err)) /* best policy is to drop on the floor */
46  action = TC_ACT_SHOT;
47  
48  tcf_lastuse_update(&d->tcf_tm);
49  
50  rcu_read_lock();
51  action = READ_ONCE(d->tcf_action);
52  if (unlikely(action == TC_ACT_SHOT)) {
53  d->tcf_qstats.drops++;
54  rcu_read_unlock();
55  return action;
56  }
57  
  > 58  p = rcu_dereference(d->skbmod_p);
59  flags = p->flags;
60  if (flags & SKBMOD_F_DMAC)
61  ether_addr_copy(eth_hdr(skb)->h_dest, p->eth_dst);
62  if (flags & SKBMOD_F_SMAC)
63  ether_addr_copy(eth_hdr(skb)->h_source, p->eth_src);
64  if (flags & SKBMOD_F_ETYPE)
65  eth_hdr(skb)->h_proto = p->eth_type;
66  rcu_read_unlock();

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation

[PATCH net-next] net: diag: make udp_diag_destroy work for mapped addresses.

2016-09-06 Thread Lorenzo Colitti

udp_diag_destroy does look up the IPv4 UDP hashtable for mapped
addresses, but it gets the IPv4 address to look up from the
beginning of the IPv6 address instead of the end.

Tested: https://android-review.googlesource.com/269874
Fixes: 5d77dca82839 ("net: diag: support SOCK_DESTROY for UDP sockets")
Signed-off-by: Lorenzo Colitti 
---
 net/ipv4/udp_diag.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/udp_diag.c b/net/ipv4/udp_diag.c
index 8a9f6e5..58b79c0 100644
--- a/net/ipv4/udp_diag.c
+++ b/net/ipv4/udp_diag.c
@@ -186,8 +186,8 @@ static int __udp_diag_destroy(struct sk_buff *in_skb,
if (ipv6_addr_v4mapped((struct in6_addr *)req->id.idiag_dst) &&
ipv6_addr_v4mapped((struct in6_addr *)req->id.idiag_src))
sk = __udp4_lib_lookup(net,
-   req->id.idiag_dst[0], 
req->id.idiag_dport,
-   req->id.idiag_src[0], 
req->id.idiag_sport,
+   req->id.idiag_dst[3], 
req->id.idiag_dport,
+   req->id.idiag_src[3], 
req->id.idiag_sport,
req->id.idiag_if, tbl, NULL);
 
else
-- 
2.8.0.rc3.226.g39d4020
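
Why index 3 rather than index 0: in a v4-mapped IPv6 address (::ffff:a.b.c.d) the IPv4 address occupies the last 32 bits. The sketch below uses the standard inet_pton() call to show this in user space; the helper name addr_words is hypothetical, and the four-word array is only meant to mirror a 4 x 32-bit address layout.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Parse a textual IPv6 address into four 32-bit words, kept in network
 * byte order. Returns 0 on success, -1 if the text is not a valid address. */
static int addr_words(const char *text, uint32_t words[4])
{
	struct in6_addr a;

	if (inet_pton(AF_INET6, text, &a) != 1)
		return -1;
	memcpy(words, &a, sizeof(a));
	return 0;
}
```

For "::ffff:192.0.2.1", words[0] and words[1] are zero, words[2] holds the 0000:ffff marker, and words[3] holds 192.0.2.1, so a lookup keyed on words[0] would always search for 0.0.0.0.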

Re: [PATCH net-next v2] netlink: don't forget to release a rhashtable_iter structure

2016-09-06 Thread Herbert Xu

On Tue, Sep 06, 2016 at 09:31:17PM -0700, Andrei Vagin wrote:
> This bug was detected by kmemleak:
> unreferenced object 0x8804269cc3c0 (size 64):
>   comm "criu", pid 1042, jiffies 4294907360 (age 13.713s)
>   hex dump (first 32 bytes):
> a0 32 cc 2c 04 88 ff ff 00 00 00 00 00 00 00 00  .2.,
> 00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de  
>   backtrace:
> [] kmemleak_alloc+0x4a/0xa0
> [] kmem_cache_alloc_trace+0x10f/0x280
> [] __netlink_diag_dump+0x26c/0x290 [netlink_diag]
> 
> v2: don't remove a reference on a rhashtable_iter structure to
> release it from netlink_diag_dump_done
> 
> Cc: Herbert Xu 
> Fixes: ad202074320c ("netlink: Use rhashtable walk interface in diag dump")
> Signed-off-by: Andrei Vagin 

Acked-by: Herbert Xu 

Thanks for catching this!
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [PATCH] ptp: ixp46x: remove NO_IRQ handling

2016-09-06 Thread Arnd Bergmann

On Tuesday, September 6, 2016 6:39:10 PM CEST Richard Cochran wrote:
> On Tue, Sep 06, 2016 at 04:28:30PM +0200, Arnd Bergmann wrote:
> > gpio_to_irq does not return NO_IRQ but instead returns a negative
> > error code on failure. Returning NO_IRQ from the function has no
> > negative effects as we only compare the result to the expected
> > interrupt number, but it's better to return a proper failure
> > code for consistency, and we should remove NO_IRQ from the kernel
> > entirely.
> > 
> > Signed-off-by: Arnd Bergmann 
> 
> Acked-by: Richard Cochran 

Thanks!

I just realized that the randconfig builder had not picked up this
one and there was a typo. I'll send a replacement patch with your
Ack added in.

Arnd

Re: [PATCH 00/29] Netfilter updates for net-next

2016-09-06 Thread David Miller

From: Pablo Neira Ayuso 
Date: Mon,  5 Sep 2016 12:58:15 +0200

> Hi David,
> 
> The following patchset contains Netfilter updates for your net-next
> tree.  Most relevant updates are the removal of per-conntrack timers to
> use a workqueue/garbage collection approach instead from Florian
> Westphal, the hash and numgen expression for nf_tables from Laura
> Garcia, updates on nf_tables hash set to honor the NLM_F_EXCL flag,
> removal of ip_conntrack sysctl and many other incremental updates on our
> Netfilter codebase.
> 
> More specifically, they are:
 ...
> You can pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git

Pulled, thanks Pablo.

Re: [PATCH v4 net-next 1/1] net_sched: Introduce skbmod action

2016-09-06 Thread David Miller

From: Jamal Hadi Salim 
Date: Tue, 6 Sep 2016 09:54:28 -0400

> On 16-09-06 09:37 AM, Jamal Hadi Salim wrote:
>> From: Jamal Hadi Salim 
>>
>> This action is intended to be an upgrade from a usability perspective
>> from pedit (as well as operational debugability).
>> Compare this:
>>
> 
> Dave,
> I will have to send some new version of this action - so please
> dont apply.

Ok.

[ethtool PATCH v1] ethtool: Document ethtool advertised speeds for 1G/10G

2016-09-06 Thread Vidya Sagar Ravipati

From: Vidya Sagar Ravipati 

Man page update to include updated advertised speeds for
1G/10G

Signed-off-by: Vidya Sagar Ravipati 
---
 ethtool.8.in | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/ethtool.8.in b/ethtool.8.in
index 821d0bc..376831d 100644
--- a/ethtool.8.in
+++ b/ethtool.8.in
@@ -575,10 +575,16 @@ lBl   lB.
 0x010  1000baseT Half  (not supported by IEEE standards)
 0x020  1000baseT Full
 0x21000baseKX Full
+0x200  1000baseX Full
 0x8000 2500baseX Full  (not supported by IEEE standards)
 0x1000 1baseT Full
 0x41baseKX4 Full
 0x81baseKR Full
+0x400  1baseCR  Full
+0x800  1baseSR  Full
+0x1000 1baseLR  Full
+0x2000 1baseLRM Full
+0x4000 1baseER  Full
 0x20   2baseMLD2 Full  (not supported by IEEE standards)
 0x40   2baseKR2 Full   (not supported by IEEE standards)
 0x8000 25000baseCR Full
-- 
2.1.4

Re: [PATCH net-next] sfc: check MTU against minimum threshold

2016-09-06 Thread David Miller

From: Bert Kenward 
Date: Tue, 6 Sep 2016 17:50:00 +0100

> Reported-by: Ma Yuying 
> Suggested-by: Jarod Wilson 
> Signed-off-by: Bert Kenward 

Applied.

Re: Minimum MTU Mess

2016-09-06 Thread David Miller

From: Jarod Wilson 
Date: Fri, 2 Sep 2016 13:07:42 -0400

> In any case, the number of "mtu < 68" and "#define FOO_MIN_MTU 68", or
> variations thereof, under drivers/net/ is kind of crazy.

Agreed, we can have a default and let the different cases provide
overrides.

Mostly what to do here is a function of the hardware though.
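
A rough sketch of the "default plus overrides" idea, under the assumption that each device carries an optional minimum that falls back to a shared default of 68. The struct, field, and function names are hypothetical and do not describe an existing kernel interface.

```c
#include <errno.h>

#define TOY_DEFAULT_MIN_MTU 68	/* the value the thread says is repeated per driver */

/* Hypothetical device descriptor; min_mtu == 0 means "use the default". */
struct toy_netdev {
	unsigned int min_mtu;
	unsigned int max_mtu;
	unsigned int mtu;
};

/* Central range check, so individual drivers need no "mtu < 68" test. */
static int toy_change_mtu(struct toy_netdev *dev, unsigned int new_mtu)
{
	unsigned int min = dev->min_mtu ? dev->min_mtu : TOY_DEFAULT_MIN_MTU;

	if (new_mtu < min || new_mtu > dev->max_mtu)
		return -EINVAL;
	dev->mtu = new_mtu;
	return 0;
}
```

Hardware with a stricter limit would set min_mtu once at setup time instead of open-coding the comparison.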

Re: [PATCH v2] ptp: ixp46x: remove NO_IRQ handling

2016-09-06 Thread David Miller

From: Arnd Bergmann 
Date: Tue,  6 Sep 2016 21:20:36 +0200

> gpio_to_irq does not return NO_IRQ but instead returns a negative
> error code on failure. Returning NO_IRQ from the function has no
> negative effects as we only compare the result to the expected
> interrupt number, but it's better to return a proper failure
> code for consistency, and we should remove NO_IRQ from the kernel
> entirely.
> 
> Signed-off-by: Arnd Bergmann 
> Acked-by: Richard Cochran 

Applied to net-next, thanks.

[PATCH net-next] net: xfrm: Change u32 sysctl entries to use proc_douintvec

2016-09-06 Thread Subash Abhinov Kasiviswanathan

proc_dointvec limits the values to INT_MAX in u32 sysctl entries.
proc_douintvec allows writing values up to UINT_MAX.

Signed-off-by: Subash Abhinov Kasiviswanathan 
---
 net/xfrm/xfrm_sysctl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_sysctl.c b/net/xfrm/xfrm_sysctl.c
index 05a6e3d..35a7e79 100644
--- a/net/xfrm/xfrm_sysctl.c
+++ b/net/xfrm/xfrm_sysctl.c
@@ -17,13 +17,13 @@ static struct ctl_table xfrm_table[] = {
.procname   = "xfrm_aevent_etime",
.maxlen = sizeof(u32),
.mode   = 0644,
-   .proc_handler   = proc_dointvec
+   .proc_handler   = proc_douintvec
},
{
.procname   = "xfrm_aevent_rseqth",
.maxlen = sizeof(u32),
.mode   = 0644,
-   .proc_handler   = proc_dointvec
+   .proc_handler   = proc_douintvec
},
{
.procname   = "xfrm_larval_drop",
-- 
1.9.1
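
The range difference the commit message refers to can be modeled in user space, assuming a 32-bit int. The functions below are hypothetical stand-ins that only mimic the upper bound of each handler; they do not reproduce how the kernel handlers parse input.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a decimal string, rejecting anything above max. */
static int parse_bounded(const char *s, unsigned long long max,
			 unsigned int *out)
{
	char *end;
	unsigned long long v;

	errno = 0;
	v = strtoull(s, &end, 10);
	if (errno || end == s || *end != '\0' || v > max)
		return -EINVAL;
	*out = (unsigned int)v;
	return 0;
}

/* Upper bound of a signed-int handler. */
static int parse_like_intvec(const char *s, unsigned int *out)
{
	return parse_bounded(s, INT_MAX, out);
}

/* Upper bound of an unsigned-int handler. */
static int parse_like_uintvec(const char *s, unsigned int *out)
{
	return parse_bounded(s, UINT_MAX, out);
}
```

A value such as 3000000000 fits in a u32 but exceeds INT_MAX, so only the unsigned variant accepts it.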

Re: XPS configuration question (on tg3)

2016-09-06 Thread Eric Dumazet

On Tue, 2016-09-06 at 23:00 +0200, Michal Soltys wrote:
> On 2016-09-06 22:21, Alexander Duyck wrote:
> > On Tue, Sep 6, 2016 at 11:46 AM, Michal Soltys  wrote:
> >> Hi,
> >>
> >> I've been testing different configurations and I didn't manage to get XPS 
> >> to "behave" correctly - so I'm probably misunderstanding or forgetting 
> >> something. The nic in question (under tg3 driver - BCM5720 and BCM5719 
> >> models) was configured to 3 tx and 4 rx queues. 3 irqs were shared (tx and 
> >> rx), 1 was unused (this got me scratching my head a bit) and the remaining 
> >> one was for the last rx (though due to another bug recently fixed the 4th 
> >> rx queue was inconfigurable on receive side). The names were: eth1b-0, 
> >> eth1b-txrx-1, eth1b-txrx-2, eth1b-txrx-3, eth1b-rx-4.
> >>
> >> The XPS was configured as:
> >>
> >> echo f >/sys/class/net/eth1b/queues/tx-0/xps_cpus
> >> echo f0 >/sys/class/net/eth1b/queues/tx-1/xps_cpus
> >> echo ff00 >/sys/class/net/eth1b/queues/tx-2/xps_cpus
> >>
> >> So as far as I understand - cpus 0-3 should be allowed to use tx-0 queue 
> >> only, 4-7 tx-1 and 8-15 tx-2.
> >>
> >> Just in case rx side could get in the way as far as flows go, relevant 
> >> irqs were pinned to specific cpus - txrx-1 to 2, txrx-2 to 4, txrx-3 to 10 
> >> - falling into groups defined by the above masks.
> >>
> >> I tested both with mq and multiq scheduler, essentially either this:
> >>
> >> qdisc mq 2: root
> >> qdisc pfifo_fast 0: parent 2:1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 
> >> 1 1 1
> >> qdisc pfifo_fast 0: parent 2:2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 
> >> 1 1 1
> >> qdisc pfifo_fast 0: parent 2:3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 
> >> 1 1 1
> >>
> >> or this (for the record, skbaction queue_mapping was behaving correctly 
> >> with the one below):
> >>
> >> qdisc multiq 3: root refcnt 6 bands 3/5
> >> qdisc pfifo_fast 31: parent 3:1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 
> >> 1 1 1
> >> qdisc pfifo_fast 32: parent 3:2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 
> >> 1 1 1
> >> qdisc pfifo_fast 33: parent 3:3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 
> >> 1 1 1
> >>
> >> Now, do I understand correctly, that under the above setup - commands such 
> >> as
> >>
> >> taskset 400 nc -p $prt host_ip 12345  >> or
> >> yancat -i /dev/zero -o t:host_ip:12345 -u 10 -U 10
> >>
> >> ITOW - pinning simple nc command on cpu #10 (or using a tool that supports 
> >> affinity by itself) and sending data to some other host on the net - 
> >> should *always* use tx-2 queue ?
> >> I also tested variation such as: taskset 400 nc -l -p host_ip 12345 
> >>  >>
> >> In my case, what queue it used was basically random (on top of that it 
> >> sometimes changed the used queue mid-transfer) what could be easily 
> >> confirmed through both /proc/interrupts and tc -s qdisc show. And I'm a 
> >> bit at a loss now, as I thought xps configuration should be absolute.
> >>
> >> Well, I'd be grateful for some pointers / hints.
> > 
> > So it sounds like you have everything configured correctly.  The one
> > question I would have is if we are certain the CPU pinning is working
> > for the application.  You might try using something like perf to
> > verify what is running on CPU 10, and what is running on the CPUs that
> > the queues are associated with.
> > 
> 
> I did verify with 'top' in this case. I'll double check tomorrow just to
> be sure. Other than testing, there was nothing else running on the machine.
> 
> > Also after you have configured things you may want to double check and
> > verify the xps_cpus value is still set.  I know under some
> > circumstances the value can be reset by a device driver if the number
> > of queues changes, or if the interface toggles between being
> > administratively up/down.
> 
> Hmm, none of this was happening during tests.
> 
> Are there any other circumstances where xps settings could be ignored or
> changed during the test (that is during the actual transfer, not between
> separate attempts) ?
> 
> One thing I'm a bit afraid is that kernel was not exactly the newest
> (3.16), maybe I'm missing some crucial fixes, though xps was added much
> earlier than that. Either way, I'll try to redo tests with current
> kernel tomorrow.
> 

Keep in mind that TCP stack can send packets, responding to incoming
ACK.

So you might check that incoming ACK are handled by the 'right' cpu.

Without RFS, there is no such guarantee.

echo 32768 >/proc/sys/net/core/rps_sock_flow_entries
echo 8192 >/sys/class/net/eth1/queues/rx-0/rps_flow_cnt
echo 8192 >/sys/class/net/eth1/queues/rx-1/rps_flow_cnt
echo 8192 >/sys/class/net/eth1/queues/rx-2/rps_flow_cnt
echo 8192 >/sys/class/net/eth1/queues/rx-3/rps_flow_cnt
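
For reference, the xps_cpus masks quoted earlier in this thread (f, f0, ff00) and the taskset mask 400 can be decoded with a small helper. This is a sketch that only tests bit membership in a hexadecimal mask of up to 64 CPUs; the function names are hypothetical, and it says nothing about how the kernel chooses a queue when several are permitted.

```c
#include <stdlib.h>

/* Return 1 if cpu is set in a hexadecimal CPU mask string such as "ff00". */
static int cpu_in_mask(const char *hex_mask, unsigned int cpu)
{
	unsigned long long mask = strtoull(hex_mask, NULL, 16);

	return cpu < 64 && ((mask >> cpu) & 1);
}

/* Return the index of the first tx queue whose mask contains cpu,
 * or -1 if no mask contains it. */
static int queue_for_cpu(const char *const masks[], int nqueues,
			 unsigned int cpu)
{
	int q;

	for (q = 0; q < nqueues; q++)
		if (cpu_in_mask(masks[q], cpu))
			return q;
	return -1;
}
```

With the three masks from the thread, CPU 10 (taskset mask 400) falls in ff00, which is tx-2, matching the expectation stated in the original question.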
