date:20170524

Re: [PATCH] bonding: Don't update slave->link until ready to commit

2017-05-24 Thread महेश बंडेवार

On Wed, May 24, 2017 at 7:45 PM, Nithin Nayak Sujir  wrote:
> In the loadbalance arp monitoring scheme, when a slave link change is
> detected, the slave->link is immediately updated and slave_state_changed
> is set. Later down the function, the rtnl_lock is acquired and the
> changes are committed, updating the bond link state.
>
> However, the acquisition of the rtnl_lock can fail. The next time the
> monitor runs, since slave->link is already updated, it determines that
> link is unchanged. This results in the bond link state permanently out
> of sync with the slave link.
>
> This patch modifies bond_loadbalance_arp_mon() to handle link changes
> identical to bond_ab_arp_{inspect/commit}(). The new link state is
> maintained in slave->new_link until we're ready to commit at which point
> it's copied into slave->link.
>
> NOTE: miimon_{inspect/commit}() has a more complex state machine
> requiring the use of the bond_{propose,commit}_link_state() functions
> which maintains the intermediate state in slave->link_new_state. The arp
> monitors don't require that.
>
> Testing: This bug is very easy to reproduce with the following steps.
> 1. In a loop, toggle a slave link of a bond slave interface.
> 2. In a separate loop, do ifconfig up/down of an unrelated interface to
> create contention for rtnl_lock.
> Within a few iterations, the bond link goes out of sync with the slave
> link.
>
> Signed-off-by: Nithin Nayak Sujir 
Acked-by: Mahesh Bandewar 
> Cc: Mahesh Bandewar 
> Cc: Jay Vosburgh 
> ---
>  drivers/net/bonding/bond_main.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 7331331..2359478b 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -2612,11 +2612,13 @@ static void bond_loadbalance_arp_mon(struct bonding 
> *bond)
> bond_for_each_slave_rcu(bond, slave, iter) {
> unsigned long trans_start = dev_trans_start(slave->dev);
>
> +   slave->new_link = BOND_LINK_NOCHANGE;
> +
> if (slave->link != BOND_LINK_UP) {
> if (bond_time_in_interval(bond, trans_start, 1) &&
> bond_time_in_interval(bond, slave->last_rx, 1)) {
>
> -   slave->link  = BOND_LINK_UP;
> +   slave->new_link = BOND_LINK_UP;
> slave_state_changed = 1;
>
> /* primary_slave has no meaning in round-robin
> @@ -2643,7 +2645,7 @@ static void bond_loadbalance_arp_mon(struct bonding 
> *bond)
> if (!bond_time_in_interval(bond, trans_start, 2) ||
> !bond_time_in_interval(bond, slave->last_rx, 2)) {
>
> -   slave->link  = BOND_LINK_DOWN;
> +   slave->new_link = BOND_LINK_DOWN;
> slave_state_changed = 1;
>
> if (slave->link_failure_count < UINT_MAX)
> @@ -2674,6 +2676,11 @@ static void bond_loadbalance_arp_mon(struct bonding 
> *bond)
> if (!rtnl_trylock())
> goto re_arm;
>
> +   bond_for_each_slave(bond, slave, iter) {
> +   if (slave->new_link != BOND_LINK_NOCHANGE)
> +   slave->link = slave->new_link;
> +   }
> +
> if (slave_state_changed) {
> bond_slave_state_change(bond);
> if (BOND_MODE(bond) == BOND_MODE_XOR)
> --
> 2.8.2
>

RE: [for-next 4/6] net/mlx5: FPGA, Add basic support for Innova

2017-05-24 Thread Ilan Tayari

> -Original Message-
> From: Alexei Starovoitov [mailto:alexei.starovoi...@gmail.com]
> 
> On Tue, May 23, 2017 at 02:44:02PM +0300, Saeed Mahameed wrote:
> > From: Ilan Tayari 
> >
> > Mellanox Innova is a NIC with ConnectX and an FPGA on the same
> > board. The FPGA is a bump-on-the-wire and thus affects operation of
> > the mlx5_core driver on the ConnectX ASIC.
> >
> > Add basic support for Innova in mlx5_core.
> >
> > This allows using the Innova card as a regular NIC, by detecting
> > the FPGA capability bit, and verifying its load state before
> > initializing ConnectX interfaces.
> 
> That doesn't sound like cx4/cx5 hw that mlx5 driver suppose to support.

Hi Alexei,
Thanks for your feedback.
Let me address your concerns.

I didn't state it, but the current iterations of Innova cards include the
ConnectX-4LX chip, which is exactly what mlx5 driver supports.

The PCI interface is *only* through the ConnectX chip with some new
Firmware commands. This patch doesn't register any new PCI device, 
so it does not make sense to create a separate driver.

> 
> > Also detect FPGA fatal runtime failures and enter error state if
> > they ever happen.
> >
> > All new FPGA-related logic is placed in its own subdirectory 'fpga',
> > which may be built by selecting CONFIG_MLX5_FPGA.
> > This prepares for further support of various Innova features in later
> > patchsets.
> > Additional details about hardware architecture will be provided as
> > more features get submitted.
> >
> > Signed-off-by: Ilan Tayari 
> > Reviewed-by: Boris Pismenny 
> > Signed-off-by: Saeed Mahameed 
> > ---
> >  MAINTAINERS|  10 +
> >  drivers/net/ethernet/mellanox/mlx5/core/Kconfig|  10 +
> >  drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   3 +
> >  drivers/net/ethernet/mellanox/mlx5/core/eq.c   |  11 ++
> >  drivers/net/ethernet/mellanox/mlx5/core/fpga/cmd.c |  64 +++
> >  drivers/net/ethernet/mellanox/mlx5/core/fpga/cmd.h |  59 ++
> >  .../net/ethernet/mellanox/mlx5/core/fpga/core.c| 202
> +
> >  .../net/ethernet/mellanox/mlx5/core/fpga/core.h|  99 ++
> >  drivers/net/ethernet/mellanox/mlx5/core/main.c |  19 +-
> >  include/linux/mlx5/device.h|   6 +
> >  include/linux/mlx5/driver.h|   5 +
> >  include/linux/mlx5/mlx5_ifc.h  |  11 +-
> >  include/linux/mlx5/mlx5_ifc_fpga.h | 144
> +++
> >  13 files changed, 640 insertions(+), 3 deletions(-)
> 
> Can you put it into different driver? Dumping everything into by far
> the biggest nic driver already is already huge headache in terms on
> maintainability, debugging and back ports.
> Look at how intel splits their drivers.
> ixgb, ixgbe, ixgbevf are different drivers thought they have a lot in
> common. On one side it's a bit of copy paste, but on the other side
> it makes drivers much easier to develop and maintain independently.
> ConnectX-6 code and any future hw support doesn't belong to
> mlx5 driver at all.

If you build your driver without explicitly enabling CONFIG_MLX5_FPGA=y
(default is N) then you get none of this, and your driver is practically
the same as before.

The FPGA and CX operation is very tightly integrated.
If you don't want any of this, simply don't opt-in to CONFIG_MLX5_FPGA.

If you do want this, then splitting some of the logic to a
separate kernel object will not gain anything useful (logic would stay
the same), and just pollute the exported symbol table and open up the door
for issues of inter-module compatibility and so on.

Furthermore, the next patchset will introduce IPSec offload capabilities
Which are declared in netdev->hw_features. Those cannot be modified
after the netdevice is created, so all the extra logic has to be 
integrated into the mlx5 ethernet driver. This is another reason why
a separate driver is a bad idea.

We will include details about the board and how things work in the
cover letter of the IPSec offload patchset.

Ilan.

Re: [PATCH net-next] ibmvnic: Enable TSO support

2017-05-24 Thread kbuild test robot

Hi Thomas,

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Thomas-Falcon/ibmvnic-Enable-TSO-support/20170525-111039
config: powerpc-allmodconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget 
https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=powerpc 

All warnings (new ones prefixed by >>):

   In file included from include/linux/module.h:18:0,
from drivers/net/ethernet/ibm/ibmvnic.c:46:
   drivers/net/ethernet/ibm/ibmvnic.c: In function '__check_large_send_offload':
   include/linux/moduleparam.h:148:27: error: return from incompatible pointer 
type [-Werror=incompatible-pointer-types]
 param_check_##type(name, &(value));   \
  ^
   include/linux/moduleparam.h:346:68: note: in definition of macro 
'__param_check'
 static inline type __always_unused *__check_##name(void) { return(p); }
   ^
   include/linux/moduleparam.h:148:2: note: in expansion of macro 
'param_check_bool'
 param_check_##type(name, &(value));   \
 ^~~~
   include/linux/moduleparam.h:128:2: note: in expansion of macro 
'module_param_named'
 module_param_named(name, name, type, perm)
 ^~
>> drivers/net/ethernet/ibm/ibmvnic.c:85:1: note: in expansion of macro 
>> 'module_param'
module_param(large_send_offload, bool, 0644);
^~~~
   cc1: some warnings being treated as errors

vim +/module_param +85 drivers/net/ethernet/ibm/ibmvnic.c

69  #include 
70  #include 
71  #include 
72  #include 
73  #include 
74  #include 
75  #include 
76  #include 
77  #include 
78  
79  #include "ibmvnic.h"
80  
81  static const char ibmvnic_driver_name[] = "ibmvnic";
82  static const char ibmvnic_driver_string[] = "IBM System i/p Virtual NIC 
Driver";
83  
84  static int large_send_offload;
  > 85  module_param(large_send_offload, bool, 0644);
86  MODULE_PARM_DESC(large_send_offload,
87   "Determines whether large send offload is enabled");
88  
89  MODULE_AUTHOR("Santiago Leon ");
90  MODULE_DESCRIPTION("IBM System i/p Virtual NIC Driver");
91  MODULE_LICENSE("GPL");
92  MODULE_VERSION(IBMVNIC_DRIVER_VERSION);
93  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip

[PATCH net-next 5/6] net: mpls: Add extack messages for route add and delete failures

2017-05-24 Thread David Ahern

Add error messages for failures in adding and deleting mpls routes.
This covers most of the annoying EINVAL errors.

Signed-off-by: David Ahern 
---
 net/mpls/af_mpls.c  | 136 +---
 net/mpls/internal.h |   2 +-
 2 files changed, 98 insertions(+), 40 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index f3830951fb1c..d0120f8c8a2c 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -720,7 +720,8 @@ static int mpls_nh_build_from_cfg(struct mpls_route_config 
*cfg,
 
 static int mpls_nh_build(struct net *net, struct mpls_route *rt,
 struct mpls_nh *nh, int oif, struct nlattr *via,
-struct nlattr *newdst, u8 max_labels)
+struct nlattr *newdst, u8 max_labels,
+struct netlink_ext_ack *extack)
 {
int err = -ENOMEM;
 
@@ -729,14 +730,14 @@ static int mpls_nh_build(struct net *net, struct 
mpls_route *rt,
 
if (newdst) {
err = nla_get_labels(newdst, max_labels, >nh_labels,
-nh->nh_label, NULL);
+nh->nh_label, extack);
if (err)
goto errout;
}
 
if (via) {
err = nla_get_via(via, >nh_via_alen, >nh_via_table,
- __mpls_nh_via(rt, nh));
+ __mpls_nh_via(rt, nh), extack);
if (err)
goto errout;
} else {
@@ -803,7 +804,8 @@ static u8 mpls_count_nexthops(struct rtnexthop *rtnh, int 
len,
 }
 
 static int mpls_nh_build_multi(struct mpls_route_config *cfg,
-  struct mpls_route *rt, u8 max_labels)
+  struct mpls_route *rt, u8 max_labels,
+  struct netlink_ext_ack *extack)
 {
struct rtnexthop *rtnh = cfg->rc_mp;
struct nlattr *nla_via, *nla_newdst;
@@ -837,7 +839,7 @@ static int mpls_nh_build_multi(struct mpls_route_config 
*cfg,
 
err = mpls_nh_build(cfg->rc_nlinfo.nl_net, rt, nh,
rtnh->rtnh_ifindex, nla_via, nla_newdst,
-   max_labels);
+   max_labels, extack);
if (err)
goto errout;
 
@@ -856,7 +858,8 @@ static int mpls_nh_build_multi(struct mpls_route_config 
*cfg,
return err;
 }
 
-static int mpls_route_add(struct mpls_route_config *cfg)
+static int mpls_route_add(struct mpls_route_config *cfg,
+ struct netlink_ext_ack *extack)
 {
struct mpls_route __rcu **platform_label;
struct net *net = cfg->rc_nlinfo.nl_net;
@@ -876,17 +879,25 @@ static int mpls_route_add(struct mpls_route_config *cfg)
}
 
/* Reserved labels may not be set */
-   if (index < MPLS_LABEL_FIRST_UNRESERVED)
+   if (index < MPLS_LABEL_FIRST_UNRESERVED) {
+   NL_SET_ERR_MSG(extack,
+  "Invalid label - must be 
MPLS_LABEL_FIRST_UNRESERVED or higher");
goto errout;
+   }
 
/* The full 20 bit range may not be supported. */
-   if (index >= net->mpls.platform_labels)
+   if (index >= net->mpls.platform_labels) {
+   NL_SET_ERR_MSG(extack,
+  "Label >= configured max in platform_labels");
goto errout;
+   }
 
/* Append makes no sense with mpls */
err = -EOPNOTSUPP;
-   if (cfg->rc_nlflags & NLM_F_APPEND)
+   if (cfg->rc_nlflags & NLM_F_APPEND) {
+   NL_SET_ERR_MSG(extack, "MPLS does not support route append");
goto errout;
+   }
 
err = -EEXIST;
platform_label = rtnl_dereference(net->mpls.platform_label);
@@ -913,8 +924,10 @@ static int mpls_route_add(struct mpls_route_config *cfg)
nhs = 1;
}
 
-   if (nhs == 0)
+   if (nhs == 0) {
+   NL_SET_ERR_MSG(extack, "Route does not contain a nexthop");
goto errout;
+   }
 
err = -ENOMEM;
rt = mpls_rt_alloc(nhs, max_via_alen, max_labels);
@@ -928,7 +941,7 @@ static int mpls_route_add(struct mpls_route_config *cfg)
rt->rt_ttl_propagate = cfg->rc_ttl_propagate;
 
if (cfg->rc_mp)
-   err = mpls_nh_build_multi(cfg, rt, max_labels);
+   err = mpls_nh_build_multi(cfg, rt, max_labels, extack);
else
err = mpls_nh_build_from_cfg(cfg, rt);
if (err)
@@ -944,7 +957,8 @@ static int mpls_route_add(struct mpls_route_config *cfg)
return err;
 }
 
-static int mpls_route_del(struct mpls_route_config *cfg)
+static int mpls_route_del(struct mpls_route_config *cfg,
+ struct netlink_ext_ack *extack)
 {
struct net *net = cfg->rc_nlinfo.nl_net;
unsigned

[PATCH net-next 3/6] net: plumb extack arg down to lwtunnel build state

2017-05-24 Thread David Ahern

Pass extack arg down to lwtunnel_build_state and the build_state callbacks.
Add messages for failures in lwtunnel_build_state, and add the extarg to
nla_parse where possible in the build_state callbacks.

Signed-off-by: David Ahern 
---
 include/linux/netlink.h   | 10 ++
 include/net/lwtunnel.h|  9 ++---
 net/core/lwt_bpf.c|  5 +++--
 net/core/lwtunnel.c   | 20 +---
 net/ipv4/fib_lookup.h |  3 ++-
 net/ipv4/fib_semantics.c  | 20 +++-
 net/ipv4/fib_trie.c   |  2 +-
 net/ipv4/ip_tunnel_core.c | 11 +++
 net/ipv6/ila/ila_lwt.c|  5 +++--
 net/ipv6/route.c  |  2 +-
 net/ipv6/seg6_iptunnel.c  |  5 +++--
 net/mpls/mpls_iptunnel.c  |  5 +++--
 12 files changed, 67 insertions(+), 30 deletions(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index a68aad484c69..8664fd26eb5d 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -102,6 +102,16 @@ struct netlink_ext_ack {
(extack)->bad_attr = (attr);\
 } while (0)
 
+#define NL_SET_ERR_MSG_ATTR(extack, attr, msg) do {\
+   static const char __msg[] = (msg);  \
+   struct netlink_ext_ack *__extack = (extack);\
+   \
+   if (__extack) { \
+   __extack->_msg = __msg; \
+   __extack->bad_attr = (attr);\
+   }   \
+} while (0)
+
 extern void netlink_kernel_release(struct sock *sk);
 extern int __netlink_change_ngroups(struct sock *sk, unsigned int groups);
 extern int netlink_change_ngroups(struct sock *sk, unsigned int groups);
diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
index ca6f002774ef..7c26863b8cf4 100644
--- a/include/net/lwtunnel.h
+++ b/include/net/lwtunnel.h
@@ -35,7 +35,8 @@ struct lwtunnel_state {
 struct lwtunnel_encap_ops {
int (*build_state)(struct nlattr *encap,
   unsigned int family, const void *cfg,
-  struct lwtunnel_state **ts);
+  struct lwtunnel_state **ts,
+  struct netlink_ext_ack *extack);
void (*destroy_state)(struct lwtunnel_state *lws);
int (*output)(struct net *net, struct sock *sk, struct sk_buff *skb);
int (*input)(struct sk_buff *skb);
@@ -114,7 +115,8 @@ int lwtunnel_valid_encap_type_attr(struct nlattr *attr, int 
len,
 int lwtunnel_build_state(u16 encap_type,
 struct nlattr *encap,
 unsigned int family, const void *cfg,
-struct lwtunnel_state **lws);
+struct lwtunnel_state **lws,
+struct netlink_ext_ack *extack);
 int lwtunnel_fill_encap(struct sk_buff *skb,
struct lwtunnel_state *lwtstate);
 int lwtunnel_get_encap_size(struct lwtunnel_state *lwtstate);
@@ -192,7 +194,8 @@ static inline int lwtunnel_valid_encap_type_attr(struct 
nlattr *attr, int len,
 static inline int lwtunnel_build_state(u16 encap_type,
   struct nlattr *encap,
   unsigned int family, const void *cfg,
-  struct lwtunnel_state **lws)
+  struct lwtunnel_state **lws,
+  struct netlink_ext_ack *extack)
 {
return -EOPNOTSUPP;
 }
diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
index b3bc0a31af9f..1307731ddfe4 100644
--- a/net/core/lwt_bpf.c
+++ b/net/core/lwt_bpf.c
@@ -240,7 +240,8 @@ static const struct nla_policy bpf_nl_policy[LWT_BPF_MAX + 
1] = {
 
 static int bpf_build_state(struct nlattr *nla,
   unsigned int family, const void *cfg,
-  struct lwtunnel_state **ts)
+  struct lwtunnel_state **ts,
+  struct netlink_ext_ack *extack)
 {
struct nlattr *tb[LWT_BPF_MAX + 1];
struct lwtunnel_state *newts;
@@ -250,7 +251,7 @@ static int bpf_build_state(struct nlattr *nla,
if (family != AF_INET && family != AF_INET6)
return -EAFNOSUPPORT;
 
-   ret = nla_parse_nested(tb, LWT_BPF_MAX, nla, bpf_nl_policy, NULL);
+   ret = nla_parse_nested(tb, LWT_BPF_MAX, nla, bpf_nl_policy, extack);
if (ret < 0)
return ret;
 
diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
index ab840386a74d..a0feca1df7b5 100644
--- a/net/core/lwtunnel.c
+++ b/net/core/lwtunnel.c
@@ -103,25 +103,39 @@ EXPORT_SYMBOL(lwtunnel_encap_del_ops);
 
 int lwtunnel_build_state(u16 encap_type,
 struct nlattr *encap, unsigned int family,
-const void *cfg, struct lwtunnel_state **lws)
+const void *cfg, struct lwtunnel_state **lws,
+

[PATCH net-next 4/6] net: Fill in extack for mpls lwt encap

2017-05-24 Thread David Ahern

Fill in extack for errors in build_state for mpls lwt encap including
passing extack to nla_get_labels and adding error messages for failures
in it.

Signed-off-by: David Ahern 
---
 net/mpls/af_mpls.c   | 49 ++--
 net/mpls/internal.h  |  2 +-
 net/mpls/mpls_iptunnel.c | 12 +++-
 3 files changed, 47 insertions(+), 16 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 257ec66009da..f3830951fb1c 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -728,8 +728,8 @@ static int mpls_nh_build(struct net *net, struct mpls_route 
*rt,
goto errout;
 
if (newdst) {
-   err = nla_get_labels(newdst, max_labels,
->nh_labels, nh->nh_label);
+   err = nla_get_labels(newdst, max_labels, >nh_labels,
+nh->nh_label, NULL);
if (err)
goto errout;
}
@@ -782,7 +782,8 @@ static u8 mpls_count_nexthops(struct rtnexthop *rtnh, int 
len,
 
nla = nla_find(attrs, attrlen, RTA_NEWDST);
if (nla &&
-   nla_get_labels(nla, MAX_NEW_LABELS, _labels, NULL) != 0)
+   nla_get_labels(nla, MAX_NEW_LABELS, _labels,
+  NULL, NULL) != 0)
return 0;
 
*max_labels = max_t(u8, *max_labels, n_labels);
@@ -1541,8 +1542,8 @@ int nla_put_labels(struct sk_buff *skb, int attrtype,
 }
 EXPORT_SYMBOL_GPL(nla_put_labels);
 
-int nla_get_labels(const struct nlattr *nla,
-  u8 max_labels, u8 *labels, u32 label[])
+int nla_get_labels(const struct nlattr *nla, u8 max_labels, u8 *labels,
+  u32 label[], struct netlink_ext_ack *extack)
 {
unsigned len = nla_len(nla);
struct mpls_shim_hdr *nla_label;
@@ -1553,13 +1554,18 @@ int nla_get_labels(const struct nlattr *nla,
/* len needs to be an even multiple of 4 (the label size). Number
 * of labels is a u8 so check for overflow.
 */
-   if (len & 3 || len / 4 > 255)
+   if (len & 3 || len / 4 > 255) {
+   NL_SET_ERR_MSG_ATTR(extack, nla,
+   "Invalid length for labels attribute");
return -EINVAL;
+   }
 
/* Limit the number of new labels allowed */
nla_labels = len/4;
-   if (nla_labels > max_labels)
+   if (nla_labels > max_labels) {
+   NL_SET_ERR_MSG(extack, "Too many labels");
return -EINVAL;
+   }
 
/* when label == NULL, caller wants number of labels */
if (!label)
@@ -1574,8 +1580,29 @@ int nla_get_labels(const struct nlattr *nla,
/* Ensure the bottom of stack flag is properly set
 * and ttl and tc are both clear.
 */
-   if ((dec.bos != bos) || dec.ttl || dec.tc)
+   if (dec.ttl) {
+   NL_SET_ERR_MSG_ATTR(extack, nla,
+   "TTL in label must be 0");
+   return -EINVAL;
+   }
+
+   if (dec.tc) {
+   NL_SET_ERR_MSG_ATTR(extack, nla,
+   "Traffic class in label must be 0");
return -EINVAL;
+   }
+
+   if (dec.bos != bos) {
+   NL_SET_BAD_ATTR(extack, nla);
+   if (bos) {
+   NL_SET_ERR_MSG(extack,
+  "BOS bit must be set in first 
label");
+   } else {
+   NL_SET_ERR_MSG(extack,
+  "BOS bit can only be set in 
first label");
+   }
+   return -EINVAL;
+   }
 
switch (dec.label) {
case MPLS_LABEL_IMPLNULL:
@@ -1583,6 +1610,8 @@ int nla_get_labels(const struct nlattr *nla,
 * assign and distribute, but which never
 * actually appears in the encapsulation.
 */
+   NL_SET_ERR_MSG_ATTR(extack, nla,
+   "Implicit NULL Label (3) can not be 
used in encapsulation");
return -EINVAL;
}
 
@@ -1696,14 +1725,14 @@ static int rtm_to_route_config(struct sk_buff *skb,  
struct nlmsghdr *nlh,
case RTA_NEWDST:
if (nla_get_labels(nla, MAX_NEW_LABELS,
   >rc_output_labels,
-  cfg->rc_output_label))
+  cfg->rc_output_label, NULL))
goto errout;
break;
case RTA_DST:

[PATCH net-next 0/6] net: another round of extack handling for routing

2017-05-24 Thread David Ahern

This set focuses on passing extack through lwtunnel and MPLS with
additional catches for IPv4 route add and minor cleanups in MPLS
encountered passing the extack arg around.

David Ahern (6):
  net: ipv4: Add extack message for invalid prefix or length
  net: lwtunnel: Add extack to encap attr validation
  net: plumb extack arg down to lwtunnel build state
  net: Fill in extack for mpls lwt encap
  net: mpls: Add extack messages for route add and delete failures
  net: mpls: minor cleanups

 include/linux/netlink.h   |  10 +++
 include/net/ip_fib.h  |   3 +-
 include/net/lwtunnel.h|  22 --
 net/core/lwt_bpf.c|   5 +-
 net/core/lwtunnel.c   |  38 --
 net/ipv4/fib_frontend.c   |  12 +--
 net/ipv4/fib_lookup.h |   3 +-
 net/ipv4/fib_semantics.c  |  20 ++---
 net/ipv4/fib_trie.c   |  22 --
 net/ipv4/ip_tunnel_core.c |  11 ++-
 net/ipv6/ila/ila_lwt.c|   5 +-
 net/ipv6/route.c  |   6 +-
 net/ipv6/seg6_iptunnel.c  |   5 +-
 net/mpls/af_mpls.c| 184 ++
 net/mpls/internal.h   |   4 +-
 net/mpls/mpls_iptunnel.c  |  17 +++--
 16 files changed, 259 insertions(+), 108 deletions(-)

-- 
2.11.0 (Apple Git-81)

[PATCH net-next 2/6] net: lwtunnel: Add extack to encap attr validation

2017-05-24 Thread David Ahern

Pass extack down to lwtunnel_valid_encap_type and
lwtunnel_valid_encap_type_attr. Add messages for unknown
or unsupported encap types.

Signed-off-by: David Ahern 
---
 include/net/lwtunnel.h  | 13 +
 net/core/lwtunnel.c | 18 +-
 net/ipv4/fib_frontend.c |  6 --
 net/ipv6/route.c|  4 ++--
 4 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
index ebfe237aad7e..ca6f002774ef 100644
--- a/include/net/lwtunnel.h
+++ b/include/net/lwtunnel.h
@@ -107,8 +107,10 @@ int lwtunnel_encap_add_ops(const struct lwtunnel_encap_ops 
*op,
   unsigned int num);
 int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *op,
   unsigned int num);
-int lwtunnel_valid_encap_type(u16 encap_type);
-int lwtunnel_valid_encap_type_attr(struct nlattr *attr, int len);
+int lwtunnel_valid_encap_type(u16 encap_type,
+ struct netlink_ext_ack *extack);
+int lwtunnel_valid_encap_type_attr(struct nlattr *attr, int len,
+  struct netlink_ext_ack *extack);
 int lwtunnel_build_state(u16 encap_type,
 struct nlattr *encap,
 unsigned int family, const void *cfg,
@@ -172,11 +174,14 @@ static inline int lwtunnel_encap_del_ops(const struct 
lwtunnel_encap_ops *op,
return -EOPNOTSUPP;
 }
 
-static inline int lwtunnel_valid_encap_type(u16 encap_type)
+static inline int lwtunnel_valid_encap_type(u16 encap_type,
+   struct netlink_ext_ack *extack)
 {
+   NL_SET_ERR_MSG(extack, "CONFIG_LWTUNNEL is not enabled in this kernel");
return -EOPNOTSUPP;
 }
-static inline int lwtunnel_valid_encap_type_attr(struct nlattr *attr, int len)
+static inline int lwtunnel_valid_encap_type_attr(struct nlattr *attr, int len,
+struct netlink_ext_ack *extack)
 {
/* return 0 since we are not walking attr looking for
 * RTA_ENCAP_TYPE attribute on nexthops.
diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
index cfae3d5fe11f..ab840386a74d 100644
--- a/net/core/lwtunnel.c
+++ b/net/core/lwtunnel.c
@@ -126,14 +126,16 @@ int lwtunnel_build_state(u16 encap_type,
 }
 EXPORT_SYMBOL(lwtunnel_build_state);
 
-int lwtunnel_valid_encap_type(u16 encap_type)
+int lwtunnel_valid_encap_type(u16 encap_type, struct netlink_ext_ack *extack)
 {
const struct lwtunnel_encap_ops *ops;
int ret = -EINVAL;
 
if (encap_type == LWTUNNEL_ENCAP_NONE ||
-   encap_type > LWTUNNEL_ENCAP_MAX)
+   encap_type > LWTUNNEL_ENCAP_MAX) {
+   NL_SET_ERR_MSG(extack, "Unknown lwt encapsulation type");
return ret;
+   }
 
rcu_read_lock();
ops = rcu_dereference(lwtun_encaps[encap_type]);
@@ -153,11 +155,16 @@ int lwtunnel_valid_encap_type(u16 encap_type)
}
}
 #endif
-   return ops ? 0 : -EOPNOTSUPP;
+   ret = ops ? 0 : -EOPNOTSUPP;
+   if (ret < 0)
+   NL_SET_ERR_MSG(extack, "lwt encapsulation type not supported");
+
+   return ret;
 }
 EXPORT_SYMBOL(lwtunnel_valid_encap_type);
 
-int lwtunnel_valid_encap_type_attr(struct nlattr *attr, int remaining)
+int lwtunnel_valid_encap_type_attr(struct nlattr *attr, int remaining,
+  struct netlink_ext_ack *extack)
 {
struct rtnexthop *rtnh = (struct rtnexthop *)attr;
struct nlattr *nla_entype;
@@ -174,7 +181,8 @@ int lwtunnel_valid_encap_type_attr(struct nlattr *attr, int 
remaining)
if (nla_entype) {
encap_type = nla_get_u16(nla_entype);
 
-   if (lwtunnel_valid_encap_type(encap_type) != 0)
+   if (lwtunnel_valid_encap_type(encap_type,
+ extack) != 0)
return -EOPNOTSUPP;
}
}
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 2d55882d566e..f03fe8521f6e 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -684,7 +684,8 @@ static int rtm_to_fib_config(struct net *net, struct 
sk_buff *skb,
break;
case RTA_MULTIPATH:
err = lwtunnel_valid_encap_type_attr(nla_data(attr),
-nla_len(attr));
+nla_len(attr),
+extack);
if (err < 0)
goto errout;
cfg->fc_mp = nla_data(attr);
@@ -701,7 +702,8 @@ static int rtm_to_fib_config(struct net *net, struct 
sk_buff *skb,
break;

[PATCH net-next 1/6] net: ipv4: Add extack message for invalid prefix or length

2017-05-24 Thread David Ahern

Add extack error message for invalid prefix length and invalid prefix.
Example of the latter is a route spec containing 172.16.100.1/24, where
the /24 mask means the lower 8-bits should be 0. Amazing how easy that
one is to overlook when an EINVAL is returned.

Signed-off-by: David Ahern 
---
 include/net/ip_fib.h|  3 ++-
 net/ipv4/fib_frontend.c |  6 +++---
 net/ipv4/fib_trie.c | 20 +++-
 3 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 42e8b8f55f7c..d490a0f8dc4c 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -265,7 +265,8 @@ int fib_table_lookup(struct fib_table *tb, const struct 
flowi4 *flp,
 struct fib_result *res, int fib_flags);
 int fib_table_insert(struct net *, struct fib_table *, struct fib_config *,
 struct netlink_ext_ack *extack);
-int fib_table_delete(struct net *, struct fib_table *, struct fib_config *);
+int fib_table_delete(struct net *, struct fib_table *, struct fib_config *,
+struct netlink_ext_ack *extack);
 int fib_table_dump(struct fib_table *table, struct sk_buff *skb,
   struct netlink_callback *cb);
 int fib_table_flush(struct net *net, struct fib_table *table);
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 14d2f7bd7c76..2d55882d566e 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -588,7 +588,7 @@ int ip_rt_ioctl(struct net *net, unsigned int cmd, void 
__user *arg)
if (cmd == SIOCDELRT) {
tb = fib_get_table(net, cfg.fc_table);
if (tb)
-   err = fib_table_delete(net, tb, );
+   err = fib_table_delete(net, tb, , 
NULL);
else
err = -ESRCH;
} else {
@@ -732,7 +732,7 @@ static int inet_rtm_delroute(struct sk_buff *skb, struct 
nlmsghdr *nlh,
goto errout;
}
 
-   err = fib_table_delete(net, tb, );
+   err = fib_table_delete(net, tb, , extack);
 errout:
return err;
 }
@@ -851,7 +851,7 @@ static void fib_magic(int cmd, int type, __be32 dst, int 
dst_len, struct in_ifad
if (cmd == RTM_NEWROUTE)
fib_table_insert(net, tb, , NULL);
else
-   fib_table_delete(net, tb, );
+   fib_table_delete(net, tb, , NULL);
 }
 
 void fib_add_ifaddr(struct in_ifaddr *ifa)
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 6d0f6c79d9aa..015ff759b0fd 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1115,15 +1115,20 @@ int fib_table_insert(struct net *net, struct fib_table 
*tb,
u32 key;
int err;
 
-   if (plen > KEYLENGTH)
+   if (plen > KEYLENGTH) {
+   NL_SET_ERR_MSG(extack, "Invalid prefix length");
return -EINVAL;
+   }
 
key = ntohl(cfg->fc_dst);
 
pr_debug("Insert table=%u %08x/%d\n", tb->tb_id, key, plen);
 
-   if ((plen < KEYLENGTH) && (key << plen))
+   if ((plen < KEYLENGTH) && (key << plen)) {
+   NL_SET_ERR_MSG(extack,
+  "Invalid prefix for given prefix length");
return -EINVAL;
+   }
 
fi = fib_create_info(cfg, extack);
if (IS_ERR(fi)) {
@@ -1507,7 +1512,7 @@ static void fib_remove_alias(struct trie *t, struct 
key_vector *tp,
 
 /* Caller must hold RTNL. */
 int fib_table_delete(struct net *net, struct fib_table *tb,
-struct fib_config *cfg)
+struct fib_config *cfg, struct netlink_ext_ack *extack)
 {
struct trie *t = (struct trie *) tb->tb_data;
struct fib_alias *fa, *fa_to_delete;
@@ -1517,13 +1522,18 @@ int fib_table_delete(struct net *net, struct fib_table 
*tb,
u8 tos = cfg->fc_tos;
u32 key;
 
-   if (plen > KEYLENGTH)
+   if (plen > KEYLENGTH) {
+   NL_SET_ERR_MSG(extack, "Invalid prefix length");
return -EINVAL;
+   }
 
key = ntohl(cfg->fc_dst);
 
-   if ((plen < KEYLENGTH) && (key << plen))
+   if ((plen < KEYLENGTH) && (key << plen)) {
+   NL_SET_ERR_MSG(extack,
+  "Invalid prefix for given prefix length");
return -EINVAL;
+   }
 
l = fib_find_node(t, , key);
if (!l)
-- 
2.11.0 (Apple Git-81)

[PATCH net-next 6/6] net: mpls: minor cleanups

2017-05-24 Thread David Ahern

Noticed these doing the extack support:
- nla_get_via is only used in af_mpls.c so remove from internal.h

- err is initialized to EINVAL in mpls_nh_build_from_cfg but then
  set again before it is checked. Remove the EINVAL setting

Signed-off-by: David Ahern 
---
 net/mpls/af_mpls.c  | 5 +++--
 net/mpls/internal.h | 2 --
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index d0120f8c8a2c..ee468feb9e64 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -43,6 +43,9 @@ static void rtmsg_lfib(int event, u32 label, struct 
mpls_route *rt,
   struct nlmsghdr *nlh, struct net *net, u32 portid,
   unsigned int nlm_flags);
 
+static int nla_get_via(const struct nlattr *nla, u8 *via_alen, u8 *via_table,
+  u8 via[], struct netlink_ext_ack *extack);
+
 static struct mpls_route *mpls_route_input_rcu(struct net *net, unsigned index)
 {
struct mpls_route *rt = NULL;
@@ -695,8 +698,6 @@ static int mpls_nh_build_from_cfg(struct mpls_route_config 
*cfg,
if (!nh)
return -ENOMEM;
 
-   err = -EINVAL;
-
nh->nh_labels = cfg->rc_output_labels;
for (i = 0; i < nh->nh_labels; i++)
nh->nh_label[i] = cfg->rc_output_label[i];
diff --git a/net/mpls/internal.h b/net/mpls/internal.h
index a015a6a1143b..cf65aec2e551 100644
--- a/net/mpls/internal.h
+++ b/net/mpls/internal.h
@@ -204,8 +204,6 @@ int nla_put_labels(struct sk_buff *skb, int attrtype,  u8 
labels,
   const u32 label[]);
 int nla_get_labels(const struct nlattr *nla, u8 max_labels, u8 *labels,
   u32 label[], struct netlink_ext_ack *extack);
-int nla_get_via(const struct nlattr *nla, u8 *via_alen, u8 *via_table,
-   u8 via[], struct netlink_ext_ack *extack);
 bool mpls_output_possible(const struct net_device *dev);
 unsigned int mpls_dev_mtu(const struct net_device *dev);
 bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu);
-- 
2.11.0 (Apple Git-81)

Re: [PATCH net-next 1/8] net: ipv4: refactor __ip_route_output_key_hash

2017-05-24 Thread Roopa Prabhu

On Wed, May 24, 2017 at 6:10 PM, David Ahern  wrote:
> On 5/24/17 1:33 PM, Rosen, Rami wrote:
>> Hi, Rupa /David Ahern,
>>
>> First, thanks for this patch set!
>>
>> Second, it seems to me that something might be incorrect here.
>>
>> You have these additions in this patch  (1/8):
>> ...
>> +struct rtable *ip_route_output_key_hash_rcu(struct net *net, struct flowi4 
>> *flp,
>> + const struct sk_buff *skb,
>> + struct fib_result *res);
>> ...
>> +struct rtable *ip_route_output_key_hash(struct net *net, struct flowi4 *fl4,
>> + const struct sk_buff *skb)
>> +{
>> + struct fib_result res;
>> + struct rtable *rth;
>> +
>> + res.tclassid= 0;
>> + res.fi  = NULL;
>> + res.table   = NULL;
>> +
>> + rcu_read_lock();
>> + rth = ip_route_output_key_hash_rcu(net, fl4, , mp_hash);
>>   rcu_read_unlock();
>> +
>>   return rth;
>>  }
>> -EXPORT_SYMBOL_GPL(__ip_route_output_key_hash);
>> +EXPORT_SYMBOL_GPL(ip_route_output_key_hash);
>>
>>
>> So the third parameter to ip_route_output_key_hash_rcu() should be skb*, and 
>> the fourth parameter should be fib_result *. However, you do not pass the 
>> skb parameter
>> when calling ip_route_output_key_hash_rcu() in
>> ip_route_output_key_hash()  (in fact you don't use it at all),  and you pass 
>> mp_hash as the fourth parameter.
>
> Yep, it's a problem with the forward port of the first round of patches.

yep,

>
> Roopa: in include/net/route.h, __ip_route_output_key_hash should be
> removed as well.

ack, thanks Rami and david.

will fix it in v2.

[PATCH] bonding: Don't update slave->link until ready to commit

2017-05-24 Thread Nithin Nayak Sujir

In the loadbalance arp monitoring scheme, when a slave link change is
detected, the slave->link is immediately updated and slave_state_changed
is set. Later down the function, the rtnl_lock is acquired and the
changes are committed, updating the bond link state.

However, the acquisition of the rtnl_lock can fail. The next time the
monitor runs, since slave->link is already updated, it determines that
link is unchanged. This results in the bond link state permanently out
of sync with the slave link.

This patch modifies bond_loadbalance_arp_mon() to handle link changes
identical to bond_ab_arp_{inspect/commit}(). The new link state is
maintained in slave->new_link until we're ready to commit at which point
it's copied into slave->link.

NOTE: miimon_{inspect/commit}() has a more complex state machine
requiring the use of the bond_{propose,commit}_link_state() functions
which maintains the intermediate state in slave->link_new_state. The arp
monitors don't require that.

Testing: This bug is very easy to reproduce with the following steps.
1. In a loop, toggle a slave link of a bond slave interface.
2. In a separate loop, do ifconfig up/down of an unrelated interface to
create contention for rtnl_lock.
Within a few iterations, the bond link goes out of sync with the slave
link.

Signed-off-by: Nithin Nayak Sujir 
Cc: Mahesh Bandewar 
Cc: Jay Vosburgh 
---
 drivers/net/bonding/bond_main.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 7331331..2359478b 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2612,11 +2612,13 @@ static void bond_loadbalance_arp_mon(struct bonding 
*bond)
bond_for_each_slave_rcu(bond, slave, iter) {
unsigned long trans_start = dev_trans_start(slave->dev);
 
+   slave->new_link = BOND_LINK_NOCHANGE;
+
if (slave->link != BOND_LINK_UP) {
if (bond_time_in_interval(bond, trans_start, 1) &&
bond_time_in_interval(bond, slave->last_rx, 1)) {
 
-   slave->link  = BOND_LINK_UP;
+   slave->new_link = BOND_LINK_UP;
slave_state_changed = 1;
 
/* primary_slave has no meaning in round-robin
@@ -2643,7 +2645,7 @@ static void bond_loadbalance_arp_mon(struct bonding *bond)
if (!bond_time_in_interval(bond, trans_start, 2) ||
!bond_time_in_interval(bond, slave->last_rx, 2)) {
 
-   slave->link  = BOND_LINK_DOWN;
+   slave->new_link = BOND_LINK_DOWN;
slave_state_changed = 1;
 
if (slave->link_failure_count < UINT_MAX)
@@ -2674,6 +2676,11 @@ static void bond_loadbalance_arp_mon(struct bonding 
*bond)
if (!rtnl_trylock())
goto re_arm;
 
+   bond_for_each_slave(bond, slave, iter) {
+   if (slave->new_link != BOND_LINK_NOCHANGE)
+   slave->link = slave->new_link;
+   }
+
if (slave_state_changed) {
bond_slave_state_change(bond);
if (BOND_MODE(bond) == BOND_MODE_XOR)
-- 
2.8.2

Re: [PATCH net-next 1/8] net: ipv4: refactor __ip_route_output_key_hash

2017-05-24 Thread kbuild test robot

Hi David,

[auto build test ERROR on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Roopa-Prabhu/net-extend-RTM_GETROUTE-to-return-fib-result/20170525-053253
config: openrisc-or1ksim_defconfig (attached as .config)
compiler: or1k-linux-gcc (GCC) 5.4.0
reproduce:
wget 
https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=openrisc 

Note: the 
linux-review/Roopa-Prabhu/net-extend-RTM_GETROUTE-to-return-fib-result/20170525-053253
 HEAD 083c4ee9e124d0acf29d159ced8a22cb41665a7a builds fine.
  It only hurts bisectibility.

All errors (new ones prefixed by >>):

   In file included from include/net/route.h:31:0,
from include/net/lwtunnel.h:8,
from include/net/ip_tunnels.h:17,
from include/net/dst_metadata.h:5,
from net/ipv4/route.c:94:
   net/ipv4/route.c: In function 'ip_route_output_key_hash_rcu':
>> include/net/ip_fib.h:167:32: error: request for member 'fi' in something not 
>> a structure or union
#define FIB_RES_NH(res)  ((res).fi->fib_nh[0])
   ^
   include/net/ip_fib.h:196:28: note: in expansion of macro 'FIB_RES_NH'
#define FIB_RES_DEV(res)  (FIB_RES_NH(res).nh_dev)
   ^
   net/ipv4/route.c:2399:35: note: in expansion of macro 'FIB_RES_DEV'
  dev_out = l3mdev_master_dev_rcu(FIB_RES_DEV(res)) ? :
  ^
   net/ipv4/route.c: In function 'ip_route_output_key_hash':
   net/ipv4/route.c:2430:53: error: 'mp_hash' undeclared (first use in this 
function)
 rth = ip_route_output_key_hash_rcu(net, fl4, , mp_hash);
^
   net/ipv4/route.c:2430:53: note: each undeclared identifier is reported only 
once for each function it appears in
   net/ipv4/route.c:2430:47: error: passing argument 3 of 
'ip_route_output_key_hash_rcu' from incompatible pointer type 
[-Werror=incompatible-pointer-types]
 rth = ip_route_output_key_hash_rcu(net, fl4, , mp_hash);
  ^
   net/ipv4/route.c:2249:16: note: expected 'const struct sk_buff *' but 
argument is of type 'struct fib_result *'
struct rtable *ip_route_output_key_hash_rcu(struct net *net, struct flowi4 
*fl4,
   ^
   cc1: some warnings being treated as errors
--
   In file included from include/net/route.h:31:0,
from include/net/lwtunnel.h:8,
from include/net/ip_tunnels.h:17,
from include/net/dst_metadata.h:5,
from net//ipv4/route.c:94:
   net//ipv4/route.c: In function 'ip_route_output_key_hash_rcu':
>> include/net/ip_fib.h:167:32: error: request for member 'fi' in something not 
>> a structure or union
#define FIB_RES_NH(res)  ((res).fi->fib_nh[0])
   ^
   include/net/ip_fib.h:196:28: note: in expansion of macro 'FIB_RES_NH'
#define FIB_RES_DEV(res)  (FIB_RES_NH(res).nh_dev)
   ^
   net//ipv4/route.c:2399:35: note: in expansion of macro 'FIB_RES_DEV'
  dev_out = l3mdev_master_dev_rcu(FIB_RES_DEV(res)) ? :
  ^
   net//ipv4/route.c: In function 'ip_route_output_key_hash':
   net//ipv4/route.c:2430:53: error: 'mp_hash' undeclared (first use in this 
function)
 rth = ip_route_output_key_hash_rcu(net, fl4, , mp_hash);
^
   net//ipv4/route.c:2430:53: note: each undeclared identifier is reported only 
once for each function it appears in
   net//ipv4/route.c:2430:47: error: passing argument 3 of 
'ip_route_output_key_hash_rcu' from incompatible pointer type 
[-Werror=incompatible-pointer-types]
 rth = ip_route_output_key_hash_rcu(net, fl4, , mp_hash);
  ^
   net//ipv4/route.c:2249:16: note: expected 'const struct sk_buff *' but 
argument is of type 'struct fib_result *'
struct rtable *ip_route_output_key_hash_rcu(struct net *net, struct flowi4 
*fl4,
   ^
   cc1: some warnings being treated as errors

vim +/fi +167 include/net/ip_fib.h

5f300893f Thomas Graf2006-11-09  151u32 fl_mark;
246955fe4 Robert Olsson  2005-06-20  152unsigned char   fl_tos;
246955fe4 Robert Olsson  2005-06-20  153unsigned char   fl_scope;
246955fe4 Robert Olsson  2005-06-20  154unsigned char   tb_id_in;
246955fe4 Robert Olsson  2005-06-20  155  
246955fe4 Robert Olsson  2005-06-20  156unsigned char   tb_id;  /* 
Results */
246955fe4 Robert Olsson  2005-06-20  157unsigned char   prefixlen;
246955fe4 Robert Olsson  2005-06-20  158unsigned char   nh_sel;
246955fe4 Robert Olsson  2005-06-20  159unsigned char   type;

Re: [PATCH net-next 8/8] net: ipv6: RTM_GETROUTE: return matched fib result when requested

2017-05-24 Thread David Ahern

Since you have to do a v2 ...

On 5/24/17 12:19 PM, Roopa Prabhu wrote:
> @@ -3622,6 +3623,7 @@ static int inet6_rtm_getroute(struct sk_buff *in_skb, 
> struct nlmsghdr *nlh,
>   memset(, 0, sizeof(fl6));
>   rtm = nlmsg_data(nlh);
>   fl6.flowlabel = ip6_make_flowinfo(rtm->rtm_tos, 0);
> + fibmatch = (rtm->rtm_flags & RTM_F_FIB_MATCH) ? true : false;
this is typically done as !!(rtm->rtm_flags & RTM_F_FIB_MATCH)
>  
>   if (tb[RTA_SRC]) {
>   if (nla_len(tb[RTA_SRC]) < sizeof(struct in6_addr))
> @@ -3667,12 +3669,27 @@ static int inet6_rtm_getroute(struct sk_buff *in_skb, 
> struct nlmsghdr *nlh,
>   if (!ipv6_addr_any())
>   flags |= RT6_LOOKUP_F_HAS_SADDR;
>  
> - rt = (struct rt6_info *)ip6_route_input_lookup(net, dev, ,
> -flags);
> + if (!fibmatch)
> + rt = (struct rt6_info *)ip6_route_input_lookup(net, dev,
> +,
> +flags);
>   } else {
>   fl6.flowi6_oif = oif;
>  
> - rt = (struct rt6_info *)ip6_route_output(net, NULL, );
> + if (!fibmatch)
> + rt = (struct rt6_info *)ip6_route_output_flags(net,
> +NULL,
> +, 0);
> + }
> +
> + if (fibmatch) {
> + rt = (struct rt6_info *)ip6_route_lookup(net, , 0);
> + if (rt->dst.error) {
> + err = rt->dst.error;
> + ip6_rt_put(rt);
> + goto errout;
> + }
> +

I'd prefer to see the typecasts go away and use container_of to go from
dst_entry to rt6_info. I realize some of this is movement of existing
code, but better to clean up as we go.

[PATCH net-next 0/2] be2net: patch-set

2017-05-24 Thread Suresh Reddy

Hi Dave, Please consider applying these two patches to net-next

Suresh Reddy (2):
  be2net: Fix UE detection logic for BE3
  be2net: Update the driver version to 11.4.0.0

 drivers/net/ethernet/emulex/benet/be.h  |  2 +-
 drivers/net/ethernet/emulex/benet/be_hw.h   |  3 +++
 drivers/net/ethernet/emulex/benet/be_main.c | 27 +++
 3 files changed, 23 insertions(+), 9 deletions(-)

-- 
1.8.3.1

[PATCH net-next 1/2] be2net: Fix UE detection logic for BE3

2017-05-24 Thread Suresh Reddy

On certain platforms BE3 chips may indicate spurious UEs (unrecoverable
error). Because of the UE detection logic was disabled in the driver
for BE3 chips. Because of this, even in cases of a real UE,
a failover will not occur. This patch re-enables UE detection on BE3
and if a UE is detected, reads the POST register. If the POST register,
reports either a FAT_LOG_STATE or a ARMFW_UE, then it means that a valid
UE occurred in the chip.

Signed-off-by: Suresh Reddy 
---
 drivers/net/ethernet/emulex/benet/be_hw.h   |  3 +++
 drivers/net/ethernet/emulex/benet/be_main.c | 27 +++
 2 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_hw.h 
b/drivers/net/ethernet/emulex/benet/be_hw.h
index 36e4232..c967f45 100644
--- a/drivers/net/ethernet/emulex/benet/be_hw.h
+++ b/drivers/net/ethernet/emulex/benet/be_hw.h
@@ -49,6 +49,9 @@
 #define POST_STAGE_BE_RESET0x3 /* Host wants to reset chip */
 #define POST_STAGE_ARMFW_RDY   0xc000  /* FW is done with POST */
 #define POST_STAGE_RECOVERABLE_ERR 0xE000  /* Recoverable err detected */
+/* FW has detected a UE and is dumping FAT log data */
+#define POST_STAGE_FAT_LOG_START   0x0D00
+#define POST_STAGE_ARMFW_UE0xF000  /*FW has asserted an UE*/
 
 /* Lancer SLIPORT registers */
 #define SLIPORT_STATUS_OFFSET  0x404
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c 
b/drivers/net/ethernet/emulex/benet/be_main.c
index f3a09ab..8000551 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -3241,8 +3241,9 @@ void be_detect_error(struct be_adapter *adapter)
 {
u32 ue_lo = 0, ue_hi = 0, ue_lo_mask = 0, ue_hi_mask = 0;
u32 sliport_status = 0, sliport_err1 = 0, sliport_err2 = 0;
-   u32 i;
struct device *dev = >pdev->dev;
+   u16 val;
+   u32 i;
 
if (be_check_error(adapter, BE_ERROR_HW))
return;
@@ -3280,15 +3281,25 @@ void be_detect_error(struct be_adapter *adapter)
ue_lo = (ue_lo & ~ue_lo_mask);
ue_hi = (ue_hi & ~ue_hi_mask);
 
-   /* On certain platforms BE hardware can indicate spurious UEs.
-* Allow HW to stop working completely in case of a real UE.
-* Hence not setting the hw_error for UE detection.
-*/
-
if (ue_lo || ue_hi) {
+   /* On certain platforms BE3 hardware can indicate
+* spurious UEs. In case of a UE in the chip,
+* the POST register correctly reports either a
+* FAT_LOG_START state (FW is currently dumping
+* FAT log data) or a ARMFW_UE state. Check for the
+* above states to ascertain if the UE is valid or not.
+*/
+   if (BE3_chip(adapter)) {
+   val = be_POST_stage_get(adapter);
+   if ((val & POST_STAGE_FAT_LOG_START)
+!= POST_STAGE_FAT_LOG_START &&
+   (val & POST_STAGE_ARMFW_UE)
+!= POST_STAGE_ARMFW_UE)
+   return;
+   }
+
dev_err(dev, "Error detected in the adapter");
-   if (skyhawk_chip(adapter))
-   be_set_error(adapter, BE_ERROR_UE);
+   be_set_error(adapter, BE_ERROR_UE);
 
for (i = 0; ue_lo; ue_lo >>= 1, i++) {
if (ue_lo & 1)
-- 
1.8.3.1

[PATCH net-next 2/2] be2net: Update the driver version to 11.4.0.0

2017-05-24 Thread Suresh Reddy

Signed-off-by: Suresh Reddy 
---
 drivers/net/ethernet/emulex/benet/be.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h 
b/drivers/net/ethernet/emulex/benet/be.h
index 5056624..674cf9d 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -37,7 +37,7 @@
 #include "be_hw.h"
 #include "be_roce.h"
 
-#define DRV_VER"11.1.0.0"
+#define DRV_VER"11.4.0.0"
 #define DRV_NAME   "be2net"
 #define BE_NAME"Emulex BladeEngine2"
 #define BE3_NAME   "Emulex BladeEngine3"
-- 
1.8.3.1

[PATCH net-next] ibmvnic: Enable TSO support

2017-05-24 Thread Thomas Falcon

Enable TSO in the ibmvnic driver. Scatter-gather is also enabled,
though there currently is no support for scatter-gather in
vNIC-compatible hardware, resulting in forced linearization
of all fragmented SKB's. Though not ideal, given the throughput
improvement gained by enabling TSO, it has been decided
that this is an acceptable tradeoff.

The feature is also enabled by a module parameter.
This parameter is necessary because TSO can not easily be
enabled or disabled in firmware without reinitializing the driver.

CC: Nathan Fontenot 
CC: John Allen 
Signed-off-by: Thomas Falcon 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 39 +++---
 1 file changed, 36 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index abe2b6e..64e8d50 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -81,6 +81,11 @@
 static const char ibmvnic_driver_name[] = "ibmvnic";
 static const char ibmvnic_driver_string[] = "IBM System i/p Virtual NIC 
Driver";
 
+static int large_send_offload;
+module_param(large_send_offload, bool, 0644);
+MODULE_PARM_DESC(large_send_offload,
+"Determines whether large send offload is enabled");
+
 MODULE_AUTHOR("Santiago Leon ");
 MODULE_DESCRIPTION("IBM System i/p Virtual NIC Driver");
 MODULE_LICENSE("GPL");
@@ -1025,6 +1030,17 @@ static int ibmvnic_xmit(struct sk_buff *skb, struct 
net_device *netdev)
goto out;
}
 
+   /* All scatter-gather SKB's will be linearized for the time being, but
+* scatter-gather is only enabled if the user wishes to use TSO.
+*/
+   if (skb_shinfo(skb)->nr_frags && __skb_linearize(skb)) {
+   dev_kfree_skb_any(skb);
+   tx_send_failed++;
+   tx_dropped++;
+   ret = NETDEV_TX_OK;
+   goto out;
+   }
+
tx_pool = >tx_pool[queue_num];
tx_scrq = adapter->tx_scrq[queue_num];
txq = netdev_get_tx_queue(netdev, skb_get_queue_mapping(skb));
@@ -1082,6 +1098,11 @@ static int ibmvnic_xmit(struct sk_buff *skb, struct 
net_device *netdev)
tx_crq.v1.flags1 |= IBMVNIC_TX_CHKSUM_OFFLOAD;
hdrs += 2;
}
+   if (skb_is_gso(skb)) {
+   tx_crq.v1.flags1 |= IBMVNIC_TX_LSO;
+   tx_crq.v1.mss = cpu_to_be16(skb_shinfo(skb)->gso_size);
+   hdrs += 2;
+   }
/* determine if l2/3/4 headers are sent to firmware */
if ((*hdrs >> 7) & 1 &&
(skb->protocol == htons(ETH_P_IP) ||
@@ -2629,10 +2650,10 @@ static void handle_query_ip_offload_rsp(struct 
ibmvnic_adapter *adapter)
adapter->ip_offload_ctrl.udp_ipv4_chksum = buf->udp_ipv4_chksum;
adapter->ip_offload_ctrl.tcp_ipv6_chksum = buf->tcp_ipv6_chksum;
adapter->ip_offload_ctrl.udp_ipv6_chksum = buf->udp_ipv6_chksum;
+   adapter->ip_offload_ctrl.large_tx_ipv4 = buf->large_tx_ipv4;
+   adapter->ip_offload_ctrl.large_tx_ipv6 = buf->large_tx_ipv6;
 
-   /* large_tx/rx disabled for now, additional features needed */
-   adapter->ip_offload_ctrl.large_tx_ipv4 = 0;
-   adapter->ip_offload_ctrl.large_tx_ipv6 = 0;
+   /* large_rx disabled for now, additional features needed */
adapter->ip_offload_ctrl.large_rx_ipv4 = 0;
adapter->ip_offload_ctrl.large_rx_ipv6 = 0;
 
@@ -2648,6 +2669,18 @@ static void handle_query_ip_offload_rsp(struct 
ibmvnic_adapter *adapter)
(NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM)))
adapter->netdev->features |= NETIF_F_RXCSUM;
 
+   if (large_send_offload) {
+   /* Scatter-gather is not currently supported by firmware.
+* It will only be enabled to support TSO operations.
+*/
+   adapter->netdev->features = NETIF_F_SG;
+
+   if (buf->large_tx_ipv4)
+   adapter->netdev->features |= NETIF_F_TSO;
+   if (buf->large_tx_ipv6)
+   adapter->netdev->features |= NETIF_F_TSO6;
+   }
+
memset(, 0, sizeof(crq));
crq.control_ip_offload.first = IBMVNIC_CRQ_CMD;
crq.control_ip_offload.cmd = CONTROL_IP_OFFLOAD;
-- 
1.8.3.1

Re: [PATCH net-next 7/8] net: ipv4: RTM_GETROUTE: return matched fib result when requested

2017-05-24 Thread David Ahern

On 5/24/17 12:19 PM, Roopa Prabhu wrote:

> @@ -2746,8 +2748,15 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, 
> struct nlmsghdr *nlh,
>   if (rtm->rtm_flags & RTM_F_LOOKUP_TABLE)
>   table_id = rt->rt_table_id;
>  
> - err = rt_fill_info(net, dst, src, table_id, , skb,
> -NETLINK_CB(in_skb).portid, nlh->nlmsg_seq, rt);
> + if (rtm->rtm_flags & RTM_F_FIB_MATCH)
> + err = fib_dump_info(skb, NETLINK_CB(in_skb).portid,
> + nlh->nlmsg_seq, RTM_NEWROUTE, table_id,
> + rt->rt_type, res.prefix, res.prefixlen,
> + fl4.flowi4_tos, res.fi, 0);

I like this part -- much simpler than duplicating the attributes which
is where I was headed.

Re: [PATCH net-next 1/8] net: ipv4: refactor __ip_route_output_key_hash

2017-05-24 Thread kbuild test robot

Hi David,

[auto build test ERROR on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Roopa-Prabhu/net-extend-RTM_GETROUTE-to-return-fib-result/20170525-053253
config: cris-etrax-100lx_v2_defconfig (attached as .config)
compiler: cris-linux-gcc (GCC) 6.2.0
reproduce:
wget 
https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=cris 

Note: the 
linux-review/Roopa-Prabhu/net-extend-RTM_GETROUTE-to-return-fib-result/20170525-053253
 HEAD 083c4ee9e124d0acf29d159ced8a22cb41665a7a builds fine.
  It only hurts bisectibility.

All error/warnings (new ones prefixed by >>):

   In file included from include/net/route.h:31:0,
from include/net/lwtunnel.h:8,
from include/net/ip_tunnels.h:17,
from include/net/dst_metadata.h:5,
from net/ipv4/route.c:94:
   net/ipv4/route.c: In function 'ip_route_output_key_hash_rcu':
>> include/net/ip_fib.h:167:32: error: 'res' is a pointer; did you mean to use 
>> '->'?
#define FIB_RES_NH(res)  ((res).fi->fib_nh[0])
   ^
  ->
>> include/net/ip_fib.h:196:28: note: in expansion of macro 'FIB_RES_NH'
#define FIB_RES_DEV(res)  (FIB_RES_NH(res).nh_dev)
   ^~
>> net/ipv4/route.c:2399:35: note: in expansion of macro 'FIB_RES_DEV'
  dev_out = l3mdev_master_dev_rcu(FIB_RES_DEV(res)) ? :
  ^~~
   net/ipv4/route.c: In function 'ip_route_output_key_hash':
>> net/ipv4/route.c:2430:53: error: 'mp_hash' undeclared (first use in this 
>> function)
 rth = ip_route_output_key_hash_rcu(net, fl4, , mp_hash);
^~~
   net/ipv4/route.c:2430:53: note: each undeclared identifier is reported only 
once for each function it appears in
>> net/ipv4/route.c:2430:47: error: passing argument 3 of 
>> 'ip_route_output_key_hash_rcu' from incompatible pointer type 
>> [-Werror=incompatible-pointer-types]
 rth = ip_route_output_key_hash_rcu(net, fl4, , mp_hash);
  ^
   net/ipv4/route.c:2249:16: note: expected 'const struct sk_buff *' but 
argument is of type 'struct fib_result *'
struct rtable *ip_route_output_key_hash_rcu(struct net *net, struct flowi4 
*fl4,
   ^~~~
   cc1: some warnings being treated as errors
--
   In file included from include/net/route.h:31:0,
from include/net/lwtunnel.h:8,
from include/net/ip_tunnels.h:17,
from include/net/dst_metadata.h:5,
from net//ipv4/route.c:94:
   net//ipv4/route.c: In function 'ip_route_output_key_hash_rcu':
>> include/net/ip_fib.h:167:32: error: 'res' is a pointer; did you mean to use 
>> '->'?
#define FIB_RES_NH(res)  ((res).fi->fib_nh[0])
   ^
  ->
>> include/net/ip_fib.h:196:28: note: in expansion of macro 'FIB_RES_NH'
#define FIB_RES_DEV(res)  (FIB_RES_NH(res).nh_dev)
   ^~
   net//ipv4/route.c:2399:35: note: in expansion of macro 'FIB_RES_DEV'
  dev_out = l3mdev_master_dev_rcu(FIB_RES_DEV(res)) ? :
  ^~~
   net//ipv4/route.c: In function 'ip_route_output_key_hash':
   net//ipv4/route.c:2430:53: error: 'mp_hash' undeclared (first use in this 
function)
 rth = ip_route_output_key_hash_rcu(net, fl4, , mp_hash);
^~~
   net//ipv4/route.c:2430:53: note: each undeclared identifier is reported only 
once for each function it appears in
   net//ipv4/route.c:2430:47: error: passing argument 3 of 
'ip_route_output_key_hash_rcu' from incompatible pointer type 
[-Werror=incompatible-pointer-types]
 rth = ip_route_output_key_hash_rcu(net, fl4, , mp_hash);
  ^
   net//ipv4/route.c:2249:16: note: expected 'const struct sk_buff *' but 
argument is of type 'struct fib_result *'
struct rtable *ip_route_output_key_hash_rcu(struct net *net, struct flowi4 
*fl4,
   ^~~~
   cc1: some warnings being treated as errors

vim +/mp_hash +2430 net/ipv4/route.c

  2393  fl4->saddr = res->fi->fib_prefsrc;
  2394  else
  2395  fl4->saddr = fl4->daddr;
  2396  }
  2397  
  2398  /* L3 master device is the loopback for that domain */
> 2399  dev_out = l3mdev_master_dev_rcu(FIB_RES_DEV(res)) ? :
  2400  net->loopback_dev;
  2401  fl4->flowi4_oif = dev_out->ifindex;

Re: Deleting a dynamic mac entry..

2017-05-24 Thread Toshiaki Makita

On 2017/05/25 3:05, Manohar Kumar wrote:
> Thanks, Toshiaki.
> 
> What is the right way to set the default_pvid using the bridge command
> ? I tried this, which fails..
> 
> root@net-3:~# ip link set dev vxlan0 name untagged type vlan id 0
> RTNETLINK answers: Operation not supported
> root@net-3:~#
> 
> All the interfaces in the bridge are untagged.

# ip link set br0 down
# echo 0 > /sys/class/net/br0/bridge/default_pvid


Toshiaki Makita

Re: [PATCH net-next 1/8] net: ipv4: refactor __ip_route_output_key_hash

2017-05-24 Thread David Ahern

On 5/24/17 1:33 PM, Rosen, Rami wrote:
> Hi, Rupa /David Ahern,
> 
> First, thanks for this patch set!
> 
> Second, it seems to me that something might be incorrect here.
> 
> You have these additions in this patch  (1/8):
> ...
> +struct rtable *ip_route_output_key_hash_rcu(struct net *net, struct flowi4 
> *flp,
> + const struct sk_buff *skb,
> + struct fib_result *res);
> ...
> +struct rtable *ip_route_output_key_hash(struct net *net, struct flowi4 *fl4,
> + const struct sk_buff *skb)
> +{
> + struct fib_result res;
> + struct rtable *rth;
> +
> + res.tclassid= 0;
> + res.fi  = NULL;
> + res.table   = NULL;
> +
> + rcu_read_lock();
> + rth = ip_route_output_key_hash_rcu(net, fl4, , mp_hash);
>   rcu_read_unlock();
> +
>   return rth;
>  }
> -EXPORT_SYMBOL_GPL(__ip_route_output_key_hash);
> +EXPORT_SYMBOL_GPL(ip_route_output_key_hash);
>  
> 
> So the third parameter to ip_route_output_key_hash_rcu() should be skb*, and 
> the fourth parameter should be fib_result *. However, you do not pass the skb 
> parameter 
> when calling ip_route_output_key_hash_rcu() in 
> ip_route_output_key_hash()  (in fact you don't use it at all),  and you pass 
> mp_hash as the fourth parameter.

Yep, it's a problem with the forward port of the first round of patches.

Roopa: in include/net/route.h, __ip_route_output_key_hash should be
removed as well.

Re: [PATCH net-next 3/3] net: sysfs: Document PHY device sysfs attributes

2017-05-24 Thread Florian Fainelli

On 05/24/2017 04:07 PM, Florian Fainelli wrote:
> Hi Andrew,
> 
> On 05/24/2017 01:10 PM, Andrew Lunn wrote:
>>> +What:  /sys/class/mdio_bus///phy_interface
>>> +Date:  February 2014
>>> +KernelVersion: 3.15
>>> +Contact:   netdev@vger.kernel.org
>>> +Description:
>>> +   String value indicating the PHY interface, possible
>>> +   values are in include/linux/phy.h function phy_modes.
>>
>> Hi Florian
>>
>> Does this suggest that these strings should be moved to
>> include/uapi/linux/phy.h?
> 
> Humm, I suppose we could do that, although I am not sure ally sure what
> we would be putting in there (at least for now).

I actually prefer to put the possible values in the sysfs documentation
files instead of in a UAPI header. Will submit a v2.
-- 
Florian

Re: [PATCH 0/2] Replace driver's usage of hard-coded device IDs to #defines

2017-05-24 Thread David Miller

From: Myron Stowe 
Date: Wed, 24 May 2017 16:47:34 -0600

> Noa Osherovich introduced a series of new Mellanox device ID definitions to
> help differentiate specific controllers that needed INTx masking quirks [1].
> 
> Bjorn Helgaas followed on, using the device ID definitions Noa provided to
> replace hard-coded values within the mxl4 ID table [2].
> 
> This patch continues along similar lines, adding a few additional Mellanox
> device ID definitions and converting the net/mlx5e driver's mlx5 ID table to
> use the defines so tools like 'grep' and 'cscope' can be used to help
> identify relationships with other aspects (such as INTx masking).

If you're adding pci_ids.h defines, it's only valid to do so if you
actually use the defines in more than one location.

This patch series is not doing that.

Re: [PATCH] test_bpf: Add a couple of tests for BPF_JSGE.

2017-05-24 Thread Daniel Borkmann


On 05/25/2017 01:35 AM, David Daney wrote:

Some JITs can optimize comparisons with zero.  Add a couple of
BPF_JSGE tests against immediate zero.

Signed-off-by: David Daney 


Thanks for the tests!

Acked-by: Daniel Borkmann

[PATCH] test_bpf: Add a couple of tests for BPF_JSGE.

2017-05-24 Thread David Daney

Some JITs can optimize comparisons with zero.  Add a couple of
BPF_JSGE tests against immediate zero.

Signed-off-by: David Daney 
---
 lib/test_bpf.c | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/lib/test_bpf.c b/lib/test_bpf.c
index 889bc31..be88cba 100644
--- a/lib/test_bpf.c
+++ b/lib/test_bpf.c
@@ -4504,6 +4504,44 @@ static struct bpf_test tests[] = {
{ },
{ { 0, 1 } },
},
+   {
+   "JMP_JSGE_K: Signed jump: value walk 1",
+   .u.insns_int = {
+   BPF_ALU32_IMM(BPF_MOV, R0, 0),
+   BPF_LD_IMM64(R1, -3),
+   BPF_JMP_IMM(BPF_JSGE, R1, 0, 6),
+   BPF_ALU64_IMM(BPF_ADD, R1, 1),
+   BPF_JMP_IMM(BPF_JSGE, R1, 0, 4),
+   BPF_ALU64_IMM(BPF_ADD, R1, 1),
+   BPF_JMP_IMM(BPF_JSGE, R1, 0, 2),
+   BPF_ALU64_IMM(BPF_ADD, R1, 1),
+   BPF_JMP_IMM(BPF_JSGE, R1, 0, 1),
+   BPF_EXIT_INSN(),/* bad exit */
+   BPF_ALU32_IMM(BPF_MOV, R0, 1),  /* good exit */
+   BPF_EXIT_INSN(),
+   },
+   INTERNAL,
+   { },
+   { { 0, 1 } },
+   },
+   {
+   "JMP_JSGE_K: Signed jump: value walk 2",
+   .u.insns_int = {
+   BPF_ALU32_IMM(BPF_MOV, R0, 0),
+   BPF_LD_IMM64(R1, -3),
+   BPF_JMP_IMM(BPF_JSGE, R1, 0, 4),
+   BPF_ALU64_IMM(BPF_ADD, R1, 2),
+   BPF_JMP_IMM(BPF_JSGE, R1, 0, 2),
+   BPF_ALU64_IMM(BPF_ADD, R1, 2),
+   BPF_JMP_IMM(BPF_JSGE, R1, 0, 1),
+   BPF_EXIT_INSN(),/* bad exit */
+   BPF_ALU32_IMM(BPF_MOV, R0, 1),  /* good exit */
+   BPF_EXIT_INSN(),
+   },
+   INTERNAL,
+   { },
+   { { 0, 1 } },
+   },
/* BPF_JMP | BPF_JGT | BPF_K */
{
"JMP_JGT_K: if (3 > 2) return 1",
-- 
2.9.4

linux-next: manual merge of the net-next tree with the net tree

2017-05-24 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  drivers/net/phy/marvell.c

between commit:

  f2899788353c ("net: phy: marvell: Limit errata to 88m1101")

from the net tree and commit:

  0c3439bc7773 ("net: phy: Marvell: checkpatch - Comments")

from the net-next tree.

I fixed it up (I just used the former version) and can carry the fix as
necessary. This is now fixed as far as linux-next is concerned, but any
non trivial conflicts should be mentioned to your upstream maintainer
when your tree is submitted for merging.  You may also want to consider
cooperating with the maintainer of the conflicting tree to minimise any
particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell

Re: [PATCH net-next 3/3] net: sysfs: Document PHY device sysfs attributes

2017-05-24 Thread Florian Fainelli

Hi Andrew,

On 05/24/2017 01:10 PM, Andrew Lunn wrote:
>> +What:   /sys/class/mdio_bus///phy_interface
>> +Date:   February 2014
>> +KernelVersion:  3.15
>> +Contact:netdev@vger.kernel.org
>> +Description:
>> +String value indicating the PHY interface, possible
>> +values are in include/linux/phy.h function phy_modes.
> 
> Hi Florian
> 
> Does this suggest that these strings should be moved to
> include/uapi/linux/phy.h?

Humm, I suppose we could do that, although I am not sure ally sure what
we would be putting in there (at least for now).
-- 
Florian

[PATCH net v2 2/5] bpf: properly reset caller saved regs after helper call and ld_abs/ind

2017-05-24 Thread Daniel Borkmann

Currently, after performing helper calls, we clear all caller saved
registers, that is r0 - r5 and fill r0 depending on struct bpf_func_proto
specification. The way we reset these regs can affect pruning decisions
in later paths, since we only reset register's imm to 0 and type to
NOT_INIT. However, we leave out clearing of other variables such as id,
min_value, max_value, etc, which can later on lead to pruning mismatches
due to stale data.

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 kernel/bpf/verifier.c | 37 -
 1 file changed, 16 insertions(+), 21 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e37e06b..339c8a1 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -463,19 +463,22 @@ static struct bpf_verifier_state *push_stack(struct 
bpf_verifier_env *env,
BPF_REG_0, BPF_REG_1, BPF_REG_2, BPF_REG_3, BPF_REG_4, BPF_REG_5
 };
 
+static void mark_reg_not_init(struct bpf_reg_state *regs, u32 regno)
+{
+   BUG_ON(regno >= MAX_BPF_REG);
+
+   memset([regno], 0, sizeof(regs[regno]));
+   regs[regno].type = NOT_INIT;
+   regs[regno].min_value = BPF_REGISTER_MIN_RANGE;
+   regs[regno].max_value = BPF_REGISTER_MAX_RANGE;
+}
+
 static void init_reg_state(struct bpf_reg_state *regs)
 {
int i;
 
-   for (i = 0; i < MAX_BPF_REG; i++) {
-   regs[i].type = NOT_INIT;
-   regs[i].imm = 0;
-   regs[i].min_value = BPF_REGISTER_MIN_RANGE;
-   regs[i].max_value = BPF_REGISTER_MAX_RANGE;
-   regs[i].min_align = 0;
-   regs[i].aux_off = 0;
-   regs[i].aux_off_align = 0;
-   }
+   for (i = 0; i < MAX_BPF_REG; i++)
+   mark_reg_not_init(regs, i);
 
/* frame pointer */
regs[BPF_REG_FP].type = FRAME_PTR;
@@ -1346,7 +1349,6 @@ static int check_call(struct bpf_verifier_env *env, int 
func_id, int insn_idx)
struct bpf_verifier_state *state = >cur_state;
const struct bpf_func_proto *fn = NULL;
struct bpf_reg_state *regs = state->regs;
-   struct bpf_reg_state *reg;
struct bpf_call_arg_meta meta;
bool changes_data;
int i, err;
@@ -1413,11 +1415,8 @@ static int check_call(struct bpf_verifier_env *env, int 
func_id, int insn_idx)
}
 
/* reset caller saved regs */
-   for (i = 0; i < CALLER_SAVED_REGS; i++) {
-   reg = regs + caller_saved[i];
-   reg->type = NOT_INIT;
-   reg->imm = 0;
-   }
+   for (i = 0; i < CALLER_SAVED_REGS; i++)
+   mark_reg_not_init(regs, caller_saved[i]);
 
/* update return register */
if (fn->ret_type == RET_INTEGER) {
@@ -2445,7 +2444,6 @@ static int check_ld_abs(struct bpf_verifier_env *env, 
struct bpf_insn *insn)
 {
struct bpf_reg_state *regs = env->cur_state.regs;
u8 mode = BPF_MODE(insn->code);
-   struct bpf_reg_state *reg;
int i, err;
 
if (!may_access_skb(env->prog->type)) {
@@ -2478,11 +2476,8 @@ static int check_ld_abs(struct bpf_verifier_env *env, 
struct bpf_insn *insn)
}
 
/* reset caller saved regs to unreadable */
-   for (i = 0; i < CALLER_SAVED_REGS; i++) {
-   reg = regs + caller_saved[i];
-   reg->type = NOT_INIT;
-   reg->imm = 0;
-   }
+   for (i = 0; i < CALLER_SAVED_REGS; i++)
+   mark_reg_not_init(regs, caller_saved[i]);
 
/* mark destination R0 register as readable, since it contains
 * the value fetched from the packet
-- 
1.9.3

[PATCH net v2 4/5] bpf: fix wrong exposure of map_flags into fdinfo for lpm

2017-05-24 Thread Daniel Borkmann

trie_alloc() always needs to have BPF_F_NO_PREALLOC passed in via
attr->map_flags, since it does not support preallocation yet. We
check the flag, but we never copy the flag into trie->map.map_flags,
which is later on exposed into fdinfo and used by loaders such as
iproute2. Latter uses this in bpf_map_selfcheck_pinned() to test
whether a pinned map has the same spec as the one from the BPF obj
file and if not, bails out, which is currently the case for lpm
since it exposes always 0 as flags.

Also copy over flags in array_map_alloc() and stack_map_alloc().
They always have to be 0 right now, but we should make sure to not
miss to copy them over at a later point in time when we add actual
flags for them to use.

Fixes: b95a5c4db09b ("bpf: add a longest prefix match trie map implementation")
Reported-by: Jarno Rajahalme 
Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 kernel/bpf/arraymap.c | 1 +
 kernel/bpf/lpm_trie.c | 1 +
 kernel/bpf/stackmap.c | 1 +
 3 files changed, 3 insertions(+)

diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 5e00b23..172dc8e 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -86,6 +86,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
array->map.key_size = attr->key_size;
array->map.value_size = attr->value_size;
array->map.max_entries = attr->max_entries;
+   array->map.map_flags = attr->map_flags;
array->elem_size = elem_size;
 
if (!percpu)
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index 39cfafd..b09185f 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -432,6 +432,7 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr)
trie->map.key_size = attr->key_size;
trie->map.value_size = attr->value_size;
trie->map.max_entries = attr->max_entries;
+   trie->map.map_flags = attr->map_flags;
trie->data_size = attr->key_size -
  offsetof(struct bpf_lpm_trie_key, data);
trie->max_prefixlen = trie->data_size * 8;
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 4dfd6f2..31147d7 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -88,6 +88,7 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
smap->map.key_size = attr->key_size;
smap->map.value_size = value_size;
smap->map.max_entries = attr->max_entries;
+   smap->map.map_flags = attr->map_flags;
smap->n_buckets = n_buckets;
smap->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
 
-- 
1.9.3

[PATCH net v2 3/5] bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data

2017-05-24 Thread Daniel Borkmann

The bpf_clone_redirect() still needs to be listed in
bpf_helper_changes_pkt_data() since we call into
bpf_try_make_head_writable() from there, thus we need
to invalidate prior pkt regs as well.

Fixes: 36bbef52c7eb ("bpf: direct packet write and access for helpers for 
clsact progs")
Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 net/core/filter.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index a253a61..a6bb95f 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2281,6 +2281,7 @@ bool bpf_helper_changes_pkt_data(void *func)
func == bpf_skb_change_head ||
func == bpf_skb_change_tail ||
func == bpf_skb_pull_data ||
+   func == bpf_clone_redirect ||
func == bpf_l3_csum_replace ||
func == bpf_l4_csum_replace ||
func == bpf_xdp_adjust_head)
-- 
1.9.3

[PATCH net v2 5/5] bpf: add various verifier test cases

2017-05-24 Thread Daniel Borkmann

This patch adds various verifier test cases:

1) A test case for the pruning issue when tracking alignment
   is used.
2) Various PTR_TO_MAP_VALUE_OR_NULL tests to make sure pointer
   arithmetic turns such register into UNKNOWN_VALUE type.
3) Test cases for the special treatment of LD_ABS/LD_IND to
   make sure verifier doesn't break calling convention here.
   Latter is needed, since f.e. arm64 JIT uses r1 - r5 for
   storing temporary data, so they really must be marked as
   NOT_INIT.

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 include/linux/filter.h  |  10 ++
 tools/include/linux/filter.h|  10 ++
 tools/testing/selftests/bpf/test_verifier.c | 239 +++-
 3 files changed, 255 insertions(+), 4 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 56197f8..62d948f 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -272,6 +272,16 @@
.off   = OFF,   \
.imm   = IMM })
 
+/* Unconditional jumps, goto pc + off16 */
+
+#define BPF_JMP_A(OFF) \
+   ((struct bpf_insn) {\
+   .code  = BPF_JMP | BPF_JA,  \
+   .dst_reg = 0,   \
+   .src_reg = 0,   \
+   .off   = OFF,   \
+   .imm   = 0 })
+
 /* Function call */
 
 #define BPF_EMIT_CALL(FUNC)\
diff --git a/tools/include/linux/filter.h b/tools/include/linux/filter.h
index 390d7c9..4ce25d4 100644
--- a/tools/include/linux/filter.h
+++ b/tools/include/linux/filter.h
@@ -208,6 +208,16 @@
.off   = OFF,   \
.imm   = IMM })
 
+/* Unconditional jumps, goto pc + off16 */
+
+#define BPF_JMP_A(OFF) \
+   ((struct bpf_insn) {\
+   .code  = BPF_JMP | BPF_JA,  \
+   .dst_reg = 0,   \
+   .src_reg = 0,   \
+   .off   = OFF,   \
+   .imm   = 0 })
+
 /* Function call */
 
 #define BPF_EMIT_CALL(FUNC)\
diff --git a/tools/testing/selftests/bpf/test_verifier.c 
b/tools/testing/selftests/bpf/test_verifier.c
index 3773562..cabb19b 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -49,6 +49,7 @@
 #define MAX_NR_MAPS4
 
 #define F_NEEDS_EFFICIENT_UNALIGNED_ACCESS (1 << 0)
+#define F_LOAD_WITH_STRICT_ALIGNMENT   (1 << 1)
 
 struct bpf_test {
const char *descr;
@@ -2615,6 +2616,30 @@ struct test_val {
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
},
{
+   "direct packet access: test17 (pruning, alignment)",
+   .insns = {
+   BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+   offsetof(struct __sk_buff, data)),
+   BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
+   offsetof(struct __sk_buff, data_end)),
+   BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_1,
+   offsetof(struct __sk_buff, mark)),
+   BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 14),
+   BPF_JMP_IMM(BPF_JGT, BPF_REG_7, 1, 4),
+   BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_3, 1),
+   BPF_STX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, -4),
+   BPF_MOV64_IMM(BPF_REG_0, 0),
+   BPF_EXIT_INSN(),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 1),
+   BPF_JMP_A(-6),
+   },
+   .errstr = "misaligned packet access off 2+15+-4 size 4",
+   .result = REJECT,
+   .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+   .flags = F_LOAD_WITH_STRICT_ALIGNMENT,
+   },
+   {
"helper access to packet: test1, valid packet_ptr range",
.insns = {
BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
@@ -3341,6 +3366,70 @@ struct test_val {
.prog_type = BPF_PROG_TYPE_SCHED_CLS
},
{
+   "alu ops on ptr_to_map_value_or_null, 1",
+   .insns = {
+   BPF_MOV64_IMM(BPF_REG_1, 10),
+   BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_1, -8),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+

[PATCH net v2 1/5] bpf: fix incorrect pruning decision when alignment must be tracked

2017-05-24 Thread Daniel Borkmann

Currently, when we enforce alignment tracking on direct packet access,
the verifier lets the following program pass despite doing a packet
write with unaligned access:

  0: (61) r2 = *(u32 *)(r1 +76)
  1: (61) r3 = *(u32 *)(r1 +80)
  2: (61) r7 = *(u32 *)(r1 +8)
  3: (bf) r0 = r2
  4: (07) r0 += 14
  5: (25) if r7 > 0x1 goto pc+4
   R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
   R3=pkt_end R7=inv,min_value=0,max_value=1 R10=fp
  6: (2d) if r0 > r3 goto pc+1
   R0=pkt(id=0,off=14,r=14) R1=ctx R2=pkt(id=0,off=0,r=14)
   R3=pkt_end R7=inv,min_value=0,max_value=1 R10=fp
  7: (63) *(u32 *)(r0 -4) = r0
  8: (b7) r0 = 0
  9: (95) exit

  from 6 to 8:
   R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
   R3=pkt_end R7=inv,min_value=0,max_value=1 R10=fp
  8: (b7) r0 = 0
  9: (95) exit

  from 5 to 10:
   R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
   R3=pkt_end R7=inv,min_value=2 R10=fp
  10: (07) r0 += 1
  11: (05) goto pc-6
  6: safe   <- here, wrongly found safe
  processed 15 insns

However, if we enforce a pruning mismatch by adding state into r8
which is then being mismatched in states_equal(), we find that for
the otherwise same program, the verifier detects a misaligned packet
access when actually walking that path:

  0: (61) r2 = *(u32 *)(r1 +76)
  1: (61) r3 = *(u32 *)(r1 +80)
  2: (61) r7 = *(u32 *)(r1 +8)
  3: (b7) r8 = 1
  4: (bf) r0 = r2
  5: (07) r0 += 14
  6: (25) if r7 > 0x1 goto pc+4
   R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
   R3=pkt_end R7=inv,min_value=0,max_value=1
   R8=imm1,min_value=1,max_value=1,min_align=1 R10=fp
  7: (2d) if r0 > r3 goto pc+1
   R0=pkt(id=0,off=14,r=14) R1=ctx R2=pkt(id=0,off=0,r=14)
   R3=pkt_end R7=inv,min_value=0,max_value=1
   R8=imm1,min_value=1,max_value=1,min_align=1 R10=fp
  8: (63) *(u32 *)(r0 -4) = r0
  9: (b7) r0 = 0
  10: (95) exit

  from 7 to 9:
   R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
   R3=pkt_end R7=inv,min_value=0,max_value=1
   R8=imm1,min_value=1,max_value=1,min_align=1 R10=fp
  9: (b7) r0 = 0
  10: (95) exit

  from 6 to 11:
   R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
   R3=pkt_end R7=inv,min_value=2
   R8=imm1,min_value=1,max_value=1,min_align=1 R10=fp
  11: (07) r0 += 1
  12: (b7) r8 = 0
  13: (05) goto pc-7<- mismatch due to r8
  7: (2d) if r0 > r3 goto pc+1
   R0=pkt(id=0,off=15,r=15) R1=ctx R2=pkt(id=0,off=0,r=15)
   R3=pkt_end R7=inv,min_value=2
   R8=imm0,min_value=0,max_value=0,min_align=2147483648 R10=fp
  8: (63) *(u32 *)(r0 -4) = r0
  misaligned packet access off 2+15+-4 size 4

The reason why we fail to see it in states_equal() is that the
third test in compare_ptrs_to_packet() ...

  if (old->off <= cur->off &&
  old->off >= old->range && cur->off >= cur->range)
  return true;

... will let the above pass. The situation we run into is that
old->off <= cur->off (14 <= 15), meaning that prior walked paths
went with smaller offset, which was later used in the packet
access after successful packet range check and found to be safe
already.

For example: Given is R0=pkt(id=0,off=0,r=0). Adding offset 14
as in above program to it, results in R0=pkt(id=0,off=14,r=0)
before the packet range test. Now, testing this against R3=pkt_end
with 'if r0 > r3 goto out' will transform R0 into R0=pkt(id=0,off=14,r=14)
for the case when we're within bounds. A write into the packet
at offset *(u32 *)(r0 -4), that is, 2 + 14 -4, is valid and
aligned (2 is for NET_IP_ALIGN). After processing this with
all fall-through paths, we later on check paths from branches.
When the above skb->mark test is true, then we jump near the
end of the program, perform r0 += 1, and jump back to the
'if r0 > r3 goto out' test we've visited earlier already. This
time, R0 is of type R0=pkt(id=0,off=15,r=0), and we'll prune
that part because this time we'll have a larger safe packet
range, and we already found that with off=14 all further insn
were already safe, so it's safe as well with a larger off.
However, the problem is that the subsequent write into the packet
with 2 + 15 -4 is then unaligned, and not caught by the alignment
tracking. Note that min_align, aux_off, and aux_off_align were
all 0 in this example.

Since we cannot tell at this time what kind of packet access was
performed in the prior walk and what minimal requirements it has
(we might do so in the future, but that requires more complexity),
fix it to disable this pruning case for strict alignment for now,
and let the verifier do check such paths instead. With that applied,
the test cases pass and reject the program due to misalignment.

Fixes: d1174416747d ("bpf: Track alignment of register values in the verifier.")
Reference: http://patchwork.ozlabs.org/patch/761909/
Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 kernel/bpf/verifier.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git

[PATCH net v2 0/5] Various BPF fixes

2017-05-24 Thread Daniel Borkmann

Follow-up to fix incorrect pruning when alignment tracking is
in use and to properly clear regs after call to not leave stale
data behind, also a fix that adds bpf_clone_redirect to the
bpf_helper_changes_pkt_data helper and exposes correct map_flags
for lpm map into fdinfo. For details, please see individual
patches.

Thanks!

v1 -> v2:
  - Reworked first patch so that env->strict_alignment is the
final indicator on whether we have to deal with strict
alignment rather than having CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
checks on various locations, so only checking env->strict_alignment
is sufficient after that. Thanks for spotting, Dave!
  - Added patch 3 and 4.
  - Rest as is.

Daniel Borkmann (5):
  bpf: fix incorrect pruning decision when alignment must be tracked
  bpf: properly reset caller saved regs after helper call and ld_abs/ind
  bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data
  bpf: fix wrong exposure of map_flags into fdinfo for lpm
  bpf: add various verifier test cases

 include/linux/filter.h  |  10 ++
 kernel/bpf/arraymap.c   |   1 +
 kernel/bpf/lpm_trie.c   |   1 +
 kernel/bpf/stackmap.c   |   1 +
 kernel/bpf/verifier.c   |  56 +++
 net/core/filter.c   |   1 +
 tools/include/linux/filter.h|  10 ++
 tools/testing/selftests/bpf/test_verifier.c | 239 +++-
 8 files changed, 285 insertions(+), 34 deletions(-)

-- 
1.9.3

[PATCH 2/2] net/mlx5e: Use device ID defines

2017-05-24 Thread Myron Stowe

Use Mellanox device ID definitions in the mlx5 ID table so tools such as
'grep' and 'cscope' can be used to help find related aspects.

No functional change intended.

Signed-off-by: Myron Stowe 
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c |   12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 0c123d5..98642eb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1508,17 +1508,17 @@ static void shutdown(struct pci_dev *pdev)
 }
 
 static const struct pci_device_id mlx5_core_pci_table[] = {
-   { PCI_VDEVICE(MELLANOX, 0x1011) },  /* Connect-IB */
+   { PCI_VDEVICE(MELLANOX, PCI_DEVICE_ID_MELLANOX_CONNECTIB) },
{ PCI_VDEVICE(MELLANOX, 0x1012), MLX5_PCI_DEV_IS_VF},   /* Connect-IB 
VF */
-   { PCI_VDEVICE(MELLANOX, 0x1013) },  /* ConnectX-4 */
+   { PCI_VDEVICE(MELLANOX, PCI_DEVICE_ID_MELLANOX_CONNECTX4) },
{ PCI_VDEVICE(MELLANOX, 0x1014), MLX5_PCI_DEV_IS_VF},   /* ConnectX-4 
VF */
-   { PCI_VDEVICE(MELLANOX, 0x1015) },  /* ConnectX-4LX 
*/
+   { PCI_VDEVICE(MELLANOX, PCI_DEVICE_ID_MELLANOX_CONNECTX4_LX) },
{ PCI_VDEVICE(MELLANOX, 0x1016), MLX5_PCI_DEV_IS_VF},   /* ConnectX-4LX 
VF */
-   { PCI_VDEVICE(MELLANOX, 0x1017) },  /* ConnectX-5, 
PCIe 3.0 */
+   { PCI_VDEVICE(MELLANOX, PCI_DEVICE_ID_MELLANOX_CONNECTX5) },
{ PCI_VDEVICE(MELLANOX, 0x1018), MLX5_PCI_DEV_IS_VF},   /* ConnectX-5 
VF */
-   { PCI_VDEVICE(MELLANOX, 0x1019) },  /* ConnectX-5 
Ex */
+   { PCI_VDEVICE(MELLANOX, PCI_DEVICE_ID_MELLANOX_CONNECTX5_EX) },
{ PCI_VDEVICE(MELLANOX, 0x101a), MLX5_PCI_DEV_IS_VF},   /* ConnectX-5 
Ex VF */
-   { PCI_VDEVICE(MELLANOX, 0x101b) },  /* ConnectX-6 */
+   { PCI_VDEVICE(MELLANOX, PCI_DEVICE_ID_MELLANOX_CONNECTX6) },
{ PCI_VDEVICE(MELLANOX, 0x101c), MLX5_PCI_DEV_IS_VF},   /* ConnectX-6 
VF */
{ 0, }
 };

[PATCH 1/2] PCI: Add Mellanox device IDs

2017-05-24 Thread Myron Stowe

Add Mellanox device IDs for controllers covered by the mlx5 driver.

Signed-off-by: Myron Stowe 
---
 include/linux/pci_ids.h |3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index 5f6b71d..5626d5a 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -2266,6 +2266,9 @@
 #define PCI_DEVICE_ID_MELLANOX_CONNECTIB   0x1011
 #define PCI_DEVICE_ID_MELLANOX_CONNECTX4   0x1013
 #define PCI_DEVICE_ID_MELLANOX_CONNECTX4_LX0x1015
+#define PCI_DEVICE_ID_MELLANOX_CONNECTX5   0x1017
+#define PCI_DEVICE_ID_MELLANOX_CONNECTX5_EX0x1019
+#define PCI_DEVICE_ID_MELLANOX_CONNECTX6   0x101b
 #define PCI_DEVICE_ID_MELLANOX_TAVOR   0x5a44
 #define PCI_DEVICE_ID_MELLANOX_TAVOR_BRIDGE0x5a46
 #define PCI_DEVICE_ID_MELLANOX_SINAI_OLD   0x5e8c

[PATCH 0/2] Replace driver's usage of hard-coded device IDs to #defines

2017-05-24 Thread Myron Stowe

Noa Osherovich introduced a series of new Mellanox device ID definitions to
help differentiate specific controllers that needed INTx masking quirks [1].

Bjorn Helgaas followed on, using the device ID definitions Noa provided to
replace hard-coded values within the mxl4 ID table [2].

This patch continues along similar lines, adding a few additional Mellanox
device ID definitions and converting the net/mlx5e driver's mlx5 ID table to
use the defines so tools like 'grep' and 'cscope' can be used to help
identify relationships with other aspects (such as INTx masking).


[1]  7254383341b PCI: Add Mellanox device IDs
 d76d2fe05fd PCI: Convert Mellanox broken INTx quirks to be for listed
 devices only

[2]  c19e4b9037f net/mlx4_core: Use device ID defines


Myron Stowe (2):
  PCI: Add Mellanox device IDs
  net/mlx5e: Use device ID defines


 drivers/net/ethernet/mellanox/mlx5/core/main.c |   12 ++--
 include/linux/pci_ids.h|3 +++
 2 files changed, 9 insertions(+), 6 deletions(-)

--

[PATCH net] sky2: Do not deadlock on sky2_hw_down

2017-05-24 Thread Joshua Emele

From: Joshua Emele 

The sky2_hw_down uses sky2_tx_complete to free pending frames stuck in
the HW queue. Because sky2_hw_down can be called from a process context,
the call to u64_stats_update_begin can result in deadlock.

Because the statistics do not require update as part of the sky2_hw_down
sequence, prevent the update to avoid the deadlock.

[18198.003900] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[18198.009931] ifconfig/11604 [HC0[0]:SC0[0]:HE1:SE1] takes:
[18198.015348]  (>seq#2){+.?...}, at: [<805ed440>] 
sky2_hw_down+0x370/0x428
[18198.022827] {IN-SOFTIRQ-W} state was registered at:
[18198.027723]   [<801729a0>] lock_acquire+0x78/0x98
[18198.032484]   [<805ed048>] sky2_tx_complete+0x1a8/0x230
[18198.037760]   [<805f2718>] sky2_poll+0x7cc/0xfa8
[18198.042425]   [<807692f8>] net_rx_action+0x200/0x330
[18198.047447]   [<80129f64>] __do_softirq+0x130/0x31c
[18198.052369]   [<8012a4a0>] irq_exit+0xc8/0x13c
[18198.056836]   [<8017e398>] __handle_domain_irq+0x84/0xf8
[18198.062180]   [<80101564>] gic_handle_irq+0x58/0xb8
[18198.067086]   [<8010d838>] __irq_usr+0x58/0x80
[18198.071557]   [<76e178ae>] 0x76e178ae
[18198.075247] irq event stamp: 2109
[18198.078568] hardirqs last  enabled at (2109): [<8012a2cc>] 
__local_bh_enable_ip+0x90/0x110
[18198.086860] hardirqs last disabled at (2107): [<8012a27c>] 
__local_bh_enable_ip+0x40/0x110
[18198.095150] softirqs last  enabled at (2108): [<805ed368>] 
sky2_hw_down+0x298/0x428
[18198.102832] softirqs last disabled at (2106): [<805ed284>] 
sky2_hw_down+0x1b4/0x428
[18198.110515]
   other info that might help us debug this:
[18198.117048]  Possible unsafe locking scenario:

[18198.122974]CPU0
[18198.125425]
[18198.127876]   lock(>seq#2);
[18198.131332]   
[18198.133955] lock(>seq#2);
[18198.137585]
*** DEADLOCK ***

[18198.143517] 1 lock held by ifconfig/11604:
[18198.147618]  #0:  (rtnl_mutex){+.+.+.}, at: [<8077e874>] rtnl_lock+0x18/0x20
[18198.154772]
   stack backtrace:
[18198.159143] CPU: 1 PID: 11604 Comm: ifconfig Not tainted 4.8.17 #1
[18198.165331] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[18198.171864] Backtrace:
[18198.174353] [<8010c634>] (dump_backtrace) from [<8010c828>] 
(show_stack+0x18/0x1c)
[18198.181931]  r6:60070093 r5: r4:80e23248 r3:
[18198.187682] [<8010c810>] (show_stack) from [<8041edc4>] 
(dump_stack+0xb4/0xe8)
[18198.194924] [<8041ed10>] (dump_stack) from [<801700cc>] 
(print_usage_bug+0x1dc/0x2d0)
[18198.202762]  r10:edec55c0 r9:80bd1178 r8:81649598 r7:0006 r6:edec5a90 
r5:80fb09f4
[18198.210686]  r4:edec55c0 r3:0002
[18198.214311] [<8016fef0>] (print_usage_bug) from [<80170348>] 
(mark_lock+0x188/0x6c4)
[18198.222059]  r9: r8:0004 r7:edec55c0 r6:0006 r5:edec5a90 
r4:8016f44c
[18198.229903] [<801701c0>] (mark_lock) from [<801714ac>] 
(__lock_acquire+0xb90/0x1c70)
[18198.237653]  r10:edec55c0 r9: r8:edec5aa4 r7:0004 r6:0001 
r5:
[18198.245573]  r4:01cd r3:0001
[18198.249192] [<8017091c>] (__lock_acquire) from [<801729a0>] 
(lock_acquire+0x78/0x98)
[18198.256940]  r10:0001 r9:06d0 r8: r7:0001 r6:805ed440 
r5:60070013
[18198.264860]  r4:
[18198.267428] [<80172928>] (lock_acquire) from [<805ed048>] 
(sky2_tx_complete+0x1a8/0x230)
[18198.275523]  r7:ee2d9dfc r6: r5:ee2d9dc0 r4:
[18198.281266] [<805ecea0>] (sky2_tx_complete) from [<805ed440>] 
(sky2_hw_down+0x370/0x428)
[18198.289360]  r10:ee2cac00 r9:06d0 r8:ee2d9dc0 r7:f0d40aa8 r6:0001 
r5:0d48
[18198.297281]  r4:
[18198.299850] [<805ed0d0>] (sky2_hw_down) from [<805efed8>] 
(sky2_close+0xac/0x11c)
[18198.307337]  r10:ee838600 r9: r8: r7:1003 r6:ee2d9dc0 
r5:
[18198.315258]  r4:ee2cac00
[18198.317830] [<805efe2c>] (sky2_close) from [<8076a310>] 
(__dev_close_many+0xc4/0x118)
[18198.325664]  r7:1003 r6:1042 r5:eb047df8 r4:ee2d9800
[18198.331404] [<8076a24c>] (__dev_close_many) from [<8076ab24>] 
(__dev_close+0x30/0x48)
[18198.339240]  r5:0001 r4:ee2d9800
[18198.342866] [<8076aaf4>] (__dev_close) from [<8076dbac>] 
(__dev_change_flags+0x90/0x154)
[18198.350970] [<8076db1c>] (__dev_change_flags) from [<8076dc90>] 
(dev_change_flags+0x20/0x50)
[18198.359412]  r8: r7:8914 r6:1003 r5:ee2d9948 r4:ee2d9800 
r3:0014
[18198.367264] [<8076dc70>] (dev_change_flags) from [<807f06d0>] 
(devinet_ioctl+0x724/0x818)
[18198.375448]  r8: r7:8914 r6:ee2cae0c r5:7ed799ac r4:eb047e80 
r3:0014
[18198.383291] [<807effac>] (devinet_ioctl) from [<807f2e98>] 
(inet_ioctl+0x19c/0x1c8)
[18198.390951]  r10:edfc3b80 r9:eb046000 r8:0004 r7:ee724b40 r6:7ed799ac 
r5:ee724b60
[18198.398872]  r4:8914
[18198.401446] [<807f2cfc>] (inet_ioctl) from [<8074a234>] 
(sock_ioctl+0x158/0x300)
[18198.408860] [<8074a0dc>] (sock_ioctl) from [<80239770>] 
(do_vfs_ioctl+0x98/0xa3c)
[18198.416348]  r7:8023a150

[PATCH] arp: fixed -Wuninitialized compiler warning

2017-05-24 Thread Ihar Hrachyshka

Commit 7d472a59c0e5ec117220a05de6b370447fb6cb66 ("arp: always override
existing neigh entries with gratuitous ARP") introduced a compiler
warning:

net/ipv4/arp.c:880:35: warning: 'addr_type' may be used uninitialized in
this function [-Wmaybe-uninitialized]

While the code logic seems to be correct and doesn't allow the variable
to be used uninitialized, and the warning is not consistently
reproducible, it's still worth fixing it for other people not to waste
time looking at the warning in case it pops up in the build environment.
Yes, compiler is probably at fault, but we will need to accommodate.

Fixes: 7d472a59c0e5ec117220a05de6b370447fb6cb66
Signed-off-by: Ihar Hrachyshka 
---
 net/ipv4/arp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index ae96e6f..e9f3386 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -863,8 +863,8 @@ static int arp_process(struct net *net, struct sock *sk, 
struct sk_buff *skb)
 
n = __neigh_lookup(_tbl, , dev, 0);
 
+   addr_type = -1;
if (n || IN_DEV_ARP_ACCEPT(in_dev)) {
-   addr_type = -1;
is_garp = arp_is_garp(net, dev, _type, arp->ar_op,
  sip, tip, sha, tha);
}
-- 
2.9.4

Re: [Intel-wired-lan] [PATCH 3/4] [next-queue]net: i40e: Enable mqprio full offload mode in the i40e driver for configuring TCs and queue mapping

2017-05-24 Thread Alexander Duyck

On Fri, May 19, 2017 at 5:58 PM, Amritha Nambiar
 wrote:
> The i40e driver is modified to enable the new mqprio hardware
> offload mode and factor the TCs and queue configuration by
> creating channel VSIs. In this mode, the priority to traffic
> class mapping and the user specified queue ranges are used
> to configure the traffic classes when the 'hw' option is set
> to 2.
>
> Example:
> # tc qdisc add dev eth0 root mqprio num_tc 4\
>   map 0 0 0 0 1 2 2 3 queues 2@0 2@2 1@4 1@5 hw 2
>
> # tc qdisc show dev eth0
> qdisc mqprio 8038: root  tc 4 map 0 0 0 0 1 2 2 3 0 0 0 0 0 0 0 0
>  queues:(0:1) (2:3) (4:4) (5:5)
>
> The HW channels created are removed and all the queue configuration
> is set to default when the qdisc is detached from the root of the
> device.
>
> #tc qdisc del dev eth0 root
>
> This patch also disables setting up channels via ethtool (ethtool -L)
> when the TCs are confgured using mqprio scheduler.
>
> Signed-off-by: Amritha Nambiar 
> ---
>  drivers/net/ethernet/intel/i40e/i40e.h |4
>  drivers/net/ethernet/intel/i40e/i40e_ethtool.c |6
>  drivers/net/ethernet/intel/i40e/i40e_main.c|  311 
> ++--
>  3 files changed, 292 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e.h 
> b/drivers/net/ethernet/intel/i40e/i40e.h
> index 0915b02..a62f65a 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e.h
> @@ -54,6 +54,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +
>  #include "i40e_type.h"
>  #include "i40e_prototype.h"
>  #include "i40e_client.h"
> @@ -685,6 +687,7 @@ struct i40e_vsi {
> enum i40e_vsi_type type;  /* VSI type, e.g., LAN, FCoE, etc */
> s16 vf_id;  /* Virtual function ID for SRIOV VSIs */
>
> +   struct tc_mqprio_qopt_offload mqprio_qopt; /* queue parameters */
> struct i40e_tc_configuration tc_config;
> struct i40e_aqc_vsi_properties_data info;
>
> @@ -710,6 +713,7 @@ struct i40e_vsi {
> u16 cnt_q_avail; /* num of queues available for channel usage */
> u16 orig_rss_size;
> u16 current_rss_size;
> +   bool reconfig_rss;
>
> /* keeps track of next_base_queue to be used for channel setup */
> atomic_t next_base_queue;
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
> b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> index 3d58762..ab52979 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> @@ -3841,6 +3841,12 @@ static int i40e_set_channels(struct net_device *dev,
> if (vsi->type != I40E_VSI_MAIN)
> return -EINVAL;
>
> +   /* We do not support setting channels via ethtool when TCs are
> +* configured through mqprio
> +*/
> +   if (pf->flags & I40E_FLAG_TC_MQPRIO)
> +   return -EINVAL;
> +
> /* verify they are not requesting separate vectors */
> if (!count || ch->rx_count || ch->tx_count)
> return -EINVAL;
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
> b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index e1bea45..7f61d4f 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -68,6 +68,7 @@ static int i40e_reset(struct i40e_pf *pf);
>  static void i40e_rebuild(struct i40e_pf *pf, bool reinit, bool 
> lock_acquired);
>  static void i40e_fdir_sb_setup(struct i40e_pf *pf);
>  static int i40e_veb_get_bw_info(struct i40e_veb *veb);
> +static int i40e_vsi_config_rss(struct i40e_vsi *vsi);
>
>  /* i40e_pci_tbl - PCI Device ID Table
>   *
> @@ -1560,6 +1561,105 @@ static int i40e_set_mac(struct net_device *netdev, 
> void *p)
>  }
>
>  /**
> + * i40e_vsi_setup_queue_map_mqprio - Prepares VSI tc_config to have queue
> + * configurations based on MQPRIO options.
> + * @vsi: the VSI being configured,
> + * @ctxt: VSI context structure
> + * @enabled_tc: number of traffic classes to enable
> + **/
> +static int i40e_vsi_setup_queue_map_mqprio(struct i40e_vsi *vsi,
> +  struct i40e_vsi_context *ctxt,
> +  u8 enabled_tc)
> +{
> +   u8 netdev_tc = 0, offset = 0;
> +   u16 qcount = 0, max_qcount, qmap, sections = 0;
> +   int i, override_q, pow, num_qps, ret;
> +
> +   if (vsi->type != I40E_VSI_MAIN)
> +   return -EINVAL;
> +
> +   sections = I40E_AQ_VSI_PROP_QUEUE_MAP_VALID;
> +   sections |= I40E_AQ_VSI_PROP_SCHED_VALID;
> +
> +   vsi->tc_config.numtc = vsi->mqprio_qopt.qopt.num_tc;
> +   vsi->tc_config.enabled_tc = enabled_tc ? enabled_tc : 1;
> +
> +   num_qps = vsi->mqprio_qopt.qopt.count[0];
> +
> +   /* find the next higher power-of-2 of num queue pairs */
> +   pow = ilog2(num_qps);
> +   if (!is_power_of_2(num_qps))
> +

Re: [PATCH 2/2] at803x: double check SGMII side autoneg

2017-05-24 Thread Timur Tabi


On 05/24/2017 04:36 PM, Florian Fainelli wrote:

OK, and there is no way you can run into the following race condition:

CPU HW
MDIO read intent
polling starts
disable HW autopoll
polling continues


Disabling of the HW autopoll waits for the poll to actually stop before 
continuing.  You can see the code here:


http://elixir.free-electrons.com/linux/latest/source/drivers/net/ethernet/qualcomm/emac/emac-phy.c#L102


MDIO read is done
MDIO read done
polling stops
MDIO read value returned


if you disable autopolling in HW this is guaranteed to immediately stop
by the time the register value is seen in HW and your I/O read/write
returns?


It doesn't immediately stop, but the emac_phy_mdio_autopoll_disable() 
function waits for the MDIO bus to not be busy.  But low-level details 
of this feature are not documented, so who knows what it does exactly? 
The original code that used this feature only supported one PHY and 
never expected there to be any asynchronous MDIO transactios.


--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

Re: [Intel-wired-lan] [PATCH 1/4] [next-queue]net: mqprio: Introduce new hardware offload mode in mqprio for offloading full TC configurations

2017-05-24 Thread Alexander Duyck

On Fri, May 19, 2017 at 5:58 PM, Amritha Nambiar
 wrote:
> This patch introduces a new hardware offload mode in mqprio
> which makes full use of the mqprio options, the TCs, the
> queue configurations and the bandwidth rates for the TCs.
> This is achieved by setting the value 2 for the "hw" option.
> This new offload mode supports new attributes for traffic
> class such as minimum and maximum values for bandwidth rate limits.
>
> Introduces a new datastructure 'tc_mqprio_qopt_offload' for offloading
> mqprio queue options and use this to be shared between the kernel and
> device driver. This contains a copy of the exisiting datastructure
> for mqprio queue options. This new datastructure can be extended when
> adding new attributes for traffic class such as bandwidth rate limits. The
> existing datastructure for mqprio queue options will be shared between the
> kernel and userspace.
>
> This patch enables configuring additional attributes associated
> with a traffic class such as minimum and maximum bandwidth
> rates and can be offloaded to the hardware in the new offload mode.
> The min and max limits for bandwidth rates are provided
> by the user along with the the TCs and the queue configurations
> when creating the mqprio qdisc.
>
> Example:
> # tc qdisc add dev eth0 root mqprio num_tc 2  map 0 0 0 0 1 1 1 1\
>   queues 4@0 4@4 min_rate 0Mbit 0Mbit max_rate 55Mbit 60Mbit hw 2
>
> To dump the bandwidth rates:
>
> # tc qdisc show dev eth0
> qdisc mqprio 804a: root  tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
>  queues:(0:3) (4:7)
>  min rates:0bit 0bit
>  max rates:55Mbit 60Mbit
>
> Signed-off-by: Amritha Nambiar 
> ---
>  include/linux/netdevice.h  |2
>  include/net/pkt_cls.h  |7 ++
>  include/uapi/linux/pkt_sched.h |   13 +++
>  net/sched/sch_mqprio.c |  169 
> +---
>  4 files changed, 180 insertions(+), 11 deletions(-)
>

[...]

> diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
> index 0a4cf27..6457ec9 100644
> --- a/net/sched/sch_mqprio.c
> +++ b/net/sched/sch_mqprio.c
> @@ -18,10 +18,13 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  struct mqprio_sched {
> struct Qdisc**qdiscs;
> int hw_offload;
> +   u32 flags;
> +   u64 min_rate[TC_QOPT_MAX_QUEUE], max_rate[TC_QOPT_MAX_QUEUE];
>  };
>
>  static void mqprio_destroy(struct Qdisc *sch)
> @@ -39,10 +42,21 @@ static void mqprio_destroy(struct Qdisc *sch)
> }
>
> if (priv->hw_offload && dev->netdev_ops->ndo_setup_tc) {
> -   struct tc_mqprio_qopt offload = { 0 };
> -   struct tc_to_netdev tc = { .type = TC_SETUP_MQPRIO,
> -  { .mqprio =  } };
> +   struct tc_mqprio_qopt_offload offload = { 0 };

So this is currently throwing a warning when I pull these patches and
build this with gcc 6.2. I think the correct setup is "offload = {{ 0
}}" in order to indicate that we are initializing the inner structure
and all other data to 0.

> +   struct tc_to_netdev tc = { 0 };
>
> +   switch (priv->hw_offload) {
> +   case TC_MQPRIO_HW_OFFLOAD_TCS:
> +   tc.type = TC_SETUP_MQPRIO;
> +   tc.mqprio = 
> +   break;
> +   case TC_MQPRIO_HW_OFFLOAD:
> +   tc.type = TC_SETUP_MQPRIO_EXT;
> +   tc.mqprio_qopt = 
> +   break;
> +   default:
> +   return;
> +   }
> dev->netdev_ops->ndo_setup_tc(dev, sch->handle, 0, );
> } else {
> netdev_set_num_tc(dev, 0);

Re: [PATCH v2 0/4] arp: always override existing neigh entries with gratuitous ARP

2017-05-24 Thread Arnd Bergmann

On Wed, May 24, 2017 at 11:32 PM, Ihar Hrachyshka  wrote:
> On 05/23/2017 01:56 PM, Arnd Bergmann wrote:
>>
>> This seems to have caused a build warning:
>>
>> net/ipv4/arp.c:880:35: warning: 'addr_type' may be used uninitialized
>> in this function [-Wmaybe-uninitialized]
>>
> Not sure. How do you reproduce it? I just did 'make net' in the latest tree
> that includes the patch, and it doesn't trigger the failure. Also the code
> logic prevents it to be uninitialized (it's always set to -1 or the actual
> type value).

I saw it reported by the kernelci build bot:

https://kernelci.org/build/id/5925b81859b514fb106b9590/logs/

It happened on arm64 with 'make allmodconfig', but none of the other
architectures or the other configurations on that architecture.

This kind of warning can be a real pain to shut up when the compiler
gets it wrong. I haven't tested the latest kernel with your patch on my
own build box yet, so I also haven't been able to reproduce it.
Most likely, gcc gets confused when it needs to track the state of
too many variables here (I certainly get confused when I read it too).

My first attempt to resolve it would be:

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index ae96e6f3e0cb..39f26980a565 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -663,6 +663,8 @@ static bool arp_is_garp(struct net *net, struct
net_device *dev,
*addr_type = inet_addr_type_dev_table(net, dev, sip);
if (*addr_type != RTN_UNICAST)
is_garp = false;
+   } else {
+   *addr_type = -1;
}
return is_garp;
 }
@@ -864,7 +866,6 @@ static int arp_process(struct net *net, struct
sock *sk, struct sk_buff *skb)
n = __neigh_lookup(_tbl, , dev, 0);

if (n || IN_DEV_ARP_ACCEPT(in_dev)) {
-   addr_type = -1;
is_garp = arp_is_garp(net, dev, _type, arp->ar_op,
  sip, tip, sha, tha);
}

but this clearly needs a lot of build testing to see if it's sufficient. Moving
the initialization of addr_type=1 to the start of the function would
certainly avoid the warning, but I suspect this would be incorrect here.

Arnd

Re: [Intel-wired-lan] [PATCH 2/4] [next-queue]net: i40e: Add infrastructure for queue channel support with the TCs and queue configurations offloaded via mqprio scheduler

2017-05-24 Thread Alexander Duyck

On Fri, May 19, 2017 at 5:58 PM, Amritha Nambiar
 wrote:
> This patch sets up the infrastructure for offloading TCs and
> queue configurations to the hardware by creating HW channels(VSI).
> A new channel is created for each of the traffic class
> configuration offloaded via mqprio framework except for the first TC
> (TC0). TC0 for the main VSI is also reconfigured as per user provided
> queue parameters. Queue counts that are not power-of-2 are handled by
> reconfiguring RSS by reprogramming LUTs using the queue count value.
> This patch also handles configuring the TX rings for the channels,
> setting up the RX queue map for channel.
>
> Also, the channels so created are removed and all the queue
> configuration is set to default when the qdisc is detached from the
> root of the device.
>
> Signed-off-by: Amritha Nambiar 
> Signed-off-by: Kiran Patil 
> ---
>  drivers/net/ethernet/intel/i40e/i40e.h  |   36 +
>  drivers/net/ethernet/intel/i40e/i40e_main.c |  740 
> +++
>  drivers/net/ethernet/intel/i40e/i40e_txrx.h |2
>  3 files changed, 771 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e.h 
> b/drivers/net/ethernet/intel/i40e/i40e.h
> index 395ca94..0915b02 100644

[...]

> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
> b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index 8d1d3b85..e1bea45 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c

[...]

> +/**
> + * i40e_create_queue_channel - function to create channel
> + * @vsi: VSI to be configured
> + * @ch: ptr to channel (it contains channel specific params)
> + *
> + * This function creates channel (VSI) using num_queues specified by user,
> + * reconfigs RSS if needed.
> + **/
> +int i40e_create_queue_channel(struct i40e_vsi *vsi,
> + struct i40e_channel *ch)
> +{
> +   struct i40e_pf *pf = vsi->back;
> +   bool reconfig_rss;
> +   int err;
> +
> +   if (!ch)
> +   return -EINVAL;
> +
> +   if (!ch->num_queue_pairs) {
> +   dev_err(>pdev->dev, "Invalid num_queues requested: %d\n",
> +   ch->num_queue_pairs);
> +   return -EINVAL;
> +   }
> +
> +   /* validate user requested num_queues for channel */
> +   err = i40e_validate_num_queues(pf, ch->num_queue_pairs, vsi,
> +  _rss);
> +   if (err) {
> +   dev_info(>pdev->dev, "Failed to validate num_queues 
> (%d)\n",
> +ch->num_queue_pairs);
> +   return -EINVAL;
> +   }
> +
> +   /* By default we are in VEPA mode, if this is the first VF/VMDq
> +* VSI to be added switch to VEB mode.
> +*/
> +   if ((!(pf->flags & I40E_FLAG_VEB_MODE_ENABLED)) ||
> +   (!i40e_is_any_channel(vsi))) {
> +   if (!is_power_of_2(vsi->tc_config.tc_info[0].qcount)) {
> +   dev_info(>pdev->dev,
> +"Failed to create channel. Override queues 
> (%u) not power of 2\n",
> +vsi->tc_config.tc_info[0].qcount);
> +   return -EINVAL;
> +   }
> +
> +   if (vsi->type == I40E_VSI_SRIOV) {
> +   if (!(pf->flags & I40E_FLAG_VEB_MODE_ENABLED)) {
> +   dev_info(>pdev->dev,
> +"Expected to be VEB mode by this 
> time\n");
> +   return -EINVAL;
> +   }
> +   }
> +   if (!(pf->flags & I40E_FLAG_VEB_MODE_ENABLED)) {
> +   pf->flags |= I40E_FLAG_VEB_MODE_ENABLED;
> +
> +   if (vsi->type == I40E_VSI_MAIN) {
> +   if (pf->flags & I40E_FLAG_TC_MQPRIO)
> +   i40e_do_reset(pf,
> +   BIT_ULL(__I40E_PF_RESET_REQUESTED),
> + true);
> +   else
> +   i40e_do_reset_safe(pf,
> +   BIT_ULL(__I40E_PF_RESET_REQUESTED));

So these BIT_ULL lines are triggering a check in checkpatch, and I
have to say I don't really like this as it really is messed up in
terms of formatting.

If nothing else you might want to look at defining a macro that
replaces the line. That way you could still represent the same data
without having to resort to misaligning things to make it under 80
characters.

Re: [PATCH v2 2/4] arp: decompose is_garp logic into a separate function

2017-05-24 Thread Ihar Hrachyshka


On 05/18/2017 01:49 PM, Julian Anastasov wrote:


All 4 patches look ok to me with only a small problem
which comes from patch already included in kernel. I see
that GARP replies can not work for 1394, is_garp will be
cleared. May be 'tha' check should be moved in if expression,
for example:

if (is_garp && ar_op == htons(ARPOP_REPLY) && tha)
is_garp = !memcmp(tha, sha, dev->addr_len);


I can easily miss something substantial, so please correct me, but...

If it's of REPLY type, the RFC 2002 requires that target hardware 
address field equals to source address field for a packet to be 
considered gratuitous. Since IEEE 1394 ARP standard defines its payload 
without target field, it seems to me that there is no such thing as a 
gratuitous ARP reply for IEEE 1394. That's why I think resetting is_garp 
to 0 for those packets is justified.


Ihar

Re: [PATCH 2/2] at803x: double check SGMII side autoneg

2017-05-24 Thread Florian Fainelli

On 05/24/2017 02:32 PM, Timur Tabi wrote:
> On 05/24/2017 04:28 PM, Florian Fainelli wrote:
> 
>> Yes phydev->lock which is used to serialize the state machine state
>> changes.
>>
>> Most PHYs have many more registers than the 15 standard exposed
>> directly, and so you need indirect reads/writes to access these
>> registers, which typically involve switching a particular page, doing
>> the indirect register access, and then flipping the page back. If you
>> interrupt that scheme one way or another, your reads and writes are all
>> messed up.
> 
> Ah, and the at803x is a device like that.
> 
> At worst, the autopoll feature could read a register from the wrong
> page, and think that the link state has changed when it hasn't.  But
> that's still bad, and all my problems do revolve around link states.

Absolutely, just like it's not clear whether autopoll does read
BMSR.LINKSTAT or another at803x specific register? Also how is
BMSR.LINKSTAT defined on at803x when there is SGMII + Copper involved?

> 
>>> I forgot one detail.  Every time you do an MDIO read/write, it
>>> temporarily disables the feature.  Although, I think that's not relevant
>>> to your point.
>>
>> Is that done by the HW itself, or is this under SW control exclusively.
> 
> Software.

OK, and there is no way you can run into the following race condition:

CPU HW
MDIO read intent
polling starts
disable HW autopoll
polling continues
MDIO read is done
MDIO read done
polling stops
MDIO read value returned


if you disable autopolling in HW this is guaranteed to immediately stop
by the time the register value is seen in HW and your I/O read/write
returns?
-- 
Florian

Re: [PATCH v2 0/4] arp: always override existing neigh entries with gratuitous ARP

2017-05-24 Thread Ihar Hrachyshka


On 05/23/2017 01:56 PM, Arnd Bergmann wrote:

This seems to have caused a build warning:

net/ipv4/arp.c:880:35: warning: 'addr_type' may be used uninitialized
in this function [-Wmaybe-uninitialized]

Not sure. How do you reproduce it? I just did 'make net' in the latest 
tree that includes the patch, and it doesn't trigger the failure. Also 
the code logic prevents it to be uninitialized (it's always set to -1 or 
the actual type value).


Please advise,
Ihar

Re: [PATCH 2/2] at803x: double check SGMII side autoneg

2017-05-24 Thread Timur Tabi


On 05/24/2017 04:28 PM, Florian Fainelli wrote:


Yes phydev->lock which is used to serialize the state machine state changes.

Most PHYs have many more registers than the 15 standard exposed
directly, and so you need indirect reads/writes to access these
registers, which typically involve switching a particular page, doing
the indirect register access, and then flipping the page back. If you
interrupt that scheme one way or another, your reads and writes are all
messed up.


Ah, and the at803x is a device like that.

At worst, the autopoll feature could read a register from the wrong 
page, and think that the link state has changed when it hasn't.  But 
that's still bad, and all my problems do revolve around link states.



I forgot one detail.  Every time you do an MDIO read/write, it
temporarily disables the feature.  Although, I think that's not relevant
to your point.


Is that done by the HW itself, or is this under SW control exclusively.


Software.

Re: [PATCH 2/2] at803x: double check SGMII side autoneg

2017-05-24 Thread Florian Fainelli

On 05/24/2017 02:20 PM, Timur Tabi wrote:
> On 05/24/2017 04:15 PM, Andrew Lunn wrote:
>>> My NIC has a feature called autopolling where it takes over the MDIO
>>> bus and regularly polls the link state.  When it detects that the
>>> link state has changed, it generates a MAC interrupt.  This is when
>>> I call phy_mac_interrupt() normally.
> 
>> Unfortunately, you need to keep this feature turned off. It will not
>> respect the phydev mutex. It has no idea what page has been currently
>> selected. It probably has no way to flip the page and see if the SGMII
>> link is up. etc.
> 
> phydev mutex?  And what do you mean by page?

Yes phydev->lock which is used to serialize the state machine state changes.

Most PHYs have many more registers than the 15 standard exposed
directly, and so you need indirect reads/writes to access these
registers, which typically involve switching a particular page, doing
the indirect register access, and then flipping the page back. If you
interrupt that scheme one way or another, your reads and writes are all
messed up.

> 
> I forgot one detail.  Every time you do an MDIO read/write, it
> temporarily disables the feature.  Although, I think that's not relevant
> to your point.

Is that done by the HW itself, or is this under SW control exclusively.

> 
> Disabling this feature and switching from PHY_IGNORE_INTERRUPT to
> PHY_POLL might fix everything.  I will try it.
> 

Humm yes, that seems like a worthwhile exercise at least.
-- 
Florian

Re: [PATCH 2/2] at803x: double check SGMII side autoneg

2017-05-24 Thread Timur Tabi


On 05/24/2017 04:15 PM, Andrew Lunn wrote:

My NIC has a feature called autopolling where it takes over the MDIO
bus and regularly polls the link state.  When it detects that the
link state has changed, it generates a MAC interrupt.  This is when
I call phy_mac_interrupt() normally.



Unfortunately, you need to keep this feature turned off. It will not
respect the phydev mutex. It has no idea what page has been currently
selected. It probably has no way to flip the page and see if the SGMII
link is up. etc.


phydev mutex?  And what do you mean by page?

I forgot one detail.  Every time you do an MDIO read/write, it 
temporarily disables the feature.  Although, I think that's not relevant 
to your point.


Disabling this feature and switching from PHY_IGNORE_INTERRUPT to 
PHY_POLL might fix everything.  I will try it.

Re: [PATCH 2/2] at803x: double check SGMII side autoneg

2017-05-24 Thread Florian Fainelli

On 05/24/2017 01:57 PM, Timur Tabi wrote:
> On 05/24/2017 02:34 PM, Andrew Lunn wrote:
>>> Ok, I'm going to debug this some more.  It turns out that the MAC
>>> side of
>>> the SGMII link can send an interrupt when it thinks that
>>> auto-negotiation is
>>> done.  I might be able to use this.
>>
>> You can use this for your board. But it still leaves the phy driver
>> broken for everybody else.
> 
> Wait, I thought you said the at803x driver was not broken, since it
> returns 0 when the SGMII side of the link hasn't finished auto-negotiating?
> 
>>> What function should my MAC driver call when it wants the phy core to
>>> call
>>> at803x_aneg_done again to see if autonegotiation is done?
>>
>> You want to trigger the PHY state machine. There is only one exported
>> API call to do this, phy_mac_interrupt(). But you are supposed to pass
>> the new link state. And your MAC driver has no idea of that, it does
>> not know if the copper side of the PHY is up.
> 
> My NIC has a feature called autopolling where it takes over the MDIO bus
> and regularly polls the link state.  When it detects that the link state
> has changed, it generates a MAC interrupt.  This is when I call
> phy_mac_interrupt() normally.
> 
> I think I can use the SGMII interrupt to also call phy_mac_interrupt().
> The problem is the from the copper side, the link is already up, so if I
> call phy_mac_interrupt() again and say the link is up, the phy core is
> going to ignore that.

The question is what the HW is auto-polling on? Most HW implementations
that I have seen either auto-poll and assume that BMSR.LINKSTATUS will
reflect what they want to, or they even let you specify which register
and bit(s) to monitor for link status. So which one is it here?

As Andrew already responded, this clashes with any reads/writes that the
PHY library could do, unless your HW is smart enough to serialize MDIO
reads/writes?

> 
>> So it might be better if you export phy_trigger_machine().
> 
> I'll test that, but that does seem a bit hackish.
> 
>>> Also, is there a way for the MAC driver to know that at803x_aneg_done()
>>> previously returned 0, and that it needs to tell the phy core to
>>> check again?
>>
>> Not that i know of. The MAC layer is not supposed to be messing around
>> in the PHY layer. However, just triggering the PHY state machine
>> should be enough.
> 
> Can you tell my how PHY_HAS_INTERRUPT is supposed to work?  How does the
> PHY send an interrupt?

PHY_HAS_INTERRUPT indicates that you have a dedicated interrupt line for
your PHY device and that you have the ability to implement a custom
interrupt handling callback in the PHY driver (ack_interrupt) that lets
you deal with all possible sorts of the interrupts that the PHY could
generate.

> 
> I'm starting to think that my NIC's autopolling feature is not
> compatible with phylib, and that I should use polling mode.

That's pretty much what Andrew told you about 5 emails ago...
-- 
Florian

Re: [PATCH 2/2] at803x: double check SGMII side autoneg

2017-05-24 Thread Andrew Lunn

On Wed, May 24, 2017 at 03:57:06PM -0500, Timur Tabi wrote:
> On 05/24/2017 02:34 PM, Andrew Lunn wrote:
> >>Ok, I'm going to debug this some more.  It turns out that the MAC side of
> >>the SGMII link can send an interrupt when it thinks that auto-negotiation is
> >>done.  I might be able to use this.
> >
> >You can use this for your board. But it still leaves the phy driver
> >broken for everybody else.
> 
> Wait, I thought you said the at803x driver was not broken, since it
> returns 0 when the SGMII side of the link hasn't finished
> auto-negotiating?

It is correct so far. But to work, it needs to interrupt again once
the SGMII side has come up. Only then have we link.

> My NIC has a feature called autopolling where it takes over the MDIO
> bus and regularly polls the link state.  When it detects that the
> link state has changed, it generates a MAC interrupt.  This is when
> I call phy_mac_interrupt() normally.

Unfortunately, you need to keep this feature turned off. It will not
respect the phydev mutex. It has no idea what page has been currently
selected. It probably has no way to flip the page and see if the SGMII
link is up. etc.

> Can you tell my how PHY_HAS_INTERRUPT is supposed to work?  How does
> the PHY send an interrupt?

Generally, the PHY interrupt pin is connected to a GPIO. You then use
the GPIO as an interrupt source. So it has an interrupt number. Put
that in phydev->irq, eg using the interrupts property in device tree.
The core will register an interrupt handler, and enable the
interrupt. When it receives an interrupt, it calls the phy driver to
service the interrupt.

Andrew

Re: [PATCH 2/2] at803x: double check SGMII side autoneg

2017-05-24 Thread Timur Tabi


On 05/24/2017 02:34 PM, Andrew Lunn wrote:

Ok, I'm going to debug this some more.  It turns out that the MAC side of
the SGMII link can send an interrupt when it thinks that auto-negotiation is
done.  I might be able to use this.


You can use this for your board. But it still leaves the phy driver
broken for everybody else.


Wait, I thought you said the at803x driver was not broken, since it 
returns 0 when the SGMII side of the link hasn't finished auto-negotiating?



What function should my MAC driver call when it wants the phy core to call
at803x_aneg_done again to see if autonegotiation is done?


You want to trigger the PHY state machine. There is only one exported
API call to do this, phy_mac_interrupt(). But you are supposed to pass
the new link state. And your MAC driver has no idea of that, it does
not know if the copper side of the PHY is up.


My NIC has a feature called autopolling where it takes over the MDIO bus 
and regularly polls the link state.  When it detects that the link state 
has changed, it generates a MAC interrupt.  This is when I call 
phy_mac_interrupt() normally.


I think I can use the SGMII interrupt to also call phy_mac_interrupt(). 
The problem is the from the copper side, the link is already up, so if I 
call phy_mac_interrupt() again and say the link is up, the phy core is 
going to ignore that.



So it might be better if you export phy_trigger_machine().


I'll test that, but that does seem a bit hackish.


Also, is there a way for the MAC driver to know that at803x_aneg_done()
previously returned 0, and that it needs to tell the phy core to check again?


Not that i know of. The MAC layer is not supposed to be messing around
in the PHY layer. However, just triggering the PHY state machine
should be enough.


Can you tell my how PHY_HAS_INTERRUPT is supposed to work?  How does the 
PHY send an interrupt?


I'm starting to think that my NIC's autopolling feature is not 
compatible with phylib, and that I should use polling mode.

Re: [PATCH net-next v9 5/5] virtio_net: check return value of skb_to_sgvec always

2017-05-24 Thread Sergei Shtylyov


On 05/24/2017 11:39 PM, Jason A. Donenfeld wrote:


   I've only looked on the last 2 patches. You can add my:

Reviewed-by: Sergei Shtylyov 

if you want. :-)


Will do. For the series, or just for 5/5?


   5/5 only. :-)

MBR, Sergei

Re: [PATCH v6 net-next 01/17] net: qualcomm: remove unnecessary includes

2017-05-24 Thread David Miller

From: Stefan Wahren 
Date: Wed, 24 May 2017 22:05:26 +0200 (CEST)

> AFAIK these ones above aren't necessary (no init, no kernel module,
> no kernel parameter, no kernel version) for this C file. So i will
> double check it.

You need the endianness translators like cpu_to_be32() or whatever,
so you need to figure out where you are getting that once these
explicit headers are removed.

And see, it's probably hidden inside of the private header's includes.

So we can't tell.

Re: [PATCH net-next v9 5/5] virtio_net: check return value of skb_to_sgvec always

2017-05-24 Thread Jason A. Donenfeld

On Wed, May 24, 2017 at 6:41 PM, Sergei Shtylyov
>I've only looked on the last 2 patches. You can add my:
>
> Reviewed-by: Sergei Shtylyov 
>
> if you want. :-)

Will do. For the series, or just for 5/5?

Re: [patch net-next v2 0/8] mlxsw: Support firmware flash

2017-05-24 Thread David Miller

From: Jiri Pirko 
Date: Tue, 23 May 2017 21:56:22 +0200

> From: Jiri Pirko 
> 
> Add support for device firmware flash on mlxsw spectrum. The firmware files
> are expected to be in the Mellanox Firmware Archive version 2 (MFA2)
> format.
> 
> The firmware flash is triggered on driver initialization time if the device
> firmware version does not meet the minimum firmware version supported by
> the driver.
> 
> Currently, to activate the newly flashed firmware, the user needs to
> reboot his system.
> 
> The first patch introduces the mlxfw module, which implements common logic
> needed for the firmware flash process on Mellanox products, such as the
> MFA2 format parsing and the firmware flash state machine logic. As the
> module implements common logic which will be needed by various different
> Mellanox drivers, it defines a set of callbacks needed to interact with the
> specific device.
> 
> Patches 1-5 implement the needed mlxfw callbacks in the mlxsw spectrum
> driver.
> 
> Patches 6 and 7 add boot-time firmware upgrade on the mlxsw spectrum
> driver.
> 
> Patch 8 adds a fix needed for new firmware versions.

Series applied, although I hope you sort out the user interface
soon otherwise most of this is completely dead unused code.

Thanks.

Re: [PATCH net-next] tcp: fix TCP_SYNCNT flakes

2017-05-24 Thread David Miller

From: Eric Dumazet 
Date: Tue, 23 May 2017 12:38:35 -0700

> From: Eric Dumazet 
> 
> After the mentioned commit, some of our packetdrill tests became flaky.
> 
> TCP_SYNCNT socket option can limit the number of SYN retransmits.
> 
> retransmits_timed_out() has to compare times computations based on
> local_clock() while timers are based on jiffies. With NTP adjustments
> and roundings we can observe 999 ms delay for 1000 ms timers.
> We end up sending one extra SYN packet.
> 
> Gimmick added in commit 6fa12c850314 ("Revert Backoff [v3]: Calculate
> TCP's connection close threshold as a time value") makes no
> real sense for TCP_SYN_SENT sockets where no RTO backoff can happen at
> all.
> 
> Lets use a simpler logic for TCP_SYN_SENT sockets and remove @syn_set
> parameter from retransmits_timed_out()
> 
> Fixes: 9a568de4818d ("tcp: switch TCP TS option (RFC 7323) to 1ms clock")
> Signed-off-by: Eric Dumazet 
> Signed-off-by: Yuchung Cheng 

Applied, thanks Eric.

Re: [PATCH net-next] net: dsa: support cross-chip ageing time

2017-05-24 Thread David Miller

From: Vivien Didelot 
Date: Tue, 23 May 2017 15:20:59 -0400

> Now that the switchdev bridge ageing time attribute is propagated to all
> switch chips of the fabric, each switch can check if the requested value
> is valid and program itself, so that the whole fabric shares a common
> ageing time setting.
> 
> This is especially needed for switch chips in between others, containing
> no bridge port members but evidently used in the data path.
> 
> To achieve that, remove the condition which skips the other switches. We
> also don't need to identify the target switch anymore, thus remove the
> sw_index member of the dsa_notifier_ageing_time_info notifier structure.
> 
> On ZII Dev Rev B (with two 88E6352 and one 88E6185) and ZII Dev Rev C
> (with two 88E6390X), we have the following hardware configuration:
 ...
> Before this patch:
 ...
> After this patch:
 ...
> Signed-off-by: Vivien Didelot 

Applied, thanks.

Re: [PATCH V3 net 0/3] Fix checksum issues with Q-in-Q vlans

2017-05-24 Thread David Miller

From: Vladislav Yasevich 
Date: Tue, 23 May 2017 13:38:40 -0400

> TCP checksum appear broken on a lot of devices that
> advertise NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM.  This problem
> becomes very visible/reproducable since the series
> commit afb0bc972b526 ("Merge branch 'stacked_vlan_tso'").
> 
> In particular, the issue appeared consistently on bnx2 and be2net
> drivers (not all drivers were tested).
> 
> This short series corrects this by disabling checksum offload
> support on packets sent through Q-in-Q vlans if the underlying HW only
> enables IP specific checksum features.  We currently 'assume' that
> any drivers setting NETIF_F_HW_CSUM can correclty pass checksum offsets
> to HW.  It is up to individual drivers to enable it properly through
> ndo_features_check if they have some support for Q-in-Q vlans.
> 
> Additionally, be2net driver was fixed to make the proper call.
> 
> While looking at the drivers, it was also found that virtio-net ended
> up disabling accelerations, which is unnecessary.  
> 
> V3: Fixed checkpatch errors.
> 
> V2: Instead of disabling checksuming for all devices, only devices using
> NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM are now affected by this change.
> For drivers using NETIF_F_HW_CSUM, we will continue to use checksum
> offloading.  If any drivers are found to be broken, they would need
> be fixed individually.

Series applied and queued up for -stable, thanks.

Re: [patch net-next v2 0/5] add tcp flags match support to flower and offload it

2017-05-24 Thread David Miller

From: Jiri Pirko 
Date: Tue, 23 May 2017 18:40:43 +0200

> From: Jiri Pirko 
> 
> This patch adds support to dissect tcp flags, match on them using
> flower classifier and offload such rules to mlxsw Spectrum devices.
> 
> ---
> v1->v2:
> - removed no longer relevant comment from patch 1 as suggested by Or
> - sent correct patches this time

Series applied, thanks.

Re: [Patch net-next] net_sched: only create filter chains for new filters/actions

2017-05-24 Thread Jiri Pirko

Wed, May 24, 2017 at 05:53:42PM CEST, xiyou.wangc...@gmail.com wrote:
>On Tue, May 23, 2017 at 11:37 PM, Jiri Pirko  wrote:
>> Tue, May 23, 2017 at 06:42:37PM CEST, xiyou.wangc...@gmail.com wrote:
>>>tcf_chain_get() always creates a new filter chain if not found
>>>in existing ones. This is totally unnecessary when we get or
>>>delete filters, new chain should be only created for new filters
>>>(or new actions).
>>>
>>>Fixes: 5bc1701881e3 ("net: sched: introduce multichain support for filters")
>>>Cc: Jamal Hadi Salim 
>>>Cc: Jiri Pirko 
>>>Signed-off-by: Cong Wang 
>>>---
>>> include/net/pkt_cls.h |  3 ++-
>>> net/sched/act_api.c   |  2 +-
>>> net/sched/cls_api.c   | 13 +
>>> 3 files changed, 12 insertions(+), 6 deletions(-)
>>>
>>>diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
>>>index 2c213a6..f776229 100644
>>>--- a/include/net/pkt_cls.h
>>>+++ b/include/net/pkt_cls.h
>>>@@ -18,7 +18,8 @@ int register_tcf_proto_ops(struct tcf_proto_ops *ops);
>>> int unregister_tcf_proto_ops(struct tcf_proto_ops *ops);
>>>
>>> #ifdef CONFIG_NET_CLS
>>>-struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index);
>>>+struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
>>>+  bool create);
>>> void tcf_chain_put(struct tcf_chain *chain);
>>> int tcf_block_get(struct tcf_block **p_block,
>>> struct tcf_proto __rcu **p_filter_chain);
>>>diff --git a/net/sched/act_api.c b/net/sched/act_api.c
>>>index 0ecf2a8..aed6cf2 100644
>>>--- a/net/sched/act_api.c
>>>+++ b/net/sched/act_api.c
>>>@@ -34,7 +34,7 @@ static int tcf_action_goto_chain_init(struct tc_action *a, 
>>>struct tcf_proto *tp)
>>>
>>>   if (!tp)
>>>   return -EINVAL;
>>>-  a->goto_chain = tcf_chain_get(tp->chain->block, chain_index);
>>>+  a->goto_chain = tcf_chain_get(tp->chain->block, chain_index, true);
>>>   if (!a->goto_chain)
>>>   return -ENOMEM;
>>>   return 0;
>>>diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>>>index 01a8b8b..23d2236 100644
>>>--- a/net/sched/cls_api.c
>>>+++ b/net/sched/cls_api.c
>>>@@ -220,7 +220,8 @@ static void tcf_chain_destroy(struct tcf_chain *chain)
>>>   kfree(chain);
>>> }
>>>
>>>-struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index)
>>>+struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
>>>+  bool create)
>>> {
>>>   struct tcf_chain *chain;
>>>
>>>@@ -230,7 +231,10 @@ struct tcf_chain *tcf_chain_get(struct tcf_block 
>>>*block, u32 chain_index)
>>>   return chain;
>>>   }
>>>   }
>>>-  return tcf_chain_create(block, chain_index);
>>>+  if (create)
>>>+  return tcf_chain_create(block, chain_index);
>>>+  else
>>>+  return NULL;
>>> }
>>> EXPORT_SYMBOL(tcf_chain_get);
>>>
>>>@@ -509,9 +513,10 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct 
>>>nlmsghdr *n,
>>>   err = -EINVAL;
>>>   goto errout;
>>>   }
>>>-  chain = tcf_chain_get(block, chain_index);
>>>+  chain = tcf_chain_get(block, chain_index,
>>>+n->nlmsg_type == RTM_NEWTFILTER);
>>
>> First of all, I really hate all these true/false arg dances. Totaly
>> confusing all the time.
>
>Sounds like you are able to understand the code at all.
>Sigh, I bet you never even read the changelog. ;)
>
>
>>
>>
>>
>>>   if (!chain) {
>>>-  err = -ENOMEM;
>>>+  err = n->nlmsg_type == RTM_NEWTFILTER ? -ENOMEM : -EINVAL;
>>
>> Confusing. Please do not obfuscate the code for a corner cases. Thanks.
>
>Either you don't understand the changelog or you don't understand
>ternary conditional operator. I can't help you if the latter.
>
>Sorry.

Heh. Really? All I say is, your patch is not needed at all. All it adds
makes to code harder to understand and no benefit.

Re: [PATCH net 1/3] bpf: fix incorrect pruning decision when alignment must be tracked

2017-05-24 Thread Daniel Borkmann

On 05/24/2017 10:07 PM, David Miller wrote:

From: Daniel Borkmann 
Date: Tue, 23 May 2017 18:30:41 +0200

+   if (!env->strict_alignment && old->off <= cur->off &&

You can't just test env->strict_alignment by itself, that's just an
override and doesn't determine the actual "strict" value we use which
is a combination of env->strict_alignment and
"!CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS".

So you'll have to update this test.

Argh, good point, true. Will respin with a v2.

Re: [PATCH net-next 3/3] net: sysfs: Document PHY device sysfs attributes

2017-05-24 Thread Andrew Lunn

> +What:/sys/class/mdio_bus///phy_interface
> +Date:February 2014
> +KernelVersion:   3.15
> +Contact: netdev@vger.kernel.org
> +Description:
> + String value indicating the PHY interface, possible
> + values are in include/linux/phy.h function phy_modes.

Hi Florian

Does this suggest that these strings should be moved to
include/uapi/linux/phy.h?

Andrew

Re: [PATCH net 1/3] bpf: fix incorrect pruning decision when alignment must be tracked

2017-05-24 Thread David Miller

From: Daniel Borkmann 
Date: Tue, 23 May 2017 18:30:41 +0200

> + if (!env->strict_alignment && old->off <= cur->off &&

You can't just test env->strict_alignment by itself, that's just an
override and doesn't determine the actual "strict" value we use which
is a combination of env->strict_alignment and
"!CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS".

So you'll have to update this test.

Re: [PATCH v6 net-next 01/17] net: qualcomm: remove unnecessary includes

2017-05-24 Thread Stefan Wahren


> David Miller  hat am 24. Mai 2017 um 21:41 geschrieben:
> 
> 
> From: Stefan Wahren 
> Date: Tue, 23 May 2017 15:12:37 +0200
> 
> > Most of the includes in qca_7k.c are unnecessary so we better remove them.
> > 
> > Signed-off-by: Stefan Wahren 
> > ---
> >  drivers/net/ethernet/qualcomm/qca_7k.c | 4 
> >  1 file changed, 4 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/qualcomm/qca_7k.c 
> > b/drivers/net/ethernet/qualcomm/qca_7k.c
> > index f0066fb..557d53c 100644
> > --- a/drivers/net/ethernet/qualcomm/qca_7k.c
> > +++ b/drivers/net/ethernet/qualcomm/qca_7k.c
> > @@ -23,11 +23,7 @@
> >   *   kernel-based SPI device.
> >   */
> >  
> > -#include 
> > -#include 
> > -#include 
> >  #include 
> > -#include 
> >  
> >  #include "qca_7k.h"
> >  
> > -- 
> > 2.1.4
> > 
> 
> Changes like this drive me crazy.
> 
> The only reason you can remove those headers is because you are obtaining
> things indirectly via qca_7k.h
> 
> And if that is indeed the case, you are also getting qca_spi.h which
> in turn includes linux/spi/spi.h
> 
> So you could have removed that as well.
> 
> But seriously, it is so much harder to understand a driver and what
> interfaces it needs via header files when you hide _all_ of it behind
> these local private header files which just include _everything_
> and then _every_ foo.c file in your driver gets _all_ of those kernel
> headers whether they need it or not.
> 
> So if just one foo.c file needs 20 extra kernel headers than the rest
> of the files in the driver, every foo.c file eats that cost of
> including them.
> 
> I really don't like when drivers move in this direction for that
> reason.  And at best, as described at the beginning of my response,
> this change is incomplete.
>

The intension of this change wasn't to hide the includes into qca_7k.h

AFAIK these ones above aren't necessary (no init, no kernel module, no kernel 
parameter, no kernel version) for this C file. So i will double check it.

Re: [PATCH net-next] geneve: fix fill_info when using collect_metadata

2017-05-24 Thread Eric Garver

On Wed, May 24, 2017 at 12:20:36PM -0700, Pravin Shelar wrote:
> On Tue, May 23, 2017 at 3:37 PM, Eric Garver  wrote:
> > Since 9b4437a5b870 ("geneve: Unify LWT and netdev handling.") fill_info
> > does not return UDP_ZERO_CSUM6_RX when using COLLECT_METADATA. This is
> > because it uses ip_tunnel_info_af() with the device level info, which is
> > not valid for COLLECT_METADATA.
> >
> > Fix by checking for the presence of the actual sockets.
> >
> > Fixes: 9b4437a5b870 ("geneve: Unify LWT and netdev handling.")
> > Signed-off-by: Eric Garver 
> Thanks for the patch.
> 
> Acked-by: Pravin B Shelar 
> 
> 
> I noticed that the MTU and encal_len calculation in geneve_configure()
> also needs to be fixed for collect metadata case. Can you send patch
> to fix it?

Ah, yes. I'll take a look at those as well.

Eric.

Re: [PATCH v3 net] net: phy: marvell: Limit errata to 88m1101

2017-05-24 Thread David Miller

From: Andrew Lunn 
Date: Tue, 23 May 2017 17:49:13 +0200

> The 88m1101 has an errata when configuring autoneg. However, it was
> being applied to many other Marvell PHYs as well. Limit its scope to
> just the 88m1101.
> 
> Fixes: 76884679c644 ("phylib: Add support for Marvell 88eS and 88e1145")
> Reported-by: Daniel Walker 
> Signed-off-by: Andrew Lunn 
> Acked-by: Harini Katakam 

Applied and queued up for -stable, thanks.

Re: [PATCH] net/phy: fix mdio-octeon dependency and build

2017-05-24 Thread David Miller

From: Randy Dunlap 
Date: Tue, 23 May 2017 08:19:49 -0700

> From: Randy Dunlap 
> 
> Fix build errors by making this driver depend on OF_MDIO, like
> several other similar drivers do.
> 
> drivers/built-in.o: In function `octeon_mdiobus_remove':
> mdio-octeon.c:(.text+0x196ee0): undefined reference to `mdiobus_unregister'
> mdio-octeon.c:(.text+0x196ee8): undefined reference to `mdiobus_free'
> drivers/built-in.o: In function `octeon_mdiobus_probe':
> mdio-octeon.c:(.text+0x196f1d): undefined reference to 
> `devm_mdiobus_alloc_size'
> mdio-octeon.c:(.text+0x196ffe): undefined reference to `of_mdiobus_register'
> mdio-octeon.c:(.text+0x197010): undefined reference to `mdiobus_free'
> 
> Signed-off-by: Randy Dunlap 

Applied, thanks Randy.

Re: [pull request][net 0/8] Mellanox, mlx5 fixes 2017-05-23

2017-05-24 Thread David Miller

From: Saeed Mahameed 
Date: Tue, 23 May 2017 17:16:02 +0300

> This series contains some fixes for the mlx5 driver and one small patch that
> adds csum actions accessors in include/net/tc_act/tc_csum.h needed by some of
> mlx5 fixes patches.
> 
> Details are below.
> 
> Please pull and let me know if there's any problem.
> 
> Note: This series doesn't introduce any merge conflict with the ongoing mlx5
> for-next submission.
> 
> For -stable kernels >= 4.7:
> ("net/mlx5: Avoid using pending command interface slots")
> ("net/mlx5: Tolerate irq_set_affinity_hint() failures")

Pulled and queued up for -stable, thanks.

Re: [PATCH v6 net-next 01/17] net: qualcomm: remove unnecessary includes

2017-05-24 Thread David Miller

From: Stefan Wahren 
Date: Tue, 23 May 2017 15:12:37 +0200

> Most of the includes in qca_7k.c are unnecessary so we better remove them.
> 
> Signed-off-by: Stefan Wahren 
> ---
>  drivers/net/ethernet/qualcomm/qca_7k.c | 4 
>  1 file changed, 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/qualcomm/qca_7k.c 
> b/drivers/net/ethernet/qualcomm/qca_7k.c
> index f0066fb..557d53c 100644
> --- a/drivers/net/ethernet/qualcomm/qca_7k.c
> +++ b/drivers/net/ethernet/qualcomm/qca_7k.c
> @@ -23,11 +23,7 @@
>   *   kernel-based SPI device.
>   */
>  
> -#include 
> -#include 
> -#include 
>  #include 
> -#include 
>  
>  #include "qca_7k.h"
>  
> -- 
> 2.1.4
> 

Changes like this drive me crazy.

The only reason you can remove those headers is because you are obtaining
things indirectly via qca_7k.h

And if that is indeed the case, you are also getting qca_spi.h which
in turn includes linux/spi/spi.h

So you could have removed that as well.

But seriously, it is so much harder to understand a driver and what
interfaces it needs via header files when you hide _all_ of it behind
these local private header files which just include _everything_
and then _every_ foo.c file in your driver gets _all_ of those kernel
headers whether they need it or not.

So if just one foo.c file needs 20 extra kernel headers than the rest
of the files in the driver, every foo.c file eats that cost of
including them.

I really don't like when drivers move in this direction for that
reason.  And at best, as described at the beginning of my response,
this change is incomplete.

Re: [PATCH net] net/mlx4: Fix the check in attaching steering rules

2017-05-24 Thread David Miller

From: Tariq Toukan 
Date: Tue, 23 May 2017 15:50:07 +0300

> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c 
> b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
> index ae5fdc2df654..00a7cd3dcc2e 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
> @@ -1562,8 +1562,7 @@ static int mlx4_en_flow_replace(struct net_device *dev,
>   qpn = priv->drop_qp.qpn;
>   else if (cmd->fs.ring_cookie & EN_ETHTOOL_QP_ATTACH) {
>   qpn = cmd->fs.ring_cookie & (EN_ETHTOOL_QP_ATTACH - 1);
> - if (qpn < priv->rss_map.base_qpn ||
> - qpn >= priv->rss_map.base_qpn + priv->rx_ring_num) {
> + if (!mlx4_qp_lookup(priv->mdev->dev, qpn)) {
>   en_warn(priv, "rxnfc: QP (0x%x) doesn't exist\n", qpn);
>   return -EINVAL;
>   }
> diff --git a/drivers/net/ethernet/mellanox/mlx4/qp.c 
> b/drivers/net/ethernet/mellanox/mlx4/qp.c
> index 2d6abd4662b1..1eff2fe32a8b 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/qp.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/qp.c
> @@ -384,6 +384,20 @@ static void mlx4_qp_free_icm(struct mlx4_dev *dev, int 
> qpn)
>   __mlx4_qp_free_icm(dev, qpn);
>  }
>  
> +struct mlx4_qp *mlx4_qp_lookup(struct mlx4_dev *dev, u32 qpn)
 ...
> +EXPORT_SYMBOL_GPL(mlx4_qp_lookup);
> +

This phony separation between MLX4_CORE and MLX4_EN is the only reason
you need this unreasonable symbol export.

I doubt you'll ever use this function anywhere outside of en_ethtool.c
so this export is wasted space in the kernel image.  Probably compiler
could inline it decently as well.

So find another way to do this without the symbol export.  I don't
really want to hear any stories about "clean separation" or whatever.
What's happening here is exactly why this separate modules scheme
results in ugly unreasonable code, and unnecessary gymnastics and
wasted object space just to make routines available in one place from
another.

Re: [PATCH 2/2] at803x: double check SGMII side autoneg

2017-05-24 Thread Andrew Lunn

On Wed, May 24, 2017 at 01:58:09PM -0500, Timur Tabi wrote:
> On 05/24/2017 09:09 AM, Andrew Lunn wrote:
> > It could be, the copper side is up, but the SGMII side is down, at the
> > point at803x_aneg_done() is called. So it is correctly returning
> > 0. Sometime later the SGMII side goes up, but there is not a second
> > interrupt. Hence the phy core does not know that the full, 2 stage MAC
> > to PHY to peer PHY link is now up.
> 
> Ok, I'm going to debug this some more.  It turns out that the MAC side of
> the SGMII link can send an interrupt when it thinks that auto-negotiation is
> done.  I might be able to use this.

You can use this for your board. But it still leaves the phy driver
broken for everybody else.

> What function should my MAC driver call when it wants the phy core to call
> at803x_aneg_done again to see if autonegotiation is done?

You want to trigger the PHY state machine. There is only one exported
API call to do this, phy_mac_interrupt(). But you are supposed to pass
the new link state. And your MAC driver has no idea of that, it does
not know if the copper side of the PHY is up.

So it might be better if you export phy_trigger_machine().

> Also, is there a way for the MAC driver to know that at803x_aneg_done()
> previously returned 0, and that it needs to tell the phy core to check again?

Not that i know of. The MAC layer is not supposed to be messing around
in the PHY layer. However, just triggering the PHY state machine
should be enough.

   Andrew

[PATCH net-next 1/3] net: phy: Create sysfs reciprocal links for attached_dev/phydev

2017-05-24 Thread Florian Fainelli

There is currently no way for a program scanning /sys to know whether a
network device is attached to a particular PHY device, just like the PHY
device is not pointed back to its attached network device.

Create a symbolic link in the network device's namespace named "phydev"
which points to the PHY device and create a symbolic link in the PHY
device's namespace named "attached_dev" that points back to the network
device. These links are set up during phy_attach_direct() and removed
during phy_detach() for symetry.

Signed-off-by: Florian Fainelli 
---
 drivers/net/phy/phy_device.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 1219eeab69d1..8151477c7027 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -960,6 +960,15 @@ int phy_attach_direct(struct net_device *dev, struct 
phy_device *phydev,
 
phydev->attached_dev = dev;
dev->phydev = phydev;
+   err = sysfs_create_link(>mdio.dev.kobj, >dev.kobj,
+   "attached_dev");
+   if (err)
+   goto error;
+
+   err = sysfs_create_link(>dev.kobj, >mdio.dev.kobj,
+   "phydev");
+   if (err)
+   goto error;
 
phydev->dev_flags = flags;
 
@@ -1050,6 +1059,8 @@ void phy_detach(struct phy_device *phydev)
struct mii_bus *bus;
int i;
 
+   sysfs_remove_link(>dev.kobj, "phydev");
+   sysfs_remove_link(>mdio.dev.kobj, "attached_dev");
phydev->attached_dev->phydev = NULL;
phydev->attached_dev = NULL;
phy_suspend(phydev);
-- 
2.9.3

[PATCH net-next 3/3] net: sysfs: Document PHY device sysfs attributes

2017-05-24 Thread Florian Fainelli

Document the different sysfs attributes that exist for PHY devices:
attached_dev, phy_has_fixups, phy_id and phy_interface.

Signed-off-by: Florian Fainelli 
---
 Documentation/ABI/testing/sysfs-class-net-phydev | 32 
 1 file changed, 32 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-class-net-phydev

diff --git a/Documentation/ABI/testing/sysfs-class-net-phydev 
b/Documentation/ABI/testing/sysfs-class-net-phydev
new file mode 100644
index ..a05831667c3d
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-net-phydev
@@ -0,0 +1,32 @@
+What:  /sys/class/mdio_bus///attached_dev
+Date:  May 2017
+KernelVersion: 4.13
+Contact:   netdev@vger.kernel.org
+Description:
+   Symbolic link to the network device this PHY device is
+   attached to.
+
+What:  /sys/class/mdio_bus///phy_has_fixups
+Date:  February 2014
+KernelVersion: 3.15
+Contact:   netdev@vger.kernel.org
+Description:
+   Boolean value indicating whether the PHY device has
+   any fixups registered against it (phy_register_fixup)
+
+What:  /sys/class/mdio_bus///phy_id
+Date:  November 2012
+KernelVersion: 3.8
+Contact:   netdev@vger.kernel.org
+Description:
+   32-bit hexadecimal value corresponding to the PHY device's OUI,
+   model and revision number.
+
+What:  /sys/class/mdio_bus///phy_interface
+Date:  February 2014
+KernelVersion: 3.15
+Contact:   netdev@vger.kernel.org
+Description:
+   String value indicating the PHY interface, possible
+   values are in include/linux/phy.h function phy_modes.
+
-- 
2.9.3

[PATCH net-next 2/3] net: sysfs: Document "phydev" symbolic link

2017-05-24 Thread Florian Fainelli

Now that we link the network device to its PHY device, document this
sysfs symbolic link.

Signed-off-by: Florian Fainelli 
---
 Documentation/ABI/testing/sysfs-class-net | 8 
 1 file changed, 8 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-class-net 
b/Documentation/ABI/testing/sysfs-class-net
index 668604fc8e06..6856da99b6f7 100644
--- a/Documentation/ABI/testing/sysfs-class-net
+++ b/Documentation/ABI/testing/sysfs-class-net
@@ -251,3 +251,11 @@ Contact:   netdev@vger.kernel.org
 Description:
Indicates the unique physical switch identifier of a switch this
port belongs to, as a string.
+
+What:  /sys/class/net//phydev
+Date:  May 2017
+KernelVersion: 4.13
+Contact:   netdev@vger.kernel.org
+Description:
+   Symbolic link to the PHY device this network device is attached
+   to.
-- 
2.9.3

[PATCH net-next 0/3] net: phy: Create sysfs reciprocal links for attached_dev/phydev

2017-05-24 Thread Florian Fainelli

Hi David, Andrew,

This patch series addresses a device topology shortcoming where a program
scanning /sys would not be able to establish a mapping between the network
device and the PHY device.

In the process it turned out that no PHY device documentation existed for
sysfs attributes.

Thanks!

Florian Fainelli (3):
  net: phy: Create sysfs reciprocal links for attached_dev/phydev
  net: sysfs: Document "phydev" symbolic link
  net: sysfs: Document PHY device sysfs attributes

 Documentation/ABI/testing/sysfs-class-net|  8 ++
 Documentation/ABI/testing/sysfs-class-net-phydev | 32 
 drivers/net/phy/phy_device.c | 11 
 3 files changed, 51 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-class-net-phydev

-- 
2.9.3

RE: [PATCH net-next 1/8] net: ipv4: refactor __ip_route_output_key_hash

2017-05-24 Thread Rosen, Rami

Hi, Rupa /David Ahern,

First, thanks for this patch set!

Second, it seems to me that something might be incorrect here.

You have these additions in this patch  (1/8):
...
+struct rtable *ip_route_output_key_hash_rcu(struct net *net, struct flowi4 
*flp,
+   const struct sk_buff *skb,
+   struct fib_result *res);
...
+struct rtable *ip_route_output_key_hash(struct net *net, struct flowi4 *fl4,
+   const struct sk_buff *skb)
+{
+   struct fib_result res;
+   struct rtable *rth;
+
+   res.tclassid= 0;
+   res.fi  = NULL;
+   res.table   = NULL;
+
+   rcu_read_lock();
+   rth = ip_route_output_key_hash_rcu(net, fl4, , mp_hash);
rcu_read_unlock();
+
return rth;
 }
-EXPORT_SYMBOL_GPL(__ip_route_output_key_hash);
+EXPORT_SYMBOL_GPL(ip_route_output_key_hash);
 

So the third parameter to ip_route_output_key_hash_rcu() should be skb*, and 
the fourth parameter should be fib_result *. However, you do not pass the skb 
parameter 
when calling ip_route_output_key_hash_rcu() in 
ip_route_output_key_hash()  (in fact you don't use it at all),  and you pass 
mp_hash as the fourth parameter.

Regards,
Rami Rosen
Intel Corporation

Re: [PATCH] net: rtnetlink: bail out from rtnl_fdb_dump() on parse error

2017-05-24 Thread David Miller

From: Alexander Potapenko 
Date: Tue, 23 May 2017 13:20:28 +0200

> rtnl_fdb_dump() failed to check the result of nlmsg_parse(), which led
> to contents of |ifm| being uninitialized because nlh->nlmsglen was too
> small to accommodate |ifm|. The uninitialized data may affect some
> branches and result in unwanted effects, although kernel data doesn't
> seem to leak to the userspace directly.
> 
> The bug has been detected with KMSAN and syzkaller.
> 
> Signed-off-by: Alexander Potapenko 

Applied, thanks.

Re: pull-request: mac80211 2017-05-23

2017-05-24 Thread David Miller

From: Johannes Berg 
Date: Tue, 23 May 2017 14:42:55 +0200

> I have just two fixes here, one of the scheduled scan issue that
> Sander Eikelenboom found, and the other properly makes mesh more
> strictly check its extension headers.
> 
> Please pull and let me know if there's any problem.

Pulled, thanks Johannes.

Re: [PATCH] net: rtnetlink: bail out from rtnl_fdb_dump() on parse error

2017-05-24 Thread Greg Rose

On Tue, 2017-05-23 at 13:20 +0200, Alexander Potapenko wrote:
> rtnl_fdb_dump() failed to check the result of nlmsg_parse(), which led
> to contents of |ifm| being uninitialized because nlh->nlmsglen was too
> small to accommodate |ifm|. The uninitialized data may affect some
> branches and result in unwanted effects, although kernel data doesn't
> seem to leak to the userspace directly.
> 
> The bug has been detected with KMSAN and syzkaller.
> 
> Signed-off-by: Alexander Potapenko 
> ---
> For the record, here is the KMSAN report:
> 
> ==
> BUG: KMSAN: use of unitialized memory in rtnl_fdb_dump+0x5dc/0x1000
> CPU: 0 PID: 1039 Comm: probe Not tainted 4.11.0-rc5+ #2727
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:16
>  dump_stack+0x143/0x1b0 lib/dump_stack.c:52
>  kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:1007
>  __kmsan_warning_32+0x66/0xb0 mm/kmsan/kmsan_instr.c:491
>  rtnl_fdb_dump+0x5dc/0x1000 net/core/rtnetlink.c:3230
>  netlink_dump+0x84f/0x1190 net/netlink/af_netlink.c:2168
>  __netlink_dump_start+0xc97/0xe50 net/netlink/af_netlink.c:2258
>  netlink_dump_start ./include/linux/netlink.h:165
>  rtnetlink_rcv_msg+0xae9/0xb40 net/core/rtnetlink.c:4094
>  netlink_rcv_skb+0x339/0x5a0 net/netlink/af_netlink.c:2339
>  rtnetlink_rcv+0x83/0xa0 net/core/rtnetlink.c:4110
>  netlink_unicast_kernel net/netlink/af_netlink.c:1272
>  netlink_unicast+0x13b7/0x1480 net/netlink/af_netlink.c:1298
>  netlink_sendmsg+0x10b8/0x10f0 net/netlink/af_netlink.c:1844
>  sock_sendmsg_nosec net/socket.c:633
>  sock_sendmsg net/socket.c:643
>  ___sys_sendmsg+0xd4b/0x10f0 net/socket.c:1997
>  __sys_sendmsg net/socket.c:2031
>  SYSC_sendmsg+0x2c6/0x3f0 net/socket.c:2042
>  SyS_sendmsg+0x87/0xb0 net/socket.c:2038
>  do_syscall_64+0x102/0x150 arch/x86/entry/common.c:285
>  entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:246
> RIP: 0033:0x401300
> RSP: 002b:7ffc3b0e6d58 EFLAGS: 0246 ORIG_RAX: 002e
> RAX: ffda RBX: 004002b0 RCX: 00401300
> RDX:  RSI: 7ffc3b0e6d80 RDI: 0003
> RBP: 7ffc3b0e6e00 R08: 000b R09: 0004
> R10: 000d R11: 0246 R12: 
> R13: 004065a0 R14: 00406630 R15: 
> origin: 8fe00056
>  save_stack_trace+0x59/0x60 arch/x86/kernel/stacktrace.c:59
>  kmsan_save_stack_with_flags mm/kmsan/kmsan.c:352
>  kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:247
>  kmsan_poison_shadow+0x6d/0xc0 mm/kmsan/kmsan.c:260
>  slab_alloc_node mm/slub.c:2743
>  __kmalloc_node_track_caller+0x1f4/0x390 mm/slub.c:4349
>  __kmalloc_reserve net/core/skbuff.c:138
>  __alloc_skb+0x2cd/0x740 net/core/skbuff.c:231
>  alloc_skb ./include/linux/skbuff.h:933
>  netlink_alloc_large_skb net/netlink/af_netlink.c:1144
>  netlink_sendmsg+0x934/0x10f0 net/netlink/af_netlink.c:1819
>  sock_sendmsg_nosec net/socket.c:633
>  sock_sendmsg net/socket.c:643
>  ___sys_sendmsg+0xd4b/0x10f0 net/socket.c:1997
>  __sys_sendmsg net/socket.c:2031
>  SYSC_sendmsg+0x2c6/0x3f0 net/socket.c:2042
>  SyS_sendmsg+0x87/0xb0 net/socket.c:2038
>  do_syscall_64+0x102/0x150 arch/x86/entry/common.c:285
>  return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:246
> ==
> 
> and the reproducer:
> 
> ==
>   #include 
>   #include 
>   #include 
>   #include 
>   
>   int main()
>   {
> int sock = socket(PF_NETLINK, SOCK_DGRAM | SOCK_NONBLOCK, 0);
> struct msghdr msg;
> memset(, 0, sizeof(msg));
> char nlmsg_buf[32];
> memset(nlmsg_buf, 0, sizeof(nlmsg_buf));
> struct nlmsghdr *nlmsg = nlmsg_buf;
> nlmsg->nlmsg_len = 0x11;
> nlmsg->nlmsg_type = 0x1e; // RTM_NEWROUTE = RTM_BASE + 0x0e
> // type = 0x0e = 1110b
> // kind = 2
> nlmsg->nlmsg_flags = 0x101; // NLM_F_ROOT | NLM_F_REQUEST
> nlmsg->nlmsg_seq = 0;
> nlmsg->nlmsg_pid = 0;
> nlmsg_buf[16] = (char)7;
> struct iovec iov;
> iov.iov_base = nlmsg_buf;
> iov.iov_len = 17;
> msg.msg_iov = 
> msg.msg_iovlen = 1;
> sendmsg(sock, , 0);
> return 0;
>   }
> ==
> ---
>  net/core/rtnetlink.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 49a279a7cc15..9e2c0a7cb325 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -3231,8 +3231,11 @@ static int rtnl_fdb_dump(struct sk_buff *skb, struct 
> netlink_callback *cb)
>   int err = 0;
>   int fidx = 0;
>  
> - if (nlmsg_parse(cb->nlh, sizeof(struct ifinfomsg), tb,
> - IFLA_MAX, ifla_policy, NULL) == 0) {
> + err =

Re: [PATCH v2] net: fec: add post PHY reset delay DT property

2017-05-24 Thread David Miller

From: Quentin Schulz 
Date: Tue, 23 May 2017 11:48:08 +0200

> Some PHY require to wait for a bit after the reset GPIO has been
> toggled. This adds support for the DT property `phy-reset-post-delay`
> which gives the delay in milliseconds to wait after reset.
> 
> If the DT property is not given, no delay is observed. Post reset delay
> greater than 1000ms are invalid.
> 
> Signed-off-by: Quentin Schulz 

Applied, thanks.

Re: [PATCH net-next 2/4] nfp: register ports as devlink ports

2017-05-24 Thread Jakub Kicinski

On Wed, 24 May 2017 14:35:14 +0200, Jiri Pirko wrote:
> >+void nfp_devlink_port_unregister(struct nfp_port *port)
> >+{
> >+/* Due to unpleasant lock ordering we may see the port go away before
> >+ * we have fully probed.  
> 
> Could you elaborate on this a bit more please?

It's partially due to peculiarities of the management FW more than
kernel stuff.  Unfortunately some ethtool media config requires reboot
to be applied, so we print a friendly message to the logs and
unregister the associated netdevs.  Which means once netdevs get
registered ports may go away.

Enter devlink, I need the ability to grab the adapater lock in
split/unsplit callbacks to find the ports, which implies having to drop
that lock before I register devlink.  And only after I register devlink
can I register the ports.

I could do init without registering anything, drop the adapter lock,
register devlink, and then grab the adapter lock back and register
devlink ports and netdevs.  But there is another issue...

Since I look for ports on a list maintained in the adapter struct,
driver code doesn't care if devlink_port has been registered or not.
The moment devlink is registered, split/unsplit requests will be
accepted - potentially trying to unregister devlink_port before the
register could happen.

Further down the line, also, the eswitch mode setting is coming.  Which
means the moment I register devlink itself ports will get shuffled (due
to the plan of registering VFs as ports :)). 

I feel like registering devlink should be the last action of the
driver, really.  My plan was to keep that simple if() for now, and once
we get to extending devlink with SR-IOV stuff also add the ability to
pre-register ports.  Allow registering ports on not-yet-registered
devlink (probably put them on a private list within struct devlink).
This would make devlink_register() a single point when everything
devlink becomes visible, atomically, instead of devlink itself coming
first and then ports following.

Does that make sense?  Am I misreading the code (again :S)?

Re: [PATCH net 0/2] sctp: a bunch of fixes for processing dupcookie

2017-05-24 Thread David Miller

From: Xin Long 
Date: Tue, 23 May 2017 13:28:53 +0800

> After introducing transport hashtable and per stream info into sctp,
> some regressions were caused when processing dupcookie, this patchset
> is to fix them.

Series applied, thanks.

Re: [PATCH net-next] geneve: fix fill_info when using collect_metadata

2017-05-24 Thread Pravin Shelar

On Tue, May 23, 2017 at 3:37 PM, Eric Garver  wrote:
> Since 9b4437a5b870 ("geneve: Unify LWT and netdev handling.") fill_info
> does not return UDP_ZERO_CSUM6_RX when using COLLECT_METADATA. This is
> because it uses ip_tunnel_info_af() with the device level info, which is
> not valid for COLLECT_METADATA.
>
> Fix by checking for the presence of the actual sockets.
>
> Fixes: 9b4437a5b870 ("geneve: Unify LWT and netdev handling.")
> Signed-off-by: Eric Garver 
Thanks for the patch.

Acked-by: Pravin B Shelar 


I noticed that the MTU and encal_len calculation in geneve_configure()
also needs to be fixed for collect metadata case. Can you send patch
to fix it?

Re: [PATCH v2 net-next 00/11] qed/qede: Mostly-cleanup series

2017-05-24 Thread David Miller

From: Yuval Mintz 
Date: Tue, 23 May 2017 09:41:17 +0300

> This series contains some cleanup of the qed and qede code:
>  - #1 contains mostly static/endian changes in order to allow qede to
>pass sparse compilation cleanly.
>  - #2, #5 and #6 are either semantic or remove dead-code from driver.
>  - #9, #10 and #11 relate to printing and slightly change some APIs
>between qed and the protocol drivers for that end [sharing the
>interface names and information regarding device].
> 
> The rest of the patches are minor changes/fixes to various flows
> in qed.

Series applied, thanks.

Re: [PATCH 2/2] at803x: double check SGMII side autoneg

2017-05-24 Thread Timur Tabi

On 05/24/2017 09:09 AM, Andrew Lunn wrote:
> It could be, the copper side is up, but the SGMII side is down, at the
> point at803x_aneg_done() is called. So it is correctly returning
> 0. Sometime later the SGMII side goes up, but there is not a second
> interrupt. Hence the phy core does not know that the full, 2 stage MAC
> to PHY to peer PHY link is now up.

Ok, I'm going to debug this some more.  It turns out that the MAC side of
the SGMII link can send an interrupt when it thinks that auto-negotiation is
done.  I might be able to use this.

What function should my MAC driver call when it wants the phy core to call
at803x_aneg_done again to see if autonegotiation is done?

Also, is there a way for the MAC driver to know that at803x_aneg_done()
previously returned 0, and that it needs to tell the phy core to check again?

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

[PATCH net-next 1/8] net: ipv4: refactor __ip_route_output_key_hash

2017-05-24 Thread Roopa Prabhu

From: David Ahern 

A later patch wants access to the fib result on an output route lookup
with the rcu lock held. Refactor __ip_route_output_key_hash, pushing
the logic between rcu_read_lock ... rcu_read_unlock into a new helper
that takes the fib_result as an input arg.

To keep the name length under control remove the leading underscores
from the name. _rcu is added to the name of the new helper indicating
it is called with the rcu read lock held.

Signed-off-by: David Ahern 
Signed-off-by: Roopa Prabhu 
---
 include/net/route.h |  7 ++-
 net/ipv4/icmp.c |  2 +-
 net/ipv4/route.c| 57 +++--
 3 files changed, 40 insertions(+), 26 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index 2cc0e14..5a92347 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -115,11 +115,16 @@ struct rt_cache_stat {
 void rt_flush_dev(struct net_device *dev);
 struct rtable *__ip_route_output_key_hash(struct net *net, struct flowi4 *flp,
  const struct sk_buff *skb);
+struct rtable *ip_route_output_key_hash(struct net *net, struct flowi4 *flp,
+   const struct sk_buff *skb);
+struct rtable *ip_route_output_key_hash_rcu(struct net *net, struct flowi4 
*flp,
+   const struct sk_buff *skb,
+   struct fib_result *res);
 
 static inline struct rtable *__ip_route_output_key(struct net *net,
   struct flowi4 *flp)
 {
-   return __ip_route_output_key_hash(net, flp, NULL);
+   return ip_route_output_key_hash(net, flp, NULL);
 }
 
 struct rtable *ip_route_output_flow(struct net *, struct flowi4 *flp,
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 43318b5..5610971 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -489,7 +489,7 @@ static struct rtable *icmp_route_lookup(struct net *net,
fl4->flowi4_oif = l3mdev_master_ifindex(skb_dst(skb_in)->dev);
 
security_skb_classify_flow(skb_in, flowi4_to_flowi(fl4));
-   rt = __ip_route_output_key_hash(net, fl4, skb_in);
+   rt = ip_route_output_key_hash(net, fl4, skb_in);
if (IS_ERR(rt))
return rt;
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 655d9ee..f787208 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2246,29 +2246,22 @@ static struct rtable *__mkroute_output(const struct 
fib_result *res,
  * Major route resolver routine.
  */
 
-struct rtable *__ip_route_output_key_hash(struct net *net, struct flowi4 *fl4,
- const struct sk_buff *skb)
+struct rtable *ip_route_output_key_hash_rcu(struct net *net, struct flowi4 
*fl4,
+   const struct sk_buff *skb,
+   struct fib_result *res)
 {
struct net_device *dev_out = NULL;
+   int orig_oif = fl4->flowi4_oif;
__u8 tos = RT_FL_TOS(fl4);
unsigned int flags = 0;
-   struct fib_result res;
-   struct rtable *rth;
-   int orig_oif;
int err = -ENETUNREACH;
-
-   res.tclassid= 0;
-   res.fi  = NULL;
-   res.table   = NULL;
-
-   orig_oif = fl4->flowi4_oif;
+   struct rtable *rth;
 
fl4->flowi4_iif = LOOPBACK_IFINDEX;
fl4->flowi4_tos = tos & IPTOS_RT_MASK;
fl4->flowi4_scope = ((tos & RTO_ONLINK) ?
 RT_SCOPE_LINK : RT_SCOPE_UNIVERSE);
 
-   rcu_read_lock();
if (fl4->saddr) {
rth = ERR_PTR(-EINVAL);
if (ipv4_is_multicast(fl4->saddr) ||
@@ -2354,15 +2347,15 @@ struct rtable *__ip_route_output_key_hash(struct net 
*net, struct flowi4 *fl4,
fl4->daddr = fl4->saddr = htonl(INADDR_LOOPBACK);
dev_out = net->loopback_dev;
fl4->flowi4_oif = LOOPBACK_IFINDEX;
-   res.type = RTN_LOCAL;
+   res->type = RTN_LOCAL;
flags |= RTCF_LOCAL;
goto make_route;
}
 
-   err = fib_lookup(net, fl4, , 0);
+   err = fib_lookup(net, fl4, res, 0);
if (err) {
-   res.fi = NULL;
-   res.table = NULL;
+   res->fi = NULL;
+   res->table = NULL;
if (fl4->flowi4_oif &&
(ipv4_is_multicast(fl4->daddr) ||
!netif_index_is_l3_master(net, fl4->flowi4_oif))) {
@@ -2387,17 +2380,17 @@ struct rtable *__ip_route_output_key_hash(struct net 
*net, struct flowi4 *fl4,
if (fl4->saddr == 0)
fl4->saddr = inet_select_addr(dev_out, 0,
  RT_SCOPE_LINK);
-   res.type = RTN_UNICAST;
+   res->type =

[PATCH net-next 2/8] net: ipv4: refactor ip_route_input_noref

2017-05-24 Thread Roopa Prabhu

From: David Ahern 

A later patch wants access to the fib result on an input route lookup
with the rcu lock held. Refactor ip_route_input_noref pushing the logic
between rcu_read_lock ... rcu_read_unlock into a new helper that takes
the fib_result as an input arg.

Signed-off-by: David Ahern 
Signed-off-by: Roopa Prabhu 
---
 include/net/route.h |  3 +++
 net/ipv4/route.c| 66 ++---
 2 files changed, 40 insertions(+), 29 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index 5a92347..8d209ad 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -180,6 +180,9 @@ static inline struct rtable *ip_route_output_gre(struct net 
*net, struct flowi4
 
 int ip_route_input_noref(struct sk_buff *skb, __be32 dst, __be32 src,
 u8 tos, struct net_device *devin);
+int ip_route_input_rcu(struct sk_buff *skb, __be32 dst, __be32 src,
+  u8 tos, struct net_device *devin,
+  struct fib_result *res);
 
 static inline int ip_route_input(struct sk_buff *skb, __be32 dst, __be32 src,
 u8 tos, struct net_device *devin)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index f787208..fe7b765 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1852,9 +1852,9 @@ static int ip_mkroute_input(struct sk_buff *skb,
  */
 
 static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
-  u8 tos, struct net_device *dev)
+  u8 tos, struct net_device *dev,
+  struct fib_result *res)
 {
-   struct fib_result res;
struct in_device *in_dev = __in_dev_get_rcu(dev);
struct ip_tunnel_info *tun_info;
struct flowi4   fl4;
@@ -1884,8 +1884,8 @@ static int ip_route_input_slow(struct sk_buff *skb, 
__be32 daddr, __be32 saddr,
if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr))
goto martian_source;
 
-   res.fi = NULL;
-   res.table = NULL;
+   res->fi = NULL;
+   res->table = NULL;
if (ipv4_is_lbcast(daddr) || (saddr == 0 && daddr == 0))
goto brd_input;
 
@@ -1921,17 +1921,17 @@ static int ip_route_input_slow(struct sk_buff *skb, 
__be32 daddr, __be32 saddr,
fl4.daddr = daddr;
fl4.saddr = saddr;
fl4.flowi4_uid = sock_net_uid(net, NULL);
-   err = fib_lookup(net, , , 0);
+   err = fib_lookup(net, , res, 0);
if (err != 0) {
if (!IN_DEV_FORWARD(in_dev))
err = -EHOSTUNREACH;
goto no_route;
}
 
-   if (res.type == RTN_BROADCAST)
+   if (res->type == RTN_BROADCAST)
goto brd_input;
 
-   if (res.type == RTN_LOCAL) {
+   if (res->type == RTN_LOCAL) {
err = fib_validate_source(skb, saddr, daddr, tos,
  0, dev, in_dev, );
if (err < 0)
@@ -1943,10 +1943,10 @@ static int ip_route_input_slow(struct sk_buff *skb, 
__be32 daddr, __be32 saddr,
err = -EHOSTUNREACH;
goto no_route;
}
-   if (res.type != RTN_UNICAST)
+   if (res->type != RTN_UNICAST)
goto martian_destination;
 
-   err = ip_mkroute_input(skb, , in_dev, daddr, saddr, tos);
+   err = ip_mkroute_input(skb, res, in_dev, daddr, saddr, tos);
 out:   return err;
 
 brd_input:
@@ -1960,14 +1960,14 @@ static int ip_route_input_slow(struct sk_buff *skb, 
__be32 daddr, __be32 saddr,
goto martian_source;
}
flags |= RTCF_BROADCAST;
-   res.type = RTN_BROADCAST;
+   res->type = RTN_BROADCAST;
RT_CACHE_STAT_INC(in_brd);
 
 local_input:
do_cache = false;
-   if (res.fi) {
+   if (res->fi) {
if (!itag) {
-   rth = rcu_dereference(FIB_RES_NH(res).nh_rth_input);
+   rth = rcu_dereference(FIB_RES_NH(*res).nh_rth_input);
if (rt_cache_valid(rth)) {
skb_dst_set_noref(skb, >dst);
err = 0;
@@ -1978,7 +1978,7 @@ static int ip_route_input_slow(struct sk_buff *skb, 
__be32 daddr, __be32 saddr,
}
 
rth = rt_dst_alloc(l3mdev_master_dev_rcu(dev) ? : net->loopback_dev,
-  flags | RTCF_LOCAL, res.type,
+  flags | RTCF_LOCAL, res->type,
   IN_DEV_CONF_GET(in_dev, NOPOLICY), false, do_cache);
if (!rth)
goto e_nobufs;
@@ -1988,18 +1988,18 @@ static int ip_route_input_slow(struct sk_buff *skb, 
__be32 daddr, __be32 saddr,
rth->dst.tclassid = itag;
 #endif
rth->rt_is_input = 1;
-   if (res.table)
-   rth->rt_table_id = res.table->tb_id;
+   if (res->table)
+

[PATCH net-next 8/8] net: ipv6: RTM_GETROUTE: return matched fib result when requested

2017-05-24 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch adds support to return matched fib result when RTM_F_FIB_MATCH
flag is specified in RTM_GETROUTE request. This is useful for user-space
applications/controllers wanting to query a matching route.

Signed-off-by: Roopa Prabhu 
---
 net/ipv6/route.c | 36 +---
 1 file changed, 29 insertions(+), 7 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 80bda31..c4d6da9 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3612,6 +3612,7 @@ static int inet6_rtm_getroute(struct sk_buff *in_skb, 
struct nlmsghdr *nlh,
struct rtmsg *rtm;
struct flowi6 fl6;
int err, iif = 0, oif = 0;
+   bool fibmatch;
 
err = nlmsg_parse(nlh, sizeof(*rtm), tb, RTA_MAX, rtm_ipv6_policy,
  extack);
@@ -3622,6 +3623,7 @@ static int inet6_rtm_getroute(struct sk_buff *in_skb, 
struct nlmsghdr *nlh,
memset(, 0, sizeof(fl6));
rtm = nlmsg_data(nlh);
fl6.flowlabel = ip6_make_flowinfo(rtm->rtm_tos, 0);
+   fibmatch = (rtm->rtm_flags & RTM_F_FIB_MATCH) ? true : false;
 
if (tb[RTA_SRC]) {
if (nla_len(tb[RTA_SRC]) < sizeof(struct in6_addr))
@@ -3667,12 +3669,27 @@ static int inet6_rtm_getroute(struct sk_buff *in_skb, 
struct nlmsghdr *nlh,
if (!ipv6_addr_any())
flags |= RT6_LOOKUP_F_HAS_SADDR;
 
-   rt = (struct rt6_info *)ip6_route_input_lookup(net, dev, ,
-  flags);
+   if (!fibmatch)
+   rt = (struct rt6_info *)ip6_route_input_lookup(net, dev,
+  ,
+  flags);
} else {
fl6.flowi6_oif = oif;
 
-   rt = (struct rt6_info *)ip6_route_output(net, NULL, );
+   if (!fibmatch)
+   rt = (struct rt6_info *)ip6_route_output_flags(net,
+  NULL,
+  , 0);
+   }
+
+   if (fibmatch) {
+   rt = (struct rt6_info *)ip6_route_lookup(net, , 0);
+   if (rt->dst.error) {
+   err = rt->dst.error;
+   ip6_rt_put(rt);
+   goto errout;
+   }
+
}
 
if (rt == net->ipv6.ip6_null_entry) {
@@ -3689,10 +3706,15 @@ static int inet6_rtm_getroute(struct sk_buff *in_skb, 
struct nlmsghdr *nlh,
}
 
skb_dst_set(skb, >dst);
-
-   err = rt6_fill_node(net, skb, rt, , , iif,
-   RTM_NEWROUTE, NETLINK_CB(in_skb).portid,
-   nlh->nlmsg_seq, 0);
+   if (fibmatch) {
+   err = rt6_fill_node(net, skb, rt, NULL, NULL, iif,
+   RTM_NEWROUTE, NETLINK_CB(in_skb).portid,
+   nlh->nlmsg_seq, 0);
+   } else {
+   err = rt6_fill_node(net, skb, rt, , , iif,
+   RTM_NEWROUTE, NETLINK_CB(in_skb).portid,
+   nlh->nlmsg_seq, 0);
+   }
if (err < 0) {
kfree_skb(skb);
goto errout;
-- 
1.9.1

[PATCH net-next 0/8] net: extend RTM_GETROUTE to return fib result

2017-05-24 Thread Roopa Prabhu

From: Roopa Prabhu 

This series adds a new RTM_F_FIB_MATCH flag to return matched fib result
with RTM_GETROUTE. This is useful for applications and protocols in
userspace wanting to query the selected route.

examples (with patched iproute2):
ipv4:

$ip route show
default via 192.168.0.2 dev eth0
10.0.14.0/24
nexthop via 172.16.0.3  dev dummy0 weight 1
nexthop via 172.16.1.3  dev dummy1 weight 1

$ip route get 10.0.14.2
10.0.14.2 via 172.16.1.3 dev dummy1  src 172.16.1.1 
cache 

$ip route get fibmatch 10.0.14.2
10.0.14.0/24
nexthop via 172.16.0.3  dev dummy0 weight 1
nexthop via 172.16.1.3  dev dummy1 weight 1

ipv6:

$ip -6 route show
2001:db9:100::/120  metric 1024 
nexthop via 2001:db8:2::2  dev dummy0 weight 1
nexthop via 2001:db8:12::2  dev dummy1 weight 1

$ip -6 route get 2001:db9:100::1
2001:db9:100::1 from :: via 2001:db8:12::2 dev dummy1  src 2001:db8:12::1  
metric 1024  pref medium

$ip -6 route get fibmatch 2001:db9:100::1
2001:db9:100::/120  metric 1024 
nexthop via 2001:db8:12::2  dev dummy1 weight 1
nexthop via 2001:db8:2::2  dev dummy0 weight 1


David Ahern (5):
  net: ipv4: refactor __ip_route_output_key_hash
  net: ipv4: refactor ip_route_input_noref
  net: ipv4: Remove event arg to rt_fill_info
  net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup
  net: ipv4: Save trie prefix to fib lookup result

Roopa Prabhu (3):
  net: ipv4: add new RTM_F_FIB_MATCH flag for use with RTM_GETROUTE
  net: ipv4: RTM_GETROUTE: return matched fib result when requested
  net: ipv6: RTM_GETROUTE: return matched fib result when requested

 include/net/ip_fib.h   |   1 +
 include/net/route.h|  10 ++-
 include/uapi/linux/rtnetlink.h |   1 +
 net/ipv4/fib_trie.c|   1 +
 net/ipv4/icmp.c|   2 +-
 net/ipv4/route.c   | 160 -
 net/ipv6/route.c   |  11 ++-
 7 files changed, 116 insertions(+), 70 deletions(-)

-- 
1.9.1

[PATCH net-next 5/8] net: ipv4: Save trie prefix to fib lookup result

2017-05-24 Thread Roopa Prabhu

From: David Ahern 

Prefix is needed for returning matching route spec on get route request.

Signed-off-by: David Ahern 
Signed-off-by: Roopa Prabhu 
---
 include/net/ip_fib.h | 1 +
 net/ipv4/fib_trie.c  | 1 +
 2 files changed, 2 insertions(+)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 42e8b8f..25f5c51 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -136,6 +136,7 @@ struct fib_info {
 
 struct fib_table;
 struct fib_result {
+   __be32  prefix;
unsigned char   prefixlen;
unsigned char   nh_sel;
unsigned char   type;
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 6d0f6c79..6e9df7d 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1452,6 +1452,7 @@ int fib_table_lookup(struct fib_table *tb, const struct 
flowi4 *flp,
if (!(fib_flags & FIB_LOOKUP_NOREF))
atomic_inc(>fib_clntref);
 
+   res->prefix = htonl(n->key);
res->prefixlen = KEYLENGTH - fa->fa_slen;
res->nh_sel = nhsel;
res->type = fa->fa_type;
-- 
1.9.1

[PATCH net-next 3/8] net: ipv4: Remove event arg to rt_fill_info

2017-05-24 Thread Roopa Prabhu

From: David Ahern 

rt_fill_info has 1 caller with the event set to RTM_NEWROUTE. Given that
remove the arg and use RTM_NEWROUTE directly in rt_fill_info.

Signed-off-by: David Ahern 
Signed-off-by: Roopa Prabhu 
---
 net/ipv4/route.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index fe7b765..9699e9b 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2536,7 +2536,7 @@ struct rtable *ip_route_output_flow(struct net *net, 
struct flowi4 *flp4,
 
 static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
struct flowi4 *fl4, struct sk_buff *skb, u32 portid,
-   u32 seq, int event)
+   u32 seq)
 {
struct rtable *rt = skb_rtable(skb);
struct rtmsg *r;
@@ -2545,7 +2545,7 @@ static int rt_fill_info(struct net *net,  __be32 dst, 
__be32 src, u32 table_id,
u32 error;
u32 metrics[RTAX_MAX];
 
-   nlh = nlmsg_put(skb, portid, seq, event, sizeof(*r), 0);
+   nlh = nlmsg_put(skb, portid, seq, RTM_NEWROUTE, sizeof(*r), 0);
if (!nlh)
return -EMSGSIZE;
 
@@ -2745,8 +2745,7 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, 
struct nlmsghdr *nlh,
table_id = rt->rt_table_id;
 
err = rt_fill_info(net, dst, src, table_id, , skb,
-  NETLINK_CB(in_skb).portid, nlh->nlmsg_seq,
-  RTM_NEWROUTE);
+  NETLINK_CB(in_skb).portid, nlh->nlmsg_seq);
if (err < 0)
goto errout_free;
 
-- 
1.9.1

[PATCH net-next 7/8] net: ipv4: RTM_GETROUTE: return matched fib result when requested

2017-05-24 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch adds support to return matched fib result when RTM_F_FIB_MATCH
flag is specified in RTM_GETROUTE request. This is useful for user-space
applications/controllers wanting to query a matching route.

Signed-off-by: Roopa Prabhu 
---
 net/ipv4/route.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 6e0bd40..419bdba 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -114,6 +114,8 @@
 #include 
 #include 
 
+#include "fib_lookup.h"
+
 #define RT_FL_TOS(oldflp4) \
((oldflp4)->flowi4_tos & (IPTOS_RT_MASK | RTO_ONLINK))
 
@@ -2746,8 +2748,15 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, 
struct nlmsghdr *nlh,
if (rtm->rtm_flags & RTM_F_LOOKUP_TABLE)
table_id = rt->rt_table_id;
 
-   err = rt_fill_info(net, dst, src, table_id, , skb,
-  NETLINK_CB(in_skb).portid, nlh->nlmsg_seq, rt);
+   if (rtm->rtm_flags & RTM_F_FIB_MATCH)
+   err = fib_dump_info(skb, NETLINK_CB(in_skb).portid,
+   nlh->nlmsg_seq, RTM_NEWROUTE, table_id,
+   rt->rt_type, res.prefix, res.prefixlen,
+   fl4.flowi4_tos, res.fi, 0);
+   else
+   err = rt_fill_info(net, dst, src, table_id, , skb,
+  NETLINK_CB(in_skb).portid, nlh->nlmsg_seq,
+  rt);
if (err < 0)
goto errout_free;
 
-- 
1.9.1

[PATCH net-next 6/8] net: ipv4: add new RTM_F_FIB_MATCH flag for use with RTM_GETROUTE

2017-05-24 Thread Roopa Prabhu

From: Roopa Prabhu 

This flag when specified will return matched fib result in
response to a RTM_GETROUTE query.

Signed-off-by: Roopa Prabhu 
---
 include/uapi/linux/rtnetlink.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 6487b21..564790e 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -278,6 +278,7 @@ enum rt_scope_t {
 #define RTM_F_EQUALIZE 0x400   /* Multipath equalizer: NI  */
 #define RTM_F_PREFIX   0x800   /* Prefix addresses */
 #define RTM_F_LOOKUP_TABLE 0x1000  /* set rtm_table to FIB lookup result */
+#define RTM_F_FIB_MATCH0x2000  /* return full fib lookup match 
*/
 
 /* Reserved table identifiers */
 
-- 
1.9.1

[PATCH net-next 4/8] net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup

2017-05-24 Thread Roopa Prabhu

From: David Ahern 

Convert inet_rtm_getroute to use ip_route_input_rcu and
ip_route_output_key_hash_rcu passing the fib_result arg to both.
The rcu lock is held through the creation of the response, so the
rtable/dst does not need to be attached to the skb and is passed
to rt_fill_info directly.

In converting from ip_route_output_key to ip_route_output_key_hash_rcu
the xfrm_lookup_route in ip_route_output_flow is dropped since
flowi4_proto is not set for a route get request.

Signed-off-by: David Ahern 
Signed-off-by: Roopa Prabhu 
---
 net/ipv4/route.c | 25 +++--
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 9699e9b..6e0bd40 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2404,7 +2404,7 @@ struct rtable *ip_route_output_key_hash_rcu(struct net 
*net, struct flowi4 *fl4,
}
 
/* L3 master device is the loopback for that domain */
-   dev_out = l3mdev_master_dev_rcu(FIB_RES_DEV(res)) ? :
+   dev_out = l3mdev_master_dev_rcu(FIB_RES_DEV(*res)) ? :
net->loopback_dev;
fl4->flowi4_oif = dev_out->ifindex;
flags |= RTCF_LOCAL;
@@ -2435,7 +2435,7 @@ struct rtable *ip_route_output_key_hash(struct net *net, 
struct flowi4 *fl4,
res.table   = NULL;
 
rcu_read_lock();
-   rth = ip_route_output_key_hash_rcu(net, fl4, , mp_hash);
+   rth = ip_route_output_key_hash_rcu(net, fl4, skb, );
rcu_read_unlock();
 
return rth;
@@ -2534,11 +2534,11 @@ struct rtable *ip_route_output_flow(struct net *net, 
struct flowi4 *flp4,
 }
 EXPORT_SYMBOL_GPL(ip_route_output_flow);
 
+/* called with rcu_read_lock held */
 static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
struct flowi4 *fl4, struct sk_buff *skb, u32 portid,
-   u32 seq)
+   u32 seq, struct rtable *rt)
 {
-   struct rtable *rt = skb_rtable(skb);
struct rtmsg *r;
struct nlmsghdr *nlh;
unsigned long expires = 0;
@@ -2653,6 +2653,7 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, 
struct nlmsghdr *nlh,
struct net *net = sock_net(in_skb->sk);
struct rtmsg *rtm;
struct nlattr *tb[RTA_MAX+1];
+   struct fib_result res = {};
struct rtable *rt = NULL;
struct flowi4 fl4;
__be32 dst = 0;
@@ -2709,10 +2710,12 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, 
struct nlmsghdr *nlh,
fl4.flowi4_mark = mark;
fl4.flowi4_uid = uid;
 
+   rcu_read_lock();
+
if (iif) {
struct net_device *dev;
 
-   dev = __dev_get_by_index(net, iif);
+   dev = dev_get_by_index_rcu(net, iif);
if (!dev) {
err = -ENODEV;
goto errout_free;
@@ -2721,14 +2724,14 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, 
struct nlmsghdr *nlh,
skb->protocol   = htons(ETH_P_IP);
skb->dev= dev;
skb->mark   = mark;
-   err = ip_route_input(skb, dst, src, rtm->rtm_tos, dev);
+   err = ip_route_input_rcu(skb, dst, src, rtm->rtm_tos,
+dev, );
 
rt = skb_rtable(skb);
if (err == 0 && rt->dst.error)
err = -rt->dst.error;
} else {
-   rt = ip_route_output_key(net, );
-
+   rt = ip_route_output_key_hash_rcu(net, , skb, );
err = 0;
if (IS_ERR(rt))
err = PTR_ERR(rt);
@@ -2737,7 +2740,6 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, 
struct nlmsghdr *nlh,
if (err)
goto errout_free;
 
-   skb_dst_set(skb, >dst);
if (rtm->rtm_flags & RTM_F_NOTIFY)
rt->rt_flags |= RTCF_NOTIFY;
 
@@ -2745,15 +2747,18 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, 
struct nlmsghdr *nlh,
table_id = rt->rt_table_id;
 
err = rt_fill_info(net, dst, src, table_id, , skb,
-  NETLINK_CB(in_skb).portid, nlh->nlmsg_seq);
+  NETLINK_CB(in_skb).portid, nlh->nlmsg_seq, rt);
if (err < 0)
goto errout_free;
 
+   rcu_read_unlock();
+
err = rtnl_unicast(skb, net, NETLINK_CB(in_skb).portid);
 errout:
return err;
 
 errout_free:
+   rcu_read_unlock();
kfree_skb(skb);
goto errout;
 }
-- 
1.9.1

Re: [kernel-hardening] [PATCH v4 next 0/3] modules: automatic module loading restrictions

2017-05-24 Thread Djalal Harouni

On Tue, May 23, 2017 at 9:48 AM, Solar Designer  wrote:
>> >>> On Mon, May 22, 2017 at 2:08 PM, Solar Designer  
>> >>> wrote:
>> >>> > On Mon, May 22, 2017 at 01:57:03PM +0200, Djalal Harouni wrote:
>> >>> >> *) When modules_autoload_mode is set to (2), automatic module loading 
>> >>> >> is
>> >>> >> disabled for all. Once set, this value can not be changed.
>> >>> >
>> >>> > What purpose does this securelevel-like property ("Once set, this value
>> >>> > can not be changed.") serve here?  I think this mode 2 is needed, but
>> >>> > without this extra property, which is bypassable by e.g. explicitly
>> >>> > loaded kernel modules anyway (and that's OK).
>
> On Mon, May 22, 2017 at 04:07:56PM -0700, Kees Cook wrote:
>> I'm on the fence. For modules_disabled and Yama, it was tied to
>> CAP_SYS_ADMIN, basically designed to be a at-boot setting that could
>> not later be undone by an attacker gaining that privilege, keeping
>> them out of either kernel memory or existing user process memory.
>> Here, it's CAP_SYS_MODULE... it's hard to imagine the situation where
>> a CAP_SYS_MODULE-capable process could write to this sysctl but NOT
>> issue direct modprobe requests, but it's _possible_ via crazy symlink
>> games to trick capable processes into writing to sysctls. We've seen
>> this multiple times before, and it's a way for attackers to turn a
>> single privileged write into a privileged exec.
>
> OK, tricking a process via crazy symlink games is finally a potentially
> valid reason.  The question then becomes: are there perhaps so many
> other important sysctl's, disk files, etc. (which the vulnerable capable
> process could similarly be tricked into writing) so that specifically
> resetting modules_autoload_mode isn't particularly lucrative?  I think
> that the answer to that is usually yes.  Another related question: do we
> really want to inconsistently single out a handful of sysctl's for this
> kind of extra protection?  I think not.
>
> I agree there are some other settings where being unable to reset them
> makes sense, but I think this isn't one of those.
>

Alright, I already replied to Andy, since it was requested I will drop
it. I definitely prefer that we have something merged and usable ;-)


>> I might turn the question around, though: why would we want to have it
>> changeable at this setting?
>
> Convenience for the sysadmin - being able to correct one's error (e.g.,
> wrong order of shell commands), respond to new findings (thought module
> autoloading was unneeded after some point, then found out some software
> relies on it), change one's mind, reuse a system differently than
> originally intended without a forced reboot.
>
>> I'm fine leaving that piece off, either way.
>
> I'm also fine with either decision.  I just thought I'd point out what
> looked weird to me.
>
> I think this is an important patch that should get in, but primarily
> for modules_autoload_mode=1, which many distros could make the default
> (and maybe the kernel eventually should?)

I do not think that desktop, or interactive systems will set it.
Ubuntu has snaps for their app store, and they can use the per-task
flag and other vendors too Flatpak, etc.


> For modules_autoload_mode=2, we already seem to have the equivalent of
> modprobe=/bin/true (or does it differ subtly, maybe in return values?),
> which I already use at startup on a GPU box like this (preloading
> modules so that the OpenCL backends wouldn't need the autoloading):
>
> nvidia-smi
> nvidia-modprobe -u -c=0
> #modprobe nvidia_uvm
> #modprobe fglrx
>
> sysctl -w kernel.modprobe=/bin/true
> sysctl -w kernel.hotplug=/bin/true
>
> but it's good to also have this supported more explicitly and more
> consistently through modules_autoload_mode=2 while we're at it.  So I
> support having this mode as well.  I just question the need to have it
> non-resettable.
>

Ok, yes with mode=2 it is clear interface, it logs what was denied, it
is consistent with the per-task flag that we want for sandboxes and
desktop, it will work with CONFIG_STATIC_USERMODEHELPER, more
importantly it will avoid doing the usermod upcall, even if one day
the kernel support upcall into namespaces, I'm pretty sure that no one
will like the idea of namespaced modprobe paths, modules are global
anyway, this allows to say we are safe by default from any future
change that may make an upcall into the wrong context, we just avoid
that.


-- 
tixxdz

Re: Deleting a dynamic mac entry..

2017-05-24 Thread Manohar Kumar

Thanks, Toshiaki.

What is the right way to set the default_pvid using the bridge command
? I tried this, which fails..

root@net-3:~# ip link set dev vxlan0 name untagged type vlan id 0
RTNETLINK answers: Operation not supported
root@net-3:~#

All the interfaces in the bridge are untagged.

thanks,
Manohar.


On Mon, May 22, 2017 at 8:17 PM, Toshiaki Makita
 wrote:
> On 2017/05/21 11:28, Manohar Kumar wrote:
>> Hello,
>>
>> In 3.19 the following bridge fdb command to delete a dynamically
>> learned entry fails..
>>
>> root@net-3:~# bridge fdb show | grep 02:42:0a:ff:00:06
>> 02:42:0a:ff:00:06 dev vxlan0 master br0
>> root@net-3:~# bridge fdb del 02:42:0a:ff:00:06 dev vxlan0 master
>> RTNETLINK answers: No such file or directory
>>
>> It works in 4.4.
>>
>> Can someone please point to the patch that made this change ?
>
> 25d3b493a52d ("bridge: Fix inability to add non-vlan fdb entry") might
> be what you are looking for, but you might want to do git-bisect to
> track down any regression or fix.
>
>> In kernels without this patch is there an alternative to delete
>> (actually I want to do it programmatically) dynamic mac entries ?
>
> If 25d3b493a52d is causing your problem, set default_pvid to 0 in order
> to disable default_pvid, and delete any vlans which is already
> configured in bridge's vlan_filtering. Then, delete the fdb entry.
>
> Toshiaki Makita
>

Re: [PATCH] nfc: Ensure presence of required attributes in the activate_target netlink handler

2017-05-24 Thread Kees Cook

On Wed, May 24, 2017 at 3:42 AM, Mateusz Jurczyk  wrote:
> Check that the NFC_ATTR_TARGET_INDEX and NFC_ATTR_PROTOCOLS attributes (in
> addition to NFC_ATTR_DEVICE_INDEX) are provided by the netlink client
> prior to accessing them. This prevents potential unhandled NULL pointer
> dereference exceptions which can be triggered by malicious user-mode
> programs, if they omit one or both of these attributes.
>
> Signed-off-by: Mateusz Jurczyk 

Acked-by: Kees Cook 

-Kees

> ---
>  net/nfc/netlink.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/nfc/netlink.c b/net/nfc/netlink.c
> index 6b0850e63e09..b251fb936a27 100644
> --- a/net/nfc/netlink.c
> +++ b/net/nfc/netlink.c
> @@ -907,7 +907,9 @@ static int nfc_genl_activate_target(struct sk_buff *skb, 
> struct genl_info *info)
> u32 device_idx, target_idx, protocol;
> int rc;
>
> -   if (!info->attrs[NFC_ATTR_DEVICE_INDEX])
> +   if (!info->attrs[NFC_ATTR_DEVICE_INDEX] ||
> +   !info->attrs[NFC_ATTR_TARGET_INDEX] ||
> +   !info->attrs[NFC_ATTR_PROTOCOLS])
> return -EINVAL;
>
> device_idx = nla_get_u32(info->attrs[NFC_ATTR_DEVICE_INDEX]);
> --
> 2.13.0.219.gdb65acc882-goog
>



-- 
Kees Cook
Pixel Security

RE: [PATCH v2] drivers: phy: Add Cortina CS4340 driver

2017-05-24 Thread Bogdan Purcareata

> -Original Message-
> From: Florian Fainelli [mailto:f.faine...@gmail.com]
> Sent: Wednesday, May 24, 2017 8:03 PM
> To: Bogdan Purcareata ; Andrew Lunn
> 
> Cc: netdev@vger.kernel.org; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH v2] drivers: phy: Add Cortina CS4340 driver
> 
> On 05/24/2017 07:34 AM, Bogdan Purcareata wrote:
> >> You should do that as well.
> >
> > Okay, I will include this check in a probe function in v3.
> 
> When you do so, can you change the subject to be net: phy: Add Cortina
> CS4340 driver for the driver, and use dt-bindings: net:  for the Device
> Tree binding patch?

Will do, thanks!
Bogdan

1 2 >

1 - 100 of 169 matches

Mail list logo