date:20181007

Re: [PATCH net-next 00/11] net: sched: cls_u32 Various improvements

2018-10-07 Thread David Miller

From: Al Viro 
Date: Mon, 8 Oct 2018 06:45:15 +0100

> Er...  Both are due to missing in the very beginning of the series (well, on
> top of "net: sched: cls_u32: fix hnode refcounting") commit

All dependencies like this must be explicitly stated.

And in such situations you actually should wait for the dependency to
get into 'net', eventually get merged into 'net-next', and then you
can submit the stuff that depends upon it.

Not the way this was done.

Re: [PATCH net-next 00/11] net: sched: cls_u32 Various improvements

2018-10-07 Thread Al Viro

On Sun, Oct 07, 2018 at 09:25:01PM -0700, David Miller wrote:
> From: Jamal Hadi Salim 
> Date: Sun,  7 Oct 2018 12:38:00 -0400
> 
> > Various improvements from Al.
> 
> Please submit changes that actually are compile tested:
> 
>   CC [M]  net/sched/cls_u32.o
> net/sched/cls_u32.c: In function ‘u32_delete’:
> net/sched/cls_u32.c:674:6: error: ‘root_ht’ undeclared (first use in this 
> function); did you mean ‘root_user’?
>   if (root_ht == ht) {
>   ^~~
>   root_user
> net/sched/cls_u32.c:674:6: note: each undeclared identifier is reported only 
> once for each function it appears in
> net/sched/cls_u32.c: In function ‘u32_set_parms’:
> net/sched/cls_u32.c:746:15: error: ‘struct tc_u_hnode’ has no member named 
> ‘is_root’
> if (ht_down->is_root) {
>^~

Er...  Both are due to missing in the very beginning of the series (well, on
top of "net: sched: cls_u32: fix hnode refcounting") commit

Author: Al Viro 
Date:   Mon Sep 3 14:39:02 2018 -0400

net: sched: cls_u32: mark root hnode explicitly

... and produce consistent error on attempt to delete such.
Existing check in u32_delete() is inconsistent - after

tc qdisc add dev eth0 ingress
tc filter add dev eth0 parent ffff: protocol ip prio 100 handle 1: u32 divisor 1
tc filter add dev eth0 parent ffff: protocol ip prio 200 handle 2: u32 divisor 1

both

tc filter delete dev eth0 parent ffff: protocol ip prio 100 handle 801: u32

and

tc filter delete dev eth0 parent ffff: protocol ip prio 100 handle 800: u32

will fail (at least with refcounting fixes), but the former will complain
about an attempt to remove a busy table, while the latter will recognize
it as root and yield "Not allowed to delete root node" instead.

The problem with the existing check is that several tcf_proto instances might
share the same tp->data and handle-to-hnode lookup will be the same for all
of them.  So comparing an hnode to be deleted with tp->root won't catch the
case when one tp is used to try deleting the root of another.  Solution is
trivial - mark the root hnodes explicitly upon allocation and check for that.

Signed-off-by: Al Viro 

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index b2c3406a2cf2..c4782aa808c7 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -84,6 +84,7 @@ struct tc_u_hnode {
int refcnt;
unsigned int divisor;
struct idr  handle_idr;
+   bool is_root;
struct rcu_head rcu;
u32 flags;
/* The 'ht' field MUST be the last field in structure to allow for
@@ -377,6 +378,7 @@ static int u32_init(struct tcf_proto *tp)
root_ht->refcnt++;
root_ht->handle = tp_c ? gen_new_htid(tp_c, root_ht) : 0x80000000;
root_ht->prio = tp->prio;
+   root_ht->is_root = true;
idr_init(&root_ht->handle_idr);
 
if (tp_c == NULL) {
@@ -693,7 +695,7 @@ static int u32_delete(struct tcf_proto *tp, void *arg, bool *last,
goto out;
}
 
-   if (root_ht == ht) {
+   if (ht->is_root) {
NL_SET_ERR_MSG_MOD(extack, "Not allowed to delete root node");
return -EINVAL;
}
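
For illustration only, here is a minimal userspace sketch of why comparing
against tp->root misses another instance's root while an explicit flag does
not.  The structures "hnode" and "proto" and both helper names are
hypothetical simplifications, not the kernel's tc_u_hnode or tcf_proto.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct hnode {
	unsigned int handle;
	bool is_root;		/* set once, when the hnode is created as a root */
};

struct proto {
	struct hnode *root;	/* this instance's own root hnode */
};

/* Pointer comparison: only recognises the caller's own root. */
static bool is_root_by_pointer(const struct proto *tp, const struct hnode *ht)
{
	assert(tp != NULL && ht != NULL);
	return tp->root == ht;
}

/* Explicit flag: recognises any root, whichever instance owns it. */
static bool is_root_by_flag(const struct hnode *ht)
{
	assert(ht != NULL);
	return ht->is_root;
}
```

With two instances sharing one lookup table, asking the first instance about
the second one's root returns false from is_root_by_pointer() but true from
is_root_by_flag(), which is the inconsistency described in the commit message.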

Re: [PATCH V2 net-next 2/5] net: Introduce a new MII time stamping interface.

2018-10-07 Thread Richard Cochran

On Sun, Oct 07, 2018 at 02:20:33PM -0700, Richard Cochran wrote:
> On Sun, Oct 07, 2018 at 11:14:05PM +0200, Andrew Lunn wrote:
> > I'm currently thinking register_mii_timestamper() should take a netdev
> > argument, and the net_device structure should gain a struct
> > mii_timestamper.

We are going round in circles on this point.  V1 had it this way, but
nobody liked it.  You specifically asked to move the new pointer out
of the netdev and into phydev.

> > But we have to look at the lifetime problems. A phydev does not know
> > what netdev it is associated to until phy_connect() is called. It is
> > at that point you can call register_mii_timestamper().

I had used a netdev notifier on NETDEV_UP for this, but Florian seemed
to suggest using phy_{connect,attach,disconnect} instead.

Thanks,
Richard

Re: [PATCH] net: vhost: remove bad code line

2018-10-07 Thread David Miller

From: xiangxia.m@gmail.com
Date: Sun,  7 Oct 2018 18:41:50 -0700

> From: Tonghao Zhang 
> 
> Signed-off-by: Tonghao Zhang 

Applied.

Re: [PATCH net-next 00/11] net: sched: cls_u32 Various improvements

2018-10-07 Thread David Miller

From: Jamal Hadi Salim 
Date: Sun,  7 Oct 2018 12:38:00 -0400

> Various improvements from Al.

Please submit changes that actually are compile tested:

  CC [M]  net/sched/cls_u32.o
net/sched/cls_u32.c: In function ‘u32_delete’:
net/sched/cls_u32.c:674:6: error: ‘root_ht’ undeclared (first use in this function); did you mean ‘root_user’?
  if (root_ht == ht) {
  ^~~
  root_user
net/sched/cls_u32.c:674:6: note: each undeclared identifier is reported only once for each function it appears in
net/sched/cls_u32.c: In function ‘u32_set_parms’:
net/sched/cls_u32.c:746:15: error: ‘struct tc_u_hnode’ has no member named ‘is_root’
if (ht_down->is_root) {
   ^~

Re: [PATCH v8 05/15] octeontx2-af: Add mailbox IRQ and msg handlers

2018-10-07 Thread David Miller

From: sunil.kovv...@gmail.com
Date: Sun,  7 Oct 2018 20:29:14 +0530

> + req_hdr = (struct mbox_hdr *)(mdev->mbase + mbox->rx_start);

I commented in the previous patch series version, for patch #4, that
such casts are unnecessary and should be removed.

You are in for a very long review process if you only consider feedback
given to you for the specific patch for which it is given, rather than
your entire series.

Please put forth the effort to fix the problems pointed out to you in
your entire series.

Thank you.

Re: [PATCH net 1/1] net: sched: cls_u32: fix hnode refcounting

2018-10-07 Thread David Miller

From: Jamal Hadi Salim 
Date: Sun,  7 Oct 2018 07:40:17 -0400

> From: Al Viro 
> 
> cls_u32.c misuses refcounts for struct tc_u_hnode - it counts references
> via ->hlist and via ->tp_root together.  u32_destroy() drops the former
> and, in case when there had been links, leaves the sucker on the list.
> As the result, there's nothing to protect it from getting freed once links
> are dropped.
> That also makes the "is it busy" check incapable of catching the root
> hnode - it *is* busy (there's a reference from tp), but we don't see it as
> something separate.  "Is it our root?" check partially covers that, but
> the problem exists for others' roots as well.
> 
> AFAICS, the minimal fix preserving the existing behaviour (where it doesn't
> include oopsen, that is) would be this:
> * count tp->root and tp_c->hlist as separate references.  I.e.
> have u32_init() set refcount to 2, not 1.
>   * in u32_destroy() we always drop the former;
> in u32_destroy_hnode() - the latter.
> 
>   That way we have *all* references contributing to refcount.  List
> removal happens in u32_destroy_hnode() (called only when ->refcnt is 1)
> an in u32_destroy() in case of tc_u_common going away, along with
> everything reachable from it.  IOW, that way we know that
> u32_destroy_key() won't free something still on the list (or pointed to by
> someone's ->root).
> 
> Reproducer:
 ...
> Signed-off-by: Al Viro 
> Signed-off-by: Jamal Hadi Salim 

Applied and queued up for -stable.
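
The counting rule described in the quoted commit message can be modelled in a
few lines of userspace C.  This is a toy sketch under stated assumptions: the
struct and all four helper names are hypothetical, and only the reference
arithmetic (root starts at 2, the list reference is dropped last) is taken
from the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of an hnode; not the kernel's struct tc_u_hnode. */
struct hnode {
	int  refcnt;
	bool on_list;
};

/* u32_init() analogue: the root starts with two references, one for
 * tp->root and one for membership in the shared list. */
static void root_init(struct hnode *ht)
{
	ht->refcnt = 2;
	ht->on_list = true;
}

/* A link from another table takes one more reference ... */
static void link_get(struct hnode *ht)
{
	ht->refcnt++;
}

/* ... and gives it back when the link goes away. */
static void link_put(struct hnode *ht)
{
	assert(ht->refcnt > 1);
	ht->refcnt--;
}

/* u32_destroy() analogue: always drops the tp->root reference. */
static void root_put(struct hnode *ht)
{
	assert(ht->refcnt > 1);
	ht->refcnt--;
}

/* u32_destroy_hnode() analogue: proceeds only when the list reference is
 * the last one left; returns false while the hnode is still busy. */
static bool list_remove(struct hnode *ht)
{
	if (ht->refcnt != 1)
		return false;
	ht->on_list = false;
	ht->refcnt = 0;
	return true;
}
```

Because every reference contributes to the count, a root that is still linked
from elsewhere stays on the list after root_put() and can only be removed once
the last link is dropped.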

[PATCH v2 net-next 17/23] net/namespace: Update rtnl_net_dumpid for strict data checking

2018-10-07 Thread David Ahern

From: David Ahern 

Update rtnl_net_dumpid for strict data checking. If the flag is set,
the dump request is expected to have an rtgenmsg struct as the header
which has the family as the only element. No data may be appended.

Signed-off-by: David Ahern 
---
 net/core/net_namespace.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 670c84b1bfc2..fefe72774aeb 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -853,6 +853,12 @@ static int rtnl_net_dumpid(struct sk_buff *skb, struct netlink_callback *cb)
.s_idx = cb->args[0],
};
 
+   if (cb->strict_check &&
+   nlmsg_attrlen(cb->nlh, sizeof(struct rtgenmsg))) {
+   NL_SET_ERR_MSG(cb->extack, "Unknown data in network namespace id dump request");
+   return -EINVAL;
+   }
+
spin_lock_bh(&net->nsid_lock);
idr_for_each(&net->netns_ids, rtnl_net_dumpid_one, &net_cb);
spin_unlock_bh(&net->nsid_lock);
-- 
2.11.0

[PATCH v2 net-next 06/23] netlink: Add new socket option to enable strict checking on dumps

2018-10-07 Thread David Ahern

From: David Ahern 

Add a new socket option, NETLINK_DUMP_STRICT_CHK, that userspace
can use via setsockopt to request strict checking of headers and
attributes on dump requests.

To get dump features such as kernel side filtering based on data in
the header or attributes appended to the dump request, userspace
must call setsockopt() for NETLINK_DUMP_STRICT_CHK and a non-zero
value. Since the netlink sock and its flags are private to the
af_netlink code, the strict checking flag is passed to dump handlers
via a flag in the netlink_callback struct.

For old userspace on new kernel there is no impact as all of the data
checks in later patches are wrapped in a check on the new strict flag.

For new userspace on old kernel, the setsockopt will fail and even if
new userspace sets data in the headers and appended attributes the
kernel will silently ignore it. Moving forward when the setsockopt
succeeds, the new userspace on old kernel means the dump request can
pass an attribute the kernel does not understand. The dump will then
fail as the older kernel does not understand it.

New userspace on new kernel setting the socket option gets the benefit
of the improved data dump.

Kernel side the NETLINK_DUMP_STRICT_CHK uapi is converted to a generic
NETLINK_F_STRICT_CHK flag which can potentially be leveraged for tighter
checking on the NEW, DEL, and SET commands.
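
From the userspace side, opting in might look like the sketch below.  It
assumes the option number 12 that this patch adds to the uapi header (used as
a fallback when the installed headers predate it) and treats a failing
setsockopt() as "old kernel, carry on with lenient dumps".  The function name
open_route_socket() is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>		/* close(), for callers */
#include <linux/netlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif
#ifndef NETLINK_DUMP_STRICT_CHK
#define NETLINK_DUMP_STRICT_CHK 12	/* value introduced by this patch */
#endif

/*
 * Open an rtnetlink socket and request strict checking of dump requests.
 * Returns the socket fd (>= 0) or -errno.  *strict is set to 1 only if the
 * kernel accepted the option, otherwise it is left at 0.
 */
static int open_route_socket(int *strict)
{
	int one = 1;
	int fd;

	assert(strict != NULL);
	*strict = 0;

	fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
	if (fd < 0)
		return -errno;

	/* A kernel without the option rejects it; that is not fatal here. */
	if (setsockopt(fd, SOL_NETLINK, NETLINK_DUMP_STRICT_CHK,
		       &one, sizeof(one)) == 0)
		*strict = 1;

	return fd;
}
```

A caller would only place filter data in the dump header or append attributes
when *strict came back as 1, since an older kernel silently ignores them.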

Signed-off-by: David Ahern 
---
 include/linux/netlink.h  |  1 +
 include/uapi/linux/netlink.h |  1 +
 net/netlink/af_netlink.c | 21 -
 net/netlink/af_netlink.h |  1 +
 4 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index 88c8a2d83eb3..72580f1a72a2 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -179,6 +179,7 @@ struct netlink_callback {
struct netlink_ext_ack  *extack;
u16 family;
u16 min_dump_alloc;
+   bool strict_check;
unsigned int prev_seq, seq;
long args[6];
 };
diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h
index 776bc92e9118..486ed1f0c0bc 100644
--- a/include/uapi/linux/netlink.h
+++ b/include/uapi/linux/netlink.h
@@ -155,6 +155,7 @@ enum nlmsgerr_attrs {
 #define NETLINK_LIST_MEMBERSHIPS   9
 #define NETLINK_CAP_ACK            10
 #define NETLINK_EXT_ACK            11
+#define NETLINK_DUMP_STRICT_CHK    12
 
 struct nl_pktinfo {
__u32   group;
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 7ac585f33a9e..e613a9f89600 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1706,6 +1706,13 @@ static int netlink_setsockopt(struct socket *sock, int level, int optname,
nlk->flags &= ~NETLINK_F_EXT_ACK;
err = 0;
break;
+   case NETLINK_DUMP_STRICT_CHK:
+   if (val)
+   nlk->flags |= NETLINK_F_STRICT_CHK;
+   else
+   nlk->flags &= ~NETLINK_F_STRICT_CHK;
+   err = 0;
+   break;
default:
err = -ENOPROTOOPT;
}
@@ -1799,6 +1806,15 @@ static int netlink_getsockopt(struct socket *sock, int level, int optname,
return -EFAULT;
err = 0;
break;
+   case NETLINK_DUMP_STRICT_CHK:
+   if (len < sizeof(int))
+   return -EINVAL;
+   len = sizeof(int);
+   val = nlk->flags & NETLINK_F_STRICT_CHK ? 1 : 0;
+   if (put_user(len, optlen) || put_user(val, optval))
+   return -EFAULT;
+   err = 0;
+   break;
default:
err = -ENOPROTOOPT;
}
@@ -2282,9 +2298,9 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 const struct nlmsghdr *nlh,
 struct netlink_dump_control *control)
 {
+   struct netlink_sock *nlk, *nlk2;
struct netlink_callback *cb;
struct sock *sk;
-   struct netlink_sock *nlk;
int ret;
 
refcount_inc(&skb->users);
@@ -2318,6 +2334,9 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
cb->min_dump_alloc = control->min_dump_alloc;
cb->skb = skb;
 
+   nlk2 = nlk_sk(NETLINK_CB(skb).sk);
+   cb->strict_check = !!(nlk2->flags & NETLINK_F_STRICT_CHK);
+
if (control->start) {
ret = control->start(cb);
if (ret)
diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
index 962de7b3c023..5f454c8de6a4 100644
--- a/net/netlink/af_netlink.h
+++ b/net/netlink/af_netlink.h
@@ -15,6 +15,7 @@
 #define NETLINK_F_LISTEN_ALL_NSID  0x10
 #define NETLINK_F_CAP_ACK  0x20
 #define NETLINK_F_EXT_ACK  0x40
+#define NETLINK_F_STRICT_CHK

[PATCH v2 net-next 08/23] net/ipv6: Update inet6_dump_addr for strict data checking

2018-10-07 Thread David Ahern

From: David Ahern 

Update inet6_dump_addr for strict data checking. If the flag is set, the
dump request is expected to have an ifaddrmsg struct as the header
potentially followed by one or more attributes. Any data passed in the
header or as an attribute is taken as a request to influence the data
returned. Only values supported by the dump handler are allowed to be
non-0 or set in the request. At the moment only the IFA_TARGET_NETNSID
attribute is supported. Follow on patches can add support for other fields
(e.g., honor ifa_index and only return data for the given device index).

Signed-off-by: David Ahern 
---
 net/ipv6/addrconf.c | 69 +
 1 file changed, 59 insertions(+), 10 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index afa279170ba5..095d3f56f0a9 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4998,9 +4998,62 @@ static int in6_dump_addrs(struct inet6_dev *idev, struct sk_buff *skb,
return err;
 }
 
+static int inet6_valid_dump_ifaddr_req(const struct nlmsghdr *nlh,
+  struct inet6_fill_args *fillargs,
+  struct net **tgt_net, struct sock *sk,
+  struct netlink_ext_ack *extack)
+{
+   struct nlattr *tb[IFA_MAX+1];
+   struct ifaddrmsg *ifm;
+   int err, i;
+
+   if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ifm))) {
+   NL_SET_ERR_MSG_MOD(extack, "Invalid header for address dump request");
+   return -EINVAL;
+   }
+
+   ifm = nlmsg_data(nlh);
+   if (ifm->ifa_prefixlen || ifm->ifa_flags || ifm->ifa_scope) {
+   NL_SET_ERR_MSG_MOD(extack, "Invalid values in header for address dump request");
+   return -EINVAL;
+   }
+   if (ifm->ifa_index) {
+   NL_SET_ERR_MSG_MOD(extack, "Filter by device index not supported for address dump");
+   return -EINVAL;
+   }
+
+   err = nlmsg_parse_strict(nlh, sizeof(*ifm), tb, IFA_MAX,
+ifa_ipv6_policy, extack);
+   if (err < 0)
+   return err;
+
+   for (i = 0; i <= IFA_MAX; ++i) {
+   if (!tb[i])
+   continue;
+
+   if (i == IFA_TARGET_NETNSID) {
+   struct net *net;
+
+   fillargs->netnsid = nla_get_s32(tb[i]);
+   net = rtnl_get_net_ns_capable(sk, fillargs->netnsid);
+   if (IS_ERR(net)) {
+   NL_SET_ERR_MSG_MOD(extack, "Invalid target network namespace id");
+   return PTR_ERR(net);
+   }
+   *tgt_net = net;
+   } else {
+   NL_SET_ERR_MSG_MOD(extack, "Unsupported attribute in dump request");
+   return -EINVAL;
+   }
+   }
+
+   return 0;
+}
+
 static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb,
   enum addr_type_t type)
 {
+   const struct nlmsghdr *nlh = cb->nlh;
struct inet6_fill_args fillargs = {
.portid = NETLINK_CB(cb->skb).portid,
.seq = cb->nlh->nlmsg_seq,
@@ -5009,7 +5062,6 @@ static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb,
.type = type,
};
struct net *net = sock_net(skb->sk);
-   struct nlattr *tb[IFA_MAX+1];
struct net *tgt_net = net;
int h, s_h;
int idx, ip_idx;
@@ -5022,16 +5074,13 @@ static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb,
s_idx = idx = cb->args[1];
s_ip_idx = ip_idx = cb->args[2];
 
-   if (nlmsg_parse(cb->nlh, sizeof(struct ifaddrmsg), tb, IFA_MAX,
-   ifa_ipv6_policy, cb->extack) >= 0) {
-   if (tb[IFA_TARGET_NETNSID]) {
-   fillargs.netnsid = nla_get_s32(tb[IFA_TARGET_NETNSID]);
+   if (cb->strict_check) {
+   int err;
 
-   tgt_net = rtnl_get_net_ns_capable(skb->sk,
- fillargs.netnsid);
-   if (IS_ERR(tgt_net))
-   return PTR_ERR(tgt_net);
-   }
+   err = inet6_valid_dump_ifaddr_req(nlh, &fillargs, &tgt_net,
+ skb->sk, cb->extack);
+   if (err < 0)
+   return err;
}
 
rcu_read_lock();
-- 
2.11.0

[PATCH v2 net-next 13/23] rtnetlink: Update ipmr_rtm_dumplink for strict data checking

2018-10-07 Thread David Ahern

From: David Ahern 

Update ipmr_rtm_dumplink for strict data checking. If the flag is set,
the dump request is expected to have an ifinfomsg struct as the header.
All elements of the struct are expected to be 0 and no attributes can
be appended.
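
As a rough userspace mirror of the rule stated above (header present, every
checked field zero, nothing appended), a request could be pre-validated as in
the sketch below.  It uses the uapi definitions of struct nlmsghdr and struct
ifinfomsg; the function name check_link_dump_req() is hypothetical and the
code is an approximation of the described behaviour, not the kernel handler.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/*
 * Returns 0 if a link dump request would satisfy the strict rules described
 * above, -EINVAL otherwise.  ifi_family is deliberately not checked, to
 * match the fields listed in the patch.
 */
static int check_link_dump_req(const struct nlmsghdr *nlh)
{
	const struct ifinfomsg *ifm;

	assert(nlh != NULL);

	/* The message must be long enough to hold the ifinfomsg header. */
	if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(*ifm)))
		return -EINVAL;

	/* No attributes or other data may follow the header. */
	if (nlh->nlmsg_len > NLMSG_LENGTH(sizeof(*ifm)))
		return -EINVAL;

	/* All checked header fields must be zero. */
	ifm = NLMSG_DATA(nlh);
	if (ifm->__ifi_pad || ifm->ifi_type || ifm->ifi_flags ||
	    ifm->ifi_change || ifm->ifi_index)
		return -EINVAL;

	return 0;
}
```

A sender that zero-fills the whole request with memset() before setting
nlmsg_len and the family passes these checks by construction.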

Signed-off-by: David Ahern 
---
 net/ipv4/ipmr.c | 32 
 1 file changed, 32 insertions(+)

diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 5660adcf7a04..e7322e407bb4 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -2710,6 +2710,31 @@ static bool ipmr_fill_vif(struct mr_table *mrt, u32 vifid, struct sk_buff *skb)
return true;
 }
 
+static int ipmr_valid_dumplink(const struct nlmsghdr *nlh,
+  struct netlink_ext_ack *extack)
+{
+   struct ifinfomsg *ifm;
+
+   if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ifm))) {
+   NL_SET_ERR_MSG(extack, "ipv4: Invalid header for ipmr link dump");
+   return -EINVAL;
+   }
+
+   if (nlmsg_attrlen(nlh, sizeof(*ifm))) {
+   NL_SET_ERR_MSG(extack, "Invalid data after header in ipmr link dump");
+   return -EINVAL;
+   }
+
+   ifm = nlmsg_data(nlh);
+   if (ifm->__ifi_pad || ifm->ifi_type || ifm->ifi_flags ||
+   ifm->ifi_change || ifm->ifi_index) {
+   NL_SET_ERR_MSG(extack, "Invalid values in header for ipmr link dump request");
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
 static int ipmr_rtm_dumplink(struct sk_buff *skb, struct netlink_callback *cb)
 {
struct net *net = sock_net(skb->sk);
@@ -2718,6 +2743,13 @@ static int ipmr_rtm_dumplink(struct sk_buff *skb, struct netlink_callback *cb)
unsigned int e = 0, s_e;
struct mr_table *mrt;
 
+   if (cb->strict_check) {
+   int err = ipmr_valid_dumplink(cb->nlh, cb->extack);
+
+   if (err < 0)
+   return err;
+   }
+
s_t = cb->args[0];
s_e = cb->args[1];
 
-- 
2.11.0

[PATCH v2 net-next 01/23] netlink: Pass extack to dump handlers

2018-10-07 Thread David Ahern

From: David Ahern 

Declare extack in netlink_dump and pass to dump handlers via
netlink_callback. Add any extack message after the dump_done_errno
allowing error messages to be returned. This will be useful when
strict checking is done on dump requests, returning why the dump
fails EINVAL.

Signed-off-by: David Ahern 
Acked-by: Christian Brauner 
---
 include/linux/netlink.h  |  1 +
 net/netlink/af_netlink.c | 12 +++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index 71f121b66ca8..88c8a2d83eb3 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -176,6 +176,7 @@ struct netlink_callback {
void *data;
/* the module that dump function belong to */
struct module   *module;
+   struct netlink_ext_ack  *extack;
u16 family;
u16 min_dump_alloc;
unsigned int prev_seq, seq;
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index e3a0538ec0be..7ac585f33a9e 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2171,6 +2171,7 @@ EXPORT_SYMBOL(__nlmsg_put);
 static int netlink_dump(struct sock *sk)
 {
struct netlink_sock *nlk = nlk_sk(sk);
+   struct netlink_ext_ack extack = {};
struct netlink_callback *cb;
struct sk_buff *skb = NULL;
struct nlmsghdr *nlh;
@@ -2222,8 +2223,11 @@ static int netlink_dump(struct sock *sk)
skb_reserve(skb, skb_tailroom(skb) - alloc_size);
netlink_skb_set_owner_r(skb, sk);
 
-   if (nlk->dump_done_errno > 0)
+   if (nlk->dump_done_errno > 0) {
+   cb->extack = &extack;
nlk->dump_done_errno = cb->dump(skb, cb);
+   cb->extack = NULL;
+   }
 
if (nlk->dump_done_errno > 0 ||
skb_tailroom(skb) < nlmsg_total_size(sizeof(nlk->dump_done_errno))) {
@@ -2246,6 +2250,12 @@ static int netlink_dump(struct sock *sk)
memcpy(nlmsg_data(nlh), &nlk->dump_done_errno,
   sizeof(nlk->dump_done_errno));
 
+   if (extack._msg && nlk->flags & NETLINK_F_EXT_ACK) {
+   nlh->nlmsg_flags |= NLM_F_ACK_TLVS;
+   if (!nla_put_string(skb, NLMSGERR_ATTR_MSG, extack._msg))
+   nlmsg_end(skb, nlh);
+   }
+
if (sk_filter(sk, skb))
kfree_skb(skb);
else
-- 
2.11.0

[PATCH v2 net-next 04/23] netlink: Add strict version of nlmsg_parse and nla_parse

2018-10-07 Thread David Ahern

From: David Ahern 

nla_parse is currently lenient on message parsing, allowing type to be 0
or greater than max expected and only logging a message

"netlink: %d bytes leftover after parsing attributes in process `%s'."

if the netlink message has unknown data at the end after parsing. What this
could mean is that the header at the front of the attributes is actually
wrong and the parsing is shifted from what is expected.

Add a new strict version that actually fails with EINVAL if there are any
bytes remaining after the parsing loop completes, or if the attribute type
is 0 or greater than max expected.
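
The difference between the lenient and strict behaviour can be sketched in
userspace as follows.  This is a simplified model, not the kernel code: the
attr_hdr layout, the 4-byte alignment macro and the function parse_attrs()
are hypothetical stand-ins for struct nlattr and nla_parse(), and policy
validation is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ATTR_ALIGN(n)	(((n) + 3u) & ~3u)

/* Hypothetical attribute header: total length (header included) and type. */
struct attr_hdr {
	uint16_t len;
	uint16_t type;
};

/*
 * Walk a buffer of attributes and record a pointer to each known type in
 * tb[type].  Lenient mode skips unknown types and ignores trailing bytes;
 * strict mode returns -EINVAL for either.
 */
static int parse_attrs(const unsigned char *buf, size_t len, int maxtype,
		       const unsigned char **tb, bool strict)
{
	size_t rem = len;

	assert(buf != NULL && tb != NULL && maxtype >= 0);
	memset(tb, 0, sizeof(*tb) * (size_t)(maxtype + 1));

	while (rem >= sizeof(struct attr_hdr)) {
		struct attr_hdr h;
		size_t step;

		memcpy(&h, buf, sizeof(h));
		if (h.len < sizeof(h) || h.len > rem)
			break;	/* malformed: the rest counts as leftover */

		if (h.type == 0 || h.type > maxtype) {
			if (strict)
				return -EINVAL;	/* unknown attribute type */
		} else {
			tb[h.type] = buf;
		}

		step = ATTR_ALIGN(h.len);
		if (step > rem)
			step = rem;
		buf += step;
		rem -= step;
	}

	if (rem > 0 && strict)
		return -EINVAL;	/* bytes left over after parsing */

	return 0;
}
```

Feeding the same buffer through both modes shows the contrast: an
out-of-range type or a couple of stray trailing bytes is accepted when
strict is false and rejected with -EINVAL when it is true.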

Signed-off-by: David Ahern 
---
 include/net/netlink.h | 17 +
 lib/nlattr.c  | 48 
 2 files changed, 53 insertions(+), 12 deletions(-)

diff --git a/include/net/netlink.h b/include/net/netlink.h
index 9522a0bf1f3a..f1db8e594847 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -373,6 +373,9 @@ int nla_validate(const struct nlattr *head, int len, int maxtype,
 int nla_parse(struct nlattr **tb, int maxtype, const struct nlattr *head,
  int len, const struct nla_policy *policy,
  struct netlink_ext_ack *extack);
+int nla_parse_strict(struct nlattr **tb, int maxtype, const struct nlattr *head,
+int len, const struct nla_policy *policy,
+struct netlink_ext_ack *extack);
 int nla_policy_len(const struct nla_policy *, int);
 struct nlattr *nla_find(const struct nlattr *head, int len, int attrtype);
 size_t nla_strlcpy(char *dst, const struct nlattr *nla, size_t dstsize);
@@ -525,6 +528,20 @@ static inline int nlmsg_parse(const struct nlmsghdr *nlh, int hdrlen,
 nlmsg_attrlen(nlh, hdrlen), policy, extack);
 }
 
+static inline int nlmsg_parse_strict(const struct nlmsghdr *nlh, int hdrlen,
+struct nlattr *tb[], int maxtype,
+const struct nla_policy *policy,
+struct netlink_ext_ack *extack)
+{
+   if (nlh->nlmsg_len < nlmsg_msg_size(hdrlen)) {
+   NL_SET_ERR_MSG(extack, "Invalid header length");
+   return -EINVAL;
+   }
+
+   return nla_parse_strict(tb, maxtype, nlmsg_attrdata(nlh, hdrlen),
+   nlmsg_attrlen(nlh, hdrlen), policy, extack);
+}
+
 /**
  * nlmsg_find_attr - find a specific attribute in a netlink message
  * @nlh: netlink message header
diff --git a/lib/nlattr.c b/lib/nlattr.c
index 1e900bb414ef..d26de6156b97 100644
--- a/lib/nlattr.c
+++ b/lib/nlattr.c
@@ -391,9 +391,10 @@ EXPORT_SYMBOL(nla_policy_len);
  *
  * Returns 0 on success or a negative error code.
  */
-int nla_parse(struct nlattr **tb, int maxtype, const struct nlattr *head,
- int len, const struct nla_policy *policy,
- struct netlink_ext_ack *extack)
+static int __nla_parse(struct nlattr **tb, int maxtype,
+  const struct nlattr *head, int len,
+  bool strict, const struct nla_policy *policy,
+  struct netlink_ext_ack *extack)
 {
const struct nlattr *nla;
int rem;
@@ -403,27 +404,50 @@ int nla_parse(struct nlattr **tb, int maxtype, const struct nlattr *head,
nla_for_each_attr(nla, head, len, rem) {
u16 type = nla_type(nla);
 
-   if (type > 0 && type <= maxtype) {
-   if (policy) {
-   int err = validate_nla(nla, maxtype, policy,
-  extack);
-
-   if (err < 0)
-   return err;
+   if (type == 0 || type > maxtype) {
+   if (strict) {
+   NL_SET_ERR_MSG(extack, "Unknown attribute type");
+   return -EINVAL;
}
+   continue;
+   }
+   if (policy) {
+   int err = validate_nla(nla, maxtype, policy, extack);
 
-   tb[type] = (struct nlattr *)nla;
+   if (err < 0)
+   return err;
}
+
+   tb[type] = (struct nlattr *)nla;
}
 
-   if (unlikely(rem > 0))
+   if (unlikely(rem > 0)) {
pr_warn_ratelimited("netlink: %d bytes leftover after parsing attributes in process `%s'.\n",
rem, current->comm);
+   NL_SET_ERR_MSG(extack, "bytes leftover after parsing attributes");
+   if (strict)
+   return -EINVAL;
+   }
 
return 0;
 }
+
+int nla_parse(struct nlattr **tb, int maxtype, const struct nlattr *head,
+ int len, const struct nla_policy *policy,
+ struct netlink_ext_ack *extack)
+{
+   return

[PATCH v2 net-next 02/23] netlink: Add extack message to nlmsg_parse for invalid header length

2018-10-07 Thread David Ahern

From: David Ahern 

Give a user a reason why EINVAL is returned in nlmsg_parse.

Signed-off-by: David Ahern 
Acked-by: Christian Brauner 
---
 include/net/netlink.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/net/netlink.h b/include/net/netlink.h
index 589683091f16..9522a0bf1f3a 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -516,8 +516,10 @@ static inline int nlmsg_parse(const struct nlmsghdr *nlh, int hdrlen,
  const struct nla_policy *policy,
  struct netlink_ext_ack *extack)
 {
-   if (nlh->nlmsg_len < nlmsg_msg_size(hdrlen))
+   if (nlh->nlmsg_len < nlmsg_msg_size(hdrlen)) {
+   NL_SET_ERR_MSG(extack, "Invalid header length");
return -EINVAL;
+   }
 
return nla_parse(tb, maxtype, nlmsg_attrdata(nlh, hdrlen),
 nlmsg_attrlen(nlh, hdrlen), policy, extack);
-- 
2.11.0

[PATCH v2 net-next 07/23] net/ipv4: Update inet_dump_ifaddr for strict data checking

2018-10-07 Thread David Ahern

From: David Ahern 

Update inet_dump_ifaddr for strict data checking. If the flag is set,
the dump request is expected to have an ifaddrmsg struct as the header
potentially followed by one or more attributes. Any data passed in the
header or as an attribute is taken as a request to influence the data
returned. Only values supported by the dump handler are allowed to be
non-0 or set in the request. At the moment only the IFA_TARGET_NETNSID
attribute is supported. Follow on patches can add support for other fields
(e.g., honor ifa_index and only return data for the given device index).

Signed-off-by: David Ahern 
---
 net/ipv4/devinet.c | 72 +-
 1 file changed, 61 insertions(+), 11 deletions(-)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index ab2b11df5ea4..6f2bbd04e950 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1660,17 +1660,70 @@ static int inet_fill_ifaddr(struct sk_buff *skb, struct in_ifaddr *ifa,
return -EMSGSIZE;
 }
 
+static int inet_valid_dump_ifaddr_req(const struct nlmsghdr *nlh,
+ struct inet_fill_args *fillargs,
+ struct net **tgt_net, struct sock *sk,
+ struct netlink_ext_ack *extack)
+{
+   struct nlattr *tb[IFA_MAX+1];
+   struct ifaddrmsg *ifm;
+   int err, i;
+
+   if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ifm))) {
+   NL_SET_ERR_MSG(extack, "ipv4: Invalid header for address dump request");
+   return -EINVAL;
+   }
+
+   ifm = nlmsg_data(nlh);
+   if (ifm->ifa_prefixlen || ifm->ifa_flags || ifm->ifa_scope) {
+   NL_SET_ERR_MSG(extack, "ipv4: Invalid values in header for address dump request");
+   return -EINVAL;
+   }
+   if (ifm->ifa_index) {
+   NL_SET_ERR_MSG(extack, "ipv4: Filter by device index not supported for address dump");
+   return -EINVAL;
+   }
+
+   err = nlmsg_parse_strict(nlh, sizeof(*ifm), tb, IFA_MAX,
+ifa_ipv4_policy, extack);
+   if (err < 0)
+   return err;
+
+   for (i = 0; i <= IFA_MAX; ++i) {
+   if (!tb[i])
+   continue;
+
+   if (i == IFA_TARGET_NETNSID) {
+   struct net *net;
+
+   fillargs->netnsid = nla_get_s32(tb[i]);
+
+   net = rtnl_get_net_ns_capable(sk, fillargs->netnsid);
+   if (IS_ERR(net)) {
+   NL_SET_ERR_MSG(extack, "ipv4: Invalid target network namespace id");
+   return PTR_ERR(net);
+   }
+   *tgt_net = net;
+   } else {
+   NL_SET_ERR_MSG(extack, "ipv4: Unsupported attribute in dump request");
+   return -EINVAL;
+   }
+   }
+
+   return 0;
+}
+
 static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
 {
+   const struct nlmsghdr *nlh = cb->nlh;
struct inet_fill_args fillargs = {
.portid = NETLINK_CB(cb->skb).portid,
-   .seq = cb->nlh->nlmsg_seq,
+   .seq = nlh->nlmsg_seq,
.event = RTM_NEWADDR,
.flags = NLM_F_MULTI,
.netnsid = -1,
};
struct net *net = sock_net(skb->sk);
-   struct nlattr *tb[IFA_MAX+1];
struct net *tgt_net = net;
int h, s_h;
int idx, s_idx;
@@ -1684,16 +1737,13 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
s_idx = idx = cb->args[1];
s_ip_idx = ip_idx = cb->args[2];
 
-   if (nlmsg_parse(cb->nlh, sizeof(struct ifaddrmsg), tb, IFA_MAX,
-   ifa_ipv4_policy, cb->extack) >= 0) {
-   if (tb[IFA_TARGET_NETNSID]) {
-   fillargs.netnsid = nla_get_s32(tb[IFA_TARGET_NETNSID]);
+   if (cb->strict_check) {
+   int err;
 
-   tgt_net = rtnl_get_net_ns_capable(skb->sk,
- fillargs.netnsid);
-   if (IS_ERR(tgt_net))
-   return PTR_ERR(tgt_net);
-   }
+   err = inet_valid_dump_ifaddr_req(nlh, &fillargs, &tgt_net,
+skb->sk, cb->extack);
+   if (err < 0)
+   return err;
}
 
for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) {
-- 
2.11.0

[PATCH v2 net-next 21/23] net/bridge: Update br_mdb_dump for strict data checking

2018-10-07 Thread David Ahern

From: David Ahern 

Update br_mdb_dump for strict data checking. If the flag is set,
the dump request is expected to have a br_port_msg struct as the
header. All elements of the struct are expected to be 0 and no
attributes can be appended.

Signed-off-by: David Ahern 
---
 net/bridge/br_mdb.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index a4a848bf827b..a7ea2d431714 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -162,6 +162,29 @@ static int br_mdb_fill_info(struct sk_buff *skb, struct netlink_callback *cb,
return err;
 }
 
+static int br_mdb_valid_dump_req(const struct nlmsghdr *nlh,
+struct netlink_ext_ack *extack)
+{
+   struct br_port_msg *bpm;
+
+   if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*bpm))) {
+   NL_SET_ERR_MSG_MOD(extack, "Invalid header for mdb dump request");
+   return -EINVAL;
+   }
+
+   bpm = nlmsg_data(nlh);
+   if (bpm->ifindex) {
+   NL_SET_ERR_MSG_MOD(extack, "Filtering by device index is not supported for mdb dump request");
+   return -EINVAL;
+   }
+   if (nlmsg_attrlen(nlh, sizeof(*bpm))) {
+   NL_SET_ERR_MSG(extack, "Invalid data after header in mdb dump request");
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
 static int br_mdb_dump(struct sk_buff *skb, struct netlink_callback *cb)
 {
struct net_device *dev;
@@ -169,6 +192,13 @@ static int br_mdb_dump(struct sk_buff *skb, struct netlink_callback *cb)
struct nlmsghdr *nlh = NULL;
int idx = 0, s_idx;
 
+   if (cb->strict_check) {
+   int err = br_mdb_valid_dump_req(cb->nlh, cb->extack);
+
+   if (err < 0)
+   return err;
+   }
+
s_idx = cb->args[0];
 
rcu_read_lock();
-- 
2.11.0

[PATCH v2 net-next 15/23] net/neighbor: Update neigh_dump_info for strict data checking

2018-10-07 Thread David Ahern

From: David Ahern 

Update neigh_dump_info for strict data checking. If the flag is set,
the dump request is expected to have an ndmsg struct as the header
potentially followed by one or more attributes. Any data passed in the
header or as an attribute is taken as a request to influence the data
returned. Only values supported by the dump handler are allowed to be
non-0 or set in the request. At the moment only the NDA_IFINDEX and
NDA_MASTER attributes are supported.

Existing code does not fail the dump if nlmsg_parse fails. That behavior
is kept for non-strict checking.

Signed-off-by: David Ahern 
---
 net/core/neighbour.c | 82 ++--
 1 file changed, 67 insertions(+), 15 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index b06f794bf91e..7c8a3a0ee059 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2426,11 +2426,73 @@ static int pneigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
 
 }
 
+static int neigh_valid_dump_req(const struct nlmsghdr *nlh,
+   bool strict_check,
+   struct neigh_dump_filter *filter,
+   struct netlink_ext_ack *extack)
+{
+   struct nlattr *tb[NDA_MAX + 1];
+   int err, i;
+
+   if (strict_check) {
+   struct ndmsg *ndm;
+
+   if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ndm))) {
+   NL_SET_ERR_MSG(extack, "Invalid header for neighbor dump request");
+   return -EINVAL;
+   }
+
+   ndm = nlmsg_data(nlh);
+   if (ndm->ndm_pad1  || ndm->ndm_pad2  || ndm->ndm_ifindex ||
+   ndm->ndm_state || ndm->ndm_flags || ndm->ndm_type) {
+   NL_SET_ERR_MSG(extack, "Invalid values in header for neighbor dump request");
+   return -EINVAL;
+   }
+
+   err = nlmsg_parse_strict(nlh, sizeof(struct ndmsg), tb, NDA_MAX,
+NULL, extack);
+   } else {
+   err = nlmsg_parse(nlh, sizeof(struct ndmsg), tb, NDA_MAX,
+ NULL, extack);
+   }
+   if (err < 0)
+   return err;
+
+   for (i = 0; i <= NDA_MAX; ++i) {
+   if (!tb[i])
+   continue;
+
+   /* all new attributes should require strict_check */
+   switch (i) {
+   case NDA_IFINDEX:
+   if (nla_len(tb[i]) != sizeof(u32)) {
+   NL_SET_ERR_MSG(extack, "Invalid IFINDEX attribute in neighbor dump request");
+   return -EINVAL;
+   }
+   filter->dev_idx = nla_get_u32(tb[i]);
+   break;
+   case NDA_MASTER:
+   if (nla_len(tb[i]) != sizeof(u32)) {
+   NL_SET_ERR_MSG(extack, "Invalid MASTER attribute in neighbor dump request");
+   return -EINVAL;
+   }
+   filter->master_idx = nla_get_u32(tb[i]);
+   break;
+   default:
+   if (strict_check) {
+   NL_SET_ERR_MSG(extack, "Unsupported attribute in neighbor dump request");
+   return -EINVAL;
+   }
+   }
+   }
+
+   return 0;
+}
+
 static int neigh_dump_info(struct sk_buff *skb, struct netlink_callback *cb)
 {
const struct nlmsghdr *nlh = cb->nlh;
struct neigh_dump_filter filter = {};
-   struct nlattr *tb[NDA_MAX + 1];
struct neigh_table *tbl;
int t, family, s_t;
int proxy = 0;
@@ -2445,20 +2507,10 @@ static int neigh_dump_info(struct sk_buff *skb, struct netlink_callback *cb)
((struct ndmsg *)nlmsg_data(nlh))->ndm_flags == NTF_PROXY)
proxy = 1;
 
-   err = nlmsg_parse(nlh, sizeof(struct ndmsg), tb, NDA_MAX, NULL,
- cb->extack);
-   if (!err) {
-   if (tb[NDA_IFINDEX]) {
-   if (nla_len(tb[NDA_IFINDEX]) != sizeof(u32))
-   return -EINVAL;
-   filter.dev_idx = nla_get_u32(tb[NDA_IFINDEX]);
-   }
-   if (tb[NDA_MASTER]) {
-   if (nla_len(tb[NDA_MASTER]) != sizeof(u32))
-   return -EINVAL;
-   filter.master_idx = nla_get_u32(tb[NDA_MASTER]);
-   }
-   }
+   err = neigh_valid_dump_req(nlh, cb->strict_check, &filter, cb->extack);
+   if (err < 0 && cb->strict_check)
+   return err;
+
s_t = cb->args[0];
 
for (t = 0; t < NEIGH_NR_TABLES; t++) {
-- 
2.11.0

[PATCH v2 net-next 00/23] rtnetlink: Add support for rigid checking of data in dump request

2018-10-07 Thread David Ahern

From: David Ahern 

There are many use cases where a user wants to influence what is
returned in a dump for some rtnetlink command: one is wanting data
for a different namespace than the one in which the request is received and
another is limiting the amount of data returned in the dump to a
specific set of interest to userspace, reducing the cpu overhead of
both kernel and userspace. Unfortunately, the kernel has historically
not been strict with checking for the proper header or checking the
values passed in the header. This lenient implementation has allowed
iproute2 and other packages to pass any struct or data in the dump
request as long as the family is the first byte. For example, ifinfomsg
struct is used by iproute2 for all generic dump requests - links,
addresses, routes and rules when it is really only valid for link
requests.

There is one example where the kernel deals with the wrong struct: link
dumps after VF support was added. Older iproute2 was sending rtgenmsg as
the header instead of ifinfomsg so a patch was added to try and detect
old userspace vs new:
e5eca6d41f53 ("rtnetlink: fix userspace API breakage for iproute2 < v3.9.0")

The latest example is Christian's patch set wanting to return addresses for
a target namespace. It guesses the header struct is an ifaddrmsg and if it
guesses wrong a netlink warning is generated in the kernel log on every
address dump which is unacceptable.

Another example where the kernel is a bit lenient is route dumps: iproute2
can send a request with either an ifinfomsg or an rtmsg as the header
struct, yet the kernel always treats the header as an rtmsg (see
inet_dump_fib and rtm_flags check). The header inconsistency impacts the
ability to add kernel side filters for route dumps - a necessary feature
for scale setups with 100k+ routes.

How can we avoid breaking old userspace yet still be able to move
forward with new features such as kernel side filtering, which are
crucial for efficient operation at high scale?

This patch set addresses the problem by adding a new socket flag,
NETLINK_DUMP_STRICT_CHK, that userspace can use with setsockopt to
request strict checking of headers and attributes on dump requests and
hence unlock the ability to use kernel side filters as they are added.
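
The opt-in described above could look roughly like the following on the
userspace side. This is only a sketch: the helper name
rtnl_request_strict_check() is made up for illustration, and the numeric
fallbacks (SOL_NETLINK 270, option 12) are assumptions for systems whose
headers do not define the names used in this series.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <linux/netlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270			/* assumption: value from linux/socket.h */
#endif
#ifndef NETLINK_DUMP_STRICT_CHK
#define NETLINK_DUMP_STRICT_CHK 12	/* assumption: option number, not taken from this mail */
#endif

/*
 * Hypothetical helper: ask the kernel to strictly validate dump requests
 * sent on this netlink socket.
 * Returns 0 on success or a negative errno if setsockopt fails.
 */
int rtnl_request_strict_check(int fd)
{
	int one = 1;

	if (setsockopt(fd, SOL_NETLINK, NETLINK_DUMP_STRICT_CHK,
		       &one, sizeof(one)) < 0)
		return -errno;

	return 0;
}
```

A caller would typically treat a negative return as "strict checking is
not available" and keep doing its filtering in userspace.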

Kernel side, the dump handlers are updated to verify the message contains
at least the expected header struct:
    RTM_GETLINK:        ifinfomsg
    RTM_GETADDR:        ifaddrmsg
    RTM_GETMULTICAST:   ifaddrmsg
    RTM_GETANYCAST:     ifaddrmsg
    RTM_GETADDRLABEL:   ifaddrlblmsg
    RTM_GETROUTE:       rtmsg
    RTM_GETSTATS:       if_stats_msg
    RTM_GETNEIGH:       ndmsg
    RTM_GETNEIGHTBL:    ndtmsg
    RTM_GETNSID:        rtgenmsg
    RTM_GETRULE:        fib_rule_hdr
    RTM_GETNETCONF:     netconfmsg
    RTM_GETMDB:         br_port_msg

And then every field in the header struct should be 0 with the exception
of the family. There are a few exceptions to this rule where the kernel
already influences the data returned by values in the struct. Next the
message should not contain attributes unless the kernel implements
filtering for it. Any unexpected data causes the dump to fail with EINVAL.
If the new flag is honored by the kernel and the dump contents adjusted
by any data passed in the request, the dump handler can set the
NLM_F_DUMP_FILTERED flag in the netlink message header.
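
The three checks described above (minimum header length, header fields
other than the family being 0, nothing after the header) can be sketched
in plain userspace C. The struct demo_hdr and every DEMO_/demo_ name are
hypothetical stand-ins, not kernel definitions; the real handlers operate
on struct nlmsghdr and report errors through extack.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NLMSG_HDRLEN 16u			/* sizeof(struct nlmsghdr) */
#define DEMO_ALIGN(len)   (((len) + 3u) & ~3u)	/* netlink 4-byte alignment */

/* Hypothetical family header: only 'family' may be non-zero. */
struct demo_hdr {
	uint8_t  family;
	uint8_t  pad1;
	uint16_t pad2;
	int32_t  ifindex;
};

#define DEMO_HDRLEN ((uint32_t)sizeof(struct demo_hdr))

/*
 * nlmsg_len is the total length claimed by the netlink header; payload
 * points at the bytes that follow the netlink header.
 */
int demo_valid_dump_req(uint32_t nlmsg_len, const void *payload)
{
	struct demo_hdr hdr;

	/* 1. the message must hold at least the expected header */
	if (nlmsg_len < DEMO_NLMSG_HDRLEN + DEMO_HDRLEN)
		return -EINVAL;

	/* 2. every field except the family must be 0 */
	memcpy(&hdr, payload, sizeof(hdr));
	if (hdr.pad1 || hdr.pad2 || hdr.ifindex)
		return -EINVAL;

	/* 3. no attributes may follow the (aligned) header */
	if (nlmsg_len - DEMO_NLMSG_HDRLEN - DEMO_ALIGN(DEMO_HDRLEN) != 0)
		return -EINVAL;

	return 0;
}
```

Handlers that do support filters would replace step 3 with attribute
parsing and reject only the attribute types they do not implement.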

For old userspace on new kernel there is no impact as all checks are
wrapped in a check on the new strict flag. For new userspace on old
kernel, the data in the headers and any appended attributes are
silently ignored, though the failing setsockopt call tells userspace
that the feature is not supported. New userspace on new kernel gets the
requested data dump.

iproute2 patches can be found here:
https://github.com/dsahern/iproute2 dump-enhancements

Major changes since v1
- inner header is supposed to be 4-bytes aligned. So for dumps that
  should not have attributes appended changed the check to use:
      if (nlmsg_attrlen(nlh, sizeof(hdr)))
  Only impacts patches with headers that are not multiples of 4-bytes
  (rtgenmsg, netconfmsg), but applied the change to all patches not
  calling nlmsg_parse for consistency.

- Added nlmsg_parse_strict and nla_parse_strict for tighter control on
  attribute parsing. There should be no unknown attribute types or extra
  bytes.

- Moved validation to a helper in most cases
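
To illustrate the alignment point in the first item above: a 1-byte
header such as rtgenmsg or netconfmsg is padded to 4 bytes on the wire,
so comparing the payload length directly against sizeof(hdr) would report
trailing data that is only padding. The sketch below restates the length
arithmetic in userspace; the sk_ names are illustrative and only mirror
what nlmsg_len() and nlmsg_attrlen() compute in the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define SK_NLMSG_ALIGNTO   4u
#define SK_NLMSG_ALIGN(len) \
	(((len) + SK_NLMSG_ALIGNTO - 1u) & ~(SK_NLMSG_ALIGNTO - 1u))
#define SK_NLMSG_HDRLEN    16u	/* sizeof(struct nlmsghdr) */

/* Payload bytes after the netlink header (what nlmsg_len() returns). */
int sk_payload_len(uint32_t nlmsg_len)
{
	return (int)(nlmsg_len - SK_NLMSG_HDRLEN);
}

/* Attribute bytes after the aligned family header (nlmsg_attrlen()). */
int sk_attrlen(uint32_t nlmsg_len, uint32_t hdrlen)
{
	return sk_payload_len(nlmsg_len) - (int)SK_NLMSG_ALIGN(hdrlen);
}
```

For a request of total length 20 carrying a 1-byte header, the payload
length is 4 (not 1) while sk_attrlen() is 0, so only the second form
correctly reports that no attributes were appended.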

Changes since rfc-v2
- dropped the NLM_F_DUMP_FILTERED flag from target nsid dumps per
  Jiri's objections
- changed the opt-in uapi from a netlink message flag to a socket
  flag. setsockopt provides an api for userspace to definitively
  know if the kernel supports strict checking on dumps.
- re-ordered patches to peel off the extack on dumps if needed to
  keep this set size within limits
- misc cleanups in patches based on testing

David Ahern (23):
  netlink: Pass extack to dump handlers
  netlink: Add extack message to nlmsg_parse for invalid header length
  net: Add extack to

[PATCH v2 net-next 20/23] net: Update netconf dump handlers for strict data checking

2018-10-07 Thread David Ahern

From: David Ahern 

Update inet_netconf_dump_devconf, inet6_netconf_dump_devconf, and
mpls_netconf_dump_devconf for strict data checking. If the flag is set,
the dump request is expected to have a netconfmsg struct as the header.
The struct only has the family member and no attributes can be appended.

Signed-off-by: David Ahern 
---
 net/ipv4/devinet.c  | 22 +++---
 net/ipv6/addrconf.c | 22 +++---
 net/mpls/af_mpls.c  | 18 +-
 3 files changed, 55 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 6f2bbd04e950..d122ebbe5980 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -2086,6 +2086,7 @@ static int inet_netconf_get_devconf(struct sk_buff *in_skb,
 static int inet_netconf_dump_devconf(struct sk_buff *skb,
 struct netlink_callback *cb)
 {
+   const struct nlmsghdr *nlh = cb->nlh;
struct net *net = sock_net(skb->sk);
int h, s_h;
int idx, s_idx;
@@ -2093,6 +2094,21 @@ static int inet_netconf_dump_devconf(struct sk_buff *skb,
struct in_device *in_dev;
struct hlist_head *head;
 
+   if (cb->strict_check) {
+   struct netlink_ext_ack *extack = cb->extack;
+   struct netconfmsg *ncm;
+
+   if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ncm))) {
+   NL_SET_ERR_MSG(extack, "ipv4: Invalid header for netconf dump request");
+   return -EINVAL;
+   }
+
+   if (nlmsg_attrlen(nlh, sizeof(*ncm))) {
+   NL_SET_ERR_MSG(extack, "ipv4: Invalid data after header in netconf dump request");
+   return -EINVAL;
+   }
+   }
+
s_h = cb->args[0];
s_idx = idx = cb->args[1];
 
@@ -2112,7 +2128,7 @@ static int inet_netconf_dump_devconf(struct sk_buff *skb,
if (inet_netconf_fill_devconf(skb, dev->ifindex,
  &in_dev->cnf,
  NETLINK_CB(cb->skb).portid,
- cb->nlh->nlmsg_seq,
+ nlh->nlmsg_seq,
  RTM_NEWNETCONF,
  NLM_F_MULTI,
  NETCONFA_ALL) < 0) {
@@ -2129,7 +2145,7 @@ static int inet_netconf_dump_devconf(struct sk_buff *skb,
if (inet_netconf_fill_devconf(skb, NETCONFA_IFINDEX_ALL,
  net->ipv4.devconf_all,
  NETLINK_CB(cb->skb).portid,
- cb->nlh->nlmsg_seq,
+ nlh->nlmsg_seq,
  RTM_NEWNETCONF, NLM_F_MULTI,
  NETCONFA_ALL) < 0)
goto done;
@@ -2140,7 +2156,7 @@ static int inet_netconf_dump_devconf(struct sk_buff *skb,
if (inet_netconf_fill_devconf(skb, NETCONFA_IFINDEX_DEFAULT,
  net->ipv4.devconf_dflt,
  NETLINK_CB(cb->skb).portid,
- cb->nlh->nlmsg_seq,
+ nlh->nlmsg_seq,
  RTM_NEWNETCONF, NLM_F_MULTI,
  NETCONFA_ALL) < 0)
goto done;
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index ce071d85ad00..2496b12bf721 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -666,6 +666,7 @@ static int inet6_netconf_get_devconf(struct sk_buff *in_skb,
 static int inet6_netconf_dump_devconf(struct sk_buff *skb,
  struct netlink_callback *cb)
 {
+   const struct nlmsghdr *nlh = cb->nlh;
struct net *net = sock_net(skb->sk);
int h, s_h;
int idx, s_idx;
@@ -673,6 +674,21 @@ static int inet6_netconf_dump_devconf(struct sk_buff *skb,
struct inet6_dev *idev;
struct hlist_head *head;
 
+   if (cb->strict_check) {
+   struct netlink_ext_ack *extack = cb->extack;
+   struct netconfmsg *ncm;
+
+   if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ncm))) {
+   NL_SET_ERR_MSG_MOD(extack, "Invalid header for netconf dump request");
+   return -EINVAL;
+   }
+
+   if (nlmsg_attrlen(nlh, sizeof(*ncm))) {
+   NL_SET_ERR_MSG_MOD(extack, "Invalid data after header in netconf dump request");
+   return -EINVAL;
+   }
+   }
+
s_h = cb->args[0];
s_idx = idx = cb->args[1];
 
@@ -692,7

[PATCH v2 net-next 03/23] net: Add extack to nlmsg_parse

2018-10-07 Thread David Ahern

From: David Ahern 

Make sure extack is passed to nlmsg_parse where easy to do so.
Most of these are dump handlers and leveraging the extack in
the netlink_callback.

Signed-off-by: David Ahern 
Acked-by: Christian Brauner 
---
 net/core/devlink.c | 2 +-
 net/core/neighbour.c   | 3 ++-
 net/core/rtnetlink.c   | 4 ++--
 net/ipv4/devinet.c | 9 +
 net/ipv6/addrconf.c| 2 +-
 net/ipv6/route.c   | 2 +-
 net/mpls/af_mpls.c | 2 +-
 net/netfilter/ipvs/ip_vs_ctl.c | 2 +-
 net/sched/act_api.c| 2 +-
 net/sched/cls_api.c| 6 --
 net/sched/sch_api.c| 2 +-
 net/xfrm/xfrm_user.c   | 2 +-
 12 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index 938f68ee92f0..6dae81d65d5c 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3504,7 +3504,7 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
start_offset = *((u64 *)&cb->args[0]);
 
err = nlmsg_parse(cb->nlh, GENL_HDRLEN + devlink_nl_family.hdrsize,
- attrs, DEVLINK_ATTR_MAX, ops->policy, NULL);
+ attrs, DEVLINK_ATTR_MAX, ops->policy, cb->extack);
if (err)
goto out;
 
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index fb023df48b83..b06f794bf91e 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2445,7 +2445,8 @@ static int neigh_dump_info(struct sk_buff *skb, struct netlink_callback *cb)
((struct ndmsg *)nlmsg_data(nlh))->ndm_flags == NTF_PROXY)
proxy = 1;
 
-   err = nlmsg_parse(nlh, sizeof(struct ndmsg), tb, NDA_MAX, NULL, NULL);
+   err = nlmsg_parse(nlh, sizeof(struct ndmsg), tb, NDA_MAX, NULL,
+ cb->extack);
if (!err) {
if (tb[NDA_IFINDEX]) {
if (nla_len(tb[NDA_IFINDEX]) != sizeof(u32))
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 5564eee1e980..4486e8b7d9d0 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1909,7 +1909,7 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
 sizeof(struct rtgenmsg) : sizeof(struct ifinfomsg);
 
if (nlmsg_parse(cb->nlh, hdrlen, tb, IFLA_MAX,
-   ifla_policy, NULL) >= 0) {
+   ifla_policy, cb->extack) >= 0) {
if (tb[IFLA_TARGET_NETNSID]) {
netnsid = nla_get_s32(tb[IFLA_TARGET_NETNSID]);
tgt_net = rtnl_get_net_ns_capable(skb->sk, netnsid);
@@ -3774,7 +3774,7 @@ static int rtnl_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb)
(nlmsg_len(cb->nlh) != sizeof(struct ndmsg) +
 nla_attr_size(sizeof(u32)))) {
err = nlmsg_parse(cb->nlh, sizeof(struct ifinfomsg), tb,
- IFLA_MAX, ifla_policy, NULL);
+ IFLA_MAX, ifla_policy, cb->extack);
if (err < 0) {
return -EINVAL;
} else if (err == 0) {
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 44d931a3cd50..ab2b11df5ea4 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -782,7 +782,8 @@ static void set_ifa_lifetime(struct in_ifaddr *ifa, __u32 valid_lft,
 }
 
 static struct in_ifaddr *rtm_to_ifaddr(struct net *net, struct nlmsghdr *nlh,
-  __u32 *pvalid_lft, __u32 *pprefered_lft)
+  __u32 *pvalid_lft, __u32 *pprefered_lft,
+  struct netlink_ext_ack *extack)
 {
struct nlattr *tb[IFA_MAX+1];
struct in_ifaddr *ifa;
@@ -792,7 +793,7 @@ static struct in_ifaddr *rtm_to_ifaddr(struct net *net, struct nlmsghdr *nlh,
int err;
 
err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFA_MAX, ifa_ipv4_policy,
- NULL);
+ extack);
if (err < 0)
goto errout;
 
@@ -897,7 +898,7 @@ static int inet_rtm_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh,
 
ASSERT_RTNL();
 
-   ifa = rtm_to_ifaddr(net, nlh, &valid_lft, &prefered_lft);
+   ifa = rtm_to_ifaddr(net, nlh, &valid_lft, &prefered_lft, extack);
if (IS_ERR(ifa))
return PTR_ERR(ifa);
 
@@ -1684,7 +1685,7 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
s_ip_idx = ip_idx = cb->args[2];
 
if (nlmsg_parse(cb->nlh, sizeof(struct ifaddrmsg), tb, IFA_MAX,
-   ifa_ipv4_policy, NULL) >= 0) {
+   ifa_ipv4_policy, cb->extack) >= 0) {
if (tb[IFA_TARGET_NETNSID]) {
fillargs.netnsid = nla_get_s32(tb[IFA_TARGET_NETNSID]);
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index a9a317322388..2f8aa4fd5e55 100644
--- a/net/ipv6/addrconf.c
+++

[PATCH v2 net-next 11/23] rtnetlink: Update rtnl_stats_dump for strict data checking

2018-10-07 Thread David Ahern

From: David Ahern 

Update rtnl_stats_dump for strict data checking. If the flag is set,
the dump request is expected to have an if_stats_msg struct as the header.
All elements of the struct are expected to be 0 except filter_mask which
must be non-0 (legacy behavior). No attributes are supported.

Signed-off-by: David Ahern 
---
 net/core/rtnetlink.c | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index e38e1f178611..f6d2609cfa9f 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -4680,6 +4680,7 @@ static int rtnl_stats_get(struct sk_buff *skb, struct nlmsghdr *nlh,
 
 static int rtnl_stats_dump(struct sk_buff *skb, struct netlink_callback *cb)
 {
+   struct netlink_ext_ack *extack = cb->extack;
int h, s_h, err, s_idx, s_idxattr, s_prividx;
struct net *net = sock_net(skb->sk);
unsigned int flags = NLM_F_MULTI;
@@ -4696,13 +4697,32 @@ static int rtnl_stats_dump(struct sk_buff *skb, struct netlink_callback *cb)
 
cb->seq = net->dev_base_seq;
 
-   if (nlmsg_len(cb->nlh) < sizeof(*ifsm))
+   if (nlmsg_len(cb->nlh) < sizeof(*ifsm)) {
+   NL_SET_ERR_MSG(extack, "Invalid header for stats dump");
return -EINVAL;
+   }
 
ifsm = nlmsg_data(cb->nlh);
+
+   /* only requests using NLM_F_DUMP_PROPER_HDR can pass data to
+* influence the dump. The legacy exception is filter_mask.
+*/
+   if (cb->strict_check) {
+   if (ifsm->pad1 || ifsm->pad2 || ifsm->ifindex) {
+   NL_SET_ERR_MSG(extack, "Invalid values in header for stats dump request");
+   return -EINVAL;
+   }
+   if (nlmsg_attrlen(cb->nlh, sizeof(*ifsm))) {
+   NL_SET_ERR_MSG(extack, "Invalid attributes after stats header");
+   return -EINVAL;
+   }
+   }
+
filter_mask = ifsm->filter_mask;
-   if (!filter_mask)
+   if (!filter_mask) {
+   NL_SET_ERR_MSG(extack, "Filter mask must be set for stats dump");
return -EINVAL;
+   }
 
for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) {
idx = 0;
-- 
2.11.0

[PATCH v2 net-next 09/23] rtnetlink: Update rtnl_dump_ifinfo for strict data checking

2018-10-07 Thread David Ahern

From: David Ahern 

Update rtnl_dump_ifinfo for strict data checking. If the flag is set,
the dump request is expected to have an ifinfomsg struct as the header
potentially followed by one or more attributes. Any data passed in the
header or as an attribute is taken as a request to influence the data
returned. Only values supported by the dump handler are allowed to be
non-0 or set in the request. At the moment only the IFLA_TARGET_NETNSID,
IFLA_EXT_MASK, IFLA_MASTER, and IFLA_LINKINFO attributes are supported.

Existing code does not fail the dump if nlmsg_parse fails. That behavior
is kept for non-strict checking.

Signed-off-by: David Ahern 
---
 net/core/rtnetlink.c | 113 +--
 1 file changed, 83 insertions(+), 30 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 4486e8b7d9d0..12fd52105005 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1878,8 +1878,52 @@ struct net *rtnl_get_net_ns_capable(struct sock *sk, int netnsid)
 }
 EXPORT_SYMBOL_GPL(rtnl_get_net_ns_capable);
 
+static int rtnl_valid_dump_ifinfo_req(const struct nlmsghdr *nlh,
+ bool strict_check, struct nlattr **tb,
+ struct netlink_ext_ack *extack)
+{
+   int hdrlen;
+
+   if (strict_check) {
+   struct ifinfomsg *ifm;
+
+   if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ifm))) {
+   NL_SET_ERR_MSG(extack, "Invalid header for link dump");
+   return -EINVAL;
+   }
+
+   ifm = nlmsg_data(nlh);
+   if (ifm->__ifi_pad || ifm->ifi_type || ifm->ifi_flags ||
+   ifm->ifi_change) {
+   NL_SET_ERR_MSG(extack, "Invalid values in header for link dump request");
+   return -EINVAL;
+   }
+   if (ifm->ifi_index) {
+   NL_SET_ERR_MSG(extack, "Filter by device index not supported for link dumps");
+   return -EINVAL;
+   }
+
+   return nlmsg_parse_strict(nlh, sizeof(*ifm), tb, IFLA_MAX,
+ ifla_policy, extack);
+   }
+
+   /* A hack to preserve kernel<->userspace interface.
+* The correct header is ifinfomsg. It is consistent with rtnl_getlink.
+* However, before Linux v3.9 the code here assumed rtgenmsg and that's
+* what iproute2 < v3.9.0 used.
+* We can detect the old iproute2. Even including the IFLA_EXT_MASK
+* attribute, its netlink message is shorter than struct ifinfomsg.
+*/
+   hdrlen = nlmsg_len(nlh) < sizeof(struct ifinfomsg) ?
+sizeof(struct rtgenmsg) : sizeof(struct ifinfomsg);
+
+   return nlmsg_parse(nlh, hdrlen, tb, IFLA_MAX, ifla_policy, extack);
+}
+
 static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
 {
+   struct netlink_ext_ack *extack = cb->extack;
+   const struct nlmsghdr *nlh = cb->nlh;
struct net *net = sock_net(skb->sk);
struct net *tgt_net = net;
int h, s_h;
@@ -1892,44 +1936,54 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
unsigned int flags = NLM_F_MULTI;
int master_idx = 0;
int netnsid = -1;
-   int err;
-   int hdrlen;
+   int err, i;
 
s_h = cb->args[0];
s_idx = cb->args[1];
 
-   /* A hack to preserve kernel<->userspace interface.
-* The correct header is ifinfomsg. It is consistent with rtnl_getlink.
-* However, before Linux v3.9 the code here assumed rtgenmsg and that's
-* what iproute2 < v3.9.0 used.
-* We can detect the old iproute2. Even including the IFLA_EXT_MASK
-* attribute, its netlink message is shorter than struct ifinfomsg.
-*/
-   hdrlen = nlmsg_len(cb->nlh) < sizeof(struct ifinfomsg) ?
-sizeof(struct rtgenmsg) : sizeof(struct ifinfomsg);
+   err = rtnl_valid_dump_ifinfo_req(nlh, cb->strict_check, tb, extack);
+   if (err < 0) {
+   if (cb->strict_check)
+   return err;
+
+   goto walk_entries;
+   }
+
+   for (i = 0; i <= IFLA_MAX; ++i) {
+   if (!tb[i])
+   continue;
 
-   if (nlmsg_parse(cb->nlh, hdrlen, tb, IFLA_MAX,
-   ifla_policy, cb->extack) >= 0) {
-   if (tb[IFLA_TARGET_NETNSID]) {
-   netnsid = nla_get_s32(tb[IFLA_TARGET_NETNSID]);
+   /* new attributes should only be added with strict checking */
+   switch (i) {
+   case IFLA_TARGET_NETNSID:
+   netnsid = nla_get_s32(tb[i]);
tgt_net = rtnl_get_net_ns_capable(skb->sk, netnsid);
-   if (IS_ERR(tgt_net))
+   if (IS_ERR(tgt_net)) {
+

[PATCH v2 net-next 18/23] net/fib_rules: Update fib_nl_dumprule for strict data checking

2018-10-07 Thread David Ahern

From: David Ahern 

Update fib_nl_dumprule for strict data checking. If the flag is set,
the dump request is expected to have fib_rule_hdr struct as the header.
All elements of the struct are expected to be 0 and no attributes can
be appended.

Signed-off-by: David Ahern 
---
 net/core/fib_rules.c | 36 +++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 0ff3953f64aa..ffbb827723a2 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -1063,13 +1063,47 @@ static int dump_rules(struct sk_buff *skb, struct netlink_callback *cb,
return err;
 }
 
+static int fib_valid_dumprule_req(const struct nlmsghdr *nlh,
+  struct netlink_ext_ack *extack)
+{
+   struct fib_rule_hdr *frh;
+
+   if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*frh))) {
+   NL_SET_ERR_MSG(extack, "Invalid header for fib rule dump request");
+   return -EINVAL;
+   }
+
+   frh = nlmsg_data(nlh);
+   if (frh->dst_len || frh->src_len || frh->tos || frh->table ||
+   frh->res1 || frh->res2 || frh->action || frh->flags) {
+   NL_SET_ERR_MSG(extack,
+  "Invalid values in header for fib rule dump request");
+   return -EINVAL;
+   }
+
+   if (nlmsg_attrlen(nlh, sizeof(*frh))) {
+   NL_SET_ERR_MSG(extack, "Invalid data after header in fib rule dump request");
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
 static int fib_nl_dumprule(struct sk_buff *skb, struct netlink_callback *cb)
 {
+   const struct nlmsghdr *nlh = cb->nlh;
struct net *net = sock_net(skb->sk);
struct fib_rules_ops *ops;
int idx = 0, family;
 
-   family = rtnl_msg_family(cb->nlh);
+   if (cb->strict_check) {
+   int err = fib_valid_dumprule_req(nlh, cb->extack);
+
+   if (err < 0)
+   return err;
+   }
+
+   family = rtnl_msg_family(nlh);
if (family != AF_UNSPEC) {
/* Protocol specific dump request */
ops = lookup_rules_ops(net, family);
-- 
2.11.0

[PATCH v2 net-next 16/23] net/neighbor: Update neightbl_dump_info for strict data checking

2018-10-07 Thread David Ahern

From: David Ahern 

Update neightbl_dump_info for strict data checking. If the flag is set,
the dump request is expected to have an ndtmsg struct as the header.
All elements of the struct are expected to be 0 and no attributes can
be appended.

Signed-off-by: David Ahern 
---
 net/core/neighbour.c | 38 +++---
 1 file changed, 35 insertions(+), 3 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 7c8a3a0ee059..dc1389b8beb1 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2164,15 +2164,47 @@ static int neightbl_set(struct sk_buff *skb, struct nlmsghdr *nlh,
return err;
 }
 
+static int neightbl_valid_dump_info(const struct nlmsghdr *nlh,
+   struct netlink_ext_ack *extack)
+{
+   struct ndtmsg *ndtm;
+
+   if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ndtm))) {
+   NL_SET_ERR_MSG(extack, "Invalid header for neighbor table dump request");
+   return -EINVAL;
+   }
+
+   ndtm = nlmsg_data(nlh);
+   if (ndtm->ndtm_pad1  || ndtm->ndtm_pad2) {
+   NL_SET_ERR_MSG(extack, "Invalid values in header for neighbor table dump request");
+   return -EINVAL;
+   }
+
+   if (nlmsg_attrlen(nlh, sizeof(*ndtm))) {
+   NL_SET_ERR_MSG(extack, "Invalid data after header in neighbor table dump request");
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
 static int neightbl_dump_info(struct sk_buff *skb, struct netlink_callback *cb)
 {
+   const struct nlmsghdr *nlh = cb->nlh;
struct net *net = sock_net(skb->sk);
int family, tidx, nidx = 0;
int tbl_skip = cb->args[0];
int neigh_skip = cb->args[1];
struct neigh_table *tbl;
 
-   family = ((struct rtgenmsg *) nlmsg_data(cb->nlh))->rtgen_family;
+   if (cb->strict_check) {
+   int err = neightbl_valid_dump_info(nlh, cb->extack);
+
+   if (err < 0)
+   return err;
+   }
+
+   family = ((struct rtgenmsg *)nlmsg_data(nlh))->rtgen_family;
 
for (tidx = 0; tidx < NEIGH_NR_TABLES; tidx++) {
struct neigh_parms *p;
@@ -2185,7 +2217,7 @@ static int neightbl_dump_info(struct sk_buff *skb, struct netlink_callback *cb)
continue;
 
if (neightbl_fill_info(skb, tbl, NETLINK_CB(cb->skb).portid,
-  cb->nlh->nlmsg_seq, RTM_NEWNEIGHTBL,
+  nlh->nlmsg_seq, RTM_NEWNEIGHTBL,
   NLM_F_MULTI) < 0)
break;
 
@@ -2200,7 +2232,7 @@ static int neightbl_dump_info(struct sk_buff *skb, struct netlink_callback *cb)
 
if (neightbl_fill_param_info(skb, tbl, p,
 NETLINK_CB(cb->skb).portid,
-cb->nlh->nlmsg_seq,
+nlh->nlmsg_seq,
 RTM_NEWNEIGHTBL,
 NLM_F_MULTI) < 0)
goto out;
-- 
2.11.0

[PATCH v2 net-next 10/23] rtnetlink: Update rtnl_bridge_getlink for strict data checking

2018-10-07 Thread David Ahern

From: David Ahern 

Update rtnl_bridge_getlink for strict data checking. If the flag is set,
the dump request is expected to have an ifinfomsg struct as the header
potentially followed by one or more attributes. Any data passed in the
header or as an attribute is taken as a request to influence the data
returned. Only values supported by the dump handler are allowed to be
non-0 or set in the request. At the moment only the IFLA_EXT_MASK
attribute is supported.

Signed-off-by: David Ahern 
---
 net/core/rtnetlink.c | 70 ++--
 1 file changed, 57 insertions(+), 13 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 12fd52105005..e38e1f178611 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -4021,28 +4021,72 @@ int ndo_dflt_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
 }
 EXPORT_SYMBOL_GPL(ndo_dflt_bridge_getlink);
 
+static int valid_bridge_getlink_req(const struct nlmsghdr *nlh,
+   bool strict_check, u32 *filter_mask,
+   struct netlink_ext_ack *extack)
+{
+   struct nlattr *tb[IFLA_MAX+1];
+   int err, i;
+
+   if (strict_check) {
+   struct ifinfomsg *ifm;
+
+   if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ifm))) {
+   NL_SET_ERR_MSG(extack, "Invalid header for bridge link dump");
+   return -EINVAL;
+   }
+
+   ifm = nlmsg_data(nlh);
+   if (ifm->__ifi_pad || ifm->ifi_type || ifm->ifi_flags ||
+   ifm->ifi_change || ifm->ifi_index) {
+   NL_SET_ERR_MSG(extack, "Invalid values in header for bridge link dump request");
+   return -EINVAL;
+   }
+
+   err = nlmsg_parse_strict(nlh, sizeof(struct ifinfomsg), tb,
+IFLA_MAX, ifla_policy, extack);
+   } else {
+   err = nlmsg_parse(nlh, sizeof(struct ifinfomsg), tb,
+ IFLA_MAX, ifla_policy, extack);
+   }
+   if (err < 0)
+   return err;
+
+   /* new attributes should only be added with strict checking */
+   for (i = 0; i <= IFLA_MAX; ++i) {
+   if (!tb[i])
+   continue;
+
+   switch (i) {
+   case IFLA_EXT_MASK:
+   *filter_mask = nla_get_u32(tb[i]);
+   break;
+   default:
+   if (strict_check) {
+   NL_SET_ERR_MSG(extack, "Unsupported attribute in bridge link dump request");
+   return -EINVAL;
+   }
+   }
+   }
+
+   return 0;
+}
+
 static int rtnl_bridge_getlink(struct sk_buff *skb, struct netlink_callback *cb)
 {
+   const struct nlmsghdr *nlh = cb->nlh;
struct net *net = sock_net(skb->sk);
struct net_device *dev;
int idx = 0;
u32 portid = NETLINK_CB(cb->skb).portid;
-   u32 seq = cb->nlh->nlmsg_seq;
+   u32 seq = nlh->nlmsg_seq;
u32 filter_mask = 0;
int err;
 
-   if (nlmsg_len(cb->nlh) > sizeof(struct ifinfomsg)) {
-   struct nlattr *extfilt;
-
-   extfilt = nlmsg_find_attr(cb->nlh, sizeof(struct ifinfomsg),
- IFLA_EXT_MASK);
-   if (extfilt) {
-   if (nla_len(extfilt) < sizeof(filter_mask))
-   return -EINVAL;
-
-   filter_mask = nla_get_u32(extfilt);
-   }
-   }
+   err = valid_bridge_getlink_req(nlh, cb->strict_check, &filter_mask,
+  cb->extack);
+   if (err < 0 && cb->strict_check)
+   return err;
 
rcu_read_lock();
for_each_netdev_rcu(net, dev) {
-- 
2.11.0

[PATCH v2 net-next 05/23] net/ipv6: Refactor address dump to push inet6_fill_args to in6_dump_addrs

2018-10-07 Thread David Ahern

From: David Ahern 

Pull the inet6_fill_args arg up to in6_dump_addrs and move netnsid
into it.

Signed-off-by: David Ahern 
Acked-by: Christian Brauner 
---
 net/ipv6/addrconf.c | 57 -
 1 file changed, 30 insertions(+), 27 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 2f8aa4fd5e55..afa279170ba5 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4793,12 +4793,19 @@ static inline int inet6_ifaddr_msgsize(void)
   + nla_total_size(4)  /* IFA_RT_PRIORITY */;
 }
 
+enum addr_type_t {
+   UNICAST_ADDR,
+   MULTICAST_ADDR,
+   ANYCAST_ADDR,
+};
+
 struct inet6_fill_args {
u32 portid;
u32 seq;
int event;
unsigned int flags;
int netnsid;
+   enum addr_type_t type;
 };
 
 static int inet6_fill_ifaddr(struct sk_buff *skb, struct inet6_ifaddr *ifa,
@@ -4930,39 +4937,28 @@ static int inet6_fill_ifacaddr(struct sk_buff *skb, struct ifacaddr6 *ifaca,
return 0;
 }
 
-enum addr_type_t {
-   UNICAST_ADDR,
-   MULTICAST_ADDR,
-   ANYCAST_ADDR,
-};
-
 /* called with rcu_read_lock() */
 static int in6_dump_addrs(struct inet6_dev *idev, struct sk_buff *skb,
- struct netlink_callback *cb, enum addr_type_t type,
- int s_ip_idx, int *p_ip_idx, int netnsid)
+ struct netlink_callback *cb,
+ int s_ip_idx, int *p_ip_idx,
+ struct inet6_fill_args *fillargs)
 {
-   struct inet6_fill_args fillargs = {
-   .portid = NETLINK_CB(cb->skb).portid,
-   .seq = cb->nlh->nlmsg_seq,
-   .flags = NLM_F_MULTI,
-   .netnsid = netnsid,
-   };
struct ifmcaddr6 *ifmca;
struct ifacaddr6 *ifaca;
int err = 1;
int ip_idx = *p_ip_idx;
 
read_lock_bh(&idev->lock);
-   switch (type) {
+   switch (fillargs->type) {
case UNICAST_ADDR: {
struct inet6_ifaddr *ifa;
-   fillargs.event = RTM_NEWADDR;
+   fillargs->event = RTM_NEWADDR;
 
/* unicast address incl. temp addr */
list_for_each_entry(ifa, &idev->addr_list, if_list) {
if (++ip_idx < s_ip_idx)
continue;
-   err = inet6_fill_ifaddr(skb, ifa, &fillargs);
+   err = inet6_fill_ifaddr(skb, ifa, fillargs);
if (err < 0)
break;
nl_dump_check_consistent(cb, nlmsg_hdr(skb));
@@ -4970,26 +4966,26 @@ static int in6_dump_addrs(struct inet6_dev *idev, struct sk_buff *skb,
break;
}
case MULTICAST_ADDR:
-   fillargs.event = RTM_GETMULTICAST;
+   fillargs->event = RTM_GETMULTICAST;
 
/* multicast address */
for (ifmca = idev->mc_list; ifmca;
 ifmca = ifmca->next, ip_idx++) {
if (ip_idx < s_ip_idx)
continue;
-   err = inet6_fill_ifmcaddr(skb, ifmca, &fillargs);
+   err = inet6_fill_ifmcaddr(skb, ifmca, fillargs);
if (err < 0)
break;
}
break;
case ANYCAST_ADDR:
-   fillargs.event = RTM_GETANYCAST;
+   fillargs->event = RTM_GETANYCAST;
/* anycast address */
for (ifaca = idev->ac_list; ifaca;
 ifaca = ifaca->aca_next, ip_idx++) {
if (ip_idx < s_ip_idx)
continue;
-   err = inet6_fill_ifacaddr(skb, ifaca, &fillargs);
+   err = inet6_fill_ifacaddr(skb, ifaca, fillargs);
if (err < 0)
break;
}
@@ -5005,10 +5001,16 @@ static int in6_dump_addrs(struct inet6_dev *idev, 
struct sk_buff *skb,
 static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb,
   enum addr_type_t type)
 {
+   struct inet6_fill_args fillargs = {
+   .portid = NETLINK_CB(cb->skb).portid,
+   .seq = cb->nlh->nlmsg_seq,
+   .flags = NLM_F_MULTI,
+   .netnsid = -1,
+   .type = type,
+   };
struct net *net = sock_net(skb->sk);
struct nlattr *tb[IFA_MAX+1];
struct net *tgt_net = net;
-   int netnsid = -1;
int h, s_h;
int idx, ip_idx;
int s_idx, s_ip_idx;
@@ -5023,9 +5025,10 @@ static int inet6_dump_addr(struct sk_buff *skb, struct 
netlink_callback *cb,
if (nlmsg_parse(cb->nlh, sizeof(struct ifaddrmsg), tb, IFA_MAX,
ifa_ipv6_policy, cb->extack) >= 0) {
if (tb[IFA_TARGET_NETNSID]) {
-   netnsid =

[PATCH v2 net-next 12/23] rtnetlink: Update inet6_dump_ifinfo for strict data checking

2018-10-07 Thread David Ahern

From: David Ahern 

Update inet6_dump_ifinfo for strict data checking. If the flag is
set, the dump request is expected to have an ifinfomsg struct as
the header. All elements of the struct are expected to be 0 and no
attributes can be appended.

Signed-off-by: David Ahern 
---
 net/ipv6/addrconf.c | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 095d3f56f0a9..ce071d85ad00 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -5644,6 +5644,31 @@ static int inet6_fill_ifinfo(struct sk_buff *skb, struct 
inet6_dev *idev,
return -EMSGSIZE;
 }
 
+static int inet6_valid_dump_ifinfo(const struct nlmsghdr *nlh,
+  struct netlink_ext_ack *extack)
+{
+   struct ifinfomsg *ifm;
+
+   if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ifm))) {
+   NL_SET_ERR_MSG_MOD(extack, "Invalid header for link dump 
request");
+   return -EINVAL;
+   }
+
+   if (nlmsg_attrlen(nlh, sizeof(*ifm))) {
+   NL_SET_ERR_MSG_MOD(extack, "Invalid data after header");
+   return -EINVAL;
+   }
+
+   ifm = nlmsg_data(nlh);
+   if (ifm->__ifi_pad || ifm->ifi_type || ifm->ifi_flags ||
+   ifm->ifi_change || ifm->ifi_index) {
+   NL_SET_ERR_MSG_MOD(extack, "Invalid values in header for dump 
request");
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
 static int inet6_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
 {
struct net *net = sock_net(skb->sk);
@@ -5653,6 +5678,16 @@ static int inet6_dump_ifinfo(struct sk_buff *skb, struct 
netlink_callback *cb)
struct inet6_dev *idev;
struct hlist_head *head;
 
+   /* only requests using strict checking can pass data to
+* influence the dump
+*/
+   if (cb->strict_check) {
+   int err = inet6_valid_dump_ifinfo(cb->nlh, cb->extack);
+
+   if (err < 0)
+   return err;
+   }
+
s_h = cb->args[0];
s_idx = cb->args[1];
 
-- 
2.11.0
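
For readers who want to poke at the three checks outside the kernel, here
is a minimal userspace sketch of what inet6_valid_dump_ifinfo() does. The
demo_* structs and the verdict enum are hypothetical stand-ins, not kernel
API; only the field layout is meant to mirror nlmsghdr and ifinfomsg.

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for struct nlmsghdr and struct ifinfomsg.
 * All demo_* names are hypothetical and exist only for this sketch. */
struct demo_nlmsghdr {
	uint32_t nlmsg_len;	/* total message length, header included */
	uint16_t nlmsg_type;
	uint16_t nlmsg_flags;
	uint32_t nlmsg_seq;
	uint32_t nlmsg_pid;
};

struct demo_ifinfomsg {
	uint8_t  ifi_family;
	uint8_t  ifi_pad;
	uint16_t ifi_type;
	int32_t  ifi_index;
	uint32_t ifi_flags;
	uint32_t ifi_change;
};

#define DEMO_ALIGN(len)	(((len) + 3u) & ~3u)
#define DEMO_HDRLEN	DEMO_ALIGN(sizeof(struct demo_nlmsghdr))

enum demo_verdict {
	DEMO_OK,
	DEMO_SHORT_HEADER,	/* "Invalid header for link dump request" */
	DEMO_TRAILING_DATA,	/* "Invalid data after header" */
	DEMO_NONZERO_FIELD,	/* "Invalid values in header for dump request" */
};

/* Same three checks, in the same order, as the helper in the patch. */
static enum demo_verdict demo_valid_dump_ifinfo(const unsigned char *msg)
{
	struct demo_nlmsghdr nlh;
	struct demo_ifinfomsg ifm;

	memcpy(&nlh, msg, sizeof(nlh));
	if (nlh.nlmsg_len < DEMO_HDRLEN + sizeof(ifm))
		return DEMO_SHORT_HEADER;

	/* Anything beyond the aligned header counts as appended attributes. */
	if (nlh.nlmsg_len != DEMO_HDRLEN + DEMO_ALIGN(sizeof(ifm)))
		return DEMO_TRAILING_DATA;

	memcpy(&ifm, msg + DEMO_HDRLEN, sizeof(ifm));
	if (ifm.ifi_pad || ifm.ifi_type || ifm.ifi_flags ||
	    ifm.ifi_change || ifm.ifi_index)
		return DEMO_NONZERO_FIELD;

	return DEMO_OK;
}

/* Write a request into buf and claim `extra` bytes after the header.
 * buf must hold at least 32 bytes. */
static void demo_build_request(unsigned char *buf,
			       const struct demo_ifinfomsg *ifm,
			       uint32_t extra)
{
	struct demo_nlmsghdr nlh = { 0 };

	nlh.nlmsg_len = (uint32_t)(DEMO_HDRLEN + sizeof(*ifm)) + extra;
	memcpy(buf, &nlh, sizeof(nlh));
	memcpy(buf + DEMO_HDRLEN, ifm, sizeof(*ifm));
}
```

Note that ifi_family is deliberately not part of the zero check, matching
the patch.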

[PATCH v2 net-next 14/23] rtnetlink: Update fib dumps for strict data checking

2018-10-07 Thread David Ahern

From: David Ahern 

Add helper to check netlink message for route dumps. If the strict flag
is set the dump request is expected to have an rtmsg struct as the header.
All elements of the struct are expected to be 0 with the exception of
rtm_flags (which is used by both ipv4 and ipv6 dumps) and no attributes
can be appended. rtm_flags can only have RTM_F_CLONED and RTM_F_PREFIX
set.

Update inet_dump_fib, inet6_dump_fib, mpls_dump_routes, ipmr_rtm_dumproute,
and ip6mr_rtm_dumproute to call this helper if strict data checking is
enabled.

Signed-off-by: David Ahern 
---
 include/net/ip_fib.h|  2 ++
 net/ipv4/fib_frontend.c | 42 --
 net/ipv4/ipmr.c |  7 +++
 net/ipv6/ip6_fib.c  |  8 
 net/ipv6/ip6mr.c|  9 +
 net/mpls/af_mpls.c  |  8 
 6 files changed, 74 insertions(+), 2 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index f7c109e37298..9846b79c9ee1 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -452,4 +452,6 @@ static inline void fib_proc_exit(struct net *net)
 
 u32 ip_mtu_from_fib_result(struct fib_result *res, __be32 daddr);
 
+int ip_valid_fib_dump_req(const struct nlmsghdr *nlh,
+ struct netlink_ext_ack *extack);
 #endif  /* _NET_FIB_H */
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 30e2bcc3ef2a..038f511c73fa 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -802,8 +802,40 @@ static int inet_rtm_newroute(struct sk_buff *skb, struct 
nlmsghdr *nlh,
return err;
 }
 
+int ip_valid_fib_dump_req(const struct nlmsghdr *nlh,
+ struct netlink_ext_ack *extack)
+{
+   struct rtmsg *rtm;
+
+   if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*rtm))) {
+   NL_SET_ERR_MSG(extack, "Invalid header for FIB dump request");
+   return -EINVAL;
+   }
+
+   rtm = nlmsg_data(nlh);
+   if (rtm->rtm_dst_len || rtm->rtm_src_len  || rtm->rtm_tos   ||
+   rtm->rtm_table   || rtm->rtm_protocol || rtm->rtm_scope ||
+   rtm->rtm_type) {
+   NL_SET_ERR_MSG(extack, "Invalid values in header for FIB dump 
request");
+   return -EINVAL;
+   }
+   if (rtm->rtm_flags & ~(RTM_F_CLONED | RTM_F_PREFIX)) {
+   NL_SET_ERR_MSG(extack, "Invalid flags for FIB dump request");
+   return -EINVAL;
+   }
+
+   if (nlmsg_attrlen(nlh, sizeof(*rtm))) {
+   NL_SET_ERR_MSG(extack, "Invalid data after header in FIB dump 
request");
+   return -EINVAL;
+   }
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(ip_valid_fib_dump_req);
+
 static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 {
+   const struct nlmsghdr *nlh = cb->nlh;
struct net *net = sock_net(skb->sk);
unsigned int h, s_h;
unsigned int e = 0, s_e;
@@ -811,8 +843,14 @@ static int inet_dump_fib(struct sk_buff *skb, struct 
netlink_callback *cb)
struct hlist_head *head;
int dumped = 0, err;
 
-   if (nlmsg_len(cb->nlh) >= sizeof(struct rtmsg) &&
-   ((struct rtmsg *) nlmsg_data(cb->nlh))->rtm_flags & RTM_F_CLONED)
+   if (cb->strict_check) {
+   err = ip_valid_fib_dump_req(nlh, cb->extack);
+   if (err < 0)
+   return err;
+   }
+
+   if (nlmsg_len(nlh) >= sizeof(struct rtmsg) &&
+   ((struct rtmsg *)nlmsg_data(nlh))->rtm_flags & RTM_F_CLONED)
return skb->len;
 
s_h = cb->args[0];
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index e7322e407bb4..91b0d5671649 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -2527,6 +2527,13 @@ static int ipmr_rtm_getroute(struct sk_buff *in_skb, 
struct nlmsghdr *nlh,
 
 static int ipmr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb)
 {
+   if (cb->strict_check) {
+   int err = ip_valid_fib_dump_req(cb->nlh, cb->extack);
+
+   if (err < 0)
+   return err;
+   }
+
return mr_rtm_dumproute(skb, cb, ipmr_mr_table_iter,
_ipmr_fill_mroute, &mfc_unres_lock);
 }
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index cf709eadc932..e14d244c551f 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -564,6 +564,7 @@ static int fib6_dump_table(struct fib6_table *table, struct 
sk_buff *skb,
 
 static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 {
+   const struct nlmsghdr *nlh = cb->nlh;
struct net *net = sock_net(skb->sk);
unsigned int h, s_h;
unsigned int e = 0, s_e;
@@ -573,6 +574,13 @@ static int inet6_dump_fib(struct sk_buff *skb, struct 
netlink_callback *cb)
struct hlist_head *head;
int res = 0;
 
+   if (cb->strict_check) {
+   int err = ip_valid_fib_dump_req(nlh, cb->extack);
+
+   if (err < 0)
+
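
The rtm_flags rule from the commit message ("can only have RTM_F_CLONED
and RTM_F_PREFIX set") is a plain allow-mask test. A minimal sketch, with
flag values as I recall them from the uapi header (treat them as
assumptions; the DEMO_ names are hypothetical):

```c
#include <stdint.h>

/* Assumed to match RTM_F_CLONED and RTM_F_PREFIX; not taken from the patch. */
#define DEMO_RTM_F_CLONED 0x200u
#define DEMO_RTM_F_PREFIX 0x800u

/* Returns 1 when no bit outside the allowed set is present, else 0. */
static int demo_fib_dump_flags_ok(uint32_t rtm_flags)
{
	return (rtm_flags & ~(DEMO_RTM_F_CLONED | DEMO_RTM_F_PREFIX)) == 0;
}
```

Clearing the allowed bits and testing for a non-zero remainder rejects any
unknown flag, including ones added to the uapi later.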

[PATCH v2 net-next 19/23] net/ipv6: Update ip6addrlbl_dump for strict data checking

2018-10-07 Thread David Ahern

From: David Ahern 

Update ip6addrlbl_dump for strict data checking. If the flag is set,
the dump request is expected to have an ifaddrlblmsg struct as the
header. All elements of the struct are expected to be 0 and no
attributes can be appended.

Signed-off-by: David Ahern 
---
 net/ipv6/addrlabel.c | 34 +-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/addrlabel.c b/net/ipv6/addrlabel.c
index 1d6ced37ad71..0d1ee82ee55b 100644
--- a/net/ipv6/addrlabel.c
+++ b/net/ipv6/addrlabel.c
@@ -458,20 +458,52 @@ static int ip6addrlbl_fill(struct sk_buff *skb,
return 0;
 }
 
+static int ip6addrlbl_valid_dump_req(const struct nlmsghdr *nlh,
+struct netlink_ext_ack *extack)
+{
+   struct ifaddrlblmsg *ifal;
+
+   if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ifal))) {
+   NL_SET_ERR_MSG_MOD(extack, "Invalid header for address label 
dump request");
+   return -EINVAL;
+   }
+
+   ifal = nlmsg_data(nlh);
+   if (ifal->__ifal_reserved || ifal->ifal_prefixlen ||
+   ifal->ifal_flags || ifal->ifal_index || ifal->ifal_seq) {
+   NL_SET_ERR_MSG_MOD(extack, "Invalid values in header for 
address label dump request");
+   return -EINVAL;
+   }
+
+   if (nlmsg_attrlen(nlh, sizeof(*ifal))) {
+   NL_SET_ERR_MSG_MOD(extack, "Invalid data after header for address label dump request");
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
 static int ip6addrlbl_dump(struct sk_buff *skb, struct netlink_callback *cb)
 {
+   const struct nlmsghdr *nlh = cb->nlh;
struct net *net = sock_net(skb->sk);
struct ip6addrlbl_entry *p;
int idx = 0, s_idx = cb->args[0];
int err;
 
+   if (cb->strict_check) {
+   err = ip6addrlbl_valid_dump_req(nlh, cb->extack);
+   if (err < 0)
+   return err;
+   }
+
rcu_read_lock();
hlist_for_each_entry_rcu(p, &net->ipv6.ip6addrlbl_table.head, list) {
if (idx >= s_idx) {
err = ip6addrlbl_fill(skb, p,
  net->ipv6.ip6addrlbl_table.seq,
  NETLINK_CB(cb->skb).portid,
- cb->nlh->nlmsg_seq,
+ nlh->nlmsg_seq,
  RTM_NEWADDRLABEL,
  NLM_F_MULTI);
if (err < 0)
-- 
2.11.0

[PATCH v2 net-next 22/23] rtnetlink: Move input checking for rtnl_fdb_dump to helper

2018-10-07 Thread David Ahern

From: David Ahern 

Move the existing input checking for rtnl_fdb_dump into a helper,
valid_fdb_dump_legacy. This function will retain the current
logic that works around the 2 headers that userspace has been
allowed to send up to this point.

Signed-off-by: David Ahern 
---
 net/core/rtnetlink.c | 53 
 1 file changed, 33 insertions(+), 20 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index f6d2609cfa9f..c7509c789fb6 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3799,22 +3799,13 @@ int ndo_dflt_fdb_dump(struct sk_buff *skb,
 }
 EXPORT_SYMBOL(ndo_dflt_fdb_dump);
 
-static int rtnl_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb)
+static int valid_fdb_dump_legacy(const struct nlmsghdr *nlh,
+int *br_idx, int *brport_idx,
+struct netlink_ext_ack *extack)
 {
-   struct net_device *dev;
+   struct ifinfomsg *ifm = nlmsg_data(nlh);
struct nlattr *tb[IFLA_MAX+1];
-   struct net_device *br_dev = NULL;
-   const struct net_device_ops *ops = NULL;
-   const struct net_device_ops *cops = NULL;
-   struct ifinfomsg *ifm = nlmsg_data(cb->nlh);
-   struct net *net = sock_net(skb->sk);
-   struct hlist_head *head;
-   int brport_idx = 0;
-   int br_idx = 0;
-   int h, s_h;
-   int idx = 0, s_idx;
-   int err = 0;
-   int fidx = 0;
+   int err;
 
/* A hack to preserve kernel<->userspace interface.
 * Before Linux v4.12 this code accepted ndmsg since iproute2 v3.3.0.
@@ -3823,20 +3814,42 @@ static int rtnl_fdb_dump(struct sk_buff *skb, struct 
netlink_callback *cb)
 * Fortunately these sizes don't conflict with the size of ifinfomsg
 * with an optional attribute.
 */
-   if (nlmsg_len(cb->nlh) != sizeof(struct ndmsg) &&
-   (nlmsg_len(cb->nlh) != sizeof(struct ndmsg) +
+   if (nlmsg_len(nlh) != sizeof(struct ndmsg) &&
+   (nlmsg_len(nlh) != sizeof(struct ndmsg) +
 nla_attr_size(sizeof(u32)))) {
-   err = nlmsg_parse(cb->nlh, sizeof(struct ifinfomsg), tb,
- IFLA_MAX, ifla_policy, cb->extack);
+   err = nlmsg_parse(nlh, sizeof(struct ifinfomsg), tb, IFLA_MAX,
+ ifla_policy, extack);
if (err < 0) {
return -EINVAL;
} else if (err == 0) {
if (tb[IFLA_MASTER])
-   br_idx = nla_get_u32(tb[IFLA_MASTER]);
+   *br_idx = nla_get_u32(tb[IFLA_MASTER]);
}
 
-   brport_idx = ifm->ifi_index;
+   *brport_idx = ifm->ifi_index;
}
+   return 0;
+}
+
+static int rtnl_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb)
+{
+   struct net_device *dev;
+   struct net_device *br_dev = NULL;
+   const struct net_device_ops *ops = NULL;
+   const struct net_device_ops *cops = NULL;
+   struct net *net = sock_net(skb->sk);
+   struct hlist_head *head;
+   int brport_idx = 0;
+   int br_idx = 0;
+   int h, s_h;
+   int idx = 0, s_idx;
+   int err = 0;
+   int fidx = 0;
+
+   err = valid_fdb_dump_legacy(cb->nlh, &br_idx, &brport_idx,
+   cb->extack);
+   if (err < 0)
+   return err;
 
if (br_idx) {
br_dev = __dev_get_by_index(net, br_idx);
-- 
2.11.0

[PATCH v2 net-next 23/23] rtnetlink: Update rtnl_fdb_dump for strict data checking

2018-10-07 Thread David Ahern

From: David Ahern 

Update rtnl_fdb_dump for strict data checking. If the flag is set,
the dump request is expected to have an ndmsg struct as the header
potentially followed by one or more attributes. Any data passed in the
header or as an attribute is taken as a request to influence the data
returned. Only values supported by the dump handler are allowed to be
non-0 or set in the request. At the moment only the NDA_IFINDEX and
NDA_MASTER attributes are supported.

Signed-off-by: David Ahern 
---
 net/core/rtnetlink.c | 62 ++--
 1 file changed, 60 insertions(+), 2 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index c7509c789fb6..c894c4af8981 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3799,6 +3799,60 @@ int ndo_dflt_fdb_dump(struct sk_buff *skb,
 }
 EXPORT_SYMBOL(ndo_dflt_fdb_dump);
 
+static int valid_fdb_dump_strict(const struct nlmsghdr *nlh,
+int *br_idx, int *brport_idx,
+struct netlink_ext_ack *extack)
+{
+   struct nlattr *tb[NDA_MAX + 1];
+   struct ndmsg *ndm;
+   int err, i;
+
+   if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ndm))) {
+   NL_SET_ERR_MSG(extack, "Invalid header for fdb dump request");
+   return -EINVAL;
+   }
+
+   ndm = nlmsg_data(nlh);
+   if (ndm->ndm_pad1  || ndm->ndm_pad2  || ndm->ndm_state ||
+   ndm->ndm_flags || ndm->ndm_type) {
+   NL_SET_ERR_MSG(extack, "Invalid values in header for fdb dump request");
+   return -EINVAL;
+   }
+
+   err = nlmsg_parse_strict(nlh, sizeof(struct ndmsg), tb, NDA_MAX,
+NULL, extack);
+   if (err < 0)
+   return err;
+
+   *brport_idx = ndm->ndm_ifindex;
+   for (i = 0; i <= NDA_MAX; ++i) {
+   if (!tb[i])
+   continue;
+
+   switch (i) {
+   case NDA_IFINDEX:
+   if (nla_len(tb[i]) != sizeof(u32)) {
+   NL_SET_ERR_MSG(extack, "Invalid IFINDEX 
attribute in fdb dump request");
+   return -EINVAL;
+   }
+   *brport_idx = nla_get_u32(tb[NDA_IFINDEX]);
+   break;
+   case NDA_MASTER:
+   if (nla_len(tb[i]) != sizeof(u32)) {
+   NL_SET_ERR_MSG(extack, "Invalid MASTER 
attribute in fdb dump request");
+   return -EINVAL;
+   }
+   *br_idx = nla_get_u32(tb[NDA_MASTER]);
+   break;
+   default:
+   NL_SET_ERR_MSG(extack, "Unsupported attribute in fdb 
dump request");
+   return -EINVAL;
+   }
+   }
+
+   return 0;
+}
+
 static int valid_fdb_dump_legacy(const struct nlmsghdr *nlh,
 int *br_idx, int *brport_idx,
 struct netlink_ext_ack *extack)
@@ -3846,8 +3900,12 @@ static int rtnl_fdb_dump(struct sk_buff *skb, struct 
netlink_callback *cb)
int err = 0;
int fidx = 0;
 
-   err = valid_fdb_dump_legacy(cb->nlh, &br_idx, &brport_idx,
-   cb->extack);
+   if (cb->strict_check)
+   err = valid_fdb_dump_strict(cb->nlh, &br_idx, &brport_idx,
+   cb->extack);
+   else
+   err = valid_fdb_dump_legacy(cb->nlh, &br_idx, &brport_idx,
+   cb->extack);
if (err < 0)
return err;
 
-- 
2.11.0
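
The attribute loop in valid_fdb_dump_strict() can be tried in userspace
with a small TLV walker. This is only a sketch: the demo_* helpers are
hypothetical, the attribute numbers are my assumption of the NDA_IFINDEX
and NDA_MASTER values, and the real code parses into tb[] with
nlmsg_parse_strict() first instead of walking the buffer by hand.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed stand-ins for NDA_IFINDEX and NDA_MASTER. */
#define DEMO_NDA_IFINDEX 8
#define DEMO_NDA_MASTER  9

#define DEMO_NLA_HDRLEN     4u
#define DEMO_NLA_ALIGN(len) (((len) + 3u) & ~3u)

/* Walk a buffer of (u16 len, u16 type, payload) attributes. Only the two
 * supported types are accepted, each with a 4-byte payload.
 * Returns 0 on success and -1 on any malformed or unsupported attribute. */
static int demo_parse_fdb_attrs(const unsigned char *buf, size_t len,
				uint32_t *br_idx, uint32_t *brport_idx)
{
	size_t off = 0;

	while (off < len) {
		uint16_t nla_len, nla_type;
		size_t step;

		if (len - off < DEMO_NLA_HDRLEN)
			return -1;	/* truncated attribute header */
		memcpy(&nla_len, buf + off, sizeof(nla_len));
		memcpy(&nla_type, buf + off + 2, sizeof(nla_type));
		if (nla_len < DEMO_NLA_HDRLEN || nla_len > len - off)
			return -1;	/* length field out of bounds */

		switch (nla_type) {
		case DEMO_NDA_IFINDEX:
			if (nla_len - DEMO_NLA_HDRLEN != sizeof(uint32_t))
				return -1;
			memcpy(brport_idx, buf + off + DEMO_NLA_HDRLEN,
			       sizeof(uint32_t));
			break;
		case DEMO_NDA_MASTER:
			if (nla_len - DEMO_NLA_HDRLEN != sizeof(uint32_t))
				return -1;
			memcpy(br_idx, buf + off + DEMO_NLA_HDRLEN,
			       sizeof(uint32_t));
			break;
		default:
			return -1;	/* unsupported attribute */
		}

		/* The last attribute may legitimately lack trailing padding. */
		step = DEMO_NLA_ALIGN((size_t)nla_len);
		off += step < len - off ? step : len - off;
	}

	return 0;
}

/* Append one u32 attribute at offset off; returns the next free offset. */
static size_t demo_put_u32(unsigned char *buf, size_t off,
			   uint16_t type, uint32_t value)
{
	uint16_t nla_len = (uint16_t)(DEMO_NLA_HDRLEN + sizeof(value));

	memcpy(buf + off, &nla_len, sizeof(nla_len));
	memcpy(buf + off + 2, &type, sizeof(type));
	memcpy(buf + off + DEMO_NLA_HDRLEN, &value, sizeof(value));
	return off + DEMO_NLA_ALIGN(nla_len);
}
```

In the patch itself, *brport_idx is seeded from ndm->ndm_ifindex before the
loop, so an NDA_IFINDEX attribute overrides the header value.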

Re: [RFC PATCH bpf-next v4 4/7] bpf: add bpf queue and stack maps

2018-10-07 Thread Mauricio Vasquez





On 10/04/2018 10:40 PM, Mauricio Vasquez wrote:



On 10/04/2018 06:57 PM, Alexei Starovoitov wrote:

On Thu, Oct 04, 2018 at 07:12:44PM +0200, Mauricio Vasquez B wrote:

Implement two new kinds of maps that support the peek, push and pop
operations.

A use case for this is to keep track of a pool of elements, like
network ports in a SNAT.

Signed-off-by: Mauricio Vasquez B 
---
  include/linux/bpf.h   |    7 +
  include/linux/bpf_types.h |    2
  include/uapi/linux/bpf.h  |   35 -
  kernel/bpf/Makefile   |    2
  kernel/bpf/core.c |    3
  kernel/bpf/helpers.c  |   43 ++
  kernel/bpf/queue_stack_maps.c |  300 +
  kernel/bpf/syscall.c  |   31 +++-
  kernel/bpf/verifier.c |   14 +-
  net/core/filter.c |    6 +
  10 files changed, 424 insertions(+), 19 deletions(-)
  create mode 100644 kernel/bpf/queue_stack_maps.c

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 98c7eeb6d138..cad3bc5cffd1 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -40,6 +40,9 @@ struct bpf_map_ops {
  int (*map_update_elem)(struct bpf_map *map, void *key, void 
*value, u64 flags);

  int (*map_delete_elem)(struct bpf_map *map, void *key);
  void *(*map_lookup_and_delete_elem)(struct bpf_map *map, void 
*key);

+    int (*map_push_elem)(struct bpf_map *map, void *value, u64 flags);
+    int (*map_pop_elem)(struct bpf_map *map, void *value);
+    int (*map_peek_elem)(struct bpf_map *map, void *value);
    /* funcs called by prog_array and perf_event_array map */
  void *(*map_fd_get_ptr)(struct bpf_map *map, struct file 
*map_file,

@@ -139,6 +142,7 @@ enum bpf_arg_type {
  ARG_CONST_MAP_PTR,    /* const argument used as pointer to 
bpf_map */

  ARG_PTR_TO_MAP_KEY,    /* pointer to stack used as map key */
  ARG_PTR_TO_MAP_VALUE,    /* pointer to stack used as map value */
+    ARG_PTR_TO_UNINIT_MAP_VALUE,    /* pointer to valid memory used 
to store a map value */
    /* the following constraints used to prototype bpf_memcmp() 
and other

   * functions that access data on eBPF program stack
@@ -825,6 +829,9 @@ static inline int 
bpf_fd_reuseport_array_update_elem(struct bpf_map *map,

  extern const struct bpf_func_proto bpf_map_lookup_elem_proto;
  extern const struct bpf_func_proto bpf_map_update_elem_proto;
  extern const struct bpf_func_proto bpf_map_delete_elem_proto;
+extern const struct bpf_func_proto bpf_map_push_elem_proto;
+extern const struct bpf_func_proto bpf_map_pop_elem_proto;
+extern const struct bpf_func_proto bpf_map_peek_elem_proto;
    extern const struct bpf_func_proto bpf_get_prandom_u32_proto;
  extern const struct bpf_func_proto bpf_get_smp_processor_id_proto;
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 658509daacd4..a2ec73aa1ec7 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -69,3 +69,5 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_XSKMAP, xsk_map_ops)
  BPF_MAP_TYPE(BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, reuseport_array_ops)
  #endif
  #endif
+BPF_MAP_TYPE(BPF_MAP_TYPE_QUEUE, queue_map_ops)
+BPF_MAP_TYPE(BPF_MAP_TYPE_STACK, stack_map_ops)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 3bb94aa2d408..bfa042273fad 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -129,6 +129,8 @@ enum bpf_map_type {
  BPF_MAP_TYPE_CGROUP_STORAGE,
  BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
  BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
+    BPF_MAP_TYPE_QUEUE,
+    BPF_MAP_TYPE_STACK,
  };
    enum bpf_prog_type {
@@ -463,6 +465,28 @@ union bpf_attr {
   * Return
   * 0 on success, or a negative error in case of failure.
   *
+ * int bpf_map_push_elem(struct bpf_map *map, const void *value, 
u64 flags)

+ * Description
+ * Push an element *value* in *map*. *flags* is one of:
+ *
+ * **BPF_EXIST**
+ * If the queue/stack is full, the oldest element is 
removed to

+ * make room for this.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_map_pop_elem(struct bpf_map *map, void *value)
+ * Description
+ * Pop an element from *map*.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_map_peek_elem(struct bpf_map *map, void *value)
+ * Description
+ * Get an element from *map* without removing it.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
   * int bpf_probe_read(void *dst, u32 size, const void *src)
   * Description
   * For tracing programs, safely attempt to read *size* 
bytes from

@@ -790,14 +814,14 @@ union bpf_attr {
   *
   * int ret;
   * struct bpf_tunnel_key key = {};
- *
+ *
   * ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);

   * if (ret < 0)
   * return TC_ACT_SHOT;    // drop
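
The push/pop/peek semantics described in the helper documentation can be
modelled with a fixed-size ring buffer. This is a rough userspace sketch of
the queue flavour only; demo_queue and DEMO_F_OVERWRITE are hypothetical
names (the latter standing in for BPF_EXIST), and nothing here reflects the
locking or memory layout in kernel/bpf/queue_stack_maps.c.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_CAP         4u
#define DEMO_F_OVERWRITE 1u	/* hypothetical stand-in for BPF_EXIST */

struct demo_queue {
	uint32_t slots[DEMO_CAP];
	size_t head;		/* index of the oldest element */
	size_t count;		/* number of stored elements */
};

/* Push a value. When the queue is full, fail unless the overwrite flag is
 * set, in which case the oldest element is dropped to make room. */
static int demo_push(struct demo_queue *q, uint32_t value, unsigned int flags)
{
	if (q->count == DEMO_CAP) {
		if (!(flags & DEMO_F_OVERWRITE))
			return -1;
		q->head = (q->head + 1) % DEMO_CAP;
		q->count--;
	}
	q->slots[(q->head + q->count) % DEMO_CAP] = value;
	q->count++;
	return 0;
}

/* Copy out the oldest element without removing it. */
static int demo_peek(const struct demo_queue *q, uint32_t *value)
{
	if (!q->count)
		return -1;
	*value = q->slots[q->head];
	return 0;
}

/* Copy out the oldest element and remove it. */
static int demo_pop(struct demo_queue *q, uint32_t *value)
{
	if (demo_peek(q, value) < 0)
		return -1;
	q->head = (q->head + 1) % DEMO_CAP;
	q->count--;
	return 0;
}
```

A stack variant would pop from the newest end instead; the overwrite rule
on push stays the same.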

Re: [PATCH] net/packet: fix packet drop as of virtio gso

2018-10-07 Thread Jason Wang





On 2018-09-29 23:41, Jianfeng Tan wrote:

When we use a raw socket as the vhost backend, a packet from virtio that
carries GSO offloading information cannot be sent out: it fails a later
validation in the xmit path because we did not set the correct
skb->protocol, which is then used to look up the GSO function.


Hi:

May I ask why you are using a raw socket for vhost? It is not a common
setup and has received little attention over the past few years. It is
also slow, since it lacks some recent improvements. Can it be replaced
with e.g. macvtap?


Thanks



To fix this, we set this field according to the virtio hdr information.

Fixes: e858fae2b0b8f4 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion")

Cc: sta...@vger.kernel.org
Signed-off-by: Jianfeng Tan 
---
  include/linux/virtio_net.h | 18 ++
  net/packet/af_packet.c | 11 +++
  2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 9397628a1967..cb462f9ab7dd 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -5,6 +5,24 @@
  #include 
  #include 
  
+static inline int virtio_net_hdr_set_proto(struct sk_buff *skb,

+  const struct virtio_net_hdr *hdr)
+{
+   switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
+   case VIRTIO_NET_HDR_GSO_TCPV4:
+   case VIRTIO_NET_HDR_GSO_UDP:
+   skb->protocol = cpu_to_be16(ETH_P_IP);
+   break;
+   case VIRTIO_NET_HDR_GSO_TCPV6:
+   skb->protocol = cpu_to_be16(ETH_P_IPV6);
+   break;
+   default:
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
  static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
const struct virtio_net_hdr *hdr,
bool little_endian)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 75c92a87e7b2..d6e94dc7e290 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2715,10 +2715,12 @@ static int tpacket_snd(struct packet_sock *po, struct 
msghdr *msg)
}
}
  
-		if (po->has_vnet_hdr && virtio_net_hdr_to_skb(skb, vnet_hdr,

- vio_le())) {
-   tp_len = -EINVAL;
-   goto tpacket_error;
+   if (po->has_vnet_hdr) {
+   if (virtio_net_hdr_to_skb(skb, vnet_hdr, vio_le())) {
+   tp_len = -EINVAL;
+   goto tpacket_error;
+   }
+   virtio_net_hdr_set_proto(skb, vnet_hdr);
}
  
  		skb->destructor = tpacket_destruct_skb;

@@ -2915,6 +2917,7 @@ static int packet_snd(struct socket *sock, struct msghdr 
*msg, size_t len)
if (err)
goto out_free;
len += sizeof(vnet_hdr);
+   virtio_net_hdr_set_proto(skb, &vnet_hdr);
}
  
  	skb_probe_transport_header(skb, reserve);
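
For reference, the mapping that virtio_net_hdr_set_proto() performs in the
quoted patch can be sketched in userspace as follows. The DEMO_ constants
are hypothetical stand-ins whose values I believe match the virtio and
ethertype definitions, and the function returns the ethertype in host byte
order, whereas the kernel stores it big-endian in skb->protocol.

```c
#include <stdint.h>

/* Assumed values of the VIRTIO_NET_HDR_GSO_* and ETH_P_* constants. */
#define DEMO_GSO_NONE   0x00u
#define DEMO_GSO_TCPV4  0x01u
#define DEMO_GSO_UDP    0x03u
#define DEMO_GSO_TCPV6  0x04u
#define DEMO_GSO_ECN    0x80u	/* modifier bit, masked off before the switch */

#define DEMO_ETH_P_IP   0x0800
#define DEMO_ETH_P_IPV6 0x86DD

/* Map a virtio GSO type to an ethertype, or -1 when no mapping exists. */
static int demo_gso_to_ethertype(uint8_t gso_type)
{
	switch (gso_type & ~DEMO_GSO_ECN) {
	case DEMO_GSO_TCPV4:
	case DEMO_GSO_UDP:
		return DEMO_ETH_P_IP;
	case DEMO_GSO_TCPV6:
		return DEMO_ETH_P_IPV6;
	default:
		return -1;
	}
}
```

A non-GSO packet (type 0) falls into the default branch. In the quoted diff
the callers ignore the helper's return value, so that case leaves
skb->protocol untouched.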

can not sync master interface when bond1 box connected with another bond1 box

2018-10-07 Thread Yi Li

Hi guys,
  I ran into this problem when using bonding in mode 1 (active-backup).
I have two Linux boxes, each with two NICs, and on each box I set up the
two NICs as a mode 1 bond. I then connected the two boxes to each other.

  Sometimes Box A selects eth0 as active and eth1 as backup while, at
the same moment, Box B selects eth1 as active and eth0 as backup. But
Box A's eth0 is connected to Box B's eth0, so the boxes lose
connectivity and cannot recover from this situation until I reboot or
re-plug the cables.

So, how can I keep two Linux boxes, both using bonding mode 1, reliably
connected to each other?
Thanks.

Re: [PATCH V2 net-next 2/5] net: Introduce a new MII time stamping interface.

2018-10-07 Thread Richard Cochran

On Sun, Oct 07, 2018 at 09:54:00PM +0200, Andrew Lunn wrote:
> Sure, but things have moved on since then.

I was curious about this.  Based on your use cases, I guess that you
mean phylib?  But not much has changed AFAICT. (There is one new
global function and two were removed, but that doesn't change the
picture WRT time stamping.)

Phylink now has two or three new users, one of which is dsa.  Is that
the big move?

The situation with MACs that handle their own PHYs without phylib is
unchanged, AFAICT.

So what exactly do you mean?

Thanks,
Richard

[PATCH] net: vhost: remove bad code line

2018-10-07 Thread xiangxia . m . yue

From: Tonghao Zhang 

Signed-off-by: Tonghao Zhang 
---
 drivers/vhost/net.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 015abf3..ab11b2b 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -562,7 +562,6 @@ static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
if (r == tvq->num && tvq->busyloop_timeout) {
/* Flush batched packets first */
if (!vhost_sock_zcopy(tvq->private_data))
-   // vhost_net_signal_used(tnvq);
vhost_tx_batch(net, tnvq, tvq->private_data, msghdr);
 
vhost_net_busy_poll(net, rvq, tvq, busyloop_intr, false);
-- 
1.8.3.1

Re: [PATCH net-next 19/20] net: Update netconf dump handlers for strict data checking

2018-10-07 Thread David Ahern

On 10/7/18 4:53 AM, Christian Brauner wrote:
>> @@ -2076,6 +2077,21 @@ static int inet_netconf_dump_devconf(struct sk_buff 
>> *skb,
>>  struct in_device *in_dev;
>>  struct hlist_head *head;
>>  
>> +if (cb->strict_check) {
>> +struct netlink_ext_ack *extack = cb->extack;
>> +struct netconfmsg *ncm;
>> +
>> +if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ncm))) {
>> +NL_SET_ERR_MSG(extack, "Invalid header");
>> +return -EINVAL;
>> +}
>> +
>> +if (nlh->nlmsg_len != nlmsg_msg_size(sizeof(*ncm))) {
>> +NL_SET_ERR_MSG(extack, "Invalid data after header");
>> +return -EINVAL;
>> +}
> 
> Hm, I think this could just be one branch with !=
> But if you've done this to report back a more meaningful error message
> to userspace, fine. :)

Consistency with other dump handlers and better userspace error
messages. If netconf ever gets a filter the length check is removed in
favor of nlmsg_parse_strict.

Re: [PATCH net-next 15/20] net/neighbor: Update neightbl_dump_info for strict data checking

2018-10-07 Thread David Ahern

On 10/7/18 4:48 AM, Christian Brauner wrote:
>> +
>>  static int neightbl_dump_info(struct sk_buff *skb, struct netlink_callback 
>> *cb)
>>  {
>> +const struct nlmsghdr *nlh = cb->nlh;
>>  struct net *net = sock_net(skb->sk);
>>  int family, tidx, nidx = 0;
>>  int tbl_skip = cb->args[0];
>>  int neigh_skip = cb->args[1];
>>  struct neigh_table *tbl;
>>  
>> -family = ((struct rtgenmsg *) nlmsg_data(cb->nlh))->rtgen_family;
>> +if (cb->strict_check) {
>> +int err = neightbl_valid_dump_info(nlh, cb->extack);
>> +
>> +if (err)
>> +return err;
>> +}
>> +
>> +family = ((struct rtgenmsg *)nlmsg_data(nlh))->rtgen_family;
> 
> So this already was a problem prior to your patch: what happens when you
> pass in the wrong struct? Then this case is not safe to do and might
> contain all kinds of crap.

'This case' meaning the above dereference? family is *always* the first
element in all of the header structs. It is core to the rtnetlink
processing.
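
To illustrate the point about the family byte: a rough userspace sketch
with hand-written copies of three rtnetlink header layouts. The demo_*
structs are hypothetical stand-ins that I believe mirror the uapi field
order; the static assertions state the property being relied on.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical copies of the rtgenmsg, ndmsg and rtmsg layouts. */
struct demo_rtgenmsg {
	uint8_t rtgen_family;
};

struct demo_ndmsg {
	uint8_t  ndm_family;
	uint8_t  ndm_pad1;
	uint16_t ndm_pad2;
	int32_t  ndm_ifindex;
	uint16_t ndm_state;
	uint8_t  ndm_flags;
	uint8_t  ndm_type;
};

struct demo_rtmsg {
	uint8_t  rtm_family;
	uint8_t  rtm_dst_len;
	uint8_t  rtm_src_len;
	uint8_t  rtm_tos;
	uint8_t  rtm_table;
	uint8_t  rtm_protocol;
	uint8_t  rtm_scope;
	uint8_t  rtm_type;
	uint32_t rtm_flags;
};

_Static_assert(offsetof(struct demo_ndmsg, ndm_family) == 0,
	       "family must be the first byte");
_Static_assert(offsetof(struct demo_rtmsg, rtm_family) == 0,
	       "family must be the first byte");

/* Read the family the way the dump handler does: view the payload as an
 * rtgenmsg regardless of which header struct userspace actually sent. */
static uint8_t demo_family_of(const void *payload)
{
	struct demo_rtgenmsg g;

	memcpy(&g, payload, sizeof(g));
	return g.rtgen_family;
}
```

This only shows that the read yields the family for any of these headers;
it says nothing about whether the rest of the payload is well formed.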

Re: [PATCH net-next 12/20] rtnetlink: Update ipmr_rtm_dumplink for strict data checking

2018-10-07 Thread David Ahern

On 10/7/18 4:40 AM, Christian Brauner wrote:
>> @@ -2718,6 +2743,13 @@ static int ipmr_rtm_dumplink(struct sk_buff *skb, 
>> struct netlink_callback *cb)
>>  unsigned int e = 0, s_e;
>>  struct mr_table *mrt;
>>  
>> +if (cb->strict_check) {
>> +int err = ipmr_valid_dumplink(cb->nlh, cb->extack);
>> +
>> +if (err)
>> +return err;
> 
> Nit: can we remove the unnecessary \n, please.

Coding standards dictate a newline between declarations and code. And
that is my preference too.

Re: [PATCH net-next 09/20] rtnetlink: Update rtnl_bridge_getlink for strict data checking

2018-10-07 Thread David Ahern

On 10/7/18 4:36 AM, Christian Brauner wrote:
>> +if (cb->strict_check) {
>> +struct ifinfomsg *ifm;
>>  
>> -extfilt = nlmsg_find_attr(cb->nlh, sizeof(struct ifinfomsg),
>> -  IFLA_EXT_MASK);
>> -if (extfilt) {
>> -if (nla_len(extfilt) < sizeof(filter_mask))
>> -return -EINVAL;
>> +if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ifm))) {
>> +NL_SET_ERR_MSG(extack, "Invalid header");
>> +return -EINVAL;
>> +}
>> +
>> +ifm = nlmsg_data(nlh);
>> +if (ifm->__ifi_pad || ifm->ifi_type || ifm->ifi_flags ||
>> +ifm->ifi_change || ifm->ifi_index) {
>> +NL_SET_ERR_MSG(extack, "Invalid values in header for 
>> dump request");
>> +return -EINVAL;
>> +}
>> +}
>>  
>> -filter_mask = nla_get_u32(extfilt);
>> +err = nlmsg_parse(nlh, sizeof(struct ifinfomsg), tb, IFLA_MAX,
>> +  ifla_policy, extack);
>> +if (err < 0) {
>> +if (cb->strict_check)
>> +return -EINVAL;
>> +goto walk_entries;
>> +}
> 
> What's the point of moving this out of the
> if (cb->strict_check) {} branch above? This looks like it would cause
> the same parse warnings that we're trying to get rid of in inet{4,6}
> dumps.

Link messages don't have the problem in general because they use
ifinfomsg as the header - which is the one abused for other message
types. That said ...

> Seems to make more sense to make the nlmsg_parse() itself conditional as
> well unless I'm lacking context.

... I now have nlmsg_parse and nlmsg_parse_strict.

Re: [PATCH net-next 08/20] rtnetlink: Update rtnl_dump_ifinfo for strict data checking

2018-10-07 Thread David Ahern

On 10/7/18 4:29 AM, Christian Brauner wrote:
>> I thought about that, but there is so much overlap - they are mostly
>> common. Besides, ifinfomsg is the header for link dumps, and ifinfomsg
>> is the one that has been (ab)used for other message types, so strict
>> versus lenient does not really have a differentiator for this message
>> type - other than checking the elements of the struct.
> 
> It's mostly about the function being extremely long and convoluted.
> Having parts moved out into (a) descriptive helper(s) with whatever name
> might make this way more readable than it is now especially with the new
> handling we need for strict checking.
> 

Understood. In the next version I have pushed most of the checking into
helpers.

Re: [PATCH V2 net-next 2/5] net: Introduce a new MII time stamping interface.

2018-10-07 Thread Andrew Lunn

On Sun, Oct 07, 2018 at 02:07:28PM -0700, Richard Cochran wrote:
> On Sun, Oct 07, 2018 at 01:59:06PM -0700, Richard Cochran wrote:
> > On Sun, Oct 07, 2018 at 09:54:00PM +0200, Andrew Lunn wrote:
> > > 1) phylink, not phydev. We have been pushing some MAC drivers towards
> > > phylink, especially those which support >1Gbps.
> > 
> > If a phylink device appears that wants time stamping, can't we add the
> > call to register_mii_timestamper()?
> 
> Actually, I see that 'struct phylink' has a 'struct phy_device *phydev',
> and so it can implement the 'struct mii_timestamper' interface directly.

Maybe. But you still don't have skb->dev->phydev. And phylink->phydev
is much more dynamic, since it can be hot-{un}plugged. You need to
handle it going away at any time.

However, your timestamper is unlikely to be hot-{un}pluggable. So
skb->dev->mii_timestamper seems a lot safer.

   Andrew

Re: [PATCH V2 net-next 2/5] net: Introduce a new MII time stamping interface.

2018-10-07 Thread Richard Cochran

On Sun, Oct 07, 2018 at 11:14:05PM +0200, Andrew Lunn wrote:
> The problem is you depend on skbuf->dev->phydev. phydev will be NULL.
> net_device does not currently have a phylink member. Even if it did,
> you end up adding more and more tests, looking in every place a
> mii_timestamper could be placed.

Ok, so the way to do this is to have something like
CONFIG_NETWORK_PHYLINK_TIMESTAMPING.  We can deal with that if and
when any real devices appear.

> I'm currently thinking register_mii_timestamper() should take a netdev
> argument, and the net_device structure should gain a struct
> mii_timestamper.
> 
> But we have to look at the lifetime problems. A phydev does not know
> what netdev it is associated to until phy_connect() is called. It is
> at that point you can call register_mii_timestamper().

Right, IOW passing a netdev won't work.

Thanks,
Richard

Re: [PATCH V2 net-next 2/5] net: Introduce a new MII time stamping interface.

2018-10-07 Thread Andrew Lunn

On Sun, Oct 07, 2018 at 01:59:06PM -0700, Richard Cochran wrote:
> On Sun, Oct 07, 2018 at 09:54:00PM +0200, Andrew Lunn wrote:
> > Sure, but things have moved on since then.
> 
> If you have a specific suggestion on how to better implement this,
> please tell us what it is.
>   
> > I can think of three obvious use cases where this does not work:
> > 
> > 1) phylink, not phydev. We have been pushing some MAC drivers towards
> > phylink, especially those which support >1Gbps.
> 
> If a phylink device appears that wants time stamping, can't we add the
> call to register_mii_timestamper()?

Hi Richard

The problem is you depend on skb->dev->phydev. phydev will be NULL.
net_device does not currently have a phylink member. Even if it did,
you end up adding more and more tests, looking in every place a
mii_timestamper could be placed.

> > 2) When an SFP is connected to the MAC, not a copper PHY. The class of
> > device you are adding a driver for will work just as well for an SFP
> > as for a copper PHY. The SERDES interface remains the same,
> > independent of whether a copper PHY or an SFP is used. But an SFP does
> > not have an instance of a phydev.
> 
> Well, as I said before in v1, CONFIG_NETWORK_PHY_TIMESTAMPING depends
> on phylib, plain and simple, and expanding beyond phylib is not within
> the scope of this series.

True. But we also should be forward looking, to make sure we are not
heading into a dead end.

I'm currently thinking register_mii_timestamper() should take a netdev
argument, and the net_device structure should gain a struct
mii_timestamper.

But we have to look at the lifetime problems. A phydev does not know
what netdev it is associated to until phy_connect() is called. It is
at that point you can call register_mii_timestamper().

   Andrew
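The lookup problem discussed above can be sketched in user space. This is only an illustration under stated assumptions: the struct and function names (sketch_netdev, sketch_lookup_via_phydev, sketch_lookup_via_netdev) are hypothetical stand-ins and not the kernel definitions, and locking and lifetime handling are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct sketch_timestamper {
	int port;
};

struct sketch_phydev {
	struct sketch_timestamper *mii_ts;
};

struct sketch_netdev {
	/* NULL for phylink, SFP or firmware controlled PHYs. */
	struct sketch_phydev *phydev;
	/* Assumed netdev-level pointer, as suggested in the mail above. */
	struct sketch_timestamper *mii_ts;
};

/* Lookup through the PHY device: fails whenever phydev is NULL. */
static struct sketch_timestamper *
sketch_lookup_via_phydev(const struct sketch_netdev *dev)
{
	if (!dev || !dev->phydev)
		return NULL;
	return dev->phydev->mii_ts;
}

/* Lookup directly on the netdev: independent of how the PHY is managed. */
static struct sketch_timestamper *
sketch_lookup_via_netdev(const struct sketch_netdev *dev)
{
	return dev ? dev->mii_ts : NULL;
}
```

With a netdev that has no phydev but does have a time stamper attached, only the second lookup finds it. The open question from the mail remains when the netdev-level pointer can be filled in, since the association is only known once phy_connect() has run.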

Re: [PATCH V2 net-next 2/5] net: Introduce a new MII time stamping interface.

2018-10-07 Thread Richard Cochran

On Sun, Oct 07, 2018 at 01:59:06PM -0700, Richard Cochran wrote:
> On Sun, Oct 07, 2018 at 09:54:00PM +0200, Andrew Lunn wrote:
> > > 1) phylink, not phydev. We have been pushing some MAC drivers towards
> > > phylink, especially those which support >1Gbps.
> 
> If a phylink device appears that wants time stamping, can't we add the
> call to register_mii_timestamper()?

Actually, I see that 'struct phylink' has a 'struct phy_device *phydev',
and so it can implement the 'struct mii_timestamper' interface directly.

Thanks,
Richard

Re: [PATCH V2 net-next 2/5] net: Introduce a new MII time stamping interface.

2018-10-07 Thread Richard Cochran

On Sun, Oct 07, 2018 at 09:54:00PM +0200, Andrew Lunn wrote:
> Sure, but things have moved on since then.

If you have a specific suggestion on how to better implement this,
please tell us what it is.

> I can think of three obvious use cases where this does not work:
> 
> 1) phylink, not phydev. We have been pushing some MAC drivers towards
> phylink, especially those which support >1Gbps.

If a phylink device appears that wants time stamping, can't we add the
call to register_mii_timestamper()?

> 2) When an SFP is connected to the MAC, not a copper PHY. The class of
> device you are adding a driver for will work just as well for an SFP
> as for a copper PHY. The SERDES interface remains the same,
> independent of whether a copper PHY or an SFP is used. But an SFP does
> not have an instance of a phydev.

Well, as I said before in v1, CONFIG_NETWORK_PHY_TIMESTAMPING depends
on phylib, plain and simple, and expanding beyond phylib is not within
the scope of this series.

> 3) Firmware controlled PHYs. phylib/phylink is not used, the MAC turns
> all ethtool calls into RPCs to the firmware. I've no numbers about
> this, but i have the feeling this is becoming more popular. It does
> however tend to be high end devices, and those are more likely to have
> timestamping in the MAC. I suppose they could also offload
> timestamping to the firmware, in which case, they might want to make
> use of this new API.

Any MAC with private PHY stuff (that doesn't use phylib) can implement
SO_TIMESTAMPING directly, as if it were a MAC.

Thanks,
Richard

Re: [PATCH V2 net-next 4/5] net: mdio: of: Register discovered MII time stampers.

2018-10-07 Thread Andrew Lunn

On Sun, Oct 07, 2018 at 12:26:27PM -0700, Richard Cochran wrote:
> On Sun, Oct 07, 2018 at 08:17:54PM +0200, Andrew Lunn wrote:
> > > > +   if (err == -ENOENT)
> > > > +   return NULL;
> > > > +   else if (err)
> > > > +   return ERR_PTR(err);
> > > > +
> > > > +   if (args.args_count >= 1)
> > > > +   port = args.args[0];
> > > 
> > > If it's greater than one, then it is an error, and it should be flagged
> > > as such.
> > > 
> > > The idea looks good though, should of_find_mii_timestamper() somehow be
> > > made conditional to CONFIG_PTP and we should have a stub for when it is
> > > disabled?
> > 
> > Hi Florian
> > 
> > > There already is a stub. But register returns -EOPNOTSUPP.
> 
> The stub returns NULL...

Ah, sorry, it is register_mii_tstamp_controller() which returns
-EOPNOTSUPP.

Andrew

Re: [PATCH V2 net-next 2/5] net: Introduce a new MII time stamping interface.

2018-10-07 Thread Andrew Lunn

On Sun, Oct 07, 2018 at 12:15:51PM -0700, Richard Cochran wrote:
> On Sun, Oct 07, 2018 at 08:27:51PM +0200, Andrew Lunn wrote:
> > The mii_timestamper is generic, in the same way hwmon is generic. It
> > does not matter where the time stamper is. So i'm wondering if we
> > should remove the special case for a PHY timestamper, remove all the
> > phylib support, etc.
> 
> This implementation is (to the best of my understanding) what you were
> asking for in your review of v1:

Sure, but things have moved on since then.

> > So i really think you need to cleanly integrate into phylib and
> > phylink.
> 
> > Use a phandle, and have
> > of_mdiobus_register_phy() follow the phandle to get the device.
> 
> > To keep lifecycle issues simple, i would also keep it in phydev, not
> > netdev.
> 
> This present series is a reasonable, incremental improvement to the
> existing PHY time stamping support.  It will handle any use case that
> I can think of, and I would like to avoid over-engineering this.

I can think of three obvious use cases where this does not work:

1) phylink, not phydev. We have been pushing some MAC drivers towards
phylink, especially those which support >1Gbps.

2) When an SFP is connected to the MAC, not a copper PHY. The class of
device you are adding a driver for will work just as well for an SFP
as for a copper PHY. The SERDES interface remains the same,
independent of whether a copper PHY or an SFP is used. But an SFP does
not have an instance of a phydev.

2a) An SFP which is actually a Copper PHY. There is a phydev for this,
but it is associated to the phylink, not the netdev.

3) Firmware controlled PHYs. phylib/phylink is not used, the MAC turns
all ethtool calls into RPCs to the firmware. I've no numbers about
this, but i have the feeling this is becoming more popular. It does
however tend to be high end devices, and those are more likely to have
timestamping in the MAC. I suppose they could also offload
timestamping to the firmware, in which case, they might want to make
use of this new API.

Andrew

Re: [PATCH V2 net-next 0/5] Peer to Peer One-Step time stamping

2018-10-07 Thread Richard Cochran

On Sun, Oct 07, 2018 at 10:38:18AM -0700, Richard Cochran wrote:
> Changed in v2:
> ~~
> - Per the v1 review, changed the modeling of MII time stamping
>   devices.  They are no longer a kind of mdio device.

Forgot to add:

 - Added method to callback into the driver after changes in link
   status.

Thanks,
Richard

Re: [PATCH V2 net-next 4/5] net: mdio: of: Register discovered MII time stampers.

2018-10-07 Thread Richard Cochran

On Sun, Oct 07, 2018 at 08:17:54PM +0200, Andrew Lunn wrote:
> > > + if (err == -ENOENT)
> > > + return NULL;
> > > + else if (err)
> > > + return ERR_PTR(err);
> > > +
> > > + if (args.args_count >= 1)
> > > + port = args.args[0];
> > 
> > If it's greater than one, then it is an error, and it should be flagged
> > as such.
> > 
> > The idea looks good though, should of_find_mii_timestamper() somehow be
> > made conditional to CONFIG_PTP and we should have a stub for when it is
> > disabled?
> 
> Hi Florian
> 
> > There already is a stub. But register returns -EOPNOTSUPP.

The stub returns NULL...

> > > + return register_mii_timestamper(args.np, port);
> 
> > So this returns -EOPNOTSUPP

NULL...
 
> > >  static int of_mdiobus_register_phy(struct mii_bus *mdio,
> > >   struct device_node *child, u32 addr)
> > >  {
> > > + struct mii_timestamper *mii_ts;
> > >   struct phy_device *phy;
> > >   bool is_c45;
> > >   int rc;
> > >   u32 phy_id;
> > >  
> > > + mii_ts = of_find_mii_timestamper(child);
> > > + if (IS_ERR(mii_ts))
> > > + return PTR_ERR(mii_ts);
> > > +
> 
> > and this returns -EOPNOTSUPP, so the PHY is not registered :-(

and the phydev.mii_ts field is then set to NULL.

Thanks,
Richard
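The point about the stub returning NULL can be checked with a small user-space model of the kernel's pointer-encoded error convention. The sketch_* names are hypothetical, and the helpers only imitate what ERR_PTR(), PTR_ERR() and IS_ERR() from linux/err.h do; they are not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Same bound the kernel uses: the top 4095 addresses encode errno values. */
#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static void *sketch_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static long sketch_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True only for pointers in the reserved error range; NULL is not one. */
static int sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/*
 * Models the check quoted from of_mdiobus_register_phy(): bail out on an
 * error pointer, otherwise carry on (mii_ts may legitimately be NULL).
 */
static int sketch_register_phy(void *mii_ts)
{
	if (sketch_is_err(mii_ts))
		return (int)sketch_ptr_err(mii_ts);
	return 0;
}
```

A NULL result from the stub therefore does not abort PHY registration, while an encoded -EOPNOTSUPP would.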

Re: [PATCH V2 net-next 4/5] net: mdio: of: Register discovered MII time stampers.

2018-10-07 Thread Richard Cochran

On Sun, Oct 07, 2018 at 11:14:38AM -0700, Florian Fainelli wrote:
> There appears to be a binding document missing to describe what a
> timestamper provider is. Using a more specific name than
> "#phandle-cells" is preferred when dealing with specific devices, e.g:
> 
> interrupt-controller/#interrupt-cells
> clocks/#clock-cells

Sure.

> So I would go with #timestamp-cells here, and define what the cell size
> and format should be in a separate "dt-bindings" prefixed patch that the
> Device Tree folks can also comment on.

I documented this in the last patch.  I didn't see any example in our
device tree that explains a "reference" like this that is not
connected to a specific node type.

> 
> > +   if (err == -ENOENT)
> > +   return NULL;
> > +   else if (err)
> > +   return ERR_PTR(err);
> > +
> > +   if (args.args_count >= 1)
> > +   port = args.args[0];
> 
> If it's greater than one, then it is an error, and it should be flagged
> as such.

I wanted to allow specific MII time stamping drivers to use more than
one value in the future, should the need arise.

> The idea looks good though, should of_find_mii_timestamper() somehow be
> made conditional to CONFIG_PTP and we should have a stub for when it is
> disabled?

Do you mean CONFIG_NETWORK_PHY_TIMESTAMPING ?
There is a stub for that.

Thanks,
Richard
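The ">= 1" check being defended here can be illustrated with a small stand-alone sketch. Everything below is an assumption made for illustration: sketch_phandle_args only loosely mirrors the kernel's struct of_phandle_args, and the default port of 0 is a guess, since the initialisation of "port" is not visible in the quoted hunk.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_ARGS 16

/* Hypothetical stand-in for the parsed phandle arguments. */
struct sketch_phandle_args {
	int args_count;
	uint32_t args[SKETCH_MAX_ARGS];
};

/*
 * Take the first cell as the port index if present. Extra cells are
 * ignored rather than rejected, which leaves room for drivers that
 * want more than one value later on.
 */
static uint32_t sketch_port_from_args(const struct sketch_phandle_args *a)
{
	uint32_t port = 0;	/* assumed default when no cell is given */

	if (a->args_count >= 1)
		port = a->args[0];
	return port;
}
```

The stricter alternative from the review would return an error when args_count is greater than one instead of silently ignoring the extra cells.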

Re: [PATCH V2 net-next 2/5] net: Introduce a new MII time stamping interface.

2018-10-07 Thread Richard Cochran

On Sun, Oct 07, 2018 at 08:27:51PM +0200, Andrew Lunn wrote:
> The mii_timestamper is generic, in the same way hwmon is generic. It
> does not matter where the time stamper is. So i'm wondering if we
> should remove the special case for a PHY timestamper, remove all the
> phylib support, etc.

This implementation is (to the best of my understanding) what you were
asking for in your review of v1:

> So i really think you need to cleanly integrate into phylib and
> phylink.

> Use a phandle, and have
> of_mdiobus_register_phy() follow the phandle to get the device.

> To keep lifecycle issues simple, i would also keep it in phydev, not
> netdev.

This present series is a reasonable, incremental improvement to the
existing PHY time stamping support.  It will handle any use case that
I can think of, and I would like to avoid over-engineering this.

Thanks,
Richard

Re: [PATCH rdma-next 3/4] IB/mlx5: Verify that driver supports user flags

2018-10-07 Thread Jason Gunthorpe

On Sun, Oct 07, 2018 at 12:03:36PM +0300, Leon Romanovsky wrote:
> From: Yonatan Cohen 
> 
> Flags sent down from user might not be supported by
> running driver.
> This might lead to unwanted bugs.
> To solve this, added macro to test for unsupported flags.
> 
> Signed-off-by: Yonatan Cohen 
> Signed-off-by: Leon Romanovsky 
>  drivers/infiniband/hw/mlx5/qp.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
> index bae48bdf281c..17c4b6641933 100644
> +++ b/drivers/infiniband/hw/mlx5/qp.c
> @@ -1728,6 +1728,15 @@ static void configure_requester_scat_cqe(struct 
> mlx5_ib_dev *dev,
>   MLX5_SET(qpc, qpc, cs_req, MLX5_REQ_SCAT_DATA32_CQE);
>  }
>  
> +#define MLX5_QP_CREATE_FLAGS_NOT_SUPPORTED(flags) \
> +  ((flags) & ~(\

This needs a cast, it would be better to add something like the check
comp mask function in rdma-core than this goofy macro thing.

Jason
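A rough sketch of the helper style suggested above, next to a macro in the style of the patch. The flag names are hypothetical, and the width problem shown is only one plausible reading of "needs a cast"; the helper follows the shape of the comp-mask check in rdma-core but is written from memory, not copied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 32-bit flag bits, for illustration only. */
#define SKETCH_FLAG_A (1u << 0)
#define SKETCH_FLAG_B (1u << 1)

/*
 * Macro in the style of the patch: the complement is computed at the
 * width of the mask (32 bits here), so unknown bits above bit 31 in a
 * 64-bit flags word are masked away and go unnoticed.
 */
#define SKETCH_FLAGS_NOT_SUPPORTED(flags) \
	((flags) & ~(SKETCH_FLAG_A | SKETCH_FLAG_B))

/*
 * Function form: both arguments are widened to 64 bits before the
 * complement, so every unsupported bit is caught.
 */
static inline bool sketch_check_comp_mask(uint64_t input, uint64_t supported)
{
	return (input & ~supported) == 0;
}
```

With a flags word that has only bit 40 set, the macro evaluates to 0 (nothing reported as unsupported), while the function form correctly returns false.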

Re: [PATCH V2 net-next 2/5] net: Introduce a new MII time stamping interface.

2018-10-07 Thread Florian Fainelli





On 10/7/2018 11:27 AM, Andrew Lunn wrote:
> On Sun, Oct 07, 2018 at 10:38:20AM -0700, Richard Cochran wrote:
>> Currently the stack supports time stamping in PHY devices.  However,
>> there are newer, non-PHY devices that can snoop an MII bus and provide
>> time stamps.  In order to support such devices, this patch introduces
>> a new interface to be used by both PHY and non-PHY devices.
>>
>> In addition, the one and only user of the old PHY time stamping API is
>> converted to the new interface.
>
> Hi Richard
>
> I'm a bit undecided about this. If you look at how we do HWMON sensors
> in PHYs, the probe function just registers with the HWMON subsystem.
> We don't have any support in phy_device, or anywhere else in the PHY
> core.
>
> The mii_timestamper is generic, in the same way hwmon is generic. It
> does not matter where the time stamper is. So i'm wondering if we
> should remove the special case for a PHY timestamper, remove all the
> phylib support, etc.
>
> I need to look at the other patches and see how this all fits
> together.

Agreed, the fact that some PHYs are capable of timestamping and register
themselves as a timestamper makes sense. Baking this into the core
PHYLIB might have been convenient at some point, but maybe we can
revisit that paradigm now that a more generic timestamping provider
framework is being proposed here.

--
Florian

Re: [PATCH V2 net-next 2/5] net: Introduce a new MII time stamping interface.

2018-10-07 Thread Andrew Lunn

On Sun, Oct 07, 2018 at 10:38:20AM -0700, Richard Cochran wrote:
> Currently the stack supports time stamping in PHY devices.  However,
> there are newer, non-PHY devices that can snoop an MII bus and provide
> time stamps.  In order to support such devices, this patch introduces
> a new interface to be used by both PHY and non-PHY devices.
> 
> In addition, the one and only user of the old PHY time stamping API is
> converted to the new interface.

Hi Richard

I'm a bit undecided about this. If you look at how we do HWMON sensors
in PHYs, the probe function just registers with the HWMON subsystem.
We don't have any support in phy_device, or anywhere else in the PHY
core.

The mii_timestamper is generic, in the same way hwmon is generic. It
does not matter where the time stamper is. So i'm wondering if we
should remove the special case for a PHY timestamper, remove all the
phylib support, etc.

I need to look at the other patches and see how this all fits
together.

Andrew

Re: [PATCH V2 net-next 4/5] net: mdio: of: Register discovered MII time stampers.

2018-10-07 Thread Andrew Lunn

On Sun, Oct 07, 2018 at 10:38:22AM -0700, Richard Cochran wrote:
> When parsing a PHY node, register its time stamper, if any, and attach
> the instance to the PHY device.

Hi Richard

This does look a lot better.

Thanks for making the changes.

   Andrew

Re: [PATCH V2 net-next 4/5] net: mdio: of: Register discovered MII time stampers.

2018-10-07 Thread Andrew Lunn

> > +   if (err == -ENOENT)
> > +   return NULL;
> > +   else if (err)
> > +   return ERR_PTR(err);
> > +
> > +   if (args.args_count >= 1)
> > +   port = args.args[0];
> 
> If it's greater than one, then it is an error, and it should be flagged
> as such.
> 
> The idea looks good though, should of_find_mii_timestamper() somehow be
> made conditional to CONFIG_PTP and we should have a stub for when it is
> disabled?

Hi Florian

There already is a stub. But register returns -EOPNOTSUPP.

 
> > +
> > +   return register_mii_timestamper(args.np, port);

So this returns -EOPNOTSUPP

> > +}
> > +
> >  static int of_mdiobus_register_phy(struct mii_bus *mdio,
> > struct device_node *child, u32 addr)
> >  {
> > +   struct mii_timestamper *mii_ts;
> > struct phy_device *phy;
> > bool is_c45;
> > int rc;
> > u32 phy_id;
> >  
> > +   mii_ts = of_find_mii_timestamper(child);
> > +   if (IS_ERR(mii_ts))
> > +   return PTR_ERR(mii_ts);
> > +

and this returns -EOPNOTSUPP, so the PHY is not registered :-(

Andrew

Re: [PATCH V2 net-next 4/5] net: mdio: of: Register discovered MII time stampers.

2018-10-07 Thread Florian Fainelli

Re: [PATCH v2 2/2] netdev/phy: add MDIO bus multiplexer driven by a regmap

2018-10-07 Thread Florian Fainelli




On 10/07/18 11:24, Pankaj Bansal wrote:
> Add support for an MDIO bus multiplexer controlled by a regmap
> device, like an FPGA.
> 
> Tested on a NXP LX2160AQDS board which uses the "QIXIS" FPGA
> attached to the i2c bus.
> 
> Signed-off-by: Pankaj Bansal 
> ---
> 
> Notes:
> V2:
>  - replaced be32_to_cpup with of_property_read_u32
>  - incorporated Andrew's comments
> 
>  drivers/net/phy/Kconfig   |  13 +++
>  drivers/net/phy/Makefile  |   1 +
>  drivers/net/phy/mdio-mux-regmap.c | 171 
>  3 files changed, 185 insertions(+)
> 
> diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
> index 82070792edbb..d1ac9e70cbb2 100644
> --- a/drivers/net/phy/Kconfig
> +++ b/drivers/net/phy/Kconfig
> @@ -87,6 +87,19 @@ config MDIO_BUS_MUX_MMIOREG
>  
> Currently, only 8/16/32 bits registers are supported.
>  
> +config MDIO_BUS_MUX_REGMAP
> + tristate "REGMAP controlled MDIO bus multiplexers"
> + depends on OF_MDIO && REGMAP
> + select MDIO_BUS_MUX
> + help
> +   This module provides a driver for MDIO bus multiplexers that
> +   are controlled via a regmap device, like an FPGA connected to i2c.
> +   The multiplexer connects one of several child MDIO busses to a
> +   parent bus.Child bus selection is under the control of one of
> +   the FPGA's registers.
> +
> +   Currently, only upto 32 bits registers are supported.
> +
>  config MDIO_CAVIUM
>   tristate
>  
> diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
> index 5805c0b7d60e..33053f9f320d 100644
> --- a/drivers/net/phy/Makefile
> +++ b/drivers/net/phy/Makefile
> @@ -29,6 +29,7 @@ obj-$(CONFIG_MDIO_BUS_MUX)  += mdio-mux.o
>  obj-$(CONFIG_MDIO_BUS_MUX_BCM_IPROC) += mdio-mux-bcm-iproc.o
>  obj-$(CONFIG_MDIO_BUS_MUX_GPIO)  += mdio-mux-gpio.o
>  obj-$(CONFIG_MDIO_BUS_MUX_MMIOREG) += mdio-mux-mmioreg.o
> +obj-$(CONFIG_MDIO_BUS_MUX_REGMAP) += mdio-mux-regmap.o
>  obj-$(CONFIG_MDIO_CAVIUM)+= mdio-cavium.o
>  obj-$(CONFIG_MDIO_GPIO)  += mdio-gpio.o
>  obj-$(CONFIG_MDIO_HISI_FEMAC)+= mdio-hisi-femac.o
> diff --git a/drivers/net/phy/mdio-mux-regmap.c 
> b/drivers/net/phy/mdio-mux-regmap.c
> new file mode 100644
> index ..6068d05a728a
> --- /dev/null
> +++ b/drivers/net/phy/mdio-mux-regmap.c
> @@ -0,0 +1,171 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +
> +/* Simple regmap based MDIO MUX driver
> + *
> + * Copyright 2018 NXP
> + *
> + * Based on mdio-mux-mmioreg.c by Timur Tabi
> + *
> + * Author:
> + * Pankaj Bansal 
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +struct mdio_mux_regmap_state {
> + void*mux_handle;
> + struct regmap   *regmap;
> + u32 mux_reg;
> + u32 mask;
> +};
> +
> +/* MDIO multiplexing switch function
> + *
> + * This function is called by the mdio-mux layer when it thinks the mdio bus
> + * multiplexer needs to switch.
> + *
> + * 'current_child' is the current value of the mux register (masked via
> + * s->mask).
> + *
> + * 'desired_child' is the value of the 'reg' property of the target child 
> MDIO
> + * node.
> + *
> + * The first time this function is called, current_child == -1.
> + *
> + * If current_child == desired_child, then the mux is already set to the
> + * correct bus.
> + */
> +static int mdio_mux_regmap_switch_fn(int current_child, int desired_child,
> +  void *data)
> +{
> + struct mdio_mux_regmap_state *s = data;
> + bool change;
> + int ret;
> +
> + ret = regmap_update_bits_check(s->regmap,
> +s->mux_reg,
> +s->mask,
> +desired_child,
> +&change);
> +
> + if (ret)
> + return ret;
> + if (change)
> + pr_debug("%s %d -> %d\n", __func__, current_child,
> +  desired_child);

If you add a struct platform_device or struct device reference to struct
mdio_mux_regmap_state, then you can use dev_dbg() here with the correct
device, which would be helpful if you are debugging problems and there
is more than one instance of them in the system.

> + return ret;
> +}
> +
> +static int mdio_mux_regmap_probe(struct platform_device *pdev)
> +{
> + struct device_node *np2, *np = pdev->dev.of_node;

How about naming "np2", "child" instead?

Everything else looks fine to me, thanks!
-- 
Florian
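For readers unfamiliar with the regmap call used in the switch function, here is a minimal user-space model of its read-modify-write behaviour. The sketch_regmap type is a hypothetical single-register stand-in; the real regmap_update_bits_check() goes through the bus and cache layers and can fail, which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-register device standing in for a regmap. */
struct sketch_regmap {
	uint32_t reg;
};

/*
 * Replace only the bits selected by mask with the matching bits of val,
 * and report through *change whether the register value actually moved.
 */
static int sketch_update_bits_check(struct sketch_regmap *map, uint32_t mask,
				    uint32_t val, bool *change)
{
	uint32_t old = map->reg;
	uint32_t updated = (old & ~mask) | (val & mask);

	if (change)
		*change = (updated != old);
	map->reg = updated;
	return 0;
}
```

Selecting child 2 through a mask of 0x07 on a register holding 0xA0 leaves the upper bits alone and reports a change; repeating the same call reports no change, which is the case where the driver skips its debug message.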

Re: [PATCH v2 1/2] dt-bindings: net: add MDIO bus multiplexer driven by a regmap device

2018-10-07 Thread Florian Fainelli




On 10/07/18 11:24, Pankaj Bansal wrote:
> Add support for an MDIO bus multiplexer controlled by a regmap
> device, like an FPGA.
> 
> Tested on a NXP LX2160AQDS board which uses the "QIXIS" FPGA
> attached to the i2c bus.
> 
> Signed-off-by: Pankaj Bansal 
> ---
> 
> Notes:
> V2:
>  - Fixed formatting error caused by using space instead of tab
>  - Add newline between property list and subnode
>  - Add newline between two subnodes
> 
>  .../bindings/net/mdio-mux-regmap.txt | 95 ++
>  1 file changed, 95 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/net/mdio-mux-regmap.txt 
> b/Documentation/devicetree/bindings/net/mdio-mux-regmap.txt
> new file mode 100644
> index ..88ebac26c1c5
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/mdio-mux-regmap.txt
> @@ -0,0 +1,95 @@
> +Properties for an MDIO bus multiplexer controlled by a regmap
> +
> +This is a special case of a MDIO bus multiplexer.  A regmap device,
> +like an FPGA, is used to control which child bus is connected.  The mdio-mux
> +node must be a child of the device that is controlled by a regmap.
> +The driver currently only supports devices with upto 32-bit registers.

I would omit any sort of details about Linux constructs designed to
support specific needs (e.g: regmap) as well as putting driver
limitations into a binding document because it really ought to be
restricted to describing hardware.

> +
> +Required properties in addition to the generic multiplexer properties:
> +
> +- compatible : string, must contain "mdio-mux-regmap"
> +
> +- reg : integer, contains the offset of the register that controls the bus
> + multiplexer. it can be 32 bit number.

Technically it could be any "reg" property size, the only requirement is
that it must be of the appropriate size with respect to what the parent
node of this "mdio-mux-regmap" node has, determined by
#address-cells/#size-cells width.

> +
> +- mux-mask : integer, contains an 32 bit mask that specifies which
> + bits in the register control the actual bus multiplexer.  The
> + 'reg' property of each child mdio-mux node must be constrained by
> + this mask.

Same thing here.

Since this is a MDIO mux, I would invite you to specify what the correct
#address-cells/#size-cells values should be (1, and 0 respectively as
your example correctly shows).
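The constraint in the quoted mux-mask description (each child's 'reg' value must be constrained by the mask) amounts to a one-line bit test. A small sketch, with a hypothetical function name, using the mask value 0x07 from the quoted binding example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * A child bus selector is valid only if it sets no bits outside the
 * mux mask. With mux-mask = 0x07, selectors 0x00..0x07 fit and 0x08
 * does not.
 */
static bool sketch_child_fits_mask(uint32_t child_reg, uint32_t mux_mask)
{
	return (child_reg & ~mux_mask) == 0;
}
```

The three slots in the quoted example use selectors 0x00, 0x01 and 0x02, all of which fit within 0x07.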

> +
> +Example:
> +
> +The FPGA node defines a i2c connected FPGA with a register space of 0x30 
> bytes.
> +For the "EMI2" MDIO bus, register 0x54 (BRDCFG4) controls the mux on that 
> bus.
> +A bitmask of 0x07 means that bits 0, 1 and 2 (bit 0 is lsb) are the bits on
> +BRDCFG4 that control the actual mux.
> +
> +i2c@200 {
> + compatible = "fsl,vf610-i2c";
> + #address-cells = <1>;
> + #size-cells = <0>;
> + reg = <0x0 0x200 0x0 0x1>;
> + interrupts = <0 34 0x4>; // Level high type
> + clock-names = "i2c";
> + clocks = < 4 7>;
> + fsl-scl-gpio = < 15 0>;
> + status = "okay";
> +
> + /* The FPGA node */
> + fpga@66 {
> + compatible = "fsl,lx2160aqds-fpga", "fsl,fpga-qixis-i2c";
> + reg = <0x66>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> +
> + mdio1_mux@54 {
> + compatible = "mdio-mux-regmap", "mdio-mux";
> + mdio-parent-bus = <>; /* MDIO bus */
> + reg = <0x54>;/* BRDCFG4 */
> + mux-mask = <0x07>;  /* EMI2_MDIO */
> + #address-cells=<1>;
> + #size-cells = <0>;
> +
> + mdio1_ioslot1@0 {   // Slot 1
> + reg = <0x00>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> +
> + phy1@1 {
> + reg = <1>;
> + compatible = "ethernet-phy-id0210.7441";
> + };
> +
> + phy1@0 {
> + reg = <0>;
> + compatible = "ethernet-phy-id0210.7441";
> + };
> + };
> +
> + mdio1_ioslot2@1 {   // Slot 2
> + reg = <0x01>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> +
> + };
> +
> + mdio1_ioslot3@2 {   // Slot 3
> + reg = <0x02>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> +
> + };
> + };
> + };
> +};
> +
> + /* The parent MDIO bus. */
> + emdio2: mdio@0x8B97000 {
> + compatible = "fsl,fman-memac-mdio";
> + reg = <0x0 0x8B97000 0x0 0x1000>;
> +

Re: [PATCH iproute2 net-next v3 0/6] Introduce the taprio scheduler

2018-10-07 Thread David Ahern

On 10/5/18 5:25 PM, Vinicius Costa Gomes wrote:
> Hi,
> 

...

> This is the iproute2 side of the taprio v1 series.
> 
> Please see the kernel side cover letter for more information about how
> to test this.
> 
> Cheers,
> --
> Vinicius
> 
> Jesus Sanchez-Palencia (1):
>   libnetlink: Add helper for getting a __s32 from netlink msgs
> 
> Vinicius Costa Gomes (5):
>   utils: Implement get_s64()
>   include: Add helper to retrieve a __s64 from a netlink msg
>   include: add definitions for taprio [DO NOT COMMIT]
>   tc: Add support for configuring the taprio scheduler
>   taprio: Add manpage for tc-taprio(8)
> 

applied to iproute2-next. Thanks

[PATCH V2 net-next 0/5] Peer to Peer One-Step time stamping

2018-10-07 Thread Richard Cochran

Changed in v2:
~~
- Per the v1 review, changed the modeling of MII time stamping
  devices.  They are no longer a kind of mdio device.

This series adds support for PTP (IEEE 1588) P2P one-step time
stamping along with a driver for a hardware device that supports this.

If the hardware supports p2p one-step, it subtracts the ingress time
stamp value from the Pdelay_Request correction field.  The user space
software stack then simply copies the correction field into the
Pdelay_Response, and on transmission the hardware adds the egress time
stamp into the correction field.
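The arithmetic described above can be followed with a small worked sketch. The function names are hypothetical, and time values are plain nanoseconds in a signed 64-bit integer; the scaled fixed-point encoding of the real correction field is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hardware, on receipt of Pdelay_Request: subtract the ingress time stamp. */
static int64_t sketch_hw_ingress(int64_t req_correction, int64_t ingress_ts)
{
	return req_correction - ingress_ts;
}

/* User space: copy the correction field unchanged into the Pdelay_Response. */
static int64_t sketch_sw_copy(int64_t correction)
{
	return correction;
}

/* Hardware, on transmission of Pdelay_Response: add the egress time stamp. */
static int64_t sketch_hw_egress(int64_t resp_correction, int64_t egress_ts)
{
	return resp_correction + egress_ts;
}
```

For a request arriving with correction 0 at t = 1000 ns and a response leaving at t = 1500 ns, the response carries 0 - 1000 + 1500 = 500 ns, i.e. the turnaround time inside the responder, without either time stamp having to reach user space.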

- Patch 1 adds the new option.
- Patches 2-4 adds support for MII time stamping in non-PHY devices.
- Patch 5 adds a driver implementing the new option.

User space support is available in the current linuxptp master branch.

Thanks,
Richard

Richard Cochran (5):
  net: Introduce peer to peer one step PTP time stamping.
  net: Introduce a new MII time stamping interface.
  net: Add a layer for non-PHY MII time stamping drivers.
  net: mdio: of: Register discovered MII time stampers.
  ptp: Add a driver for InES time stamping IP core.

 Documentation/devicetree/bindings/ptp/ptp-ines.txt |  37 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c   |   1 +
 drivers/net/phy/Makefile   |   2 +
 drivers/net/phy/dp83640.c  |  47 +-
 drivers/net/phy/mii_timestamper.c  | 121 +++
 drivers/net/phy/phy.c  |   4 +-
 drivers/net/phy/phy_device.c   |   5 +
 drivers/of/of_mdio.c   |  26 +
 drivers/ptp/Kconfig|  10 +
 drivers/ptp/Makefile   |   1 +
 drivers/ptp/ptp_ines.c | 870 +
 include/linux/mii_timestamper.h| 115 +++
 include/linux/phy.h|  25 +-
 include/uapi/linux/net_tstamp.h|   8 +
 net/8021q/vlan_dev.c   |   4 +-
 net/Kconfig|   7 +-
 net/core/dev_ioctl.c   |   1 +
 net/core/ethtool.c |   4 +-
 net/core/timestamping.c|  20 +-
 19 files changed, 1251 insertions(+), 57 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/ptp/ptp-ines.txt
 create mode 100644 drivers/net/phy/mii_timestamper.c
 create mode 100644 drivers/ptp/ptp_ines.c
 create mode 100644 include/linux/mii_timestamper.h

-- 
2.11.0

[PATCH V2 net-next 1/5] net: Introduce peer to peer one step PTP time stamping.

2018-10-07 Thread Richard Cochran

The 1588 standard defines one step operation for both Sync and
PDelay_Resp messages.  Up until now, hardware with P2P one step has
been rare, and kernel support was lacking.  This patch adds support of
the mode in anticipation of new hardware developments.

Signed-off-by: Richard Cochran 
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 1 +
 include/uapi/linux/net_tstamp.h  | 8 
 net/core/dev_ioctl.c | 1 +
 3 files changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 40093d88353f..2cdbc16245c2 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -15369,6 +15369,7 @@ int bnx2x_configure_ptp_filters(struct bnx2x *bp)
   NIG_REG_P0_TLLH_PTP_RULE_MASK, 0x3EEE);
break;
case HWTSTAMP_TX_ONESTEP_SYNC:
+   case HWTSTAMP_TX_ONESTEP_P2P:
BNX2X_ERR("One-step timestamping is not supported\n");
return -ERANGE;
}
diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
index 97ff3c17ec4d..091441a4f78f 100644
--- a/include/uapi/linux/net_tstamp.h
+++ b/include/uapi/linux/net_tstamp.h
@@ -90,6 +90,14 @@ enum hwtstamp_tx_types {
 * queue.
 */
HWTSTAMP_TX_ONESTEP_SYNC,
+
+   /*
+* Same as HWTSTAMP_TX_ONESTEP_SYNC, but also enables time
+* stamp insertion directly into PDelay_Resp packets. In this
+* case, neither transmitted Sync nor PDelay_Resp packets will
+* receive a time stamp via the socket error queue.
+*/
+   HWTSTAMP_TX_ONESTEP_P2P,
 };
 
 /* possible values for hwtstamp_config->rx_filter */
diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c
index 90e8aa36881e..8cdc13695909 100644
--- a/net/core/dev_ioctl.c
+++ b/net/core/dev_ioctl.c
@@ -187,6 +187,7 @@ static int net_hwtstamp_validate(struct ifreq *ifr)
case HWTSTAMP_TX_OFF:
case HWTSTAMP_TX_ON:
case HWTSTAMP_TX_ONESTEP_SYNC:
+   case HWTSTAMP_TX_ONESTEP_P2P:
tx_type_valid = 1;
break;
}
-- 
2.11.0
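The validation hunk in net/core/dev_ioctl.c follows a simple whitelist pattern, sketched below in stand-alone form. The enum is a hypothetical mirror of enum hwtstamp_tx_types with the new value appended; the numeric values are whatever the compiler assigns here and are not claimed to match the UAPI header.

```c
#include <assert.h>

/* Hypothetical mirror of the TX types, including the new P2P mode. */
enum sketch_tx_type {
	SKETCH_TX_OFF,
	SKETCH_TX_ON,
	SKETCH_TX_ONESTEP_SYNC,
	SKETCH_TX_ONESTEP_P2P,
};

/* Whitelist check: anything not listed explicitly is rejected. */
static int sketch_tx_type_valid(int tx_type)
{
	switch (tx_type) {
	case SKETCH_TX_OFF:
	case SKETCH_TX_ON:
	case SKETCH_TX_ONESTEP_SYNC:
	case SKETCH_TX_ONESTEP_P2P:
		return 1;
	}
	return 0;
}
```

Because the check is an explicit list, a new mode stays invalid until it is added here, which is why the patch has to touch the core validation as well as the header.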

[PATCH V2 net-next 5/5] ptp: Add a driver for InES time stamping IP core.

2018-10-07 Thread Richard Cochran

The InES at the ZHAW offers a PTP time stamping IP core.  The FPGA
logic recognizes and time stamps PTP frames on the MII bus.  This
patch adds a driver for the core along with a device tree binding to
allow hooking the driver to MII buses.

Signed-off-by: Richard Cochran 
---
 Documentation/devicetree/bindings/ptp/ptp-ines.txt |  37 +
 drivers/ptp/Kconfig|  10 +
 drivers/ptp/Makefile   |   1 +
 drivers/ptp/ptp_ines.c | 870 +
 4 files changed, 918 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/ptp/ptp-ines.txt
 create mode 100644 drivers/ptp/ptp_ines.c

diff --git a/Documentation/devicetree/bindings/ptp/ptp-ines.txt 
b/Documentation/devicetree/bindings/ptp/ptp-ines.txt
new file mode 100644
index ..1484b62802c7
--- /dev/null
+++ b/Documentation/devicetree/bindings/ptp/ptp-ines.txt
@@ -0,0 +1,37 @@
+ZHAW InES PTP time stamping IP core
+
+The IP core needs two different kinds of nodes.  The control node
+lives somewhere in the memory map and specifies the address of the
+control registers.  There can be up to three port handles placed as
+attributes of PHY nodes.  These associate a particular MII bus with a
+port index within the IP core.
+
+Required properties of the control node:
+
+- compatible:  "ines,ptp-ctrl"
+- reg: physical address and size of the register bank
+- #phandle-cells:  must be one (1)
+
+Required format of the port handle within the PHY node:
+
+- timestamper: provides control node reference and
+   the port channel within the IP core
+
+Example:
+
+   tstamper: timestamper@6000 {
+   compatible = "ines,ptp-ctrl";
+   reg = <0x6000 0x80>;
+   #phandle-cells = <1>;
+   };
+
+   ethernet@8000 {
+   ...
+   mdio {
+   ...
+   phy@3 {
+   ...
+   timestamper = <&tstamper 0>;
+   };
+   };
+   };
diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig
index d137c480db46..475aa6f32edd 100644
--- a/drivers/ptp/Kconfig
+++ b/drivers/ptp/Kconfig
@@ -88,6 +88,16 @@ config DP83640_PHY
  In order for this to work, your MAC driver must also
  implement the skb_tx_timestamp() function.
 
+config PTP_1588_CLOCK_INES
+   tristate "ZHAW InES PTP time stamping IP core"
+   depends on NETWORK_PHY_TIMESTAMPING
+   depends on PHYLIB
+   depends on PTP_1588_CLOCK
+   help
+ This driver adds support for using the ZHAW InES 1588 IP
+ core.  This clock is only useful if the MII bus of your MAC
+ is wired up to the core.
+
 config PTP_1588_CLOCK_PCH
tristate "Intel PCH EG20T as PTP clock"
depends on X86_32 || COMPILE_TEST
diff --git a/drivers/ptp/Makefile b/drivers/ptp/Makefile
index 19efa9cfa950..15b656712897 100644
--- a/drivers/ptp/Makefile
+++ b/drivers/ptp/Makefile
@@ -6,6 +6,7 @@
 ptp-y  := ptp_clock.o ptp_chardev.o ptp_sysfs.o
 obj-$(CONFIG_PTP_1588_CLOCK)   += ptp.o
 obj-$(CONFIG_PTP_1588_CLOCK_DTE)   += ptp_dte.o
+obj-$(CONFIG_PTP_1588_CLOCK_INES)  += ptp_ines.o
 obj-$(CONFIG_PTP_1588_CLOCK_IXP46X)+= ptp_ixp46x.o
 obj-$(CONFIG_PTP_1588_CLOCK_PCH)   += ptp_pch.o
 obj-$(CONFIG_PTP_1588_CLOCK_KVM)   += ptp_kvm.o
diff --git a/drivers/ptp/ptp_ines.c b/drivers/ptp/ptp_ines.c
new file mode 100644
index ..a05b478aad38
--- /dev/null
+++ b/drivers/ptp/ptp_ines.c
@@ -0,0 +1,870 @@
+// SPDX-License-Identifier: GPL-2.0
+//
+// Copyright (C) 2018 MOSER-BAER AG
+//
+
+#define pr_fmt(fmt) "InES_PTP: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+MODULE_DESCRIPTION("Driver for the ZHAW InES PTP time stamping IP core");
+MODULE_AUTHOR("Richard Cochran ");
+MODULE_VERSION("1.0");
+MODULE_LICENSE("GPL");
+
+/* GLOBAL register */
+#define MCAST_MAC_SELECT_SHIFT 2
+#define MCAST_MAC_SELECT_MASK  0x3
+#define IO_RESET   BIT(1)
+#define PTP_RESET  BIT(0)
+
+/* VERSION register */
+#define IF_MAJOR_VER_SHIFT 12
+#define IF_MAJOR_VER_MASK  0xf
+#define IF_MINOR_VER_SHIFT 8
+#define IF_MINOR_VER_MASK  0xf
+#define FPGA_MAJOR_VER_SHIFT   4
+#define FPGA_MAJOR_VER_MASK0xf
+#define FPGA_MINOR_VER_SHIFT   0
+#define FPGA_MINOR_VER_MASK0xf
+
+/* INT_STAT register */
+#define RX_INTR_STATUS_3   BIT(5)
+#define RX_INTR_STATUS_2   BIT(4)
+#define RX_INTR_STATUS_1   BIT(3)
+#define TX_INTR_STATUS_3   BIT(2)
+#define TX_INTR_STATUS_2   BIT(1)
+#define TX_INTR_STATUS_1   BIT(0)
+
+/* INT_MSK register */
+#define RX_INTR_MASK_3 BIT(5)
+#define RX_INTR_MASK_2 BIT(4)
+#define RX_INTR_MASK_1

[PATCH V2 net-next 4/5] net: mdio: of: Register discovered MII time stampers.

2018-10-07 Thread Richard Cochran

When parsing a PHY node, register its time stamper, if any, and attach
the instance to the PHY device.

Signed-off-by: Richard Cochran 
---
 drivers/net/phy/phy_device.c |  3 +++
 drivers/of/of_mdio.c | 26 ++
 2 files changed, 29 insertions(+)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index a454432d166f..c24bce9b7270 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -833,6 +833,9 @@ EXPORT_SYMBOL(phy_device_register);
  */
 void phy_device_remove(struct phy_device *phydev)
 {
+   if (phydev->mii_ts)
+   unregister_mii_timestamper(phydev->mii_ts);
+
device_del(&phydev->mdio.dev);
 
/* Assert the reset signal */
diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c
index f76c10ecc616..7699f167e4a9 100644
--- a/drivers/of/of_mdio.c
+++ b/drivers/of/of_mdio.c
@@ -44,14 +44,38 @@ static int of_get_phy_id(struct device_node *device, u32 *phy_id)
return -EINVAL;
 }
 
+struct mii_timestamper *of_find_mii_timestamper(struct device_node *node)
+{
+   struct of_phandle_args args;
+   unsigned int port = 0;
+   int err;
+
+   err = of_parse_phandle_with_args(node, "timestamper",
+"#phandle-cells", 0, &args);
+   if (err == -ENOENT)
+   return NULL;
+   else if (err)
+   return ERR_PTR(err);
+
+   if (args.args_count >= 1)
+   port = args.args[0];
+
+   return register_mii_timestamper(args.np, port);
+}
+
 static int of_mdiobus_register_phy(struct mii_bus *mdio,
struct device_node *child, u32 addr)
 {
+   struct mii_timestamper *mii_ts;
struct phy_device *phy;
bool is_c45;
int rc;
u32 phy_id;
 
+   mii_ts = of_find_mii_timestamper(child);
+   if (IS_ERR(mii_ts))
+   return PTR_ERR(mii_ts);
+
is_c45 = of_device_is_compatible(child,
 "ethernet-phy-ieee802.3-c45");
 
@@ -97,6 +121,8 @@ static int of_mdiobus_register_phy(struct mii_bus *mdio,
return rc;
}
 
+   phy->mii_ts = mii_ts;
+
dev_dbg(&mdio->dev, "registered phy %s at address %i\n",
child->name, addr);
return 0;
-- 
2.11.0
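
Note on the lookup above: of_find_mii_timestamper() has three outcomes,
namely NULL when the PHY node has no "timestamper" property, an ERR_PTR
on a real parse failure, and a valid pointer otherwise. The following is
only a minimal userspace sketch of that convention. The helpers err_ptr(),
ptr_err() and is_err() are simplified stand-ins for the kernel macros, and
struct fake_timestamper, find_timestamper() and register_phy() are
hypothetical names used for illustration, not code from the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's ERR_PTR helpers (simplified). */
#define MAX_ERRNO 4095

static inline void *err_ptr(long error) { return (void *)(intptr_t)error; }
static inline long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical placeholder for struct mii_timestamper. */
struct fake_timestamper { unsigned int port; };

static struct fake_timestamper demo_ts = { .port = 0 };

/*
 * parse_result plays the role of the value returned by
 * of_parse_phandle_with_args() in the patch.
 */
static struct fake_timestamper *find_timestamper(int parse_result)
{
	if (parse_result == -ENOENT)
		return NULL;			/* no property: not an error */
	if (parse_result)
		return err_ptr(parse_result);	/* real failure */
	return &demo_ts;
}

/* Mirrors the caller: only real errors abort the registration. */
static int register_phy(int parse_result, struct fake_timestamper **out)
{
	struct fake_timestamper *ts = find_timestamper(parse_result);

	if (is_err(ts))
		return (int)ptr_err(ts);
	*out = ts;
	return 0;
}
```

With this convention a PHY without a time stamper still registers
normally, and its mii_ts pointer simply stays NULL.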

[PATCH V2 net-next 3/5] net: Add a layer for non-PHY MII time stamping drivers.

2018-10-07 Thread Richard Cochran

While PHY time stamping drivers can simply attach their interface
directly to the PHY instance, stand alone drivers require support in
order to manage their services.  Non-PHY MII time stamping drivers
have a control interface over another bus like I2C, SPI, UART, or via
a memory mapped peripheral.  The controller device will be associated
with one or more time stamping channels, each of which snoops in on an
MII bus.

This patch provides a glue layer that will enable time stamping
channels to find their controlling device.

Signed-off-by: Richard Cochran 
---
 drivers/net/phy/Makefile  |   2 +
 drivers/net/phy/mii_timestamper.c | 121 ++
 include/linux/mii_timestamper.h   |  63 
 net/Kconfig   |   7 ++-
 4 files changed, 190 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/phy/mii_timestamper.c

diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 5805c0b7d60e..584c7c6f40e7 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -40,6 +40,8 @@ obj-$(CONFIG_MDIO_SUN4I)  += mdio-sun4i.o
 obj-$(CONFIG_MDIO_THUNDER) += mdio-thunder.o
 obj-$(CONFIG_MDIO_XGENE)   += mdio-xgene.o
 
+obj-$(CONFIG_NETWORK_PHY_TIMESTAMPING) += mii_timestamper.o
+
 obj-$(CONFIG_SFP)  += sfp.o
 sfp-obj-$(CONFIG_SFP)  += sfp-bus.o
 obj-y  += $(sfp-obj-y) $(sfp-obj-m)
diff --git a/drivers/net/phy/mii_timestamper.c b/drivers/net/phy/mii_timestamper.c
new file mode 100644
index ..51b77fc92475
--- /dev/null
+++ b/drivers/net/phy/mii_timestamper.c
@@ -0,0 +1,121 @@
+// SPDX-License-Identifier: GPL-2.0
+//
+// Support for generic time stamping devices on MII buses.
+// Copyright (C) 2018 Richard Cochran 
+//
+
+#include 
+
+static LIST_HEAD(mii_timestamping_devices);
+static DEFINE_MUTEX(tstamping_devices_lock);
+
+struct mii_timestamping_desc {
+   struct list_head list;
+   struct mii_timestamping_ctrl *ctrl;
+   struct device *device;
+};
+
+/**
+ * register_mii_tstamp_controller() - registers an MII time stamping device.
+ *
+ * @device:The device to be registered.
+ * @ctrl:  Pointer to device's control interface.
+ *
+ * Returns zero on success or non-zero on failure.
+ */
+int register_mii_tstamp_controller(struct device *device,
+  struct mii_timestamping_ctrl *ctrl)
+{
+   struct mii_timestamping_desc *desc;
+
+   desc = kzalloc(sizeof(*desc), GFP_KERNEL);
+   if (!desc)
+   return -ENOMEM;
+
+   INIT_LIST_HEAD(&desc->list);
+   desc->ctrl = ctrl;
+   desc->device = device;
+
+   mutex_lock(&tstamping_devices_lock);
+   list_add_tail(&mii_timestamping_devices, &desc->list);
+   mutex_unlock(&tstamping_devices_lock);
+
+   return 0;
+}
+
+/**
+ * unregister_mii_tstamp_controller() - unregisters an MII time stamping device.
+ *
+ * @device:A device previously passed to register_mii_tstamp_controller().
+ */
+void unregister_mii_tstamp_controller(struct device *device)
+{
+   struct mii_timestamping_desc *desc;
+   struct list_head *this, *next;
+
+   mutex_lock(&tstamping_devices_lock);
+   list_for_each_safe(this, next, &mii_timestamping_devices) {
+   desc = list_entry(this, struct mii_timestamping_desc, list);
+   if (desc->device == device) {
+   list_del_init(&desc->list);
+   kfree(desc);
+   break;
+   }
+   }
+   mutex_unlock(&tstamping_devices_lock);
+}
+
+/**
+ * register_mii_timestamper - Enables a given port of an MII time stamper.
+ *
+ * @node:  The device tree node of the MII time stamp controller.
+ * @port:  The index of the port to be enabled.
+ *
+ * Returns a valid interface on success or ERR_PTR otherwise.
+ */
+struct mii_timestamper *register_mii_timestamper(struct device_node *node,
+unsigned int port)
+{
+   struct mii_timestamper *mii_ts = NULL;
+   struct mii_timestamping_desc *desc;
+   struct list_head *this;
+
+   mutex_lock(&tstamping_devices_lock);
+   list_for_each(this, &mii_timestamping_devices) {
+   desc = list_entry(this, struct mii_timestamping_desc, list);
+   if (desc->device->of_node == node) {
+   mii_ts = desc->ctrl->probe_channel(desc->device, port);
+   if (mii_ts) {
+   mii_ts->device = desc->device;
+   get_device(desc->device);
+   }
+   break;
+   }
+   }
+   mutex_unlock(&tstamping_devices_lock);
+
+   return mii_ts ? mii_ts : ERR_PTR(-EPROBE_DEFER);
+}
+
+/**
+ * unregister_mii_timestamper - Disables a given MII time stamper.
+ *
+ * @mii_ts:An interface obtained via register_mii_timestamper().
+ *
+ */
+void unregister_mii_timestamper(struct mii_timestamper *mii_ts)
+{
+   struct

[PATCH V2 net-next 2/5] net: Introduce a new MII time stamping interface.

2018-10-07 Thread Richard Cochran

Currently the stack supports time stamping in PHY devices.  However,
there are newer, non-PHY devices that can snoop an MII bus and provide
time stamps.  In order to support such devices, this patch introduces
a new interface to be used by both PHY and non-PHY devices.

In addition, the one and only user of the old PHY time stamping API is
converted to the new interface.

Signed-off-by: Richard Cochran 
---
 drivers/net/phy/dp83640.c   | 47 +
 drivers/net/phy/phy.c   |  4 ++--
 drivers/net/phy/phy_device.c|  2 ++
 include/linux/mii_timestamper.h | 52 +
 include/linux/phy.h | 25 ++--
 net/8021q/vlan_dev.c|  4 ++--
 net/core/ethtool.c  |  4 ++--
 net/core/timestamping.c | 20 
 8 files changed, 104 insertions(+), 54 deletions(-)
 create mode 100644 include/linux/mii_timestamper.h

diff --git a/drivers/net/phy/dp83640.c b/drivers/net/phy/dp83640.c
index edd4d44a386d..2f895c9bbedb 100644
--- a/drivers/net/phy/dp83640.c
+++ b/drivers/net/phy/dp83640.c
@@ -111,6 +111,7 @@ struct dp83640_private {
struct list_head list;
struct dp83640_clock *clock;
struct phy_device *phydev;
+   struct mii_timestamper mii_ts;
struct delayed_work ts_work;
int hwts_tx_en;
int hwts_rx_en;
@@ -214,6 +215,14 @@ static void dp83640_gpio_defaults(struct ptp_pin_desc *pd)
 static LIST_HEAD(phyter_clocks);
 static DEFINE_MUTEX(phyter_clocks_lock);
 
+static int dp83640_hwtstamp(struct mii_timestamper *mii_ts,
+   struct ifreq *ifr);
+static int dp83640_ts_info(struct mii_timestamper *mii_ts,
+  struct ethtool_ts_info *info);
+static bool dp83640_rxtstamp(struct mii_timestamper *mii_ts,
+struct sk_buff *skb, int type);
+static void dp83640_txtstamp(struct mii_timestamper *mii_ts,
+struct sk_buff *skb, int type);
 static void rx_timestamp_work(struct work_struct *work);
 
 /* extended register access functions */
@@ -1141,13 +1150,18 @@ static int dp83640_probe(struct phy_device *phydev)
goto no_memory;
 
dp83640->phydev = phydev;
-   INIT_DELAYED_WORK(&dp83640->ts_work, rx_timestamp_work);
+   dp83640->mii_ts.rxtstamp = dp83640_rxtstamp;
+   dp83640->mii_ts.txtstamp = dp83640_txtstamp;
+   dp83640->mii_ts.hwtstamp = dp83640_hwtstamp;
+   dp83640->mii_ts.ts_info  = dp83640_ts_info;
 
+   INIT_DELAYED_WORK(&dp83640->ts_work, rx_timestamp_work);
INIT_LIST_HEAD(&dp83640->rxts);
INIT_LIST_HEAD(&dp83640->rxpool);
for (i = 0; i < MAX_RXTS; i++)
list_add(&dp83640->rx_pool_data[i].list, &dp83640->rxpool);
 
+   phydev->mii_ts = &dp83640->mii_ts;
phydev->priv = dp83640;
 
spin_lock_init(&dp83640->rx_lock);
@@ -1188,6 +1202,8 @@ static void dp83640_remove(struct phy_device *phydev)
if (phydev->mdio.addr == BROADCAST_ADDR)
return;
 
+   phydev->mii_ts = NULL;
+
enable_status_frames(phydev, false);
cancel_delayed_work_sync(&dp83640->ts_work);
 
@@ -1311,9 +1327,10 @@ static int dp83640_config_intr(struct phy_device *phydev)
}
 }
 
-static int dp83640_hwtstamp(struct phy_device *phydev, struct ifreq *ifr)
+static int dp83640_hwtstamp(struct mii_timestamper *mii_ts, struct ifreq *ifr)
 {
-   struct dp83640_private *dp83640 = phydev->priv;
+   struct dp83640_private *dp83640 =
+   container_of(mii_ts, struct dp83640_private, mii_ts);
struct hwtstamp_config cfg;
u16 txcfg0, rxcfg0;
 
@@ -1389,8 +1406,8 @@ static int dp83640_hwtstamp(struct phy_device *phydev, struct ifreq *ifr)
 
mutex_lock(&dp83640->clock->extreg_lock);
 
-   ext_write(0, phydev, PAGE5, PTP_TXCFG0, txcfg0);
-   ext_write(0, phydev, PAGE5, PTP_RXCFG0, rxcfg0);
+   ext_write(0, dp83640->phydev, PAGE5, PTP_TXCFG0, txcfg0);
+   ext_write(0, dp83640->phydev, PAGE5, PTP_RXCFG0, rxcfg0);
 
mutex_unlock(&dp83640->clock->extreg_lock);
 
@@ -1420,10 +1437,11 @@ static void rx_timestamp_work(struct work_struct *work)
schedule_delayed_work(&dp83640->ts_work, SKB_TIMESTAMP_TIMEOUT);
 }
 
-static bool dp83640_rxtstamp(struct phy_device *phydev,
+static bool dp83640_rxtstamp(struct mii_timestamper *mii_ts,
 struct sk_buff *skb, int type)
 {
-   struct dp83640_private *dp83640 = phydev->priv;
+   struct dp83640_private *dp83640 =
+   container_of(mii_ts, struct dp83640_private, mii_ts);
struct dp83640_skb_info *skb_info = (struct dp83640_skb_info *)skb->cb;
struct list_head *this, *next;
struct rxts *rxts;
@@ -1469,10 +1487,11 @@ static bool dp83640_rxtstamp(struct phy_device *phydev,
return true;
 }
 
-static void dp83640_txtstamp(struct phy_device *phydev,
+static void dp83640_txtstamp(struct mii_timestamper *mii_ts,
 struct sk_buff *skb, int

Re: [PATCH iproute2-next v2] tc: flower: expose hardware offload count

2018-10-07 Thread David Ahern

On 10/3/18 2:44 PM, Vlad Buslov wrote:
> Recently flower classifier was updated to expose count of devices that
> filter is offloaded to. Add support to print this counter as 'in_hw_count'.
> 
> Signed-off-by: Vlad Buslov 
> Acked-by: Jiri Pirko 
> ---
> Changes from V1 to V2:
> - Change print format string to "%u"
> 
>  tc/f_flower.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 

applied to iproute2-next. Thanks

[PATCH net-next 08/11] net: sched: cls_u32: the tp_c argument of u32_set_parms() is always tp->data

2018-10-07 Thread Jamal Hadi Salim

From: Al Viro 

It must be tc_u_common associated with that tp (i.e. tp->data).
Proof:
* both ->ht_up and ->tp_c are assign-once
* ->tp_c of anything inserted into tp_c->hlist is tp_c
* hnodes never get reinserted into the lists or moved
between those, so anything found by u32_lookup_ht(tp->data, ...)
will have ->tp_c equal to tp->data.
* tp->root->tp_c == tp->data.
* ->ht_up of anything inserted into hnode->ht[...] is
equal to hnode.
* knodes never get reinserted into hash chains or moved
between those, so anything returned by u32_lookup_key(ht, ...)
will have ->ht_up equal to ht.
* any knode returned by u32_get(tp, ...) will have ->ht_up->tp_c
point to tp->data

Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 53f34f8cde8b..3ed2c9866b36 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -956,8 +956,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
if (!new)
return -ENOMEM;
 
-   err = u32_set_parms(net, tp, base,
-   rtnl_dereference(n->ht_up)->tp_c, new, tb,
+   err = u32_set_parms(net, tp, base, tp_c, new, tb,
tca[TCA_RATE], ovr, extack);
 
if (err) {
@@ -1124,7 +1123,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
}
 #endif
 
-   err = u32_set_parms(net, tp, base, ht->tp_c, n, tb, tca[TCA_RATE], ovr,
+   err = u32_set_parms(net, tp, base, tp_c, n, tb, tca[TCA_RATE], ovr,
extack);
if (err == 0) {
struct tc_u_knode __rcu **ins;
-- 
2.11.0

[PATCH net-next 00/11] net: sched: cls_u32 Various improvements

2018-10-07 Thread Jamal Hadi Salim

From: Jamal Hadi Salim 

Various improvements from Al.

Al Viro (11):
  net: sched: cls_u32: disallow linking to root hnode
  net: sched: cls_u32: make sure that divisor is a power of 2
  net: sched: cls_u32: get rid of unused argument of u32_destroy_key()
  net: sched: cls_u32: get rid of tc_u_knode ->tp
  net: sched: cls_u32: get rid of tc_u_common ->rcu
  net: sched: cls_u32: clean tc_u_common hashtable
  net: sched: cls_u32: pass tc_u_common to u32_set_parms() instead of
tc_u_hnode
  net: sched: cls_u32: the tp_c argument of u32_set_parms() is always
tp->data
  net: sched: cls_u32: keep track of knodes count in tc_u_common
  net: sched: cls_u32: simplify the hell out u32_delete() emptiness
check
  net: sched: cls_u32: get rid of tp_c

 net/sched/cls_u32.c | 117 
 1 file changed, 35 insertions(+), 82 deletions(-)

-- 
2.11.0

[PATCH net-next 05/11] net: sched: cls_u32: get rid of tc_u_common ->rcu

2018-10-07 Thread Jamal Hadi Salim

From: Al Viro 

unused

Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 810c49ac1bbe..c378168f4562 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -98,7 +98,6 @@ struct tc_u_common {
int refcnt;
struct idr  handle_idr;
struct hlist_node   hnode;
-   struct rcu_head rcu;
 };
 
 static inline unsigned int u32_hash_fold(__be32 key,
-- 
2.11.0

[PATCH net-next 04/11] net: sched: cls_u32: get rid of tc_u_knode ->tp

2018-10-07 Thread Jamal Hadi Salim

From: Al Viro 

not used anymore

Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index ef0f2e6ec422..810c49ac1bbe 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -68,7 +68,6 @@ struct tc_u_knode {
u32 mask;
u32 __percpu*pcpu_success;
 #endif
-   struct tcf_proto*tp;
struct rcu_work rwork;
/* The 'sel' field MUST be the last field in structure to allow for
 * tc_u32_keys allocated at end of structure.
@@ -896,7 +895,6 @@ static struct tc_u_knode *u32_init_knode(struct tcf_proto *tp,
/* Similarly success statistics must be moved as pointers */
new->pcpu_success = n->pcpu_success;
 #endif
-   new->tp = tp;
memcpy(&new->sel, s, sizeof(*s) + s->nkeys*sizeof(struct tc_u32_key));
 
if (tcf_exts_init(&new->exts, TCA_U32_ACT, TCA_U32_POLICE)) {
@@ -1112,7 +1110,6 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
n->handle = handle;
n->fshift = s->hmask ? ffs(ntohl(s->hmask)) - 1 : 0;
n->flags = flags;
-   n->tp = tp;
 
err = tcf_exts_init(&n->exts, TCA_U32_ACT, TCA_U32_POLICE);
if (err < 0)
-- 
2.11.0

[PATCH net-next 09/11] net: sched: cls_u32: get rid of tp_c

2018-10-07 Thread Jamal Hadi Salim

From: Al Viro 

Both the hnode ->tp_c field and the tp_c argument of u32_set_parms()
can go: the latter is redundant, the former is never read.

Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 3ed2c9866b36..3d4c360f9b0c 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -79,7 +79,6 @@ struct tc_u_hnode {
struct tc_u_hnode __rcu *next;
u32 handle;
u32 prio;
-   struct tc_u_common  *tp_c;
int refcnt;
unsigned intdivisor;
struct idr  handle_idr;
@@ -390,7 +389,6 @@ static int u32_init(struct tcf_proto *tp)
tp_c->refcnt++;
RCU_INIT_POINTER(root_ht->next, tp_c->hlist);
rcu_assign_pointer(tp_c->hlist, root_ht);
-   root_ht->tp_c = tp_c;
 
rcu_assign_pointer(tp->root, root_ht);
tp->data = tp_c;
@@ -761,7 +759,7 @@ static const struct nla_policy u32_policy[TCA_U32_MAX + 1] = {
 };
 
 static int u32_set_parms(struct net *net, struct tcf_proto *tp,
-unsigned long base, struct tc_u_common *tp_c,
+unsigned long base,
 struct tc_u_knode *n, struct nlattr **tb,
 struct nlattr *est, bool ovr,
 struct netlink_ext_ack *extack)
@@ -782,7 +780,7 @@ static int u32_set_parms(struct net *net, struct tcf_proto *tp,
}
 
if (handle) {
-   ht_down = u32_lookup_ht(tp_c, handle);
+   ht_down = u32_lookup_ht(tp->data, handle);
 
if (!ht_down) {
NL_SET_ERR_MSG_MOD(extack, "Link hash table not found");
@@ -956,7 +954,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
if (!new)
return -ENOMEM;
 
-   err = u32_set_parms(net, tp, base, tp_c, new, tb,
+   err = u32_set_parms(net, tp, base, new, tb,
tca[TCA_RATE], ovr, extack);
 
if (err) {
@@ -1012,7 +1010,6 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
return err;
}
}
-   ht->tp_c = tp_c;
ht->refcnt = 1;
ht->divisor = divisor;
ht->handle = handle;
@@ -1123,7 +1120,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
}
 #endif
 
-   err = u32_set_parms(net, tp, base, tp_c, n, tb, tca[TCA_RATE], ovr,
+   err = u32_set_parms(net, tp, base, n, tb, tca[TCA_RATE], ovr,
extack);
if (err == 0) {
struct tc_u_knode __rcu **ins;
-- 
2.11.0

[PATCH net-next 01/11] net: sched: cls_u32: disallow linking to root hnode

2018-10-07 Thread Jamal Hadi Salim

From: Al Viro 

Operation makes no sense.  Nothing will actually break if we do so
(depth limit in u32_classify() will prevent infinite loops), but
according to maintainers it's best prohibited outright.

NOTE: doing so guarantees that u32_destroy() will trigger the call
of u32_destroy_hnode(); we might want to make that unconditional.

Test:
tc qdisc add dev eth0 ingress
tc filter add dev eth0 parent ffff: protocol ip prio 100 u32 \
link 800: offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff
should fail with
Error: cls_u32: Not linking to root node

Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 622f4657da94..3357331a80a2 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -797,6 +797,10 @@ static int u32_set_parms(struct net *net, struct tcf_proto *tp,
NL_SET_ERR_MSG_MOD(extack, "Link hash table not found");
return -EINVAL;
}
+   if (ht_down->is_root) {
+   NL_SET_ERR_MSG_MOD(extack, "Not linking to root node");
+   return -EINVAL;
+   }
ht_down->refcnt++;
}
 
-- 
2.11.0

[PATCH net-next 06/11] net: sched: cls_u32: clean tc_u_common hashtable

2018-10-07 Thread Jamal Hadi Salim

From: Al Viro 

* calculate key *once*, not for each hash chain element
* let tc_u_hash() return the pointer to chain head rather than index -
callers are cleaner that way.

Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 24 +---
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index c378168f4562..3f6fba831c57 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -343,19 +343,16 @@ static void *tc_u_common_ptr(const struct tcf_proto *tp)
return block->q;
 }
 
-static unsigned int tc_u_hash(const struct tcf_proto *tp)
+static struct hlist_head *tc_u_hash(void *key)
 {
-   return hash_ptr(tc_u_common_ptr(tp), U32_HASH_SHIFT);
+   return tc_u_common_hash + hash_ptr(key, U32_HASH_SHIFT);
 }
 
-static struct tc_u_common *tc_u_common_find(const struct tcf_proto *tp)
+static struct tc_u_common *tc_u_common_find(void *key)
 {
struct tc_u_common *tc;
-   unsigned int h;
-
-   h = tc_u_hash(tp);
-   hlist_for_each_entry(tc, &tc_u_common_hash[h], hnode) {
-   if (tc->ptr == tc_u_common_ptr(tp))
+   hlist_for_each_entry(tc, tc_u_hash(key), hnode) {
+   if (tc->ptr == key)
return tc;
}
return NULL;
@@ -364,10 +361,8 @@ static struct tc_u_common *tc_u_common_find(const struct tcf_proto *tp)
 static int u32_init(struct tcf_proto *tp)
 {
struct tc_u_hnode *root_ht;
-   struct tc_u_common *tp_c;
-   unsigned int h;
-
-   tp_c = tc_u_common_find(tp);
+   void *key = tc_u_common_ptr(tp);
+   struct tc_u_common *tp_c = tc_u_common_find(key);
 
root_ht = kzalloc(sizeof(*root_ht), GFP_KERNEL);
if (root_ht == NULL)
@@ -385,12 +380,11 @@ static int u32_init(struct tcf_proto *tp)
kfree(root_ht);
return -ENOBUFS;
}
-   tp_c->ptr = tc_u_common_ptr(tp);
+   tp_c->ptr = key;
INIT_HLIST_NODE(&tp_c->hnode);
idr_init(&tp_c->handle_idr);
 
-   h = tc_u_hash(tp);
-   hlist_add_head(&tp_c->hnode, &tc_u_common_hash[h]);
+   hlist_add_head(&tp_c->hnode, tc_u_hash(key));
}
 
tp_c->refcnt++;
-- 
2.11.0
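
Note on the cleanup above: the point of the change is that the hash
helper hands back the chain head itself, so lookup and insert no longer
carry an index around, and the key is computed once by the caller. The
following is only a userspace sketch of that shape. All demo_* names are
hypothetical, a plain singly linked chain stands in for hlist, and a
simple multiplicative hash stands in for the kernel's hash_ptr().

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_HASH_SHIFT 4
#define DEMO_HASH_SIZE  (1u << DEMO_HASH_SHIFT)

struct demo_common {
	void *ptr;			/* lookup key */
	struct demo_common *next;	/* chain link, stand-in for hlist_node */
};

static struct demo_common *demo_table[DEMO_HASH_SIZE];

/* Returns a pointer to the chain head rather than a bucket index. */
static struct demo_common **demo_hash(const void *key)
{
	uint64_t h = (uint64_t)(uintptr_t)key * 0x9E3779B97F4A7C15ull;

	return &demo_table[h >> (64 - DEMO_HASH_SHIFT)];
}

/* The key is passed in, so it is computed once by the caller. */
static struct demo_common *demo_find(const void *key)
{
	struct demo_common *c;

	for (c = *demo_hash(key); c; c = c->next)
		if (c->ptr == key)
			return c;
	return NULL;
}

static void demo_insert(struct demo_common *c, void *key)
{
	struct demo_common **head = demo_hash(key);

	c->ptr = key;
	c->next = *head;
	*head = c;
}
```

Both demo_find() and demo_insert() consume the same helper, which is
what makes the callers shorter in the patch as well.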

[PATCH net-next 07/11] net: sched: cls_u32: pass tc_u_common to u32_set_parms() instead of tc_u_hnode

2018-10-07 Thread Jamal Hadi Salim

From: Al Viro 

the only thing we used ht for was ht->tp_c and callers can get that
without going through ->tp_c at all; start with lifting that into
the callers, next commits will massage those, eventually removing
->tp_c altogether.

Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 3f6fba831c57..53f34f8cde8b 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -761,7 +761,7 @@ static const struct nla_policy u32_policy[TCA_U32_MAX + 1] = {
 };
 
 static int u32_set_parms(struct net *net, struct tcf_proto *tp,
-unsigned long base, struct tc_u_hnode *ht,
+unsigned long base, struct tc_u_common *tp_c,
 struct tc_u_knode *n, struct nlattr **tb,
 struct nlattr *est, bool ovr,
 struct netlink_ext_ack *extack)
@@ -782,7 +782,7 @@ static int u32_set_parms(struct net *net, struct tcf_proto *tp,
}
 
if (handle) {
-   ht_down = u32_lookup_ht(ht->tp_c, handle);
+   ht_down = u32_lookup_ht(tp_c, handle);
 
if (!ht_down) {
NL_SET_ERR_MSG_MOD(extack, "Link hash table not found");
@@ -957,7 +957,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
return -ENOMEM;
 
err = u32_set_parms(net, tp, base,
-   rtnl_dereference(n->ht_up), new, tb,
+   rtnl_dereference(n->ht_up)->tp_c, new, tb,
tca[TCA_RATE], ovr, extack);
 
if (err) {
@@ -1124,7 +1124,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
}
 #endif
 
-   err = u32_set_parms(net, tp, base, ht, n, tb, tca[TCA_RATE], ovr,
+   err = u32_set_parms(net, tp, base, ht->tp_c, n, tb, tca[TCA_RATE], ovr,
extack);
if (err == 0) {
struct tc_u_knode __rcu **ins;
-- 
2.11.0

[PATCH net-next 10/11] net: sched: cls_u32: keep track of knodes count in tc_u_common

2018-10-07 Thread Jamal Hadi Salim

From: Al Viro 

This allows u32_delete() to be simplified considerably.

Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 3d4c360f9b0c..61593bee08db 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -97,6 +97,7 @@ struct tc_u_common {
int refcnt;
struct idr  handle_idr;
struct hlist_node   hnode;
+   longknodes;
 };
 
 static inline unsigned int u32_hash_fold(__be32 key,
@@ -452,6 +453,7 @@ static void u32_delete_key_freepf_work(struct work_struct *work)
 
 static int u32_delete_key(struct tcf_proto *tp, struct tc_u_knode *key)
 {
+   struct tc_u_common *tp_c = tp->data;
struct tc_u_knode __rcu **kp;
struct tc_u_knode *pkp;
struct tc_u_hnode *ht = rtnl_dereference(key->ht_up);
@@ -462,6 +464,7 @@ static int u32_delete_key(struct tcf_proto *tp, struct tc_u_knode *key)
 kp = &pkp->next, pkp = rtnl_dereference(*kp)) {
if (pkp == key) {
RCU_INIT_POINTER(*kp, key->next);
+   tp_c->knodes--;
 
tcf_unbind_filter(tp, &key->res);
idr_remove(&ht->handle_idr, key->handle);
@@ -576,6 +579,7 @@ static int u32_replace_hw_knode(struct tcf_proto *tp, struct tc_u_knode *n,
 static void u32_clear_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht,
struct netlink_ext_ack *extack)
 {
+   struct tc_u_common *tp_c = tp->data;
struct tc_u_knode *n;
unsigned int h;
 
@@ -583,6 +587,7 @@ static void u32_clear_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht,
while ((n = rtnl_dereference(ht->ht[h])) != NULL) {
RCU_INIT_POINTER(ht->ht[h],
 rtnl_dereference(n->next));
+   tp_c->knodes--;
tcf_unbind_filter(tp, &n->res);
u32_remove_hw_knode(tp, n, extack);
idr_remove(&ht->handle_idr, n->handle);
@@ -1141,6 +1146,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 
RCU_INIT_POINTER(n->next, pins);
rcu_assign_pointer(*ins, n);
+   tp_c->knodes++;
*arg = n;
return 0;
}
-- 
2.11.0

[PATCH net-next 02/11] net: sched: cls_u32: make sure that divisor is a power of 2

2018-10-07 Thread Jamal Hadi Salim

From: Al Viro 

Tested by modifying iproute2 to allow
sending a divisor > 255.

Tested-by: Jamal Hadi Salim 
Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 3357331a80a2..ce55eea448a0 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -994,7 +994,11 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
if (tb[TCA_U32_DIVISOR]) {
unsigned int divisor = nla_get_u32(tb[TCA_U32_DIVISOR]);
 
-   if (--divisor > 0x100) {
+   if (!is_power_of_2(divisor)) {
+   NL_SET_ERR_MSG_MOD(extack, "Divisor is not a power of 2");
+   return -EINVAL;
+   }
+   if (divisor-- > 0x100) {
NL_SET_ERR_MSG_MOD(extack, "Exceeded maximum 256 hash buckets");
return -EINVAL;
}
-- 
2.11.0
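
Note on the check above: the old test, --divisor > 0x100, only bounded
the value, so a divisor such as 3 was accepted. The patch first requires
a power of two and then applies the bound, post-decrementing the value.
The following is only a userspace sketch of that validation.
demo_is_power_of_2() and demo_check_divisor() are hypothetical names,
and storing divisor - 1 as a mask is an assumption read off the
post-decrement in the diff.

```c
#include <errno.h>
#include <stdbool.h>

/* Same test the kernel's is_power_of_2() performs: exactly one bit set. */
static bool demo_is_power_of_2(unsigned int n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Validates a requested divisor.  On success stores divisor - 1 in *mask
 * and returns 0; otherwise returns -EINVAL and leaves *mask untouched.
 */
static int demo_check_divisor(unsigned int divisor, unsigned int *mask)
{
	if (!demo_is_power_of_2(divisor))
		return -EINVAL;		/* "Divisor is not a power of 2" */
	if (divisor > 0x100)
		return -EINVAL;		/* "Exceeded maximum 256 hash buckets" */
	*mask = divisor - 1;
	return 0;
}
```

Under these assumptions 1 and 256 pass, while 0, 3 and 512 are rejected.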

[PATCH net-next 11/11] net: sched: cls_u32: simplify the hell out u32_delete() emptiness check

2018-10-07 Thread Jamal Hadi Salim

From: Al Viro 

Now that we have the knode count, we can instantly check if
any hnodes are non-empty.  And that kills the check for extra
references to root hnode - those could happen only if there was
a knode to carry such a link.

Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 48 +---
 1 file changed, 1 insertion(+), 47 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 61593bee08db..ac79a40a0392 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -627,17 +627,6 @@ static int u32_destroy_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht,
return -ENOENT;
 }
 
-static bool ht_empty(struct tc_u_hnode *ht)
-{
-   unsigned int h;
-
-   for (h = 0; h <= ht->divisor; h++)
-   if (rcu_access_pointer(ht->ht[h]))
-   return false;
-
-   return true;
-}
-
 static void u32_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
 {
struct tc_u_common *tp_c = tp->data;
@@ -675,13 +664,9 @@ static int u32_delete(struct tcf_proto *tp, void *arg, bool *last,
  struct netlink_ext_ack *extack)
 {
struct tc_u_hnode *ht = arg;
-   struct tc_u_hnode *root_ht = rtnl_dereference(tp->root);
struct tc_u_common *tp_c = tp->data;
int ret = 0;
 
-   if (ht == NULL)
-   goto out;
-
if (TC_U32_KEY(ht->handle)) {
u32_remove_hw_knode(tp, (struct tc_u_knode *)ht, extack);
ret = u32_delete_key(tp, (struct tc_u_knode *)ht);
@@ -702,38 +687,7 @@ static int u32_delete(struct tcf_proto *tp, void *arg, bool *last,
}
 
 out:
-   *last = true;
-   if (root_ht) {
-   if (root_ht->refcnt > 1) {
-   *last = false;
-   goto ret;
-   }
-   if (root_ht->refcnt == 1) {
-   if (!ht_empty(root_ht)) {
-   *last = false;
-   goto ret;
-   }
-   }
-   }
-
-   if (tp_c->refcnt > 1) {
-   *last = false;
-   goto ret;
-   }
-
-   if (tp_c->refcnt == 1) {
-   struct tc_u_hnode *ht;
-
-   for (ht = rtnl_dereference(tp_c->hlist);
-ht;
-ht = rtnl_dereference(ht->next))
-   if (!ht_empty(ht)) {
-   *last = false;
-   break;
-   }
-   }
-
-ret:
+   *last = tp_c->refcnt == 1 && tp_c->knodes == 0;
return ret;
 }
 
-- 
2.11.0
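
Note on the simplification above: once every key insertion and removal
adjusts a counter, the "is anything left?" question no longer needs a
walk over every hash table and bucket. The following is only a userspace
sketch of that idea. struct demo_common and the demo_* helpers are
hypothetical names, with refcnt and knodes mirroring the fields used in
the patch.

```c
#include <stdbool.h>

struct demo_common {
	int refcnt;	/* users sharing this common state */
	long knodes;	/* keys currently linked into any hash table */
};

/* Called wherever a key is linked in or unlinked. */
static void demo_key_added(struct demo_common *c)   { c->knodes++; }
static void demo_key_removed(struct demo_common *c) { c->knodes--; }

/* Constant-time replacement for scanning every bucket of every table. */
static bool demo_is_last(const struct demo_common *c)
{
	return c->refcnt == 1 && c->knodes == 0;
}
```

The price is that every path which links or unlinks a key has to keep
the counter in step, which is what the preceding patch in the series
takes care of.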

[PATCH net-next 03/11] net: sched: cls_u32: get rid of unused argument of u32_destroy_key()

2018-10-07 Thread Jamal Hadi Salim

From: Al Viro 

Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index ce55eea448a0..ef0f2e6ec422 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -405,8 +405,7 @@ static int u32_init(struct tcf_proto *tp)
return 0;
 }
 
-static int u32_destroy_key(struct tcf_proto *tp, struct tc_u_knode *n,
-  bool free_pf)
+static int u32_destroy_key(struct tc_u_knode *n, bool free_pf)
 {
struct tc_u_hnode *ht = rtnl_dereference(n->ht_down);
 
@@ -440,7 +439,7 @@ static void u32_delete_key_work(struct work_struct *work)
  struct tc_u_knode,
  rwork);
rtnl_lock();
-   u32_destroy_key(key->tp, key, false);
+   u32_destroy_key(key, false);
rtnl_unlock();
 }
 
@@ -457,7 +456,7 @@ static void u32_delete_key_freepf_work(struct work_struct *work)
  struct tc_u_knode,
  rwork);
rtnl_lock();
-   u32_destroy_key(key->tp, key, true);
+   u32_destroy_key(key, true);
rtnl_unlock();
 }
 
@@ -600,7 +599,7 @@ static void u32_clear_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht,
if (tcf_exts_get_net(&n->exts))
tcf_queue_work(&n->rwork, u32_delete_key_freepf_work);
else
-   u32_destroy_key(n->tp, n, true);
+   u32_destroy_key(n, true);
}
}
 }
@@ -971,13 +970,13 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
tca[TCA_RATE], ovr, extack);
 
if (err) {
-   u32_destroy_key(tp, new, false);
+   u32_destroy_key(new, false);
return err;
}
 
err = u32_replace_hw_knode(tp, new, flags, extack);
if (err) {
-   u32_destroy_key(tp, new, false);
+   u32_destroy_key(new, false);
return err;
}
 
-- 
2.11.0

Re: [PATCH bpf-next] bpf: emit audit messages upon successful prog load and unload

2018-10-07 Thread Jesper Dangaard Brouer

On Sat, 6 Oct 2018 00:05:22 +0200
Jiri Olsa  wrote:

> On Fri, Oct 05, 2018 at 11:44:35AM -0700, Alexei Starovoitov wrote:
> > On Fri, Oct 05, 2018 at 08:14:09AM +0200, Jiri Olsa wrote:  
> > > On Thu, Oct 04, 2018 at 03:10:15PM -0700, Alexei Starovoitov wrote:  
> > > > On Thu, Oct 04, 2018 at 10:22:31PM +0200, Jesper Dangaard Brouer wrote: 
> > > >  
> > > > > On Thu, 4 Oct 2018 21:41:17 +0200 Daniel Borkmann 
> > > > >  wrote:
> > > > >   
> > > > > > On 10/04/2018 08:39 PM, Jesper Dangaard Brouer wrote:  
> > > > > > > On Thu, 4 Oct 2018 10:11:43 -0700 Alexei Starovoitov 
> > > > > > >  wrote:
> > > > > > >> On Thu, Oct 04, 2018 at 03:50:38PM +0200, Daniel Borkmann wrote: 
> > > > > > >>
> > > > > [...]  
> > > > > > >>
> > > > > > >> If the purpose of the patch is to give user space visibility into
> > > > > > >> bpf prog load/unload as a notification, then I completely agree 
> > > > > > >> that
> > > > > > >> some notification mechanism is necessary.
> > > > > > 
> > > > > > Yeah, I did only regard it as only that, nothing more. Some means
> > > > > > of timeline and notification that can be kept in a record in user
> > > > > > space and later retrieved e.g. for introspection on what has been
> > > > > > loaded.
> > > > > >   
> > > > > > >> I've started working on such mechanism via perf ring buffer 
> > > > > > >> which is
> > > > > > >> the fastest mechanism we have in the kernel so far.
> > > > > > >> See long discussion here: 
> > > > > > >> https://patchwork.ozlabs.org/patch/971970/
> > > 
[...]
> > > > > > 
> > > > > > That one is definitely needed in any case to resolve the kallsyms
> > > > > > limitations, and it does have overlap in that in either case we
> > > > > > want to look at past BPF programs that have been unloaded in the
> > > > > > meantime, so I don't have a strong preference either way, and the
> > > > > > former is needed in any case. Though thought was that audit might
> > > > > > be an option for those not running profiling daemons 24/7, but
> > > > > > presumably bpftool could be extended to record these events as
> > > > > > well if we don't want to reuse audit infra.  
> > > > > 
> > > > > Yes, exactly, I don't want to run a profiling daemon 24/7 to record
> > > > > these events.  I do acknowledge that this perf event is relevant,
> > > > > especially for catching the kernel symbols (I need that myself), but 
> > > > > it
> > > > > does not cover my use-case.
> > > > > 
> > > > > My use-case is to 24/7 collect and keep records in userspace, and 
> > > > > have a
> > > > > timeline of these notifications, for later retrieval.  The idea is 
> > > > > that
> > > > > our support engineers can look at these records when troubleshooting
> > > > > the system.  And the plan is also to collect these records as part of
> > > > > our sosreport tool, which is part of the support case.  
> > > > 
> > > > I don't think you're implying that prog load/unload should be spamming 
> > > > dmesg
> > > > and auditd not even running...  
> > > 
> > > I think the problem Jesper implied is that in order to collect
> > > those logs you'll need perf tool running all the time.. which
> > > it's not equipped for yet  
> > 
> > I'm not proposing to run 'perf' binary all the time.
> > Setting up perf ring buffer just for these new bpf prog load/unload events
> > and epolling it is simple enough to do from any application including 
> > auditd.
> > selftests/bpf/ do it for bpf output events.  
> 
> ok, did not think about the possibility to teach auditd talk to perf,
> time to get that tool evsel/evlist/rb library ready ;-)

Interesting, I also didn't consider teaching auditd to get its 'bpf'
events from a separate perf ring-buffer, that might work.  I do wonder
how the audit people will take this suggestion.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

[PATCH v8 15/15] MAINTAINERS: Add entry for Marvell OcteonTX2 Admin Function driver

2018-10-07 Thread sunil . kovvuri

From: Sunil Goutham 

Added maintainers entry for Marvell OcteonTX2 SOC's RVU
admin function driver.

Signed-off-by: Sunil Goutham 
---
 MAINTAINERS | 9 +
 1 file changed, 9 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index bb5f431..bc76b03 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8844,6 +8844,15 @@ S:   Supported
 F: drivers/mmc/host/sdhci-xenon*
 F: Documentation/devicetree/bindings/mmc/marvell,xenon-sdhci.txt
 
+MARVELL OCTEONTX2 RVU ADMIN FUNCTION DRIVER
+M: Sunil Goutham 
+M: Linu Cherian 
+M: Geetha sowjanya 
+M: Jerin Jacob 
+L: netdev@vger.kernel.org
+S: Supported
+F: drivers/net/ethernet/marvell/octeontx2/af/
+
 MATROX FRAMEBUFFER DRIVER
 L: linux-fb...@vger.kernel.org
 S: Orphan
-- 
2.7.4

[PATCH v8 13/15] octeontx2-af: Add support for CGX link management

2018-10-07 Thread sunil . kovvuri

From: Linu Cherian 

CGX LMAC initialization, link status polling, etc. are done
by low-level secure firmware. For link management, this patch
adds an interface or communication mechanism between firmware
and this kernel CGX driver.

- Firmware interface specification is defined in cgx_fw_if.h.
- Support to send/receive commands/events to/from firmware.
- events/commands implemented
  * link up
  * link down
  * reading firmware version

Signed-off-by: Linu Cherian 
Signed-off-by: Nithya Mani 
Signed-off-by: Sunil Goutham 
---
 drivers/net/ethernet/marvell/octeontx2/af/cgx.c| 357 -
 drivers/net/ethernet/marvell/octeontx2/af/cgx.h|  32 ++
 .../net/ethernet/marvell/octeontx2/af/cgx_fw_if.h  | 186 +++
 .../net/ethernet/marvell/octeontx2/af/rvu_cgx.c|  97 ++
 4 files changed, 668 insertions(+), 4 deletions(-)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/cgx_fw_if.h
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cgx.c b/drivers/net/ethernet/marvell/octeontx2/af/cgx.c
index 6ecae80..f290b1d 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/cgx.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/cgx.c
@@ -24,16 +24,43 @@
 #define DRV_NAME   "octeontx2-cgx"
 #define DRV_STRING  "Marvell OcteonTX2 CGX/MAC Driver"
 
+/**
+ * struct lmac
+ * @wq_cmd_cmplt:  waitq to keep the process blocked until cmd completion
+ * @cmd_lock:  Lock to serialize the command interface
+ * @resp:  command response
+ * @event_cb:  callback for linkchange events
+ * @cmd_pend:  flag set before new command is started
+ * flag cleared after command response is received
+ * @cgx:   parent cgx port
+ * @lmac_id:   lmac port id
+ * @name:  lmac port name
+ */
+struct lmac {
+   wait_queue_head_t wq_cmd_cmplt;
+   struct mutex cmd_lock;
+   u64 resp;
+   struct cgx_event_cb event_cb;
+   bool cmd_pend;
+   struct cgx *cgx;
+   u8 lmac_id;
+   char *name;
+};
+
 struct cgx {
void __iomem *reg_base;
struct pci_dev  *pdev;
u8  cgx_id;
u8  lmac_count;
+   struct lmac *lmac_idmap[MAX_LMAC_PER_CGX];
struct list_head cgx_list;
 };
 
 static LIST_HEAD(cgx_list);
 
+/* CGX PHY management internal APIs */
+static int cgx_fwi_link_change(struct cgx *cgx, int lmac_id, bool en);
+
 /* Supported devices */
 static const struct pci_device_id cgx_id_table[] = {
{ PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_OCTEONTX2_CGX) },
@@ -42,11 +69,24 @@ static const struct pci_device_id cgx_id_table[] = {
 
 MODULE_DEVICE_TABLE(pci, cgx_id_table);
 
+static void cgx_write(struct cgx *cgx, u64 lmac, u64 offset, u64 val)
+{
+   writeq(val, cgx->reg_base + (lmac << 18) + offset);
+}
+
 static u64 cgx_read(struct cgx *cgx, u64 lmac, u64 offset)
 {
return readq(cgx->reg_base + (lmac << 18) + offset);
 }
 
+static inline struct lmac *lmac_pdata(u8 lmac_id, struct cgx *cgx)
+{
+   if (!cgx || lmac_id >= MAX_LMAC_PER_CGX)
+   return NULL;
+
+   return cgx->lmac_idmap[lmac_id];
+}
+
 int cgx_get_cgx_cnt(void)
 {
struct cgx *cgx_dev;
@@ -82,18 +122,312 @@ void *cgx_get_pdata(int cgx_id)
 }
 EXPORT_SYMBOL(cgx_get_pdata);
 
-static void cgx_lmac_init(struct cgx *cgx)
+/* CGX Firmware interface low level support */
+static int cgx_fwi_cmd_send(u64 req, u64 *resp, struct lmac *lmac)
+{
+   struct cgx *cgx = lmac->cgx;
+   struct device *dev;
+   int err = 0;
+   u64 cmd;
+
+   /* Ensure no other command is in progress */
+   err = mutex_lock_interruptible(&lmac->cmd_lock);
+   if (err)
+   return err;
+
+   /* Ensure command register is free */
+   cmd = cgx_read(cgx, lmac->lmac_id,  CGX_COMMAND_REG);
+   if (FIELD_GET(CMDREG_OWN, cmd) != CGX_CMD_OWN_NS) {
+   err = -EBUSY;
+   goto unlock;
+   }
+
+   /* Update ownership in command request */
+   req = FIELD_SET(CMDREG_OWN, CGX_CMD_OWN_FIRMWARE, req);
+
+   /* Mark this lmac as pending, before we start */
+   lmac->cmd_pend = true;
+
+   /* Start command in hardware */
+   cgx_write(cgx, lmac->lmac_id, CGX_COMMAND_REG, req);
+
+   /* Ensure command is completed without errors */
+   if (!wait_event_timeout(lmac->wq_cmd_cmplt, !lmac->cmd_pend,
+   msecs_to_jiffies(CGX_CMD_TIMEOUT))) {
+   dev = &cgx->pdev->dev;
+   dev_err(dev, "cgx port %d:%d cmd timeout\n",
+   cgx->cgx_id, lmac->lmac_id);
+   err = -EIO;
+   goto unlock;
+   }
+
+   /* we have a valid command response */
+   smp_rmb(); /* Ensure the latest updates are visible */
+   *resp = lmac->resp;
+
+unlock:
+

[PATCH v8 11/15] octeontx2-af: Add Marvell OcteonTX2 CGX driver

2018-10-07 Thread sunil . kovvuri

From: Sunil Goutham 

This patch adds a basic template for Marvell OcteonTX2's
CGX ethernet interface driver. Just the probe.
RVU AF driver will use APIs exported by this driver
for various things like PF to physical interface mapping,
loopback mode, interface stats etc. Hence merged both
drivers into a single module.

Signed-off-by: Sunil Goutham 
---
 drivers/net/ethernet/marvell/octeontx2/af/Makefile |  2 +-
 drivers/net/ethernet/marvell/octeontx2/af/cgx.c| 97 ++
 drivers/net/ethernet/marvell/octeontx2/af/cgx.h| 22 +
 drivers/net/ethernet/marvell/octeontx2/af/rvu.c| 14 +++-
 4 files changed, 133 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/cgx.c
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/cgx.h

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/Makefile b/drivers/net/ethernet/marvell/octeontx2/af/Makefile
index ac17cb9..8646421 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/Makefile
+++ b/drivers/net/ethernet/marvell/octeontx2/af/Makefile
@@ -7,4 +7,4 @@ obj-$(CONFIG_OCTEONTX2_MBOX) += octeontx2_mbox.o
 obj-$(CONFIG_OCTEONTX2_AF) += octeontx2_af.o
 
 octeontx2_mbox-y := mbox.o
-octeontx2_af-y := rvu.o
+octeontx2_af-y := cgx.o rvu.o
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cgx.c b/drivers/net/ethernet/marvell/octeontx2/af/cgx.c
new file mode 100644
index 000..c41d23f
--- /dev/null
+++ b/drivers/net/ethernet/marvell/octeontx2/af/cgx.c
@@ -0,0 +1,97 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Marvell OcteonTx2 CGX driver
+ *
+ * Copyright (C) 2018 Marvell International Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "cgx.h"
+
+#define DRV_NAME   "octeontx2-cgx"
+#define DRV_STRING  "Marvell OcteonTX2 CGX/MAC Driver"
+
+struct cgx {
+   void __iomem *reg_base;
+   struct pci_dev  *pdev;
+   u8  cgx_id;
+};
+
+/* Supported devices */
+static const struct pci_device_id cgx_id_table[] = {
+   { PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_OCTEONTX2_CGX) },
+   { 0, }  /* end of table */
+};
+
+MODULE_DEVICE_TABLE(pci, cgx_id_table);
+
+static int cgx_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+   struct device *dev = &pdev->dev;
+   struct cgx *cgx;
+   int err;
+
+   cgx = devm_kzalloc(dev, sizeof(*cgx), GFP_KERNEL);
+   if (!cgx)
+   return -ENOMEM;
+   cgx->pdev = pdev;
+
+   pci_set_drvdata(pdev, cgx);
+
+   err = pci_enable_device(pdev);
+   if (err) {
+   dev_err(dev, "Failed to enable PCI device\n");
+   pci_set_drvdata(pdev, NULL);
+   return err;
+   }
+
+   err = pci_request_regions(pdev, DRV_NAME);
+   if (err) {
+   dev_err(dev, "PCI request regions failed 0x%x\n", err);
+   goto err_disable_device;
+   }
+
+   /* MAP configuration registers */
+   cgx->reg_base = pcim_iomap(pdev, PCI_CFG_REG_BAR_NUM, 0);
+   if (!cgx->reg_base) {
+   dev_err(dev, "CGX: Cannot map CSR memory space, aborting\n");
+   err = -ENOMEM;
+   goto err_release_regions;
+   }
+
+   return 0;
+
+err_release_regions:
+   pci_release_regions(pdev);
+err_disable_device:
+   pci_disable_device(pdev);
+   pci_set_drvdata(pdev, NULL);
+   return err;
+}
+
+static void cgx_remove(struct pci_dev *pdev)
+{
+   pci_release_regions(pdev);
+   pci_disable_device(pdev);
+   pci_set_drvdata(pdev, NULL);
+}
+
+struct pci_driver cgx_driver = {
+   .name = DRV_NAME,
+   .id_table = cgx_id_table,
+   .probe = cgx_probe,
+   .remove = cgx_remove,
+};
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cgx.h b/drivers/net/ethernet/marvell/octeontx2/af/cgx.h
new file mode 100644
index 000..a7d4b39
--- /dev/null
+++ b/drivers/net/ethernet/marvell/octeontx2/af/cgx.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0
+ * Marvell OcteonTx2 CGX driver
+ *
+ * Copyright (C) 2018 Marvell International Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef CGX_H
+#define CGX_H
+
+ /* PCI device IDs */
+#define PCI_DEVID_OCTEONTX2_CGX 0xA059
+
+/* PCI BAR nos */
+#define PCI_CFG_REG_BAR_NUM 0
+
+extern struct pci_driver cgx_driver;
+
+#endif /* CGX_H */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
index e0c3c18..4927f6b 100644
---

[PATCH v8 07/15] octeontx2-af: Scan blocks for LFs provisioned to PF/VF

2018-10-07 Thread sunil . kovvuri

From: Sunil Goutham 

Scan all RVU blocks to find any 'LF to RVU PF/VF' mapping done by
low-level firmware. If any are found, mark them as used in the respective
block's LF bitmap and also save the mapped PF/VF's PF_FUNC info.

This is done to avoid reattaching a block LF to a different RVU PF/VF.

Signed-off-by: Sunil Goutham 
---
 drivers/net/ethernet/marvell/octeontx2/af/rvu.c| 148 -
 drivers/net/ethernet/marvell/octeontx2/af/rvu.h|  16 +++
 .../net/ethernet/marvell/octeontx2/af/rvu_struct.h |  16 +++
 3 files changed, 178 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
index d75ce45..53e02b0 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -22,6 +22,8 @@
 #define DRV_STRING  "Marvell OcteonTX2 RVU Admin Function Driver"
 #define DRV_VERSION "1.0"
 
+static int rvu_get_hwvf(struct rvu *rvu, int pcifunc);
+
 /* Supported devices */
 static const struct pci_device_id rvu_id_table[] = {
{ PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_OCTEONTX2_RVU_AF) },
@@ -66,6 +68,91 @@ int rvu_alloc_bitmap(struct rsrc_bmap *rsrc)
return 0;
 }
 
+static void rvu_update_rsrc_map(struct rvu *rvu, struct rvu_pfvf *pfvf,
+   struct rvu_block *block, u16 pcifunc,
+   u16 lf, bool attach)
+{
+   int devnum, num_lfs = 0;
+   bool is_pf;
+   u64 reg;
+
+   if (lf >= block->lf.max) {
+   dev_err(&rvu->pdev->dev,
+   "%s: FATAL: LF %d is >= %s's max lfs i.e %d\n",
+   __func__, lf, block->name, block->lf.max);
+   return;
+   }
+
+   /* Check if this is for a RVU PF or VF */
+   if (pcifunc & RVU_PFVF_FUNC_MASK) {
+   is_pf = false;
+   devnum = rvu_get_hwvf(rvu, pcifunc);
+   } else {
+   is_pf = true;
+   devnum = rvu_get_pf(pcifunc);
+   }
+
+   block->fn_map[lf] = attach ? pcifunc : 0;
+
+   switch (block->type) {
+   case BLKTYPE_NPA:
+   pfvf->npalf = attach ? true : false;
+   num_lfs = pfvf->npalf;
+   break;
+   case BLKTYPE_NIX:
+   pfvf->nixlf = attach ? true : false;
+   num_lfs = pfvf->nixlf;
+   break;
+   case BLKTYPE_SSO:
+   attach ? pfvf->sso++ : pfvf->sso--;
+   num_lfs = pfvf->sso;
+   break;
+   case BLKTYPE_SSOW:
+   attach ? pfvf->ssow++ : pfvf->ssow--;
+   num_lfs = pfvf->ssow;
+   break;
+   case BLKTYPE_TIM:
+   attach ? pfvf->timlfs++ : pfvf->timlfs--;
+   num_lfs = pfvf->timlfs;
+   break;
+   case BLKTYPE_CPT:
+   attach ? pfvf->cptlfs++ : pfvf->cptlfs--;
+   num_lfs = pfvf->cptlfs;
+   break;
+   }
+
+   reg = is_pf ? block->pf_lfcnt_reg : block->vf_lfcnt_reg;
+   rvu_write64(rvu, BLKADDR_RVUM, reg | (devnum << 16), num_lfs);
+}
+
+inline int rvu_get_pf(u16 pcifunc)
+{
+   return (pcifunc >> RVU_PFVF_PF_SHIFT) & RVU_PFVF_PF_MASK;
+}
+
+static int rvu_get_hwvf(struct rvu *rvu, int pcifunc)
+{
+   int pf, func;
+   u64 cfg;
+
+   pf = rvu_get_pf(pcifunc);
+   func = pcifunc & RVU_PFVF_FUNC_MASK;
+
+   /* Get first HWVF attached to this PF */
+   cfg = rvu_read64(rvu, BLKADDR_RVUM, RVU_PRIV_PFX_CFG(pf));
+
+   return ((cfg & 0xFFF) + func - 1);
+}
+
+struct rvu_pfvf *rvu_get_pfvf(struct rvu *rvu, int pcifunc)
+{
+   /* Check if it is a PF or VF */
+   if (pcifunc & RVU_PFVF_FUNC_MASK)
+   return &rvu->hwvf[rvu_get_hwvf(rvu, pcifunc)];
+   else
+   return &rvu->pf[rvu_get_pf(pcifunc)];
+}
+
 static void rvu_check_block_implemented(struct rvu *rvu)
 {
struct rvu_hwinfo *hw = rvu->hw;
@@ -107,6 +194,28 @@ static void rvu_reset_all_blocks(struct rvu *rvu)
rvu_block_reset(rvu, BLKADDR_NDC2, NDC_AF_BLK_RST);
 }
 
+static void rvu_scan_block(struct rvu *rvu, struct rvu_block *block)
+{
+   struct rvu_pfvf *pfvf;
+   u64 cfg;
+   int lf;
+
+   for (lf = 0; lf < block->lf.max; lf++) {
+   cfg = rvu_read64(rvu, block->addr,
+block->lfcfg_reg | (lf << block->lfshift));
+   if (!(cfg & BIT_ULL(63)))
+   continue;
+
+   /* Set this resource as being used */
+   __set_bit(lf, block->lf.bmap);
+
+   /* Get, to whom this LF is attached */
+   pfvf = rvu_get_pfvf(rvu, (cfg >> 8) & 0x);
+   rvu_update_rsrc_map(rvu, pfvf, block,
+   (cfg >> 8) & 0x, lf, true);
+   }
+}
+
 static void rvu_free_hw_resources(struct rvu *rvu)
 {
struct rvu_hwinfo *hw = rvu->hw;
@@ -124,7 +233,7 @@ static int
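
The PF/VF decoding done by rvu_get_pf() and rvu_get_hwvf() above can be illustrated with a small user-space sketch. The shift and mask values below are assumptions chosen for the example (FUNC in the low 10 bits, PF in the next 6 bits). The patch only shows the symbolic names, so treat the `DEMO_` constants as hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical field layout of a 16-bit PF_FUNC value. */
#define DEMO_PF_SHIFT	10
#define DEMO_PF_MASK	0x3F
#define DEMO_FUNC_MASK	0x3FF

/* PF number encoded in pcifunc. */
static int demo_get_pf(uint16_t pcifunc)
{
	return (pcifunc >> DEMO_PF_SHIFT) & DEMO_PF_MASK;
}

/* A non-zero FUNC field means the value names a VF of that PF. */
static bool demo_is_vf(uint16_t pcifunc)
{
	return (pcifunc & DEMO_FUNC_MASK) != 0;
}

/*
 * Global hardware VF index. VFs are numbered from 1 in the FUNC field,
 * so VF 1 maps to the first HWVF assigned to its PF. first_hwvf stands
 * in for the value the driver reads from the PF's config register.
 */
static int demo_get_hwvf(uint16_t pcifunc, int first_hwvf)
{
	return first_hwvf + (pcifunc & DEMO_FUNC_MASK) - 1;
}
```

With this layout, a value with PF 2 and FUNC 3 whose PF owns HWVFs starting at index 8 resolves to HWVF 10.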

[PATCH v8 09/15] octeontx2-af: Configure block LF's MSIX vector offset

2018-10-07 Thread sunil . kovvuri

From: Sunil Goutham 

Firmware configures a certain number of MSIX vectors for each
enabled RVU PF/VF. When a block LF is attached to a PF/VF, the number
of MSIX vectors needed by that LF is set aside (out of the PF/VF's
total MSIX vectors) and the LF's msix_offset is configured in HW.

Also added support for a RVU PF/VF to retrieve that block LF's
MSIX vector offset information from AF via mbox.

Signed-off-by: Sunil Goutham 
---
 drivers/net/ethernet/marvell/octeontx2/af/mbox.h   |  18 ++
 drivers/net/ethernet/marvell/octeontx2/af/rvu.c| 333 -
 drivers/net/ethernet/marvell/octeontx2/af/rvu.h|   7 +
 .../net/ethernet/marvell/octeontx2/af/rvu_struct.h |   2 +
 4 files changed, 357 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
index 7280d49..bedf0ee 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
@@ -122,6 +122,7 @@ static inline struct mbox_msghdr *otx2_mbox_alloc_msg(struct otx2_mbox *mbox,
 M(READY,   0x001, msg_req, ready_msg_rsp)  \
 M(ATTACH_RESOURCES,0x002, rsrc_attach, msg_rsp)\
 M(DETACH_RESOURCES,0x003, rsrc_detach, msg_rsp)\
+M(MSIX_OFFSET, 0x004, msg_req, msix_offset_rsp)\
 /* CGX mbox IDs (range 0x200 - 0x3FF) */   \
 /* NPA mbox IDs (range 0x400 - 0x5FF) */   \
 /* SSO/SSOW mbox IDs (range 0x600 - 0x7FF) */  \
@@ -190,4 +191,21 @@ struct rsrc_detach {
u8 cptlfs:1;
 };
 
+#define MSIX_VECTOR_INVALID0x
+#define MAX_RVU_BLKLF_CNT  256
+
+struct msix_offset_rsp {
+   struct mbox_msghdr hdr;
+   u16  npa_msixoff;
+   u16  nix_msixoff;
+   u8   sso;
+   u8   ssow;
+   u8   timlfs;
+   u8   cptlfs;
+   u16  sso_msixoff[MAX_RVU_BLKLF_CNT];
+   u16  ssow_msixoff[MAX_RVU_BLKLF_CNT];
+   u16  timlf_msixoff[MAX_RVU_BLKLF_CNT];
+   u16  cptlf_msixoff[MAX_RVU_BLKLF_CNT];
+};
+
 #endif /* MBOX_H */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
index ef3f559..e4b8ed2 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -24,6 +24,11 @@
 
 static int rvu_get_hwvf(struct rvu *rvu, int pcifunc);
 
+static void rvu_set_msix_offset(struct rvu *rvu, struct rvu_pfvf *pfvf,
+   struct rvu_block *block, int lf);
+static void rvu_clear_msix_offset(struct rvu *rvu, struct rvu_pfvf *pfvf,
+ struct rvu_block *block, int lf);
+
 /* Supported devices */
 static const struct pci_device_id rvu_id_table[] = {
{ PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_OCTEONTX2_RVU_AF) },
@@ -75,6 +80,45 @@ int rvu_alloc_rsrc(struct rsrc_bmap *rsrc)
return id;
 }
 
+static int rvu_alloc_rsrc_contig(struct rsrc_bmap *rsrc, int nrsrc)
+{
+   int start;
+
+   if (!rsrc->bmap)
+   return -EINVAL;
+
+   start = bitmap_find_next_zero_area(rsrc->bmap, rsrc->max, 0, nrsrc, 0);
+   if (start >= rsrc->max)
+   return -ENOSPC;
+
+   bitmap_set(rsrc->bmap, start, nrsrc);
+   return start;
+}
+
+static void rvu_free_rsrc_contig(struct rsrc_bmap *rsrc, int nrsrc, int start)
+{
+   if (!rsrc->bmap)
+   return;
+   if (start >= rsrc->max)
+   return;
+
+   bitmap_clear(rsrc->bmap, start, nrsrc);
+}
+
+static bool rvu_rsrc_check_contig(struct rsrc_bmap *rsrc, int nrsrc)
+{
+   int start;
+
+   if (!rsrc->bmap)
+   return false;
+
+   start = bitmap_find_next_zero_area(rsrc->bmap, rsrc->max, 0, nrsrc, 0);
+   if (start >= rsrc->max)
+   return false;
+
+   return true;
+}
+
 void rvu_free_rsrc(struct rsrc_bmap *rsrc, int id)
 {
if (!rsrc->bmap)
@@ -103,6 +147,26 @@ int rvu_alloc_bitmap(struct rsrc_bmap *rsrc)
return 0;
 }
 
+/* Get block LF's HW index from a PF_FUNC's block slot number */
+int rvu_get_lf(struct rvu *rvu, struct rvu_block *block, u16 pcifunc, u16 slot)
+{
+   u16 match = 0;
+   int lf;
+
+   spin_lock(&rvu->rsrc_lock);
+   for (lf = 0; lf < block->lf.max; lf++) {
+   if (block->fn_map[lf] == pcifunc) {
+   if (slot == match) {
+   spin_unlock(&rvu->rsrc_lock);
+   return lf;
+   }
+   match++;
+   }
+   }
+   spin_unlock(&rvu->rsrc_lock);
+   return -ENODEV;
+}
+
 /* Convert BLOCK_TYPE_E to a BLOCK_ADDR_E.
  * Some silicon variants of OcteonTX2 support
  * multiple blocks of same type.
@@ -237,6 +301,16 @@ inline int rvu_get_pf(u16 pcifunc)
return (pcifunc >> RVU_PFVF_PF_SHIFT) & RVU_PFVF_PF_MASK;
 }
 
+void
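
The contiguous allocation helpers in this patch (rvu_alloc_rsrc_contig() and rvu_free_rsrc_contig()) search for a run of free entries and claim the whole run at once. A minimal user-space sketch of that technique follows. It uses one bool per resource instead of the kernel bitmap API, and every `demo_` name is hypothetical.

```c
#include <stdbool.h>

#define DEMO_MAX 16

struct demo_bmap {
	bool used[DEMO_MAX];	/* one flag per resource instead of one bit */
	int max;		/* number of valid entries, at most DEMO_MAX */
};

/* First index of a run of n free entries, or b->max when there is none. */
static int demo_find_zero_area(const struct demo_bmap *b, int n)
{
	int start, i;

	for (start = 0; start + n <= b->max; start++) {
		for (i = 0; i < n; i++)
			if (b->used[start + i])
				break;
		if (i == n)
			return start;
	}
	return b->max;
}

/* Claim n contiguous entries; returns the start index or -1 when full. */
static int demo_alloc_contig(struct demo_bmap *b, int n)
{
	int start = demo_find_zero_area(b, n);
	int i;

	if (start >= b->max)
		return -1;
	for (i = 0; i < n; i++)
		b->used[start + i] = true;
	return start;
}

/* Release n entries beginning at start; out-of-range starts are ignored. */
static void demo_free_contig(struct demo_bmap *b, int n, int start)
{
	int i;

	if (start < 0 || start >= b->max)
		return;
	for (i = 0; i < n && start + i < b->max; i++)
		b->used[start + i] = false;
}
```

A "check only" variant, like rvu_rsrc_check_contig() in the patch, would call the search helper and compare the result with `max` without setting anything.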

[PATCH v8 14/15] octeontx2-af: Register for CGX lmac events

2018-10-07 Thread sunil . kovvuri

From: Linu Cherian 

Added support in RVU AF driver to register for
CGX LMAC link status change events from firmware
and manage them. Processing part will be added
in followup patches.

- Introduced eventqueue for posting events from cgx lmac.
  Queueing mechanism will ensure that events can be posted
  and firmware can be acked immediately and hence event
  reception and processing are decoupled.
- Events get added to the queue by notification callback.
  Notification callback is expected to be atomic, since it
  is called from interrupt context.
- Events are dequeued and processed in a worker thread.

Signed-off-by: Linu Cherian 
Signed-off-by: Sunil Goutham 
---
 drivers/net/ethernet/marvell/octeontx2/af/rvu.c|   6 +-
 drivers/net/ethernet/marvell/octeontx2/af/rvu.h|   5 +
 .../net/ethernet/marvell/octeontx2/af/rvu_cgx.c| 101 -
 3 files changed, 108 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
index 43ee14f..4e7788c 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -1564,10 +1564,11 @@ static int rvu_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 
err = rvu_register_interrupts(rvu);
if (err)
-   goto err_mbox;
+   goto err_cgx;
 
return 0;
-
+err_cgx:
+   rvu_cgx_wq_destroy(rvu);
 err_mbox:
rvu_mbox_destroy(rvu);
 err_hwsetup:
@@ -1589,6 +1590,7 @@ static void rvu_remove(struct pci_dev *pdev)
struct rvu *rvu = pci_get_drvdata(pdev);
 
rvu_unregister_interrupts(rvu);
+   rvu_cgx_wq_destroy(rvu);
rvu_mbox_destroy(rvu);
rvu_reset_all_blocks(rvu);
rvu_free_hw_resources(rvu);
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index 385f597..d169fa9 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -110,6 +110,10 @@ struct rvu {
  * every cgx lmac port
  */
void **cgx_idmap; /* cgx id to cgx data map table */
+   struct work_struct cgx_evh_work;
+   struct workqueue_struct *cgx_evh_wq;
+   spinlock_t cgx_evq_lock; /* cgx event queue lock */
+   struct list_head cgx_evq_head; /* cgx event queue head */
 };
 
 static inline void rvu_write64(struct rvu *rvu, u64 block, u64 offset, u64 val)
@@ -150,4 +154,5 @@ int rvu_poll_reg(struct rvu *rvu, u64 block, u64 offset, u64 mask, bool zero);
 
 /* CGX APIs */
 int rvu_cgx_probe(struct rvu *rvu);
+void rvu_cgx_wq_destroy(struct rvu *rvu);
 #endif /* RVU_H */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c
index bf81507..5ecc223 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c
@@ -15,6 +15,11 @@
 #include "rvu.h"
 #include "cgx.h"
 
+struct cgx_evq_entry {
+   struct list_head evq_node;
+   struct cgx_link_event link_event;
+};
+
 static inline u8 cgxlmac_id_to_bmap(u8 cgx_id, u8 lmac_id)
 {
return ((cgx_id & 0xF) << 4) | (lmac_id & 0xF);
@@ -72,9 +77,95 @@ static int rvu_map_cgx_lmac_pf(struct rvu *rvu)
return 0;
 }
 
+/* This is called from interrupt context and is expected to be atomic */
+static int cgx_lmac_postevent(struct cgx_link_event *event, void *data)
+{
+   struct cgx_evq_entry *qentry;
+   struct rvu *rvu = data;
+
+   /* post event to the event queue */
+   qentry = kmalloc(sizeof(*qentry), GFP_ATOMIC);
+   if (!qentry)
+   return -ENOMEM;
+   qentry->link_event = *event;
+   spin_lock(&rvu->cgx_evq_lock);
+   list_add_tail(&qentry->evq_node, &rvu->cgx_evq_head);
+   spin_unlock(&rvu->cgx_evq_lock);
+
+   /* start worker to process the events */
+   queue_work(rvu->cgx_evh_wq, &rvu->cgx_evh_work);
+
+   return 0;
+}
+
+static void cgx_evhandler_task(struct work_struct *work)
+{
+   struct rvu *rvu = container_of(work, struct rvu, cgx_evh_work);
+   struct cgx_evq_entry *qentry;
+   struct cgx_link_event *event;
+   unsigned long flags;
+
+   do {
+   /* Dequeue an event */
+   spin_lock_irqsave(&rvu->cgx_evq_lock, flags);
+   qentry = list_first_entry_or_null(&rvu->cgx_evq_head,
+ struct cgx_evq_entry,
+ evq_node);
+   if (qentry)
+   list_del(&qentry->evq_node);
+   spin_unlock_irqrestore(&rvu->cgx_evq_lock, flags);
+   if (!qentry)
+   break; /* nothing more to process */
+
+   event = &qentry->link_event;
+
+
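
The decoupling described in the commit message (a notification callback that only enqueues, and a worker that dequeues and processes) can be sketched in user space as a plain FIFO. All `demo_` names are hypothetical, and the spinlock that protects the kernel list, as well as the workqueue itself, are intentionally left out of this sketch.

```c
#include <stdlib.h>

/* Hypothetical event payload; the real one is struct cgx_link_event. */
struct demo_event {
	int cgx_id;
	int lmac_id;
	int link_up;
};

struct demo_entry {
	struct demo_entry *next;
	struct demo_event ev;	/* copied at post time, as the patch copies *event */
};

struct demo_queue {
	struct demo_entry *head;
	struct demo_entry *tail;
};

/* Producer side: copy the event and append it. Returns 0, or -1 on allocation failure. */
static int demo_post(struct demo_queue *q, const struct demo_event *ev)
{
	struct demo_entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->ev = *ev;
	e->next = NULL;
	if (q->tail)
		q->tail->next = e;
	else
		q->head = e;
	q->tail = e;
	return 0;
}

/* Consumer side: remove the oldest event. Returns 1 if one was copied to out, 0 if empty. */
static int demo_pop(struct demo_queue *q, struct demo_event *out)
{
	struct demo_entry *e = q->head;

	if (!e)
		return 0;
	q->head = e->next;
	if (!q->head)
		q->tail = NULL;
	*out = e->ev;
	free(e);
	return 1;
}
```

A worker would call demo_pop() in a loop until it returns 0, matching the "nothing more to process" exit in cgx_evhandler_task().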

[PATCH v8 08/15] octeontx2-af: Add RVU block LF provisioning support

2018-10-07 Thread sunil . kovvuri

From: Sunil Goutham 

Added support for an RVU PF/VF to request AF via mailbox
to attach or detach NPA/NIX/SSO/SSOW/TIM/CPT block LFs.
Also supports partial detachment and modifying the current
LF attached count of a certain block type.

Signed-off-by: Sunil Goutham 
---
 drivers/net/ethernet/marvell/octeontx2/af/mbox.h   |  45 +-
 drivers/net/ethernet/marvell/octeontx2/af/rvu.c| 472 -
 drivers/net/ethernet/marvell/octeontx2/af/rvu.h|   8 +-
 .../net/ethernet/marvell/octeontx2/af/rvu_reg.h|   8 +-
 4 files changed, 523 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
index fc593f0..7280d49 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
@@ -118,7 +118,17 @@ static inline struct mbox_msghdr *otx2_mbox_alloc_msg(struct otx2_mbox *mbox,
 #define MBOX_MSG_MAX   0x
 
 #define MBOX_MESSAGES  \
-M(READY,   0x001, msg_req, ready_msg_rsp)
+/* Generic mbox IDs (range 0x000 - 0x1FF) */   \
+M(READY,   0x001, msg_req, ready_msg_rsp)  \
+M(ATTACH_RESOURCES,0x002, rsrc_attach, msg_rsp)\
+M(DETACH_RESOURCES,0x003, rsrc_detach, msg_rsp)\
+/* CGX mbox IDs (range 0x200 - 0x3FF) */   \
+/* NPA mbox IDs (range 0x400 - 0x5FF) */   \
+/* SSO/SSOW mbox IDs (range 0x600 - 0x7FF) */  \
+/* TIM mbox IDs (range 0x800 - 0x9FF) */   \
+/* CPT mbox IDs (range 0xA00 - 0xBFF) */   \
+/* NPC mbox IDs (range 0x6000 - 0x7FFF) */ \
+/* NIX mbox IDs (range 0x8000 - 0x) */ \
 
 enum {
 #define M(_name, _id, _1, _2) MBOX_MSG_ ## _name = _id,
@@ -147,4 +157,37 @@ struct ready_msg_rsp {
u16 sclk_feq; /* SCLK frequency */
 };
 
+/* Structure for requesting resource provisioning.
+ * 'modify' flag to be used when either requesting more
+ * or to detach partial of a certain resource type.
+ * Rest of the fields specify how many of what type to
+ * be attached.
+ */
+struct rsrc_attach {
+   struct mbox_msghdr hdr;
+   u8   modify:1;
+   u8   npalf:1;
+   u8   nixlf:1;
+   u16  sso;
+   u16  ssow;
+   u16  timlfs;
+   u16  cptlfs;
+};
+
+/* Structure for relinquishing resources.
+ * 'partial' flag to be used when relinquishing all resources
+ * but only of a certain type. If not set, all resources of all
+ * types provisioned to the RVU function will be detached.
+ */
+struct rsrc_detach {
+   struct mbox_msghdr hdr;
+   u8 partial:1;
+   u8 npalf:1;
+   u8 nixlf:1;
+   u8 sso:1;
+   u8 ssow:1;
+   u8 timlfs:1;
+   u8 cptlfs:1;
+};
+
 #endif /* MBOX_H */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
index 53e02b0..ef3f559 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -59,6 +59,41 @@ int rvu_poll_reg(struct rvu *rvu, u64 block, u64 offset, u64 mask, bool zero)
return -EBUSY;
 }
 
+int rvu_alloc_rsrc(struct rsrc_bmap *rsrc)
+{
+   int id;
+
+   if (!rsrc->bmap)
+   return -EINVAL;
+
+   id = find_first_zero_bit(rsrc->bmap, rsrc->max);
+   if (id >= rsrc->max)
+   return -ENOSPC;
+
+   __set_bit(id, rsrc->bmap);
+
+   return id;
+}
+
+void rvu_free_rsrc(struct rsrc_bmap *rsrc, int id)
+{
+   if (!rsrc->bmap)
+   return;
+
+   __clear_bit(id, rsrc->bmap);
+}
+
+int rvu_rsrc_free_count(struct rsrc_bmap *rsrc)
+{
+   int used;
+
+   if (!rsrc->bmap)
+   return 0;
+
+   used = bitmap_weight(rsrc->bmap, rsrc->max);
+   return (rsrc->max - used);
+}
+
 int rvu_alloc_bitmap(struct rsrc_bmap *rsrc)
 {
rsrc->bmap = kcalloc(BITS_TO_LONGS(rsrc->max),
@@ -68,6 +103,78 @@ int rvu_alloc_bitmap(struct rsrc_bmap *rsrc)
return 0;
 }
 
+/* Convert BLOCK_TYPE_E to a BLOCK_ADDR_E.
+ * Some silicon variants of OcteonTX2 support
+ * multiple blocks of same type.
+ *
+ * @pcifunc has to be zero when no LF is yet attached.
+ */
+int rvu_get_blkaddr(struct rvu *rvu, int blktype, u16 pcifunc)
+{
+   int devnum, blkaddr = -ENODEV;
+   u64 cfg, reg;
+   bool is_pf;
+
+   switch (blktype) {
+   case BLKTYPE_NPA:
+   blkaddr = BLKADDR_NPA;
+   goto exit;
+   case BLKTYPE_NIX:
+   /* For now assume NIX0 */
+   if (!pcifunc) {
+   blkaddr = BLKADDR_NIX0;
+   goto exit;
+   }
+   break;
+   case BLKTYPE_SSO:
+   blkaddr = BLKADDR_SSO;

[PATCH v8 10/15] octeontx2-af: Reconfig MSIX base with IOVA

2018-10-07 Thread sunil . kovvuri

From: Geetha sowjanya 

HW interprets RVU_AF_MSIXTR_BASE address as an IOVA, hence
create an IOMMU mapping for the physical address configured by
firmware and reconfigure RVU_AF_MSIXTR_BASE with the IOVA.

Signed-off-by: Geetha sowjanya 
Signed-off-by: Sunil Goutham 
---
 drivers/net/ethernet/marvell/octeontx2/af/rvu.c | 33 ++---
 drivers/net/ethernet/marvell/octeontx2/af/rvu.h |  1 +
 2 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c 
b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
index e4b8ed2..e0c3c18 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -442,9 +442,10 @@ static int rvu_setup_msix_resources(struct rvu *rvu)
 {
struct rvu_hwinfo *hw = rvu->hw;
int pf, vf, numvfs, hwvf, err;
+   int nvecs, offset, max_msix;
struct rvu_pfvf *pfvf;
-   int nvecs, offset;
-   u64 cfg;
+   u64 cfg, phy_addr;
+   dma_addr_t iova;
 
for (pf = 0; pf < hw->total_pfs; pf++) {
cfg = rvu_read64(rvu, BLKADDR_RVUM, RVU_PRIV_PFX_CFG(pf));
@@ -523,6 +524,22 @@ static int rvu_setup_msix_resources(struct rvu *rvu)
}
}
 
+   /* HW interprets RVU_AF_MSIXTR_BASE address as an IOVA, hence
+* create an IOMMU mapping for the physical address configured by
+* firmware and reconfig RVU_AF_MSIXTR_BASE with IOVA.
+*/
+   cfg = rvu_read64(rvu, BLKADDR_RVUM, RVU_PRIV_CONST);
+   max_msix = cfg & 0xF;
+   phy_addr = rvu_read64(rvu, BLKADDR_RVUM, RVU_AF_MSIXTR_BASE);
+   iova = dma_map_single(rvu->dev, (void *)phy_addr,
+ max_msix * PCI_MSIX_ENTRY_SIZE,
+ DMA_BIDIRECTIONAL);
+   if (dma_mapping_error(rvu->dev, iova))
+   return -ENOMEM;
+
+   rvu_write64(rvu, BLKADDR_RVUM, RVU_AF_MSIXTR_BASE, (u64)iova);
+   rvu->msix_base_iova = iova;
+
return 0;
 }
 
@@ -531,7 +548,8 @@ static void rvu_free_hw_resources(struct rvu *rvu)
struct rvu_hwinfo *hw = rvu->hw;
struct rvu_block *block;
struct rvu_pfvf  *pfvf;
-   int id;
+   int id, max_msix;
+   u64 cfg;
 
/* Free block LF bitmaps */
for (id = 0; id < BLK_COUNT; id++) {
@@ -549,6 +567,15 @@ static void rvu_free_hw_resources(struct rvu *rvu)
pfvf = &rvu->hwvf[id];
kfree(pfvf->msix.bmap);
}
+
+   /* Unmap MSIX vector base IOVA mapping */
+   if (!rvu->msix_base_iova)
+   return;
+   cfg = rvu_read64(rvu, BLKADDR_RVUM, RVU_PRIV_CONST);
+   max_msix = cfg & 0xF;
+   dma_unmap_single(rvu->dev, rvu->msix_base_iova,
+max_msix * PCI_MSIX_ENTRY_SIZE,
+DMA_BIDIRECTIONAL);
 }
 
 static int rvu_setup_hw_resources(struct rvu *rvu)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h 
b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index 7435e83..92c2022 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -99,6 +99,7 @@ struct rvu {
u16 num_vec;
char*irq_name;
bool*irq_allocated;
+   dma_addr_t  msix_base_iova;
 };
 
 static inline void rvu_write64(struct rvu *rvu, u64 block, u64 offset, u64 val)
-- 
2.7.4

[PATCH v8 12/15] octeontx2-af: Set RVU PFs to CGX LMACs mapping

2018-10-07 Thread sunil . kovvuri

From: Linu Cherian 

Each enabled CGX LMAC is considered a physical
interface, and RVU PFs are mapped to these. VFs of these
SRIOV PFs will be virtual interfaces and share the CGX LMAC
with their PF.

This mapping info will be used later on for Rx/Tx pkt steering.

Signed-off-by: Linu Cherian 
Signed-off-by: Geetha sowjanya 
Signed-off-by: Sunil Goutham 
---
 drivers/net/ethernet/marvell/octeontx2/af/Makefile |  2 +-
 drivers/net/ethernet/marvell/octeontx2/af/cgx.c| 59 ++
 drivers/net/ethernet/marvell/octeontx2/af/cgx.h| 15 +-
 drivers/net/ethernet/marvell/octeontx2/af/rvu.c|  4 ++
 drivers/net/ethernet/marvell/octeontx2/af/rvu.h| 12 +
 5 files changed, 89 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/Makefile 
b/drivers/net/ethernet/marvell/octeontx2/af/Makefile
index 8646421..eaac264 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/Makefile
+++ b/drivers/net/ethernet/marvell/octeontx2/af/Makefile
@@ -7,4 +7,4 @@ obj-$(CONFIG_OCTEONTX2_MBOX) += octeontx2_mbox.o
 obj-$(CONFIG_OCTEONTX2_AF) += octeontx2_af.o
 
 octeontx2_mbox-y := mbox.o
-octeontx2_af-y := cgx.o rvu.o
+octeontx2_af-y := cgx.o rvu.o rvu_cgx.o
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cgx.c 
b/drivers/net/ethernet/marvell/octeontx2/af/cgx.c
index c41d23f..6ecae80 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/cgx.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/cgx.c
@@ -28,8 +28,12 @@ struct cgx {
void __iomem*reg_base;
struct pci_dev  *pdev;
u8  cgx_id;
+   u8  lmac_count;
+   struct list_head cgx_list;
 };
 
+static LIST_HEAD(cgx_list);
+
 /* Supported devices */
 static const struct pci_device_id cgx_id_table[] = {
{ PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_OCTEONTX2_CGX) },
@@ -38,6 +42,53 @@ static const struct pci_device_id cgx_id_table[] = {
 
 MODULE_DEVICE_TABLE(pci, cgx_id_table);
 
+static u64 cgx_read(struct cgx *cgx, u64 lmac, u64 offset)
+{
+   return readq(cgx->reg_base + (lmac << 18) + offset);
+}
+
+int cgx_get_cgx_cnt(void)
+{
+   struct cgx *cgx_dev;
+   int count = 0;
+
+   list_for_each_entry(cgx_dev, &cgx_list, cgx_list)
+   count++;
+
+   return count;
+}
+EXPORT_SYMBOL(cgx_get_cgx_cnt);
+
+int cgx_get_lmac_cnt(void *cgxd)
+{
+   struct cgx *cgx = cgxd;
+
+   if (!cgx)
+   return -ENODEV;
+
+   return cgx->lmac_count;
+}
+EXPORT_SYMBOL(cgx_get_lmac_cnt);
+
+void *cgx_get_pdata(int cgx_id)
+{
+   struct cgx *cgx_dev;
+
+   list_for_each_entry(cgx_dev, &cgx_list, cgx_list) {
+   if (cgx_dev->cgx_id == cgx_id)
+   return cgx_dev;
+   }
+   return NULL;
+}
+EXPORT_SYMBOL(cgx_get_pdata);
+
+static void cgx_lmac_init(struct cgx *cgx)
+{
+   cgx->lmac_count = cgx_read(cgx, 0, CGXX_CMRX_RX_LMACS) & 0x7;
+   if (cgx->lmac_count > MAX_LMAC_PER_CGX)
+   cgx->lmac_count = MAX_LMAC_PER_CGX;
+}
+
 static int cgx_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
struct device *dev = &pdev->dev;
@@ -72,9 +123,14 @@ static int cgx_probe(struct pci_dev *pdev, const struct 
pci_device_id *id)
goto err_release_regions;
}
 
+   list_add(&cgx->cgx_list, &cgx_list);
+   cgx->cgx_id = cgx_get_cgx_cnt() - 1;
+   cgx_lmac_init(cgx);
+
return 0;
 
 err_release_regions:
+   list_del(&cgx->cgx_list);
pci_release_regions(pdev);
 err_disable_device:
pci_disable_device(pdev);
@@ -84,6 +140,9 @@ static int cgx_probe(struct pci_dev *pdev, const struct 
pci_device_id *id)
 
 static void cgx_remove(struct pci_dev *pdev)
 {
+   struct cgx *cgx = pci_get_drvdata(pdev);
+
+   list_del(&cgx->cgx_list);
pci_release_regions(pdev);
pci_disable_device(pdev);
pci_set_drvdata(pdev, NULL);
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cgx.h 
b/drivers/net/ethernet/marvell/octeontx2/af/cgx.h
index a7d4b39..acdc16e 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/cgx.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/cgx.h
@@ -12,11 +12,22 @@
 #define CGX_H
 
  /* PCI device IDs */
-#define PCI_DEVID_OCTEONTX2_CGX 0xA059
+#define PCI_DEVID_OCTEONTX2_CGX 0xA059
 
 /* PCI BAR nos */
-#define PCI_CFG_REG_BAR_NUM 0
+#define PCI_CFG_REG_BAR_NUM 0
+
+#define MAX_CGX 3
+#define MAX_LMAC_PER_CGX   4
+#define CGX_OFFSET(x)  ((x) * MAX_LMAC_PER_CGX)
+
+/* Registers */
+#define CGXX_CMRX_RX_ID_MAP 0x060
+#define CGXX_CMRX_RX_LMACS 0x128
 
 extern struct pci_driver cgx_driver;
 
+int cgx_get_cgx_cnt(void);
+int cgx_get_lmac_cnt(void *cgxd);
+void *cgx_get_pdata(int cgx_id);
 #endif /* CGX_H */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
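
The cgx.h hunk above defines MAX_CGX, MAX_LMAC_PER_CGX and CGX_OFFSET(x), and
cgx_lmac_init() clamps the LMAC count read from hardware. The code that uses
CGX_OFFSET() is not visible in this excerpt, so the following is only a
user-space sketch of one plausible use: flattening a (CGX, LMAC) pair into a
single table index. All demo_* names are hypothetical and not part of the
driver.

```c
#include <stdint.h>

/* Same values as MAX_CGX / MAX_LMAC_PER_CGX in the hunk above. */
#define DEMO_MAX_CGX          3
#define DEMO_MAX_LMAC_PER_CGX 4
#define DEMO_CGX_OFFSET(x)    ((x) * DEMO_MAX_LMAC_PER_CGX)

/* Assumed use of CGX_OFFSET(): flat slot for (cgx, lmac), -1 if out of range. */
static int demo_lmac_slot(int cgx, int lmac)
{
	if (cgx < 0 || cgx >= DEMO_MAX_CGX)
		return -1;
	if (lmac < 0 || lmac >= DEMO_MAX_LMAC_PER_CGX)
		return -1;
	return DEMO_CGX_OFFSET(cgx) + lmac;
}

/* Mirrors the clamp in cgx_lmac_init(): low 3 bits, capped at the maximum. */
static uint8_t demo_lmac_count(uint64_t reg)
{
	uint8_t n = reg & 0x7;

	return n > DEMO_MAX_LMAC_PER_CGX ? DEMO_MAX_LMAC_PER_CGX : n;
}
```

With these limits a flat table needs DEMO_MAX_CGX * DEMO_MAX_LMAC_PER_CGX = 12
slots; whether the driver actually keeps such a table is an assumption here.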

[PATCH v8 05/15] octeontx2-af: Add mailbox IRQ and msg handlers

2018-10-07 Thread sunil . kovvuri

From: Sunil Goutham 

This patch adds support for mailbox interrupt and message
handling. It maps the mailbox region and registers a workqueue
for message handling, then enables the mailbox IRQ of RVU PFs
and registers an interrupt handler. When the IRQ is triggered,
work is added to the mbox workqueue for msgs to get processed.

Signed-off-by: Sunil Goutham 
---
 drivers/net/ethernet/marvell/octeontx2/af/mbox.h   |  14 +-
 drivers/net/ethernet/marvell/octeontx2/af/rvu.c| 254 +
 drivers/net/ethernet/marvell/octeontx2/af/rvu.h|  22 ++
 .../net/ethernet/marvell/octeontx2/af/rvu_struct.h |  22 ++
 4 files changed, 309 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h 
b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
index 8e205fd..fc593f0 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
@@ -33,6 +33,8 @@
 # error "incorrect mailbox area sizes"
 #endif
 
+#define INTR_MASK(pfvfs) ((pfvfs < 64) ? (BIT_ULL(pfvfs) - 1) : (~0ull))
+
 #define MBOX_RSP_TIMEOUT   1000 /* in ms, Time to wait for mbox response */
 
 #define MBOX_MSG_ALIGN 16  /* Align mbox msg start to 16bytes */
@@ -90,8 +92,9 @@ struct mbox_msghdr {
 
 void otx2_mbox_reset(struct otx2_mbox *mbox, int devid);
 void otx2_mbox_destroy(struct otx2_mbox *mbox);
-int otx2_mbox_init(struct otx2_mbox *mbox, void *hwbase, struct pci_dev *pdev,
-  void *reg_base, int direction, int ndevs);
+int otx2_mbox_init(struct otx2_mbox *mbox, void __force *hwbase,
+  struct pci_dev *pdev, void __force *reg_base,
+  int direction, int ndevs);
 void otx2_mbox_msg_send(struct otx2_mbox *mbox, int devid);
 int otx2_mbox_wait_for_rsp(struct otx2_mbox *mbox, int devid);
 int otx2_mbox_busy_poll_for_rsp(struct otx2_mbox *mbox, int devid);
@@ -115,7 +118,7 @@ static inline struct mbox_msghdr 
*otx2_mbox_alloc_msg(struct otx2_mbox *mbox,
 #define MBOX_MSG_MAX   0x
 
 #define MBOX_MESSAGES  \
-M(READY,   0x001, msg_req, msg_rsp)
+M(READY,   0x001, msg_req, ready_msg_rsp)
 
 enum {
 #define M(_name, _id, _1, _2) MBOX_MSG_ ## _name = _id,
@@ -139,4 +142,9 @@ struct msg_rsp {
struct mbox_msghdr hdr;
 };
 
+struct ready_msg_rsp {
+   struct mbox_msghdr hdr;
+   u16 sclk_feq; /* SCLK frequency */
+};
+
 #endif /* MBOX_H */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c 
b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
index fa5f40b..6999d0f 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -258,6 +258,245 @@ static int rvu_setup_hw_resources(struct rvu *rvu)
return 0;
 }
 
+static int rvu_process_mbox_msg(struct rvu *rvu, int devid,
+   struct mbox_msghdr *req)
+{
+   /* Check if valid, if not reply with an invalid msg */
+   if (req->sig != OTX2_MBOX_REQ_SIG)
+   goto bad_message;
+
+   if (req->id == MBOX_MSG_READY)
+   return 0;
+
+bad_message:
+   otx2_reply_invalid_msg(&rvu->mbox, devid, req->pcifunc,
+  req->id);
+   return -ENODEV;
+}
+
+static void rvu_mbox_handler(struct work_struct *work)
+{
+   struct rvu_work *mwork = container_of(work, struct rvu_work, work);
+   struct rvu *rvu = mwork->rvu;
+   struct otx2_mbox_dev *mdev;
+   struct mbox_hdr *req_hdr;
+   struct mbox_msghdr *msg;
+   struct otx2_mbox *mbox;
+   int offset, id, err;
+   u16 pf;
+
+   mbox = &rvu->mbox;
+   pf = mwork - rvu->mbox_wrk;
+   mdev = &mbox->dev[pf];
+
+   /* Process received mbox messages */
+   req_hdr = (struct mbox_hdr *)(mdev->mbase + mbox->rx_start);
+   if (req_hdr->num_msgs == 0)
+   return;
+
+   offset = mbox->rx_start + ALIGN(sizeof(*req_hdr), MBOX_MSG_ALIGN);
+
+   for (id = 0; id < req_hdr->num_msgs; id++) {
+   msg = (struct mbox_msghdr *)(mdev->mbase + offset);
+
+   /* Set which PF sent this message based on mbox IRQ */
+   msg->pcifunc &= ~(RVU_PFVF_PF_MASK << RVU_PFVF_PF_SHIFT);
+   msg->pcifunc |= (pf << RVU_PFVF_PF_SHIFT);
+   err = rvu_process_mbox_msg(rvu, pf, msg);
+   if (!err) {
+   offset = mbox->rx_start + msg->next_msgoff;
+   continue;
+   }
+
+   if (msg->pcifunc & RVU_PFVF_FUNC_MASK)
+   dev_warn(rvu->dev, "Error %d when processing message %s 
(0x%x) from PF%d:VF%d\n",
+err, otx2_mbox_id2name(msg->id), msg->id, pf,
+(msg->pcifunc & RVU_PFVF_FUNC_MASK) - 1);
+   else
+   dev_warn(rvu->dev, "Error %d when processing message %s 
(0x%x) from PF%d\n",
+
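
The handler loop in rvu_mbox_handler() above walks a shared region: a header
carrying num_msgs, then messages chained through next_msgoff, each offset
being relative to the start of the receive area. A minimal user-space sketch
of that walk follows. The demo_* structs are hypothetical and far smaller
than the real mbox_hdr/mbox_msghdr; only the traversal pattern is taken from
the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MSG_ALIGN 16
#define DEMO_ALIGN(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/* Hypothetical, reduced layouts (not the driver's structures). */
struct demo_hdr {
	uint16_t num_msgs;
};

struct demo_msghdr {
	uint16_t id;
	uint16_t next_msgoff;	/* offset of the next msg from region start */
};

/* Walk the region, record each message id, return the number visited. */
static int demo_walk(const unsigned char *region, size_t len,
		     uint16_t *ids, int max_ids)
{
	struct demo_hdr hdr;
	struct demo_msghdr msg;
	size_t offset = DEMO_ALIGN(sizeof(hdr), DEMO_MSG_ALIGN);
	int i;

	if (len < sizeof(hdr))
		return 0;
	memcpy(&hdr, region, sizeof(hdr));

	for (i = 0; i < hdr.num_msgs && i < max_ids; i++) {
		/* Stop rather than read past the end of the region. */
		if (offset + sizeof(msg) > len)
			break;
		memcpy(&msg, region + offset, sizeof(msg));
		ids[i] = msg.id;
		offset = msg.next_msgoff;
	}
	return i;
}
```

The bounds check is an addition for the sketch; the quoted handler trusts
num_msgs and next_msgoff as written by the peer.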

[PATCH v8 02/15] octeontx2-af: Reset all RVU blocks

2018-10-07 Thread sunil . kovvuri

From: Sunil Goutham 

Go through all BLKADDRs and check which ones are implemented
on this silicon and do a HW reset of each implemented block.
Also added all RVU AF and PF register offsets.

Signed-off-by: Sunil Goutham 
---
 drivers/net/ethernet/marvell/octeontx2/af/rvu.c|  78 ++
 drivers/net/ethernet/marvell/octeontx2/af/rvu.h|  37 +++
 .../net/ethernet/marvell/octeontx2/af/rvu_reg.h| 112 +
 .../net/ethernet/marvell/octeontx2/af/rvu_struct.h |  34 +++
 4 files changed, 261 insertions(+)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_reg.h
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_struct.h

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c 
b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
index 5af4da6..d40fabf 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -16,6 +16,7 @@
 #include 
 
 #include "rvu.h"
+#include "rvu_reg.h"
 
 #define DRV_NAME   "octeontx2-af"
 #define DRV_STRING  "Marvell OcteonTX2 RVU Admin Function Driver"
@@ -33,6 +34,70 @@ MODULE_LICENSE("GPL v2");
 MODULE_VERSION(DRV_VERSION);
 MODULE_DEVICE_TABLE(pci, rvu_id_table);
 
+/* Poll a RVU block's register 'offset', for a 'zero'
+ * or 'nonzero' at bits specified by 'mask'
+ */
+int rvu_poll_reg(struct rvu *rvu, u64 block, u64 offset, u64 mask, bool zero)
+{
+   void __iomem *reg;
+   int timeout = 100;
+   u64 reg_val;
+
+   reg = rvu->afreg_base + ((block << 28) | offset);
+   while (timeout) {
+   reg_val = readq(reg);
+   if (zero && !(reg_val & mask))
+   return 0;
+   if (!zero && (reg_val & mask))
+   return 0;
+   udelay(1);
+   cpu_relax();
+   timeout--;
+   }
+   return -EBUSY;
+}
+
+static void rvu_check_block_implemented(struct rvu *rvu)
+{
+   struct rvu_hwinfo *hw = rvu->hw;
+   struct rvu_block *block;
+   int blkid;
+   u64 cfg;
+
+   /* For each block check if 'implemented' bit is set */
+   for (blkid = 0; blkid < BLK_COUNT; blkid++) {
+   block = &hw->block[blkid];
+   cfg = rvupf_read64(rvu, RVU_PF_BLOCK_ADDRX_DISC(blkid));
+   if (cfg & BIT_ULL(11))
+   block->implemented = true;
+   }
+}
+
+static void rvu_block_reset(struct rvu *rvu, int blkaddr, u64 rst_reg)
+{
+   struct rvu_block *block = &rvu->hw->block[blkaddr];
+
+   if (!block->implemented)
+   return;
+
+   rvu_write64(rvu, blkaddr, rst_reg, BIT_ULL(0));
+   rvu_poll_reg(rvu, blkaddr, rst_reg, BIT_ULL(63), true);
+}
+
+static void rvu_reset_all_blocks(struct rvu *rvu)
+{
+   /* Do a HW reset of all RVU blocks */
+   rvu_block_reset(rvu, BLKADDR_NPA, NPA_AF_BLK_RST);
+   rvu_block_reset(rvu, BLKADDR_NIX0, NIX_AF_BLK_RST);
+   rvu_block_reset(rvu, BLKADDR_NPC, NPC_AF_BLK_RST);
+   rvu_block_reset(rvu, BLKADDR_SSO, SSO_AF_BLK_RST);
+   rvu_block_reset(rvu, BLKADDR_TIM, TIM_AF_BLK_RST);
+   rvu_block_reset(rvu, BLKADDR_CPT0, CPT_AF_BLK_RST);
+   rvu_block_reset(rvu, BLKADDR_NDC0, NDC_AF_BLK_RST);
+   rvu_block_reset(rvu, BLKADDR_NDC1, NDC_AF_BLK_RST);
+   rvu_block_reset(rvu, BLKADDR_NDC2, NDC_AF_BLK_RST);
+}
+
 static int rvu_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
struct device *dev = &pdev->dev;
@@ -43,6 +108,12 @@ static int rvu_probe(struct pci_dev *pdev, const struct 
pci_device_id *id)
if (!rvu)
return -ENOMEM;
 
+   rvu->hw = devm_kzalloc(dev, sizeof(struct rvu_hwinfo), GFP_KERNEL);
+   if (!rvu->hw) {
+   devm_kfree(dev, rvu);
+   return -ENOMEM;
+   }
+
pci_set_drvdata(pdev, rvu);
rvu->pdev = pdev;
rvu->dev = &pdev->dev;
@@ -80,6 +151,11 @@ static int rvu_probe(struct pci_dev *pdev, const struct 
pci_device_id *id)
goto err_release_regions;
}
 
+   /* Check which blocks the HW supports */
+   rvu_check_block_implemented(rvu);
+
+   rvu_reset_all_blocks(rvu);
+
return 0;
 
 err_release_regions:
@@ -88,6 +164,7 @@ static int rvu_probe(struct pci_dev *pdev, const struct 
pci_device_id *id)
pci_disable_device(pdev);
 err_freemem:
pci_set_drvdata(pdev, NULL);
+   devm_kfree(&pdev->dev, rvu->hw);
devm_kfree(dev, rvu);
return err;
 }
@@ -100,6 +177,7 @@ static void rvu_remove(struct pci_dev *pdev)
pci_disable_device(pdev);
pci_set_drvdata(pdev, NULL);
 
+   devm_kfree(&pdev->dev, rvu->hw);
devm_kfree(&pdev->dev, rvu);
 }
 
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h 
b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index 4a4b0ad..e2c54d0 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -11,6 +11,8 @@
 #ifndef RVU_H
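
rvu_poll_reg() in the rvu.c hunk above is a bounded busy-wait: read the
register, test the masked bits against the wanted state, give up after a
fixed number of tries. A user-space sketch of the same pattern is below. The
register read is replaced by a callback and the udelay()/cpu_relax() pair is
omitted; demo_poll() and demo_fake_reg() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t (*demo_read_fn)(void *ctx);

/*
 * Poll until the bits in 'mask' are all zero (zero == true) or any of
 * them is set (zero == false). Returns 0 on success, -1 on timeout.
 */
static int demo_poll(demo_read_fn read_reg, void *ctx, uint64_t mask,
		     bool zero, int timeout)
{
	while (timeout-- > 0) {
		uint64_t val = read_reg(ctx);

		if (zero && !(val & mask))
			return 0;
		if (!zero && (val & mask))
			return 0;
	}
	return -1;
}

/* Fake register: bit 63 reads as busy for the first *ctx reads. */
static uint64_t demo_fake_reg(void *ctx)
{
	int *busy_reads = ctx;

	if (*busy_reads > 0) {
		(*busy_reads)--;
		return 1ULL << 63;
	}
	return 0;
}
```

rvu_block_reset() uses the same shape: write bit 0 of the reset register,
then poll bit 63 until it clears. Note that the quoted code ignores the poll
result, so a timed-out reset goes unreported.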

[PATCH v8 06/15] octeontx2-af: Convert mbox msg id check to a macro

2018-10-07 Thread sunil . kovvuri

From: Aleksey Makarov 

With tens of mailbox messages expected to be handled in the future,
checking the message id could become a lengthy switch statement. Hence
add a macro to auto-generate the switch case for each msg id.

Signed-off-by: Aleksey Makarov 
Signed-off-by: Sunil Goutham 
---
 drivers/net/ethernet/marvell/octeontx2/af/rvu.c | 44 +
 1 file changed, 38 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c 
b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
index 6999d0f..d75ce45 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -258,6 +258,12 @@ static int rvu_setup_hw_resources(struct rvu *rvu)
return 0;
 }
 
+static int rvu_mbox_handler_READY(struct rvu *rvu, struct msg_req *req,
+ struct ready_msg_rsp *rsp)
+{
+   return 0;
+}
+
 static int rvu_process_mbox_msg(struct rvu *rvu, int devid,
struct mbox_msghdr *req)
 {
@@ -265,13 +271,39 @@ static int rvu_process_mbox_msg(struct rvu *rvu, int 
devid,
if (req->sig != OTX2_MBOX_REQ_SIG)
goto bad_message;
 
-   if (req->id == MBOX_MSG_READY)
-   return 0;
-
+   switch (req->id) {
+#define M(_name, _id, _req_type, _rsp_type)\
+   case _id: { \
+   struct _rsp_type *rsp;  \
+   int err;\
+   \
+   rsp = (struct _rsp_type *)otx2_mbox_alloc_msg(  \
+   &rvu->mbox, devid,  \
+   sizeof(struct _rsp_type));  \
+   if (rsp) {  \
+   rsp->hdr.id = _id;  \
+   rsp->hdr.sig = OTX2_MBOX_RSP_SIG;   \
+   rsp->hdr.pcifunc = req->pcifunc;\
+   rsp->hdr.rc = 0;\
+   }   \
+   \
+   err = rvu_mbox_handler_ ## _name(rvu,   \
+(struct _req_type *)req, \
+rsp);  \
+   if (rsp && err) \
+   rsp->hdr.rc = err;  \
+   \
+   return rsp ? err : -ENOMEM; \
+   }
+MBOX_MESSAGES
+#undef M
+   break;
 bad_message:
-   otx2_reply_invalid_msg(&rvu->mbox, devid, req->pcifunc,
-  req->id);
-   return -ENODEV;
+   default:
+   otx2_reply_invalid_msg(&rvu->mbox, devid, req->pcifunc,
+  req->id);
+   return -ENODEV;
+   }
 }
 
 static void rvu_mbox_handler(struct work_struct *work)
-- 
2.7.4
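
The patch above relies on the X-macro idiom: MBOX_MESSAGES is a list of
M(...) entries, and each site defines M() differently before expanding the
list. A stripped-down, user-space sketch of the idiom follows. The message
table, the PING entry and the demo_* handlers are made up for illustration;
only READY with id 0x001 comes from the patch.

```c
/* Hypothetical message table in the style of MBOX_MESSAGES. */
#define DEMO_MESSAGES		\
M(READY, 0x001)			\
M(PING,  0x002)

/* First expansion: one enum constant per message. */
enum {
#define M(_name, _id) DEMO_MSG_ ## _name = _id,
DEMO_MESSAGES
#undef M
};

static int demo_handler_READY(void)
{
	return 0;
}

static int demo_handler_PING(void)
{
	return 1;
}

/* Second expansion: one switch case per message, calling its handler. */
static int demo_dispatch(int id)
{
	switch (id) {
#define M(_name, _id)					\
	case _id:					\
		return demo_handler_ ## _name();
DEMO_MESSAGES
#undef M
	default:
		return -1;	/* unknown message id */
	}
}
```

Adding a message then means one new M() line plus one handler function; a
missing handler shows up as a build error instead of a silently unhandled id.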

[PATCH v8 03/15] octeontx2-af: Gather RVU blocks HW info

2018-10-07 Thread sunil . kovvuri

From: Sunil Goutham 

This patch gathers the NPA/NIX/SSO/SSOW/TIM/CPT RVU blocks'
HW info, such as the number of LFs. Important register offsets
are saved for later use to avoid code duplication for each block.
A bitmap is allocated for each of the blocks, which later
on will be used to allocate an LF for an RVU PF/VF.

Also added RVU NIX/NPA block registers and a few registers
of other blocks.

Signed-off-by: Sunil Goutham 
---
 drivers/net/ethernet/marvell/octeontx2/af/rvu.c| 167 +++
 drivers/net/ethernet/marvell/octeontx2/af/rvu.h|  21 ++
 .../net/ethernet/marvell/octeontx2/af/rvu_reg.h| 333 -
 3 files changed, 517 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c 
b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
index d40fabf..fa5f40b 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -57,6 +57,15 @@ int rvu_poll_reg(struct rvu *rvu, u64 block, u64 offset, u64 
mask, bool zero)
return -EBUSY;
 }
 
+int rvu_alloc_bitmap(struct rsrc_bmap *rsrc)
+{
+   rsrc->bmap = kcalloc(BITS_TO_LONGS(rsrc->max),
+sizeof(long), GFP_KERNEL);
+   if (!rsrc->bmap)
+   return -ENOMEM;
+   return 0;
+}
+
 static void rvu_check_block_implemented(struct rvu *rvu)
 {
struct rvu_hwinfo *hw = rvu->hw;
@@ -98,6 +107,157 @@ static void rvu_reset_all_blocks(struct rvu *rvu)
rvu_block_reset(rvu, BLKADDR_NDC2, NDC_AF_BLK_RST);
 }
 
+static void rvu_free_hw_resources(struct rvu *rvu)
+{
+   struct rvu_hwinfo *hw = rvu->hw;
+   struct rvu_block *block;
+   int id;
+
+   /* Free all bitmaps */
+   for (id = 0; id < BLK_COUNT; id++) {
+   block = &hw->block[id];
+   kfree(block->lf.bmap);
+   }
+}
+
+static int rvu_setup_hw_resources(struct rvu *rvu)
+{
+   struct rvu_hwinfo *hw = rvu->hw;
+   struct rvu_block *block;
+   int err;
+   u64 cfg;
+
+   /* Get HW supported max RVU PF & VF count */
+   cfg = rvu_read64(rvu, BLKADDR_RVUM, RVU_PRIV_CONST);
+   hw->total_pfs = (cfg >> 32) & 0xFF;
+   hw->total_vfs = (cfg >> 20) & 0xFFF;
+   hw->max_vfs_per_pf = (cfg >> 40) & 0xFF;
+
+   /* Init NPA LF's bitmap */
+   block = &hw->block[BLKADDR_NPA];
+   if (!block->implemented)
+   goto nix;
+   cfg = rvu_read64(rvu, BLKADDR_NPA, NPA_AF_CONST);
+   block->lf.max = (cfg >> 16) & 0xFFF;
+   block->addr = BLKADDR_NPA;
+   block->lfshift = 8;
+   block->lookup_reg = NPA_AF_RVU_LF_CFG_DEBUG;
+   block->pf_lfcnt_reg = RVU_PRIV_PFX_NPA_CFG;
+   block->vf_lfcnt_reg = RVU_PRIV_HWVFX_NPA_CFG;
+   block->lfcfg_reg = NPA_PRIV_LFX_CFG;
+   block->msixcfg_reg = NPA_PRIV_LFX_INT_CFG;
+   block->lfreset_reg = NPA_AF_LF_RST;
+   sprintf(block->name, "NPA");
+   err = rvu_alloc_bitmap(&block->lf);
+   if (err)
+   return err;
+
+nix:
+   /* Init NIX LF's bitmap */
+   block = &hw->block[BLKADDR_NIX0];
+   if (!block->implemented)
+   goto sso;
+   cfg = rvu_read64(rvu, BLKADDR_NIX0, NIX_AF_CONST2);
+   block->lf.max = cfg & 0xFFF;
+   block->addr = BLKADDR_NIX0;
+   block->lfshift = 8;
+   block->lookup_reg = NIX_AF_RVU_LF_CFG_DEBUG;
+   block->pf_lfcnt_reg = RVU_PRIV_PFX_NIX_CFG;
+   block->vf_lfcnt_reg = RVU_PRIV_HWVFX_NIX_CFG;
+   block->lfcfg_reg = NIX_PRIV_LFX_CFG;
+   block->msixcfg_reg = NIX_PRIV_LFX_INT_CFG;
+   block->lfreset_reg = NIX_AF_LF_RST;
+   sprintf(block->name, "NIX");
+   err = rvu_alloc_bitmap(&block->lf);
+   if (err)
+   return err;
+
+sso:
+   /* Init SSO group's bitmap */
+   block = &hw->block[BLKADDR_SSO];
+   if (!block->implemented)
+   goto ssow;
+   cfg = rvu_read64(rvu, BLKADDR_SSO, SSO_AF_CONST);
+   block->lf.max = cfg & 0x;
+   block->addr = BLKADDR_SSO;
+   block->multislot = true;
+   block->lfshift = 3;
+   block->lookup_reg = SSO_AF_RVU_LF_CFG_DEBUG;
+   block->pf_lfcnt_reg = RVU_PRIV_PFX_SSO_CFG;
+   block->vf_lfcnt_reg = RVU_PRIV_HWVFX_SSO_CFG;
+   block->lfcfg_reg = SSO_PRIV_LFX_HWGRP_CFG;
+   block->msixcfg_reg = SSO_PRIV_LFX_HWGRP_INT_CFG;
+   block->lfreset_reg = SSO_AF_LF_HWGRP_RST;
+   sprintf(block->name, "SSO GROUP");
+   err = rvu_alloc_bitmap(&block->lf);
+   if (err)
+   return err;
+
+ssow:
+   /* Init SSO workslot's bitmap */
+   block = &hw->block[BLKADDR_SSOW];
+   if (!block->implemented)
+   goto tim;
+   block->lf.max = (cfg >> 56) & 0xFF;
+   block->addr = BLKADDR_SSOW;
+   block->multislot = true;
+   block->lfshift = 3;
+   block->lookup_reg = SSOW_AF_RVU_LF_HWS_CFG_DEBUG;
+   block->pf_lfcnt_reg = RVU_PRIV_PFX_SSOW_CFG;
+   block->vf_lfcnt_reg = RVU_PRIV_HWVFX_SSOW_CFG;
+   block->lfcfg_reg = SSOW_PRIV_LFX_HWS_CFG;
+
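
The commit message above says a bitmap is allocated per block and later used
to hand out LFs. A user-space sketch of such a first-free-bit allocator is
below, loosely following the shape of rvu_alloc_bitmap() and the
alloc/free/count helpers seen elsewhere in the series. It uses calloc() and
an open-coded bit scan in place of kcalloc(), find_first_zero_bit() and
bitmap_weight(); all demo_* names are hypothetical.

```c
#include <limits.h>
#include <stdlib.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

struct demo_bmap {
	unsigned long *bmap;
	unsigned int max;	/* number of resources tracked */
};

static int demo_bmap_init(struct demo_bmap *r, unsigned int max)
{
	size_t nlongs = (max + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG;

	r->max = max;
	r->bmap = calloc(nlongs ? nlongs : 1, sizeof(unsigned long));
	return r->bmap ? 0 : -1;
}

/* Claim the lowest free id, or return -1 when none is left. */
static int demo_alloc(struct demo_bmap *r)
{
	unsigned int id;

	if (!r->bmap)
		return -1;
	for (id = 0; id < r->max; id++) {
		unsigned long *word = &r->bmap[id / DEMO_BITS_PER_LONG];
		unsigned long mask = 1UL << (id % DEMO_BITS_PER_LONG);

		if (!(*word & mask)) {
			*word |= mask;
			return (int)id;
		}
	}
	return -1;
}

static void demo_free(struct demo_bmap *r, int id)
{
	if (!r->bmap || id < 0 || (unsigned int)id >= r->max)
		return;
	r->bmap[id / DEMO_BITS_PER_LONG] &= ~(1UL << (id % DEMO_BITS_PER_LONG));
}

/* Number of ids still unclaimed. */
static unsigned int demo_free_count(const struct demo_bmap *r)
{
	unsigned int id, used = 0;

	if (!r->bmap)
		return 0;
	for (id = 0; id < r->max; id++)
		if (r->bmap[id / DEMO_BITS_PER_LONG] &
		    (1UL << (id % DEMO_BITS_PER_LONG)))
			used++;
	return r->max - used;
}
```

As with the non-atomic __set_bit()/__clear_bit() used in the driver helpers,
this sketch does no locking; callers would need to serialize access.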

[PATCH v8 04/15] octeontx2-af: Add mailbox support infra

2018-10-07 Thread sunil . kovvuri

From: Aleksey Makarov 

This patch adds mailbox support infrastructure APIs.
Each RVU device has a dedicated 64KB mailbox region
shared with its peer for communication. RVU AF has
a separate mailbox region shared with each of the RVU PFs,
and an RVU PF has a separate region shared with each of
its VFs.

This set of APIs is used by this driver (RVU AF) and
other RVU PF/VF drivers, e.g. netdev, crypto, etc.

Signed-off-by: Aleksey Makarov 
Signed-off-by: Sunil Goutham 
Signed-off-by: Lukasz Bartosik 
---
 drivers/net/ethernet/marvell/octeontx2/Kconfig |   4 +
 drivers/net/ethernet/marvell/octeontx2/af/Makefile |   2 +
 drivers/net/ethernet/marvell/octeontx2/af/mbox.c   | 303 +
 drivers/net/ethernet/marvell/octeontx2/af/mbox.h   | 142 ++
 .../net/ethernet/marvell/octeontx2/af/rvu_reg.h|   4 +
 5 files changed, 455 insertions(+)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/mbox.c
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/mbox.h

diff --git a/drivers/net/ethernet/marvell/octeontx2/Kconfig 
b/drivers/net/ethernet/marvell/octeontx2/Kconfig
index 9743502..8002f9c 100644
--- a/drivers/net/ethernet/marvell/octeontx2/Kconfig
+++ b/drivers/net/ethernet/marvell/octeontx2/Kconfig
@@ -2,8 +2,12 @@
 # Marvell OcteonTX2 drivers configuration
 #
 
+config OCTEONTX2_MBOX
+tristate
+
 config OCTEONTX2_AF
tristate "Marvell OcteonTX2 RVU Admin Function driver"
+   select OCTEONTX2_MBOX
depends on ARM64 && PCI
help
  This driver supports Marvell's OcteonTX2 Resource Virtualization
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/Makefile 
b/drivers/net/ethernet/marvell/octeontx2/af/Makefile
index dacbd16..ac17cb9 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/Makefile
+++ b/drivers/net/ethernet/marvell/octeontx2/af/Makefile
@@ -3,6 +3,8 @@
 # Makefile for Marvell's OcteonTX2 RVU Admin Function driver
 #
 
+obj-$(CONFIG_OCTEONTX2_MBOX) += octeontx2_mbox.o
 obj-$(CONFIG_OCTEONTX2_AF) += octeontx2_af.o
 
+octeontx2_mbox-y := mbox.o
 octeontx2_af-y := rvu.o
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mbox.c 
b/drivers/net/ethernet/marvell/octeontx2/af/mbox.c
new file mode 100644
index 000..85ba24a
--- /dev/null
+++ b/drivers/net/ethernet/marvell/octeontx2/af/mbox.c
@@ -0,0 +1,303 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Marvell OcteonTx2 RVU Admin Function driver
+ *
+ * Copyright (C) 2018 Marvell International Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+
+#include "rvu_reg.h"
+#include "mbox.h"
+
+static const u16 msgs_offset = ALIGN(sizeof(struct mbox_hdr), MBOX_MSG_ALIGN);
+
+void otx2_mbox_reset(struct otx2_mbox *mbox, int devid)
+{
+   struct otx2_mbox_dev *mdev = &mbox->dev[devid];
+   struct mbox_hdr *tx_hdr, *rx_hdr;
+
+   tx_hdr = mdev->mbase + mbox->tx_start;
+   rx_hdr = mdev->mbase + mbox->rx_start;
+
+   spin_lock(&mdev->mbox_lock);
+   mdev->msg_size = 0;
+   mdev->rsp_size = 0;
+   tx_hdr->num_msgs = 0;
+   rx_hdr->num_msgs = 0;
+   spin_unlock(&mdev->mbox_lock);
+}
+EXPORT_SYMBOL(otx2_mbox_reset);
+
+void otx2_mbox_destroy(struct otx2_mbox *mbox)
+{
+   mbox->reg_base = NULL;
+   mbox->hwbase = NULL;
+
+   kfree(mbox->dev);
+   mbox->dev = NULL;
+}
+EXPORT_SYMBOL(otx2_mbox_destroy);
+
+int otx2_mbox_init(struct otx2_mbox *mbox, void *hwbase, struct pci_dev *pdev,
+  void *reg_base, int direction, int ndevs)
+{
+   struct otx2_mbox_dev *mdev;
+   int devid;
+
+   switch (direction) {
+   case MBOX_DIR_AFPF:
+   case MBOX_DIR_PFVF:
+   mbox->tx_start = MBOX_DOWN_TX_START;
+   mbox->rx_start = MBOX_DOWN_RX_START;
+   mbox->tx_size  = MBOX_DOWN_TX_SIZE;
+   mbox->rx_size  = MBOX_DOWN_RX_SIZE;
+   break;
+   case MBOX_DIR_PFAF:
+   case MBOX_DIR_VFPF:
+   mbox->tx_start = MBOX_DOWN_RX_START;
+   mbox->rx_start = MBOX_DOWN_TX_START;
+   mbox->tx_size  = MBOX_DOWN_RX_SIZE;
+   mbox->rx_size  = MBOX_DOWN_TX_SIZE;
+   break;
+   case MBOX_DIR_AFPF_UP:
+   case MBOX_DIR_PFVF_UP:
+   mbox->tx_start = MBOX_UP_TX_START;
+   mbox->rx_start = MBOX_UP_RX_START;
+   mbox->tx_size  = MBOX_UP_TX_SIZE;
+   mbox->rx_size  = MBOX_UP_RX_SIZE;
+   break;
+   case MBOX_DIR_PFAF_UP:
+   case MBOX_DIR_VFPF_UP:
+   mbox->tx_start = MBOX_UP_RX_START;
+   mbox->rx_start = MBOX_UP_TX_START;
+   mbox->tx_size  = MBOX_UP_RX_SIZE;
+   mbox->rx_size  = MBOX_UP_TX_SIZE;
+   break;
+   default:
+   return -ENODEV;
+   }
+
+   switch (direction) {
+   case

[PATCH v8 01/15] octeontx2-af: Add Marvell OcteonTX2 RVU AF driver

2018-10-07 Thread sunil . kovvuri

From: Sunil Goutham 

This patch adds a basic template for Marvell OcteonTX2's
resource virtualization unit (RVU) admin function (AF)
driver: just the driver registration and probe.

Signed-off-by: Sunil Goutham 
---
 drivers/net/ethernet/marvell/Kconfig   |   3 +
 drivers/net/ethernet/marvell/Makefile  |   1 +
 drivers/net/ethernet/marvell/octeontx2/Kconfig |  12 ++
 drivers/net/ethernet/marvell/octeontx2/Makefile|   6 +
 drivers/net/ethernet/marvell/octeontx2/af/Makefile |   8 ++
 drivers/net/ethernet/marvell/octeontx2/af/rvu.c| 126 +
 drivers/net/ethernet/marvell/octeontx2/af/rvu.h|  31 +
 7 files changed, 187 insertions(+)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/Kconfig
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/Makefile
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/Makefile
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu.c
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu.h

diff --git a/drivers/net/ethernet/marvell/Kconfig 
b/drivers/net/ethernet/marvell/Kconfig
index f33fd22..3238aa7 100644
--- a/drivers/net/ethernet/marvell/Kconfig
+++ b/drivers/net/ethernet/marvell/Kconfig
@@ -167,4 +167,7 @@ config SKY2_DEBUG
 
  If unsure, say N.
 
+
+source "drivers/net/ethernet/marvell/octeontx2/Kconfig"
+
 endif # NET_VENDOR_MARVELL
diff --git a/drivers/net/ethernet/marvell/Makefile 
b/drivers/net/ethernet/marvell/Makefile
index 55d4d10..89dea72 100644
--- a/drivers/net/ethernet/marvell/Makefile
+++ b/drivers/net/ethernet/marvell/Makefile
@@ -11,3 +11,4 @@ obj-$(CONFIG_MVPP2) += mvpp2/
 obj-$(CONFIG_PXA168_ETH) += pxa168_eth.o
 obj-$(CONFIG_SKGE) += skge.o
 obj-$(CONFIG_SKY2) += sky2.o
+obj-y  += octeontx2/
diff --git a/drivers/net/ethernet/marvell/octeontx2/Kconfig 
b/drivers/net/ethernet/marvell/octeontx2/Kconfig
new file mode 100644
index 000..9743502
--- /dev/null
+++ b/drivers/net/ethernet/marvell/octeontx2/Kconfig
@@ -0,0 +1,12 @@
+#
+# Marvell OcteonTX2 drivers configuration
+#
+
+config OCTEONTX2_AF
+   tristate "Marvell OcteonTX2 RVU Admin Function driver"
+   depends on ARM64 && PCI
+   help
+ This driver supports Marvell's OcteonTX2 Resource Virtualization
+ Unit's admin function manager which manages all RVU HW resources
+ and provides a medium to other PF/VFs to configure HW. Should be
+ enabled for other RVU device drivers to work.
diff --git a/drivers/net/ethernet/marvell/octeontx2/Makefile 
b/drivers/net/ethernet/marvell/octeontx2/Makefile
new file mode 100644
index 000..e579dcd
--- /dev/null
+++ b/drivers/net/ethernet/marvell/octeontx2/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Makefile for Marvell OcteonTX2 device drivers.
+#
+
+obj-$(CONFIG_OCTEONTX2_AF) += af/
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/Makefile 
b/drivers/net/ethernet/marvell/octeontx2/af/Makefile
new file mode 100644
index 000..dacbd16
--- /dev/null
+++ b/drivers/net/ethernet/marvell/octeontx2/af/Makefile
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Makefile for Marvell's OcteonTX2 RVU Admin Function driver
+#
+
+obj-$(CONFIG_OCTEONTX2_AF) += octeontx2_af.o
+
+octeontx2_af-y := rvu.o
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
new file mode 100644
index 000..5af4da6
--- /dev/null
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -0,0 +1,126 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Marvell OcteonTx2 RVU Admin Function driver
+ *
+ * Copyright (C) 2018 Marvell International Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "rvu.h"
+
+#define DRV_NAME   "octeontx2-af"
+#define DRV_STRING  "Marvell OcteonTX2 RVU Admin Function Driver"
+#define DRV_VERSION    "1.0"
+
+/* Supported devices */
+static const struct pci_device_id rvu_id_table[] = {
+   { PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_OCTEONTX2_RVU_AF) },
+   { 0, }  /* end of table */
+};
+
+MODULE_AUTHOR("Marvell International Ltd.");
+MODULE_DESCRIPTION(DRV_STRING);
+MODULE_LICENSE("GPL v2");
+MODULE_VERSION(DRV_VERSION);
+MODULE_DEVICE_TABLE(pci, rvu_id_table);
+
+static int rvu_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+   struct device *dev = &pdev->dev;
+   struct rvu *rvu;
+   int err;
+
+   rvu = devm_kzalloc(dev, sizeof(*rvu), GFP_KERNEL);
+   if (!rvu)
+   return -ENOMEM;
+
+   pci_set_drvdata(pdev, rvu);
+   rvu->pdev = pdev;
+   rvu->dev = &pdev->dev;
+
+   err = pci_enable_device(pdev);
+   if (err) {
+   dev_err(dev, "Failed to enable PCI device\n");
+   goto err_freemem;
+

[PATCH v8 00/15] octeontx2-af: Add RVU Admin Function driver

2018-10-07 Thread sunil . kovvuri

From: Sunil Goutham 

Resource virtualization unit (RVU) on Marvell's OcteonTX2 SOC maps HW
resources from the network, crypto and other functional blocks into
PCI-compatible physical and virtual functions. Each functional block
again has multiple local functions (LFs) for provisioning to PCI devices.
RVU supports multiple PCIe SRIOV physical functions (PFs) and virtual
functions (VFs). PF0 is called the administrative / admin function (AF)
and has privileges to provision RVU functional block's LFs to each of the
PF/VF.

RVU managed networking functional blocks
 - Network pool allocator (NPA)
 - Network interface controller (NIX)
 - Network parser CAM (NPC)
 - Schedule/Synchronize/Order unit (SSO)

RVU managed non-networking functional blocks
 - Crypto accelerator (CPT)
 - Scheduled timers unit (TIM)
 - Schedule/Synchronize/Order unit (SSO)
   Used for both networking and non-networking use cases
 - Compression (upcoming in future variants of the silicon)

Resource provisioning examples
 - A PF/VF with NIX-LF & NPA-LF resources works as a pure network device
 - A PF/VF with CPT-LF resource works as a pure crypto offload device.

This admin function driver neither receives nor processes any data, i.e.
it does no I/O and is a configuration-only driver.

PF/VFs communicate with AF via a shared memory region (mailbox). Upon
receiving requests from PF/VF, AF does resource provisioning and other
HW configuration. AF is always attached to host, but PF/VFs may be used
by host kernel itself, or attached to VMs or to userspace applications
like DPDK etc. So AF has to handle provisioning/configuration requests
sent by any device from any domain.

This patch series adds logic for the following
 - RVU AF driver with functional blocks provisioning support.
 - Mailbox infrastructure for communication between AF and PFs.
 - CGX (MAC controller) driver which communicates with firmware for
   managing physical ethernet interfaces. AF collects info from this
   driver and forwards the same to the PF/VFs using these interfaces.

This is the first set of patches out of 80+ patches.

Changes from v7:
 1 Removed unnecessary typecasts in mbox infra code.
   - Suggested by David Miller
 2 Fixed MAINTAINERS patch
   - Suggested by Joe Perches

Changes from v6:
 Fixed ordering of local variables from longest to shortest line.
   - Suggested by David Miller

Changes from v5:
 Modified bitfield based command structures to bitmasks for communication
 with firmware, to address endianness issues.
   - Suggested by Arnd Bergmann

Changes from v4:
 1 Removed module author/version/description from CGX driver as it's now
   merged with AF driver module.
   - Suggested by Arnd Bergmann
 2 Added big-endian bitfields for CGX's kernel <=> firmware communication
   command structures.
   - Suggested by Arnd Bergmann

Changes from v3:
 Moved driver from drivers/soc to drivers/net/ethernet
   - Suggested by Arnd Bergmann
 https://patchwork.kernel.org/cover/10587635/ 

Changes from v2:
 No changes, submitted again with netdev mailing list in loop.
   - Suggested by Arnd Bergmann and Andrew Lunn

Changes from v1:
 1 Merged RVU admin function and CGX drivers into a single module
   - Suggested by Arnd Bergmann
 2 Pulled mbox communication APIs into a separate module to remove
   admin function driver dependency in a VM where AF is not attached.
   - Suggested by Arnd Bergmann
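
As background to the v5 change listed above: C bitfield layout is implementation-defined and in practice differs between little-endian and big-endian builds, so a bitfield struct shared with firmware may decode differently depending on the host. Explicit shifts and masks on a fixed-width word avoid that. The following is a minimal userspace sketch of the idea only; the field names and bit positions (CMD_ID, CMD_ARG) are hypothetical and are not taken from cgx_fw_if.h.

```c
#include <stdint.h>

/* Hypothetical layout of a 64-bit command word (NOT from cgx_fw_if.h):
 *   bits  7:0  command id
 *   bits 23:8  argument
 */
#define CMD_ID_SHIFT   0
#define CMD_ID_MASK    (UINT64_C(0xff) << CMD_ID_SHIFT)
#define CMD_ARG_SHIFT  8
#define CMD_ARG_MASK   (UINT64_C(0xffff) << CMD_ARG_SHIFT)

/* Place 'val' into the field described by 'mask' and 'shift'. */
static inline uint64_t field_prep(uint64_t mask, unsigned int shift,
				  uint64_t val)
{
	return (val << shift) & mask;
}

/* Extract the field described by 'mask' and 'shift' from 'word'. */
static inline uint64_t field_get(uint64_t mask, unsigned int shift,
				 uint64_t word)
{
	return (word & mask) >> shift;
}

/* Build a command word from its two fields. The result is the same
 * numeric value on any host, independent of compiler bitfield rules.
 */
static inline uint64_t cmd_build(uint8_t id, uint16_t arg)
{
	return field_prep(CMD_ID_MASK, CMD_ID_SHIFT, id) |
	       field_prep(CMD_ARG_MASK, CMD_ARG_SHIFT, arg);
}
```

In kernel code, the FIELD_PREP() and FIELD_GET() helpers from <linux/bitfield.h> serve the same purpose; the hand-written helpers here only keep the sketch self-contained.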

Aleksey Makarov (2):
  octeontx2-af: Add mailbox support infra
  octeontx2-af: Convert mbox msg id check to a macro

Geetha sowjanya (1):
  octeontx2-af: Reconfig MSIX base with IOVA

Linu Cherian (3):
  octeontx2-af: Set RVU PFs to CGX LMACs mapping
  octeontx2-af: Add support for CGX link management
  octeontx2-af: Register for CGX lmac events

Sunil Goutham (9):
  octeontx2-af: Add Marvell OcteonTX2 RVU AF driver
  octeontx2-af: Reset all RVU blocks
  octeontx2-af: Gather RVU blocks HW info
  octeontx2-af: Add mailbox IRQ and msg handlers
  octeontx2-af: Scan blocks for LFs provisioned to PF/VF
  octeontx2-af: Add RVU block LF provisioning support
  octeontx2-af: Configure block LF's MSIX vector offset
  octeontx2-af: Add Marvell OcteonTX2 CGX driver
  MAINTAINERS: Add entry for Marvell OcteonTX2 Admin Function driver

 MAINTAINERS                                        |    9 +
 drivers/net/ethernet/marvell/Kconfig               |    3 +
 drivers/net/ethernet/marvell/Makefile              |    1 +
 drivers/net/ethernet/marvell/octeontx2/Kconfig     |   16 +
 drivers/net/ethernet/marvell/octeontx2/Makefile    |    6 +
 drivers/net/ethernet/marvell/octeontx2/af/Makefile |   10 +
 drivers/net/ethernet/marvell/octeontx2/af/cgx.c    |  505 ++
 drivers/net/ethernet/marvell/octeontx2/af/cgx.h    |   65 +
 .../net/ethernet/marvell/octeontx2/af/cgx_fw_if.h  |  186 +++
 drivers/net/ethernet/marvell/octeontx2/af/mbox.c   |  303
 drivers/net/ethernet/marvell/octeontx2/af/mbox.h   |  211 +++
 drivers/net/ethernet/marvell/octeontx2/af/rvu.c    | 1637

Re: [PATCH v7 15/15] MAINTAINERS: Add entry for Marvell OcteonTX2 Admin Function driver

2018-10-07 Thread Sunil Kovvuri

On Sat, Oct 6, 2018 at 1:05 PM Joe Perches  wrote:
>
> On Sat, 2018-10-06 at 11:36 +0530, sunil.kovv...@gmail.com wrote:
> > Added maintainers entry for Marvell OcteonTX2 SOC's RVU
> > admin function driver.
> []
> > diff --git a/MAINTAINERS b/MAINTAINERS
> []
> > @@ -8844,6 +8844,15 @@ S: Supported
> >  F:   drivers/mmc/host/sdhci-xenon*
> >  F:   Documentation/devicetree/bindings/mmc/marvell,xenon-sdhci.txt
> >
> > +MARVELL OCTEONTX2 RVU ADMIN FUNCTION DRIVER
> > +M:   Sunil Goutham 
> > +M:   Linu Cherian 
> > +M:   Geetha sowjanya 
> > +M:   Jerin Jacob 
> > +L:   netdev@vger.kernel.org
> > +S:   Maintained
>
> Aren't you all being paid?
>
> So shouldn't this be
>
> S:  Supported
>
> ?
>
> > +F:   drivers/net/ethernet/marvell/octeontx2/af
>
> Please add a terminating / to show that this
> is a directory and not a file.
>
> F:  drivers/net/ethernet/marvell/octeontx2/af/
>

Thanks for looking at this, will fix these.

Sunil.

[PATCH v2 2/2] netdev/phy: add MDIO bus multiplexer driven by a regmap

2018-10-07 Thread Pankaj Bansal

Add support for an MDIO bus multiplexer controlled by a regmap
device, like an FPGA.

Tested on an NXP LX2160AQDS board which uses the "QIXIS" FPGA
attached to the i2c bus.

Signed-off-by: Pankaj Bansal 
---

Notes:
V2:
 - replaced be32_to_cpup with of_property_read_u32
 - incorporated Andrew's comments

 drivers/net/phy/Kconfig   |  13 +++
 drivers/net/phy/Makefile  |   1 +
 drivers/net/phy/mdio-mux-regmap.c | 171 
 3 files changed, 185 insertions(+)

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index 82070792edbb..d1ac9e70cbb2 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -87,6 +87,19 @@ config MDIO_BUS_MUX_MMIOREG
 
  Currently, only 8/16/32 bits registers are supported.
 
+config MDIO_BUS_MUX_REGMAP
+   tristate "REGMAP controlled MDIO bus multiplexers"
+   depends on OF_MDIO && REGMAP
+   select MDIO_BUS_MUX
+   help
+ This module provides a driver for MDIO bus multiplexers that
+ are controlled via a regmap device, like an FPGA connected to i2c.
+ The multiplexer connects one of several child MDIO busses to a
+ parent bus. Child bus selection is under the control of one of
+ the FPGA's registers.
+
+ Currently, only registers up to 32 bits wide are supported.
+
 config MDIO_CAVIUM
tristate
 
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 5805c0b7d60e..33053f9f320d 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -29,6 +29,7 @@ obj-$(CONFIG_MDIO_BUS_MUX)+= mdio-mux.o
 obj-$(CONFIG_MDIO_BUS_MUX_BCM_IPROC)   += mdio-mux-bcm-iproc.o
 obj-$(CONFIG_MDIO_BUS_MUX_GPIO)+= mdio-mux-gpio.o
 obj-$(CONFIG_MDIO_BUS_MUX_MMIOREG) += mdio-mux-mmioreg.o
+obj-$(CONFIG_MDIO_BUS_MUX_REGMAP) += mdio-mux-regmap.o
 obj-$(CONFIG_MDIO_CAVIUM)  += mdio-cavium.o
 obj-$(CONFIG_MDIO_GPIO)+= mdio-gpio.o
 obj-$(CONFIG_MDIO_HISI_FEMAC)  += mdio-hisi-femac.o
diff --git a/drivers/net/phy/mdio-mux-regmap.c b/drivers/net/phy/mdio-mux-regmap.c
new file mode 100644
index ..6068d05a728a
--- /dev/null
+++ b/drivers/net/phy/mdio-mux-regmap.c
@@ -0,0 +1,171 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+/* Simple regmap based MDIO MUX driver
+ *
+ * Copyright 2018 NXP
+ *
+ * Based on mdio-mux-mmioreg.c by Timur Tabi
+ *
+ * Author:
+ * Pankaj Bansal 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct mdio_mux_regmap_state {
+   void*mux_handle;
+   struct regmap   *regmap;
+   u32 mux_reg;
+   u32 mask;
+};
+
+/* MDIO multiplexing switch function
+ *
+ * This function is called by the mdio-mux layer when it thinks the mdio bus
+ * multiplexer needs to switch.
+ *
+ * 'current_child' is the current value of the mux register (masked via
+ * s->mask).
+ *
+ * 'desired_child' is the value of the 'reg' property of the target child MDIO
+ * node.
+ *
+ * The first time this function is called, current_child == -1.
+ *
+ * If current_child == desired_child, then the mux is already set to the
+ * correct bus.
+ */
+static int mdio_mux_regmap_switch_fn(int current_child, int desired_child,
+void *data)
+{
+   struct mdio_mux_regmap_state *s = data;
+   bool change;
+   int ret;
+
+   ret = regmap_update_bits_check(s->regmap,
+  s->mux_reg,
+  s->mask,
+  desired_child,
+  &change);
+
+   if (ret)
+   return ret;
+   if (change)
+   pr_debug("%s %d -> %d\n", __func__, current_child,
+desired_child);
+   return ret;
+}
+
+static int mdio_mux_regmap_probe(struct platform_device *pdev)
+{
+   struct device_node *np2, *np = pdev->dev.of_node;
+   struct mdio_mux_regmap_state *s;
+   int ret;
+   u32 val;
+
+   dev_dbg(&pdev->dev, "probing node %pOF\n", np);
+
+   s = devm_kzalloc(&pdev->dev, sizeof(*s), GFP_KERNEL);
+   if (!s)
+   return -ENOMEM;
+
+   s->regmap = dev_get_regmap(pdev->dev.parent, NULL);
+   if (!s->regmap) {
+   dev_err(&pdev->dev, "Failed to get parent regmap\n");
+   return -ENODEV;
+   }
+
+   ret = of_property_read_u32(np, "reg", &s->mux_reg);
+   if (ret) {
+   dev_err(&pdev->dev, "missing or invalid reg property\n");
+   return -ENODEV;
+   }
+
+   /* Test Register read write */
+   ret = regmap_read(s->regmap, s->mux_reg, &val);
+   if (ret) {
+   dev_err(&pdev->dev, "error while reading reg\n");
+   return ret;
+   }
+
+   ret = regmap_write(s->regmap, s->mux_reg, val);
+   if (ret) {
+   dev_err(&pdev->dev, "error while writing reg\n");
+   return ret;
+   }
+
+   ret
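
The switch function in this patch relies on regmap_update_bits_check() to rewrite only the bits under s->mask and to report whether the register value actually changed. The userspace sketch below models that read-modify-write under stated assumptions: struct fake_reg and update_bits_check() are hypothetical stand-ins rather than the regmap API, and no bus access, caching, or locking is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a device register; a real driver would
 * access it through a regmap instead.
 */
struct fake_reg {
	uint32_t value;
};

/* Update only the bits selected by 'mask' with the matching bits of
 * 'val', and report through 'change' whether the stored value differs
 * from what was there before. The register is written only on change.
 */
static void update_bits_check(struct fake_reg *r, uint32_t mask,
			      uint32_t val, bool *change)
{
	uint32_t old = r->value;
	uint32_t updated = (old & ~mask) | (val & mask);

	*change = (updated != old);
	if (*change)
		r->value = updated;
}
```

With a mask of 0x07, selecting child 2 on a register that holds 0xA5 leaves the upper bits alone and yields 0xA2; repeating the call reports no change, which is why the driver's debug message is printed only on a real switch.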

[PATCH v2 1/2] dt-bindings: net: add MDIO bus multiplexer driven by a regmap device

2018-10-07 Thread Pankaj Bansal

Add support for an MDIO bus multiplexer controlled by a regmap
device, like an FPGA.

Tested on an NXP LX2160AQDS board which uses the "QIXIS" FPGA
attached to the i2c bus.

Signed-off-by: Pankaj Bansal 
---

Notes:
V2:
 - Fixed formatting error caused by using space instead of tab
 - Add newline between property list and subnode
 - Add newline between two subnodes

 .../bindings/net/mdio-mux-regmap.txt | 95 ++
 1 file changed, 95 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/mdio-mux-regmap.txt b/Documentation/devicetree/bindings/net/mdio-mux-regmap.txt
new file mode 100644
index ..88ebac26c1c5
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/mdio-mux-regmap.txt
@@ -0,0 +1,95 @@
+Properties for an MDIO bus multiplexer controlled by a regmap
+
+This is a special case of an MDIO bus multiplexer.  A regmap device,
+like an FPGA, is used to control which child bus is connected.  The mdio-mux
+node must be a child of the device that is controlled by a regmap.
+The driver currently only supports devices with registers up to 32 bits wide.
+
+Required properties in addition to the generic multiplexer properties:
+
+- compatible : string, must contain "mdio-mux-regmap"
+
+- reg : integer, contains the offset of the register that controls the bus
+   multiplexer. It can be a 32-bit number.
+
+- mux-mask : integer, contains a 32-bit mask that specifies which
+   bits in the register control the actual bus multiplexer.  The
+   'reg' property of each child mdio-mux node must be constrained by
+   this mask.
+
+Example:
+
+The FPGA node defines an i2c-connected FPGA with a register space of 0x30 bytes.
+For the "EMI2" MDIO bus, register 0x54 (BRDCFG4) controls the mux on that bus.
+A bitmask of 0x07 means that bits 0, 1 and 2 (bit 0 is lsb) are the bits on
+BRDCFG4 that control the actual mux.
+
+i2c@200 {
+   compatible = "fsl,vf610-i2c";
+   #address-cells = <1>;
+   #size-cells = <0>;
+   reg = <0x0 0x200 0x0 0x1>;
+   interrupts = <0 34 0x4>; // Level high type
+   clock-names = "i2c";
+   clocks = < 4 7>;
+   fsl-scl-gpio = < 15 0>;
+   status = "okay";
+
+   /* The FPGA node */
+   fpga@66 {
+   compatible = "fsl,lx2160aqds-fpga", "fsl,fpga-qixis-i2c";
+   reg = <0x66>;
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   mdio1_mux@54 {
+   compatible = "mdio-mux-regmap", "mdio-mux";
+   mdio-parent-bus = <&emdio2>; /* MDIO bus */
+   reg = <0x54>;/* BRDCFG4 */
+   mux-mask = <0x07>;  /* EMI2_MDIO */
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   mdio1_ioslot1@0 {   // Slot 1
+   reg = <0x00>;
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   phy1@1 {
+   reg = <1>;
+   compatible = "ethernet-phy-id0210.7441";
+   };
+
+   phy1@0 {
+   reg = <0>;
+   compatible = "ethernet-phy-id0210.7441";
+   };
+   };
+
+   mdio1_ioslot2@1 {   // Slot 2
+   reg = <0x01>;
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   };
+
+   mdio1_ioslot3@2 {   // Slot 3
+   reg = <0x02>;
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   };
+   };
+   };
+};
+
+   /* The parent MDIO bus. */
+   emdio2: mdio@0x8B97000 {
+   compatible = "fsl,fman-memac-mdio";
+   reg = <0x0 0x8B97000 0x0 0x1000>;
+   device_type = "mdio";
+   little-endian;
+
+   #address-cells = <1>;
+   #size-cells = <0>;
+   };
-- 
2.17.1
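
The binding above requires each child node's 'reg' value to be constrained by mux-mask, meaning it must not set any bit outside the mask. A minimal sketch of that check follows; child_reg_fits_mask() and first_bad_child() are hypothetical helper names used for illustration and do not appear in the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A child 'reg' value is usable only if it sets no bits outside
 * the mux-mask.
 */
static bool child_reg_fits_mask(uint32_t child_reg, uint32_t mux_mask)
{
	return (child_reg & ~mux_mask) == 0;
}

/* Return the index of the first child whose 'reg' violates the mask,
 * or -1 if every child fits.
 */
static int first_bad_child(const uint32_t *regs, size_t n, uint32_t mux_mask)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!child_reg_fits_mask(regs[i], mux_mask))
			return (int)i;
	}
	return -1;
}
```

For the example node, mux-mask is 0x07 and the three slots use reg values 0x00, 0x01 and 0x02, so all of them pass; a child with reg = <0x08> would be rejected because bit 3 lies outside the mask.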
