Re: [PATCH net-next 00/11] net: sched: cls_u32 Various improvements
From: Al Viro
Date: Mon, 8 Oct 2018 06:45:15 +0100

> Er... Both are due to missing in the very beginning of the series (well, on
> top of "net: sched: cls_u32: fix hnode refcounting") commit

All dependencies like this must be explicitly stated.

And in such situations you actually should wait for the dependency to get into 'net', eventually get merged into 'net-next', and then you can submit the stuff that depends upon it.

Not the way this was done.
Re: [PATCH net-next 00/11] net: sched: cls_u32 Various improvements
On Sun, Oct 07, 2018 at 09:25:01PM -0700, David Miller wrote: > From: Jamal Hadi Salim > Date: Sun, 7 Oct 2018 12:38:00 -0400 > > > Various improvements from Al. > > Please submit changes that actually are compile tested: > > CC [M] net/sched/cls_u32.o > net/sched/cls_u32.c: In function ‘u32_delete’: > net/sched/cls_u32.c:674:6: error: ‘root_ht’ undeclared (first use in this > function); did you mean ‘root_user’? > if (root_ht == ht) { > ^~~ > root_user > net/sched/cls_u32.c:674:6: note: each undeclared identifier is reported only > once for each function it appears in > net/sched/cls_u32.c: In function ‘u32_set_parms’: > net/sched/cls_u32.c:746:15: error: ‘struct tc_u_hnode’ has no member named > ‘is_root’ > if (ht_down->is_root) { >^~ Er... Both are due to missing in the very beginning of the series (well, on top of "net: sched: cls_u32: fix hnode refcounting") commit Author: Al Viro Date: Mon Sep 3 14:39:02 2018 -0400 net: sched: cls_u32: mark root hnode explicitly ... and produce consistent error on attempt to delete such. Existing check in u32_delete() is inconsistent - after tc qdisc add dev eth0 ingress tc filter add dev eth0 parent : protocol ip prio 100 handle 1: u32 divisor 1 tc filter add dev eth0 parent : protocol ip prio 200 handle 2: u32 divisor 1 both tc filter delete dev eth0 parent : protocol ip prio 100 handle 801: u32 and tc filter delete dev eth0 parent : protocol ip prio 100 handle 800: u32 will fail (at least with refcounting fixes), but the former will complain about an attempt to remove a busy table, while the latter will recognize it as root and yield "Not allowed to delete root node" instead. The problem with the existing check is that several tcf_proto instances might share the same tp->data and handle-to-hnode lookup will be the same for all of them. So comparing an hnode to be deleted with tp->root won't catch the case when one tp is used to try deleting the root of another. 
Solution is trivial - mark the root hnodes explicitly upon allocation and check for that. Signed-off-by: Al Viro diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index b2c3406a2cf2..c4782aa808c7 100644 --- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -84,6 +84,7 @@ struct tc_u_hnode { int refcnt; unsigned intdivisor; struct idr handle_idr; + boolis_root; struct rcu_head rcu; u32 flags; /* The 'ht' field MUST be the last field in structure to allow for @@ -377,6 +378,7 @@ static int u32_init(struct tcf_proto *tp) root_ht->refcnt++; root_ht->handle = tp_c ? gen_new_htid(tp_c, root_ht) : 0x8000; root_ht->prio = tp->prio; + root_ht->is_root = true; idr_init(_ht->handle_idr); if (tp_c == NULL) { @@ -693,7 +695,7 @@ static int u32_delete(struct tcf_proto *tp, void *arg, bool *last, goto out; } - if (root_ht == ht) { + if (ht->is_root) { NL_SET_ERR_MSG_MOD(extack, "Not allowed to delete root node"); return -EINVAL; }
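To illustrate the commit message above, here is a minimal userspace sketch of why a pointer comparison against the caller's own root misses another instance's root, while an explicit flag does not. The names `hnode`, `proto`, `delete_by_pointer_compare` and `delete_by_flag` are hypothetical simplifications for this note, not the kernel structures or functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for tc_u_hnode and tcf_proto. */
struct hnode {
	unsigned int handle;
	bool is_root;		/* set once, when the node is created as a root */
};

struct proto {
	struct hnode *root;	/* this instance's own root table */
};

/*
 * Old-style check: only recognises the root of the calling instance.
 * Returns -EINVAL if the node is recognised as a root, 0 if the caller
 * would go on to the remaining (busy/refcount) checks.
 */
static int delete_by_pointer_compare(const struct proto *tp,
				     const struct hnode *ht)
{
	assert(tp && ht);
	if (tp->root == ht)
		return -EINVAL;	/* "Not allowed to delete root node" */
	return 0;
}

/* New-style check: any root is recognised, whichever instance asks. */
static int delete_by_flag(const struct proto *tp, const struct hnode *ht)
{
	assert(tp && ht);
	if (ht->is_root)
		return -EINVAL;
	return 0;
}
```

With two instances whose handle lookup is shared, asking the first instance to delete the second one's root slips past the pointer comparison but is caught by the flag.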
Re: [PATCH V2 net-next 2/5] net: Introduce a new MII time stamping interface.
On Sun, Oct 07, 2018 at 02:20:33PM -0700, Richard Cochran wrote:
> On Sun, Oct 07, 2018 at 11:14:05PM +0200, Andrew Lunn wrote:
> > I'm currently thinking register_mii_timestamper() should take a netdev
> > argument, and the net_device structure should gain a struct
> > mii_timestamper.

We are going round in circles on this point. V1 had it this way, but nobody liked it. You specifically asked to move the new pointer out of the netdev and into phydev.

> > But we have to look at the lifetime problems. A phydev does not know
> > what netdev it is associated to until phy_connect() is called. It is
> > at that point you can call register_mii_timestamper().

I had used a netdev notifier on NETDEV_UP for this, but Florian seemed to suggest using phy_{connect,attach,disconnect} instead.

Thanks,
Richard
Re: [PATCH] net: vhost: remove bad code line
From: xiangxia.m@gmail.com
Date: Sun, 7 Oct 2018 18:41:50 -0700

> From: Tonghao Zhang
>
> Signed-off-by: Tonghao Zhang

Applied.
Re: [PATCH net-next 00/11] net: sched: cls_u32 Various improvements
From: Jamal Hadi Salim
Date: Sun, 7 Oct 2018 12:38:00 -0400

> Various improvements from Al.

Please submit changes that actually are compile tested:

  CC [M]  net/sched/cls_u32.o
net/sched/cls_u32.c: In function ‘u32_delete’:
net/sched/cls_u32.c:674:6: error: ‘root_ht’ undeclared (first use in this function); did you mean ‘root_user’?
  if (root_ht == ht) {
      ^~~
      root_user
net/sched/cls_u32.c:674:6: note: each undeclared identifier is reported only once for each function it appears in
net/sched/cls_u32.c: In function ‘u32_set_parms’:
net/sched/cls_u32.c:746:15: error: ‘struct tc_u_hnode’ has no member named ‘is_root’
   if (ht_down->is_root) {
               ^~
Re: [PATCH v8 05/15] octeontx2-af: Add mailbox IRQ and msg handlers
From: sunil.kovv...@gmail.com
Date: Sun, 7 Oct 2018 20:29:14 +0530

> + req_hdr = (struct mbox_hdr *)(mdev->mbase + mbox->rx_start);

I commented in the previous patch series version, for patch #4, that such casts are unnecessary and should be removed.

You are in for a very long review process if you only consider feedback given to you for the specific patch for which it is given, rather than your entire series.

Please put forth the effort to fix the problems pointed out to you in your entire series.

Thank you.
Re: [PATCH net 1/1] net: sched: cls_u32: fix hnode refcounting
From: Jamal Hadi Salim
Date: Sun, 7 Oct 2018 07:40:17 -0400

> From: Al Viro
>
> cls_u32.c misuses refcounts for struct tc_u_hnode - it counts references
> via ->hlist and via ->tp_root together. u32_destroy() drops the former
> and, in case when there had been links, leaves the sucker on the list.
> As the result, there's nothing to protect it from getting freed once links
> are dropped.
> That also makes the "is it busy" check incapable of catching the root
> hnode - it *is* busy (there's a reference from tp), but we don't see it as
> something separate. "Is it our root?" check partially covers that, but
> the problem exists for others' roots as well.
>
> AFAICS, the minimal fix preserving the existing behaviour (where it doesn't
> include oopsen, that is) would be this:
>   * count tp->root and tp_c->hlist as separate references. I.e.
>     have u32_init() set refcount to 2, not 1.
>   * in u32_destroy() we always drop the former;
>     in u32_destroy_hnode() - the latter.
>
> That way we have *all* references contributing to refcount. List
> removal happens in u32_destroy_hnode() (called only when ->refcnt is 1)
> and in u32_destroy() in case of tc_u_common going away, along with
> everything reachable from it. IOW, that way we know that
> u32_destroy_key() won't free something still on the list (or pointed to by
> someone's ->root).
>
> Reproducer:
 ...
> Signed-off-by: Al Viro
> Signed-off-by: Jamal Hadi Salim

Applied and queued up for -stable.
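The two-reference scheme described in the quoted commit message can be modelled in a few lines of userspace C. This is only a toy model under stated assumptions: `hnode`, `model_init`, `model_destroy` and `model_destroy_hnode` are hypothetical names, the `freed` flag stands in for the real free, and the link handling and the tc_u_common teardown path are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a root hnode; not the kernel structure. */
struct hnode {
	int refcnt;
	bool on_list;	/* still linked on the shared list */
	bool freed;	/* stands in for the actual free in this model */
};

static void hnode_put(struct hnode *ht)
{
	assert(ht->refcnt > 0);
	if (--ht->refcnt == 0)
		ht->freed = true;
}

/* init: one reference for tp->root, one for the shared list. */
static void model_init(struct hnode *ht)
{
	ht->refcnt = 2;
	ht->on_list = true;
	ht->freed = false;
}

/* Destroying the owning instance always drops the tp->root reference. */
static void model_destroy(struct hnode *ht)
{
	hnode_put(ht);
}

/*
 * Removing the hnode itself: only allowed when the list reference is
 * the last one, so nothing can be freed while still on the list.
 */
static bool model_destroy_hnode(struct hnode *ht)
{
	if (ht->refcnt != 1)
		return false;	/* busy: someone else still holds a reference */
	ht->on_list = false;
	hnode_put(ht);
	return true;
}
```

In this model a root is reported busy for as long as its owner exists, and it is never marked freed while `on_list` is still true.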
[PATCH v2 net-next 17/23] net/namespace: Update rtnl_net_dumpid for strict data checking
From: David Ahern Update rtnl_net_dumpid for strict data checking. If the flag is set, the dump request is expected to have an rtgenmsg struct as the header which has the family as the only element. No data may be appended. Signed-off-by: David Ahern --- net/core/net_namespace.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index 670c84b1bfc2..fefe72774aeb 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -853,6 +853,12 @@ static int rtnl_net_dumpid(struct sk_buff *skb, struct netlink_callback *cb) .s_idx = cb->args[0], }; + if (cb->strict_check && + nlmsg_attrlen(cb->nlh, sizeof(struct rtgenmsg))) { + NL_SET_ERR_MSG(cb->extack, "Unknown data in network namespace id dump request"); + return -EINVAL; + } + spin_lock_bh(>nsid_lock); idr_for_each(>netns_ids, rtnl_net_dumpid_one, _cb); spin_unlock_bh(>nsid_lock); -- 2.11.0
[PATCH v2 net-next 06/23] netlink: Add new socket option to enable strict checking on dumps
From: David Ahern Add a new socket option, NETLINK_DUMP_STRICT_CHK, that userspace can use via setsockopt to request strict checking of headers and attributes on dump requests. To get dump features such as kernel side filtering based on data in the header or attributes appended to the dump request, userspace must call setsockopt() for NETLINK_DUMP_STRICT_CHK and a non-zero value. Since the netlink sock and its flags are private to the af_netlink code, the strict checking flag is passed to dump handlers via a flag in the netlink_callback struct. For old userspace on new kernel there is no impact as all of the data checks in later patches are wrapped in a check on the new strict flag. For new userspace on old kernel, the setsockopt will fail and even if new userspace sets data in the headers and appended attributes the kernel will silently ignore it. Moving forward when the setsockopt succeeds, the new userspace on old kernel means the dump request can pass an attribute the kernel does not understand. The dump will then fail as the older kernel does not understand it. New userspace on new kernel setting the socket option gets the benefit of the improved data dump. Kernel side the NETLINK_DUMP_STRICT_CHK uapi is converted to a generic NETLINK_F_STRICT_CHK flag which can potentially be leveraged for tighter checking on the NEW, DEL, and SET commands. 
Signed-off-by: David Ahern --- include/linux/netlink.h | 1 + include/uapi/linux/netlink.h | 1 + net/netlink/af_netlink.c | 21 - net/netlink/af_netlink.h | 1 + 4 files changed, 23 insertions(+), 1 deletion(-) diff --git a/include/linux/netlink.h b/include/linux/netlink.h index 88c8a2d83eb3..72580f1a72a2 100644 --- a/include/linux/netlink.h +++ b/include/linux/netlink.h @@ -179,6 +179,7 @@ struct netlink_callback { struct netlink_ext_ack *extack; u16 family; u16 min_dump_alloc; + boolstrict_check; unsigned intprev_seq, seq; longargs[6]; }; diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h index 776bc92e9118..486ed1f0c0bc 100644 --- a/include/uapi/linux/netlink.h +++ b/include/uapi/linux/netlink.h @@ -155,6 +155,7 @@ enum nlmsgerr_attrs { #define NETLINK_LIST_MEMBERSHIPS 9 #define NETLINK_CAP_ACK10 #define NETLINK_EXT_ACK11 +#define NETLINK_DUMP_STRICT_CHK12 struct nl_pktinfo { __u32 group; diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 7ac585f33a9e..e613a9f89600 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -1706,6 +1706,13 @@ static int netlink_setsockopt(struct socket *sock, int level, int optname, nlk->flags &= ~NETLINK_F_EXT_ACK; err = 0; break; + case NETLINK_DUMP_STRICT_CHK: + if (val) + nlk->flags |= NETLINK_F_STRICT_CHK; + else + nlk->flags &= ~NETLINK_F_STRICT_CHK; + err = 0; + break; default: err = -ENOPROTOOPT; } @@ -1799,6 +1806,15 @@ static int netlink_getsockopt(struct socket *sock, int level, int optname, return -EFAULT; err = 0; break; + case NETLINK_DUMP_STRICT_CHK: + if (len < sizeof(int)) + return -EINVAL; + len = sizeof(int); + val = nlk->flags & NETLINK_F_STRICT_CHK ? 
1 : 0; + if (put_user(len, optlen) || put_user(val, optval)) + return -EFAULT; + err = 0; + break; default: err = -ENOPROTOOPT; } @@ -2282,9 +2298,9 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb, const struct nlmsghdr *nlh, struct netlink_dump_control *control) { + struct netlink_sock *nlk, *nlk2; struct netlink_callback *cb; struct sock *sk; - struct netlink_sock *nlk; int ret; refcount_inc(>users); @@ -2318,6 +2334,9 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb, cb->min_dump_alloc = control->min_dump_alloc; cb->skb = skb; + nlk2 = nlk_sk(NETLINK_CB(skb).sk); + cb->strict_check = !!(nlk2->flags & NETLINK_F_STRICT_CHK); + if (control->start) { ret = control->start(cb); if (ret) diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h index 962de7b3c023..5f454c8de6a4 100644 --- a/net/netlink/af_netlink.h +++ b/net/netlink/af_netlink.h @@ -15,6 +15,7 @@ #define NETLINK_F_LISTEN_ALL_NSID 0x10 #define NETLINK_F_CAP_ACK 0x20 #define NETLINK_F_EXT_ACK 0x40 +#define NETLINK_F_STRICT_CHK
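From the userspace side, opting in could look roughly like the sketch below. It assumes the option number 12 proposed in this patch (defined locally if the installed headers do not have it yet); `enable_strict_dump_check` and `try_strict_dump_check` are hypothetical helper names. As the commit message notes, the setsockopt is expected to fail on kernels without the option, so the caller has to be prepared to fall back.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif
#ifndef NETLINK_DUMP_STRICT_CHK
#define NETLINK_DUMP_STRICT_CHK 12	/* value proposed in this patch */
#endif

/* Returns 0 if strict checking was enabled, or a negative errno. */
static int enable_strict_dump_check(int fd)
{
	int one = 1;

	assert(fd >= 0);
	if (setsockopt(fd, SOL_NETLINK, NETLINK_DUMP_STRICT_CHK,
		       &one, sizeof(one)) < 0)
		return -errno;
	return 0;
}

/*
 * Returns 1 if the kernel accepted the option, 0 if it refused it
 * (for example an older kernel), or a negative errno if no netlink
 * socket could be opened at all.
 */
static int try_strict_dump_check(void)
{
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
	int err;

	if (fd < 0)
		return -errno;
	err = enable_strict_dump_check(fd);
	close(fd);
	return err == 0 ? 1 : 0;
}
```

A caller that gets 0 back should keep sending old-style dump requests and filter in userspace.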
[PATCH v2 net-next 08/23] net/ipv6: Update inet6_dump_addr for strict data checking
From: David Ahern Update inet6_dump_addr for strict data checking. If the flag is set, the dump request is expected to have an ifaddrmsg struct as the header potentially followed by one or more attributes. Any data passed in the header or as an attribute is taken as a request to influence the data returned. Only values supported by the dump handler are allowed to be non-0 or set in the request. At the moment only the IFA_TARGET_NETNSID attribute is supported. Follow on patches can add support for other fields (e.g., honor ifa_index and only return data for the given device index). Signed-off-by: David Ahern --- net/ipv6/addrconf.c | 69 + 1 file changed, 59 insertions(+), 10 deletions(-) diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index afa279170ba5..095d3f56f0a9 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -4998,9 +4998,62 @@ static int in6_dump_addrs(struct inet6_dev *idev, struct sk_buff *skb, return err; } +static int inet6_valid_dump_ifaddr_req(const struct nlmsghdr *nlh, + struct inet6_fill_args *fillargs, + struct net **tgt_net, struct sock *sk, + struct netlink_ext_ack *extack) +{ + struct nlattr *tb[IFA_MAX+1]; + struct ifaddrmsg *ifm; + int err, i; + + if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ifm))) { + NL_SET_ERR_MSG_MOD(extack, "Invalid header for address dump request"); + return -EINVAL; + } + + ifm = nlmsg_data(nlh); + if (ifm->ifa_prefixlen || ifm->ifa_flags || ifm->ifa_scope) { + NL_SET_ERR_MSG_MOD(extack, "Invalid values in header for address dump request"); + return -EINVAL; + } + if (ifm->ifa_index) { + NL_SET_ERR_MSG_MOD(extack, "Filter by device index not supported for address dump"); + return -EINVAL; + } + + err = nlmsg_parse_strict(nlh, sizeof(*ifm), tb, IFA_MAX, +ifa_ipv6_policy, extack); + if (err < 0) + return err; + + for (i = 0; i <= IFA_MAX; ++i) { + if (!tb[i]) + continue; + + if (i == IFA_TARGET_NETNSID) { + struct net *net; + + fillargs->netnsid = nla_get_s32(tb[i]); + net = rtnl_get_net_ns_capable(sk, 
fillargs->netnsid); + if (IS_ERR(net)) { + NL_SET_ERR_MSG_MOD(extack, "Invalid target network namespace id"); + return PTR_ERR(net); + } + *tgt_net = net; + } else { + NL_SET_ERR_MSG_MOD(extack, "Unsupported attribute in dump request"); + return -EINVAL; + } + } + + return 0; +} + static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb, enum addr_type_t type) { + const struct nlmsghdr *nlh = cb->nlh; struct inet6_fill_args fillargs = { .portid = NETLINK_CB(cb->skb).portid, .seq = cb->nlh->nlmsg_seq, @@ -5009,7 +5062,6 @@ static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb, .type = type, }; struct net *net = sock_net(skb->sk); - struct nlattr *tb[IFA_MAX+1]; struct net *tgt_net = net; int h, s_h; int idx, ip_idx; @@ -5022,16 +5074,13 @@ static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb, s_idx = idx = cb->args[1]; s_ip_idx = ip_idx = cb->args[2]; - if (nlmsg_parse(cb->nlh, sizeof(struct ifaddrmsg), tb, IFA_MAX, - ifa_ipv6_policy, cb->extack) >= 0) { - if (tb[IFA_TARGET_NETNSID]) { - fillargs.netnsid = nla_get_s32(tb[IFA_TARGET_NETNSID]); + if (cb->strict_check) { + int err; - tgt_net = rtnl_get_net_ns_capable(skb->sk, - fillargs.netnsid); - if (IS_ERR(tgt_net)) - return PTR_ERR(tgt_net); - } + err = inet6_valid_dump_ifaddr_req(nlh, , _net, + skb->sk, cb->extack); + if (err < 0) + return err; } rcu_read_lock(); -- 2.11.0
[PATCH v2 net-next 13/23] rtnetlink: Update ipmr_rtm_dumplink for strict data checking
From: David Ahern Update ipmr_rtm_dumplink for strict data checking. If the flag is set, the dump request is expected to have an ifinfomsg struct as the header. All elements of the struct are expected to be 0 and no attributes can be appended. Signed-off-by: David Ahern --- net/ipv4/ipmr.c | 32 1 file changed, 32 insertions(+) diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c index 5660adcf7a04..e7322e407bb4 100644 --- a/net/ipv4/ipmr.c +++ b/net/ipv4/ipmr.c @@ -2710,6 +2710,31 @@ static bool ipmr_fill_vif(struct mr_table *mrt, u32 vifid, struct sk_buff *skb) return true; } +static int ipmr_valid_dumplink(const struct nlmsghdr *nlh, + struct netlink_ext_ack *extack) +{ + struct ifinfomsg *ifm; + + if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ifm))) { + NL_SET_ERR_MSG(extack, "ipv4: Invalid header for ipmr link dump"); + return -EINVAL; + } + + if (nlmsg_attrlen(nlh, sizeof(*ifm))) { + NL_SET_ERR_MSG(extack, "Invalid data after header in ipmr link dump"); + return -EINVAL; + } + + ifm = nlmsg_data(nlh); + if (ifm->__ifi_pad || ifm->ifi_type || ifm->ifi_flags || + ifm->ifi_change || ifm->ifi_index) { + NL_SET_ERR_MSG(extack, "Invalid values in header for ipmr link dump request"); + return -EINVAL; + } + + return 0; +} + static int ipmr_rtm_dumplink(struct sk_buff *skb, struct netlink_callback *cb) { struct net *net = sock_net(skb->sk); @@ -2718,6 +2743,13 @@ static int ipmr_rtm_dumplink(struct sk_buff *skb, struct netlink_callback *cb) unsigned int e = 0, s_e; struct mr_table *mrt; + if (cb->strict_check) { + int err = ipmr_valid_dumplink(cb->nlh, cb->extack); + + if (err < 0) + return err; + } + s_t = cb->args[0]; s_e = cb->args[1]; -- 2.11.0
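The "header only, all fields zero, nothing appended" rule used here can be tried out in userspace with the uapi headers. The following is a sketch, not the kernel helper: `valid_dumplink_req` is a hypothetical name, and it uses the `NLMSG_LENGTH`/`NLMSG_SPACE` macros from `<linux/netlink.h>` in place of the in-kernel `nlmsg_msg_size()` and `nlmsg_attrlen()`.

```c
#include <assert.h>
#include <errno.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Returns 0 if the request is a bare, zeroed ifinfomsg; -EINVAL otherwise. */
static int valid_dumplink_req(const struct nlmsghdr *nlh)
{
	const struct ifinfomsg *ifm;

	assert(nlh);

	/* Too short to hold the expected header at all. */
	if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(*ifm)))
		return -EINVAL;

	/* Anything beyond the aligned header would be appended attributes. */
	if (nlh->nlmsg_len > NLMSG_SPACE(sizeof(*ifm)))
		return -EINVAL;

	/* Every field other than the family must be zero. */
	ifm = (const struct ifinfomsg *)((const char *)nlh + NLMSG_HDRLEN);
	if (ifm->__ifi_pad || ifm->ifi_type || ifm->ifi_flags ||
	    ifm->ifi_change || ifm->ifi_index)
		return -EINVAL;

	return 0;
}
```

The order of the checks follows the patch: length first, then trailing data, then the header values, so the field reads never touch bytes the message does not claim to contain.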
[PATCH v2 net-next 01/23] netlink: Pass extack to dump handlers
From: David Ahern Declare extack in netlink_dump and pass to dump handlers via netlink_callback. Add any extack message after the dump_done_errno allowing error messages to be returned. This will be useful when strict checking is done on dump requests, returning why the dump fails EINVAL. Signed-off-by: David Ahern Acked-by: Christian Brauner --- include/linux/netlink.h | 1 + net/netlink/af_netlink.c | 12 +++- 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/include/linux/netlink.h b/include/linux/netlink.h index 71f121b66ca8..88c8a2d83eb3 100644 --- a/include/linux/netlink.h +++ b/include/linux/netlink.h @@ -176,6 +176,7 @@ struct netlink_callback { void*data; /* the module that dump function belong to */ struct module *module; + struct netlink_ext_ack *extack; u16 family; u16 min_dump_alloc; unsigned intprev_seq, seq; diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index e3a0538ec0be..7ac585f33a9e 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -2171,6 +2171,7 @@ EXPORT_SYMBOL(__nlmsg_put); static int netlink_dump(struct sock *sk) { struct netlink_sock *nlk = nlk_sk(sk); + struct netlink_ext_ack extack = {}; struct netlink_callback *cb; struct sk_buff *skb = NULL; struct nlmsghdr *nlh; @@ -,8 +2223,11 @@ static int netlink_dump(struct sock *sk) skb_reserve(skb, skb_tailroom(skb) - alloc_size); netlink_skb_set_owner_r(skb, sk); - if (nlk->dump_done_errno > 0) + if (nlk->dump_done_errno > 0) { + cb->extack = nlk->dump_done_errno = cb->dump(skb, cb); + cb->extack = NULL; + } if (nlk->dump_done_errno > 0 || skb_tailroom(skb) < nlmsg_total_size(sizeof(nlk->dump_done_errno))) { @@ -2246,6 +2250,12 @@ static int netlink_dump(struct sock *sk) memcpy(nlmsg_data(nlh), >dump_done_errno, sizeof(nlk->dump_done_errno)); + if (extack._msg && nlk->flags & NETLINK_F_EXT_ACK) { + nlh->nlmsg_flags |= NLM_F_ACK_TLVS; + if (!nla_put_string(skb, NLMSGERR_ATTR_MSG, extack._msg)) + nlmsg_end(skb, nlh); + } + if (sk_filter(sk, skb)) 
kfree_skb(skb); else -- 2.11.0
[PATCH v2 net-next 04/23] netlink: Add strict version of nlmsg_parse and nla_parse
From: David Ahern nla_parse is currently lenient on message parsing, allowing type to be 0 or greater than max expected and only logging a message "netlink: %d bytes leftover after parsing attributes in process `%s'." if the netlink message has unknown data at the end after parsing. What this could mean is that the header at the front of the attributes is actually wrong and the parsing is shifted from what is expected. Add a new strict version that actually fails with EINVAL if there are any bytes remaining after the parsing loop completes, or if the attribute type is 0 or greater than max expected. Signed-off-by: David Ahern --- include/net/netlink.h | 17 + lib/nlattr.c | 48 2 files changed, 53 insertions(+), 12 deletions(-) diff --git a/include/net/netlink.h b/include/net/netlink.h index 9522a0bf1f3a..f1db8e594847 100644 --- a/include/net/netlink.h +++ b/include/net/netlink.h @@ -373,6 +373,9 @@ int nla_validate(const struct nlattr *head, int len, int maxtype, int nla_parse(struct nlattr **tb, int maxtype, const struct nlattr *head, int len, const struct nla_policy *policy, struct netlink_ext_ack *extack); +int nla_parse_strict(struct nlattr **tb, int maxtype, const struct nlattr *head, +int len, const struct nla_policy *policy, +struct netlink_ext_ack *extack); int nla_policy_len(const struct nla_policy *, int); struct nlattr *nla_find(const struct nlattr *head, int len, int attrtype); size_t nla_strlcpy(char *dst, const struct nlattr *nla, size_t dstsize); @@ -525,6 +528,20 @@ static inline int nlmsg_parse(const struct nlmsghdr *nlh, int hdrlen, nlmsg_attrlen(nlh, hdrlen), policy, extack); } +static inline int nlmsg_parse_strict(const struct nlmsghdr *nlh, int hdrlen, +struct nlattr *tb[], int maxtype, +const struct nla_policy *policy, +struct netlink_ext_ack *extack) +{ + if (nlh->nlmsg_len < nlmsg_msg_size(hdrlen)) { + NL_SET_ERR_MSG(extack, "Invalid header length"); + return -EINVAL; + } + + return nla_parse_strict(tb, maxtype, nlmsg_attrdata(nlh, hdrlen), + 
nlmsg_attrlen(nlh, hdrlen), policy, extack); +} + /** * nlmsg_find_attr - find a specific attribute in a netlink message * @nlh: netlink message header diff --git a/lib/nlattr.c b/lib/nlattr.c index 1e900bb414ef..d26de6156b97 100644 --- a/lib/nlattr.c +++ b/lib/nlattr.c @@ -391,9 +391,10 @@ EXPORT_SYMBOL(nla_policy_len); * * Returns 0 on success or a negative error code. */ -int nla_parse(struct nlattr **tb, int maxtype, const struct nlattr *head, - int len, const struct nla_policy *policy, - struct netlink_ext_ack *extack) +static int __nla_parse(struct nlattr **tb, int maxtype, + const struct nlattr *head, int len, + bool strict, const struct nla_policy *policy, + struct netlink_ext_ack *extack) { const struct nlattr *nla; int rem; @@ -403,27 +404,50 @@ int nla_parse(struct nlattr **tb, int maxtype, const struct nlattr *head, nla_for_each_attr(nla, head, len, rem) { u16 type = nla_type(nla); - if (type > 0 && type <= maxtype) { - if (policy) { - int err = validate_nla(nla, maxtype, policy, - extack); - - if (err < 0) - return err; + if (type == 0 || type > maxtype) { + if (strict) { + NL_SET_ERR_MSG(extack, "Unknown attribute type"); + return -EINVAL; } + continue; + } + if (policy) { + int err = validate_nla(nla, maxtype, policy, extack); - tb[type] = (struct nlattr *)nla; + if (err < 0) + return err; } + + tb[type] = (struct nlattr *)nla; } - if (unlikely(rem > 0)) + if (unlikely(rem > 0)) { pr_warn_ratelimited("netlink: %d bytes leftover after parsing attributes in process `%s'.\n", rem, current->comm); + NL_SET_ERR_MSG(extack, "bytes leftover after parsing attributes"); + if (strict) + return -EINVAL; + } return 0; } + +int nla_parse(struct nlattr **tb, int maxtype, const struct nlattr *head, + int len, const struct nla_policy *policy, + struct netlink_ext_ack *extack) +{ + return
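The difference between the lenient and the strict walk can be seen in a small userspace model. This is a sketch under stated assumptions: `parse_attrs` is a hypothetical function written for this note, it uses only `struct nlattr`, `NLA_ALIGN` and `NLA_TYPE_MASK` from the uapi header, and it leaves out the policy validation and the extack and log messages of the real code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <linux/netlink.h>

/*
 * Walk a buffer of attributes and fill tb[1..maxtype].
 * Lenient mode skips unknown types and ignores trailing bytes;
 * strict mode turns both into -EINVAL.
 * The buffer is assumed to be suitably aligned for struct nlattr.
 */
static int parse_attrs(const struct nlattr **tb, int maxtype,
		       const void *head, int len, bool strict)
{
	const char *p = head;
	int rem = len;

	assert(tb && head && maxtype >= 0);
	memset(tb, 0, sizeof(*tb) * (size_t)(maxtype + 1));

	while (rem >= (int)sizeof(struct nlattr)) {
		const struct nlattr *nla = (const struct nlattr *)p;
		int type = nla->nla_type & NLA_TYPE_MASK;
		int step;

		/* Malformed length: stop, the rest counts as leftover. */
		if (nla->nla_len < sizeof(*nla) || nla->nla_len > rem)
			break;

		if (type == 0 || type > maxtype) {
			if (strict)
				return -EINVAL;	/* unknown attribute type */
		} else {
			tb[type] = nla;
		}

		step = NLA_ALIGN(nla->nla_len);
		p += step;
		rem -= step;
	}

	if (rem > 0 && strict)
		return -EINVAL;			/* bytes left over */

	return 0;
}
```

Feeding the same buffer through both modes shows the behaviour change: an out-of-range type or a few stray bytes at the end are silently tolerated in lenient mode and rejected in strict mode.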
[PATCH v2 net-next 02/23] netlink: Add extack message to nlmsg_parse for invalid header length
From: David Ahern Give a user a reason why EINVAL is returned in nlmsg_parse. Signed-off-by: David Ahern Acked-by: Christian Brauner --- include/net/netlink.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/include/net/netlink.h b/include/net/netlink.h index 589683091f16..9522a0bf1f3a 100644 --- a/include/net/netlink.h +++ b/include/net/netlink.h @@ -516,8 +516,10 @@ static inline int nlmsg_parse(const struct nlmsghdr *nlh, int hdrlen, const struct nla_policy *policy, struct netlink_ext_ack *extack) { - if (nlh->nlmsg_len < nlmsg_msg_size(hdrlen)) + if (nlh->nlmsg_len < nlmsg_msg_size(hdrlen)) { + NL_SET_ERR_MSG(extack, "Invalid header length"); return -EINVAL; + } return nla_parse(tb, maxtype, nlmsg_attrdata(nlh, hdrlen), nlmsg_attrlen(nlh, hdrlen), policy, extack); -- 2.11.0
[PATCH v2 net-next 07/23] net/ipv4: Update inet_dump_ifaddr for strict data checking
From: David Ahern Update inet_dump_ifaddr for strict data checking. If the flag is set, the dump request is expected to have an ifaddrmsg struct as the header potentially followed by one or more attributes. Any data passed in the header or as an attribute is taken as a request to influence the data returned. Only values supported by the dump handler are allowed to be non-0 or set in the request. At the moment only the IFA_TARGET_NETNSID attribute is supported. Follow on patches can add support for other fields (e.g., honor ifa_index and only return data for the given device index). Signed-off-by: David Ahern --- net/ipv4/devinet.c | 72 +- 1 file changed, 61 insertions(+), 11 deletions(-) diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c index ab2b11df5ea4..6f2bbd04e950 100644 --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -1660,17 +1660,70 @@ static int inet_fill_ifaddr(struct sk_buff *skb, struct in_ifaddr *ifa, return -EMSGSIZE; } +static int inet_valid_dump_ifaddr_req(const struct nlmsghdr *nlh, + struct inet_fill_args *fillargs, + struct net **tgt_net, struct sock *sk, + struct netlink_ext_ack *extack) +{ + struct nlattr *tb[IFA_MAX+1]; + struct ifaddrmsg *ifm; + int err, i; + + if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ifm))) { + NL_SET_ERR_MSG(extack, "ipv4: Invalid header for address dump request"); + return -EINVAL; + } + + ifm = nlmsg_data(nlh); + if (ifm->ifa_prefixlen || ifm->ifa_flags || ifm->ifa_scope) { + NL_SET_ERR_MSG(extack, "ipv4: Invalid values in header for address dump request"); + return -EINVAL; + } + if (ifm->ifa_index) { + NL_SET_ERR_MSG(extack, "ipv4: Filter by device index not supported for address dump"); + return -EINVAL; + } + + err = nlmsg_parse_strict(nlh, sizeof(*ifm), tb, IFA_MAX, +ifa_ipv4_policy, extack); + if (err < 0) + return err; + + for (i = 0; i <= IFA_MAX; ++i) { + if (!tb[i]) + continue; + + if (i == IFA_TARGET_NETNSID) { + struct net *net; + + fillargs->netnsid = nla_get_s32(tb[i]); + + net = 
rtnl_get_net_ns_capable(sk, fillargs->netnsid); + if (IS_ERR(net)) { + NL_SET_ERR_MSG(extack, "ipv4: Invalid target network namespace id"); + return PTR_ERR(net); + } + *tgt_net = net; + } else { + NL_SET_ERR_MSG(extack, "ipv4: Unsupported attribute in dump request"); + return -EINVAL; + } + } + + return 0; +} + static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb) { + const struct nlmsghdr *nlh = cb->nlh; struct inet_fill_args fillargs = { .portid = NETLINK_CB(cb->skb).portid, - .seq = cb->nlh->nlmsg_seq, + .seq = nlh->nlmsg_seq, .event = RTM_NEWADDR, .flags = NLM_F_MULTI, .netnsid = -1, }; struct net *net = sock_net(skb->sk); - struct nlattr *tb[IFA_MAX+1]; struct net *tgt_net = net; int h, s_h; int idx, s_idx; @@ -1684,16 +1737,13 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb) s_idx = idx = cb->args[1]; s_ip_idx = ip_idx = cb->args[2]; - if (nlmsg_parse(cb->nlh, sizeof(struct ifaddrmsg), tb, IFA_MAX, - ifa_ipv4_policy, cb->extack) >= 0) { - if (tb[IFA_TARGET_NETNSID]) { - fillargs.netnsid = nla_get_s32(tb[IFA_TARGET_NETNSID]); + if (cb->strict_check) { + int err; - tgt_net = rtnl_get_net_ns_capable(skb->sk, - fillargs.netnsid); - if (IS_ERR(tgt_net)) - return PTR_ERR(tgt_net); - } + err = inet_valid_dump_ifaddr_req(nlh, , _net, +skb->sk, cb->extack); + if (err < 0) + return err; } for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) { -- 2.11.0
[PATCH v2 net-next 21/23] net/bridge: Update br_mdb_dump for strict data checking
From: David Ahern Update br_mdb_dump for strict data checking. If the flag is set, the dump request is expected to have a br_port_msg struct as the header. All elements of the struct are expected to be 0 and no attributes can be appended. Signed-off-by: David Ahern --- net/bridge/br_mdb.c | 30 ++ 1 file changed, 30 insertions(+) diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c index a4a848bf827b..a7ea2d431714 100644 --- a/net/bridge/br_mdb.c +++ b/net/bridge/br_mdb.c @@ -162,6 +162,29 @@ static int br_mdb_fill_info(struct sk_buff *skb, struct netlink_callback *cb, return err; } +static int br_mdb_valid_dump_req(const struct nlmsghdr *nlh, +struct netlink_ext_ack *extack) +{ + struct br_port_msg *bpm; + + if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*bpm))) { + NL_SET_ERR_MSG_MOD(extack, "Invalid header for mdb dump request"); + return -EINVAL; + } + + bpm = nlmsg_data(nlh); + if (bpm->ifindex) { + NL_SET_ERR_MSG_MOD(extack, "Filtering by device index is not supported for mdb dump request"); + return -EINVAL; + } + if (nlmsg_attrlen(nlh, sizeof(*bpm))) { + NL_SET_ERR_MSG(extack, "Invalid data after header in mdb dump request"); + return -EINVAL; + } + + return 0; +} + static int br_mdb_dump(struct sk_buff *skb, struct netlink_callback *cb) { struct net_device *dev; @@ -169,6 +192,13 @@ static int br_mdb_dump(struct sk_buff *skb, struct netlink_callback *cb) struct nlmsghdr *nlh = NULL; int idx = 0, s_idx; + if (cb->strict_check) { + int err = br_mdb_valid_dump_req(cb->nlh, cb->extack); + + if (err < 0) + return err; + } + s_idx = cb->args[0]; rcu_read_lock(); -- 2.11.0
[PATCH v2 net-next 15/23] net/neighbor: Update neigh_dump_info for strict data checking
From: David Ahern Update neigh_dump_info for strict data checking. If the flag is set, the dump request is expected to have an ndmsg struct as the header potentially followed by one or more attributes. Any data passed in the header or as an attribute is taken as a request to influence the data returned. Only values supported by the dump handler are allowed to be non-0 or set in the request. At the moment only the NDA_IFINDEX and NDA_MASTER attributes are supported. Existing code does not fail the dump if nlmsg_parse fails. That behavior is kept for non-strict checking. Signed-off-by: David Ahern --- net/core/neighbour.c | 82 ++-- 1 file changed, 67 insertions(+), 15 deletions(-) diff --git a/net/core/neighbour.c b/net/core/neighbour.c index b06f794bf91e..7c8a3a0ee059 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -2426,11 +2426,73 @@ static int pneigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb, } +static int neigh_valid_dump_req(const struct nlmsghdr *nlh, + bool strict_check, + struct neigh_dump_filter *filter, + struct netlink_ext_ack *extack) +{ + struct nlattr *tb[NDA_MAX + 1]; + int err, i; + + if (strict_check) { + struct ndmsg *ndm; + + if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ndm))) { + NL_SET_ERR_MSG(extack, "Invalid header for neighbor dump request"); + return -EINVAL; + } + + ndm = nlmsg_data(nlh); + if (ndm->ndm_pad1 || ndm->ndm_pad2 || ndm->ndm_ifindex || + ndm->ndm_state || ndm->ndm_flags || ndm->ndm_type) { + NL_SET_ERR_MSG(extack, "Invalid values in header for neighbor dump request"); + return -EINVAL; + } + + err = nlmsg_parse_strict(nlh, sizeof(struct ndmsg), tb, NDA_MAX, +NULL, extack); + } else { + err = nlmsg_parse(nlh, sizeof(struct ndmsg), tb, NDA_MAX, + NULL, extack); + } + if (err < 0) + return err; + + for (i = 0; i <= NDA_MAX; ++i) { + if (!tb[i]) + continue; + + /* all new attributes should require strict_check */ + switch (i) { + case NDA_IFINDEX: + if (nla_len(tb[i]) != sizeof(u32)) { + 
NL_SET_ERR_MSG(extack, "Invalid IFINDEX attribute in neighbor dump request"); + return -EINVAL; + } + filter->dev_idx = nla_get_u32(tb[i]); + break; + case NDA_MASTER: + if (nla_len(tb[i]) != sizeof(u32)) { + NL_SET_ERR_MSG(extack, "Invalid MASTER attribute in neighbor dump request"); + return -EINVAL; + } + filter->master_idx = nla_get_u32(tb[i]); + break; + default: + if (strict_check) { + NL_SET_ERR_MSG(extack, "Unsupported attribute in neighbor dump request"); + return -EINVAL; + } + } + } + + return 0; +} + static int neigh_dump_info(struct sk_buff *skb, struct netlink_callback *cb) { const struct nlmsghdr *nlh = cb->nlh; struct neigh_dump_filter filter = {}; - struct nlattr *tb[NDA_MAX + 1]; struct neigh_table *tbl; int t, family, s_t; int proxy = 0; @@ -2445,20 +2507,10 @@ static int neigh_dump_info(struct sk_buff *skb, struct netlink_callback *cb) ((struct ndmsg *)nlmsg_data(nlh))->ndm_flags == NTF_PROXY) proxy = 1; - err = nlmsg_parse(nlh, sizeof(struct ndmsg), tb, NDA_MAX, NULL, - cb->extack); - if (!err) { - if (tb[NDA_IFINDEX]) { - if (nla_len(tb[NDA_IFINDEX]) != sizeof(u32)) - return -EINVAL; - filter.dev_idx = nla_get_u32(tb[NDA_IFINDEX]); - } - if (tb[NDA_MASTER]) { - if (nla_len(tb[NDA_MASTER]) != sizeof(u32)) - return -EINVAL; - filter.master_idx = nla_get_u32(tb[NDA_MASTER]); - } - } + err = neigh_valid_dump_req(nlh, cb->strict_check, , cb->extack); + if (err < 0 && cb->strict_check) + return err; + s_t = cb->args[0]; for (t = 0; t < NEIGH_NR_TABLES; t++) { -- 2.11.0
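The per-attribute handling in neigh_valid_dump_req() can be modelled in userspace as follows. This is a sketch, assuming the attribute table has already been filled by a parser: `dump_filter`, `read_u32_attr` and `apply_neigh_filter` are hypothetical names, and only the uapi constants `NDA_IFINDEX`, `NDA_MASTER` and `NDA_MAX` from `<linux/neighbour.h>` are taken from the real headers. It models the helper only; in the patch the caller ignores the helper's error unless strict checking is on.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/neighbour.h>

/* Hypothetical stand-in for struct neigh_dump_filter. */
struct dump_filter {
	uint32_t dev_idx;
	uint32_t master_idx;
};

/* Accept only attributes whose payload is exactly one u32. */
static int read_u32_attr(const struct nlattr *nla, uint32_t *out)
{
	if (nla->nla_len != NLA_HDRLEN + sizeof(uint32_t))
		return -EINVAL;
	memcpy(out, (const char *)nla + NLA_HDRLEN, sizeof(*out));
	return 0;
}

/*
 * Known filter attributes are validated in both modes; anything else
 * is rejected only when strict checking was requested.
 */
static int apply_neigh_filter(const struct nlattr **tb, bool strict,
			      struct dump_filter *filter)
{
	int i;

	assert(tb && filter);
	for (i = 0; i <= NDA_MAX; i++) {
		if (!tb[i])
			continue;

		switch (i) {
		case NDA_IFINDEX:
			if (read_u32_attr(tb[i], &filter->dev_idx) < 0)
				return -EINVAL;
			break;
		case NDA_MASTER:
			if (read_u32_attr(tb[i], &filter->master_idx) < 0)
				return -EINVAL;
			break;
		default:
			if (strict)
				return -EINVAL;	/* unsupported attribute */
		}
	}
	return 0;
}
```

Keeping the `default:` branch conditional on the strict flag is what lets old userspace keep sending attributes the handler does not look at.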
[PATCH v2 net-next 00/23] rtnetlink: Add support for rigid checking of data in dump request
From: David Ahern

There are many use cases where a user wants to influence what is returned
in a dump for some rtnetlink command: one is wanting data for a different
namespace than the one in which the request is received, and another is
limiting the amount of data returned in the dump to a specific set of
interest to userspace, reducing the CPU overhead of both kernel and
userspace.

Unfortunately, the kernel has historically not been strict about checking
for the proper header or checking the values passed in the header. This
lenient implementation has allowed iproute2 and other packages to pass any
struct or data in the dump request as long as the family is the first byte.
For example, the ifinfomsg struct is used by iproute2 for all generic dump
requests - links, addresses, routes and rules - when it is really only
valid for link requests.

There is one example where the kernel deals with the wrong struct: link
dumps after VF support was added. Older iproute2 was sending rtgenmsg as
the header instead of ifinfomsg, so a patch was added to try to detect old
userspace vs new:

    e5eca6d41f53 ("rtnetlink: fix userspace API breakage for iproute2 < v3.9.0")

The latest example is Christian's patch set wanting to return addresses for
a target namespace. It guesses the header struct is an ifaddrmsg, and if it
guesses wrong a netlink warning is generated in the kernel log on every
address dump, which is unacceptable.

Another example where the kernel is a bit lenient is route dumps: iproute2
can send a request with either an ifinfomsg or an rtmsg as the header
struct, yet the kernel always treats the header as an rtmsg (see
inet_dump_fib and the rtm_flags check). The header inconsistency impacts
the ability to add kernel side filters for route dumps - a necessary
feature for scale setups with 100k+ routes.

How can the kernel avoid breaking old userspace yet still move forward with
new features, such as kernel side filtering, that are crucial for efficient
operation at high scale?
This patch set addresses the problem by adding a new socket flag,
NETLINK_DUMP_STRICT_CHK, that userspace can use with setsockopt to
request strict checking of headers and attributes on dump requests and
hence unlock the ability to use kernel side filters as they are added.

Kernel side, the dump handlers are updated to verify the message contains
at least the expected header struct:

    RTM_GETLINK:      ifinfomsg
    RTM_GETADDR:      ifaddrmsg
    RTM_GETMULTICAST: ifaddrmsg
    RTM_GETANYCAST:   ifaddrmsg
    RTM_GETADDRLABEL: ifaddrlblmsg
    RTM_GETROUTE:     rtmsg
    RTM_GETSTATS:     if_stats_msg
    RTM_GETNEIGH:     ndmsg
    RTM_GETNEIGHTBL:  ndtmsg
    RTM_GETNSID:      rtgenmsg
    RTM_GETRULE:      fib_rule_hdr
    RTM_GETNETCONF:   netconfmsg
    RTM_GETMDB:       br_port_msg

And then every field in the header struct should be 0 with the exception
of the family. There are a few exceptions to this rule where the kernel
already influences the data returned by values in the struct. Next the
message should not contain attributes unless the kernel implements
filtering for it. Any unexpected data causes the dump to fail with EINVAL.
If the new flag is honored by the kernel and the dump contents adjusted
by any data passed in the request, the dump handler can set the
NLM_F_DUMP_FILTERED flag in the netlink message header.

For old userspace on new kernel there is no impact as all checks are
wrapped in a check on the new strict flag. For new userspace on old
kernel, the data in the headers and any appended attributes are
silently ignored though the setsockopt failing is the clue to userspace
the feature is not supported. New userspace on new kernel gets the
requested data dump.

iproute2 patches can be found here:
    https://github.com/dsahern/iproute2 dump-enhancements

Major changes since v1
- inner header is supposed to be 4-bytes aligned. So for dumps that
  should not have attributes appended changed the check to use:
      if (nlmsg_attrlen(nlh, sizeof(hdr)))
  Only impacts patches with headers that are not multiples of 4-bytes
  (rtgenmsg, netconfmsg), but applied the change to all patches not
  calling nlmsg_parse for consistency.

- Added nlmsg_parse_strict and nla_parse_strict for tighter control on
  attribute parsing. There should be no unknown attribute types or extra
  bytes.

- Moved validation to a helper in most cases

Changes since rfc-v2
- dropped the NLM_F_DUMP_FILTERED flag from target nsid dumps per
  Jiri's objections
- changed the opt-in uapi from a netlink message flag to a socket
  flag. setsockopt provides an api for userspace to definitively
  know if the kernel supports strict checking on dumps.
- re-ordered patches to peel off the extack on dumps if needed to
  keep this set size within limits
- misc cleanups in patches based on testing

David Ahern (23):
  netlink: Pass extack to dump handlers
  netlink: Add extack message to nlmsg_parse for invalid header length
  net: Add extack to
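For the userspace side of the opt-in described in the cover letter, a minimal sketch might look like the following. The option name is the one proposed in this series. The fallback value 12 is an assumption made here for toolchains whose headers do not carry the definition, and request_strict_dump_checks is a hypothetical helper name.

```c
#include <errno.h>
#include <sys/socket.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif

/* Assumption: name as proposed in this series; the numeric fallback is a
 * guess for headers that predate the option and may not match a released
 * kernel.
 */
#ifndef NETLINK_DUMP_STRICT_CHK
#define NETLINK_DUMP_STRICT_CHK 12
#endif

/* Ask the kernel to strictly validate dump requests sent on this socket.
 * Returns 0 on success or a negative errno value on failure.
 */
static int request_strict_dump_checks(int fd)
{
	int one = 1;

	if (setsockopt(fd, SOL_NETLINK, NETLINK_DUMP_STRICT_CHK,
		       &one, sizeof(one)) < 0)
		return -errno;

	return 0;
}
```

A caller would open a NETLINK_ROUTE socket, call the helper once, and treat a failure as the signal that kernel side filtering is unavailable, so any filtering has to stay in userspace.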
[PATCH v2 net-next 20/23] net: Update netconf dump handlers for strict data checking
From: David Ahern Update inet_netconf_dump_devconf, inet6_netconf_dump_devconf, and mpls_netconf_dump_devconf for strict data checking. If the flag is set, the dump request is expected to have an netconfmsg struct as the header. The struct only has the family member and no attributes can be appended. Signed-off-by: David Ahern --- net/ipv4/devinet.c | 22 +++--- net/ipv6/addrconf.c | 22 +++--- net/mpls/af_mpls.c | 18 +- 3 files changed, 55 insertions(+), 7 deletions(-) diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c index 6f2bbd04e950..d122ebbe5980 100644 --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -2086,6 +2086,7 @@ static int inet_netconf_get_devconf(struct sk_buff *in_skb, static int inet_netconf_dump_devconf(struct sk_buff *skb, struct netlink_callback *cb) { + const struct nlmsghdr *nlh = cb->nlh; struct net *net = sock_net(skb->sk); int h, s_h; int idx, s_idx; @@ -2093,6 +2094,21 @@ static int inet_netconf_dump_devconf(struct sk_buff *skb, struct in_device *in_dev; struct hlist_head *head; + if (cb->strict_check) { + struct netlink_ext_ack *extack = cb->extack; + struct netconfmsg *ncm; + + if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ncm))) { + NL_SET_ERR_MSG(extack, "ipv4: Invalid header for netconf dump request"); + return -EINVAL; + } + + if (nlmsg_attrlen(nlh, sizeof(*ncm))) { + NL_SET_ERR_MSG(extack, "ipv4: Invalid data after header in netconf dump request"); + return -EINVAL; + } + } + s_h = cb->args[0]; s_idx = idx = cb->args[1]; @@ -2112,7 +2128,7 @@ static int inet_netconf_dump_devconf(struct sk_buff *skb, if (inet_netconf_fill_devconf(skb, dev->ifindex, _dev->cnf, NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, + nlh->nlmsg_seq, RTM_NEWNETCONF, NLM_F_MULTI, NETCONFA_ALL) < 0) { @@ -2129,7 +2145,7 @@ static int inet_netconf_dump_devconf(struct sk_buff *skb, if (inet_netconf_fill_devconf(skb, NETCONFA_IFINDEX_ALL, net->ipv4.devconf_all, NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, + nlh->nlmsg_seq, RTM_NEWNETCONF, NLM_F_MULTI, 
NETCONFA_ALL) < 0) goto done; @@ -2140,7 +2156,7 @@ static int inet_netconf_dump_devconf(struct sk_buff *skb, if (inet_netconf_fill_devconf(skb, NETCONFA_IFINDEX_DEFAULT, net->ipv4.devconf_dflt, NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, + nlh->nlmsg_seq, RTM_NEWNETCONF, NLM_F_MULTI, NETCONFA_ALL) < 0) goto done; diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index ce071d85ad00..2496b12bf721 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -666,6 +666,7 @@ static int inet6_netconf_get_devconf(struct sk_buff *in_skb, static int inet6_netconf_dump_devconf(struct sk_buff *skb, struct netlink_callback *cb) { + const struct nlmsghdr *nlh = cb->nlh; struct net *net = sock_net(skb->sk); int h, s_h; int idx, s_idx; @@ -673,6 +674,21 @@ static int inet6_netconf_dump_devconf(struct sk_buff *skb, struct inet6_dev *idev; struct hlist_head *head; + if (cb->strict_check) { + struct netlink_ext_ack *extack = cb->extack; + struct netconfmsg *ncm; + + if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ncm))) { + NL_SET_ERR_MSG_MOD(extack, "Invalid header for netconf dump request"); + return -EINVAL; + } + + if (nlmsg_attrlen(nlh, sizeof(*ncm))) { + NL_SET_ERR_MSG_MOD(extack, "Invalid data after header in netconf dump request"); + return -EINVAL; + } + } + s_h = cb->args[0]; s_idx = idx = cb->args[1]; @@ -692,7
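The choice of nlmsg_attrlen for headers like netconfmsg comes down to alignment arithmetic. The sketch below mirrors that arithmetic in userspace with the uapi NLMSG_* macros. attrlen_after and naive_has_trailing are hypothetical names, and the 1-byte header size is an assumption based on the struct holding only the family member.

```c
#include <sys/socket.h>
#include <linux/netlink.h>

/* Userspace mirror of the arithmetic: payload bytes that remain after
 * the family header once that header is padded to 4 bytes.
 */
static int attrlen_after(const struct nlmsghdr *nlh, int hdrlen)
{
	int payload = (int)nlh->nlmsg_len - (int)NLMSG_HDRLEN;

	return payload - (int)NLMSG_ALIGN(hdrlen);
}

/* A naive alternative: treat anything past the unpadded header as data. */
static int naive_has_trailing(const struct nlmsghdr *nlh, int hdrlen)
{
	return (int)nlh->nlmsg_len > (int)NLMSG_LENGTH(hdrlen);
}
```

For a request whose 1-byte header is followed only by 3 bytes of padding, the aligned computation reports no trailing data while the naive comparison flags the padding as unexpected data.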
[PATCH v2 net-next 03/23] net: Add extack to nlmsg_parse
From: David Ahern Make sure extack is passed to nlmsg_parse where easy to do so. Most of these are dump handlers and leveraging the extack in the netlink_callback. Signed-off-by: David Ahern Acked-by: Christian Brauner --- net/core/devlink.c | 2 +- net/core/neighbour.c | 3 ++- net/core/rtnetlink.c | 4 ++-- net/ipv4/devinet.c | 9 + net/ipv6/addrconf.c| 2 +- net/ipv6/route.c | 2 +- net/mpls/af_mpls.c | 2 +- net/netfilter/ipvs/ip_vs_ctl.c | 2 +- net/sched/act_api.c| 2 +- net/sched/cls_api.c| 6 -- net/sched/sch_api.c| 2 +- net/xfrm/xfrm_user.c | 2 +- 12 files changed, 21 insertions(+), 17 deletions(-) diff --git a/net/core/devlink.c b/net/core/devlink.c index 938f68ee92f0..6dae81d65d5c 100644 --- a/net/core/devlink.c +++ b/net/core/devlink.c @@ -3504,7 +3504,7 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb, start_offset = *((u64 *)>args[0]); err = nlmsg_parse(cb->nlh, GENL_HDRLEN + devlink_nl_family.hdrsize, - attrs, DEVLINK_ATTR_MAX, ops->policy, NULL); + attrs, DEVLINK_ATTR_MAX, ops->policy, cb->extack); if (err) goto out; diff --git a/net/core/neighbour.c b/net/core/neighbour.c index fb023df48b83..b06f794bf91e 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -2445,7 +2445,8 @@ static int neigh_dump_info(struct sk_buff *skb, struct netlink_callback *cb) ((struct ndmsg *)nlmsg_data(nlh))->ndm_flags == NTF_PROXY) proxy = 1; - err = nlmsg_parse(nlh, sizeof(struct ndmsg), tb, NDA_MAX, NULL, NULL); + err = nlmsg_parse(nlh, sizeof(struct ndmsg), tb, NDA_MAX, NULL, + cb->extack); if (!err) { if (tb[NDA_IFINDEX]) { if (nla_len(tb[NDA_IFINDEX]) != sizeof(u32)) diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 5564eee1e980..4486e8b7d9d0 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1909,7 +1909,7 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb) sizeof(struct rtgenmsg) : sizeof(struct ifinfomsg); if (nlmsg_parse(cb->nlh, hdrlen, tb, IFLA_MAX, - ifla_policy, NULL) >= 0) { + 
ifla_policy, cb->extack) >= 0) { if (tb[IFLA_TARGET_NETNSID]) { netnsid = nla_get_s32(tb[IFLA_TARGET_NETNSID]); tgt_net = rtnl_get_net_ns_capable(skb->sk, netnsid); @@ -3774,7 +3774,7 @@ static int rtnl_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb) (nlmsg_len(cb->nlh) != sizeof(struct ndmsg) + nla_attr_size(sizeof(u32 { err = nlmsg_parse(cb->nlh, sizeof(struct ifinfomsg), tb, - IFLA_MAX, ifla_policy, NULL); + IFLA_MAX, ifla_policy, cb->extack); if (err < 0) { return -EINVAL; } else if (err == 0) { diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c index 44d931a3cd50..ab2b11df5ea4 100644 --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -782,7 +782,8 @@ static void set_ifa_lifetime(struct in_ifaddr *ifa, __u32 valid_lft, } static struct in_ifaddr *rtm_to_ifaddr(struct net *net, struct nlmsghdr *nlh, - __u32 *pvalid_lft, __u32 *pprefered_lft) + __u32 *pvalid_lft, __u32 *pprefered_lft, + struct netlink_ext_ack *extack) { struct nlattr *tb[IFA_MAX+1]; struct in_ifaddr *ifa; @@ -792,7 +793,7 @@ static struct in_ifaddr *rtm_to_ifaddr(struct net *net, struct nlmsghdr *nlh, int err; err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFA_MAX, ifa_ipv4_policy, - NULL); + extack); if (err < 0) goto errout; @@ -897,7 +898,7 @@ static int inet_rtm_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh, ASSERT_RTNL(); - ifa = rtm_to_ifaddr(net, nlh, _lft, _lft); + ifa = rtm_to_ifaddr(net, nlh, _lft, _lft, extack); if (IS_ERR(ifa)) return PTR_ERR(ifa); @@ -1684,7 +1685,7 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb) s_ip_idx = ip_idx = cb->args[2]; if (nlmsg_parse(cb->nlh, sizeof(struct ifaddrmsg), tb, IFA_MAX, - ifa_ipv4_policy, NULL) >= 0) { + ifa_ipv4_policy, cb->extack) >= 0) { if (tb[IFA_TARGET_NETNSID]) { fillargs.netnsid = nla_get_s32(tb[IFA_TARGET_NETNSID]); diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index a9a317322388..2f8aa4fd5e55 100644 --- a/net/ipv6/addrconf.c +++
[PATCH v2 net-next 11/23] rtnetlink: Update rtnl_stats_dump for strict data checking
From: David Ahern

Update rtnl_stats_dump for strict data checking. If the flag is set,
the dump request is expected to have an if_stats_msg struct as the header.
All elements of the struct are expected to be 0 except filter_mask which
must be non-0 (legacy behavior). No attributes are supported.

Signed-off-by: David Ahern
---
 net/core/rtnetlink.c | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index e38e1f178611..f6d2609cfa9f 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -4680,6 +4680,7 @@ static int rtnl_stats_get(struct sk_buff *skb, struct nlmsghdr *nlh,
 
 static int rtnl_stats_dump(struct sk_buff *skb, struct netlink_callback *cb)
 {
+	struct netlink_ext_ack *extack = cb->extack;
 	int h, s_h, err, s_idx, s_idxattr, s_prividx;
 	struct net *net = sock_net(skb->sk);
 	unsigned int flags = NLM_F_MULTI;
@@ -4696,13 +4697,32 @@ static int rtnl_stats_dump(struct sk_buff *skb, struct netlink_callback *cb)
 
 	cb->seq = net->dev_base_seq;
 
-	if (nlmsg_len(cb->nlh) < sizeof(*ifsm))
+	if (nlmsg_len(cb->nlh) < sizeof(*ifsm)) {
+		NL_SET_ERR_MSG(extack, "Invalid header for stats dump");
 		return -EINVAL;
+	}
 
 	ifsm = nlmsg_data(cb->nlh);
+
+	/* only requests using NLM_F_DUMP_PROPER_HDR can pass data to
+	 * influence the dump. The legacy exception is filter_mask.
+	 */
+	if (cb->strict_check) {
+		if (ifsm->pad1 || ifsm->pad2 || ifsm->ifindex) {
+			NL_SET_ERR_MSG(extack, "Invalid values in header for stats dump request");
+			return -EINVAL;
+		}
+		if (nlmsg_attrlen(cb->nlh, sizeof(*ifsm))) {
+			NL_SET_ERR_MSG(extack, "Invalid attributes after stats header");
+			return -EINVAL;
+		}
+	}
+
 	filter_mask = ifsm->filter_mask;
-	if (!filter_mask)
+	if (!filter_mask) {
+		NL_SET_ERR_MSG(extack, "Filter mask must be set for stats dump");
 		return -EINVAL;
+	}
 
 	for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) {
 		idx = 0;
--
2.11.0
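As an illustration of a request that would satisfy the checks described in this patch, a userspace sketch could build the stats dump message as below. struct stats_dump_req and build_stats_dump_req are hypothetical names. The uapi structs and constants are taken from the Linux headers as I understand them.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/if_link.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical request layout: netlink header followed by if_stats_msg. */
struct stats_dump_req {
	struct nlmsghdr nlh;
	struct if_stats_msg ifsm;
};

/* Fill a stats dump request: every header field is zero except the
 * family and filter_mask, and no attributes follow the header.
 */
static void build_stats_dump_req(struct stats_dump_req *req,
				 uint32_t filter_mask)
{
	memset(req, 0, sizeof(*req));	/* pad1, pad2 and ifindex stay 0 */
	req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(req->ifsm));
	req->nlh.nlmsg_type = RTM_GETSTATS;
	req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
	req->ifsm.family = AF_UNSPEC;
	req->ifsm.filter_mask = filter_mask;
}
```

A caller would typically pass something like IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK_64) as the mask. A zero mask is rejected regardless of strict checking, as the patch description notes.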
[PATCH v2 net-next 09/23] rtnetlink: Update rtnl_dump_ifinfo for strict data checking
From: David Ahern Update rtnl_dump_ifinfo for strict data checking. If the flag is set, the dump request is expected to have an ifinfomsg struct as the header potentially followed by one or more attributes. Any data passed in the header or as an attribute is taken as a request to influence the data returned. Only values supported by the dump handler are allowed to be non-0 or set in the request. At the moment only the IFA_TARGET_NETNSID, IFLA_EXT_MASK, IFLA_MASTER, and IFLA_LINKINFO attributes are supported. Existing code does not fail the dump if nlmsg_parse fails. That behavior is kept for non-strict checking. Signed-off-by: David Ahern --- net/core/rtnetlink.c | 113 +-- 1 file changed, 83 insertions(+), 30 deletions(-) diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 4486e8b7d9d0..12fd52105005 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1878,8 +1878,52 @@ struct net *rtnl_get_net_ns_capable(struct sock *sk, int netnsid) } EXPORT_SYMBOL_GPL(rtnl_get_net_ns_capable); +static int rtnl_valid_dump_ifinfo_req(const struct nlmsghdr *nlh, + bool strict_check, struct nlattr **tb, + struct netlink_ext_ack *extack) +{ + int hdrlen; + + if (strict_check) { + struct ifinfomsg *ifm; + + if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ifm))) { + NL_SET_ERR_MSG(extack, "Invalid header for link dump"); + return -EINVAL; + } + + ifm = nlmsg_data(nlh); + if (ifm->__ifi_pad || ifm->ifi_type || ifm->ifi_flags || + ifm->ifi_change) { + NL_SET_ERR_MSG(extack, "Invalid values in header for link dump request"); + return -EINVAL; + } + if (ifm->ifi_index) { + NL_SET_ERR_MSG(extack, "Filter by device index not supported for link dumps"); + return -EINVAL; + } + + return nlmsg_parse_strict(nlh, sizeof(*ifm), tb, IFLA_MAX, + ifla_policy, extack); + } + + /* A hack to preserve kernel<->userspace interface. +* The correct header is ifinfomsg. It is consistent with rtnl_getlink. 
+* However, before Linux v3.9 the code here assumed rtgenmsg and that's +* what iproute2 < v3.9.0 used. +* We can detect the old iproute2. Even including the IFLA_EXT_MASK +* attribute, its netlink message is shorter than struct ifinfomsg. +*/ + hdrlen = nlmsg_len(nlh) < sizeof(struct ifinfomsg) ? +sizeof(struct rtgenmsg) : sizeof(struct ifinfomsg); + + return nlmsg_parse(nlh, hdrlen, tb, IFLA_MAX, ifla_policy, extack); +} + static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb) { + struct netlink_ext_ack *extack = cb->extack; + const struct nlmsghdr *nlh = cb->nlh; struct net *net = sock_net(skb->sk); struct net *tgt_net = net; int h, s_h; @@ -1892,44 +1936,54 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb) unsigned int flags = NLM_F_MULTI; int master_idx = 0; int netnsid = -1; - int err; - int hdrlen; + int err, i; s_h = cb->args[0]; s_idx = cb->args[1]; - /* A hack to preserve kernel<->userspace interface. -* The correct header is ifinfomsg. It is consistent with rtnl_getlink. -* However, before Linux v3.9 the code here assumed rtgenmsg and that's -* what iproute2 < v3.9.0 used. -* We can detect the old iproute2. Even including the IFLA_EXT_MASK -* attribute, its netlink message is shorter than struct ifinfomsg. -*/ - hdrlen = nlmsg_len(cb->nlh) < sizeof(struct ifinfomsg) ? 
-sizeof(struct rtgenmsg) : sizeof(struct ifinfomsg); + err = rtnl_valid_dump_ifinfo_req(nlh, cb->strict_check, tb, extack); + if (err < 0) { + if (cb->strict_check) + return err; + + goto walk_entries; + } + + for (i = 0; i <= IFLA_MAX; ++i) { + if (!tb[i]) + continue; - if (nlmsg_parse(cb->nlh, hdrlen, tb, IFLA_MAX, - ifla_policy, cb->extack) >= 0) { - if (tb[IFLA_TARGET_NETNSID]) { - netnsid = nla_get_s32(tb[IFLA_TARGET_NETNSID]); + /* new attributes should only be added with strict checking */ + switch (i) { + case IFLA_TARGET_NETNSID: + netnsid = nla_get_s32(tb[i]); tgt_net = rtnl_get_net_ns_capable(skb->sk, netnsid); - if (IS_ERR(tgt_net)) + if (IS_ERR(tgt_net)) { +
[PATCH v2 net-next 18/23] net/fib_rules: Update fib_nl_dumprule for strict data checking
From: David Ahern

Update fib_nl_dumprule for strict data checking. If the flag is set,
the dump request is expected to have fib_rule_hdr struct as the header.
All elements of the struct are expected to be 0 and no attributes can
be appended.

Signed-off-by: David Ahern
---
 net/core/fib_rules.c | 36 +++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 0ff3953f64aa..ffbb827723a2 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -1063,13 +1063,47 @@ static int dump_rules(struct sk_buff *skb, struct netlink_callback *cb,
 	return err;
 }
 
+static int fib_valid_dumprule_req(const struct nlmsghdr *nlh,
+				   struct netlink_ext_ack *extack)
+{
+	struct fib_rule_hdr *frh;
+
+	if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*frh))) {
+		NL_SET_ERR_MSG(extack, "Invalid header for fib rule dump request");
+		return -EINVAL;
+	}
+
+	frh = nlmsg_data(nlh);
+	if (frh->dst_len || frh->src_len || frh->tos || frh->table ||
+	    frh->res1 || frh->res2 || frh->action || frh->flags) {
+		NL_SET_ERR_MSG(extack,
+			       "Invalid values in header for fib rule dump request");
+		return -EINVAL;
+	}
+
+	if (nlmsg_attrlen(nlh, sizeof(*frh))) {
+		NL_SET_ERR_MSG(extack, "Invalid data after header in fib rule dump request");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static int fib_nl_dumprule(struct sk_buff *skb, struct netlink_callback *cb)
 {
+	const struct nlmsghdr *nlh = cb->nlh;
 	struct net *net = sock_net(skb->sk);
 	struct fib_rules_ops *ops;
 	int idx = 0, family;
 
-	family = rtnl_msg_family(cb->nlh);
+	if (cb->strict_check) {
+		int err = fib_valid_dumprule_req(nlh, cb->extack);
+
+		if (err < 0)
+			return err;
+	}
+
+	family = rtnl_msg_family(nlh);
 	if (family != AF_UNSPEC) {
 		/* Protocol specific dump request */
 		ops = lookup_rules_ops(net, family);
--
2.11.0
[PATCH v2 net-next 16/23] net/neighbor: Update neightbl_dump_info for strict data checking
From: David Ahern

Update neightbl_dump_info for strict data checking. If the flag is set,
the dump request is expected to have an ndtmsg struct as the header.
All elements of the struct are expected to be 0 and no attributes can
be appended.

Signed-off-by: David Ahern
---
 net/core/neighbour.c | 38 +++---
 1 file changed, 35 insertions(+), 3 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 7c8a3a0ee059..dc1389b8beb1 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2164,15 +2164,47 @@ static int neightbl_set(struct sk_buff *skb, struct nlmsghdr *nlh,
 	return err;
 }
 
+static int neightbl_valid_dump_info(const struct nlmsghdr *nlh,
+				    struct netlink_ext_ack *extack)
+{
+	struct ndtmsg *ndtm;
+
+	if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ndtm))) {
+		NL_SET_ERR_MSG(extack, "Invalid header for neighbor table dump request");
+		return -EINVAL;
+	}
+
+	ndtm = nlmsg_data(nlh);
+	if (ndtm->ndtm_pad1 || ndtm->ndtm_pad2) {
+		NL_SET_ERR_MSG(extack, "Invalid values in header for neighbor table dump request");
+		return -EINVAL;
+	}
+
+	if (nlmsg_attrlen(nlh, sizeof(*ndtm))) {
+		NL_SET_ERR_MSG(extack, "Invalid data after header in neighbor table dump request");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static int neightbl_dump_info(struct sk_buff *skb, struct netlink_callback *cb)
 {
+	const struct nlmsghdr *nlh = cb->nlh;
 	struct net *net = sock_net(skb->sk);
 	int family, tidx, nidx = 0;
 	int tbl_skip = cb->args[0];
 	int neigh_skip = cb->args[1];
 	struct neigh_table *tbl;
 
-	family = ((struct rtgenmsg *) nlmsg_data(cb->nlh))->rtgen_family;
+	if (cb->strict_check) {
+		int err = neightbl_valid_dump_info(nlh, cb->extack);
+
+		if (err < 0)
+			return err;
+	}
+
+	family = ((struct rtgenmsg *)nlmsg_data(nlh))->rtgen_family;
 
 	for (tidx = 0; tidx < NEIGH_NR_TABLES; tidx++) {
 		struct neigh_parms *p;
@@ -2185,7 +2217,7 @@ static int neightbl_dump_info(struct sk_buff *skb, struct netlink_callback *cb)
 			continue;
 
 		if (neightbl_fill_info(skb, tbl, NETLINK_CB(cb->skb).portid,
-				       cb->nlh->nlmsg_seq, RTM_NEWNEIGHTBL,
+				       nlh->nlmsg_seq, RTM_NEWNEIGHTBL,
 				       NLM_F_MULTI) < 0)
 			break;
 
@@ -2200,7 +2232,7 @@ static int neightbl_dump_info(struct sk_buff *skb, struct netlink_callback *cb)
 
 			if (neightbl_fill_param_info(skb, tbl, p,
 						     NETLINK_CB(cb->skb).portid,
-						     cb->nlh->nlmsg_seq,
+						     nlh->nlmsg_seq,
 						     RTM_NEWNEIGHTBL,
 						     NLM_F_MULTI) < 0)
 				goto out;
--
2.11.0
[PATCH v2 net-next 10/23] rtnetlink: Update rtnl_bridge_getlink for strict data checking
From: David Ahern

Update rtnl_bridge_getlink for strict data checking. If the flag is set,
the dump request is expected to have an ifinfomsg struct as the header
potentially followed by one or more attributes. Any data passed in the
header or as an attribute is taken as a request to influence the data
returned. Only values supported by the dump handler are allowed to be
non-0 or set in the request. At the moment only the IFLA_EXT_MASK
attribute is supported.

Signed-off-by: David Ahern
---
 net/core/rtnetlink.c | 70 ++--
 1 file changed, 57 insertions(+), 13 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 12fd52105005..e38e1f178611 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -4021,28 +4021,72 @@ int ndo_dflt_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
 }
 EXPORT_SYMBOL_GPL(ndo_dflt_bridge_getlink);
 
+static int valid_bridge_getlink_req(const struct nlmsghdr *nlh,
+				    bool strict_check, u32 *filter_mask,
+				    struct netlink_ext_ack *extack)
+{
+	struct nlattr *tb[IFLA_MAX+1];
+	int err, i;
+
+	if (strict_check) {
+		struct ifinfomsg *ifm;
+
+		if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ifm))) {
+			NL_SET_ERR_MSG(extack, "Invalid header for bridge link dump");
+			return -EINVAL;
+		}
+
+		ifm = nlmsg_data(nlh);
+		if (ifm->__ifi_pad || ifm->ifi_type || ifm->ifi_flags ||
+		    ifm->ifi_change || ifm->ifi_index) {
+			NL_SET_ERR_MSG(extack, "Invalid values in header for bridge link dump request");
+			return -EINVAL;
+		}
+
+		err = nlmsg_parse_strict(nlh, sizeof(struct ifinfomsg), tb,
+					 IFLA_MAX, ifla_policy, extack);
+	} else {
+		err = nlmsg_parse(nlh, sizeof(struct ifinfomsg), tb,
+				  IFLA_MAX, ifla_policy, extack);
+	}
+	if (err < 0)
+		return err;
+
+	/* new attributes should only be added with strict checking */
+	for (i = 0; i <= IFLA_MAX; ++i) {
+		if (!tb[i])
+			continue;
+
+		switch (i) {
+		case IFLA_EXT_MASK:
+			*filter_mask = nla_get_u32(tb[i]);
+			break;
+		default:
+			if (strict_check) {
+				NL_SET_ERR_MSG(extack, "Unsupported attribute in bridge link dump request");
+				return -EINVAL;
+			}
+		}
+	}
+
+	return 0;
+}
+
 static int rtnl_bridge_getlink(struct sk_buff *skb, struct netlink_callback *cb)
 {
+	const struct nlmsghdr *nlh = cb->nlh;
 	struct net *net = sock_net(skb->sk);
 	struct net_device *dev;
 	int idx = 0;
 	u32 portid = NETLINK_CB(cb->skb).portid;
-	u32 seq = cb->nlh->nlmsg_seq;
+	u32 seq = nlh->nlmsg_seq;
 	u32 filter_mask = 0;
 	int err;
 
-	if (nlmsg_len(cb->nlh) > sizeof(struct ifinfomsg)) {
-		struct nlattr *extfilt;
-
-		extfilt = nlmsg_find_attr(cb->nlh, sizeof(struct ifinfomsg),
-					  IFLA_EXT_MASK);
-		if (extfilt) {
-			if (nla_len(extfilt) < sizeof(filter_mask))
-				return -EINVAL;
-
-			filter_mask = nla_get_u32(extfilt);
-		}
-	}
+	err = valid_bridge_getlink_req(nlh, cb->strict_check, &filter_mask,
+				       cb->extack);
+	if (err < 0 && cb->strict_check)
+		return err;
 
 	rcu_read_lock();
 	for_each_netdev_rcu(net, dev) {
--
2.11.0
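For reference, a bridge link dump request of the shape this patch accepts under strict checking (zeroed ifinfomsg apart from the family, followed by a single IFLA_EXT_MASK attribute) might be assembled in userspace roughly as follows. struct bridge_dump_req and build_bridge_dump_req are hypothetical names used only for this sketch.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/if_link.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical request layout: header, ifinfomsg, one u32 attribute. */
struct bridge_dump_req {
	struct nlmsghdr nlh;
	struct ifinfomsg ifm;
	struct rtattr rta;	/* IFLA_EXT_MASK attribute header */
	uint32_t ext_mask;	/* its u32 payload */
};

static void build_bridge_dump_req(struct bridge_dump_req *req,
				  uint32_t ext_mask)
{
	memset(req, 0, sizeof(*req));	/* all ifinfomsg fields start at 0 */
	req->nlh.nlmsg_len = sizeof(*req);
	req->nlh.nlmsg_type = RTM_GETLINK;
	req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
	req->ifm.ifi_family = AF_BRIDGE;

	req->rta.rta_type = IFLA_EXT_MASK;
	req->rta.rta_len = RTA_LENGTH(sizeof(req->ext_mask));
	req->ext_mask = ext_mask;
}
```

Because both ifinfomsg and the attribute are multiples of 4 bytes, the struct needs no explicit padding and its size can be used directly as nlmsg_len.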
[PATCH v2 net-next 05/23] net/ipv6: Refactor address dump to push inet6_fill_args to in6_dump_addrs
From: David Ahern Pull the inet6_fill_args arg up to in6_dump_addrs and move netnsid into it. Signed-off-by: David Ahern Acked-by: Christian Brauner --- net/ipv6/addrconf.c | 57 - 1 file changed, 30 insertions(+), 27 deletions(-) diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 2f8aa4fd5e55..afa279170ba5 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -4793,12 +4793,19 @@ static inline int inet6_ifaddr_msgsize(void) + nla_total_size(4) /* IFA_RT_PRIORITY */; } +enum addr_type_t { + UNICAST_ADDR, + MULTICAST_ADDR, + ANYCAST_ADDR, +}; + struct inet6_fill_args { u32 portid; u32 seq; int event; unsigned int flags; int netnsid; + enum addr_type_t type; }; static int inet6_fill_ifaddr(struct sk_buff *skb, struct inet6_ifaddr *ifa, @@ -4930,39 +4937,28 @@ static int inet6_fill_ifacaddr(struct sk_buff *skb, struct ifacaddr6 *ifaca, return 0; } -enum addr_type_t { - UNICAST_ADDR, - MULTICAST_ADDR, - ANYCAST_ADDR, -}; - /* called with rcu_read_lock() */ static int in6_dump_addrs(struct inet6_dev *idev, struct sk_buff *skb, - struct netlink_callback *cb, enum addr_type_t type, - int s_ip_idx, int *p_ip_idx, int netnsid) + struct netlink_callback *cb, + int s_ip_idx, int *p_ip_idx, + struct inet6_fill_args *fillargs) { - struct inet6_fill_args fillargs = { - .portid = NETLINK_CB(cb->skb).portid, - .seq = cb->nlh->nlmsg_seq, - .flags = NLM_F_MULTI, - .netnsid = netnsid, - }; struct ifmcaddr6 *ifmca; struct ifacaddr6 *ifaca; int err = 1; int ip_idx = *p_ip_idx; read_lock_bh(>lock); - switch (type) { + switch (fillargs->type) { case UNICAST_ADDR: { struct inet6_ifaddr *ifa; - fillargs.event = RTM_NEWADDR; + fillargs->event = RTM_NEWADDR; /* unicast address incl. 
temp addr */ list_for_each_entry(ifa, >addr_list, if_list) { if (++ip_idx < s_ip_idx) continue; - err = inet6_fill_ifaddr(skb, ifa, ); + err = inet6_fill_ifaddr(skb, ifa, fillargs); if (err < 0) break; nl_dump_check_consistent(cb, nlmsg_hdr(skb)); @@ -4970,26 +4966,26 @@ static int in6_dump_addrs(struct inet6_dev *idev, struct sk_buff *skb, break; } case MULTICAST_ADDR: - fillargs.event = RTM_GETMULTICAST; + fillargs->event = RTM_GETMULTICAST; /* multicast address */ for (ifmca = idev->mc_list; ifmca; ifmca = ifmca->next, ip_idx++) { if (ip_idx < s_ip_idx) continue; - err = inet6_fill_ifmcaddr(skb, ifmca, ); + err = inet6_fill_ifmcaddr(skb, ifmca, fillargs); if (err < 0) break; } break; case ANYCAST_ADDR: - fillargs.event = RTM_GETANYCAST; + fillargs->event = RTM_GETANYCAST; /* anycast address */ for (ifaca = idev->ac_list; ifaca; ifaca = ifaca->aca_next, ip_idx++) { if (ip_idx < s_ip_idx) continue; - err = inet6_fill_ifacaddr(skb, ifaca, ); + err = inet6_fill_ifacaddr(skb, ifaca, fillargs); if (err < 0) break; } @@ -5005,10 +5001,16 @@ static int in6_dump_addrs(struct inet6_dev *idev, struct sk_buff *skb, static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb, enum addr_type_t type) { + struct inet6_fill_args fillargs = { + .portid = NETLINK_CB(cb->skb).portid, + .seq = cb->nlh->nlmsg_seq, + .flags = NLM_F_MULTI, + .netnsid = -1, + .type = type, + }; struct net *net = sock_net(skb->sk); struct nlattr *tb[IFA_MAX+1]; struct net *tgt_net = net; - int netnsid = -1; int h, s_h; int idx, ip_idx; int s_idx, s_ip_idx; @@ -5023,9 +5025,10 @@ static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb, if (nlmsg_parse(cb->nlh, sizeof(struct ifaddrmsg), tb, IFA_MAX, ifa_ipv6_policy, cb->extack) >= 0) { if (tb[IFA_TARGET_NETNSID]) { - netnsid =
[PATCH v2 net-next 12/23] rtnetlink: Update inet6_dump_ifinfo for strict data checking
From: David Ahern

Update inet6_dump_ifinfo for strict data checking. If the flag is
set, the dump request is expected to have an ifinfomsg struct as
the header. All elements of the struct are expected to be 0 and no
attributes can be appended.

Signed-off-by: David Ahern
---
 net/ipv6/addrconf.c | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 095d3f56f0a9..ce071d85ad00 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -5644,6 +5644,31 @@ static int inet6_fill_ifinfo(struct sk_buff *skb, struct inet6_dev *idev,
 	return -EMSGSIZE;
 }
 
+static int inet6_valid_dump_ifinfo(const struct nlmsghdr *nlh,
+				   struct netlink_ext_ack *extack)
+{
+	struct ifinfomsg *ifm;
+
+	if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ifm))) {
+		NL_SET_ERR_MSG_MOD(extack, "Invalid header for link dump request");
+		return -EINVAL;
+	}
+
+	if (nlmsg_attrlen(nlh, sizeof(*ifm))) {
+		NL_SET_ERR_MSG_MOD(extack, "Invalid data after header");
+		return -EINVAL;
+	}
+
+	ifm = nlmsg_data(nlh);
+	if (ifm->__ifi_pad || ifm->ifi_type || ifm->ifi_flags ||
+	    ifm->ifi_change || ifm->ifi_index) {
+		NL_SET_ERR_MSG_MOD(extack, "Invalid values in header for dump request");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static int inet6_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
 {
 	struct net *net = sock_net(skb->sk);
@@ -5653,6 +5678,16 @@ static int inet6_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
 	struct inet6_dev *idev;
 	struct hlist_head *head;
 
+	/* only requests using strict checking can pass data to
+	 * influence the dump
+	 */
+	if (cb->strict_check) {
+		int err = inet6_valid_dump_ifinfo(cb->nlh, cb->extack);
+
+		if (err < 0)
+			return err;
+	}
+
 	s_h = cb->args[0];
 	s_idx = cb->args[1];
 
--
2.11.0
[PATCH v2 net-next 14/23] rtnetlink: Update fib dumps for strict data checking
From: David Ahern Add helper to check netlink message for route dumps. If the strict flag is set the dump request is expected to have an rtmsg struct as the header. All elements of the struct are expected to be 0 with the exception of rtm_flags (which is used by both ipv4 and ipv6 dumps) and no attributes can be appended. rtm_flags can only have RTM_F_CLONED and RTM_F_PREFIX set. Update inet_dump_fib, inet6_dump_fib, mpls_dump_routes, ipmr_rtm_dumproute, and ip6mr_rtm_dumproute to call this helper if strict data checking is enabled. Signed-off-by: David Ahern --- include/net/ip_fib.h| 2 ++ net/ipv4/fib_frontend.c | 42 -- net/ipv4/ipmr.c | 7 +++ net/ipv6/ip6_fib.c | 8 net/ipv6/ip6mr.c| 9 + net/mpls/af_mpls.c | 8 6 files changed, 74 insertions(+), 2 deletions(-) diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h index f7c109e37298..9846b79c9ee1 100644 --- a/include/net/ip_fib.h +++ b/include/net/ip_fib.h @@ -452,4 +452,6 @@ static inline void fib_proc_exit(struct net *net) u32 ip_mtu_from_fib_result(struct fib_result *res, __be32 daddr); +int ip_valid_fib_dump_req(const struct nlmsghdr *nlh, + struct netlink_ext_ack *extack); #endif /* _NET_FIB_H */ diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index 30e2bcc3ef2a..038f511c73fa 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -802,8 +802,40 @@ static int inet_rtm_newroute(struct sk_buff *skb, struct nlmsghdr *nlh, return err; } +int ip_valid_fib_dump_req(const struct nlmsghdr *nlh, + struct netlink_ext_ack *extack) +{ + struct rtmsg *rtm; + + if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*rtm))) { + NL_SET_ERR_MSG(extack, "Invalid header for FIB dump request"); + return -EINVAL; + } + + rtm = nlmsg_data(nlh); + if (rtm->rtm_dst_len || rtm->rtm_src_len || rtm->rtm_tos || + rtm->rtm_table || rtm->rtm_protocol || rtm->rtm_scope || + rtm->rtm_type) { + NL_SET_ERR_MSG(extack, "Invalid values in header for FIB dump request"); + return -EINVAL; + } + if (rtm->rtm_flags & 
~(RTM_F_CLONED | RTM_F_PREFIX)) { + NL_SET_ERR_MSG(extack, "Invalid flags for FIB dump request"); + return -EINVAL; + } + + if (nlmsg_attrlen(nlh, sizeof(*rtm))) { + NL_SET_ERR_MSG(extack, "Invalid data after header in FIB dump request"); + return -EINVAL; + } + + return 0; +} +EXPORT_SYMBOL_GPL(ip_valid_fib_dump_req); + static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb) { + const struct nlmsghdr *nlh = cb->nlh; struct net *net = sock_net(skb->sk); unsigned int h, s_h; unsigned int e = 0, s_e; @@ -811,8 +843,14 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb) struct hlist_head *head; int dumped = 0, err; - if (nlmsg_len(cb->nlh) >= sizeof(struct rtmsg) && - ((struct rtmsg *) nlmsg_data(cb->nlh))->rtm_flags & RTM_F_CLONED) + if (cb->strict_check) { + err = ip_valid_fib_dump_req(nlh, cb->extack); + if (err < 0) + return err; + } + + if (nlmsg_len(nlh) >= sizeof(struct rtmsg) && + ((struct rtmsg *)nlmsg_data(nlh))->rtm_flags & RTM_F_CLONED) return skb->len; s_h = cb->args[0]; diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c index e7322e407bb4..91b0d5671649 100644 --- a/net/ipv4/ipmr.c +++ b/net/ipv4/ipmr.c @@ -2527,6 +2527,13 @@ static int ipmr_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh, static int ipmr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb) { + if (cb->strict_check) { + int err = ip_valid_fib_dump_req(cb->nlh, cb->extack); + + if (err < 0) + return err; + } + return mr_rtm_dumproute(skb, cb, ipmr_mr_table_iter, _ipmr_fill_mroute, _unres_lock); } diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index cf709eadc932..e14d244c551f 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -564,6 +564,7 @@ static int fib6_dump_table(struct fib6_table *table, struct sk_buff *skb, static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb) { + const struct nlmsghdr *nlh = cb->nlh; struct net *net = sock_net(skb->sk); unsigned int h, s_h; unsigned int e = 0, s_e; @@ 
-573,6 +574,13 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb) struct hlist_head *head; int res = 0; + if (cb->strict_check) { + int err = ip_valid_fib_dump_req(nlh, cb->extack); + + if (err < 0) +
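For readers following along, the checks described in the commit message can be sketched in user space roughly as below. This is only an illustration, not the kernel code: `struct fib_hdr`, `fib_dump_req_check()` and the `FIB_F_*` constants are hypothetical stand-ins for `struct rtmsg`, `ip_valid_fib_dump_req()` and `RTM_F_CLONED`/`RTM_F_PREFIX`, and the length is assumed to be the payload length after the netlink header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct rtmsg (same field order). */
struct fib_hdr {
	uint8_t  family;
	uint8_t  dst_len;
	uint8_t  src_len;
	uint8_t  tos;
	uint8_t  table;
	uint8_t  protocol;
	uint8_t  scope;
	uint8_t  type;
	uint32_t flags;
};

/* Assumed to mirror RTM_F_CLONED and RTM_F_PREFIX. */
#define FIB_F_CLONED 0x200u
#define FIB_F_PREFIX 0x800u

/*
 * Returns NULL when the request is acceptable, otherwise the error
 * string that the patch would report through extack.
 */
static const char *fib_dump_req_check(const void *payload, size_t len)
{
	struct fib_hdr hdr;

	/* The request must carry at least a full header. */
	if (len < sizeof(hdr))
		return "Invalid header for FIB dump request";

	memcpy(&hdr, payload, sizeof(hdr));

	/* Every field except family and flags must be zero. */
	if (hdr.dst_len || hdr.src_len || hdr.tos || hdr.table ||
	    hdr.protocol || hdr.scope || hdr.type)
		return "Invalid values in header for FIB dump request";

	/* Only the two flags named in the commit message may be set. */
	if (hdr.flags & ~(FIB_F_CLONED | FIB_F_PREFIX))
		return "Invalid flags for FIB dump request";

	/* Nothing may follow the header. */
	if (len > sizeof(hdr))
		return "Invalid data after header in FIB dump request";

	return NULL;
}
```

The order of the checks follows the patch, so a request that is both too long and has a bad flag is reported as a flag error first.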
[PATCH v2 net-next 19/23] net/ipv6: Update ip6addrlbl_dump for strict data checking
From: David Ahern Update ip6addrlbl_dump for strict data checking. If the flag is set, the dump request is expected to have an ifaddrlblmsg struct as the header. All elements of the struct are expected to be 0 and no attributes can be appended. Signed-off-by: David Ahern --- net/ipv6/addrlabel.c | 34 +- 1 file changed, 33 insertions(+), 1 deletion(-) diff --git a/net/ipv6/addrlabel.c b/net/ipv6/addrlabel.c index 1d6ced37ad71..0d1ee82ee55b 100644 --- a/net/ipv6/addrlabel.c +++ b/net/ipv6/addrlabel.c @@ -458,20 +458,52 @@ static int ip6addrlbl_fill(struct sk_buff *skb, return 0; } +static int ip6addrlbl_valid_dump_req(const struct nlmsghdr *nlh, +struct netlink_ext_ack *extack) +{ + struct ifaddrlblmsg *ifal; + + if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ifal))) { + NL_SET_ERR_MSG_MOD(extack, "Invalid header for address label dump request"); + return -EINVAL; + } + + ifal = nlmsg_data(nlh); + if (ifal->__ifal_reserved || ifal->ifal_prefixlen || + ifal->ifal_flags || ifal->ifal_index || ifal->ifal_seq) { + NL_SET_ERR_MSG_MOD(extack, "Invalid values in header for address label dump request"); + return -EINVAL; + } + + if (nlmsg_attrlen(nlh, sizeof(*ifal))) { + NL_SET_ERR_MSG_MOD(extack, "Invalid data after header for address label dump requewst"); + return -EINVAL; + } + + return 0; +} + static int ip6addrlbl_dump(struct sk_buff *skb, struct netlink_callback *cb) { + const struct nlmsghdr *nlh = cb->nlh; struct net *net = sock_net(skb->sk); struct ip6addrlbl_entry *p; int idx = 0, s_idx = cb->args[0]; int err; + if (cb->strict_check) { + err = ip6addrlbl_valid_dump_req(nlh, cb->extack); + if (err < 0) + return err; + } + rcu_read_lock(); hlist_for_each_entry_rcu(p, >ipv6.ip6addrlbl_table.head, list) { if (idx >= s_idx) { err = ip6addrlbl_fill(skb, p, net->ipv6.ip6addrlbl_table.seq, NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, + nlh->nlmsg_seq, RTM_NEWADDRLABEL, NLM_F_MULTI); if (err < 0) -- 2.11.0
[PATCH v2 net-next 22/23] rtnetlink: Move input checking for rtnl_fdb_dump to helper
From: David Ahern Move the existing input checking for rtnl_fdb_dump into a helper, valid_fdb_dump_legacy. This function will retain the current logic that works around the 2 headers that userspace has been allowed to send up to this point. Signed-off-by: David Ahern --- net/core/rtnetlink.c | 53 1 file changed, 33 insertions(+), 20 deletions(-) diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index f6d2609cfa9f..c7509c789fb6 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -3799,22 +3799,13 @@ int ndo_dflt_fdb_dump(struct sk_buff *skb, } EXPORT_SYMBOL(ndo_dflt_fdb_dump); -static int rtnl_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb) +static int valid_fdb_dump_legacy(const struct nlmsghdr *nlh, +int *br_idx, int *brport_idx, +struct netlink_ext_ack *extack) { - struct net_device *dev; + struct ifinfomsg *ifm = nlmsg_data(nlh); struct nlattr *tb[IFLA_MAX+1]; - struct net_device *br_dev = NULL; - const struct net_device_ops *ops = NULL; - const struct net_device_ops *cops = NULL; - struct ifinfomsg *ifm = nlmsg_data(cb->nlh); - struct net *net = sock_net(skb->sk); - struct hlist_head *head; - int brport_idx = 0; - int br_idx = 0; - int h, s_h; - int idx = 0, s_idx; - int err = 0; - int fidx = 0; + int err; /* A hack to preserve kernel<->userspace interface. * Before Linux v4.12 this code accepted ndmsg since iproute2 v3.3.0. @@ -3823,20 +3814,42 @@ static int rtnl_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb) * Fortunately these sizes don't conflict with the size of ifinfomsg * with an optional attribute. 
*/ - if (nlmsg_len(cb->nlh) != sizeof(struct ndmsg) && - (nlmsg_len(cb->nlh) != sizeof(struct ndmsg) + + if (nlmsg_len(nlh) != sizeof(struct ndmsg) && + (nlmsg_len(nlh) != sizeof(struct ndmsg) + nla_attr_size(sizeof(u32)))) { - err = nlmsg_parse(cb->nlh, sizeof(struct ifinfomsg), tb, - IFLA_MAX, ifla_policy, cb->extack); + err = nlmsg_parse(nlh, sizeof(struct ifinfomsg), tb, IFLA_MAX, + ifla_policy, extack); if (err < 0) { return -EINVAL; } else if (err == 0) { if (tb[IFLA_MASTER]) - br_idx = nla_get_u32(tb[IFLA_MASTER]); + *br_idx = nla_get_u32(tb[IFLA_MASTER]); } - brport_idx = ifm->ifi_index; + *brport_idx = ifm->ifi_index; } + return 0; +} + +static int rtnl_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb) +{ + struct net_device *dev; + struct net_device *br_dev = NULL; + const struct net_device_ops *ops = NULL; + const struct net_device_ops *cops = NULL; + struct net *net = sock_net(skb->sk); + struct hlist_head *head; + int brport_idx = 0; + int br_idx = 0; + int h, s_h; + int idx = 0, s_idx; + int err = 0; + int fidx = 0; + + err = valid_fdb_dump_legacy(cb->nlh, &br_idx, &brport_idx, + cb->extack); + if (err < 0) + return err; if (br_idx) { br_dev = __dev_get_by_index(net, br_idx); -- 2.11.0
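The "sizes don't conflict" remark in the comment kept by this patch can be checked with a small user-space sketch. The struct names below are hypothetical mirrors of `struct ndmsg` and `struct ifinfomsg`, and the 4-byte attribute header is an assumption taken from the netlink attribute layout; the helper only models the length test, not the parsing that follows it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of struct ndmsg (12 bytes on common ABIs). */
struct ndmsg_like {
	uint8_t  family;
	uint8_t  pad1;
	uint16_t pad2;
	int32_t  ifindex;
	uint16_t state;
	uint8_t  flags;
	uint8_t  type;
};

/* Hypothetical mirror of struct ifinfomsg (16 bytes on common ABIs). */
struct ifinfomsg_like {
	uint8_t  family;
	uint8_t  pad;
	uint16_t type;
	int32_t  index;
	uint32_t flags;
	uint32_t change;
};

/* Assumed attribute header size (length + type, 2 bytes each). */
#define ATTR_HDRLEN 4u

/* Size of one attribute carrying `payload` bytes, without padding. */
static size_t attr_size(size_t payload)
{
	return ATTR_HDRLEN + payload;
}

/*
 * True when the payload length matches one of the two legacy ndmsg
 * layouts: a bare ndmsg, or an ndmsg followed by one u32 attribute.
 * Any other length is treated as an ifinfomsg request and parsed.
 */
static bool fdb_req_is_legacy_ndmsg(size_t payload_len)
{
	return payload_len == sizeof(struct ndmsg_like) ||
	       payload_len == sizeof(struct ndmsg_like) +
			      attr_size(sizeof(uint32_t));
}
```

Under these assumptions the ndmsg layouts are 12 and 20 bytes, while an ifinfomsg with an optional u32 attribute is 16 or 24 bytes, so the two families of requests never share a length.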
[PATCH v2 net-next 23/23] rtnetlink: Update rtnl_fdb_dump for strict data checking
From: David Ahern Update rtnl_fdb_dump for strict data checking. If the flag is set, the dump request is expected to have an ndmsg struct as the header potentially followed by one or more attributes. Any data passed in the header or as an attribute is taken as a request to influence the data returned. Only values supported by the dump handler are allowed to be non-0 or set in the request. At the moment only the NDA_IFINDEX and NDA_MASTER attributes are supported. Signed-off-by: David Ahern --- net/core/rtnetlink.c | 62 ++-- 1 file changed, 60 insertions(+), 2 deletions(-) diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index c7509c789fb6..c894c4af8981 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -3799,6 +3799,60 @@ int ndo_dflt_fdb_dump(struct sk_buff *skb, } EXPORT_SYMBOL(ndo_dflt_fdb_dump); +static int valid_fdb_dump_strict(const struct nlmsghdr *nlh, +int *br_idx, int *brport_idx, +struct netlink_ext_ack *extack) +{ + struct nlattr *tb[NDA_MAX + 1]; + struct ndmsg *ndm; + int err, i; + + if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ndm))) { + NL_SET_ERR_MSG(extack, "Invalid header for fdb dump request"); + return -EINVAL; + } + + ndm = nlmsg_data(nlh); + if (ndm->ndm_pad1 || ndm->ndm_pad2 || ndm->ndm_state || + ndm->ndm_flags || ndm->ndm_type) { + NL_SET_ERR_MSG(extack, "Invalid values in header for fbd dump request"); + return -EINVAL; + } + + err = nlmsg_parse_strict(nlh, sizeof(struct ndmsg), tb, NDA_MAX, +NULL, extack); + if (err < 0) + return err; + + *brport_idx = ndm->ndm_ifindex; + for (i = 0; i <= NDA_MAX; ++i) { + if (!tb[i]) + continue; + + switch (i) { + case NDA_IFINDEX: + if (nla_len(tb[i]) != sizeof(u32)) { + NL_SET_ERR_MSG(extack, "Invalid IFINDEX attribute in fdb dump request"); + return -EINVAL; + } + *brport_idx = nla_get_u32(tb[NDA_IFINDEX]); + break; + case NDA_MASTER: + if (nla_len(tb[i]) != sizeof(u32)) { + NL_SET_ERR_MSG(extack, "Invalid MASTER attribute in fdb dump request"); + return -EINVAL; + } + *br_idx = 
nla_get_u32(tb[NDA_MASTER]); + break; + default: + NL_SET_ERR_MSG(extack, "Unsupported attribute in fdb dump request"); + return -EINVAL; + } + } + + return 0; +} + static int valid_fdb_dump_legacy(const struct nlmsghdr *nlh, int *br_idx, int *brport_idx, struct netlink_ext_ack *extack) @@ -3846,8 +3900,12 @@ static int rtnl_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb) int err = 0; int fidx = 0; - err = valid_fdb_dump_legacy(cb->nlh, &br_idx, &brport_idx, - cb->extack); + if (cb->strict_check) + err = valid_fdb_dump_strict(cb->nlh, &br_idx, &brport_idx, + cb->extack); + else + err = valid_fdb_dump_legacy(cb->nlh, &br_idx, &brport_idx, + cb->extack); if (err < 0) return err; -- 2.11.0
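The "only supported attributes may be set" rule from the commit message can be sketched outside the kernel as a plain TLV walk. Everything here is hypothetical: `struct attr_hdr`, `fdb_attrs_check()` and the `ATTR_*` values are stand-ins for `struct nlattr`, the switch in `valid_fdb_dump_strict()` and `NDA_IFINDEX`/`NDA_MASTER`. Unlike the patch, which first parses into a `tb[]` array, this sketch inspects attributes as it walks them.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute header; len covers header plus payload. */
struct attr_hdr {
	uint16_t len;
	uint16_t type;
};

/* Assumed stand-ins for NDA_IFINDEX and NDA_MASTER. */
enum { ATTR_IFINDEX = 8, ATTR_MASTER = 9 };

/* Attributes are padded to a 4-byte boundary. */
#define ATTR_ALIGN(n) (((size_t)(n) + 3u) & ~(size_t)3u)

/*
 * Walk the attribute area and accept only the two supported types,
 * each of which must carry exactly one u32. Returns 0 or -EINVAL.
 */
static int fdb_attrs_check(const uint8_t *buf, size_t len,
			   uint32_t *br_idx, uint32_t *brport_idx)
{
	size_t off = 0;

	while (off < len) {
		struct attr_hdr h;
		uint32_t val;

		/* Reject truncated headers and lengths that overrun. */
		if (len - off < sizeof(h))
			return -EINVAL;
		memcpy(&h, buf + off, sizeof(h));
		if (h.len < sizeof(h) || h.len > len - off)
			return -EINVAL;

		switch (h.type) {
		case ATTR_IFINDEX:
		case ATTR_MASTER:
			if (h.len - sizeof(h) != sizeof(val))
				return -EINVAL;
			memcpy(&val, buf + off + sizeof(h), sizeof(val));
			if (h.type == ATTR_IFINDEX)
				*brport_idx = val;
			else
				*br_idx = val;
			break;
		default:
			/* Anything else is an unsupported filter. */
			return -EINVAL;
		}

		off += ATTR_ALIGN(h.len);
	}

	return 0;
}
```

Rejecting unknown attributes outright, rather than ignoring them, is what lets userspace detect that a requested filter was not applied.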
Re: [RFC PATCH bpf-next v4 4/7] bpf: add bpf queue and stack maps
On 10/04/2018 10:40 PM, Mauricio Vasquez wrote: On 10/04/2018 06:57 PM, Alexei Starovoitov wrote: On Thu, Oct 04, 2018 at 07:12:44PM +0200, Mauricio Vasquez B wrote: Implement two new kind of maps that support the peek, push and pop operations. A use case for this is to keep track of a pool of elements, like network ports in a SNAT. Signed-off-by: Mauricio Vasquez B --- include/linux/bpf.h | 7 + include/linux/bpf_types.h | 2 include/uapi/linux/bpf.h | 35 - kernel/bpf/Makefile | 2 kernel/bpf/core.c | 3 kernel/bpf/helpers.c | 43 ++ kernel/bpf/queue_stack_maps.c | 300 + kernel/bpf/syscall.c | 31 +++- kernel/bpf/verifier.c | 14 +- net/core/filter.c | 6 + 10 files changed, 424 insertions(+), 19 deletions(-) create mode 100644 kernel/bpf/queue_stack_maps.c diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 98c7eeb6d138..cad3bc5cffd1 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -40,6 +40,9 @@ struct bpf_map_ops { int (*map_update_elem)(struct bpf_map *map, void *key, void *value, u64 flags); int (*map_delete_elem)(struct bpf_map *map, void *key); void *(*map_lookup_and_delete_elem)(struct bpf_map *map, void *key); + int (*map_push_elem)(struct bpf_map *map, void *value, u64 flags); + int (*map_pop_elem)(struct bpf_map *map, void *value); + int (*map_peek_elem)(struct bpf_map *map, void *value); /* funcs called by prog_array and perf_event_array map */ void *(*map_fd_get_ptr)(struct bpf_map *map, struct file *map_file, @@ -139,6 +142,7 @@ enum bpf_arg_type { ARG_CONST_MAP_PTR, /* const argument used as pointer to bpf_map */ ARG_PTR_TO_MAP_KEY, /* pointer to stack used as map key */ ARG_PTR_TO_MAP_VALUE, /* pointer to stack used as map value */ + ARG_PTR_TO_UNINIT_MAP_VALUE, /* pointer to valid memory used to store a map value */ /* the following constraints used to prototype bpf_memcmp() and other * functions that access data on eBPF program stack @@ -825,6 +829,9 @@ static inline int bpf_fd_reuseport_array_update_elem(struct bpf_map *map, extern 
const struct bpf_func_proto bpf_map_lookup_elem_proto; extern const struct bpf_func_proto bpf_map_update_elem_proto; extern const struct bpf_func_proto bpf_map_delete_elem_proto; +extern const struct bpf_func_proto bpf_map_push_elem_proto; +extern const struct bpf_func_proto bpf_map_pop_elem_proto; +extern const struct bpf_func_proto bpf_map_peek_elem_proto; extern const struct bpf_func_proto bpf_get_prandom_u32_proto; extern const struct bpf_func_proto bpf_get_smp_processor_id_proto; diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h index 658509daacd4..a2ec73aa1ec7 100644 --- a/include/linux/bpf_types.h +++ b/include/linux/bpf_types.h @@ -69,3 +69,5 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_XSKMAP, xsk_map_ops) BPF_MAP_TYPE(BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, reuseport_array_ops) #endif #endif +BPF_MAP_TYPE(BPF_MAP_TYPE_QUEUE, queue_map_ops) +BPF_MAP_TYPE(BPF_MAP_TYPE_STACK, stack_map_ops) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 3bb94aa2d408..bfa042273fad 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -129,6 +129,8 @@ enum bpf_map_type { BPF_MAP_TYPE_CGROUP_STORAGE, BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE, + BPF_MAP_TYPE_QUEUE, + BPF_MAP_TYPE_STACK, }; enum bpf_prog_type { @@ -463,6 +465,28 @@ union bpf_attr { * Return * 0 on success, or a negative error in case of failure. * + * int bpf_map_push_elem(struct bpf_map *map, const void *value, u64 flags) + * Description + * Push an element *value* in *map*. *flags* is one of: + * + * **BPF_EXIST** + * If the queue/stack is full, the oldest element is removed to + * make room for this. + * Return + * 0 on success, or a negative error in case of failure. + * + * int bpf_map_pop_elem(struct bpf_map *map, void *value) + * Description + * Pop an element from *map*. + * Return + * 0 on success, or a negative error in case of failure. 
+ * + * int bpf_map_peek_elem(struct bpf_map *map, void *value) + * Description + * Get an element from *map* without removing it. + * Return + * 0 on success, or a negative error in case of failure. + * * int bpf_probe_read(void *dst, u32 size, const void *src) * Description * For tracing programs, safely attempt to read *size* bytes from @@ -790,14 +814,14 @@ union bpf_attr { * * int ret; * struct bpf_tunnel_key key = {}; - * + * * ret = bpf_skb_get_tunnel_key(skb, , sizeof(key), 0); * if (ret < 0) * return TC_ACT_SHOT; // drop
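The push/pop/peek semantics documented in the helper descriptions above, including the BPF_EXIST behaviour on a full map, can be sketched with a small fixed-size ring buffer. This is not the code from kernel/bpf/queue_stack_maps.c: `struct queue`, `Q_CAP`, `Q_EXIST` and the specific error codes (`-E2BIG` on a full queue, `-ENOENT` on an empty one) are assumptions made for illustration, and only the queue (FIFO) flavour is shown.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define Q_CAP   4u	/* hypothetical fixed capacity */
#define Q_EXIST 2u	/* assumed stand-in for BPF_EXIST */

struct queue {
	uint32_t slot[Q_CAP];
	size_t   head;	/* index of the oldest element */
	size_t   count;	/* number of stored elements */
};

/* Push a value; with Q_EXIST a full queue drops its oldest element. */
static int queue_push(struct queue *q, uint32_t value, uint64_t flags)
{
	if (q->count == Q_CAP) {
		if (!(flags & Q_EXIST))
			return -E2BIG;
		q->head = (q->head + 1) % Q_CAP;
		q->count--;
	}

	q->slot[(q->head + q->count) % Q_CAP] = value;
	q->count++;
	return 0;
}

/* Copy out the oldest element without removing it. */
static int queue_peek(const struct queue *q, uint32_t *value)
{
	if (q->count == 0)
		return -ENOENT;

	*value = q->slot[q->head];
	return 0;
}

/* Copy out the oldest element and remove it. */
static int queue_pop(struct queue *q, uint32_t *value)
{
	int err = queue_peek(q, value);

	if (err)
		return err;

	q->head = (q->head + 1) % Q_CAP;
	q->count--;
	return 0;
}
```

A stack variant would differ only in peek and pop, which would return the newest element instead of the oldest.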
Re: [PATCH] net/packet: fix packet drop as of virtio gso
On 2018年09月29日 23:41, Jianfeng Tan wrote: When we use raw socket as the vhost backend, a packet from virito with gso offloading information, cannot be sent out in later validaton at xmit path, as we did not set correct skb->protocol which is further used for looking up the gso function. Hi: May I ask the reason for using raw socket for vhost? It was not a common setup with little care in the past few years. And it was slow since it lacks some recent improvements. Can it be replaced with e.g macvtap? Thanks To fix this, we set this field according to virito hdr information. Fixes: e858fae2b0b8f4 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion") Cc: sta...@vger.kernel.org Signed-off-by: Jianfeng Tan --- include/linux/virtio_net.h | 18 ++ net/packet/af_packet.c | 11 +++ 2 files changed, 25 insertions(+), 4 deletions(-) diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h index 9397628a1967..cb462f9ab7dd 100644 --- a/include/linux/virtio_net.h +++ b/include/linux/virtio_net.h @@ -5,6 +5,24 @@ #include #include +static inline int virtio_net_hdr_set_proto(struct sk_buff *skb, + const struct virtio_net_hdr *hdr) +{ + switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) { + case VIRTIO_NET_HDR_GSO_TCPV4: + case VIRTIO_NET_HDR_GSO_UDP: + skb->protocol = cpu_to_be16(ETH_P_IP); + break; + case VIRTIO_NET_HDR_GSO_TCPV6: + skb->protocol = cpu_to_be16(ETH_P_IPV6); + break; + default: + return -EINVAL; + } + + return 0; +} + static inline int virtio_net_hdr_to_skb(struct sk_buff *skb, const struct virtio_net_hdr *hdr, bool little_endian) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 75c92a87e7b2..d6e94dc7e290 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2715,10 +2715,12 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg) } } - if (po->has_vnet_hdr && virtio_net_hdr_to_skb(skb, vnet_hdr, - vio_le())) { - tp_len = -EINVAL; - goto tpacket_error; + if (po->has_vnet_hdr) { + if 
(virtio_net_hdr_to_skb(skb, vnet_hdr, vio_le())) { + tp_len = -EINVAL; + goto tpacket_error; + } + virtio_net_hdr_set_proto(skb, vnet_hdr); } skb->destructor = tpacket_destruct_skb; @@ -2915,6 +2917,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len) if (err) goto out_free; len += sizeof(vnet_hdr); + virtio_net_hdr_set_proto(skb, &vnet_hdr); } skb_probe_transport_header(skb, reserve);
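The mapping that virtio_net_hdr_set_proto() performs can be tried outside the kernel with the sketch below. The `GSO_*` and `PROTO_*` constants are local copies assumed to match the `VIRTIO_NET_HDR_GSO_*` and `ETH_P_*` values, and `gso_type_to_proto()` is a hypothetical name. The result is left in host byte order here, whereas the patch stores it with cpu_to_be16().

```c
#include <errno.h>
#include <stdint.h>

/* Assumed to mirror VIRTIO_NET_HDR_GSO_* from linux/virtio_net.h. */
#define GSO_NONE   0x00
#define GSO_TCPV4  0x01
#define GSO_UDP    0x03
#define GSO_TCPV6  0x04
#define GSO_ECN    0x80

/* Assumed to mirror ETH_P_IP and ETH_P_IPV6 from linux/if_ether.h. */
#define PROTO_IP   0x0800
#define PROTO_IPV6 0x86DD

/*
 * Derive the L3 protocol from the GSO type in the virtio header.
 * The ECN bit is a modifier and is masked off before the lookup.
 * *proto is left untouched when the type gives no hint.
 */
static int gso_type_to_proto(uint8_t gso_type, uint16_t *proto)
{
	switch (gso_type & ~GSO_ECN) {
	case GSO_TCPV4:
	case GSO_UDP:
		*proto = PROTO_IP;
		return 0;
	case GSO_TCPV6:
		*proto = PROTO_IPV6;
		return 0;
	default:
		return -EINVAL;
	}
}
```

Both call sites in the patch ignore the return value, so a header without GSO information simply leaves the previously set protocol in place.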
can not sync master interface when bond1 box connected with another bond1 box
Hi guys,

I ran into this problem while using bonding in mode 1 (active-backup). I have two Linux boxes, each with two NICs, and on each box I bonded the two NICs in mode 1. I then connected the two boxes to each other.

Sometimes Box A selects eth0 as active and eth1 as backup while, at the same moment, Box B selects eth1 as active and eth0 as backup. Box A's eth0 is cabled to Box B's eth0, so the two active interfaces are not connected to each other. The boxes lose connectivity and cannot recover until I reboot or re-plug the cables.

How can I make two Linux boxes, both using bonding mode 1, stay reliably connected to each other?

Thanks.
Re: [PATCH V2 net-next 2/5] net: Introduce a new MII time stamping interface.
On Sun, Oct 07, 2018 at 09:54:00PM +0200, Andrew Lunn wrote:
> Sure, but things have moved on since then.

I was curious about this. Based on your use cases, I guess that you
mean phylib? But not much has changed AFAICT. (There is one new
global function and two were removed, but that doesn't change the
picture WRT time stamping.)

Phylink now has two or three new users, one of which is dsa. Is that
the big move?

The situation with MACs that handle their own PHYs without phylib is
unchanged, AFAICT.

So what exactly do you mean?

Thanks,
Richard
[PATCH] net: vhost: remove bad code line
From: Tonghao Zhang

Signed-off-by: Tonghao Zhang
---
 drivers/vhost/net.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 015abf3..ab11b2b 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -562,7 +562,6 @@ static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
 	if (r == tvq->num && tvq->busyloop_timeout) {
 		/* Flush batched packets first */
 		if (!vhost_sock_zcopy(tvq->private_data))
-			// vhost_net_signal_used(tnvq);
 			vhost_tx_batch(net, tnvq, tvq->private_data, msghdr);
 
 		vhost_net_busy_poll(net, rvq, tvq, busyloop_intr, false);
-- 
1.8.3.1
Re: [PATCH net-next 19/20] net: Update netconf dump handlers for strict data checking
On 10/7/18 4:53 AM, Christian Brauner wrote:
>> @@ -2076,6 +2077,21 @@ static int inet_netconf_dump_devconf(struct sk_buff
>> *skb,
>>  	struct in_device *in_dev;
>>  	struct hlist_head *head;
>>
>> +	if (cb->strict_check) {
>> +		struct netlink_ext_ack *extack = cb->extack;
>> +		struct netconfmsg *ncm;
>> +
>> +		if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ncm))) {
>> +			NL_SET_ERR_MSG(extack, "Invalid header");
>> +			return -EINVAL;
>> +		}
>> +
>> +		if (nlh->nlmsg_len != nlmsg_msg_size(sizeof(*ncm))) {
>> +			NL_SET_ERR_MSG(extack, "Invalid data after header");
>> +			return -EINVAL;
>> +		}
>
> Hm, I think this could just be one branch with !=
> But if you've done this to report back a more meaningful error message
> to userspace, fine. :)

Consistency with other dump handlers and better userspace error
messages. If netconf ever gets a filter the length check is removed in
favor of nlmsg_parse_strict.
Re: [PATCH net-next 15/20] net/neighbor: Update neightbl_dump_info for strict data checking
On 10/7/18 4:48 AM, Christian Brauner wrote:
>> +
>>  static int neightbl_dump_info(struct sk_buff *skb, struct netlink_callback
>> *cb)
>>  {
>> +	const struct nlmsghdr *nlh = cb->nlh;
>>  	struct net *net = sock_net(skb->sk);
>>  	int family, tidx, nidx = 0;
>>  	int tbl_skip = cb->args[0];
>>  	int neigh_skip = cb->args[1];
>>  	struct neigh_table *tbl;
>>
>> -	family = ((struct rtgenmsg *) nlmsg_data(cb->nlh))->rtgen_family;
>> +	if (cb->strict_check) {
>> +		int err = neightbl_valid_dump_info(nlh, cb->extack);
>> +
>> +		if (err)
>> +			return err;
>> +	}
>> +
>> +	family = ((struct rtgenmsg *)nlmsg_data(nlh))->rtgen_family;
>
> So this already was a problem prior to your patch: what happens when you
> pass in the wrong struct? Then this case is not safe to do and might
> contain all kinds of crap.

'This case' meaning the above dereference? family is *always* the first
element in all of the header structs. It is core to the rtnetlink
processing.
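The point about the family field can be illustrated with a short sketch. The structs below are hypothetical user-space mirrors of `struct rtgenmsg`, `struct ifinfomsg`, `struct rtmsg` and `struct ndmsg`, written from memory of the uapi layouts, so treat the exact field lists as assumptions. What matters is that each one starts with a one-byte family.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of struct rtgenmsg. */
struct gen_hdr {
	uint8_t family;
};

/* Hypothetical mirror of struct ifinfomsg. */
struct link_hdr {
	uint8_t  family;
	uint8_t  pad;
	uint16_t type;
	int32_t  index;
	uint32_t flags;
	uint32_t change;
};

/* Hypothetical mirror of struct rtmsg. */
struct route_hdr {
	uint8_t  family;
	uint8_t  dst_len;
	uint8_t  src_len;
	uint8_t  tos;
	uint8_t  table;
	uint8_t  protocol;
	uint8_t  scope;
	uint8_t  type;
	uint32_t flags;
};

/* Hypothetical mirror of struct ndmsg. */
struct neigh_hdr {
	uint8_t  family;
	uint8_t  pad1;
	uint16_t pad2;
	int32_t  ifindex;
	uint16_t state;
	uint8_t  flags;
	uint8_t  type;
};

/*
 * Read the address family from a request payload without knowing
 * which header struct the sender used. The kernel code casts to
 * struct rtgenmsg; reading the first byte is equivalent here.
 */
static uint8_t request_family(const void *payload)
{
	return *(const uint8_t *)payload;
}
```

This only shows that the family read itself is layout-independent; it says nothing about whether the remaining bytes of a mismatched header are meaningful, which is what the strict checks address.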
Re: [PATCH net-next 12/20] rtnetlink: Update ipmr_rtm_dumplink for strict data checking
On 10/7/18 4:40 AM, Christian Brauner wrote:
>> @@ -2718,6 +2743,13 @@ static int ipmr_rtm_dumplink(struct sk_buff *skb,
>> struct netlink_callback *cb)
>>  	unsigned int e = 0, s_e;
>>  	struct mr_table *mrt;
>>
>> +	if (cb->strict_check) {
>> +		int err = ipmr_valid_dumplink(cb->nlh, cb->extack);
>> +
>> +		if (err)
>> +			return err;
>
> Nit: can we remove the unnecessary \n, please.

Coding standards dictate a newline between declarations and code. And
that is my preference too.
Re: [PATCH net-next 09/20] rtnetlink: Update rtnl_bridge_getlink for strict data checking
On 10/7/18 4:36 AM, Christian Brauner wrote:
>> +	if (cb->strict_check) {
>> +		struct ifinfomsg *ifm;
>>
>> -	extfilt = nlmsg_find_attr(cb->nlh, sizeof(struct ifinfomsg),
>> -				  IFLA_EXT_MASK);
>> -	if (extfilt) {
>> -		if (nla_len(extfilt) < sizeof(filter_mask))
>> -			return -EINVAL;
>> +		if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ifm))) {
>> +			NL_SET_ERR_MSG(extack, "Invalid header");
>> +			return -EINVAL;
>> +		}
>> +
>> +		ifm = nlmsg_data(nlh);
>> +		if (ifm->__ifi_pad || ifm->ifi_type || ifm->ifi_flags ||
>> +		    ifm->ifi_change || ifm->ifi_index) {
>> +			NL_SET_ERR_MSG(extack, "Invalid values in header for
>> dump request");
>> +			return -EINVAL;
>> +		}
>> +	}
>>
>> -		filter_mask = nla_get_u32(extfilt);
>> +	err = nlmsg_parse(nlh, sizeof(struct ifinfomsg), tb, IFLA_MAX,
>> +			  ifla_policy, extack);
>> +	if (err < 0) {
>> +		if (cb->strict_check)
>> +			return -EINVAL;
>> +		goto walk_entries;
>> +	}
>
> What's the point of moving this out of the
> if (cb->strict_check) {} branch above? This looks like it would cause
> the same parse warnings that we're trying to get rid of in inet{4,6}
> dumps.

Link messages don't have the problem in general because they use
ifinfomsg as the header - which is the one abused for other message
types.

That said ...

> Seems to make more sense to make the nlmsg_parse() itself conditional as
> well unless I'm lacking context.

... I now have nlmsg_parse and nlmsg_parse_strict.
Re: [PATCH net-next 08/20] rtnetlink: Update rtnl_dump_ifinfo for strict data checking
On 10/7/18 4:29 AM, Christian Brauner wrote: >> I thought about that, but there is so much overlap - they are mostly >> common. Besides, ifinfomsg is the header for link dumps, and ifinfomsg >> is the one that has been (ab)used for other message types, so strict >> versus lenient does not really have a differentiator for this message >> type - other than checking the elements of the struct. > > It's mostly about the function being extremely long and convoluted. > Having parts moved out into (a) descriptive helper(s) with whatever name > might make this way more readable than it is now especially with the new > handling we need for strict checking. > understood. In the next version I have pushed most of the checking into helpers.
Re: [PATCH V2 net-next 2/5] net: Introduce a new MII time stamping interface.
On Sun, Oct 07, 2018 at 02:07:28PM -0700, Richard Cochran wrote: > On Sun, Oct 07, 2018 at 01:59:06PM -0700, Richard Cochran wrote: > > On Sun, Oct 07, 2018 at 09:54:00PM +0200, Andrew Lunn wrote: > > > 1) phylink, not phdev. We have been pushing some MAC drivers towards > > > phylink, especially those which support >1Gbp. > > > > If a phylink device appears that wants time stamping, can't we add the > > call to register_mii_timestamper()? > > Actually, I see that 'struct phylink' has a 'struct phy_device *phydev', > and so it can implement the 'struct mii_timestamper' interface directly. Maybe. But you still don't have skb->dev->phydev. And phylink->phydev is much more dynamic, since it can be hot-{un}plugged. You need to handle it going away at any time. However, your timestamper is unlikely to be hot-{un}pluggable. So skb->dev->mii_timestamper seems a lot safer. Andrew
Re: [PATCH V2 net-next 2/5] net: Introduce a new MII time stamping interface.
On Sun, Oct 07, 2018 at 11:14:05PM +0200, Andrew Lunn wrote: > The problem is you depend on skbuf->dev->phydev. phydev will be NULL. > net_device does not currently have a phylink member. Even if it did, > you end up add more and more tests looking every place a > mii_timestamper could be placed. Ok, so the way to do this is to have something like CONFIG_NETWORK_PHYLINK_TIMESTAMPING. We can deal with that if and when any real devices appear. > I'm currently thinking register_mii_timestamper() should take a netdev > argument, and the net_device structure should gain a struct > mii_timestamper. > > But we have to look at the lifetime problems. A phydev does not know > what netdev it is associated to until phy_connect() is called. It is > at that point you can call register_mii_timestamper(). Right, IOW passing a netdev won't work. Thanks, Richard
Re: [PATCH V2 net-next 2/5] net: Introduce a new MII time stamping interface.
On Sun, Oct 07, 2018 at 01:59:06PM -0700, Richard Cochran wrote: > On Sun, Oct 07, 2018 at 09:54:00PM +0200, Andrew Lunn wrote: > > Sure, but things have moved on since then. > > If you have a specific suggestion on how to better implement this, > please tell us what it is. > > > I can think of three obvious use cases where this does not work: > > > > 1) phylink, not phdev. We have been pushing some MAC drivers towards > > phylink, especially those which support >1Gbp. > > If a phylink device appears that wants time stamping, can't we add the > call to register_mii_timestamper()? Hi Richard The problem is you depend on skbuf->dev->phydev. phydev will be NULL. net_device does not currently have a phylink member. Even if it did, you end up add more and more tests looking every place a mii_timestamper could be placed. > > 2) When an SFP is connected to the MAC, not a copper PHY. The class of > > device you are adding a driver for will work just as well for an SFP > > as for a copper PHY. The SERDES interface remains the same, > > independent of if a copper PHY is used, or a SFP. But an SFP does not > > have an instance of a phydv. > > Well, as I said before in v1, CONFIG_NETWORK_PHY_TIMESTAMPING depends > on phylib, plain and simple, and expanding beyond phylib is not within > the scope of the this series. True. But we also should be forward looking, to make sure we are not heading into a dead end. I'm currently thinking register_mii_timestamper() should take a netdev argument, and the net_device structure should gain a struct mii_timestamper. But we have to look at the lifetime problems. A phydev does not know what netdev it is associated to until phy_connect() is called. It is at that point you can call register_mii_timestamper(). Andrew
Re: [PATCH V2 net-next 2/5] net: Introduce a new MII time stamping interface.
On Sun, Oct 07, 2018 at 01:59:06PM -0700, Richard Cochran wrote: > On Sun, Oct 07, 2018 at 09:54:00PM +0200, Andrew Lunn wrote: > > 1) phylink, not phdev. We have been pushing some MAC drivers towards > > phylink, especially those which support >1Gbp. > > If a phylink device appears that wants time stamping, can't we add the > call to register_mii_timestamper()? Actually, I see that 'struct phylink' has a 'struct phy_device *phydev', and so it can implement the 'struct mii_timestamper' interface directly. Thanks, Richard
Re: [PATCH V2 net-next 2/5] net: Introduce a new MII time stamping interface.
On Sun, Oct 07, 2018 at 09:54:00PM +0200, Andrew Lunn wrote: > Sure, but things have moved on since then. If you have a specific suggestion on how to better implement this, please tell us what it is. > I can think of three obvious use cases where this does not work: > > 1) phylink, not phdev. We have been pushing some MAC drivers towards > phylink, especially those which support >1Gbp. If a phylink device appears that wants time stamping, can't we add the call to register_mii_timestamper()? > 2) When an SFP is connected to the MAC, not a copper PHY. The class of > device you are adding a driver for will work just as well for an SFP > as for a copper PHY. The SERDES interface remains the same, > independent of if a copper PHY is used, or a SFP. But an SFP does not > have an instance of a phydv. Well, as I said before in v1, CONFIG_NETWORK_PHY_TIMESTAMPING depends on phylib, plain and simple, and expanding beyond phylib is not within the scope of the this series. > 3) Firmware controlled PHYs. phylib/phylink is not used, the MAC turns > all ethtool calls into RPCs to the firmware. I've no numbers about > this, but i have the feeling this is becoming more popular. It does > however tend to be high end devices, and those are more likely to have > timestamping in the MAC. I suppose they could also offload > tomestamping to the firmware, in which case, they might want to make > use of this new API. Any MAC with private PHY stuff (that doesn't use phylib) can implement SO_TIMESTAMPING directly, as if it were a MAC. Thanks, Richard
Re: [PATCH V2 net-next 4/5] net: mdio: of: Register discovered MII time stampers.
On Sun, Oct 07, 2018 at 12:26:27PM -0700, Richard Cochran wrote: > On Sun, Oct 07, 2018 at 08:17:54PM +0200, Andrew Lunn wrote: > > > > + if (err == -ENOENT) > > > > + return NULL; > > > > + else if (err) > > > > + return ERR_PTR(err); > > > > + > > > > + if (args.args_count >= 1) > > > > + port = args.args[0]; > > > > > > If it's greater than one, than it is an error, and it should be flagged > > > as such. > > > > > > The idea looks good though, should of_find_mii_timestamper() somehow be > > > made conditional to CONFIG_PTP and we should have a stub for when it is > > > disabled? > > > > Hi Florian > > > > There already is a stub. But register return -EOPNOTSUPP. > > The stub returns NULL... Ah, sorry, it is register_mii_tstamp_controller() which return -EOPNOTSUP. Andrew
Re: [PATCH V2 net-next 2/5] net: Introduce a new MII time stamping interface.
On Sun, Oct 07, 2018 at 12:15:51PM -0700, Richard Cochran wrote:
> On Sun, Oct 07, 2018 at 08:27:51PM +0200, Andrew Lunn wrote:
> > The mii_timestamper is generic, in the same why hwmon is generic. It
> > does not matter where the time stamper is. So i'm wondering if we
> > should remove the special case for a PHY timestamper, remove all the
> > phylib support, etc.
>
> This implementation is (to the best of my understanding) what you were
> asking for in your review of v1:

Sure, but things have moved on since then.

> > So i really think you need to cleanly integrate into phylib and
> > phylink.
>
> > Use a phandle, and have
> > of_mdiobus_register_phy() follow the phandle to get the device.
>
> > To keep lifecycle issues simple, i would also keep it in phydev, not
> > netdev.
>
> This present series is a reasonable, incremental improvement to the
> existing PHY time stamping support. It will handle any use case that
> I can think of, and I would like to avoid over-engineering this.

I can think of three obvious use cases where this does not work:

1) phylink, not phydev. We have been pushing some MAC drivers towards
   phylink, especially those which support >1Gbps.

2) When an SFP is connected to the MAC, not a copper PHY. The class of
   device you are adding a driver for will work just as well for an
   SFP as for a copper PHY. The SERDES interface remains the same,
   independent of whether a copper PHY or an SFP is used. But an SFP
   does not have an instance of a phydev.

2a) An SFP which is actually a copper PHY. There is a phydev for this,
    but it is associated to the phylink, not the netdev.

3) Firmware controlled PHYs. phylib/phylink is not used, the MAC turns
   all ethtool calls into RPCs to the firmware. I've no numbers about
   this, but i have the feeling this is becoming more popular. It does
   however tend to be high end devices, and those are more likely to
   have timestamping in the MAC. I suppose they could also offload
   timestamping to the firmware, in which case, they might want to
   make use of this new API.

	Andrew
Re: [PATCH V2 net-next 0/5] Peer to Peer One-Step time stamping
On Sun, Oct 07, 2018 at 10:38:18AM -0700, Richard Cochran wrote: > Changed in v2: > ~~ > - Per the v1 review, changed the modeling of MII time stamping > devices. They are no longer a kind of mdio device. Forgot to add: - Added method to callback into the driver after changes in link status. Thanks, Richard
Re: [PATCH V2 net-next 4/5] net: mdio: of: Register discovered MII time stampers.
On Sun, Oct 07, 2018 at 08:17:54PM +0200, Andrew Lunn wrote:
> > > +	if (err == -ENOENT)
> > > +		return NULL;
> > > +	else if (err)
> > > +		return ERR_PTR(err);
> > > +
> > > +	if (args.args_count >= 1)
> > > +		port = args.args[0];
> >
> > If it's greater than one, than it is an error, and it should be flagged
> > as such.
> >
> > The idea looks good though, should of_find_mii_timestamper() somehow be
> > made conditional to CONFIG_PTP and we should have a stub for when it is
> > disabled?
>
> Hi Florian
>
> There already is a stub. But register return -EOPNOTSUPP.

The stub returns NULL...

> > > +	return register_mii_timestamper(args.np, port);
>
> So this returns EOPNOTUP

NULL...

> > >  static int of_mdiobus_register_phy(struct mii_bus *mdio,
> > >  				   struct device_node *child, u32 addr)
> > >  {
> > > +	struct mii_timestamper *mii_ts;
> > >  	struct phy_device *phy;
> > >  	bool is_c45;
> > >  	int rc;
> > >  	u32 phy_id;
> > >
> > > +	mii_ts = of_find_mii_timestamper(child);
> > > +	if (IS_ERR(mii_ts))
> > > +		return PTR_ERR(mii_ts);
> > > +
>
> and this returns EOPNOPTSUPP, so the PHY is not registered :-(

and the phydev.mii_ts field is then set to NULL.

Thanks,
Richard
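For readers following the NULL versus ERR_PTR exchange above, here is a minimal user-space sketch of the convention being discussed. It is not kernel code: ERR_PTR(), PTR_ERR() and IS_ERR() are simplified re-implementations, and find_timestamper_stub(), find_timestamper_broken() and register_phy() are hypothetical names made up for illustration. It only shows that a stub returning NULL lets PHY registration proceed with no time stamper, while an ERR_PTR value aborts it.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified user-space stand-ins for the kernel helpers (assumption:
 * these mimic the idea, they are not the kernel's own definitions). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error codes live in the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct mii_timestamper {
	int port;
};

/* Hypothetical stub for the "feature compiled out" case: no time stamper. */
static struct mii_timestamper *find_timestamper_stub(void)
{
	return NULL;
}

/* Hypothetical failing lookup, e.g. a malformed device tree reference. */
static struct mii_timestamper *find_timestamper_broken(void)
{
	return ERR_PTR(-EINVAL);
}

/* Same shape as the check quoted above: only an ERR_PTR aborts the
 * registration, NULL simply means "this PHY has no time stamper". */
static int register_phy(struct mii_timestamper *mii_ts,
			struct mii_timestamper **phy_mii_ts)
{
	if (IS_ERR(mii_ts))
		return (int)PTR_ERR(mii_ts);

	*phy_mii_ts = mii_ts;	/* may legitimately be NULL */
	return 0;
}
```

With the stub, register_phy() succeeds and leaves the PHY's time stamper pointer NULL, which matches Richard's description.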
Re: [PATCH V2 net-next 4/5] net: mdio: of: Register discovered MII time stampers.
On Sun, Oct 07, 2018 at 11:14:38AM -0700, Florian Fainelli wrote:
> There appears to be a binding document missing to describe what a
> timerstamper provider is. Using a more specific name than
> "#phandle-cells" is preferred when dealing with specific devices, e.g:
>
> interrupt-controller/#interrupt-cells
> clocks/#clock-cells

Sure.

> So I would go with #timestamp-cells here, and define what the cell sie
> and format should be in a separate "dt-bindings" prefixed patch that the
> Device Tree folks can also comment on.

I documented this in the last patch. I didn't see any example in our
device tree that explains a "reference" like this that is not
connected to a specific node type.

> > +	if (err == -ENOENT)
> > +		return NULL;
> > +	else if (err)
> > +		return ERR_PTR(err);
> > +
> > +	if (args.args_count >= 1)
> > +		port = args.args[0];
>
> If it's greater than one, than it is an error, and it should be flagged
> as such.

I wanted to allow specific MII time stamping drivers to use more than
one value in the future, should the need arise.

> The idea looks good though, should of_find_mii_timestamper() somehow be
> made conditional to CONFIG_PTP and we should have a stub for when it is
> disabled?

Do you mean CONFIG_NETWORK_PHY_TIMESTAMPING? There is a stub for
that.

Thanks,
Richard
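The two positions on args_count above can be compared side by side. What follows is a user-space sketch only: struct phandle_args is a simplified stand-in for the kernel's struct of_phandle_args, and port_lenient() and port_strict() are hypothetical helpers, not functions from the patch.

```c
#include <errno.h>

#define EX_MAX_ARGS 16

/* Simplified stand-in for struct of_phandle_args (assumption). */
struct phandle_args {
	int args_count;
	unsigned int args[EX_MAX_ARGS];
};

/* Lenient variant, as in the patch: take the first cell as the port
 * and ignore any extra cells, leaving room for future drivers. */
static int port_lenient(const struct phandle_args *a, unsigned int *port)
{
	*port = a->args_count >= 1 ? a->args[0] : 0;
	return 0;
}

/* Strict variant, as suggested in review: more than one cell is
 * treated as a malformed reference and rejected. */
static int port_strict(const struct phandle_args *a, unsigned int *port)
{
	if (a->args_count > 1)
		return -EINVAL;

	*port = a->args_count == 1 ? a->args[0] : 0;
	return 0;
}
```

The trade-off is the usual one: the strict form catches device tree mistakes early, while the lenient form avoids a core change if a later driver needs a second cell.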
Re: [PATCH V2 net-next 2/5] net: Introduce a new MII time stamping interface.
On Sun, Oct 07, 2018 at 08:27:51PM +0200, Andrew Lunn wrote: > The mii_timestamper is generic, in the same why hwmon is generic. It > does not matter where the time stamper is. So i'm wondering if we > should remove the special case for a PHY timestamper, remove all the > phylib support, etc. This implementation is (to the best of my understanding) what you were asking for in your review of v1: > So i really think you need to cleanly integrate into phylib and > phylink. > Use a phandle, and have > of_mdiobus_register_phy() follow the phandle to get the device. > To keep lifecycle issues simple, i would also keep it in phydev, not > netdev. This present series is a reasonable, incremental improvement to the existing PHY time stamping support. It will handle any use case that I can think of, and I would like to avoid over-engineering this. Thanks, Richard
Re: [PATCH rdma-next 3/4] IB/mlx5: Verify that driver supports user flags
On Sun, Oct 07, 2018 at 12:03:36PM +0300, Leon Romanovsky wrote: > From: Yonatan Cohen > > Flags sent down from user might not be supported by > running driver. > This might lead to unwanted bugs. > To solve this, added macro to test for unsupported flags. > > Signed-off-by: Yonatan Cohen > Signed-off-by: Leon Romanovsky > drivers/infiniband/hw/mlx5/qp.c | 12 > 1 file changed, 12 insertions(+) > > diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c > index bae48bdf281c..17c4b6641933 100644 > +++ b/drivers/infiniband/hw/mlx5/qp.c > @@ -1728,6 +1728,15 @@ static void configure_requester_scat_cqe(struct > mlx5_ib_dev *dev, > MLX5_SET(qpc, qpc, cs_req, MLX5_REQ_SCAT_DATA32_CQE); > } > > +#define MLX5_QP_CREATE_FLAGS_NOT_SUPPORTED(flags) \ > + ((flags) & ~(\ This needs a cast, it would be better to add something like the check comp mask function in rdma-core than this goofy macro thing. Jason
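As a rough illustration of the function-style check suggested above, here is a user-space sketch. The helper name flags_all_supported() and the EX_QP_* flag bits are hypothetical and are not taken from mlx5 or rdma-core. Taking both arguments as 64-bit values means the complement is computed at full width, which is the kind of problem a missing cast in a macro can hide.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define EX_QP_FLAG_A		(UINT64_C(1) << 0)
#define EX_QP_FLAG_B		(UINT64_C(1) << 1)
#define EX_QP_FLAGS_SUPPORTED	(EX_QP_FLAG_A | EX_QP_FLAG_B)

/* Returns true when 'input' has no bits set outside 'supported'.
 * Both parameters are 64-bit, so ~supported covers the high bits too. */
static inline bool flags_all_supported(uint64_t input, uint64_t supported)
{
	return (input & ~supported) == 0;
}
```

A caller would reject the request, for example with -EOPNOTSUPP, whenever flags_all_supported(user_flags, EX_QP_FLAGS_SUPPORTED) is false.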
Re: [PATCH V2 net-next 2/5] net: Introduce a new MII time stamping interface.
On 10/7/2018 11:27 AM, Andrew Lunn wrote:
> On Sun, Oct 07, 2018 at 10:38:20AM -0700, Richard Cochran wrote:
> > Currently the stack supports time stamping in PHY devices. However,
> > there are newer, non-PHY devices that can snoop an MII bus and provide
> > time stamps. In order to support such devices, this patch introduces
> > a new interface to be used by both PHY and non-PHY devices.
> >
> > In addition, the one and only user of the old PHY time stamping API is
> > converted to the new interface.
>
> Hi Richard
>
> I'm a bit undecided about this. If you look at how we do HWMON sensors
> in PHYs, the probe function just registers with the HWMON subsystem.
> We don't have any support in phy_device, or anywhere else in the PHY
> core.
>
> The mii_timestamper is generic, in the same why hwmon is generic. It
> does not matter where the time stamper is. So i'm wondering if we
> should remove the special case for a PHY timestamper, remove all the
> phylib support, etc.
>
> I need to look at the other patches and see how this all fits
> together.

Agreed, the fact that some PHYs are capable of timestamping and register
themselves as a timestamper makes sense. Whether this needs to be baked
into the core PHYLIB might have been something convenient at some point,
but maybe we can revisit that paradigm now that there is a more generic
timestamping provider framework being proposed here.
--
Florian
Re: [PATCH V2 net-next 2/5] net: Introduce a new MII time stamping interface.
On Sun, Oct 07, 2018 at 10:38:20AM -0700, Richard Cochran wrote:
> Currently the stack supports time stamping in PHY devices. However,
> there are newer, non-PHY devices that can snoop an MII bus and provide
> time stamps. In order to support such devices, this patch introduces
> a new interface to be used by both PHY and non-PHY devices.
>
> In addition, the one and only user of the old PHY time stamping API is
> converted to the new interface.

Hi Richard

I'm a bit undecided about this. If you look at how we do HWMON sensors
in PHYs, the probe function just registers with the HWMON subsystem.
We don't have any support in phy_device, or anywhere else in the PHY
core.

The mii_timestamper is generic, in the same way hwmon is generic. It
does not matter where the time stamper is. So i'm wondering if we
should remove the special case for a PHY timestamper, remove all the
phylib support, etc.

I need to look at the other patches and see how this all fits
together.

	Andrew
Re: [PATCH V2 net-next 4/5] net: mdio: of: Register discovered MII time stampers.
On Sun, Oct 07, 2018 at 10:38:22AM -0700, Richard Cochran wrote: > When parsing a PHY node, register its time stamper, if any, and attach > the instance to the PHY device. Hi Richard This does look a lot better. Thanks for making the changes. Andrew
Re: [PATCH V2 net-next 4/5] net: mdio: of: Register discovered MII time stampers.
> > +	if (err == -ENOENT)
> > +		return NULL;
> > +	else if (err)
> > +		return ERR_PTR(err);
> > +
> > +	if (args.args_count >= 1)
> > +		port = args.args[0];
>
> If it's greater than one, than it is an error, and it should be flagged
> as such.
>
> The idea looks good though, should of_find_mii_timestamper() somehow be
> made conditional to CONFIG_PTP and we should have a stub for when it is
> disabled?

Hi Florian

There already is a stub. But register returns -EOPNOTSUPP.

> > +
> > +	return register_mii_timestamper(args.np, port);

So this returns -EOPNOTSUPP

> > +}
> > +
> >  static int of_mdiobus_register_phy(struct mii_bus *mdio,
> >  				   struct device_node *child, u32 addr)
> >  {
> > +	struct mii_timestamper *mii_ts;
> >  	struct phy_device *phy;
> >  	bool is_c45;
> >  	int rc;
> >  	u32 phy_id;
> >
> > +	mii_ts = of_find_mii_timestamper(child);
> > +	if (IS_ERR(mii_ts))
> > +		return PTR_ERR(mii_ts);
> > +

and this returns -EOPNOTSUPP, so the PHY is not registered :-(

	Andrew
Re: [PATCH v2 2/2] netdev/phy: add MDIO bus multiplexer driven by a regmap
On 10/07/18 11:24, Pankaj Bansal wrote: > Add support for an MDIO bus multiplexer controlled by a regmap > device, like an FPGA. > > Tested on a NXP LX2160AQDS board which uses the "QIXIS" FPGA > attached to the i2c bus. > > Signed-off-by: Pankaj Bansal > --- > > Notes: > V2: > - replaced be32_to_cpup with of_property_read_u32 > - incorporated Andrew's comments > > drivers/net/phy/Kconfig | 13 +++ > drivers/net/phy/Makefile | 1 + > drivers/net/phy/mdio-mux-regmap.c | 171 > 3 files changed, 185 insertions(+) > > diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig > index 82070792edbb..d1ac9e70cbb2 100644 > --- a/drivers/net/phy/Kconfig > +++ b/drivers/net/phy/Kconfig > @@ -87,6 +87,19 @@ config MDIO_BUS_MUX_MMIOREG > > Currently, only 8/16/32 bits registers are supported. > > +config MDIO_BUS_MUX_REGMAP > + tristate "REGMAP controlled MDIO bus multiplexers" > + depends on OF_MDIO && REGMAP > + select MDIO_BUS_MUX > + help > + This module provides a driver for MDIO bus multiplexers that > + are controlled via a regmap device, like an FPGA connected to i2c. > + The multiplexer connects one of several child MDIO busses to a > + parent bus.Child bus selection is under the control of one of > + the FPGA's registers. > + > + Currently, only upto 32 bits registers are supported. 
> + > config MDIO_CAVIUM > tristate > > diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile > index 5805c0b7d60e..33053f9f320d 100644 > --- a/drivers/net/phy/Makefile > +++ b/drivers/net/phy/Makefile > @@ -29,6 +29,7 @@ obj-$(CONFIG_MDIO_BUS_MUX) += mdio-mux.o > obj-$(CONFIG_MDIO_BUS_MUX_BCM_IPROC) += mdio-mux-bcm-iproc.o > obj-$(CONFIG_MDIO_BUS_MUX_GPIO) += mdio-mux-gpio.o > obj-$(CONFIG_MDIO_BUS_MUX_MMIOREG) += mdio-mux-mmioreg.o > +obj-$(CONFIG_MDIO_BUS_MUX_REGMAP) += mdio-mux-regmap.o > obj-$(CONFIG_MDIO_CAVIUM)+= mdio-cavium.o > obj-$(CONFIG_MDIO_GPIO) += mdio-gpio.o > obj-$(CONFIG_MDIO_HISI_FEMAC)+= mdio-hisi-femac.o > diff --git a/drivers/net/phy/mdio-mux-regmap.c > b/drivers/net/phy/mdio-mux-regmap.c > new file mode 100644 > index ..6068d05a728a > --- /dev/null > +++ b/drivers/net/phy/mdio-mux-regmap.c > @@ -0,0 +1,171 @@ > +// SPDX-License-Identifier: GPL-2.0+ > + > +/* Simple regmap based MDIO MUX driver > + * > + * Copyright 2018 NXP > + * > + * Based on mdio-mux-mmioreg.c by Timur Tabi > + * > + * Author: > + * Pankaj Bansal > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +struct mdio_mux_regmap_state { > + void*mux_handle; > + struct regmap *regmap; > + u32 mux_reg; > + u32 mask; > +}; > + > +/* MDIO multiplexing switch function > + * > + * This function is called by the mdio-mux layer when it thinks the mdio bus > + * multiplexer needs to switch. > + * > + * 'current_child' is the current value of the mux register (masked via > + * s->mask). > + * > + * 'desired_child' is the value of the 'reg' property of the target child > MDIO > + * node. > + * > + * The first time this function is called, current_child == -1. > + * > + * If current_child == desired_child, then the mux is already set to the > + * correct bus. 
> + */
> +static int mdio_mux_regmap_switch_fn(int current_child, int desired_child,
> +				     void *data)
> +{
> +	struct mdio_mux_regmap_state *s = data;
> +	bool change;
> +	int ret;
> +
> +	ret = regmap_update_bits_check(s->regmap,
> +				       s->mux_reg,
> +				       s->mask,
> +				       desired_child,
> +				       &change);
> +
> +	if (ret)
> +		return ret;
> +	if (change)
> +		pr_debug("%s %d -> %d\n", __func__, current_child,
> +			 desired_child);

If you add a struct platform_device or struct device reference to struct
mdio_mux_regmap_state, then you can use dev_dbg() here with the correct
device, which would be helpful if you are debugging problems and there
is more than one instance of them in the system.

> +	return ret;
> +}
> +
> +static int mdio_mux_regmap_probe(struct platform_device *pdev)
> +{
> +	struct device_node *np2, *np = pdev->dev.of_node;

How about naming "np2" "child" instead?

Everything else looks fine to me, thanks!
--
Florian
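For anyone unfamiliar with the masked update that the switch function relies on, the behaviour can be modelled in user space roughly as below. This is a toy model under stated assumptions: update_bits_check() is a hypothetical helper operating on a plain 32-bit variable, not the regmap API, and it only mirrors the idea of "write the masked field and report whether anything changed".

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a masked register update (assumption: a plain variable
 * stands in for the FPGA register, no bus access is involved). */
static int update_bits_check(uint32_t *reg, uint32_t mask, uint32_t val,
			     bool *change)
{
	uint32_t old = *reg;
	/* Keep the bits outside the mask, replace the bits inside it. */
	uint32_t updated = (old & ~mask) | (val & mask);

	*change = (updated != old);
	*reg = updated;
	return 0;
}
```

With a mask of 0x07, as in the binding example, selecting child 2 only touches bits 0 to 2 of the register, and a second call with the same child reports no change, which is when the debug message would be skipped.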
Re: [PATCH v2 1/2] dt-bindings: net: add MDIO bus multiplexer driven by a regmap device
On 10/07/18 11:24, Pankaj Bansal wrote: > Add support for an MDIO bus multiplexer controlled by a regmap > device, like an FPGA. > > Tested on a NXP LX2160AQDS board which uses the "QIXIS" FPGA > attached to the i2c bus. > > Signed-off-by: Pankaj Bansal > --- > > Notes: > V2: > - Fixed formatting error caused by using space instead of tab > - Add newline between property list and subnode > - Add newline between two subnodes > > .../bindings/net/mdio-mux-regmap.txt | 95 ++ > 1 file changed, 95 insertions(+) > > diff --git a/Documentation/devicetree/bindings/net/mdio-mux-regmap.txt > b/Documentation/devicetree/bindings/net/mdio-mux-regmap.txt > new file mode 100644 > index ..88ebac26c1c5 > --- /dev/null > +++ b/Documentation/devicetree/bindings/net/mdio-mux-regmap.txt > @@ -0,0 +1,95 @@ > +Properties for an MDIO bus multiplexer controlled by a regmap > + > +This is a special case of a MDIO bus multiplexer. A regmap device, > +like an FPGA, is used to control which child bus is connected. The mdio-mux > +node must be a child of the device that is controlled by a regmap. > +The driver currently only supports devices with upto 32-bit registers. I would omit any sort of details about Linux constructs designed to support specific needs (e.g: regmap) as well as putting driver limitations into a binding document because it really ought to be restricted to describing hardware. > + > +Required properties in addition to the generic multiplexer properties: > + > +- compatible : string, must contain "mdio-mux-regmap" > + > +- reg : integer, contains the offset of the register that controls the bus > + multiplexer. it can be 32 bit number. Technically it could be any "reg" property size, the only requirement is that it must be of the appropriate size with respect to what the parent node of this "mdio-mux-regmap" node has, determined by #address-cells/#size-cells width. 
> + > +- mux-mask : integer, contains an 32 bit mask that specifies which > + bits in the register control the actual bus multiplexer. The > + 'reg' property of each child mdio-mux node must be constrained by > + this mask. Same thing here. Since this is a MDIO mux, I would invite you to specify what the correct #address-cells/#size-cells values should be (1, and 0 respectively as your example correctly shows). > + > +Example: > + > +The FPGA node defines a i2c connected FPGA with a register space of 0x30 > bytes. > +For the "EMI2" MDIO bus, register 0x54 (BRDCFG4) controls the mux on that > bus. > +A bitmask of 0x07 means that bits 0, 1 and 2 (bit 0 is lsb) are the bits on > +BRDCFG4 that control the actual mux. > + > +i2c@200 { > + compatible = "fsl,vf610-i2c"; > + #address-cells = <1>; > + #size-cells = <0>; > + reg = <0x0 0x200 0x0 0x1>; > + interrupts = <0 34 0x4>; // Level high type > + clock-names = "i2c"; > + clocks = < 4 7>; > + fsl-scl-gpio = < 15 0>; > + status = "okay"; > + > + /* The FPGA node */ > + fpga@66 { > + compatible = "fsl,lx2160aqds-fpga", "fsl,fpga-qixis-i2c"; > + reg = <0x66>; > + #address-cells = <1>; > + #size-cells = <0>; > + > + mdio1_mux@54 { > + compatible = "mdio-mux-regmap", "mdio-mux"; > + mdio-parent-bus = <>; /* MDIO bus */ > + reg = <0x54>;/* BRDCFG4 */ > + mux-mask = <0x07>; /* EMI2_MDIO */ > + #address-cells=<1>; > + #size-cells = <0>; > + > + mdio1_ioslot1@0 { // Slot 1 > + reg = <0x00>; > + #address-cells = <1>; > + #size-cells = <0>; > + > + phy1@1 { > + reg = <1>; > + compatible = "ethernet-phy-id0210.7441"; > + }; > + > + phy1@0 { > + reg = <0>; > + compatible = "ethernet-phy-id0210.7441"; > + }; > + }; > + > + mdio1_ioslot2@1 { // Slot 2 > + reg = <0x01>; > + #address-cells = <1>; > + #size-cells = <0>; > + > + }; > + > + mdio1_ioslot3@2 { // Slot 3 > + reg = <0x02>; > + #address-cells = <1>; > + #size-cells = <0>; > + > + }; > + }; > + }; > +}; > + > + /* The parent MDIO bus. 
*/ > + emdio2: mdio@0x8B97000 { > + compatible = "fsl,fman-memac-mdio"; > + reg = <0x0 0x8B97000 0x0 0x1000>; > +
Re: [PATCH iproute2 net-next v3 0/6] Introduce the taprio scheduler
On 10/5/18 5:25 PM, Vinicius Costa Gomes wrote: > Hi, > ... > This is the iproute2 side of the taprio v1 series. > > Please see the kernel side cover letter for more information about how > to test this. > > Cheers, > -- > Vinicius > > Jesus Sanchez-Palencia (1): > libnetlink: Add helper for getting a __s32 from netlink msgs > > Vinicius Costa Gomes (5): > utils: Implement get_s64() > include: Add helper to retrieve a __s64 from a netlink msg > include: add definitions for taprio [DO NOT COMMIT] > tc: Add support for configuring the taprio scheduler > taprio: Add manpage for tc-taprio(8) > applied to iproute2-next. Thanks
[PATCH V2 net-next 0/5] Peer to Peer One-Step time stamping
Changed in v2: ~~ - Per the v1 review, changed the modeling of MII time stamping devices. They are no longer a kind of mdio device. This series adds support for PTP (IEEE 1588) P2P one-step time stamping along with a driver for a hardware device that supports this. If the hardware supports p2p one-step, it subtracts the ingress time stamp value from the Pdelay_Request correction field. The user space software stack then simply copies the correction field into the Pdelay_Response, and on transmission the hardware adds the egress time stamp into the correction field. - Patch 1 adds the new option. - Patches 2-4 adds support for MII time stamping in non-PHY devices. - Patch 5 adds a driver implementing the new option. User space support is available in the current linuxptp master branch. Thanks, Richard Richard Cochran (5): net: Introduce peer to peer one step PTP time stamping. net: Introduce a new MII time stamping interface. net: Add a layer for non-PHY MII time stamping drivers. net: mdio: of: Register discovered MII time stampers. ptp: Add a driver for InES time stamping IP core. 
Documentation/devicetree/bindings/ptp/ptp-ines.txt | 37 + drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 1 + drivers/net/phy/Makefile | 2 + drivers/net/phy/dp83640.c | 47 +- drivers/net/phy/mii_timestamper.c | 121 +++ drivers/net/phy/phy.c | 4 +- drivers/net/phy/phy_device.c | 5 + drivers/of/of_mdio.c | 26 + drivers/ptp/Kconfig| 10 + drivers/ptp/Makefile | 1 + drivers/ptp/ptp_ines.c | 870 + include/linux/mii_timestamper.h| 115 +++ include/linux/phy.h| 25 +- include/uapi/linux/net_tstamp.h| 8 + net/8021q/vlan_dev.c | 4 +- net/Kconfig| 7 +- net/core/dev_ioctl.c | 1 + net/core/ethtool.c | 4 +- net/core/timestamping.c| 20 +- 19 files changed, 1251 insertions(+), 57 deletions(-) create mode 100644 Documentation/devicetree/bindings/ptp/ptp-ines.txt create mode 100644 drivers/net/phy/mii_timestamper.c create mode 100644 drivers/ptp/ptp_ines.c create mode 100644 include/linux/mii_timestamper.h -- 2.11.0
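The correction field handling described in the cover letter can be traced with a small arithmetic sketch. This is an illustration only, with assumptions stated up front: times are plain signed nanoseconds (the real PTP correctionField is a scaled value), and the three helper names are hypothetical, chosen to mirror the three steps in the text.

```c
#include <stdint.h>

/* Step 1, hardware on receive: subtract the ingress time stamp from
 * the correction field of the incoming Pdelay_Request. */
static int64_t on_pdelay_req_ingress(int64_t correction, int64_t ingress_ts)
{
	return correction - ingress_ts;
}

/* Step 2, user space: copy the correction field unchanged into the
 * Pdelay_Response. */
static int64_t copy_to_pdelay_resp(int64_t req_correction)
{
	return req_correction;
}

/* Step 3, hardware on transmit: add the egress time stamp. */
static int64_t on_pdelay_resp_egress(int64_t correction, int64_t egress_ts)
{
	return correction + egress_ts;
}
```

Chaining the three steps leaves the original correction plus (egress - ingress) in the response, that is, the turnaround time inside the responder, without user space ever reading a time stamp.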
[PATCH V2 net-next 1/5] net: Introduce peer to peer one step PTP time stamping.
The 1588 standard defines one step operation for both Sync and
PDelay_Resp messages. Up until now, hardware with P2P one step has
been rare, and kernel support was lacking. This patch adds support of
the mode in anticipation of new hardware developments.

Signed-off-by: Richard Cochran
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 1 +
 include/uapi/linux/net_tstamp.h                  | 8 ++++++++
 net/core/dev_ioctl.c                             | 1 +
 3 files changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 40093d88353f..2cdbc16245c2 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -15369,6 +15369,7 @@ int bnx2x_configure_ptp_filters(struct bnx2x *bp)
 		       NIG_REG_P0_TLLH_PTP_RULE_MASK, 0x3EEE);
 		break;
 	case HWTSTAMP_TX_ONESTEP_SYNC:
+	case HWTSTAMP_TX_ONESTEP_P2P:
 		BNX2X_ERR("One-step timestamping is not supported\n");
 		return -ERANGE;
 	}
diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
index 97ff3c17ec4d..091441a4f78f 100644
--- a/include/uapi/linux/net_tstamp.h
+++ b/include/uapi/linux/net_tstamp.h
@@ -90,6 +90,14 @@ enum hwtstamp_tx_types {
 	 * queue.
 	 */
 	HWTSTAMP_TX_ONESTEP_SYNC,
+
+	/*
+	 * Same as HWTSTAMP_TX_ONESTEP_SYNC, but also enables time
+	 * stamp insertion directly into PDelay_Resp packets. In this
+	 * case, neither transmitted Sync nor PDelay_Resp packets will
+	 * receive a time stamp via the socket error queue.
+	 */
+	HWTSTAMP_TX_ONESTEP_P2P,
 };

 /* possible values for hwtstamp_config->rx_filter */
diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c
index 90e8aa36881e..8cdc13695909 100644
--- a/net/core/dev_ioctl.c
+++ b/net/core/dev_ioctl.c
@@ -187,6 +187,7 @@ static int net_hwtstamp_validate(struct ifreq *ifr)
 	case HWTSTAMP_TX_OFF:
 	case HWTSTAMP_TX_ON:
 	case HWTSTAMP_TX_ONESTEP_SYNC:
+	case HWTSTAMP_TX_ONESTEP_P2P:
 		tx_type_valid = 1;
 		break;
 	}
--
2.11.0
[PATCH V2 net-next 5/5] ptp: Add a driver for InES time stamping IP core.
The InES at the ZHAW offers a PTP time stamping IP core. The FPGA logic recognizes and time stamps PTP frames on the MII bus. This patch adds a driver for the core along with a device tree binding to allow hooking the driver to MII buses. Signed-off-by: Richard Cochran --- Documentation/devicetree/bindings/ptp/ptp-ines.txt | 37 + drivers/ptp/Kconfig| 10 + drivers/ptp/Makefile | 1 + drivers/ptp/ptp_ines.c | 870 + 4 files changed, 918 insertions(+) create mode 100644 Documentation/devicetree/bindings/ptp/ptp-ines.txt create mode 100644 drivers/ptp/ptp_ines.c diff --git a/Documentation/devicetree/bindings/ptp/ptp-ines.txt b/Documentation/devicetree/bindings/ptp/ptp-ines.txt new file mode 100644 index ..1484b62802c7 --- /dev/null +++ b/Documentation/devicetree/bindings/ptp/ptp-ines.txt @@ -0,0 +1,37 @@ +ZHAW InES PTP time stamping IP core + +The IP core needs two different kinds of nodes. The control node +lives somewhere in the memory map and specifies the address of the +control registers. There can be up to three port handles placed as +attributes of PHY nodes. These associate a particular MII bus with a +port index within the IP core. + +Required properties of the control node: + +- compatible: "ines,ptp-ctrl" +- reg: physical address and size of the register bank +- #phandle-cells: must be one (1) + +Required format of the port handle within the PHY node: + +- timestamper: provides control node reference and + the port channel within the IP core + +Example: + + tstamper: timestamper@6000 { + compatible = "ines,ptp-ctrl"; + reg = <0x6000 0x80>; + #phandle-cells = <1>; + }; + + ethernet@8000 { + ... + mdio { + ... + phy@3 { + ... + timestamper = < 0>; + }; + }; + }; diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig index d137c480db46..475aa6f32edd 100644 --- a/drivers/ptp/Kconfig +++ b/drivers/ptp/Kconfig @@ -88,6 +88,16 @@ config DP83640_PHY In order for this to work, your MAC driver must also implement the skb_tx_timestamp() function. 
+config PTP_1588_CLOCK_INES + tristate "ZHAW InES PTP time stamping IP core" + depends on NETWORK_PHY_TIMESTAMPING + depends on PHYLIB + depends on PTP_1588_CLOCK + help + This driver adds support for using the ZHAW InES 1588 IP + core. This clock is only useful if the MII bus of your MAC + is wired up to the core. + config PTP_1588_CLOCK_PCH tristate "Intel PCH EG20T as PTP clock" depends on X86_32 || COMPILE_TEST diff --git a/drivers/ptp/Makefile b/drivers/ptp/Makefile index 19efa9cfa950..15b656712897 100644 --- a/drivers/ptp/Makefile +++ b/drivers/ptp/Makefile @@ -6,6 +6,7 @@ ptp-y := ptp_clock.o ptp_chardev.o ptp_sysfs.o obj-$(CONFIG_PTP_1588_CLOCK) += ptp.o obj-$(CONFIG_PTP_1588_CLOCK_DTE) += ptp_dte.o +obj-$(CONFIG_PTP_1588_CLOCK_INES) += ptp_ines.o obj-$(CONFIG_PTP_1588_CLOCK_IXP46X)+= ptp_ixp46x.o obj-$(CONFIG_PTP_1588_CLOCK_PCH) += ptp_pch.o obj-$(CONFIG_PTP_1588_CLOCK_KVM) += ptp_kvm.o diff --git a/drivers/ptp/ptp_ines.c b/drivers/ptp/ptp_ines.c new file mode 100644 index ..a05b478aad38 --- /dev/null +++ b/drivers/ptp/ptp_ines.c @@ -0,0 +1,870 @@ +// SPDX-License-Identifier: GPL-2.0 +// +// Copyright (C) 2018 MOSER-BAER AG +// + +#define pr_fmt(fmt) "InES_PTP: " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +MODULE_DESCRIPTION("Driver for the ZHAW InES PTP time stamping IP core"); +MODULE_AUTHOR("Richard Cochran "); +MODULE_VERSION("1.0"); +MODULE_LICENSE("GPL"); + +/* GLOBAL register */ +#define MCAST_MAC_SELECT_SHIFT 2 +#define MCAST_MAC_SELECT_MASK 0x3 +#define IO_RESET BIT(1) +#define PTP_RESET BIT(0) + +/* VERSION register */ +#define IF_MAJOR_VER_SHIFT 12 +#define IF_MAJOR_VER_MASK 0xf +#define IF_MINOR_VER_SHIFT 8 +#define IF_MINOR_VER_MASK 0xf +#define FPGA_MAJOR_VER_SHIFT 4 +#define FPGA_MAJOR_VER_MASK0xf +#define FPGA_MINOR_VER_SHIFT 0 +#define FPGA_MINOR_VER_MASK0xf + +/* INT_STAT register */ +#define RX_INTR_STATUS_3 BIT(5) +#define 
RX_INTR_STATUS_2 BIT(4) +#define RX_INTR_STATUS_1 BIT(3) +#define TX_INTR_STATUS_3 BIT(2) +#define TX_INTR_STATUS_2 BIT(1) +#define TX_INTR_STATUS_1 BIT(0) + +/* INT_MSK register */ +#define RX_INTR_MASK_3 BIT(5) +#define RX_INTR_MASK_2 BIT(4) +#define RX_INTR_MASK_1
[PATCH V2 net-next 4/5] net: mdio: of: Register discovered MII time stampers.
When parsing a PHY node, register its time stamper, if any, and attach
the instance to the PHY device.

Signed-off-by: Richard Cochran
---
 drivers/net/phy/phy_device.c |  3 +++
 drivers/of/of_mdio.c         | 26 ++++++++++++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index a454432d166f..c24bce9b7270 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -833,6 +833,9 @@ EXPORT_SYMBOL(phy_device_register);
  */
 void phy_device_remove(struct phy_device *phydev)
 {
+	if (phydev->mii_ts)
+		unregister_mii_timestamper(phydev->mii_ts);
+
 	device_del(&phydev->mdio.dev);

 	/* Assert the reset signal */
diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c
index f76c10ecc616..7699f167e4a9 100644
--- a/drivers/of/of_mdio.c
+++ b/drivers/of/of_mdio.c
@@ -44,14 +44,38 @@ static int of_get_phy_id(struct device_node *device, u32 *phy_id)
 	return -EINVAL;
 }

+struct mii_timestamper *of_find_mii_timestamper(struct device_node *node)
+{
+	struct of_phandle_args args;
+	unsigned int port = 0;
+	int err;
+
+	err = of_parse_phandle_with_args(node, "timestamper",
+					 "#phandle-cells", 0, &args);
+	if (err == -ENOENT)
+		return NULL;
+	else if (err)
+		return ERR_PTR(err);
+
+	if (args.args_count >= 1)
+		port = args.args[0];
+
+	return register_mii_timestamper(args.np, port);
+}
+
 static int of_mdiobus_register_phy(struct mii_bus *mdio,
 				   struct device_node *child, u32 addr)
 {
+	struct mii_timestamper *mii_ts;
 	struct phy_device *phy;
 	bool is_c45;
 	int rc;
 	u32 phy_id;

+	mii_ts = of_find_mii_timestamper(child);
+	if (IS_ERR(mii_ts))
+		return PTR_ERR(mii_ts);
+
 	is_c45 = of_device_is_compatible(child,
 					 "ethernet-phy-ieee802.3-c45");

@@ -97,6 +121,8 @@ static int of_mdiobus_register_phy(struct mii_bus *mdio,
 		return rc;
 	}

+	phy->mii_ts = mii_ts;
+
 	dev_dbg(&mdio->dev, "registered phy %s at address %i\n",
 		child->name, addr);
 	return 0;
--
2.11.0
[PATCH V2 net-next 3/5] net: Add a layer for non-PHY MII time stamping drivers.
While PHY time stamping drivers can simply attach their interface
directly to the PHY instance, stand alone drivers require support in
order to manage their services. Non-PHY MII time stamping drivers
have a control interface over another bus like I2C, SPI, UART, or via
a memory mapped peripheral. The controller device will be associated
with one or more time stamping channels, each of which snoops in on
an MII bus.

This patch provides a glue layer that will enable time stamping
channels to find their controlling device.

Signed-off-by: Richard Cochran
---
 drivers/net/phy/Makefile | 2 +
 drivers/net/phy/mii_timestamper.c | 121 ++
 include/linux/mii_timestamper.h | 63
 net/Kconfig | 7 ++-
 4 files changed, 190 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/phy/mii_timestamper.c

diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 5805c0b7d60e..584c7c6f40e7 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -40,6 +40,8 @@ obj-$(CONFIG_MDIO_SUN4I) += mdio-sun4i.o
 obj-$(CONFIG_MDIO_THUNDER) += mdio-thunder.o
 obj-$(CONFIG_MDIO_XGENE) += mdio-xgene.o

+obj-$(CONFIG_NETWORK_PHY_TIMESTAMPING) += mii_timestamper.o
+
 obj-$(CONFIG_SFP) += sfp.o
 sfp-obj-$(CONFIG_SFP) += sfp-bus.o
 obj-y += $(sfp-obj-y) $(sfp-obj-m)
diff --git a/drivers/net/phy/mii_timestamper.c b/drivers/net/phy/mii_timestamper.c
new file mode 100644
index ..51b77fc92475
--- /dev/null
+++ b/drivers/net/phy/mii_timestamper.c
@@ -0,0 +1,121 @@
+// SPDX-License-Identifier: GPL-2.0
+//
+// Support for generic time stamping devices on MII buses.
+// Copyright (C) 2018 Richard Cochran
+//
+
+#include
+
+static LIST_HEAD(mii_timestamping_devices);
+static DEFINE_MUTEX(tstamping_devices_lock);
+
+struct mii_timestamping_desc {
+	struct list_head list;
+	struct mii_timestamping_ctrl *ctrl;
+	struct device *device;
+};
+
+/**
+ * register_mii_tstamp_controller() - registers an MII time stamping device.
+ *
+ * @device:	The device to be registered.
+ * @ctrl:	Pointer to device's control interface.
+ *
+ * Returns zero on success or non-zero on failure.
+ */
+int register_mii_tstamp_controller(struct device *device,
+				   struct mii_timestamping_ctrl *ctrl)
+{
+	struct mii_timestamping_desc *desc;
+
+	desc = kzalloc(sizeof(*desc), GFP_KERNEL);
+	if (!desc)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&desc->list);
+	desc->ctrl = ctrl;
+	desc->device = device;
+
+	mutex_lock(&tstamping_devices_lock);
+	list_add_tail(&mii_timestamping_devices, &desc->list);
+	mutex_unlock(&tstamping_devices_lock);
+
+	return 0;
+}
+
+/**
+ * unregister_mii_tstamp_controller() - unregisters an MII time stamping device.
+ *
+ * @device:	A device previously passed to register_mii_tstamp_controller().
+ */
+void unregister_mii_tstamp_controller(struct device *device)
+{
+	struct mii_timestamping_desc *desc;
+	struct list_head *this, *next;
+
+	mutex_lock(&tstamping_devices_lock);
+	list_for_each_safe(this, next, &mii_timestamping_devices) {
+		desc = list_entry(this, struct mii_timestamping_desc, list);
+		if (desc->device == device) {
+			list_del_init(&desc->list);
+			kfree(desc);
+			break;
+		}
+	}
+	mutex_unlock(&tstamping_devices_lock);
+}
+
+/**
+ * register_mii_timestamper - Enables a given port of an MII time stamper.
+ *
+ * @node:	The device tree node of the MII time stamp controller.
+ * @port:	The index of the port to be enabled.
+ *
+ * Returns a valid interface on success or ERR_PTR otherwise.
+ */
+struct mii_timestamper *register_mii_timestamper(struct device_node *node,
+						 unsigned int port)
+{
+	struct mii_timestamper *mii_ts = NULL;
+	struct mii_timestamping_desc *desc;
+	struct list_head *this;
+
+	mutex_lock(&tstamping_devices_lock);
+	list_for_each(this, &mii_timestamping_devices) {
+		desc = list_entry(this, struct mii_timestamping_desc, list);
+		if (desc->device->of_node == node) {
+			mii_ts = desc->ctrl->probe_channel(desc->device, port);
+			if (mii_ts) {
+				mii_ts->device = desc->device;
+				get_device(desc->device);
+			}
+			break;
+		}
+	}
+	mutex_unlock(&tstamping_devices_lock);
+
+	return mii_ts ? mii_ts : ERR_PTR(-EPROBE_DEFER);
+}
+
+/**
+ * unregister_mii_timestamper - Disables a given MII time stamper.
+ *
+ * @mii_ts:	An interface obtained via register_mii_timestamper().
+ *
+ */
+void unregister_mii_timestamper(struct mii_timestamper *mii_ts)
+{
+	struct
[PATCH V2 net-next 2/5] net: Introduce a new MII time stamping interface.
Currently the stack supports time stamping in PHY devices. However, there are newer, non-PHY devices that can snoop an MII bus and provide time stamps. In order to support such devices, this patch introduces a new interface to be used by both PHY and non-PHY devices. In addition, the one and only user of the old PHY time stamping API is converted to the new interface. Signed-off-by: Richard Cochran --- drivers/net/phy/dp83640.c | 47 + drivers/net/phy/phy.c | 4 ++-- drivers/net/phy/phy_device.c | 2 ++ include/linux/mii_timestamper.h | 52 + include/linux/phy.h | 25 ++-- net/8021q/vlan_dev.c | 4 ++-- net/core/ethtool.c | 4 ++-- net/core/timestamping.c | 20 8 files changed, 104 insertions(+), 54 deletions(-) create mode 100644 include/linux/mii_timestamper.h diff --git a/drivers/net/phy/dp83640.c b/drivers/net/phy/dp83640.c index edd4d44a386d..2f895c9bbedb 100644 --- a/drivers/net/phy/dp83640.c +++ b/drivers/net/phy/dp83640.c @@ -111,6 +111,7 @@ struct dp83640_private { struct list_head list; struct dp83640_clock *clock; struct phy_device *phydev; + struct mii_timestamper mii_ts; struct delayed_work ts_work; int hwts_tx_en; int hwts_rx_en; @@ -214,6 +215,14 @@ static void dp83640_gpio_defaults(struct ptp_pin_desc *pd) static LIST_HEAD(phyter_clocks); static DEFINE_MUTEX(phyter_clocks_lock); +static int dp83640_hwtstamp(struct mii_timestamper *mii_ts, + struct ifreq *ifr); +static int dp83640_ts_info(struct mii_timestamper *mii_ts, + struct ethtool_ts_info *info); +static bool dp83640_rxtstamp(struct mii_timestamper *mii_ts, +struct sk_buff *skb, int type); +static void dp83640_txtstamp(struct mii_timestamper *mii_ts, +struct sk_buff *skb, int type); static void rx_timestamp_work(struct work_struct *work); /* extended register access functions */ @@ -1141,13 +1150,18 @@ static int dp83640_probe(struct phy_device *phydev) goto no_memory; dp83640->phydev = phydev; - INIT_DELAYED_WORK(&dp83640->ts_work, rx_timestamp_work); + dp83640->mii_ts.rxtstamp = dp83640_rxtstamp; + 
dp83640->mii_ts.txtstamp = dp83640_txtstamp; + dp83640->mii_ts.hwtstamp = dp83640_hwtstamp; + dp83640->mii_ts.ts_info = dp83640_ts_info; + INIT_DELAYED_WORK(&dp83640->ts_work, rx_timestamp_work); INIT_LIST_HEAD(&dp83640->rxts); INIT_LIST_HEAD(&dp83640->rxpool); for (i = 0; i < MAX_RXTS; i++) list_add(&dp83640->rx_pool_data[i].list, &dp83640->rxpool); + phydev->mii_ts = &dp83640->mii_ts; phydev->priv = dp83640; spin_lock_init(&dp83640->rx_lock); @@ -1188,6 +1202,8 @@ static void dp83640_remove(struct phy_device *phydev) if (phydev->mdio.addr == BROADCAST_ADDR) return; + phydev->mii_ts = NULL; + enable_status_frames(phydev, false); cancel_delayed_work_sync(&dp83640->ts_work); @@ -1311,9 +1327,10 @@ static int dp83640_config_intr(struct phy_device *phydev) } } -static int dp83640_hwtstamp(struct phy_device *phydev, struct ifreq *ifr) +static int dp83640_hwtstamp(struct mii_timestamper *mii_ts, struct ifreq *ifr) { - struct dp83640_private *dp83640 = phydev->priv; + struct dp83640_private *dp83640 = + container_of(mii_ts, struct dp83640_private, mii_ts); struct hwtstamp_config cfg; u16 txcfg0, rxcfg0; @@ -1389,8 +1406,8 @@ static int dp83640_hwtstamp(struct phy_device *phydev, struct ifreq *ifr) mutex_lock(&dp83640->clock->extreg_lock); - ext_write(0, phydev, PAGE5, PTP_TXCFG0, txcfg0); - ext_write(0, phydev, PAGE5, PTP_RXCFG0, rxcfg0); + ext_write(0, dp83640->phydev, PAGE5, PTP_TXCFG0, txcfg0); + ext_write(0, dp83640->phydev, PAGE5, PTP_RXCFG0, rxcfg0); mutex_unlock(&dp83640->clock->extreg_lock); @@ -1420,10 +1437,11 @@ static void rx_timestamp_work(struct work_struct *work) schedule_delayed_work(&dp83640->ts_work, SKB_TIMESTAMP_TIMEOUT); } -static bool dp83640_rxtstamp(struct phy_device *phydev, +static bool dp83640_rxtstamp(struct mii_timestamper *mii_ts, struct sk_buff *skb, int type) { - struct dp83640_private *dp83640 = phydev->priv; + struct dp83640_private *dp83640 = + container_of(mii_ts, struct dp83640_private, mii_ts); struct dp83640_skb_info *skb_info = (struct dp83640_skb_info *)skb->cb; struct list_head *this, *next; struct rxts *rxts; @@ -1469,10 
+1487,11 @@ static bool dp83640_rxtstamp(struct phy_device *phydev, return true; } -static void dp83640_txtstamp(struct phy_device *phydev, +static void dp83640_txtstamp(struct mii_timestamper *mii_ts, struct sk_buff *skb, int
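The conversion in this patch embeds a struct mii_timestamper in the driver's private data and has each callback recover that private data with container_of(). The following is a minimal userspace sketch of that pattern, not the driver's code: struct ts_iface, struct demo_priv, demo_hwtstamp() and demo_init() are hypothetical names, and the macro is a simplified stand-in for the kernel's container_of().

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's container_of(): step back from a
 * pointer to a member to the structure that embeds it. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical callback table, loosely modelled on struct mii_timestamper. */
struct ts_iface {
	int (*hwtstamp)(struct ts_iface *ts, int enable);
};

/* Hypothetical driver-private data with the interface embedded in it. */
struct demo_priv {
	int hwts_rx_en;
	struct ts_iface ts;
};

/* The callback receives only the interface pointer and derives the
 * private structure from it, as the converted dp83640 callbacks do. */
static int demo_hwtstamp(struct ts_iface *ts, int enable)
{
	struct demo_priv *priv = container_of(ts, struct demo_priv, ts);

	priv->hwts_rx_en = enable;
	return 0;
}

/* Wire up the callback, in the spirit of what the probe function does
 * for its mii_ts member. */
static void demo_init(struct demo_priv *priv)
{
	assert(priv != NULL);
	priv->hwts_rx_en = 0;
	priv->ts.hwtstamp = demo_hwtstamp;
}
```

A caller that only holds a pointer to the embedded interface can invoke priv.ts.hwtstamp(&priv.ts, 1) without knowing the layout of the private structure, which is what lets the same interface serve PHY and non-PHY devices.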
Re: [PATCH iproute2-next v2] tc: flower: expose hardware offload count
On 10/3/18 2:44 PM, Vlad Buslov wrote: > Recently flower classifier was updated to expose count of devices that > filter is offloaded to. Add support to print this counter as 'in_hw_count'. > > Signed-off-by: Vlad Buslov > Acked-by: Jiri Pirko > --- > Changes from V1 to V2: > - Change print format string to "%u" > > tc/f_flower.c | 10 +- > 1 file changed, 9 insertions(+), 1 deletion(-) > applied to iproute2-next. Thanks
[PATCH net-next 08/11] net: sched: cls_u32: the tp_c argument of u32_set_parms() is always tp->data
From: Al Viro It must be tc_u_common associated with that tp (i.e. tp->data). Proof: * both ->ht_up and ->tp_c are assign-once * ->tp_c of anything inserted into tp_c->hlist is tp_c * hnodes never get reinserted into the lists or moved between those, so anything found by u32_lookup_ht(tp->data, ...) will have ->tp_c equal to tp->data. * tp->root->tp_c == tp->data. * ->ht_up of anything inserted into hnode->ht[...] is equal to hnode. * knodes never get reinserted into hash chains or moved between those, so anything returned by u32_lookup_key(ht, ...) will have ->ht_up equal to ht. * any knode returned by u32_get(tp, ...) will have ->ht_up->tp_c point to tp->data Signed-off-by: Al Viro Signed-off-by: Jamal Hadi Salim --- net/sched/cls_u32.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index 53f34f8cde8b..3ed2c9866b36 100644 --- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -956,8 +956,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb, if (!new) return -ENOMEM; - err = u32_set_parms(net, tp, base, - rtnl_dereference(n->ht_up)->tp_c, new, tb, + err = u32_set_parms(net, tp, base, tp_c, new, tb, tca[TCA_RATE], ovr, extack); if (err) { @@ -1124,7 +1123,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb, } #endif - err = u32_set_parms(net, tp, base, ht->tp_c, n, tb, tca[TCA_RATE], ovr, + err = u32_set_parms(net, tp, base, tp_c, n, tb, tca[TCA_RATE], ovr, extack); if (err == 0) { struct tc_u_knode __rcu **ins; -- 2.11.0
[PATCH net-next 00/11] net: sched: cls_u32 Various improvements
From: Jamal Hadi Salim

Various improvements from Al.

Al Viro (11):
  net: sched: cls_u32: disallow linking to root hnode
  net: sched: cls_u32: make sure that divisor is a power of 2
  net: sched: cls_u32: get rid of unused argument of u32_destroy_key()
  net: sched: cls_u32: get rid of tc_u_knode ->tp
  net: sched: cls_u32: get rid of tc_u_common ->rcu
  net: sched: cls_u32: clean tc_u_common hashtable
  net: sched: cls_u32: pass tc_u_common to u32_set_parms() instead of tc_u_hnode
  net: sched: cls_u32: the tp_c argument of u32_set_parms() is always tp->data
  net: sched: cls_u32: keep track of knodes count in tc_u_common
  net: sched: cls_u32: simplify the hell out u32_delete() emptiness check
  net: sched: cls_u32: get rid of tp_c

 net/sched/cls_u32.c | 117
 1 file changed, 35 insertions(+), 82 deletions(-)

-- 2.11.0
[PATCH net-next 05/11] net: sched: cls_u32: get rid of tc_u_common ->rcu
From: Al Viro unused Signed-off-by: Al Viro Signed-off-by: Jamal Hadi Salim --- net/sched/cls_u32.c | 1 - 1 file changed, 1 deletion(-) diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index 810c49ac1bbe..c378168f4562 100644 --- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -98,7 +98,6 @@ struct tc_u_common { int refcnt; struct idr handle_idr; struct hlist_node hnode; - struct rcu_head rcu; }; static inline unsigned int u32_hash_fold(__be32 key, -- 2.11.0
[PATCH net-next 04/11] net: sched: cls_u32: get rid of tc_u_knode ->tp
From: Al Viro not used anymore Signed-off-by: Al Viro Signed-off-by: Jamal Hadi Salim --- net/sched/cls_u32.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index ef0f2e6ec422..810c49ac1bbe 100644 --- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -68,7 +68,6 @@ struct tc_u_knode { u32 mask; u32 __percpu *pcpu_success; #endif - struct tcf_proto *tp; struct rcu_work rwork; /* The 'sel' field MUST be the last field in structure to allow for * tc_u32_keys allocated at end of structure. @@ -896,7 +895,6 @@ static struct tc_u_knode *u32_init_knode(struct tcf_proto *tp, /* Similarly success statistics must be moved as pointers */ new->pcpu_success = n->pcpu_success; #endif - new->tp = tp; memcpy(&new->sel, s, sizeof(*s) + s->nkeys*sizeof(struct tc_u32_key)); if (tcf_exts_init(&new->exts, TCA_U32_ACT, TCA_U32_POLICE)) { @@ -1112,7 +1110,6 @@ static int u32_change(struct net *net, struct sk_buff *in_skb, n->handle = handle; n->fshift = s->hmask ? ffs(ntohl(s->hmask)) - 1 : 0; n->flags = flags; - n->tp = tp; err = tcf_exts_init(&n->exts, TCA_U32_ACT, TCA_U32_POLICE); if (err < 0) -- 2.11.0
[PATCH net-next 09/11] net: sched: cls_u32: get rid of tp_c
From: Al Viro Get rid of both the hnode ->tp_c field and the tp_c argument of u32_set_parms(): the latter is redundant, the former is never read. Signed-off-by: Al Viro Signed-off-by: Jamal Hadi Salim --- net/sched/cls_u32.c | 11 --- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index 3ed2c9866b36..3d4c360f9b0c 100644 --- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -79,7 +79,6 @@ struct tc_u_hnode { struct tc_u_hnode __rcu *next; u32 handle; u32 prio; - struct tc_u_common *tp_c; int refcnt; unsigned int divisor; struct idr handle_idr; @@ -390,7 +389,6 @@ static int u32_init(struct tcf_proto *tp) tp_c->refcnt++; RCU_INIT_POINTER(root_ht->next, tp_c->hlist); rcu_assign_pointer(tp_c->hlist, root_ht); - root_ht->tp_c = tp_c; rcu_assign_pointer(tp->root, root_ht); tp->data = tp_c; @@ -761,7 +759,7 @@ static const struct nla_policy u32_policy[TCA_U32_MAX + 1] = { }; static int u32_set_parms(struct net *net, struct tcf_proto *tp, -unsigned long base, struct tc_u_common *tp_c, +unsigned long base, struct tc_u_knode *n, struct nlattr **tb, struct nlattr *est, bool ovr, struct netlink_ext_ack *extack) @@ -782,7 +780,7 @@ static int u32_set_parms(struct net *net, struct tcf_proto *tp, } if (handle) { - ht_down = u32_lookup_ht(tp_c, handle); + ht_down = u32_lookup_ht(tp->data, handle); if (!ht_down) { NL_SET_ERR_MSG_MOD(extack, "Link hash table not found"); @@ -956,7 +954,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb, if (!new) return -ENOMEM; - err = u32_set_parms(net, tp, base, tp_c, new, tb, + err = u32_set_parms(net, tp, base, new, tb, tca[TCA_RATE], ovr, extack); if (err) { @@ -1012,7 +1010,6 @@ static int u32_change(struct net *net, struct sk_buff *in_skb, return err; } } - ht->tp_c = tp_c; ht->refcnt = 1; ht->divisor = divisor; ht->handle = handle; @@ -1123,7 +1120,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb, } #endif - err = u32_set_parms(net, tp, base, tp_c, n, tb, tca[TCA_RATE], ovr, + 
err = u32_set_parms(net, tp, base, n, tb, tca[TCA_RATE], ovr, extack); if (err == 0) { struct tc_u_knode __rcu **ins; -- 2.11.0
[PATCH net-next 01/11] net: sched: cls_u32: disallow linking to root hnode
From: Al Viro Operation makes no sense. Nothing will actually break if we do so (depth limit in u32_classify() will prevent infinite loops), but according to maintainers it's best prohibited outright. NOTE: doing so guarantees that u32_destroy() will trigger the call of u32_destroy_hnode(); we might want to make that unconditional. Test: tc qdisc add dev eth0 ingress tc filter add dev eth0 parent : protocol ip prio 100 u32 \ link 800: offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff should fail with Error: cls_u32: Not linking to root node Signed-off-by: Al Viro Signed-off-by: Jamal Hadi Salim --- net/sched/cls_u32.c | 4 1 file changed, 4 insertions(+) diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index 622f4657da94..3357331a80a2 100644 --- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -797,6 +797,10 @@ static int u32_set_parms(struct net *net, struct tcf_proto *tp, NL_SET_ERR_MSG_MOD(extack, "Link hash table not found"); return -EINVAL; } + if (ht_down->is_root) { + NL_SET_ERR_MSG_MOD(extack, "Not linking to root node"); + return -EINVAL; + } ht_down->refcnt++; } -- 2.11.0
[PATCH net-next 06/11] net: sched: cls_u32: clean tc_u_common hashtable
From: Al Viro * calculate key *once*, not for each hash chain element * let tc_u_hash() return the pointer to chain head rather than index - callers are cleaner that way. Signed-off-by: Al Viro Signed-off-by: Jamal Hadi Salim --- net/sched/cls_u32.c | 24 +--- 1 file changed, 9 insertions(+), 15 deletions(-) diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index c378168f4562..3f6fba831c57 100644 --- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -343,19 +343,16 @@ static void *tc_u_common_ptr(const struct tcf_proto *tp) return block->q; } -static unsigned int tc_u_hash(const struct tcf_proto *tp) +static struct hlist_head *tc_u_hash(void *key) { - return hash_ptr(tc_u_common_ptr(tp), U32_HASH_SHIFT); + return tc_u_common_hash + hash_ptr(key, U32_HASH_SHIFT); } -static struct tc_u_common *tc_u_common_find(const struct tcf_proto *tp) +static struct tc_u_common *tc_u_common_find(void *key) { struct tc_u_common *tc; - unsigned int h; - - h = tc_u_hash(tp); - hlist_for_each_entry(tc, &tc_u_common_hash[h], hnode) { - if (tc->ptr == tc_u_common_ptr(tp)) + hlist_for_each_entry(tc, tc_u_hash(key), hnode) { + if (tc->ptr == key) return tc; } return NULL; @@ -364,10 +361,8 @@ static struct tc_u_common *tc_u_common_find(const struct tcf_proto *tp) static int u32_init(struct tcf_proto *tp) { struct tc_u_hnode *root_ht; - struct tc_u_common *tp_c; - unsigned int h; - - tp_c = tc_u_common_find(tp); + void *key = tc_u_common_ptr(tp); + struct tc_u_common *tp_c = tc_u_common_find(key); root_ht = kzalloc(sizeof(*root_ht), GFP_KERNEL); if (root_ht == NULL) @@ -385,12 +380,11 @@ static int u32_init(struct tcf_proto *tp) kfree(root_ht); return -ENOBUFS; } - tp_c->ptr = tc_u_common_ptr(tp); + tp_c->ptr = key; INIT_HLIST_NODE(&tp_c->hnode); idr_init(&tp_c->handle_idr); - h = tc_u_hash(tp); - hlist_add_head(&tp_c->hnode, &tc_u_common_hash[h]); + hlist_add_head(&tp_c->hnode, tc_u_hash(key)); } tp_c->refcnt++; -- 2.11.0
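The two points of this cleanup (compute the key once, and have the hash helper return the chain head instead of an index) can be illustrated outside the kernel. What follows is a userspace sketch under stated assumptions: demo_common, demo_table, demo_hash(), demo_find() and demo_insert() are hypothetical names, the chains are plain singly linked lists rather than hlists, and the multiplicative hash is an arbitrary choice that is not the kernel's hash_ptr().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_HASH_SHIFT 4
#define DEMO_HASH_SIZE (1u << DEMO_HASH_SHIFT)

struct demo_common {
	void *ptr;			/* lookup key */
	struct demo_common *next;	/* next entry in the same chain */
};

static struct demo_common *demo_table[DEMO_HASH_SIZE];

/* Return the chain head for a key rather than a bucket index, so that
 * callers never index the table themselves. The multiplier is an
 * arbitrary odd constant chosen for this sketch. */
static struct demo_common **demo_hash(const void *key)
{
	uint64_t v = (uint64_t)(uintptr_t)key * 0x9E3779B97F4A7C15ull;

	return &demo_table[v >> (64 - DEMO_HASH_SHIFT)];
}

/* The key is passed in already computed, so it is compared as-is for
 * every element of the chain instead of being recomputed each time. */
static struct demo_common *demo_find(const void *key)
{
	struct demo_common *c;

	for (c = *demo_hash(key); c; c = c->next)
		if (c->ptr == key)
			return c;
	return NULL;
}

/* Insert at the head of the chain selected by the same key. */
static void demo_insert(struct demo_common *c, void *key)
{
	struct demo_common **head = demo_hash(key);

	assert(c != NULL);
	c->ptr = key;
	c->next = *head;
	*head = c;
}
```

A caller in the style of the patched u32_init() would obtain the key once, call demo_find(key), and on a miss call demo_insert(new_entry, key) with the same value.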
[PATCH net-next 07/11] net: sched: cls_u32: pass tc_u_common to u32_set_parms() instead of tc_u_hnode
From: Al Viro the only thing we used ht for was ht->tp_c and callers can get that without going through ->tp_c at all; start with lifting that into the callers, next commits will massage those, eventually removing ->tp_c altogether. Signed-off-by: Al Viro Signed-off-by: Jamal Hadi Salim --- net/sched/cls_u32.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index 3f6fba831c57..53f34f8cde8b 100644 --- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -761,7 +761,7 @@ static const struct nla_policy u32_policy[TCA_U32_MAX + 1] = { }; static int u32_set_parms(struct net *net, struct tcf_proto *tp, -unsigned long base, struct tc_u_hnode *ht, +unsigned long base, struct tc_u_common *tp_c, struct tc_u_knode *n, struct nlattr **tb, struct nlattr *est, bool ovr, struct netlink_ext_ack *extack) @@ -782,7 +782,7 @@ static int u32_set_parms(struct net *net, struct tcf_proto *tp, } if (handle) { - ht_down = u32_lookup_ht(ht->tp_c, handle); + ht_down = u32_lookup_ht(tp_c, handle); if (!ht_down) { NL_SET_ERR_MSG_MOD(extack, "Link hash table not found"); @@ -957,7 +957,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb, return -ENOMEM; err = u32_set_parms(net, tp, base, - rtnl_dereference(n->ht_up), new, tb, + rtnl_dereference(n->ht_up)->tp_c, new, tb, tca[TCA_RATE], ovr, extack); if (err) { @@ -1124,7 +1124,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb, } #endif - err = u32_set_parms(net, tp, base, ht, n, tb, tca[TCA_RATE], ovr, + err = u32_set_parms(net, tp, base, ht->tp_c, n, tb, tca[TCA_RATE], ovr, extack); if (err == 0) { struct tc_u_knode __rcu **ins; -- 2.11.0
[PATCH net-next 10/11] net: sched: cls_u32: keep track of knodes count in tc_u_common
From: Al Viro This allows u32_delete() to be simplified considerably. Signed-off-by: Al Viro Signed-off-by: Jamal Hadi Salim --- net/sched/cls_u32.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index 3d4c360f9b0c..61593bee08db 100644 --- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -97,6 +97,7 @@ struct tc_u_common { int refcnt; struct idr handle_idr; struct hlist_node hnode; + long knodes; }; static inline unsigned int u32_hash_fold(__be32 key, @@ -452,6 +453,7 @@ static void u32_delete_key_freepf_work(struct work_struct *work) static int u32_delete_key(struct tcf_proto *tp, struct tc_u_knode *key) { + struct tc_u_common *tp_c = tp->data; struct tc_u_knode __rcu **kp; struct tc_u_knode *pkp; struct tc_u_hnode *ht = rtnl_dereference(key->ht_up); @@ -462,6 +464,7 @@ static int u32_delete_key(struct tcf_proto *tp, struct tc_u_knode *key) kp = &pkp->next, pkp = rtnl_dereference(*kp)) { if (pkp == key) { RCU_INIT_POINTER(*kp, key->next); + tp_c->knodes--; tcf_unbind_filter(tp, &key->res); idr_remove(&ht->handle_idr, key->handle); @@ -576,6 +579,7 @@ static int u32_replace_hw_knode(struct tcf_proto *tp, struct tc_u_knode *n, static void u32_clear_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht, struct netlink_ext_ack *extack) { + struct tc_u_common *tp_c = tp->data; struct tc_u_knode *n; unsigned int h; @@ -583,6 +587,7 @@ static void u32_clear_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht, while ((n = rtnl_dereference(ht->ht[h])) != NULL) { RCU_INIT_POINTER(ht->ht[h], rtnl_dereference(n->next)); + tp_c->knodes--; tcf_unbind_filter(tp, &n->res); u32_remove_hw_knode(tp, n, extack); idr_remove(&ht->handle_idr, n->handle); @@ -1141,6 +1146,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb, RCU_INIT_POINTER(n->next, pins); rcu_assign_pointer(*ins, n); + tp_c->knodes++; *arg = n; return 0; } -- 2.11.0
[PATCH net-next 02/11] net: sched: cls_u32: make sure that divisor is a power of 2
From: Al Viro Tested by modifying iproute2 to allow sending a divisor > 255 Tested-by: Jamal Hadi Salim Signed-off-by: Al Viro Signed-off-by: Jamal Hadi Salim --- net/sched/cls_u32.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index 3357331a80a2..ce55eea448a0 100644 --- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -994,7 +994,11 @@ static int u32_change(struct net *net, struct sk_buff *in_skb, if (tb[TCA_U32_DIVISOR]) { unsigned int divisor = nla_get_u32(tb[TCA_U32_DIVISOR]); - if (--divisor > 0x100) { + if (!is_power_of_2(divisor)) { + NL_SET_ERR_MSG_MOD(extack, "Divisor is not a power of 2"); + return -EINVAL; + } + if (divisor-- > 0x100) { NL_SET_ERR_MSG_MOD(extack, "Exceeded maximum 256 hash buckets"); return -EINVAL; } -- 2.11.0
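For readers who want to try the new divisor validation outside the kernel, here is a small userspace sketch of the same two checks in the same order. It is an illustration only: demo_is_power_of_2() and demo_check_divisor() are hypothetical names, the former being a userspace equivalent of the kernel's is_power_of_2(), and returning divisor - 1 through an out parameter is this sketch's way of mirroring the post-decrement in the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's is_power_of_2() helper:
 * exactly one bit set, and zero does not count. */
static bool demo_is_power_of_2(unsigned int n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Validate a requested bucket count in the same order as the patch:
 * reject non-powers of two first, then anything above 256 buckets.
 * On success *mask receives divisor - 1. */
static int demo_check_divisor(unsigned int divisor, unsigned int *mask)
{
	assert(mask != NULL);
	if (!demo_is_power_of_2(divisor))
		return -EINVAL;
	if (divisor > 0x100)
		return -EINVAL;
	*mask = divisor - 1;
	return 0;
}
```

With these checks a divisor of 256 is accepted, while 0, 3 and 512 are all rejected with -EINVAL.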
[PATCH net-next 11/11] net: sched: cls_u32: simplify the hell out u32_delete() emptiness check
From: Al Viro Now that we have the knode count, we can instantly check if any hnodes are non-empty. And that kills the check for extra references to root hnode - those could happen only if there was a knode to carry such a link. Signed-off-by: Al Viro Signed-off-by: Jamal Hadi Salim --- net/sched/cls_u32.c | 48 +--- 1 file changed, 1 insertion(+), 47 deletions(-) diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index 61593bee08db..ac79a40a0392 100644 --- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -627,17 +627,6 @@ static int u32_destroy_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht, return -ENOENT; } -static bool ht_empty(struct tc_u_hnode *ht) -{ - unsigned int h; - - for (h = 0; h <= ht->divisor; h++) - if (rcu_access_pointer(ht->ht[h])) - return false; - - return true; -} - static void u32_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack) { struct tc_u_common *tp_c = tp->data; @@ -675,13 +664,9 @@ static int u32_delete(struct tcf_proto *tp, void *arg, bool *last, struct netlink_ext_ack *extack) { struct tc_u_hnode *ht = arg; - struct tc_u_hnode *root_ht = rtnl_dereference(tp->root); struct tc_u_common *tp_c = tp->data; int ret = 0; - if (ht == NULL) - goto out; - if (TC_U32_KEY(ht->handle)) { u32_remove_hw_knode(tp, (struct tc_u_knode *)ht, extack); ret = u32_delete_key(tp, (struct tc_u_knode *)ht); @@ -702,38 +687,7 @@ static int u32_delete(struct tcf_proto *tp, void *arg, bool *last, } out: - *last = true; - if (root_ht) { - if (root_ht->refcnt > 1) { - *last = false; - goto ret; - } - if (root_ht->refcnt == 1) { - if (!ht_empty(root_ht)) { - *last = false; - goto ret; - } - } - } - - if (tp_c->refcnt > 1) { - *last = false; - goto ret; - } - - if (tp_c->refcnt == 1) { - struct tc_u_hnode *ht; - - for (ht = rtnl_dereference(tp_c->hlist); -ht; -ht = rtnl_dereference(ht->next)) - if (!ht_empty(ht)) { - *last = false; - break; - } - } - -ret: + *last = tp_c->refcnt == 1 && tp_c->knodes == 0; return ret; } -- 2.11.0
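The idea behind this simplification is that a counter kept in step with every insertion and removal answers "is anything left?" without walking the buckets. The sketch below models that in userspace under stated assumptions: struct demo_table and the demo_*() helpers are hypothetical, and per-bucket lengths stand in for the real hash chains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_BUCKETS 4

struct demo_table {
	int refcnt;			/* users sharing this table */
	long knodes;			/* entries across all buckets */
	int bucket_len[DEMO_BUCKETS];	/* stand-in for the real hash chains */
};

static void demo_add(struct demo_table *t, unsigned int bucket)
{
	assert(t != NULL && bucket < DEMO_BUCKETS);
	t->bucket_len[bucket]++;
	t->knodes++;		/* keep the counter in step with every insert */
}

static void demo_del(struct demo_table *t, unsigned int bucket)
{
	assert(t != NULL && bucket < DEMO_BUCKETS && t->bucket_len[bucket] > 0);
	t->bucket_len[bucket]--;
	t->knodes--;		/* and with every removal */
}

/* Old style: walk every bucket to find out whether anything is left. */
static bool demo_empty_scan(const struct demo_table *t)
{
	for (size_t i = 0; i < DEMO_BUCKETS; i++)
		if (t->bucket_len[i])
			return false;
	return true;
}

/* New style, mirroring "*last = tp_c->refcnt == 1 && tp_c->knodes == 0". */
static bool demo_is_last(const struct demo_table *t)
{
	return t->refcnt == 1 && t->knodes == 0;
}
```

The counter is only trustworthy if every path that links or unlinks an entry updates it, which is why the preceding patch touches u32_delete_key(), u32_clear_hnode() and u32_change() together.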
[PATCH net-next 03/11] net: sched: cls_u32: get rid of unused argument of u32_destroy_key()
From: Al Viro Signed-off-by: Al Viro Signed-off-by: Jamal Hadi Salim --- net/sched/cls_u32.c | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index ce55eea448a0..ef0f2e6ec422 100644 --- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -405,8 +405,7 @@ static int u32_init(struct tcf_proto *tp) return 0; } -static int u32_destroy_key(struct tcf_proto *tp, struct tc_u_knode *n, - bool free_pf) +static int u32_destroy_key(struct tc_u_knode *n, bool free_pf) { struct tc_u_hnode *ht = rtnl_dereference(n->ht_down); @@ -440,7 +439,7 @@ static void u32_delete_key_work(struct work_struct *work) struct tc_u_knode, rwork); rtnl_lock(); - u32_destroy_key(key->tp, key, false); + u32_destroy_key(key, false); rtnl_unlock(); } @@ -457,7 +456,7 @@ static void u32_delete_key_freepf_work(struct work_struct *work) struct tc_u_knode, rwork); rtnl_lock(); - u32_destroy_key(key->tp, key, true); + u32_destroy_key(key, true); rtnl_unlock(); } @@ -600,7 +599,7 @@ static void u32_clear_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht, if (tcf_exts_get_net(&n->exts)) tcf_queue_work(&n->rwork, u32_delete_key_freepf_work); else - u32_destroy_key(n->tp, n, true); + u32_destroy_key(n, true); } } } @@ -971,13 +970,13 @@ static int u32_change(struct net *net, struct sk_buff *in_skb, tca[TCA_RATE], ovr, extack); if (err) { - u32_destroy_key(tp, new, false); + u32_destroy_key(new, false); return err; } err = u32_replace_hw_knode(tp, new, flags, extack); if (err) { - u32_destroy_key(tp, new, false); + u32_destroy_key(new, false); return err; } -- 2.11.0
Re: [PATCH bpf-next] bpf: emit audit messages upon successful prog load and unload
On Sat, 6 Oct 2018 00:05:22 +0200 Jiri Olsa wrote: > On Fri, Oct 05, 2018 at 11:44:35AM -0700, Alexei Starovoitov wrote: > > On Fri, Oct 05, 2018 at 08:14:09AM +0200, Jiri Olsa wrote: > > > On Thu, Oct 04, 2018 at 03:10:15PM -0700, Alexei Starovoitov wrote: > > > > On Thu, Oct 04, 2018 at 10:22:31PM +0200, Jesper Dangaard Brouer wrote: > > > > > > > > > On Thu, 4 Oct 2018 21:41:17 +0200 Daniel Borkmann > > > > > wrote: > > > > > > > > > > > On 10/04/2018 08:39 PM, Jesper Dangaard Brouer wrote: > > > > > > > On Thu, 4 Oct 2018 10:11:43 -0700 Alexei Starovoitov > > > > > > > wrote: > > > > > > >> On Thu, Oct 04, 2018 at 03:50:38PM +0200, Daniel Borkmann wrote: > > > > > > >> > > > > > [...] > > > > > > >> > > > > > > >> If the purpose of the patch is to give user space visibility into > > > > > > >> bpf prog load/unload as a notification, then I completely agree > > > > > > >> that > > > > > > >> some notification mechanism is necessary. > > > > > > > > > > > > Yeah, I did only regard it as only that, nothing more. Some means > > > > > > of timeline and notification that can be kept in a record in user > > > > > > space and later retrieved e.g. for introspection on what has been > > > > > > loaded. > > > > > > > > > > > > >> I've started working on such mechanism via perf ring buffer > > > > > > >> which is > > > > > > >> the fastest mechanism we have in the kernel so far. > > > > > > >> See long discussion here: > > > > > > >> https://patchwork.ozlabs.org/patch/971970/ > > > [...] > > > > > > > > > > > > That one is definitely needed in any case to resolve the kallsyms > > > > > > limitations, and it does have overlap in that in either case we > > > > > > want to look at past BPF programs that have been unloaded in the > > > > > > meantime, so I don't have a strong preference either way, and the > > > > > > former is needed in any case. 
Though thought was that audit might > > > > > > be an option for those not running profiling daemons 24/7, but > > > > > > presumably bpftool could be extended to record these events as > > > > > > well if we don't want to reuse audit infra. > > > > > > > > > > Yes, exactly, I don't want to run a profiling daemon 24/7 to record > > > > > these events. I do acknowledge that this perf event is relevant, > > > > > especially for catching the kernel symbols (I need that myself), but > > > > > it > > > > > does not cover my use-case. > > > > > > > > > > My use-case is to 24/7 collect and keep records in userspace, and > > > > > have a > > > > > timeline of these notifications, for later retrieval. The idea is > > > > > that > > > > > our support engineers can look at these records when troubleshooting > > > > > the system. And the plan is also to collect these records as part of > > > > > our sosreport tool, which is part of the support case. > > > > > > > > I don't think you're implying that prog load/unload should be spamming > > > > dmesg > > > > and auditd not even running... > > > > > > I think the problem Jesper implied is that in order to collect > > > those logs you'll need perf tool running all the time.. which > > > it's not equipped for yet > > > > I'm not proposing to run 'perf' binary all the time. > > Setting up perf ring buffer just for these new bpf prog load/unload events > > and epolling it is simple enough to do from any application including > > auditd. > > selftests/bpf/ do it for bpf output events. > > ok, did not think about the possibility to teach auditd talk to perf, > time to get that tool evsel/evlist/rb library ready ;-) Interesting, I also didn't consider teaching auditd to gets its 'bpf' events from a separate perf ring-buffer, that might work. I do wonder how the audit people will take this suggestion. -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
[PATCH v8 15/15] MAINTAINERS: Add entry for Marvell OcteonTX2 Admin Function driver
From: Sunil Goutham Added maintainers entry for Marvell OcteonTX2 SOC's RVU admin function driver. Signed-off-by: Sunil Goutham --- MAINTAINERS | 9 + 1 file changed, 9 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index bb5f431..bc76b03 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8844,6 +8844,15 @@ S: Supported F: drivers/mmc/host/sdhci-xenon* F: Documentation/devicetree/bindings/mmc/marvell,xenon-sdhci.txt +MARVELL OCTEONTX2 RVU ADMIN FUNCTION DRIVER +M: Sunil Goutham +M: Linu Cherian +M: Geetha sowjanya +M: Jerin Jacob +L: netdev@vger.kernel.org +S: Supported +F: drivers/net/ethernet/marvell/octeontx2/af/ + MATROX FRAMEBUFFER DRIVER L: linux-fb...@vger.kernel.org S: Orphan -- 2.7.4
[PATCH v8 13/15] octeontx2-af: Add support for CGX link management
From: Linu Cherian CGX LMAC initialization, link status polling etc. are done by low level secure firmware. For link management this patch adds an interface (a communication mechanism) between the firmware and this kernel CGX driver. - Firmware interface specification is defined in cgx_fw_if.h. - Support to send/receive commands/events to/from firmware. - events/commands implemented * link up * link down * reading firmware version Signed-off-by: Linu Cherian Signed-off-by: Nithya Mani Signed-off-by: Sunil Goutham --- drivers/net/ethernet/marvell/octeontx2/af/cgx.c | 357 - drivers/net/ethernet/marvell/octeontx2/af/cgx.h | 32 ++ .../net/ethernet/marvell/octeontx2/af/cgx_fw_if.h | 186 +++ .../net/ethernet/marvell/octeontx2/af/rvu_cgx.c | 97 ++ 4 files changed, 668 insertions(+), 4 deletions(-) create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/cgx_fw_if.h create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cgx.c b/drivers/net/ethernet/marvell/octeontx2/af/cgx.c index 6ecae80..f290b1d 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/cgx.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/cgx.c @@ -24,16 +24,43 @@ #define DRV_NAME "octeontx2-cgx" #define DRV_STRING "Marvell OcteonTX2 CGX/MAC Driver" +/** + * struct lmac + * @wq_cmd_cmplt: waitq to keep the process blocked until cmd completion + * @cmd_lock: Lock to serialize the command interface + * @resp: command response + * @event_cb: callback for linkchange events + * @cmd_pend: flag set before new command is started + * flag cleared after command response is received + * @cgx: parent cgx port + * @lmac_id: lmac port id + * @name: lmac port name + */ +struct lmac { + wait_queue_head_t wq_cmd_cmplt; + struct mutex cmd_lock; + u64 resp; + struct cgx_event_cb event_cb; + bool cmd_pend; + struct cgx *cgx; + u8 lmac_id; + char *name; +}; + struct cgx { void __iomem *reg_base; struct pci_dev *pdev; u8 cgx_id; u8 lmac_count; + struct lmac 
*lmac_idmap[MAX_LMAC_PER_CGX]; struct list_head cgx_list; }; static LIST_HEAD(cgx_list); +/* CGX PHY management internal APIs */ +static int cgx_fwi_link_change(struct cgx *cgx, int lmac_id, bool en); + /* Supported devices */ static const struct pci_device_id cgx_id_table[] = { { PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_OCTEONTX2_CGX) }, @@ -42,11 +69,24 @@ static const struct pci_device_id cgx_id_table[] = { MODULE_DEVICE_TABLE(pci, cgx_id_table); +static void cgx_write(struct cgx *cgx, u64 lmac, u64 offset, u64 val) +{ + writeq(val, cgx->reg_base + (lmac << 18) + offset); +} + static u64 cgx_read(struct cgx *cgx, u64 lmac, u64 offset) { return readq(cgx->reg_base + (lmac << 18) + offset); } +static inline struct lmac *lmac_pdata(u8 lmac_id, struct cgx *cgx) +{ + if (!cgx || lmac_id >= MAX_LMAC_PER_CGX) + return NULL; + + return cgx->lmac_idmap[lmac_id]; +} + int cgx_get_cgx_cnt(void) { struct cgx *cgx_dev; @@ -82,18 +122,312 @@ void *cgx_get_pdata(int cgx_id) } EXPORT_SYMBOL(cgx_get_pdata); -static void cgx_lmac_init(struct cgx *cgx) +/* CGX Firmware interface low level support */ +static int cgx_fwi_cmd_send(u64 req, u64 *resp, struct lmac *lmac) +{ + struct cgx *cgx = lmac->cgx; + struct device *dev; + int err = 0; + u64 cmd; + + /* Ensure no other command is in progress */ + err = mutex_lock_interruptible(&lmac->cmd_lock); + if (err) + return err; + + /* Ensure command register is free */ + cmd = cgx_read(cgx, lmac->lmac_id, CGX_COMMAND_REG); + if (FIELD_GET(CMDREG_OWN, cmd) != CGX_CMD_OWN_NS) { + err = -EBUSY; + goto unlock; + } + + /* Update ownership in command request */ + req = FIELD_SET(CMDREG_OWN, CGX_CMD_OWN_FIRMWARE, req); + + /* Mark this lmac as pending, before we start */ + lmac->cmd_pend = true; + + /* Start command in hardware */ + cgx_write(cgx, lmac->lmac_id, CGX_COMMAND_REG, req); + + /* Ensure command is completed without errors */ + if (!wait_event_timeout(lmac->wq_cmd_cmplt, !lmac->cmd_pend, + msecs_to_jiffies(CGX_CMD_TIMEOUT))) { + dev = 
&cgx->pdev->dev; + dev_err(dev, "cgx port %d:%d cmd timeout\n", + cgx->cgx_id, lmac->lmac_id); + err = -EIO; + goto unlock; + } + + /* we have a valid command response */ + smp_rmb(); /* Ensure the latest updates are visible */ + *resp = lmac->resp; + +unlock: +
[PATCH v8 11/15] octeontx2-af: Add Marvell OcteonTX2 CGX driver
From: Sunil Goutham

This patch adds a basic template for Marvell OcteonTX2's CGX ethernet
interface driver. Just the probe.

RVU AF driver will use APIs exported by this driver for various things
like PF to physical interface mapping, loopback mode, interface stats
etc. Hence merged both drivers into a single module.

Signed-off-by: Sunil Goutham
---
 drivers/net/ethernet/marvell/octeontx2/af/Makefile |  2 +-
 drivers/net/ethernet/marvell/octeontx2/af/cgx.c    | 97 ++
 drivers/net/ethernet/marvell/octeontx2/af/cgx.h    | 22 +
 drivers/net/ethernet/marvell/octeontx2/af/rvu.c    | 14 +++-
 4 files changed, 133 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/cgx.c
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/cgx.h

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/Makefile b/drivers/net/ethernet/marvell/octeontx2/af/Makefile
index ac17cb9..8646421 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/Makefile
+++ b/drivers/net/ethernet/marvell/octeontx2/af/Makefile
@@ -7,4 +7,4 @@ obj-$(CONFIG_OCTEONTX2_MBOX) += octeontx2_mbox.o
 obj-$(CONFIG_OCTEONTX2_AF) += octeontx2_af.o
 
 octeontx2_mbox-y := mbox.o
-octeontx2_af-y := rvu.o
+octeontx2_af-y := cgx.o rvu.o
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cgx.c b/drivers/net/ethernet/marvell/octeontx2/af/cgx.c
new file mode 100644
index 000..c41d23f
--- /dev/null
+++ b/drivers/net/ethernet/marvell/octeontx2/af/cgx.c
@@ -0,0 +1,97 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Marvell OcteonTx2 CGX driver
+ *
+ * Copyright (C) 2018 Marvell International Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "cgx.h" + +#define DRV_NAME "octeontx2-cgx" +#define DRV_STRING "Marvell OcteonTX2 CGX/MAC Driver" + +struct cgx { + void __iomem*reg_base; + struct pci_dev *pdev; + u8 cgx_id; +}; + +/* Supported devices */ +static const struct pci_device_id cgx_id_table[] = { + { PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_OCTEONTX2_CGX) }, + { 0, } /* end of table */ +}; + +MODULE_DEVICE_TABLE(pci, cgx_id_table); + +static int cgx_probe(struct pci_dev *pdev, const struct pci_device_id *id) +{ + struct device *dev = >dev; + struct cgx *cgx; + int err; + + cgx = devm_kzalloc(dev, sizeof(*cgx), GFP_KERNEL); + if (!cgx) + return -ENOMEM; + cgx->pdev = pdev; + + pci_set_drvdata(pdev, cgx); + + err = pci_enable_device(pdev); + if (err) { + dev_err(dev, "Failed to enable PCI device\n"); + pci_set_drvdata(pdev, NULL); + return err; + } + + err = pci_request_regions(pdev, DRV_NAME); + if (err) { + dev_err(dev, "PCI request regions failed 0x%x\n", err); + goto err_disable_device; + } + + /* MAP configuration registers */ + cgx->reg_base = pcim_iomap(pdev, PCI_CFG_REG_BAR_NUM, 0); + if (!cgx->reg_base) { + dev_err(dev, "CGX: Cannot map CSR memory space, aborting\n"); + err = -ENOMEM; + goto err_release_regions; + } + + return 0; + +err_release_regions: + pci_release_regions(pdev); +err_disable_device: + pci_disable_device(pdev); + pci_set_drvdata(pdev, NULL); + return err; +} + +static void cgx_remove(struct pci_dev *pdev) +{ + pci_release_regions(pdev); + pci_disable_device(pdev); + pci_set_drvdata(pdev, NULL); +} + +struct pci_driver cgx_driver = { + .name = DRV_NAME, + .id_table = cgx_id_table, + .probe = cgx_probe, + .remove = cgx_remove, +}; diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cgx.h b/drivers/net/ethernet/marvell/octeontx2/af/cgx.h new file mode 100644 index 000..a7d4b39 --- /dev/null +++ b/drivers/net/ethernet/marvell/octeontx2/af/cgx.h @@ -0,0 
+1,22 @@ +/* SPDX-License-Identifier: GPL-2.0 + * Marvell OcteonTx2 CGX driver + * + * Copyright (C) 2018 Marvell International Ltd. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#ifndef CGX_H +#define CGX_H + + /* PCI device IDs */ +#definePCI_DEVID_OCTEONTX2_CGX 0xA059 + +/* PCI BAR nos */ +#define PCI_CFG_REG_BAR_NUM0 + +extern struct pci_driver cgx_driver; + +#endif /* CGX_H */ diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c index e0c3c18..4927f6b 100644 ---
[PATCH v8 07/15] octeontx2-af: Scan blocks for LFs provisioned to PF/VF
From: Sunil Goutham Scan all RVU blocks to find any 'LF to RVU PF/VF' mapping done by low level firmware. If found any, mark them as used in respective block's LF bitmap and also save mapped PF/VF's PF_FUNC info. This is done to avoid reattaching a block LF to a different RVU PF/VF. Signed-off-by: Sunil Goutham --- drivers/net/ethernet/marvell/octeontx2/af/rvu.c| 148 - drivers/net/ethernet/marvell/octeontx2/af/rvu.h| 16 +++ .../net/ethernet/marvell/octeontx2/af/rvu_struct.h | 16 +++ 3 files changed, 178 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c index d75ce45..53e02b0 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c @@ -22,6 +22,8 @@ #define DRV_STRING "Marvell OcteonTX2 RVU Admin Function Driver" #define DRV_VERSION"1.0" +static int rvu_get_hwvf(struct rvu *rvu, int pcifunc); + /* Supported devices */ static const struct pci_device_id rvu_id_table[] = { { PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_OCTEONTX2_RVU_AF) }, @@ -66,6 +68,91 @@ int rvu_alloc_bitmap(struct rsrc_bmap *rsrc) return 0; } +static void rvu_update_rsrc_map(struct rvu *rvu, struct rvu_pfvf *pfvf, + struct rvu_block *block, u16 pcifunc, + u16 lf, bool attach) +{ + int devnum, num_lfs = 0; + bool is_pf; + u64 reg; + + if (lf >= block->lf.max) { + dev_err(>pdev->dev, + "%s: FATAL: LF %d is >= %s's max lfs i.e %d\n", + __func__, lf, block->name, block->lf.max); + return; + } + + /* Check if this is for a RVU PF or VF */ + if (pcifunc & RVU_PFVF_FUNC_MASK) { + is_pf = false; + devnum = rvu_get_hwvf(rvu, pcifunc); + } else { + is_pf = true; + devnum = rvu_get_pf(pcifunc); + } + + block->fn_map[lf] = attach ? pcifunc : 0; + + switch (block->type) { + case BLKTYPE_NPA: + pfvf->npalf = attach ? true : false; + num_lfs = pfvf->npalf; + break; + case BLKTYPE_NIX: + pfvf->nixlf = attach ? 
true : false; + num_lfs = pfvf->nixlf; + break; + case BLKTYPE_SSO: + attach ? pfvf->sso++ : pfvf->sso--; + num_lfs = pfvf->sso; + break; + case BLKTYPE_SSOW: + attach ? pfvf->ssow++ : pfvf->ssow--; + num_lfs = pfvf->ssow; + break; + case BLKTYPE_TIM: + attach ? pfvf->timlfs++ : pfvf->timlfs--; + num_lfs = pfvf->timlfs; + break; + case BLKTYPE_CPT: + attach ? pfvf->cptlfs++ : pfvf->cptlfs--; + num_lfs = pfvf->cptlfs; + break; + } + + reg = is_pf ? block->pf_lfcnt_reg : block->vf_lfcnt_reg; + rvu_write64(rvu, BLKADDR_RVUM, reg | (devnum << 16), num_lfs); +} + +inline int rvu_get_pf(u16 pcifunc) +{ + return (pcifunc >> RVU_PFVF_PF_SHIFT) & RVU_PFVF_PF_MASK; +} + +static int rvu_get_hwvf(struct rvu *rvu, int pcifunc) +{ + int pf, func; + u64 cfg; + + pf = rvu_get_pf(pcifunc); + func = pcifunc & RVU_PFVF_FUNC_MASK; + + /* Get first HWVF attached to this PF */ + cfg = rvu_read64(rvu, BLKADDR_RVUM, RVU_PRIV_PFX_CFG(pf)); + + return ((cfg & 0xFFF) + func - 1); +} + +struct rvu_pfvf *rvu_get_pfvf(struct rvu *rvu, int pcifunc) +{ + /* Check if it is a PF or VF */ + if (pcifunc & RVU_PFVF_FUNC_MASK) + return >hwvf[rvu_get_hwvf(rvu, pcifunc)]; + else + return >pf[rvu_get_pf(pcifunc)]; +} + static void rvu_check_block_implemented(struct rvu *rvu) { struct rvu_hwinfo *hw = rvu->hw; @@ -107,6 +194,28 @@ static void rvu_reset_all_blocks(struct rvu *rvu) rvu_block_reset(rvu, BLKADDR_NDC2, NDC_AF_BLK_RST); } +static void rvu_scan_block(struct rvu *rvu, struct rvu_block *block) +{ + struct rvu_pfvf *pfvf; + u64 cfg; + int lf; + + for (lf = 0; lf < block->lf.max; lf++) { + cfg = rvu_read64(rvu, block->addr, +block->lfcfg_reg | (lf << block->lfshift)); + if (!(cfg & BIT_ULL(63))) + continue; + + /* Set this resource as being used */ + __set_bit(lf, block->lf.bmap); + + /* Get, to whom this LF is attached */ + pfvf = rvu_get_pfvf(rvu, (cfg >> 8) & 0x); + rvu_update_rsrc_map(rvu, pfvf, block, + (cfg >> 8) & 0x, lf, true); + } +} + static void rvu_free_hw_resources(struct rvu *rvu) { 
struct rvu_hwinfo *hw = rvu->hw; @@ -124,7 +233,7 @@ static int
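Note (not part of the patch): the PF/VF decoding done by rvu_get_pf(), rvu_get_hwvf() and rvu_get_pfvf() above can be sketched in plain userspace C. The shift and mask values below are assumptions, since their definitions are not shown in this mail, and the helper names are hypothetical. `first_hwvf` stands in for the low 12 bits of RVU_PRIV_PFX_CFG(pf), which the driver reads from hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed PF_FUNC layout (not shown in the patch):
 * FUNC in bits 9:0, PF in bits 15:10.
 */
#define RVU_PFVF_PF_SHIFT	10
#define RVU_PFVF_PF_MASK	0x3F
#define RVU_PFVF_FUNC_MASK	0x3FF

/* PF number encoded in a PF_FUNC value. */
static int pcifunc_to_pf(uint16_t pcifunc)
{
	return (pcifunc >> RVU_PFVF_PF_SHIFT) & RVU_PFVF_PF_MASK;
}

/* A nonzero FUNC field means a VF; FUNC is the VF index plus one. */
static int pcifunc_is_vf(uint16_t pcifunc)
{
	return (pcifunc & RVU_PFVF_FUNC_MASK) != 0;
}

/* Same arithmetic as rvu_get_hwvf(): index of the first HWVF owned by
 * the PF, plus the zero-based VF index.
 */
static int pcifunc_to_hwvf(uint16_t pcifunc, int first_hwvf)
{
	int func = pcifunc & RVU_PFVF_FUNC_MASK;

	return first_hwvf + func - 1;
}
```

With these assumed values, PF_FUNC 0x0803 decodes to PF 2, VF index 2 (FUNC 3), and lands on HWVF 10 when the PF's first HWVF is 8.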
[PATCH v8 09/15] octeontx2-af: Configure block LF's MSIX vector offset
From: Sunil Goutham Firmware configures a certain number of MSIX vectors to each of enabled RVU PF/VF. When a block LF is attached to a PF/VF, number of MSIX vectors needed by that LF are set aside (out of PF/VF's total MSIX vectors) and LF's msix_offset is configured in HW. Also added support for a RVU PF/VF to retrieve that block LF's MSIX vector offset information from AF via mbox. Signed-off-by: Sunil Goutham --- drivers/net/ethernet/marvell/octeontx2/af/mbox.h | 18 ++ drivers/net/ethernet/marvell/octeontx2/af/rvu.c| 333 - drivers/net/ethernet/marvell/octeontx2/af/rvu.h| 7 + .../net/ethernet/marvell/octeontx2/af/rvu_struct.h | 2 + 4 files changed, 357 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h index 7280d49..bedf0ee 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h +++ b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h @@ -122,6 +122,7 @@ static inline struct mbox_msghdr *otx2_mbox_alloc_msg(struct otx2_mbox *mbox, M(READY, 0x001, msg_req, ready_msg_rsp) \ M(ATTACH_RESOURCES,0x002, rsrc_attach, msg_rsp)\ M(DETACH_RESOURCES,0x003, rsrc_detach, msg_rsp)\ +M(MSIX_OFFSET, 0x004, msg_req, msix_offset_rsp)\ /* CGX mbox IDs (range 0x200 - 0x3FF) */ \ /* NPA mbox IDs (range 0x400 - 0x5FF) */ \ /* SSO/SSOW mbox IDs (range 0x600 - 0x7FF) */ \ @@ -190,4 +191,21 @@ struct rsrc_detach { u8 cptlfs:1; }; +#define MSIX_VECTOR_INVALID0x +#define MAX_RVU_BLKLF_CNT 256 + +struct msix_offset_rsp { + struct mbox_msghdr hdr; + u16 npa_msixoff; + u16 nix_msixoff; + u8 sso; + u8 ssow; + u8 timlfs; + u8 cptlfs; + u16 sso_msixoff[MAX_RVU_BLKLF_CNT]; + u16 ssow_msixoff[MAX_RVU_BLKLF_CNT]; + u16 timlf_msixoff[MAX_RVU_BLKLF_CNT]; + u16 cptlf_msixoff[MAX_RVU_BLKLF_CNT]; +}; + #endif /* MBOX_H */ diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c index ef3f559..e4b8ed2 100644 --- 
a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c @@ -24,6 +24,11 @@ static int rvu_get_hwvf(struct rvu *rvu, int pcifunc); +static void rvu_set_msix_offset(struct rvu *rvu, struct rvu_pfvf *pfvf, + struct rvu_block *block, int lf); +static void rvu_clear_msix_offset(struct rvu *rvu, struct rvu_pfvf *pfvf, + struct rvu_block *block, int lf); + /* Supported devices */ static const struct pci_device_id rvu_id_table[] = { { PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_OCTEONTX2_RVU_AF) }, @@ -75,6 +80,45 @@ int rvu_alloc_rsrc(struct rsrc_bmap *rsrc) return id; } +static int rvu_alloc_rsrc_contig(struct rsrc_bmap *rsrc, int nrsrc) +{ + int start; + + if (!rsrc->bmap) + return -EINVAL; + + start = bitmap_find_next_zero_area(rsrc->bmap, rsrc->max, 0, nrsrc, 0); + if (start >= rsrc->max) + return -ENOSPC; + + bitmap_set(rsrc->bmap, start, nrsrc); + return start; +} + +static void rvu_free_rsrc_contig(struct rsrc_bmap *rsrc, int nrsrc, int start) +{ + if (!rsrc->bmap) + return; + if (start >= rsrc->max) + return; + + bitmap_clear(rsrc->bmap, start, nrsrc); +} + +static bool rvu_rsrc_check_contig(struct rsrc_bmap *rsrc, int nrsrc) +{ + int start; + + if (!rsrc->bmap) + return false; + + start = bitmap_find_next_zero_area(rsrc->bmap, rsrc->max, 0, nrsrc, 0); + if (start >= rsrc->max) + return false; + + return true; +} + void rvu_free_rsrc(struct rsrc_bmap *rsrc, int id) { if (!rsrc->bmap) @@ -103,6 +147,26 @@ int rvu_alloc_bitmap(struct rsrc_bmap *rsrc) return 0; } +/* Get block LF's HW index from a PF_FUNC's block slot number */ +int rvu_get_lf(struct rvu *rvu, struct rvu_block *block, u16 pcifunc, u16 slot) +{ + u16 match = 0; + int lf; + + spin_lock(>rsrc_lock); + for (lf = 0; lf < block->lf.max; lf++) { + if (block->fn_map[lf] == pcifunc) { + if (slot == match) { + spin_unlock(>rsrc_lock); + return lf; + } + match++; + } + } + spin_unlock(>rsrc_lock); + return -ENODEV; +} + /* Convert BLOCK_TYPE_E to a BLOCK_ADDR_E. 
* Some silicon variants of OcteonTX2 supports * multiple blocks of same type. @@ -237,6 +301,16 @@ inline int rvu_get_pf(u16 pcifunc) return (pcifunc >> RVU_PFVF_PF_SHIFT) & RVU_PFVF_PF_MASK; } +void
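Note (not part of the patch): rvu_alloc_rsrc_contig() and rvu_free_rsrc_contig() above rely on the kernel bitmap helpers. A minimal userspace sketch of the same first-fit idea, assuming at most 64 resources so a single 64-bit word can serve as the bitmap, might look like this. The struct and function names are hypothetical, and the sketch returns -1 where the driver returns -ENOSPC or -EINVAL.

```c
#include <assert.h>
#include <stdint.h>

struct rsrc_bmap_sketch {
	uint64_t bits;	/* bit i set means resource i is in use */
	int max;	/* number of valid resources, at most 64 */
};

/* Return 1 if the n resources starting at 'start' are all free. */
static int range_is_free(const struct rsrc_bmap_sketch *r, int start, int n)
{
	for (int i = start; i < start + n; i++)
		if (r->bits & (1ULL << i))
			return 0;
	return 1;
}

/* First-fit search for n contiguous free resources; marks them used.
 * Returns the start index, or -1 when no such run exists.
 */
static int alloc_contig(struct rsrc_bmap_sketch *r, int n)
{
	if (n <= 0 || n > r->max)
		return -1;
	for (int start = 0; start + n <= r->max; start++) {
		if (!range_is_free(r, start, n))
			continue;
		for (int i = start; i < start + n; i++)
			r->bits |= 1ULL << i;
		return start;
	}
	return -1;
}

/* Release n resources starting at 'start'; out-of-range requests are ignored. */
static void free_contig(struct rsrc_bmap_sketch *r, int n, int start)
{
	if (start < 0 || n <= 0 || start + n > r->max)
		return;
	for (int i = start; i < start + n; i++)
		r->bits &= ~(1ULL << i);
}
```

The sketch shows why a free count alone is not enough for MSIX vectors: after fragmentation there can be enough free entries in total but no run long enough, which is what rvu_rsrc_check_contig() tests for in the patch.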
[PATCH v8 14/15] octeontx2-af: Register for CGX lmac events
From: Linu Cherian

Added support in RVU AF driver to register for CGX LMAC link status
change events from firmware and to manage them. Processing part will
be added in followup patches.

- Introduced eventqueue for posting events from cgx lmac.
  Queueing mechanism will ensure that events can be posted and
  firmware can be acked immediately, and hence event reception and
  processing are decoupled.
- Events get added to the queue by notification callback.
  Notification callback is expected to be atomic, since it is called
  from interrupt context.
- Events are dequeued and processed in a worker thread.

Signed-off-by: Linu Cherian
Signed-off-by: Sunil Goutham
---
 drivers/net/ethernet/marvell/octeontx2/af/rvu.c    |   6 +-
 drivers/net/ethernet/marvell/octeontx2/af/rvu.h    |   5 +
 .../net/ethernet/marvell/octeontx2/af/rvu_cgx.c    | 101 -
 3 files changed, 108 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
index 43ee14f..4e7788c 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -1564,10 +1564,11 @@ static int rvu_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 
 	err = rvu_register_interrupts(rvu);
 	if (err)
-		goto err_mbox;
+		goto err_cgx;
 
 	return 0;
-
+err_cgx:
+	rvu_cgx_wq_destroy(rvu);
 err_mbox:
 	rvu_mbox_destroy(rvu);
 err_hwsetup:
@@ -1589,6 +1590,7 @@ static void rvu_remove(struct pci_dev *pdev)
 	struct rvu *rvu = pci_get_drvdata(pdev);
 
 	rvu_unregister_interrupts(rvu);
+	rvu_cgx_wq_destroy(rvu);
 	rvu_mbox_destroy(rvu);
 	rvu_reset_all_blocks(rvu);
 	rvu_free_hw_resources(rvu);
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index 385f597..d169fa9 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -110,6 +110,10 @@ struct rvu {
 						  * every cgx lmac port
 						  */
 	void			**cgx_idmap; /* cgx id to cgx data map table */
+	struct			work_struct cgx_evh_work;
+	struct			workqueue_struct *cgx_evh_wq;
+	spinlock_t		cgx_evq_lock; /* cgx event queue lock */
+	struct list_head	cgx_evq_head; /* cgx event queue head */
 };
 
 static inline void rvu_write64(struct rvu *rvu, u64 block, u64 offset, u64 val)
@@ -150,4 +154,5 @@ int rvu_poll_reg(struct rvu *rvu, u64 block, u64 offset, u64 mask, bool zero);
 
 /* CGX APIs */
 int rvu_cgx_probe(struct rvu *rvu);
+void rvu_cgx_wq_destroy(struct rvu *rvu);
 #endif /* RVU_H */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c
index bf81507..5ecc223 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c
@@ -15,6 +15,11 @@
 #include "rvu.h"
 #include "cgx.h"
 
+struct cgx_evq_entry {
+	struct list_head evq_node;
+	struct cgx_link_event link_event;
+};
+
 static inline u8 cgxlmac_id_to_bmap(u8 cgx_id, u8 lmac_id)
 {
 	return ((cgx_id & 0xF) << 4) | (lmac_id & 0xF);
@@ -72,9 +77,95 @@ static int rvu_map_cgx_lmac_pf(struct rvu *rvu)
 	return 0;
 }
 
+/* This is called from interrupt context and is expected to be atomic */
+static int cgx_lmac_postevent(struct cgx_link_event *event, void *data)
+{
+	struct cgx_evq_entry *qentry;
+	struct rvu *rvu = data;
+
+	/* post event to the event queue */
+	qentry = kmalloc(sizeof(*qentry), GFP_ATOMIC);
+	if (!qentry)
+		return -ENOMEM;
+	qentry->link_event = *event;
+	spin_lock(&rvu->cgx_evq_lock);
+	list_add_tail(&qentry->evq_node, &rvu->cgx_evq_head);
+	spin_unlock(&rvu->cgx_evq_lock);
+
+	/* start worker to process the events */
+	queue_work(rvu->cgx_evh_wq, &rvu->cgx_evh_work);
+
+	return 0;
+}
+
+static void cgx_evhandler_task(struct work_struct *work)
+{
+	struct rvu *rvu = container_of(work, struct rvu, cgx_evh_work);
+	struct cgx_evq_entry *qentry;
+	struct cgx_link_event *event;
+	unsigned long flags;
+
+	do {
+		/* Dequeue an event */
+		spin_lock_irqsave(&rvu->cgx_evq_lock, flags);
+		qentry = list_first_entry_or_null(&rvu->cgx_evq_head,
+						  struct cgx_evq_entry,
+						  evq_node);
+		if (qentry)
+			list_del(&qentry->evq_node);
+		spin_unlock_irqrestore(&rvu->cgx_evq_lock, flags);
+		if (!qentry)
+			break; /* nothing more to process */
+
+		event = &qentry->link_event;
+
+
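Note (not part of the patch): the decoupling described in the commit message, where the callback only copies and queues the event and a worker drains the queue later, can be sketched in userspace as a plain FIFO. Locking is deliberately left out here; the driver guards the list with cgx_evq_lock. All names below are hypothetical, and the event fields are placeholders because struct cgx_link_event is not shown in this mail.

```c
#include <assert.h>
#include <stdlib.h>

/* Placeholder event; the real struct cgx_link_event is not shown here. */
struct link_event_sketch {
	int cgx_id;
	int lmac_id;
	int link_up;
};

struct evq_entry {
	struct evq_entry *next;
	struct link_event_sketch ev;
};

struct evq {
	struct evq_entry *head;
	struct evq_entry *tail;
};

/* Producer side: copy the event and append it.
 * Returns 0 on success, -1 if allocation fails.
 */
static int evq_post(struct evq *q, const struct link_event_sketch *ev)
{
	struct evq_entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->ev = *ev;
	e->next = NULL;
	if (q->tail)
		q->tail->next = e;
	else
		q->head = e;
	q->tail = e;
	return 0;
}

/* Consumer side: pop the oldest event into *out.
 * Returns 1 if an event was popped, 0 if the queue was empty.
 */
static int evq_pop(struct evq *q, struct link_event_sketch *out)
{
	struct evq_entry *e = q->head;

	if (!e)
		return 0;
	q->head = e->next;
	if (!q->head)
		q->tail = NULL;
	*out = e->ev;
	free(e);
	return 1;
}
```

Copying the event by value in the producer is what lets the caller reuse or discard its own buffer immediately, which matches `qentry->link_event = *event` in cgx_lmac_postevent().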
[PATCH v8 08/15] octeontx2-af: Add RVU block LF provisioning support
From: Sunil Goutham

Added support for a RVU PF/VF to request AF via mailbox to attach or
detach NPA/NIX/SSO/SSOW/TIM/CPT block LFs. Also supports partial
detachment and modifying current LF attached count of a certain block
type.

Signed-off-by: Sunil Goutham
---
 drivers/net/ethernet/marvell/octeontx2/af/mbox.h   |  45 +-
 drivers/net/ethernet/marvell/octeontx2/af/rvu.c    | 472 -
 drivers/net/ethernet/marvell/octeontx2/af/rvu.h    |   8 +-
 .../net/ethernet/marvell/octeontx2/af/rvu_reg.h    |   8 +-
 4 files changed, 523 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
index fc593f0..7280d49 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
@@ -118,7 +118,17 @@ static inline struct mbox_msghdr *otx2_mbox_alloc_msg(struct otx2_mbox *mbox,
 #define MBOX_MSG_MAX 0x
 
 #define MBOX_MESSAGES \
-M(READY, 0x001, msg_req, ready_msg_rsp)
+/* Generic mbox IDs (range 0x000 - 0x1FF) */ \
+M(READY, 0x001, msg_req, ready_msg_rsp) \
+M(ATTACH_RESOURCES, 0x002, rsrc_attach, msg_rsp) \
+M(DETACH_RESOURCES, 0x003, rsrc_detach, msg_rsp) \
+/* CGX mbox IDs (range 0x200 - 0x3FF) */ \
+/* NPA mbox IDs (range 0x400 - 0x5FF) */ \
+/* SSO/SSOW mbox IDs (range 0x600 - 0x7FF) */ \
+/* TIM mbox IDs (range 0x800 - 0x9FF) */ \
+/* CPT mbox IDs (range 0xA00 - 0xBFF) */ \
+/* NPC mbox IDs (range 0x6000 - 0x7FFF) */ \
+/* NIX mbox IDs (range 0x8000 - 0x) */ \
 
 enum {
 #define M(_name, _id, _1, _2) MBOX_MSG_ ## _name = _id,
@@ -147,4 +157,37 @@ struct ready_msg_rsp {
 	u16 sclk_feq; /* SCLK frequency */
 };
 
+/* Structure for requesting resource provisioning.
+ * 'modify' flag to be used when either requesting more
+ * or to detach partial of a certain resource type.
+ * Rest of the fields specify how many of what type to
+ * be attached.
+ */ +struct rsrc_attach { + struct mbox_msghdr hdr; + u8 modify:1; + u8 npalf:1; + u8 nixlf:1; + u16 sso; + u16 ssow; + u16 timlfs; + u16 cptlfs; +}; + +/* Structure for relinquishing resources. + * 'partial' flag to be used when relinquishing all resources + * but only of a certain type. If not set, all resources of all + * types provisioned to the RVU function will be detached. + */ +struct rsrc_detach { + struct mbox_msghdr hdr; + u8 partial:1; + u8 npalf:1; + u8 nixlf:1; + u8 sso:1; + u8 ssow:1; + u8 timlfs:1; + u8 cptlfs:1; +}; + #endif /* MBOX_H */ diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c index 53e02b0..ef3f559 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c @@ -59,6 +59,41 @@ int rvu_poll_reg(struct rvu *rvu, u64 block, u64 offset, u64 mask, bool zero) return -EBUSY; } +int rvu_alloc_rsrc(struct rsrc_bmap *rsrc) +{ + int id; + + if (!rsrc->bmap) + return -EINVAL; + + id = find_first_zero_bit(rsrc->bmap, rsrc->max); + if (id >= rsrc->max) + return -ENOSPC; + + __set_bit(id, rsrc->bmap); + + return id; +} + +void rvu_free_rsrc(struct rsrc_bmap *rsrc, int id) +{ + if (!rsrc->bmap) + return; + + __clear_bit(id, rsrc->bmap); +} + +int rvu_rsrc_free_count(struct rsrc_bmap *rsrc) +{ + int used; + + if (!rsrc->bmap) + return 0; + + used = bitmap_weight(rsrc->bmap, rsrc->max); + return (rsrc->max - used); +} + int rvu_alloc_bitmap(struct rsrc_bmap *rsrc) { rsrc->bmap = kcalloc(BITS_TO_LONGS(rsrc->max), @@ -68,6 +103,78 @@ int rvu_alloc_bitmap(struct rsrc_bmap *rsrc) return 0; } +/* Convert BLOCK_TYPE_E to a BLOCK_ADDR_E. + * Some silicon variants of OcteonTX2 supports + * multiple blocks of same type. + * + * @pcifunc has to be zero when no LF is yet attached. 
+ */ +int rvu_get_blkaddr(struct rvu *rvu, int blktype, u16 pcifunc) +{ + int devnum, blkaddr = -ENODEV; + u64 cfg, reg; + bool is_pf; + + switch (blktype) { + case BLKTYPE_NPA: + blkaddr = BLKADDR_NPA; + goto exit; + case BLKTYPE_NIX: + /* For now assume NIX0 */ + if (!pcifunc) { + blkaddr = BLKADDR_NIX0; + goto exit; + } + break; + case BLKTYPE_SSO: + blkaddr = BLKADDR_SSO;
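Note (not part of the patch): MBOX_MESSAGES in this mail is an X-macro table, expanded once with `M()` defined to build the message ID enum. A minimal standalone sketch of the pattern follows. The table, the two-argument `M()` and the name lookup are simplified, hypothetical stand-ins; the driver's macro takes four arguments.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical message table in the same X-macro style as MBOX_MESSAGES. */
#define SKETCH_MESSAGES \
M(READY,            0x001) \
M(ATTACH_RESOURCES, 0x002) \
M(DETACH_RESOURCES, 0x003)

/* First expansion: one enumerator per table row. */
enum {
#define M(_name, _id) SKETCH_MSG_ ## _name = _id,
SKETCH_MESSAGES
#undef M
};

/* Second expansion: map an ID back to its name. */
static const char *sketch_id2name(int id)
{
	switch (id) {
#define M(_name, _id) case _id: return #_name;
	SKETCH_MESSAGES
#undef M
	default:
		return "INVALID ID";
	}
}
```

The benefit is that a new message is added in exactly one place, and the enum and the name lookup cannot drift apart.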
[PATCH v8 10/15] octeontx2-af: Reconfig MSIX base with IOVA
From: Geetha sowjanya

HW interprets RVU_AF_MSIXTR_BASE address as an IOVA, hence create an
IOMMU mapping for the physical address configured by firmware and
reconfig RVU_AF_MSIXTR_BASE with IOVA.

Signed-off-by: Geetha sowjanya
Signed-off-by: Sunil Goutham
---
 drivers/net/ethernet/marvell/octeontx2/af/rvu.c | 33 ++---
 drivers/net/ethernet/marvell/octeontx2/af/rvu.h |  1 +
 2 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
index e4b8ed2..e0c3c18 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -442,9 +442,10 @@ static int rvu_setup_msix_resources(struct rvu *rvu)
 {
 	struct rvu_hwinfo *hw = rvu->hw;
 	int pf, vf, numvfs, hwvf, err;
+	int nvecs, offset, max_msix;
 	struct rvu_pfvf *pfvf;
-	int nvecs, offset;
-	u64 cfg;
+	u64 cfg, phy_addr;
+	dma_addr_t iova;
 
 	for (pf = 0; pf < hw->total_pfs; pf++) {
 		cfg = rvu_read64(rvu, BLKADDR_RVUM, RVU_PRIV_PFX_CFG(pf));
@@ -523,6 +524,22 @@ static int rvu_setup_msix_resources(struct rvu *rvu)
 		}
 	}
 
+	/* HW interprets RVU_AF_MSIXTR_BASE address as an IOVA, hence
+	 * create an IOMMU mapping for the physical address configured by
+	 * firmware and reconfig RVU_AF_MSIXTR_BASE with IOVA.
+	 */
+	cfg = rvu_read64(rvu, BLKADDR_RVUM, RVU_PRIV_CONST);
+	max_msix = cfg & 0xF;
+	phy_addr = rvu_read64(rvu, BLKADDR_RVUM, RVU_AF_MSIXTR_BASE);
+	iova = dma_map_single(rvu->dev, (void *)phy_addr,
+			      max_msix * PCI_MSIX_ENTRY_SIZE,
+			      DMA_BIDIRECTIONAL);
+	if (dma_mapping_error(rvu->dev, iova))
+		return -ENOMEM;
+
+	rvu_write64(rvu, BLKADDR_RVUM, RVU_AF_MSIXTR_BASE, (u64)iova);
+	rvu->msix_base_iova = iova;
+
 	return 0;
 }
 
@@ -531,7 +548,8 @@ static void rvu_free_hw_resources(struct rvu *rvu)
 	struct rvu_hwinfo *hw = rvu->hw;
 	struct rvu_block *block;
 	struct rvu_pfvf *pfvf;
-	int id;
+	int id, max_msix;
+	u64 cfg;
 
 	/* Free block LF bitmaps */
 	for (id = 0; id < BLK_COUNT; id++) {
@@ -549,6 +567,15 @@ static void rvu_free_hw_resources(struct rvu *rvu)
 		pfvf = &rvu->hwvf[id];
 		kfree(pfvf->msix.bmap);
 	}
+
+	/* Unmap MSIX vector base IOVA mapping */
+	if (!rvu->msix_base_iova)
+		return;
+	cfg = rvu_read64(rvu, BLKADDR_RVUM, RVU_PRIV_CONST);
+	max_msix = cfg & 0xF;
+	dma_unmap_single(rvu->dev, rvu->msix_base_iova,
+			 max_msix * PCI_MSIX_ENTRY_SIZE,
+			 DMA_BIDIRECTIONAL);
 }
 
 static int rvu_setup_hw_resources(struct rvu *rvu)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index 7435e83..92c2022 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -99,6 +99,7 @@ struct rvu {
 	u16			num_vec;
 	char			*irq_name;
 	bool			*irq_allocated;
+	dma_addr_t		msix_base_iova;
 };
 
 static inline void rvu_write64(struct rvu *rvu, u64 block, u64 offset, u64 val)
-- 
2.7.4
[PATCH v8 12/15] octeontx2-af: Set RVU PFs to CGX LMACs mapping
From: Linu Cherian Each of the enabled CGX LMAC is considered a physical interface and RVU PFs are mapped to these. VFs of these SRIOV PFs will be virtual interfaces and share CGX LMAC along with PF. This mapping info will be used later on for Rx/Tx pkt steering. Signed-off-by: Linu Cherian Signed-off-by: Geetha sowjanya Signed-off-by: Sunil Goutham --- drivers/net/ethernet/marvell/octeontx2/af/Makefile | 2 +- drivers/net/ethernet/marvell/octeontx2/af/cgx.c| 59 ++ drivers/net/ethernet/marvell/octeontx2/af/cgx.h| 15 +- drivers/net/ethernet/marvell/octeontx2/af/rvu.c| 4 ++ drivers/net/ethernet/marvell/octeontx2/af/rvu.h| 12 + 5 files changed, 89 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/marvell/octeontx2/af/Makefile b/drivers/net/ethernet/marvell/octeontx2/af/Makefile index 8646421..eaac264 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/Makefile +++ b/drivers/net/ethernet/marvell/octeontx2/af/Makefile @@ -7,4 +7,4 @@ obj-$(CONFIG_OCTEONTX2_MBOX) += octeontx2_mbox.o obj-$(CONFIG_OCTEONTX2_AF) += octeontx2_af.o octeontx2_mbox-y := mbox.o -octeontx2_af-y := cgx.o rvu.o +octeontx2_af-y := cgx.o rvu.o rvu_cgx.o diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cgx.c b/drivers/net/ethernet/marvell/octeontx2/af/cgx.c index c41d23f..6ecae80 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/cgx.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/cgx.c @@ -28,8 +28,12 @@ struct cgx { void __iomem*reg_base; struct pci_dev *pdev; u8 cgx_id; + u8 lmac_count; + struct list_headcgx_list; }; +static LIST_HEAD(cgx_list); + /* Supported devices */ static const struct pci_device_id cgx_id_table[] = { { PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_OCTEONTX2_CGX) }, @@ -38,6 +42,53 @@ static const struct pci_device_id cgx_id_table[] = { MODULE_DEVICE_TABLE(pci, cgx_id_table); +static u64 cgx_read(struct cgx *cgx, u64 lmac, u64 offset) +{ + return readq(cgx->reg_base + (lmac << 18) + offset); +} + +int cgx_get_cgx_cnt(void) +{ + struct cgx *cgx_dev; + 
int count = 0; + + list_for_each_entry(cgx_dev, _list, cgx_list) + count++; + + return count; +} +EXPORT_SYMBOL(cgx_get_cgx_cnt); + +int cgx_get_lmac_cnt(void *cgxd) +{ + struct cgx *cgx = cgxd; + + if (!cgx) + return -ENODEV; + + return cgx->lmac_count; +} +EXPORT_SYMBOL(cgx_get_lmac_cnt); + +void *cgx_get_pdata(int cgx_id) +{ + struct cgx *cgx_dev; + + list_for_each_entry(cgx_dev, _list, cgx_list) { + if (cgx_dev->cgx_id == cgx_id) + return cgx_dev; + } + return NULL; +} +EXPORT_SYMBOL(cgx_get_pdata); + +static void cgx_lmac_init(struct cgx *cgx) +{ + cgx->lmac_count = cgx_read(cgx, 0, CGXX_CMRX_RX_LMACS) & 0x7; + if (cgx->lmac_count > MAX_LMAC_PER_CGX) + cgx->lmac_count = MAX_LMAC_PER_CGX; +} + static int cgx_probe(struct pci_dev *pdev, const struct pci_device_id *id) { struct device *dev = >dev; @@ -72,9 +123,14 @@ static int cgx_probe(struct pci_dev *pdev, const struct pci_device_id *id) goto err_release_regions; } + list_add(>cgx_list, _list); + cgx->cgx_id = cgx_get_cgx_cnt() - 1; + cgx_lmac_init(cgx); + return 0; err_release_regions: + list_del(>cgx_list); pci_release_regions(pdev); err_disable_device: pci_disable_device(pdev); @@ -84,6 +140,9 @@ static int cgx_probe(struct pci_dev *pdev, const struct pci_device_id *id) static void cgx_remove(struct pci_dev *pdev) { + struct cgx *cgx = pci_get_drvdata(pdev); + + list_del(>cgx_list); pci_release_regions(pdev); pci_disable_device(pdev); pci_set_drvdata(pdev, NULL); diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cgx.h b/drivers/net/ethernet/marvell/octeontx2/af/cgx.h index a7d4b39..acdc16e 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/cgx.h +++ b/drivers/net/ethernet/marvell/octeontx2/af/cgx.h @@ -12,11 +12,22 @@ #define CGX_H /* PCI device IDs */ -#definePCI_DEVID_OCTEONTX2_CGX 0xA059 +#definePCI_DEVID_OCTEONTX2_CGX 0xA059 /* PCI BAR nos */ -#define PCI_CFG_REG_BAR_NUM0 +#define PCI_CFG_REG_BAR_NUM0 + +#define MAX_CGX3 +#define MAX_LMAC_PER_CGX 4 +#define CGX_OFFSET(x) ((x) * 
MAX_LMAC_PER_CGX) + +/* Registers */ +#define CGXX_CMRX_RX_ID_MAP0x060 +#define CGXX_CMRX_RX_LMACS 0x128 extern struct pci_driver cgx_driver; +int cgx_get_cgx_cnt(void); +int cgx_get_lmac_cnt(void *cgxd); +void *cgx_get_pdata(int cgx_id); #endif /* CGX_H */ diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
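Note (not part of the patch): the register addressing in cgx_read() and the count clamp in cgx_lmac_init() come down to two small pieces of arithmetic, sketched here in userspace. `LMAC_SHIFT` and both function names are hypothetical; the shift of 18 and MAX_LMAC_PER_CGX of 4 are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LMAC_PER_CGX	4
#define LMAC_SHIFT		18	/* each LMAC gets a 256 KiB register window */

/* Byte offset of an LMAC register from the start of the CGX BAR,
 * i.e. what cgx_read() adds to reg_base.
 */
static uint64_t cgx_reg_offset(uint64_t lmac, uint64_t offset)
{
	return (lmac << LMAC_SHIFT) + offset;
}

/* Same logic as cgx_lmac_init(): the low 3 bits of the register hold
 * the LMAC count, clamped to the driver's compile-time maximum.
 */
static unsigned int lmac_count_from_reg(uint64_t rx_lmacs)
{
	unsigned int count = rx_lmacs & 0x7;

	return count > MAX_LMAC_PER_CGX ? MAX_LMAC_PER_CGX : count;
}
```

The clamp matters because a 3-bit field can report up to 7, while the per-LMAC tables in the driver are sized for MAX_LMAC_PER_CGX entries.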
[PATCH v8 05/15] octeontx2-af: Add mailbox IRQ and msg handlers
From: Sunil Goutham This patch adds support for mailbox interrupt and message handling. Mapped mailbox region and registered a workqueue for message handling. Enabled mailbox IRQ of RVU PFs and registered a interrupt handler. When IRQ is triggered work is added to the mbox workqueue for msgs to get processed. Signed-off-by: Sunil Goutham --- drivers/net/ethernet/marvell/octeontx2/af/mbox.h | 14 +- drivers/net/ethernet/marvell/octeontx2/af/rvu.c| 254 + drivers/net/ethernet/marvell/octeontx2/af/rvu.h| 22 ++ .../net/ethernet/marvell/octeontx2/af/rvu_struct.h | 22 ++ 4 files changed, 309 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h index 8e205fd..fc593f0 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h +++ b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h @@ -33,6 +33,8 @@ # error "incorrect mailbox area sizes" #endif +#define INTR_MASK(pfvfs) ((pfvfs < 64) ? (BIT_ULL(pfvfs) - 1) : (~0ull)) + #define MBOX_RSP_TIMEOUT 1000 /* in ms, Time to wait for mbox response */ #define MBOX_MSG_ALIGN 16 /* Align mbox msg start to 16bytes */ @@ -90,8 +92,9 @@ struct mbox_msghdr { void otx2_mbox_reset(struct otx2_mbox *mbox, int devid); void otx2_mbox_destroy(struct otx2_mbox *mbox); -int otx2_mbox_init(struct otx2_mbox *mbox, void *hwbase, struct pci_dev *pdev, - void *reg_base, int direction, int ndevs); +int otx2_mbox_init(struct otx2_mbox *mbox, void __force *hwbase, + struct pci_dev *pdev, void __force *reg_base, + int direction, int ndevs); void otx2_mbox_msg_send(struct otx2_mbox *mbox, int devid); int otx2_mbox_wait_for_rsp(struct otx2_mbox *mbox, int devid); int otx2_mbox_busy_poll_for_rsp(struct otx2_mbox *mbox, int devid); @@ -115,7 +118,7 @@ static inline struct mbox_msghdr *otx2_mbox_alloc_msg(struct otx2_mbox *mbox, #define MBOX_MSG_MAX 0x #define MBOX_MESSAGES \ -M(READY, 0x001, msg_req, msg_rsp) +M(READY, 0x001, msg_req, ready_msg_rsp) enum { #define 
M(_name, _id, _1, _2) MBOX_MSG_ ## _name = _id, @@ -139,4 +142,9 @@ struct msg_rsp { struct mbox_msghdr hdr; }; +struct ready_msg_rsp { + struct mbox_msghdr hdr; + u16sclk_feq;/* SCLK frequency */ +}; + #endif /* MBOX_H */ diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c index fa5f40b..6999d0f 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c @@ -258,6 +258,245 @@ static int rvu_setup_hw_resources(struct rvu *rvu) return 0; } +static int rvu_process_mbox_msg(struct rvu *rvu, int devid, + struct mbox_msghdr *req) +{ + /* Check if valid, if not reply with a invalid msg */ + if (req->sig != OTX2_MBOX_REQ_SIG) + goto bad_message; + + if (req->id == MBOX_MSG_READY) + return 0; + +bad_message: + otx2_reply_invalid_msg(>mbox, devid, req->pcifunc, + req->id); + return -ENODEV; +} + +static void rvu_mbox_handler(struct work_struct *work) +{ + struct rvu_work *mwork = container_of(work, struct rvu_work, work); + struct rvu *rvu = mwork->rvu; + struct otx2_mbox_dev *mdev; + struct mbox_hdr *req_hdr; + struct mbox_msghdr *msg; + struct otx2_mbox *mbox; + int offset, id, err; + u16 pf; + + mbox = >mbox; + pf = mwork - rvu->mbox_wrk; + mdev = >dev[pf]; + + /* Process received mbox messages */ + req_hdr = (struct mbox_hdr *)(mdev->mbase + mbox->rx_start); + if (req_hdr->num_msgs == 0) + return; + + offset = mbox->rx_start + ALIGN(sizeof(*req_hdr), MBOX_MSG_ALIGN); + + for (id = 0; id < req_hdr->num_msgs; id++) { + msg = (struct mbox_msghdr *)(mdev->mbase + offset); + + /* Set which PF sent this message based on mbox IRQ */ + msg->pcifunc &= ~(RVU_PFVF_PF_MASK << RVU_PFVF_PF_SHIFT); + msg->pcifunc |= (pf << RVU_PFVF_PF_SHIFT); + err = rvu_process_mbox_msg(rvu, pf, msg); + if (!err) { + offset = mbox->rx_start + msg->next_msgoff; + continue; + } + + if (msg->pcifunc & RVU_PFVF_FUNC_MASK) + dev_warn(rvu->dev, "Error %d when processing message %s (0x%x) from 
PF%d:VF%d\n", +err, otx2_mbox_id2name(msg->id), msg->id, pf, +(msg->pcifunc & RVU_PFVF_FUNC_MASK) - 1); + else + dev_warn(rvu->dev, "Error %d when processing message %s (0x%x) from PF%d\n", +
[PATCH v8 02/15] octeontx2-af: Reset all RVU blocks
From: Sunil Goutham

Go through all BLKADDRs and check which ones are implemented on this
silicon and do a HW reset of each implemented block. Also added all RVU
AF and PF register offsets.

Signed-off-by: Sunil Goutham
---
 drivers/net/ethernet/marvell/octeontx2/af/rvu.c    |  78 ++
 drivers/net/ethernet/marvell/octeontx2/af/rvu.h    |  37 +++
 .../net/ethernet/marvell/octeontx2/af/rvu_reg.h    | 112 +
 .../net/ethernet/marvell/octeontx2/af/rvu_struct.h |  34 +++
 4 files changed, 261 insertions(+)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_reg.h
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_struct.h

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
index 5af4da6..d40fabf 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -16,6 +16,7 @@
 #include
 
 #include "rvu.h"
+#include "rvu_reg.h"
 
 #define DRV_NAME	"octeontx2-af"
 #define DRV_STRING	"Marvell OcteonTX2 RVU Admin Function Driver"
@@ -33,6 +34,70 @@ MODULE_LICENSE("GPL v2");
 MODULE_VERSION(DRV_VERSION);
 MODULE_DEVICE_TABLE(pci, rvu_id_table);
 
+/* Poll a RVU block's register 'offset', for a 'zero'
+ * or 'nonzero' at bits specified by 'mask'
+ */
+int rvu_poll_reg(struct rvu *rvu, u64 block, u64 offset, u64 mask, bool zero)
+{
+	void __iomem *reg;
+	int timeout = 100;
+	u64 reg_val;
+
+	reg = rvu->afreg_base + ((block << 28) | offset);
+	while (timeout) {
+		reg_val = readq(reg);
+		if (zero && !(reg_val & mask))
+			return 0;
+		if (!zero && (reg_val & mask))
+			return 0;
+		udelay(1);
+		cpu_relax();
+		timeout--;
+	}
+	return -EBUSY;
+}
+
+static void rvu_check_block_implemented(struct rvu *rvu)
+{
+	struct rvu_hwinfo *hw = rvu->hw;
+	struct rvu_block *block;
+	int blkid;
+	u64 cfg;
+
+	/* For each block check if 'implemented' bit is set */
+	for (blkid = 0; blkid < BLK_COUNT; blkid++) {
+		block = &hw->block[blkid];
+		cfg = rvupf_read64(rvu, RVU_PF_BLOCK_ADDRX_DISC(blkid));
+		if (cfg & BIT_ULL(11))
+			block->implemented = true;
+	}
+}
+
+static void rvu_block_reset(struct rvu *rvu, int blkaddr, u64 rst_reg)
+{
+	struct rvu_block *block = &rvu->hw->block[blkaddr];
+
+	if (!block->implemented)
+		return;
+
+	rvu_write64(rvu, blkaddr, rst_reg, BIT_ULL(0));
+	rvu_poll_reg(rvu, blkaddr, rst_reg, BIT_ULL(63), true);
+}
+
+static void rvu_reset_all_blocks(struct rvu *rvu)
+{
+	/* Do a HW reset of all RVU blocks */
+	rvu_block_reset(rvu, BLKADDR_NPA, NPA_AF_BLK_RST);
+	rvu_block_reset(rvu, BLKADDR_NIX0, NIX_AF_BLK_RST);
+	rvu_block_reset(rvu, BLKADDR_NPC, NPC_AF_BLK_RST);
+	rvu_block_reset(rvu, BLKADDR_SSO, SSO_AF_BLK_RST);
+	rvu_block_reset(rvu, BLKADDR_TIM, TIM_AF_BLK_RST);
+	rvu_block_reset(rvu, BLKADDR_CPT0, CPT_AF_BLK_RST);
+	rvu_block_reset(rvu, BLKADDR_NDC0, NDC_AF_BLK_RST);
+	rvu_block_reset(rvu, BLKADDR_NDC1, NDC_AF_BLK_RST);
+	rvu_block_reset(rvu, BLKADDR_NDC2, NDC_AF_BLK_RST);
+}
+
 static int rvu_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	struct device *dev = &pdev->dev;
@@ -43,6 +108,12 @@ static int rvu_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (!rvu)
 		return -ENOMEM;
 
+	rvu->hw = devm_kzalloc(dev, sizeof(struct rvu_hwinfo), GFP_KERNEL);
+	if (!rvu->hw) {
+		devm_kfree(dev, rvu);
+		return -ENOMEM;
+	}
+
 	pci_set_drvdata(pdev, rvu);
 	rvu->pdev = pdev;
 	rvu->dev = &pdev->dev;
@@ -80,6 +151,11 @@ static int rvu_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		goto err_release_regions;
 	}
 
+	/* Check which blocks the HW supports */
+	rvu_check_block_implemented(rvu);
+
+	rvu_reset_all_blocks(rvu);
+
 	return 0;
 
 err_release_regions:
@@ -88,6 +164,7 @@ static int rvu_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	pci_disable_device(pdev);
 err_freemem:
 	pci_set_drvdata(pdev, NULL);
+	devm_kfree(&pdev->dev, rvu->hw);
 	devm_kfree(dev, rvu);
 	return err;
 }
@@ -100,6 +177,7 @@ static void rvu_remove(struct pci_dev *pdev)
 
 	pci_disable_device(pdev);
 	pci_set_drvdata(pdev, NULL);
+	devm_kfree(&pdev->dev, rvu->hw);
 	devm_kfree(&pdev->dev, rvu);
 }
 
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index 4a4b0ad..e2c54d0 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -11,6 +11,8 @@
 #ifndef RVU_H
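For readers following the reset sequence in rvu_poll_reg() above, the bounded-poll pattern can be tried outside the kernel. The following is only a userspace sketch: the demo_* names, the function-pointer register accessor and the fake register are assumptions made for illustration and are not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical accessor type standing in for readq() on a mapped register. */
typedef uint64_t (*demo_read_fn)(void *ctx);

/* Poll until the bits in 'mask' read as zero (zero == true) or as
 * nonzero (zero == false), giving up after 'tries' reads.
 */
static int demo_poll(demo_read_fn read_reg, void *ctx, uint64_t mask,
		     bool zero, int tries)
{
	while (tries-- > 0) {
		uint64_t val = read_reg(ctx);

		if (zero && !(val & mask))
			return 0;
		if (!zero && (val & mask))
			return 0;
	}
	return -EBUSY;
}

/* Fake register: reports bit 63 (busy) for the first *ctx reads,
 * then reads back as zero.
 */
static uint64_t demo_fake_reg(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return 1ULL << 63;
	}
	return 0;
}
```

With a fake register that stays busy for three reads, demo_poll() succeeds on the fourth read; if the busy bit outlasts the retry budget, the caller gets -EBUSY, which is the same contract rvu_poll_reg() exposes.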
[PATCH v8 06/15] octeontx2-af: Convert mbox msg id check to a macro
From: Aleksey Makarov

With 10's of mailbox messages expected to be handled in future,
checking for message id could become a lengthy switch case. Hence
added a macro to auto generate the switch case for each msg id.

Signed-off-by: Aleksey Makarov
Signed-off-by: Sunil Goutham
---
 drivers/net/ethernet/marvell/octeontx2/af/rvu.c | 44 +
 1 file changed, 38 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
index 6999d0f..d75ce45 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -258,6 +258,12 @@ static int rvu_setup_hw_resources(struct rvu *rvu)
 	return 0;
 }
 
+static int rvu_mbox_handler_READY(struct rvu *rvu, struct msg_req *req,
+				  struct ready_msg_rsp *rsp)
+{
+	return 0;
+}
+
 static int rvu_process_mbox_msg(struct rvu *rvu, int devid,
 				struct mbox_msghdr *req)
 {
@@ -265,13 +271,39 @@ static int rvu_process_mbox_msg(struct rvu *rvu, int devid,
 	if (req->sig != OTX2_MBOX_REQ_SIG)
 		goto bad_message;
 
-	if (req->id == MBOX_MSG_READY)
-		return 0;
-
+	switch (req->id) {
+#define M(_name, _id, _req_type, _rsp_type)				\
+	case _id: {							\
+		struct _rsp_type *rsp;					\
+		int err;						\
+									\
+		rsp = (struct _rsp_type *)otx2_mbox_alloc_msg(		\
+			&rvu->mbox, devid,				\
+			sizeof(struct _rsp_type));			\
+		if (rsp) {						\
+			rsp->hdr.id = _id;				\
+			rsp->hdr.sig = OTX2_MBOX_RSP_SIG;		\
+			rsp->hdr.pcifunc = req->pcifunc;		\
+			rsp->hdr.rc = 0;				\
+		}							\
+									\
+		err = rvu_mbox_handler_ ## _name(rvu,			\
+						 (struct _req_type *)req, \
+						 rsp);			\
+		if (rsp && err)						\
+			rsp->hdr.rc = err;				\
+									\
+		return rsp ? err : -ENOMEM;				\
+	}
+MBOX_MESSAGES
+#undef M
+		break;
 bad_message:
-	otx2_reply_invalid_msg(&rvu->mbox, devid, req->pcifunc,
-			       req->id);
-	return -ENODEV;
+	default:
+		otx2_reply_invalid_msg(&rvu->mbox, devid, req->pcifunc,
+				       req->id);
+		return -ENODEV;
+	}
 }
 
 static void rvu_mbox_handler(struct work_struct *work)
-- 
2.7.4
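The macro above is an instance of the "X-macro" pattern: one message table is expanded several times with different definitions of M(). A minimal userspace sketch of the same idea follows. The message names (READY, PING), the IDs and all demo_* identifiers are hypothetical, and the request/response type columns of the real MBOX_MESSAGES table are left out to keep it short.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical message table: one M(name, id) entry per message. */
#define DEMO_MESSAGES		\
	M(READY, 0x001)		\
	M(PING,  0x002)

/* Expansion 1: one enum constant per message. */
enum {
#define M(_name, _id) DEMO_MSG_##_name = _id,
	DEMO_MESSAGES
#undef M
};

/* Hand-written handlers; the table only generates the dispatch code. */
static int demo_handler_READY(void) { return 0; }
static int demo_handler_PING(void)  { return 1; }

/* Expansion 2: one switch case per message, calling its handler. */
static int demo_dispatch(int id)
{
	switch (id) {
#define M(_name, _id)				\
	case _id:				\
		return demo_handler_##_name();
	DEMO_MESSAGES
#undef M
	default:
		return -1;	/* unknown message id */
	}
}

/* Expansion 3: map an id back to its name, e.g. for log messages. */
static const char *demo_id2name(int id)
{
	switch (id) {
#define M(_name, _id)				\
	case _id:				\
		return #_name;
	DEMO_MESSAGES
#undef M
	default:
		return "INVALID";
	}
}

/* Quick self-check tying the three expansions together. */
static void demo_self_check(void)
{
	assert(demo_dispatch(DEMO_MSG_PING) == 1);
	assert(strcmp(demo_id2name(DEMO_MSG_READY), "READY") == 0);
}
```

Adding a message then means adding one table entry plus its handler; a missing handler shows up as a build error rather than a silently unhandled id.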
[PATCH v8 03/15] octeontx2-af: Gather RVU blocks HW info
From: Sunil Goutham This patch gathers NPA/NIX/SSO/SSOW/TIM/CPT RVU blocks's HW info like number of LFs. Important register offsets saved for later use to avoid code duplication for each block. A bitmap is allocated for each of the blocks which later on will be used to allocate a LF for a RVU PF/VF. Also added RVU NIX/NPA block registers and few registers of other blocks. Signed-off-by: Sunil Goutham --- drivers/net/ethernet/marvell/octeontx2/af/rvu.c| 167 +++ drivers/net/ethernet/marvell/octeontx2/af/rvu.h| 21 ++ .../net/ethernet/marvell/octeontx2/af/rvu_reg.h| 333 - 3 files changed, 517 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c index d40fabf..fa5f40b 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c @@ -57,6 +57,15 @@ int rvu_poll_reg(struct rvu *rvu, u64 block, u64 offset, u64 mask, bool zero) return -EBUSY; } +int rvu_alloc_bitmap(struct rsrc_bmap *rsrc) +{ + rsrc->bmap = kcalloc(BITS_TO_LONGS(rsrc->max), +sizeof(long), GFP_KERNEL); + if (!rsrc->bmap) + return -ENOMEM; + return 0; +} + static void rvu_check_block_implemented(struct rvu *rvu) { struct rvu_hwinfo *hw = rvu->hw; @@ -98,6 +107,157 @@ static void rvu_reset_all_blocks(struct rvu *rvu) rvu_block_reset(rvu, BLKADDR_NDC2, NDC_AF_BLK_RST); } +static void rvu_free_hw_resources(struct rvu *rvu) +{ + struct rvu_hwinfo *hw = rvu->hw; + struct rvu_block *block; + int id; + + /* Free all bitmaps */ + for (id = 0; id < BLK_COUNT; id++) { + block = >block[id]; + kfree(block->lf.bmap); + } +} + +static int rvu_setup_hw_resources(struct rvu *rvu) +{ + struct rvu_hwinfo *hw = rvu->hw; + struct rvu_block *block; + int err; + u64 cfg; + + /* Get HW supported max RVU PF & VF count */ + cfg = rvu_read64(rvu, BLKADDR_RVUM, RVU_PRIV_CONST); + hw->total_pfs = (cfg >> 32) & 0xFF; + hw->total_vfs = (cfg >> 20) & 0xFFF; + hw->max_vfs_per_pf = (cfg >> 40) & 0xFF; + 
+ /* Init NPA LF's bitmap */ + block = >block[BLKADDR_NPA]; + if (!block->implemented) + goto nix; + cfg = rvu_read64(rvu, BLKADDR_NPA, NPA_AF_CONST); + block->lf.max = (cfg >> 16) & 0xFFF; + block->addr = BLKADDR_NPA; + block->lfshift = 8; + block->lookup_reg = NPA_AF_RVU_LF_CFG_DEBUG; + block->pf_lfcnt_reg = RVU_PRIV_PFX_NPA_CFG; + block->vf_lfcnt_reg = RVU_PRIV_HWVFX_NPA_CFG; + block->lfcfg_reg = NPA_PRIV_LFX_CFG; + block->msixcfg_reg = NPA_PRIV_LFX_INT_CFG; + block->lfreset_reg = NPA_AF_LF_RST; + sprintf(block->name, "NPA"); + err = rvu_alloc_bitmap(>lf); + if (err) + return err; + +nix: + /* Init NIX LF's bitmap */ + block = >block[BLKADDR_NIX0]; + if (!block->implemented) + goto sso; + cfg = rvu_read64(rvu, BLKADDR_NIX0, NIX_AF_CONST2); + block->lf.max = cfg & 0xFFF; + block->addr = BLKADDR_NIX0; + block->lfshift = 8; + block->lookup_reg = NIX_AF_RVU_LF_CFG_DEBUG; + block->pf_lfcnt_reg = RVU_PRIV_PFX_NIX_CFG; + block->vf_lfcnt_reg = RVU_PRIV_HWVFX_NIX_CFG; + block->lfcfg_reg = NIX_PRIV_LFX_CFG; + block->msixcfg_reg = NIX_PRIV_LFX_INT_CFG; + block->lfreset_reg = NIX_AF_LF_RST; + sprintf(block->name, "NIX"); + err = rvu_alloc_bitmap(>lf); + if (err) + return err; + +sso: + /* Init SSO group's bitmap */ + block = >block[BLKADDR_SSO]; + if (!block->implemented) + goto ssow; + cfg = rvu_read64(rvu, BLKADDR_SSO, SSO_AF_CONST); + block->lf.max = cfg & 0x; + block->addr = BLKADDR_SSO; + block->multislot = true; + block->lfshift = 3; + block->lookup_reg = SSO_AF_RVU_LF_CFG_DEBUG; + block->pf_lfcnt_reg = RVU_PRIV_PFX_SSO_CFG; + block->vf_lfcnt_reg = RVU_PRIV_HWVFX_SSO_CFG; + block->lfcfg_reg = SSO_PRIV_LFX_HWGRP_CFG; + block->msixcfg_reg = SSO_PRIV_LFX_HWGRP_INT_CFG; + block->lfreset_reg = SSO_AF_LF_HWGRP_RST; + sprintf(block->name, "SSO GROUP"); + err = rvu_alloc_bitmap(>lf); + if (err) + return err; + +ssow: + /* Init SSO workslot's bitmap */ + block = >block[BLKADDR_SSOW]; + if (!block->implemented) + goto tim; + block->lf.max = (cfg >> 56) & 0xFF; + block->addr = 
BLKADDR_SSOW; + block->multislot = true; + block->lfshift = 3; + block->lookup_reg = SSOW_AF_RVU_LF_HWS_CFG_DEBUG; + block->pf_lfcnt_reg = RVU_PRIV_PFX_SSOW_CFG; + block->vf_lfcnt_reg = RVU_PRIV_HWVFX_SSOW_CFG; + block->lfcfg_reg = SSOW_PRIV_LFX_HWS_CFG; +
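The PF/VF counts in rvu_setup_hw_resources() are plain shift-and-mask field extractions from a 64-bit constant register. As a sketch only, the decoding of RVU_PRIV_CONST can be reproduced in userspace as below; the shifts and masks are the ones used in the patch, while the demo_* names and the sample register value in the test are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the three fields the patch reads. */
struct demo_hwinfo {
	uint16_t total_pfs;
	uint16_t total_vfs;
	uint16_t max_vfs_per_pf;
};

/* Extract the field that starts at bit 'shift' and is as wide as 'mask'. */
static uint64_t demo_field(uint64_t reg, unsigned int shift, uint64_t mask)
{
	return (reg >> shift) & mask;
}

/* Same shifts and masks as the RVU_PRIV_CONST decoding in the patch. */
static struct demo_hwinfo demo_decode_priv_const(uint64_t cfg)
{
	struct demo_hwinfo hw;

	hw.total_pfs = (uint16_t)demo_field(cfg, 32, 0xFF);
	hw.total_vfs = (uint16_t)demo_field(cfg, 20, 0xFFF);
	hw.max_vfs_per_pf = (uint16_t)demo_field(cfg, 40, 0xFF);
	return hw;
}
```

Keeping shift and mask together in one helper makes it harder to pair a field with the wrong width when more blocks are added.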
[PATCH v8 04/15] octeontx2-af: Add mailbox support infra
From: Aleksey Makarov This patch adds mailbox support infrastructure APIs. Each RVU device has a dedicated 64KB mailbox region shared with it's peer for communication. RVU AF has a separate mailbox region shared with each of RVU PFs and a RVU PF has a separate region shared with each of it's VF. These set of APIs are used by this driver (RVU AF) and other RVU PF/VF drivers eg netdev, crypto e.t.c. Signed-off-by: Aleksey Makarov Signed-off-by: Sunil Goutham Signed-off-by: Lukasz Bartosik --- drivers/net/ethernet/marvell/octeontx2/Kconfig | 4 + drivers/net/ethernet/marvell/octeontx2/af/Makefile | 2 + drivers/net/ethernet/marvell/octeontx2/af/mbox.c | 303 + drivers/net/ethernet/marvell/octeontx2/af/mbox.h | 142 ++ .../net/ethernet/marvell/octeontx2/af/rvu_reg.h| 4 + 5 files changed, 455 insertions(+) create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/mbox.c create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/mbox.h diff --git a/drivers/net/ethernet/marvell/octeontx2/Kconfig b/drivers/net/ethernet/marvell/octeontx2/Kconfig index 9743502..8002f9c 100644 --- a/drivers/net/ethernet/marvell/octeontx2/Kconfig +++ b/drivers/net/ethernet/marvell/octeontx2/Kconfig @@ -2,8 +2,12 @@ # Marvell OcteonTX2 drivers configuration # +config OCTEONTX2_MBOX +tristate + config OCTEONTX2_AF tristate "Marvell OcteonTX2 RVU Admin Function driver" + select OCTEONTX2_MBOX depends on ARM64 && PCI help This driver supports Marvell's OcteonTX2 Resource Virtualization diff --git a/drivers/net/ethernet/marvell/octeontx2/af/Makefile b/drivers/net/ethernet/marvell/octeontx2/af/Makefile index dacbd16..ac17cb9 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/Makefile +++ b/drivers/net/ethernet/marvell/octeontx2/af/Makefile @@ -3,6 +3,8 @@ # Makefile for Marvell's OcteonTX2 RVU Admin Function driver # +obj-$(CONFIG_OCTEONTX2_MBOX) += octeontx2_mbox.o obj-$(CONFIG_OCTEONTX2_AF) += octeontx2_af.o +octeontx2_mbox-y := mbox.o octeontx2_af-y := rvu.o diff --git 
a/drivers/net/ethernet/marvell/octeontx2/af/mbox.c b/drivers/net/ethernet/marvell/octeontx2/af/mbox.c new file mode 100644 index 000..85ba24a --- /dev/null +++ b/drivers/net/ethernet/marvell/octeontx2/af/mbox.c @@ -0,0 +1,303 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Marvell OcteonTx2 RVU Admin Function driver + * + * Copyright (C) 2018 Marvell International Ltd. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include +#include +#include + +#include "rvu_reg.h" +#include "mbox.h" + +static const u16 msgs_offset = ALIGN(sizeof(struct mbox_hdr), MBOX_MSG_ALIGN); + +void otx2_mbox_reset(struct otx2_mbox *mbox, int devid) +{ + struct otx2_mbox_dev *mdev = >dev[devid]; + struct mbox_hdr *tx_hdr, *rx_hdr; + + tx_hdr = mdev->mbase + mbox->tx_start; + rx_hdr = mdev->mbase + mbox->rx_start; + + spin_lock(>mbox_lock); + mdev->msg_size = 0; + mdev->rsp_size = 0; + tx_hdr->num_msgs = 0; + rx_hdr->num_msgs = 0; + spin_unlock(>mbox_lock); +} +EXPORT_SYMBOL(otx2_mbox_reset); + +void otx2_mbox_destroy(struct otx2_mbox *mbox) +{ + mbox->reg_base = NULL; + mbox->hwbase = NULL; + + kfree(mbox->dev); + mbox->dev = NULL; +} +EXPORT_SYMBOL(otx2_mbox_destroy); + +int otx2_mbox_init(struct otx2_mbox *mbox, void *hwbase, struct pci_dev *pdev, + void *reg_base, int direction, int ndevs) +{ + struct otx2_mbox_dev *mdev; + int devid; + + switch (direction) { + case MBOX_DIR_AFPF: + case MBOX_DIR_PFVF: + mbox->tx_start = MBOX_DOWN_TX_START; + mbox->rx_start = MBOX_DOWN_RX_START; + mbox->tx_size = MBOX_DOWN_TX_SIZE; + mbox->rx_size = MBOX_DOWN_RX_SIZE; + break; + case MBOX_DIR_PFAF: + case MBOX_DIR_VFPF: + mbox->tx_start = MBOX_DOWN_RX_START; + mbox->rx_start = MBOX_DOWN_TX_START; + mbox->tx_size = MBOX_DOWN_RX_SIZE; + mbox->rx_size = MBOX_DOWN_TX_SIZE; + break; + case MBOX_DIR_AFPF_UP: + case MBOX_DIR_PFVF_UP: + mbox->tx_start = 
MBOX_UP_TX_START; + mbox->rx_start = MBOX_UP_RX_START; + mbox->tx_size = MBOX_UP_TX_SIZE; + mbox->rx_size = MBOX_UP_RX_SIZE; + break; + case MBOX_DIR_PFAF_UP: + case MBOX_DIR_VFPF_UP: + mbox->tx_start = MBOX_UP_RX_START; + mbox->rx_start = MBOX_UP_TX_START; + mbox->tx_size = MBOX_UP_RX_SIZE; + mbox->rx_size = MBOX_UP_TX_SIZE; + break; + default: + return -ENODEV; + } + + switch (direction) { + case
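The first switch in otx2_mbox_init() makes the two ends of a mailbox mirror each other: what one side uses as its TX region, the peer uses as its RX region. A small userspace sketch of that property follows; the offsets and every demo_* name are placeholders, and the real MBOX_DOWN_* values come from mbox.h.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder offsets inside the shared region (not the driver's values). */
#define DEMO_DOWN_TX_START	0x0000
#define DEMO_DOWN_RX_START	0x2000

enum demo_dir {
	DEMO_DIR_AFPF,	/* AF side of the AF <=> PF mailbox */
	DEMO_DIR_PFAF,	/* PF side of the same mailbox */
};

struct demo_mbox {
	size_t tx_start;
	size_t rx_start;
};

/* Pick TX/RX offsets by direction; the peer gets the swapped pair. */
static int demo_mbox_init(struct demo_mbox *mbox, enum demo_dir dir)
{
	switch (dir) {
	case DEMO_DIR_AFPF:
		mbox->tx_start = DEMO_DOWN_TX_START;
		mbox->rx_start = DEMO_DOWN_RX_START;
		break;
	case DEMO_DIR_PFAF:
		mbox->tx_start = DEMO_DOWN_RX_START;
		mbox->rx_start = DEMO_DOWN_TX_START;
		break;
	default:
		return -1;
	}
	return 0;
}
```

Because both sides derive their offsets from the same constants, neither driver has to hard-code the other's layout.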
[PATCH v8 01/15] octeontx2-af: Add Marvell OcteonTX2 RVU AF driver
From: Sunil Goutham This patch adds basic template for Marvell OcteonTX2's resource virtualization unit (RVU) admin function (AF) driver. Just the driver registration and probe. Signed-off-by: Sunil Goutham --- drivers/net/ethernet/marvell/Kconfig | 3 + drivers/net/ethernet/marvell/Makefile | 1 + drivers/net/ethernet/marvell/octeontx2/Kconfig | 12 ++ drivers/net/ethernet/marvell/octeontx2/Makefile| 6 + drivers/net/ethernet/marvell/octeontx2/af/Makefile | 8 ++ drivers/net/ethernet/marvell/octeontx2/af/rvu.c| 126 + drivers/net/ethernet/marvell/octeontx2/af/rvu.h| 31 + 7 files changed, 187 insertions(+) create mode 100644 drivers/net/ethernet/marvell/octeontx2/Kconfig create mode 100644 drivers/net/ethernet/marvell/octeontx2/Makefile create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/Makefile create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu.c create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu.h diff --git a/drivers/net/ethernet/marvell/Kconfig b/drivers/net/ethernet/marvell/Kconfig index f33fd22..3238aa7 100644 --- a/drivers/net/ethernet/marvell/Kconfig +++ b/drivers/net/ethernet/marvell/Kconfig @@ -167,4 +167,7 @@ config SKY2_DEBUG If unsure, say N. 
+ +source "drivers/net/ethernet/marvell/octeontx2/Kconfig" + endif # NET_VENDOR_MARVELL diff --git a/drivers/net/ethernet/marvell/Makefile b/drivers/net/ethernet/marvell/Makefile index 55d4d10..89dea72 100644 --- a/drivers/net/ethernet/marvell/Makefile +++ b/drivers/net/ethernet/marvell/Makefile @@ -11,3 +11,4 @@ obj-$(CONFIG_MVPP2) += mvpp2/ obj-$(CONFIG_PXA168_ETH) += pxa168_eth.o obj-$(CONFIG_SKGE) += skge.o obj-$(CONFIG_SKY2) += sky2.o +obj-y += octeontx2/ diff --git a/drivers/net/ethernet/marvell/octeontx2/Kconfig b/drivers/net/ethernet/marvell/octeontx2/Kconfig new file mode 100644 index 000..9743502 --- /dev/null +++ b/drivers/net/ethernet/marvell/octeontx2/Kconfig @@ -0,0 +1,12 @@ +# +# Marvell OcteonTX2 drivers configuration +# + +config OCTEONTX2_AF + tristate "Marvell OcteonTX2 RVU Admin Function driver" + depends on ARM64 && PCI + help + This driver supports Marvell's OcteonTX2 Resource Virtualization + Unit's admin function manager which manages all RVU HW resources + and provides a medium to other PF/VFs to configure HW. Should be + enabled for other RVU device drivers to work. diff --git a/drivers/net/ethernet/marvell/octeontx2/Makefile b/drivers/net/ethernet/marvell/octeontx2/Makefile new file mode 100644 index 000..e579dcd --- /dev/null +++ b/drivers/net/ethernet/marvell/octeontx2/Makefile @@ -0,0 +1,6 @@ +# SPDX-License-Identifier: GPL-2.0 +# +# Makefile for Marvell OcteonTX2 device drivers. 
+# + +obj-$(CONFIG_OCTEONTX2_AF) += af/ diff --git a/drivers/net/ethernet/marvell/octeontx2/af/Makefile b/drivers/net/ethernet/marvell/octeontx2/af/Makefile new file mode 100644 index 000..dacbd16 --- /dev/null +++ b/drivers/net/ethernet/marvell/octeontx2/af/Makefile @@ -0,0 +1,8 @@ +# SPDX-License-Identifier: GPL-2.0 +# +# Makefile for Marvell's OcteonTX2 RVU Admin Function driver +# + +obj-$(CONFIG_OCTEONTX2_AF) += octeontx2_af.o + +octeontx2_af-y := rvu.o diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c new file mode 100644 index 000..5af4da6 --- /dev/null +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c @@ -0,0 +1,126 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Marvell OcteonTx2 RVU Admin Function driver + * + * Copyright (C) 2018 Marvell International Ltd. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. 
+ */ + +#include +#include +#include +#include +#include +#include + +#include "rvu.h" + +#define DRV_NAME "octeontx2-af" +#define DRV_STRING "Marvell OcteonTX2 RVU Admin Function Driver" +#define DRV_VERSION"1.0" + +/* Supported devices */ +static const struct pci_device_id rvu_id_table[] = { + { PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_OCTEONTX2_RVU_AF) }, + { 0, } /* end of table */ +}; + +MODULE_AUTHOR("Marvell International Ltd."); +MODULE_DESCRIPTION(DRV_STRING); +MODULE_LICENSE("GPL v2"); +MODULE_VERSION(DRV_VERSION); +MODULE_DEVICE_TABLE(pci, rvu_id_table); + +static int rvu_probe(struct pci_dev *pdev, const struct pci_device_id *id) +{ + struct device *dev = >dev; + struct rvu *rvu; + interr; + + rvu = devm_kzalloc(dev, sizeof(*rvu), GFP_KERNEL); + if (!rvu) + return -ENOMEM; + + pci_set_drvdata(pdev, rvu); + rvu->pdev = pdev; + rvu->dev = >dev; + + err = pci_enable_device(pdev); + if (err) { + dev_err(dev, "Failed to enable PCI device\n"); + goto err_freemem; +
[PATCH v8 00/15] octeontx2-af: Add RVU Admin Function driver
From: Sunil Goutham

Resource virtualization unit (RVU) on Marvell's OcteonTX2 SOC maps HW
resources from the network, crypto and other functional blocks into
PCI-compatible physical and virtual functions. Each functional block
again has multiple local functions (LFs) for provisioning to PCI devices.
RVU supports multiple PCIe SRIOV physical functions (PFs) and virtual
functions (VFs). PF0 is called the administrative / admin function (AF)
and has privileges to provision RVU functional block's LFs to each of the
PF/VF.

RVU managed networking functional blocks
 - Network pool allocator (NPA)
 - Network interface controller (NIX)
 - Network parser CAM (NPC)
 - Schedule/Synchronize/Order unit (SSO)

RVU managed non-networking functional blocks
 - Crypto accelerator (CPT)
 - Scheduled timers unit (TIM)
 - Schedule/Synchronize/Order unit (SSO)
   Used for both networking and non-networking use cases
 - Compression (upcoming in future variants of the silicon)

Resource provisioning examples
 - A PF/VF with NIX-LF & NPA-LF resources works as a pure network device
 - A PF/VF with CPT-LF resource works as a pure crypto offload device.

This admin function driver neither receives any data nor processes it,
i.e. it does no I/O and is a configuration-only driver.

PF/VFs communicate with AF via a shared memory region (mailbox). Upon
receiving requests from PF/VF, AF does resource provisioning and other
HW configuration. AF is always attached to host, but PF/VFs may be used
by host kernel itself, or attached to VMs or to userspace applications
like DPDK etc. So AF has to handle provisioning/configuration requests
sent by any device from any domain.

This patch series adds logic for the following
 - RVU AF driver with functional blocks provisioning support.
 - Mailbox infrastructure for communication between AF and PFs.
 - CGX (MAC controller) driver which communicates with firmware for
   managing physical ethernet interfaces. AF collects info from this
   driver and forwards the same to the PF/VFs using these interfaces.

This is the first set of patches out of 80+ patches.

Changes from v7:
 1. Removed unnecessary typecasts in mbox infra code.
    - Suggested by David Miller
 2. Fixed MAINTAINERS patch
    - Suggested by Joe Perches

Changes from v6:
 Fixed ordering of local variables from longest to shortest line.
    - Suggested by David Miller

Changes from v5:
 Modified bitfield based command structures to bitmasks for
 communication with firmware, to address endianness issues.
    - Suggested by Arnd Bergmann

Changes from v4:
 1. Removed module author/version/description from CGX driver as it's
    now merged with AF driver module.
    - Suggested by Arnd Bergmann
 2. Added big-endian bitfields for CGX's kernel <=> firmware
    communication command structures.
    - Suggested by Arnd Bergmann

Changes from v3:
 Moved driver from drivers/soc to drivers/net/ethernet
    - Suggested by Arnd Bergmann
      https://patchwork.kernel.org/cover/10587635/

Changes from v2:
 No changes, submitted again with netdev mailing list in loop.
    - Suggested by Arnd Bergmann and Andrew Lunn

Changes from v1:
 1. Merged RVU admin function and CGX drivers into a single module
    - Suggested by Arnd Bergmann
 2. Pulled mbox communication APIs into a separate module to remove
    admin function driver dependency in a VM where AF is not attached.
- Suggested by Arnd Bergmann Aleksey Makarov (2): octeontx2-af: Add mailbox support infra octeontx2-af: Convert mbox msg id check to a macro Geetha sowjanya (1): octeontx2-af: Reconfig MSIX base with IOVA Linu Cherian (3): octeontx2-af: Set RVU PFs to CGX LMACs mapping octeontx2-af: Add support for CGX link management octeontx2-af: Register for CGX lmac events Sunil Goutham (9): octeontx2-af: Add Marvell OcteonTX2 RVU AF driver octeontx2-af: Reset all RVU blocks octeontx2-af: Gather RVU blocks HW info octeontx2-af: Add mailbox IRQ and msg handlers octeontx2-af: Scan blocks for LFs provisioned to PF/VF octeontx2-af: Add RVU block LF provisioning support octeontx2-af: Configure block LF's MSIX vector offset octeontx2-af: Add Marvell OcteonTX2 CGX driver MAINTAINERS: Add entry for Marvell OcteonTX2 Admin Function driver MAINTAINERS|9 + drivers/net/ethernet/marvell/Kconfig |3 + drivers/net/ethernet/marvell/Makefile |1 + drivers/net/ethernet/marvell/octeontx2/Kconfig | 16 + drivers/net/ethernet/marvell/octeontx2/Makefile|6 + drivers/net/ethernet/marvell/octeontx2/af/Makefile | 10 + drivers/net/ethernet/marvell/octeontx2/af/cgx.c| 505 ++ drivers/net/ethernet/marvell/octeontx2/af/cgx.h| 65 + .../net/ethernet/marvell/octeontx2/af/cgx_fw_if.h | 186 +++ drivers/net/ethernet/marvell/octeontx2/af/mbox.c | 303 drivers/net/ethernet/marvell/octeontx2/af/mbox.h | 211 +++ drivers/net/ethernet/marvell/octeontx2/af/rvu.c| 1637
Re: [PATCH v7 15/15] MAINTAINERS: Add entry for Marvell OcteonTX2 Admin Function driver
On Sat, Oct 6, 2018 at 1:05 PM Joe Perches wrote:
>
> On Sat, 2018-10-06 at 11:36 +0530, sunil.kovv...@gmail.com wrote:
> > Added maintainers entry for Marvell OcteonTX2 SOC's RVU
> > admin function driver.
> []
> > diff --git a/MAINTAINERS b/MAINTAINERS
> []
> > @@ -8844,6 +8844,15 @@ S: Supported
> > F: drivers/mmc/host/sdhci-xenon*
> > F: Documentation/devicetree/bindings/mmc/marvell,xenon-sdhci.txt
> >
> > +MARVELL OCTEONTX2 RVU ADMIN FUNCTION DRIVER
> > +M: Sunil Goutham
> > +M: Linu Cherian
> > +M: Geetha sowjanya
> > +M: Jerin Jacob
> > +L: netdev@vger.kernel.org
> > +S: Maintained
>
> Aren't you all being paid?
>
> So shouldn't this be
>
> S: Supported
>
> ?
>
> > +F: drivers/net/ethernet/marvell/octeontx2/af
>
> Please add a terminating / to show that this
> is a directory and not a file.
>
> F: drivers/net/ethernet/marvell/octeontx2/af/
>

Thanks for looking at this, will fix these.

Sunil.
[PATCH v2 2/2] netdev/phy: add MDIO bus multiplexer driven by a regmap
Add support for an MDIO bus multiplexer controlled by a regmap device, like an FPGA. Tested on a NXP LX2160AQDS board which uses the "QIXIS" FPGA attached to the i2c bus. Signed-off-by: Pankaj Bansal --- Notes: V2: - replaced be32_to_cpup with of_property_read_u32 - incorporated Andrew's comments drivers/net/phy/Kconfig | 13 +++ drivers/net/phy/Makefile | 1 + drivers/net/phy/mdio-mux-regmap.c | 171 3 files changed, 185 insertions(+) diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig index 82070792edbb..d1ac9e70cbb2 100644 --- a/drivers/net/phy/Kconfig +++ b/drivers/net/phy/Kconfig @@ -87,6 +87,19 @@ config MDIO_BUS_MUX_MMIOREG Currently, only 8/16/32 bits registers are supported. +config MDIO_BUS_MUX_REGMAP + tristate "REGMAP controlled MDIO bus multiplexers" + depends on OF_MDIO && REGMAP + select MDIO_BUS_MUX + help + This module provides a driver for MDIO bus multiplexers that + are controlled via a regmap device, like an FPGA connected to i2c. + The multiplexer connects one of several child MDIO busses to a + parent bus.Child bus selection is under the control of one of + the FPGA's registers. + + Currently, only upto 32 bits registers are supported. 
+ config MDIO_CAVIUM tristate diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile index 5805c0b7d60e..33053f9f320d 100644 --- a/drivers/net/phy/Makefile +++ b/drivers/net/phy/Makefile @@ -29,6 +29,7 @@ obj-$(CONFIG_MDIO_BUS_MUX)+= mdio-mux.o obj-$(CONFIG_MDIO_BUS_MUX_BCM_IPROC) += mdio-mux-bcm-iproc.o obj-$(CONFIG_MDIO_BUS_MUX_GPIO)+= mdio-mux-gpio.o obj-$(CONFIG_MDIO_BUS_MUX_MMIOREG) += mdio-mux-mmioreg.o +obj-$(CONFIG_MDIO_BUS_MUX_REGMAP) += mdio-mux-regmap.o obj-$(CONFIG_MDIO_CAVIUM) += mdio-cavium.o obj-$(CONFIG_MDIO_GPIO)+= mdio-gpio.o obj-$(CONFIG_MDIO_HISI_FEMAC) += mdio-hisi-femac.o diff --git a/drivers/net/phy/mdio-mux-regmap.c b/drivers/net/phy/mdio-mux-regmap.c new file mode 100644 index ..6068d05a728a --- /dev/null +++ b/drivers/net/phy/mdio-mux-regmap.c @@ -0,0 +1,171 @@ +// SPDX-License-Identifier: GPL-2.0+ + +/* Simple regmap based MDIO MUX driver + * + * Copyright 2018 NXP + * + * Based on mdio-mux-mmioreg.c by Timur Tabi + * + * Author: + * Pankaj Bansal + */ + +#include +#include +#include +#include +#include +#include +#include + +struct mdio_mux_regmap_state { + void*mux_handle; + struct regmap *regmap; + u32 mux_reg; + u32 mask; +}; + +/* MDIO multiplexing switch function + * + * This function is called by the mdio-mux layer when it thinks the mdio bus + * multiplexer needs to switch. + * + * 'current_child' is the current value of the mux register (masked via + * s->mask). + * + * 'desired_child' is the value of the 'reg' property of the target child MDIO + * node. + * + * The first time this function is called, current_child == -1. + * + * If current_child == desired_child, then the mux is already set to the + * correct bus. 
+ */ +static int mdio_mux_regmap_switch_fn(int current_child, int desired_child, +void *data) +{ + struct mdio_mux_regmap_state *s = data; + bool change; + int ret; + + ret = regmap_update_bits_check(s->regmap, + s->mux_reg, + s->mask, + desired_child, + ); + + if (ret) + return ret; + if (change) + pr_debug("%s %d -> %d\n", __func__, current_child, +desired_child); + return ret; +} + +static int mdio_mux_regmap_probe(struct platform_device *pdev) +{ + struct device_node *np2, *np = pdev->dev.of_node; + struct mdio_mux_regmap_state *s; + int ret; + u32 val; + + dev_dbg(>dev, "probing node %pOF\n", np); + + s = devm_kzalloc(>dev, sizeof(*s), GFP_KERNEL); + if (!s) + return -ENOMEM; + + s->regmap = dev_get_regmap(pdev->dev.parent, NULL); + if (IS_ERR(s->regmap)) { + dev_err(>dev, "Failed to get parent regmap\n"); + return PTR_ERR(s->regmap); + } + + ret = of_property_read_u32(np, "reg", >mux_reg); + if (ret) { + dev_err(>dev, "missing or invalid reg property\n"); + return -ENODEV; + } + + /* Test Register read write */ + ret = regmap_read(s->regmap, s->mux_reg, ); + if (ret) { + dev_err(>dev, "error while reading reg\n"); + return ret; + } + + ret = regmap_write(s->regmap, s->mux_reg, val); + if (ret) { + dev_err(>dev, "error while writing reg\n"); + return ret; + } + + ret
[PATCH v2 1/2] dt-bindings: net: add MDIO bus multiplexer driven by a regmap device
Add support for an MDIO bus multiplexer controlled by a regmap device, like an FPGA. Tested on a NXP LX2160AQDS board which uses the "QIXIS" FPGA attached to the i2c bus. Signed-off-by: Pankaj Bansal --- Notes: V2: - Fixed formatting error caused by using space instead of tab - Add newline between property list and subnode - Add newline between two subnodes .../bindings/net/mdio-mux-regmap.txt | 95 ++ 1 file changed, 95 insertions(+) diff --git a/Documentation/devicetree/bindings/net/mdio-mux-regmap.txt b/Documentation/devicetree/bindings/net/mdio-mux-regmap.txt new file mode 100644 index ..88ebac26c1c5 --- /dev/null +++ b/Documentation/devicetree/bindings/net/mdio-mux-regmap.txt @@ -0,0 +1,95 @@ +Properties for an MDIO bus multiplexer controlled by a regmap + +This is a special case of a MDIO bus multiplexer. A regmap device, +like an FPGA, is used to control which child bus is connected. The mdio-mux +node must be a child of the device that is controlled by a regmap. +The driver currently only supports devices with upto 32-bit registers. + +Required properties in addition to the generic multiplexer properties: + +- compatible : string, must contain "mdio-mux-regmap" + +- reg : integer, contains the offset of the register that controls the bus + multiplexer. it can be 32 bit number. + +- mux-mask : integer, contains an 32 bit mask that specifies which + bits in the register control the actual bus multiplexer. The + 'reg' property of each child mdio-mux node must be constrained by + this mask. + +Example: + +The FPGA node defines a i2c connected FPGA with a register space of 0x30 bytes. +For the "EMI2" MDIO bus, register 0x54 (BRDCFG4) controls the mux on that bus. +A bitmask of 0x07 means that bits 0, 1 and 2 (bit 0 is lsb) are the bits on +BRDCFG4 that control the actual mux. 
+ +i2c@200 { + compatible = "fsl,vf610-i2c"; + #address-cells = <1>; + #size-cells = <0>; + reg = <0x0 0x200 0x0 0x1>; + interrupts = <0 34 0x4>; // Level high type + clock-names = "i2c"; + clocks = < 4 7>; + fsl-scl-gpio = < 15 0>; + status = "okay"; + + /* The FPGA node */ + fpga@66 { + compatible = "fsl,lx2160aqds-fpga", "fsl,fpga-qixis-i2c"; + reg = <0x66>; + #address-cells = <1>; + #size-cells = <0>; + + mdio1_mux@54 { + compatible = "mdio-mux-regmap", "mdio-mux"; + mdio-parent-bus = <>; /* MDIO bus */ + reg = <0x54>;/* BRDCFG4 */ + mux-mask = <0x07>; /* EMI2_MDIO */ + #address-cells=<1>; + #size-cells = <0>; + + mdio1_ioslot1@0 { // Slot 1 + reg = <0x00>; + #address-cells = <1>; + #size-cells = <0>; + + phy1@1 { + reg = <1>; + compatible = "ethernet-phy-id0210.7441"; + }; + + phy1@0 { + reg = <0>; + compatible = "ethernet-phy-id0210.7441"; + }; + }; + + mdio1_ioslot2@1 { // Slot 2 + reg = <0x01>; + #address-cells = <1>; + #size-cells = <0>; + + }; + + mdio1_ioslot3@2 { // Slot 3 + reg = <0x02>; + #address-cells = <1>; + #size-cells = <0>; + + }; + }; + }; +}; + + /* The parent MDIO bus. */ + emdio2: mdio@0x8B97000 { + compatible = "fsl,fman-memac-mdio"; + reg = <0x0 0x8B97000 0x0 0x1000>; + device_type = "mdio"; + little-endian; + + #address-cells = <1>; + #size-cells = <0>; + }; -- 2.17.1
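To make the 'mux-mask' rule in the binding concrete: a child's 'reg' value may only set bits inside the mask, and selecting a child is a masked read-modify-write of the mux register. The sketch below models that in userspace under those assumptions. The demo_* helpers are hypothetical and only imitate the update-and-report-change behaviour the driver gets from regmap_update_bits_check(); they are not the regmap implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A child 'reg' value is acceptable only if it sets no bits outside
 * the mux-mask.
 */
static bool demo_child_fits_mask(uint32_t child, uint32_t mask)
{
	return (child & ~mask) == 0;
}

/* Masked read-modify-write on a plain variable standing in for the
 * mux register. Bits outside 'mask' are preserved; '*changed' tells
 * the caller whether the stored value actually differs afterwards.
 */
static uint32_t demo_update_bits(uint32_t *reg, uint32_t mask,
				 uint32_t val, bool *changed)
{
	uint32_t old = *reg;
	uint32_t next = (old & ~mask) | (val & mask);

	*changed = (next != old);
	*reg = next;
	return next;
}
```

With the example binding's mask of 0x07, selecting child 2 on a register holding 0xA5 yields 0xA2: the upper five bits are untouched, and repeating the same selection reports no change.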