Re: [PATCH 0/6] Constant Time Memory Comparisons Are Important
On Mon, Jun 12, 2017 at 12:30 AM, Emil Lenngrenwrote: > 2017-06-11 22:48 GMT+02:00 Emmanuel Grumbach : >> On Sun, Jun 11, 2017 at 4:36 PM, Kees Cook wrote: >>> >>> On Sun, Jun 11, 2017 at 1:13 AM, Kalle Valo wrote: >>> > "Jason A. Donenfeld" writes: >>> > >>> >> Whenever you're comparing two MACs, it's important to do this using >>> >> crypto_memneq instead of memcmp. With memcmp, you leak timing >>> >> information, >>> >> which could then be used to iteratively forge a MAC. >>> > >>> > Do you have any pointers where I could learn more about this? >>> >>> While not using C specifically, this talks about the problem generally: >>> https://www.chosenplaintext.ca/articles/beginners-guide-constant-time-cryptography.html >>> >> >> Sorry for the stupid question, but the MAC address is in plaintext in >> the air anyway or easily accessible via user space tools. I fail to >> see what it is so secret about a MAC address in that code where that >> same MAC address is accessible via myriads of ways. > > I think you're mixing up Media Access Control (MAC) addresses with > Message Authentication Code (MAC). The second one is a cryptographic > signature of a message. Obviously... Sorry for the noise.
[iproute2][PATCH] ip: Remove unneed header
Fix redefinition of struct ethhdr with a suitably patched musl libc that suppresses the kernel if_ether.h. Signed-off-by: Changhyeok Bae--- ip/iplink_bridge.c | 1 - 1 file changed, 1 deletion(-) diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c index cccdec1..f065b22 100644 --- a/ip/iplink_bridge.c +++ b/ip/iplink_bridge.c @@ -13,7 +13,6 @@ #include #include #include -#include #include #include #include -- 2.7.4
I am waiting for your response to my numerous un-replied emails to you concerning your family inheritance fund ($7.5 million dollars). I seek your assistance and I assured of your capability to champi
Re: [PATCH net v3] net: ipmr: Fix some mroute forwarding issues in vrf's
On 6/11/17 4:15 PM, David Miller wrote: > > Applied and queued up for -stable, thank you. > The backport will need bcfc7d33110b; that commit should have had the same Fixes tag -- e58e41596811 ("net: Enable support for VRF with ipv4 multicast")
Re: [PATCH v2] arm: eBPF JIT compiler
On Tue, Jun 6, 2017 at 12:47 PM, Shubham Bansalwrote: > Hi Russell, Alexei, David, Daniel, kees, > > Any update on this patch moving forward? Since this has gotten testing by various people and passes the existing self-tests, I think this can probably go in via the ARM patch tracker? Russell does that sound okay to you? -Kees -- Kees Cook Pixel Security
Re: [PATCH v2 1/2] tcp: md5: add an address prefix for key lookup
Hi Ivan, [auto build test WARNING on net/master] [also build test WARNING on v4.12-rc4 next-20170609] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Ivan-Delalande/tcp-md5-add-an-address-prefix-for-key-lookup/20170611-184237 config: x86_64-randconfig-s2-06120830 (attached as .config) compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All warnings (new ones prefixed by >>): net//ipv4/tcp_ipv4.c: In function 'tcp_md5_do_lookup': >> net//ipv4/tcp_ipv4.c:908: warning: unused variable 'size' vim +/size +908 net//ipv4/tcp_ipv4.c ^1da177e4 Linus Torvalds 2005-04-16 892 } ^1da177e4 Linus Torvalds 2005-04-16 893 cfb6eeb4c YOSHIFUJI Hideaki2006-11-14 894 #ifdef CONFIG_TCP_MD5SIG cfb6eeb4c YOSHIFUJI Hideaki2006-11-14 895 /* cfb6eeb4c YOSHIFUJI Hideaki2006-11-14 896 * RFC2385 MD5 checksumming requires a mapping of cfb6eeb4c YOSHIFUJI Hideaki2006-11-14 897 * IP address->MD5 Key. cfb6eeb4c YOSHIFUJI Hideaki2006-11-14 898 * We need to maintain these in the sk structure. cfb6eeb4c YOSHIFUJI Hideaki2006-11-14 899 */ cfb6eeb4c YOSHIFUJI Hideaki2006-11-14 900 cfb6eeb4c YOSHIFUJI Hideaki2006-11-14 901 /* Find the Key structure for an address. */ b83e3deb9 Eric Dumazet 2015-09-25 902 struct tcp_md5sig_key *tcp_md5_do_lookup(const struct sock *sk, a915da9b6 Eric Dumazet 2012-01-31 903 const union tcp_md5_addr *addr, a915da9b6 Eric Dumazet 2012-01-31 904 int family) cfb6eeb4c YOSHIFUJI Hideaki2006-11-14 905 { fd3a154a0 Eric Dumazet 2015-03-24 906 const struct tcp_sock *tp = tcp_sk(sk); a915da9b6 Eric Dumazet 2012-01-31 907 struct tcp_md5sig_key *key; a915da9b6 Eric Dumazet 2012-01-31 @908 unsigned int size = sizeof(struct in_addr); fd3a154a0 Eric Dumazet 2015-03-24 909 const struct tcp_md5sig_info *md5sig; 4b9e1d2ec Ivan Delalande 2017-06-09 910 __be32 mask; 4b9e1d2ec Ivan Delalande 2017-06-09 911 struct tcp_md5sig_key *best_match = NULL; 4b9e1d2ec Ivan Delalande 2017-06-09 912 bool match; cfb6eeb4c YOSHIFUJI Hideaki2006-11-14 913 a8afca032 Eric Dumazet 2012-01-31 914 /* caller either holds rcu_read_lock() or socket lock */ a8afca032 Eric Dumazet 2012-01-31 915 md5sig = rcu_dereference_check(tp->md5sig_info, 1e1d04e67 Hannes Frederic Sowa 2016-04-05 916 lockdep_sock_is_held(sk)); :: The code at line 908 was first introduced by commit :: a915da9b69273815527ccb3789421cb7027b545b tcp: md5: rcu conversion :: TO: Eric Dumazet <eric.duma...@gmail.com> :: CC: David S. Miller <da...@davemloft.net> --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: ipmr: MFC routes when VIF deleted
I would argue that this is just an unintended side effect of the original implementation. Shuffling interface vif's seems like a good way to churn a significant number of mroutes to me. If you want to use a interface it probably is going to be already be configured with it's own vif, and as such it's just better to figure out the new proper RPF and insert that mroute. If it isn't, then I need to bring up that new vif ( and all it's associated pim information, like establishing a new neighbor relationship ) when I do need to shuffle interface vif's, In that time I'm dropping a signficant number of packets while this is happening and all our end users are going to be screaming at us to fix that hole. donald On Sun, Jun 11, 2017 at 12:34 PM, Nikolay Aleksandrovwrote: > On 11/06/17 11:55, Yotam Gigi wrote: >> I have been looking into some weird behavior, and I am not sure whether it is >> a bug or a feature. >> >> When a VIF with index v gets deleted, the MFC routes does not get updated, >> which >> means that there can be routes pointing to that VIF. On datapath, when packet >> hits that route, the VIF validity will be checked and will not be sent to >> that >> device (but still, the route does not get updated). Now, if the user creates >> another VIF with the same index v but different underlay device, the same >> route >> will forward the traffic to that device. >> >> It is relevant to mention that when user adds a MFC route, only the active >> VIFs >> are used, so the flow of adding a route with dummy VIF indices and then >> connecting those VIF indices to real device is not supported. The only way to >> create a MFC route that has non existing VIFs is to create one with existing >> VIFs and then delete them. >> >> Do we really want to support that? To me, it looks like a buggy flow and I >> suggest that upon VIF deletion, the MFC routes will be updated to not point >> to >> any non existing VIF indices. >> > > Hi Yotam, > I'm not strongly against such change but my feeling is that we shouldn't > change it. > I think we shouldn't change it because we cannot guarantee that we won't > break some > user-space app that relies on this behaviour or uses it as an optimization. > User-space ipmr apps work in sync with the kernel and are usually the only > ones > doing such changes so their internal state will be always valid, and I'd guess > they already deal with this one way or another. > My second argument is a minor one and is about performance. There are some > apps > (e.g. pimd) which use interface add/del on interface state change (up/down) > and > this could make these ops slower on large setups. > > Again I see how this could be helpful and should've probably been like that > from the > start, so if other people feel confident we won't break anything then I > wouldn't > mind the change. > > Thanks, > Nik > > >
[PATCH net-next v2] vxlan: dont migrate permanent fdb entries during learn
From: Roopa PrabhuThis patch fixes vxlan_snoop to not move permanent fdb entries on learn events. This is consistent with the bridge fdb handling of permanent entries. Fixes: 26a41ae60438 ("vxlan: only migrate dynamic FDB entries") Signed-off-by: Roopa Prabhu --- v2 - added Fixes tag drivers/net/vxlan.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index 7cb21a0..e045c34 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -970,7 +970,7 @@ static bool vxlan_snoop(struct net_device *dev, return false; /* Don't migrate static entries, drop packets */ - if (f->state & NUD_NOARP) + if (f->state & (NUD_PERMANENT | NUD_NOARP)) return true; if (net_ratelimit()) -- 1.9.1
Re: [PATCH net-next] vxlan: dont migrate permanent fdb entries during learn
On Sun, Jun 11, 2017 at 4:04 PM, David Millerwrote: > From: Roopa Prabhu > Date: Sun, 11 Jun 2017 15:51:22 -0700 > >> From: Roopa Prabhu >> >> This patch fixes vxlan_snoop to not move permanent fdb entries >> on learn events. This is consistent with the bridge fdb >> handling of permanent entries. >> >> Signed-off-by: Roopa Prabhu > > Is there an appropriate Fixes: tag by chance? Just checked history. It has been that way since beginning of time. Initial vxlan implementation (year 2012) allowed migration of any entry via learn. A 2013 commit 26a41ae60438 ("vxlan: only migrate dynamic FDB entries") tried to fix it, but it added a check for static entries only (NUD_NOARP) and missed permanent entries (NUD_PERMANENT). I think we can add: Fixes: 26a41ae60438 ("vxlan: only migrate dynamic FDB entries") I can re-spin with this Fixes tag.
Re: [PATCH net-next] vxlan: dont migrate permanent fdb entries during learn
From: Roopa PrabhuDate: Sun, 11 Jun 2017 15:51:22 -0700 > From: Roopa Prabhu > > This patch fixes vxlan_snoop to not move permanent fdb entries > on learn events. This is consistent with the bridge fdb > handling of permanent entries. > > Signed-off-by: Roopa Prabhu Is there an appropriate Fixes: tag by chance?
[PATCH net-next] vxlan: dont migrate permanent fdb entries during learn
From: Roopa PrabhuThis patch fixes vxlan_snoop to not move permanent fdb entries on learn events. This is consistent with the bridge fdb handling of permanent entries. Signed-off-by: Roopa Prabhu --- drivers/net/vxlan.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index 7cb21a0..e045c34 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -970,7 +970,7 @@ static bool vxlan_snoop(struct net_device *dev, return false; /* Don't migrate static entries, drop packets */ - if (f->state & NUD_NOARP) + if (f->state & (NUD_PERMANENT | NUD_NOARP)) return true; if (net_ratelimit()) -- 1.9.1
Re: [PATCH net-next] bpf, arm64: take advantage of stack_depth tracking
From: Daniel BorkmannDate: Sun, 11 Jun 2017 03:55:27 +0200 > Make use of recently implemented stack_depth tracking for arm64 JIT, > so that stack usage can be reduced heavily for programs not using > tail calls at least. > > Signed-off-by: Daniel Borkmann > Acked-by: Alexei Starovoitov Applied, thanks Daniel.
Re: [PATCH net v3] net: ipmr: Fix some mroute forwarding issues in vrf's
From: Donald SharpDate: Sat, 10 Jun 2017 16:30:17 -0400 > This patch fixes two issues: > > 1) When forwarding on *,G mroutes that are in a vrf, the > kernel was dropping information about the actual incoming > interface when calling ip_mr_forward from ip_mr_input. > This caused ip_mr_forward to send the multicast packet > back out the incoming interface. Fix this by > modifying ip_mr_forward to be handed the correctly > resolved dev. > > 2) When a unresolved cache entry is created we store > the incoming skb on the unresolved cache entry and > upon mroute resolution from the user space daemon, > we attempt to forward the packet. Again we were > not resolving to the correct incoming device for > a vrf scenario, before calling ip_mr_forward. > Fix this by resolving to the correct interface > and calling ip_mr_forward with the result. > > Fixes: e58e41596811 ("net: Enable support for VRF with ipv4 multicast") > Signed-off-by: Donald Sharp > --- > v2: Fixed title > v3: Addressed Review comments by Andrew Lunn and David Ahern Applied and queued up for -stable, thank you.
Re: [pull request][net-next 0/9] Mellanox mlx5 updates 2017-06-11
From: Saeed MahameedDate: Sun, 11 Jun 2017 17:55:44 +0300 > This series provides updates to mlx5 header rewrite feature, from Or Gerlitz. > and three more small updates From Maor and Eran. > > For more details please see below. Pulled, thanks. > Please pull and let me know if there's any problem. > *This series doesn't introduce any conflict with the ongoing net > pull request. Ok, thanks for letting me know.
Re: [PATCH 0/6] Constant Time Memory Comparisons Are Important
2017-06-11 22:48 GMT+02:00 Emmanuel Grumbach: > On Sun, Jun 11, 2017 at 4:36 PM, Kees Cook wrote: >> >> On Sun, Jun 11, 2017 at 1:13 AM, Kalle Valo wrote: >> > "Jason A. Donenfeld" writes: >> > >> >> Whenever you're comparing two MACs, it's important to do this using >> >> crypto_memneq instead of memcmp. With memcmp, you leak timing information, >> >> which could then be used to iteratively forge a MAC. >> > >> > Do you have any pointers where I could learn more about this? >> >> While not using C specifically, this talks about the problem generally: >> https://www.chosenplaintext.ca/articles/beginners-guide-constant-time-cryptography.html >> > > Sorry for the stupid question, but the MAC address is in plaintext in > the air anyway or easily accessible via user space tools. I fail to > see what it is so secret about a MAC address in that code where that > same MAC address is accessible via myriads of ways. I think you're mixing up Media Access Control (MAC) addresses with Message Authentication Code (MAC). The second one is a cryptographic signature of a message.
Re: [PATCH 0/6] Constant Time Memory Comparisons Are Important
Hi Stephan, On Sun, Jun 11, 2017 at 11:06 PM, Stephan Müllerwrote: > Are you planning to send an update to your patch set? If yes, there is another > one which should be converted too: crypto/rsa-pkcs1pad.c. I just sent an update to this thread patching that, per your suggestion. Since these issues are expected to be cherry picked by their respective committer, I figure we can just pile on the patches here, listing the 0/6 intro email as each patch's parent. Jason
Re: [PATCH 0/6] Constant Time Memory Comparisons Are Important
Am Samstag, 10. Juni 2017, 04:59:06 CEST schrieb Jason A. Donenfeld: Hi Jason, > Whenever you're comparing two MACs, it's important to do this using > crypto_memneq instead of memcmp. With memcmp, you leak timing information, > which could then be used to iteratively forge a MAC. This is far too basic > of a mistake for us to have so pervasively in the year 2017, so let's begin > cleaning this stuff up. The following 6 locations were found with some > simple regex greps, but I'm sure more lurk below the surface. If you > maintain some code or know somebody who maintains some code that deals > with MACs, tell them to double check which comparison function they're > using. Are you planning to send an update to your patch set? If yes, there is another one which should be converted too: crypto/rsa-pkcs1pad.c. Otherwise, I will send a patch converting this one. Thanks. Ciao Stephan
Re: [PATCH 0/6] Constant Time Memory Comparisons Are Important
On Sun, Jun 11, 2017 at 4:36 PM, Kees Cookwrote: > > On Sun, Jun 11, 2017 at 1:13 AM, Kalle Valo wrote: > > "Jason A. Donenfeld" writes: > > > >> Whenever you're comparing two MACs, it's important to do this using > >> crypto_memneq instead of memcmp. With memcmp, you leak timing information, > >> which could then be used to iteratively forge a MAC. > > > > Do you have any pointers where I could learn more about this? > > While not using C specifically, this talks about the problem generally: > https://www.chosenplaintext.ca/articles/beginners-guide-constant-time-cryptography.html > Sorry for the stupid question, but the MAC address is in plaintext in the air anyway or easily accessible via user space tools. I fail to see what it is so secret about a MAC address in that code where that same MAC address is accessible via myriads of ways.
Re: [pull request][net 0/6] Mellanox mlx5 fixes 2017-06-11
From: Saeed MahameedDate: Sun, 11 Jun 2017 17:00:22 +0300 > This series contains some fixes for the mlx5 core and netdev driver. > > Please pull and let me know if there's any problem. Pulled, thanks. > For -stable: > ("net/mlx5e: Added BW check for DIM decision mechanism") kernels > >= 4.9 > ("net/mlx5e: Fix wrong indications in DIM due to counter wraparound") kernels > >= 4.9 > ("net/mlx5: Remove several module events out of ethtool stats") kernels > >= 4.10 > ("net/mlx5: Enable 4K UAR only when page size is bigger than 4K") kernels > >= 4.11 > > *all patches apply with no issue on their -stable. > > BTW I never asked you if this way of marking patches to -stable is good for > you ? > if you prefer another format please let me know. It works fine for now, I'll let you know if it causes problems :)
Re: [PATCH net 0/9] Bugs fixes in ena ethernet driver
From:Date: Sun, 11 Jun 2017 15:42:42 +0300 > This patchset contains fixes for the bugs that were discovered so far. Series applied.
Re: [PATCH net-next v10 1/4] net netlink: Add new type NLA_FLAG_BITS
On 17-06-11 01:38 PM, Jamal Hadi Salim wrote: On 17-06-11 09:49 AM, Jiri Pirko wrote: Sun, Jun 11, 2017 at 01:53:43PM CEST, j...@mojatatu.com wrote: From: Jamal Hadi SalimThis patch also provides an extra feature: a validation callback that could be speaciliazed for other types. s/speaciliazed/speciliazed/ Will fix. [ATTR_GOO] = { .type = MYTYPE, .validation_data = _data, .validate_content = mycontent_validator }, Indent is wrong. (Does not matter really in desc, but anyway) I cant find out how it got indented that way; my source or email dont show it as such (but really doesnt matter). Suggested-by: Jiri Pirko Will add. --- include/net/netlink.h | 11 +++ include/uapi/linux/rtnetlink.h | 17 + lib/nlattr.c | 25 + 3 files changed, 53 insertions(+) diff --git a/include/net/netlink.h b/include/net/netlink.h index 0170917..8ab9784 100644 --- a/include/net/netlink.h +++ b/include/net/netlink.h @@ -6,6 +6,11 @@ #include #include +struct nla_bit_flags { +u32 nla_flag_values; +u32 nla_flag_selector; +}; I don't understand why you redefine the struct here. You already have it defined in the uapi: struct __nla_bit_flags Just move this (struct nla_bit_flags) to the uapi and remove __nla_bit_flags ? I am not sure that will compile since the type is defined in netlink.h Also, note: uapi uses _u32 and kernel uses u32 as types i.e it is pretty common approach; i will try to move it to uapi and keep that uapi format. If it doesnt compile without acrobatics I will keep it as is. It doesnt compile - I could move it to linux/netlink.h but it seems so out of place. so i will keep things as is for now unless you can think of something else. cheers, jamal
Re: [PATCH net-next v10 3/4] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
On 17-06-11 10:13 AM, Jiri Pirko wrote: Sun, Jun 11, 2017 at 01:53:45PM CEST, j...@mojatatu.com wrote: From: Jamal Hadi SalimWhen you dump hundreds of thousands of actions, getting only 32 per dump batch even when the socket buffer and memory allocations allow is inefficient. With this change, the user will get as many as possibly fitting within the given constraints available to the kernel. The top level action TLV space is extended. An attribute TCA_ROOT_FLAGS is used to carry flags; flag TCA_FLAG_LARGE_DUMP_ON is set by the user indicating the user is capable of processing these large dumps. Older user space which doesnt set this flag doesnt get the large (than 32) batches. The kernel uses the TCA_ROOT_COUNT attribute to tell the user how many actions are put in a single batch. As such user space app knows how long to iterate (independent of the type of action being dumped) instead of hardcoded maximum of 32 thus maintaining backward compat. Some results dumping 1.5M actions below: first an unpatched tc which doesnt understand these features... prompt$ time -p tc actions ls action gact | grep index | wc -l 150 real 1388.43 user 2.07 sys 1386.79 Now lets see a patched tc which sets the correct flags when requesting a dump: prompt$ time -p updatedtc actions ls action gact | grep index | wc -l 150 real 178.13 user 2.02 sys 176.96 That is about 8x performance improvement for tc app which sets its receive buffer to about 32K. Signed-off-by: Jamal Hadi Salim --- include/uapi/linux/rtnetlink.h | 22 +-- net/sched/act_api.c| 50 +- 2 files changed, 60 insertions(+), 12 deletions(-) diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h index 8f07957..7d2030f 100644 --- a/include/uapi/linux/rtnetlink.h +++ b/include/uapi/linux/rtnetlink.h @@ -693,10 +693,28 @@ struct tcamsg { unsigned char tca__pad1; unsigned short tca__pad2; }; + +enum { + TCA_ROOT_UNSPEC, + TCA_ROOT_TAB, +#define TCA_ACT_TAB TCA_ROOT_TAB +#define TCAA_MAX TCA_ROOT_TAB + TCA_ROOT_FLAGS, + TCA_ROOT_COUNT, + __TCA_ROOT_MAX, +#defineTCA_ROOT_MAX (__TCA_ROOT_MAX - 1) +}; + #define TA_RTA(r) ((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct tcamsg #define TA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct tcamsg)) -#define TCA_ACT_TAB 1 /* attr type must be >=1 */ -#define TCAA_MAX 1 +/* tcamsg flags stored in attribute TCA_ROOT_FLAGS + * + * TCA_FLAG_LARGE_DUMP_ON user->kernel to request for larger than TCA_ACT_MAX_PRIO + * actions in a dump. All dump responses will contain the number of actions + * being dumped stored in for user app's consumption in TCA_ROOT_COUNT + * + */ +#define TCA_FLAG_LARGE_DUMP_ON (1 << 0) /* New extended info filters for IFLA_EXT_MASK */ #define RTEXT_FILTER_VF (1 << 0) diff --git a/net/sched/act_api.c b/net/sched/act_api.c index 400eb6e..935dc19 100644 --- a/net/sched/act_api.c +++ b/net/sched/act_api.c @@ -110,6 +110,7 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, struct sk_buff *skb, struct netlink_callback *cb) { int err = 0, index = -1, i = 0, s_i = 0, n_i = 0; + u32 act_flags = cb->args[2]; struct nlattr *nest; spin_lock_bh(>lock); @@ -138,14 +139,18 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, struct sk_buff *skb, } nla_nest_end(skb, nest); n_i++; - if (n_i >= TCA_ACT_MAX_PRIO) + if (!(act_flags & TCA_FLAG_LARGE_DUMP_ON) && + n_i >= TCA_ACT_MAX_PRIO) goto done; } } done: spin_unlock_bh(>lock); - if (n_i) + if (n_i) { cb->args[0] += n_i; + if (act_flags & TCA_FLAG_LARGE_DUMP_ON) + cb->args[1] = n_i; + } return n_i; nla_put_failure: @@ -1068,11 +1073,17 @@ static int tcf_action_add(struct net *net, struct nlattr *nla, return tcf_add_notify(net, n, , portid); } +static u32 allowed_flags = TCA_FLAG_LARGE_DUMP_ON; An empty line would be nice. Also, since this is outside a function, some context prefix would be nice: static u32 tcaa_root_flags_allowed = TCA_FLAG_LARGE_DUMP_ON; You are going to make me exceed the 80 char limit? ;-> + if (tb[TCA_ROOT_FLAGS]) + nla_memcpy(_flags, tb[TCA_ROOT_FLAGS], + sizeof(select_flags)); Please introduce a helper for this attr type in patch 1: u32 select_flags; select_flags = nla_get_flag_bits_values(tb[TCA_ROOT_FLAGS]) Sure. cheers, jamal
Re: [PATCH net-next v10 1/4] net netlink: Add new type NLA_FLAG_BITS
On 17-06-11 09:49 AM, Jiri Pirko wrote: Sun, Jun 11, 2017 at 01:53:43PM CEST, j...@mojatatu.com wrote: From: Jamal Hadi SalimThis patch also provides an extra feature: a validation callback that could be speaciliazed for other types. s/speaciliazed/speciliazed/ Will fix. [ATTR_GOO] = { .type = MYTYPE, .validation_data = _data, .validate_content = mycontent_validator }, Indent is wrong. (Does not matter really in desc, but anyway) I cant find out how it got indented that way; my source or email dont show it as such (but really doesnt matter). Suggested-by: Jiri Pirko Will add. --- include/net/netlink.h | 11 +++ include/uapi/linux/rtnetlink.h | 17 + lib/nlattr.c | 25 + 3 files changed, 53 insertions(+) diff --git a/include/net/netlink.h b/include/net/netlink.h index 0170917..8ab9784 100644 --- a/include/net/netlink.h +++ b/include/net/netlink.h @@ -6,6 +6,11 @@ #include #include +struct nla_bit_flags { + u32 nla_flag_values; + u32 nla_flag_selector; +}; I don't understand why you redefine the struct here. You already have it defined in the uapi: struct __nla_bit_flags Just move this (struct nla_bit_flags) to the uapi and remove __nla_bit_flags ? I am not sure that will compile since the type is defined in netlink.h Also, note: uapi uses _u32 and kernel uses u32 as types i.e it is pretty common approach; i will try to move it to uapi and keep that uapi format. If it doesnt compile without acrobatics I will keep it as is. + diff --git a/lib/nlattr.c b/lib/nlattr.c index a7e0b16..78fed43 100644 --- a/lib/nlattr.c +++ b/lib/nlattr.c @@ -27,6 +27,21 @@ [NLA_S64] = sizeof(s64), }; +static int validate_nla_bit_flags(const struct nlattr *nla, void *valid_data) +{ + const struct nla_bit_flags *nbf = nla_data(nla); + u32 *valid_flags_mask = valid_data; + + if (!valid_data) + return -EINVAL; + + Avoid one empty line here (you have 2) Will fix. + if (nbf->nla_flag_values & ~*valid_flags_mask) + return -EINVAL; + + return 0; +} + static int validate_nla(const struct nlattr *nla, int maxtype, const struct nla_policy *policy) { @@ -46,6 +61,13 @@ static int validate_nla(const struct nlattr *nla, int maxtype, return -ERANGE; break; + case NLA_FLAG_BITS: + if (attrlen != 8) /* 2 x 32 bits */ sizeof(struct nla_bit_flags) instead of 8 please, you can skip the comment then. Good point. + return -ERANGE; + + return validate_nla_bit_flags(nla, pt->validation_data); + break; + case NLA_NUL_STRING: if (pt->len) minlen = min_t(int, attrlen, pt->len + 1); @@ -103,6 +125,9 @@ static int validate_nla(const struct nlattr *nla, int maxtype, return -ERANGE; } + if (pt->validate_content) + return pt->validate_content(nla, pt->validation_data); This validation mechanism is completely independent from the added NLA_FLAG_BITS attr as it could be used with other attribute types. Please have it as a separate patch. Ok - so one more patch then ;-> cheers, jamal
Re:
I need your partnership in a transaction that will benefit you, details will disclose to you once i receive your reply. Kind regards Saif.
I Would love to discuse something with you.
Re: ARM GLX Khadas VIM Pro - Ethernet detected as only 10Mbps and stalled after some traffic
Hi Andrew, On Sun, Jun 11, 2017 at 5:21 PM, Andrew Lunnwrote: >> Thank your for the suggestion, and thanks Martin to explaining me over >> IRC what actually I should do. >> >> I recompiled mainline kernel 4.12-rc4 with the Amlogic driver: >> replaced drivers/net/phy/meson-gxl.c with >> https://github.com/khadas/linux/blob/ubuntu-4.9/drivers/amlogic/ethernet/phy/amlogic.c >> >> But this did not solve the issue. As soon as i start git clone i lose >> network connection to device (no session timeout/disconnect this time, >> but I am unable to reconnect over SSH or to get OK ping replay back). > 1) First problem reported I can't reproduce anymore, every reboot/cold boot with mainline kernel the Ethernet speed is detected as "100Mbps/Full" , but as seen in first post there was this issue. 2) see below > So this shows it is more than a PHY problem. The Ethernet MAC driver > is probably also part of the problem. There are some stmmac fixes [1] in soon to be rc5, compiled current master (without amlogic.c) with the fixes but for me the issue still persist. I will compile once released rc5 with amlogic.c and report back. > Are there any mainline kernels which work O.K? Khadas VIM support was added in 4.12-rc1. And I did test all four rc's but without success. > Andrew [1] https://github.com/torvalds/linux/commit/426849e6611f2092553f8d53372ae310818a6292
Re: ipmr: MFC routes when VIF deleted
On 11/06/17 11:55, Yotam Gigi wrote: > I have been looking into some weird behavior, and I am not sure whether it is > a bug or a feature. > > When a VIF with index v gets deleted, the MFC routes does not get updated, > which > means that there can be routes pointing to that VIF. On datapath, when packet > hits that route, the VIF validity will be checked and will not be sent to that > device (but still, the route does not get updated). Now, if the user creates > another VIF with the same index v but different underlay device, the same > route > will forward the traffic to that device. > > It is relevant to mention that when user adds a MFC route, only the active > VIFs > are used, so the flow of adding a route with dummy VIF indices and then > connecting those VIF indices to real device is not supported. The only way to > create a MFC route that has non existing VIFs is to create one with existing > VIFs and then delete them. > > Do we really want to support that? To me, it looks like a buggy flow and I > suggest that upon VIF deletion, the MFC routes will be updated to not point to > any non existing VIF indices. > Hi Yotam, I'm not strongly against such change but my feeling is that we shouldn't change it. I think we shouldn't change it because we cannot guarantee that we won't break some user-space app that relies on this behaviour or uses it as an optimization. User-space ipmr apps work in sync with the kernel and are usually the only ones doing such changes so their internal state will be always valid, and I'd guess they already deal with this one way or another. My second argument is a minor one and is about performance. There are some apps (e.g. pimd) which use interface add/del on interface state change (up/down) and this could make these ops slower on large setups. Again I see how this could be helpful and should've probably been like that from the start, so if other people feel confident we won't break anything then I wouldn't mind the change. Thanks, Nik
Re: ARM GLX Khadas VIM Pro - Ethernet detected as only 10Mbps and stalled after some traffic
> Thank your for the suggestion, and thanks Martin to explaining me over > IRC what actually I should do. > > I recompiled mainline kernel 4.12-rc4 with the Amlogic driver: > replaced drivers/net/phy/meson-gxl.c with > https://github.com/khadas/linux/blob/ubuntu-4.9/drivers/amlogic/ethernet/phy/amlogic.c > > But this did not solve the issue. As soon as i start git clone i lose > network connection to device (no session timeout/disconnect this time, > but I am unable to reconnect over SSH or to get OK ping replay back). So this shows it is more than a PHY problem. The Ethernet MAC driver is probably also part of the problem. Are there any mainline kernels which work O.K? Andrew
[net-next 4/9] net/mlx5e: Use modify header ID cache for offloaded TC E-Switch flows
From: Or GerlitzUse the modify header ID cache for the header re-write part of offloading TC eswitch flows. Signed-off-by: Or Gerlitz Reviewed-by: Paul Blakey Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 11 +++ 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index e1217c2279ab..4625a0e226da 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -344,10 +344,7 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv, } if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) { - err = mlx5_modify_header_alloc(priv->mdev, MLX5_FLOW_NAMESPACE_FDB, - parse_attr->num_mod_hdr_actions, - parse_attr->mod_hdr_actions, - >mod_hdr_id); + err = mlx5e_attach_mod_hdr(priv, flow, parse_attr); kfree(parse_attr->mod_hdr_actions); if (err) { rule = ERR_PTR(err); @@ -363,8 +360,7 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv, err_add_rule: if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) - mlx5_modify_header_dealloc(priv->mdev, - attr->mod_hdr_id); + mlx5e_detach_mod_hdr(priv, flow); err_mod_hdr: mlx5_eswitch_del_vlan_action(esw, attr); err_add_vlan: @@ -392,8 +388,7 @@ static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv, } if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) - mlx5_modify_header_dealloc(priv->mdev, - attr->mod_hdr_id); + mlx5e_detach_mod_hdr(priv, flow); } void mlx5e_tc_encap_flows_add(struct mlx5e_priv *priv, -- 2.11.0
[net-next 2/9] net/mlx5e: Use short attribute form when adding/deleting offloaded TC flows
From: Or GerlitzInstead of going through flow->nic/esw_attr for each usage, assign an attr pointer per the context (nic or esw) and use that. This patch doesn't add any functionality. Signed-off-by: Or Gerlitz Reviewed-by: Roi Dayan Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 17 + 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index cfb32fe5129d..a9feddc31667 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -177,6 +177,7 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv, static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv, struct mlx5e_tc_flow *flow) { + struct mlx5_nic_flow_attr *attr = flow->nic_attr; struct mlx5_fc *counter = NULL; counter = mlx5_flow_rule_counter(flow->rule); @@ -188,9 +189,9 @@ static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv, priv->fs.tc.t = NULL; } - if (flow->nic_attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) + if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) mlx5_modify_header_dealloc(priv->mdev, - flow->nic_attr->mod_hdr_id); + attr->mod_hdr_id); } static void mlx5e_detach_encap(struct mlx5e_priv *priv, @@ -231,7 +232,7 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv, return rule; err_add_rule: - if (flow->esw_attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) + if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) mlx5_modify_header_dealloc(priv->mdev, attr->mod_hdr_id); err_mod_hdr: @@ -250,17 +251,17 @@ static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv, if (flow->flags & MLX5E_TC_FLOW_OFFLOADED) { flow->flags &= ~MLX5E_TC_FLOW_OFFLOADED; - mlx5_eswitch_del_offloaded_rule(esw, flow->rule, flow->esw_attr); + mlx5_eswitch_del_offloaded_rule(esw, flow->rule, attr); } - mlx5_eswitch_del_vlan_action(esw, flow->esw_attr); + mlx5_eswitch_del_vlan_action(esw, attr); - if (flow->esw_attr->action & MLX5_FLOW_CONTEXT_ACTION_ENCAP) { + if (attr->action & MLX5_FLOW_CONTEXT_ACTION_ENCAP) { mlx5e_detach_encap(priv, flow); - kvfree(flow->esw_attr->parse_attr); + kvfree(attr->parse_attr); } - if (flow->esw_attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) + if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) mlx5_modify_header_dealloc(priv->mdev, attr->mod_hdr_id); } -- 2.11.0
[net-next 9/9] net/mlx5e: Fill advertised and supported port data from Hardware info
From: Eran Ben ElishaTranslate hardware port connector type data into link mode supported and advertised info instead of caching it in driver. Signed-off-by: Eran Ben Elisha Signed-off-by: Saeed Mahameed --- .../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 90 ++ 1 file changed, 74 insertions(+), 16 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c index f03c2d088d0c..b4514f247402 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c @@ -723,24 +723,81 @@ static void ptys2ethtool_adver_link(unsigned long *advertising_modes, __ETHTOOL_LINK_MODE_MASK_NBITS); } -static void ptys2ethtool_supported_port(struct ethtool_link_ksettings *link_ksettings, - u32 eth_proto_cap) +static void ptys2ethtool_supported_advertised_port(struct ethtool_link_ksettings *link_ksettings, + u32 eth_proto_cap, + u8 connector_type) { - if (eth_proto_cap & (MLX5E_PROT_MASK(MLX5E_10GBASE_CR) - | MLX5E_PROT_MASK(MLX5E_10GBASE_SR) - | MLX5E_PROT_MASK(MLX5E_40GBASE_CR4) - | MLX5E_PROT_MASK(MLX5E_40GBASE_SR4) - | MLX5E_PROT_MASK(MLX5E_100GBASE_SR4) - | MLX5E_PROT_MASK(MLX5E_1000BASE_CX_SGMII))) { - ethtool_link_ksettings_add_link_mode(link_ksettings, supported, FIBRE); + if (!connector_type || connector_type >= MLX5E_CONNECTOR_TYPE_NUMBER) { + if (eth_proto_cap & (MLX5E_PROT_MASK(MLX5E_10GBASE_CR) + | MLX5E_PROT_MASK(MLX5E_10GBASE_SR) + | MLX5E_PROT_MASK(MLX5E_40GBASE_CR4) + | MLX5E_PROT_MASK(MLX5E_40GBASE_SR4) + | MLX5E_PROT_MASK(MLX5E_100GBASE_SR4) + | MLX5E_PROT_MASK(MLX5E_1000BASE_CX_SGMII))) { + ethtool_link_ksettings_add_link_mode(link_ksettings, +supported, +FIBRE); + ethtool_link_ksettings_add_link_mode(link_ksettings, +advertising, +FIBRE); + } + + if (eth_proto_cap & (MLX5E_PROT_MASK(MLX5E_100GBASE_KR4) + | MLX5E_PROT_MASK(MLX5E_40GBASE_KR4) + | MLX5E_PROT_MASK(MLX5E_10GBASE_KR) + | MLX5E_PROT_MASK(MLX5E_10GBASE_KX4) + | MLX5E_PROT_MASK(MLX5E_1000BASE_KX))) { + ethtool_link_ksettings_add_link_mode(link_ksettings, +supported, +Backplane); + ethtool_link_ksettings_add_link_mode(link_ksettings, +advertising, +Backplane); + } + return; } - if (eth_proto_cap & (MLX5E_PROT_MASK(MLX5E_100GBASE_KR4) - | MLX5E_PROT_MASK(MLX5E_40GBASE_KR4) - | MLX5E_PROT_MASK(MLX5E_10GBASE_KR) - | MLX5E_PROT_MASK(MLX5E_10GBASE_KX4) - | MLX5E_PROT_MASK(MLX5E_1000BASE_KX))) { - ethtool_link_ksettings_add_link_mode(link_ksettings, supported, Backplane); + switch (connector_type) { + case MLX5E_PORT_TP: + ethtool_link_ksettings_add_link_mode(link_ksettings, +supported, TP); + ethtool_link_ksettings_add_link_mode(link_ksettings, +advertising, TP); + break; + case MLX5E_PORT_AUI: + ethtool_link_ksettings_add_link_mode(link_ksettings, +supported, AUI); + ethtool_link_ksettings_add_link_mode(link_ksettings, +advertising, AUI); + break; + case MLX5E_PORT_BNC: + ethtool_link_ksettings_add_link_mode(link_ksettings, +supported, BNC); + ethtool_link_ksettings_add_link_mode(link_ksettings, +advertising, BNC); +
[net-next 8/9] net/mlx5e: Add support for reading connector type from PTYS
From: Eran Ben ElishaRead port connector type from the firmware instead of caching it in the driver metadata. Signed-off-by: Eran Ben Elisha Signed-off-by: Saeed Mahameed --- .../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 22 -- include/linux/mlx5/mlx5_ifc.h | 7 +-- include/linux/mlx5/port.h | 13 + 3 files changed, 38 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c index e9e33fd68279..f03c2d088d0c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c @@ -809,8 +809,23 @@ static void get_advertising(u32 eth_proto_cap, u8 tx_pause, ethtool_link_ksettings_add_link_mode(link_ksettings, advertising, Asym_Pause); } -static u8 get_connector_port(u32 eth_proto) +static int ptys2connector_type[MLX5E_CONNECTOR_TYPE_NUMBER] = { + [MLX5E_PORT_UNKNOWN]= PORT_OTHER, + [MLX5E_PORT_NONE] = PORT_NONE, + [MLX5E_PORT_TP] = PORT_TP, + [MLX5E_PORT_AUI]= PORT_AUI, + [MLX5E_PORT_BNC]= PORT_BNC, + [MLX5E_PORT_MII]= PORT_MII, + [MLX5E_PORT_FIBRE] = PORT_FIBRE, + [MLX5E_PORT_DA] = PORT_DA, + [MLX5E_PORT_OTHER] = PORT_OTHER, + }; + +static u8 get_connector_port(u32 eth_proto, u8 connector_type) { + if (connector_type && connector_type < MLX5E_CONNECTOR_TYPE_NUMBER) + return ptys2connector_type[connector_type]; + if (eth_proto & (MLX5E_PROT_MASK(MLX5E_10GBASE_SR) | MLX5E_PROT_MASK(MLX5E_40GBASE_SR4) | MLX5E_PROT_MASK(MLX5E_100GBASE_SR4) @@ -856,6 +871,7 @@ static int mlx5e_get_link_ksettings(struct net_device *netdev, u32 eth_proto_oper; u8 an_disable_admin; u8 an_status; + u8 connector_type; int err; err = mlx5_query_port_ptys(mdev, out, sizeof(out), MLX5_PTYS_EN, 1); @@ -871,6 +887,7 @@ static int mlx5e_get_link_ksettings(struct net_device *netdev, eth_proto_lp = MLX5_GET(ptys_reg, out, eth_proto_lp_advertise); an_disable_admin = MLX5_GET(ptys_reg, out, an_disable_admin); an_status= MLX5_GET(ptys_reg, out, an_status); + connector_type = MLX5_GET(ptys_reg, out, connector_type); mlx5_query_port_pause(mdev, _pause, _pause); @@ -883,7 +900,8 @@ static int mlx5e_get_link_ksettings(struct net_device *netdev, eth_proto_oper = eth_proto_oper ? eth_proto_oper : eth_proto_cap; - link_ksettings->base.port = get_connector_port(eth_proto_oper); + link_ksettings->base.port = get_connector_port(eth_proto_oper, + connector_type); get_lp_advertising(eth_proto_lp, link_ksettings); if (an_status == MLX5_AN_COMPLETE) diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index ec308657af3b..32b044e953d2 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -7295,7 +7295,8 @@ struct mlx5_ifc_ptys_reg_bits { u8 ib_link_width_oper[0x10]; u8 ib_proto_oper[0x10]; - u8 reserved_at_160[0x20]; + u8 reserved_at_160[0x1c]; + u8 connector_type[0x4]; u8 eth_proto_lp_advertise[0x20]; @@ -7698,8 +7699,10 @@ struct mlx5_ifc_peir_reg_bits { }; struct mlx5_ifc_pcam_enhanced_features_bits { - u8 reserved_at_0[0x7e]; + u8 reserved_at_0[0x7c]; + u8 ptys_connector_type[0x1]; + u8 reserved_at_7d[0x1]; u8 ppcnt_discard_group[0x1]; u8 ppcnt_statistical_group[0x1]; }; diff --git a/include/linux/mlx5/port.h b/include/linux/mlx5/port.h index e527732fb31b..c57d4b7de3a8 100644 --- a/include/linux/mlx5/port.h +++ b/include/linux/mlx5/port.h @@ -92,6 +92,19 @@ enum mlx5e_link_mode { MLX5E_LINK_MODES_NUMBER, }; +enum mlx5e_connector_type { + MLX5E_PORT_UNKNOWN = 0, + MLX5E_PORT_NONE = 1, + MLX5E_PORT_TP = 2, + MLX5E_PORT_AUI = 3, + MLX5E_PORT_BNC = 4, + MLX5E_PORT_MII = 5, + MLX5E_PORT_FIBRE= 6, + MLX5E_PORT_DA = 7, + MLX5E_PORT_OTHER= 8, + MLX5E_CONNECTOR_TYPE_NUMBER, +}; + #define MLX5E_PROT_MASK(link_mode) (1 << link_mode) #define PORT_MODULE_EVENT_MODULE_STATUS_MASK 0xF -- 2.11.0
[net-next 6/9] net/mlx5e: Support header re-write of partial fields in TC pedit offload
From: Or GerlitzUsing a per field mask field, the TC pedit action supports modifying partial fields. E.g if using the TC tool, the following example would make the kernel to only re-write two bytes of the src ip address: tc filter add dev enp1s0 protocol ip parent : prio 30 flower skip_sw ip_proto udp dst_port 8001 action pedit ex munge ip src set 10.1.0.0 retain 0x We add driver support for offload these partial re-writes, by setting the per FW action offset-in-field and length-from-offset attributes. The 1st bit set in the mask specifies both the offset and the right shift to apply on the value such that the 1st bit which needs to be set will reside in bit 0 of the FW data field. Signed-off-by: Or Gerlitz Reviewed-by: Paul Blakey Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 28 + 1 file changed, 19 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index 0c7a0872b22b..f5afacfbe914 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -1091,12 +1091,14 @@ static int offload_pedit_fields(struct pedit_headers *masks, struct mlx5e_tc_flow_parse_attr *parse_attr) { struct pedit_headers *set_masks, *add_masks, *set_vals, *add_vals; - int i, action_size, nactions, max_actions, first, last, first_z; + int i, action_size, nactions, max_actions, first, last, next_z; void *s_masks_p, *a_masks_p, *vals_p; struct mlx5_fields *f; u8 cmd, field_bsize; u32 s_mask, a_mask; unsigned long mask; + __be32 mask_be32; + __be16 mask_be16; void *action; set_masks = [TCA_PEDIT_KEY_EX_CMD_SET]; @@ -1150,11 +1152,19 @@ static int offload_pedit_fields(struct pedit_headers *masks, field_bsize = f->size * BITS_PER_BYTE; - first_z = find_first_zero_bit(, field_bsize); + if (field_bsize == 32) { + mask_be32 = *(__be32 *) + mask = (__force unsigned long)cpu_to_le32(be32_to_cpu(mask_be32)); + } else if (field_bsize == 16) { + mask_be16 = *(__be16 *) + mask = (__force unsigned long)cpu_to_le16(be16_to_cpu(mask_be16)); + } + first = find_first_bit(, field_bsize); + next_z = find_next_zero_bit(, field_bsize, first); last = find_last_bit(, field_bsize); - if (first > 0 || last != (field_bsize - 1) || first_z < last) { - printk(KERN_WARNING "mlx5: partial rewrite (mask %lx) is currently not offloaded\n", + if (first < next_z && next_z < last) { + printk(KERN_WARNING "mlx5: rewrite of few sub-fields (mask %lx) isn't offloaded\n", mask); return -EOPNOTSUPP; } @@ -1163,17 +1173,17 @@ static int offload_pedit_fields(struct pedit_headers *masks, MLX5_SET(set_action_in, action, field, f->field); if (cmd == MLX5_ACTION_TYPE_SET) { - MLX5_SET(set_action_in, action, offset, 0); + MLX5_SET(set_action_in, action, offset, first); /* length is num of bits to be written, zero means length of 32 */ - MLX5_SET(set_action_in, action, length, field_bsize); + MLX5_SET(set_action_in, action, length, (last - first + 1)); } if (field_bsize == 32) - MLX5_SET(set_action_in, action, data, ntohl(*(__be32 *)vals_p)); + MLX5_SET(set_action_in, action, data, ntohl(*(__be32 *)vals_p) >> first); else if (field_bsize == 16) - MLX5_SET(set_action_in, action, data, ntohs(*(__be16 *)vals_p)); + MLX5_SET(set_action_in, action, data, ntohs(*(__be16 *)vals_p) >> first); else if (field_bsize == 8) - MLX5_SET(set_action_in, action, data, *(u8 *)vals_p); + MLX5_SET(set_action_in, action, data, *(u8 *)vals_p >> first); action += action_size; nactions++; -- 2.11.0
[net-next 7/9] net/mlx5: Update flow table commands layout
From: Maor GottliebUpdate struct mlx5_ifc_create(modify)_flow_table_bits according to the last device specification. Signed-off-by: Maor Gottlieb Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c | 32 ++--- include/linux/mlx5/mlx5_ifc.h| 46 +++- 2 files changed, 40 insertions(+), 38 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c index abb44a268563..e750f07793b8 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c @@ -78,28 +78,33 @@ int mlx5_cmd_create_flow_table(struct mlx5_core_dev *dev, MLX5_CMD_OP_CREATE_FLOW_TABLE); MLX5_SET(create_flow_table_in, in, table_type, type); - MLX5_SET(create_flow_table_in, in, level, level); - MLX5_SET(create_flow_table_in, in, log_size, log_size); + MLX5_SET(create_flow_table_in, in, flow_table_context.level, level); + MLX5_SET(create_flow_table_in, in, flow_table_context.log_size, log_size); if (vport) { MLX5_SET(create_flow_table_in, in, vport_number, vport); MLX5_SET(create_flow_table_in, in, other_vport, 1); } - MLX5_SET(create_flow_table_in, in, decap_en, en_encap_decap); - MLX5_SET(create_flow_table_in, in, encap_en, en_encap_decap); + MLX5_SET(create_flow_table_in, in, flow_table_context.decap_en, +en_encap_decap); + MLX5_SET(create_flow_table_in, in, flow_table_context.encap_en, +en_encap_decap); switch (op_mod) { case FS_FT_OP_MOD_NORMAL: if (next_ft) { - MLX5_SET(create_flow_table_in, in, table_miss_mode, 1); - MLX5_SET(create_flow_table_in, in, table_miss_id, next_ft->id); + MLX5_SET(create_flow_table_in, in, +flow_table_context.table_miss_action, 1); + MLX5_SET(create_flow_table_in, in, +flow_table_context.table_miss_id, next_ft->id); } break; case FS_FT_OP_MOD_LAG_DEMUX: MLX5_SET(create_flow_table_in, in, op_mod, 0x1); if (next_ft) - MLX5_SET(create_flow_table_in, in, lag_master_next_table_id, + MLX5_SET(create_flow_table_in, in, +flow_table_context.lag_master_next_table_id, next_ft->id); break; } @@ -146,10 +151,10 @@ int mlx5_cmd_modify_flow_table(struct mlx5_core_dev *dev, MLX5_MODIFY_FLOW_TABLE_LAG_NEXT_TABLE_ID); if (next_ft) { MLX5_SET(modify_flow_table_in, in, -lag_master_next_table_id, next_ft->id); +flow_table_context.lag_master_next_table_id, next_ft->id); } else { MLX5_SET(modify_flow_table_in, in, -lag_master_next_table_id, 0); +flow_table_context.lag_master_next_table_id, 0); } } else { if (ft->vport) { @@ -160,11 +165,14 @@ int mlx5_cmd_modify_flow_table(struct mlx5_core_dev *dev, MLX5_SET(modify_flow_table_in, in, modify_field_select, MLX5_MODIFY_FLOW_TABLE_MISS_TABLE_ID); if (next_ft) { - MLX5_SET(modify_flow_table_in, in, table_miss_mode, 1); - MLX5_SET(modify_flow_table_in, in, table_miss_id, + MLX5_SET(modify_flow_table_in, in, +flow_table_context.table_miss_action, 1); + MLX5_SET(modify_flow_table_in, in, +flow_table_context.table_miss_id, next_ft->id); } else { - MLX5_SET(modify_flow_table_in, in, table_miss_mode, 0); + MLX5_SET(modify_flow_table_in, in, +flow_table_context.table_miss_action, 0); } } diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 56e96f6a0a45..ec308657af3b 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -6627,6 +6627,24 @@ struct mlx5_ifc_create_flow_table_out_bits { u8 reserved_at_60[0x20]; }; +struct mlx5_ifc_flow_table_context_bits { + u8 encap_en[0x1]; + u8 decap_en[0x1]; + u8 reserved_at_2[0x2]; + u8 table_miss_action[0x4]; + u8 level[0x8]; + u8 reserved_at_10[0x8]; + u8
[net-next 1/9] net/mlx5e: Remove limitation of single NIC offloaded TC action per rule
From: Or GerlitzRemove the limitation that offloaded NIC filters can have only one action. This allows us for example to provide flow tag as a note to upper layers / apps that that HW header re-write was applied. Signed-off-by: Or Gerlitz Reviewed-by: Paul Blakey Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 4 1 file changed, 4 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index 8ec13f9be660..cfb32fe5129d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -1194,10 +1194,6 @@ static int parse_tc_nic_actions(struct mlx5e_priv *priv, struct tcf_exts *exts, tcf_exts_to_list(exts, ); list_for_each_entry(a, , list) { - /* Only support a single action per rule */ - if (attr->action) - return -EINVAL; - if (is_tcf_gact_shot(a)) { attr->action |= MLX5_FLOW_CONTEXT_ACTION_DROP; if (MLX5_CAP_FLOWTABLE(priv->mdev, -- 2.11.0
[net-next 5/9] net/mlx5e: Use modify header ID cache for offloaded TC NIC flows
From: Or GerlitzUse the modify header ID cache for the header re-write part of offloading TC NIC flows. Signed-off-by: Or Gerlitz Reviewed-by: Paul Blakey Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 11 +++ 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index 4625a0e226da..0c7a0872b22b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -251,10 +251,7 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv, } if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) { - err = mlx5_modify_header_alloc(dev, MLX5_FLOW_NAMESPACE_KERNEL, - parse_attr->num_mod_hdr_actions, - parse_attr->mod_hdr_actions, - >mod_hdr_id); + err = mlx5e_attach_mod_hdr(priv, flow, parse_attr); flow_act.modify_id = attr->mod_hdr_id; kfree(parse_attr->mod_hdr_actions); if (err) { @@ -296,8 +293,7 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv, } err_create_ft: if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) - mlx5_modify_header_dealloc(priv->mdev, - attr->mod_hdr_id); + mlx5e_detach_mod_hdr(priv, flow); err_create_mod_hdr_id: mlx5_fc_destroy(dev, counter); @@ -320,8 +316,7 @@ static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv, } if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) - mlx5_modify_header_dealloc(priv->mdev, - attr->mod_hdr_id); + mlx5e_detach_mod_hdr(priv, flow); } static void mlx5e_detach_encap(struct mlx5e_priv *priv, -- 2.11.0
[net-next 3/9] net/mlx5e: Add cache for HW modify header IDs
From: Or GerlitzPackets belonging to flows which are different by matching may still need to go through the same header re-write. Add a cache for header re-write IDs keyed by the binary chain of modify header actions. The caching is supported for both eswitch and NIC use-cases, where the actual conversion of the code to use caching comes in next patches, one per use-case. Signed-off-by: Or Gerlitz Reviewed-by: Paul Blakey Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 2 + drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 134 +- drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 1 + drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 1 + 4 files changed, 137 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 2fd044b23875..f4b95dbd1c7f 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -623,6 +623,8 @@ struct mlx5e_tc_table { struct rhashtable_paramsht_params; struct rhashtable ht; + + DECLARE_HASHTABLE(mod_hdr_tbl, 8); }; struct mlx5e_vlan_table { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index a9feddc31667..e1217c2279ab 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -69,7 +69,8 @@ struct mlx5e_tc_flow { u64 cookie; u8 flags; struct mlx5_flow_handle *rule; - struct list_headencap; /* flows sharing the same encap */ + struct list_headencap; /* flows sharing the same encap ID */ + struct list_headmod_hdr; /* flows sharing the same mod hdr ID */ union { struct mlx5_esw_flow_attr esw_attr[0]; struct mlx5_nic_flow_attr nic_attr[0]; @@ -90,6 +91,135 @@ enum { #define MLX5E_TC_TABLE_NUM_ENTRIES 1024 #define MLX5E_TC_TABLE_NUM_GROUPS 4 +struct mod_hdr_key { + int num_actions; + void *actions; +}; + +struct mlx5e_mod_hdr_entry { + /* a node of a hash table which keeps all the mod_hdr entries */ + struct hlist_node mod_hdr_hlist; + + /* flows sharing the same mod_hdr entry */ + struct list_head flows; + + struct mod_hdr_key key; + + u32 mod_hdr_id; +}; + +#define MLX5_MH_ACT_SZ MLX5_UN_SZ_BYTES(set_action_in_add_action_in_auto) + +static inline u32 hash_mod_hdr_info(struct mod_hdr_key *key) +{ + return jhash(key->actions, +key->num_actions * MLX5_MH_ACT_SZ, 0); +} + +static inline int cmp_mod_hdr_info(struct mod_hdr_key *a, + struct mod_hdr_key *b) +{ + if (a->num_actions != b->num_actions) + return 1; + + return memcmp(a->actions, b->actions, a->num_actions * MLX5_MH_ACT_SZ); +} + +static int mlx5e_attach_mod_hdr(struct mlx5e_priv *priv, + struct mlx5e_tc_flow *flow, + struct mlx5e_tc_flow_parse_attr *parse_attr) +{ + struct mlx5_eswitch *esw = priv->mdev->priv.eswitch; + int num_actions, actions_size, namespace, err; + struct mlx5e_mod_hdr_entry *mh; + struct mod_hdr_key key; + bool found = false; + u32 hash_key; + + num_actions = parse_attr->num_mod_hdr_actions; + actions_size = MLX5_MH_ACT_SZ * num_actions; + + key.actions = parse_attr->mod_hdr_actions; + key.num_actions = num_actions; + + hash_key = hash_mod_hdr_info(); + + if (flow->flags & MLX5E_TC_FLOW_ESWITCH) { + namespace = MLX5_FLOW_NAMESPACE_FDB; + hash_for_each_possible(esw->offloads.mod_hdr_tbl, mh, + mod_hdr_hlist, hash_key) { + if (!cmp_mod_hdr_info(>key, )) { + found = true; + break; + } + } + } else { + namespace = MLX5_FLOW_NAMESPACE_KERNEL; + hash_for_each_possible(priv->fs.tc.mod_hdr_tbl, mh, + mod_hdr_hlist, hash_key) { + if (!cmp_mod_hdr_info(>key, )) { + found = true; + break; + } + } + } + + if (found) + goto attach_flow; + + mh = kzalloc(sizeof(*mh) + actions_size, GFP_KERNEL); + if (!mh) + return -ENOMEM; + + mh->key.actions = (void *)mh + sizeof(*mh); + memcpy(mh->key.actions, key.actions, actions_size); + mh->key.num_actions = num_actions; + INIT_LIST_HEAD(>flows); + + err =
[pull request][net-next 0/9] Mellanox mlx5 updates 2017-06-11
Hi Dave, This series provides updates to mlx5 header rewrite feature, from Or Gerlitz. and three more small updates From Maor and Eran. For more details please see below. Please pull and let me know if there's any problem. *This series doesn't introduce any conflict with the ongoing net pull request. Thanks, Saeed. --- The following changes since commit 50dffe7fad6c156c2928e45c19ff7b86eb951f4c: Merge branch 'mlx4-drivers-version-update' (2017-06-07 15:33:02 -0400) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2017-06-11 for you to fetch changes up to 46e9d0b61e27a3a9286002311f349f0c33dcb18f: net/mlx5e: Fill advertised and supported port data from Hardware info (2017-06-08 14:12:00 +0300) mlx5-updates-2017-06-11 This series provides updates to mlx5 header rewrite feature, from Or Gerlitz. and three more small updates From maor and eran. --- Or says: Packets belonging to flows which are different by matching may still need to go through the same header re-writes (e.g set the current routing hop MACs and issue TTL decrement). To minimize the number of modify header IDs, we add a cache for header re-write IDs which is keyed by the binary chain of modify header actions. The caching is supported for both eswitch and NIC use-cases, where the actual conversion of the code to use caching comes in separate patches, one per use-case. Using a per field mask field, the TC pedit action supports modifying partial fields. The last patch enables offloading that. --- >From Maor, update flow table commands layout to the latest HW spec. >From Eran, ethtool connector type reporting updates. Thanks, Saeed. Eran Ben Elisha (2): net/mlx5e: Add support for reading connector type from PTYS net/mlx5e: Fill advertised and supported port data from Hardware info Maor Gottlieb (1): net/mlx5: Update flow table commands layout Or Gerlitz (6): net/mlx5e: Remove limitation of single NIC offloaded TC action per rule net/mlx5e: Use short attribute form when adding/deleting offloaded TC flows net/mlx5e: Add cache for HW modify header IDs net/mlx5e: Use modify header ID cache for offloaded TC E-Switch flows net/mlx5e: Use modify header ID cache for offloaded TC NIC flows net/mlx5e: Support header re-write of partial fields in TC pedit offload drivers/net/ethernet/mellanox/mlx5/core/en.h | 2 + .../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 112 ++-- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c| 203 + drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 1 + drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 1 + drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c | 32 ++-- include/linux/mlx5/mlx5_ifc.h | 53 +++--- include/linux/mlx5/port.h | 13 ++ 8 files changed, 322 insertions(+), 95 deletions(-)
Re: [PATCH net-next v10 3/4] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
Sun, Jun 11, 2017 at 01:53:45PM CEST, j...@mojatatu.com wrote: >From: Jamal Hadi Salim> >When you dump hundreds of thousands of actions, getting only 32 per >dump batch even when the socket buffer and memory allocations allow >is inefficient. > >With this change, the user will get as many as possibly fitting >within the given constraints available to the kernel. > >The top level action TLV space is extended. An attribute >TCA_ROOT_FLAGS is used to carry flags; flag TCA_FLAG_LARGE_DUMP_ON >is set by the user indicating the user is capable of processing >these large dumps. Older user space which doesnt set this flag >doesnt get the large (than 32) batches. >The kernel uses the TCA_ROOT_COUNT attribute to tell the user how many >actions are put in a single batch. As such user space app knows how long >to iterate (independent of the type of action being dumped) >instead of hardcoded maximum of 32 thus maintaining backward compat. > >Some results dumping 1.5M actions below: >first an unpatched tc which doesnt understand these features... > >prompt$ time -p tc actions ls action gact | grep index | wc -l >150 >real 1388.43 >user 2.07 >sys 1386.79 > >Now lets see a patched tc which sets the correct flags when requesting >a dump: > >prompt$ time -p updatedtc actions ls action gact | grep index | wc -l >150 >real 178.13 >user 2.02 >sys 176.96 > >That is about 8x performance improvement for tc app which sets its >receive buffer to about 32K. > >Signed-off-by: Jamal Hadi Salim >--- > include/uapi/linux/rtnetlink.h | 22 +-- > net/sched/act_api.c| 50 +- > 2 files changed, 60 insertions(+), 12 deletions(-) > >diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h >index 8f07957..7d2030f 100644 >--- a/include/uapi/linux/rtnetlink.h >+++ b/include/uapi/linux/rtnetlink.h >@@ -693,10 +693,28 @@ struct tcamsg { > unsigned char tca__pad1; > unsigned short tca__pad2; > }; >+ >+enum { >+ TCA_ROOT_UNSPEC, >+ TCA_ROOT_TAB, >+#define TCA_ACT_TAB TCA_ROOT_TAB >+#define TCAA_MAX TCA_ROOT_TAB >+ TCA_ROOT_FLAGS, >+ TCA_ROOT_COUNT, >+ __TCA_ROOT_MAX, >+#define TCA_ROOT_MAX (__TCA_ROOT_MAX - 1) >+}; >+ > #define TA_RTA(r) ((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct > tcamsg > #define TA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct tcamsg)) >-#define TCA_ACT_TAB 1 /* attr type must be >=1 */ >-#define TCAA_MAX 1 >+/* tcamsg flags stored in attribute TCA_ROOT_FLAGS >+ * >+ * TCA_FLAG_LARGE_DUMP_ON user->kernel to request for larger than >TCA_ACT_MAX_PRIO >+ * actions in a dump. All dump responses will contain the number of actions >+ * being dumped stored in for user app's consumption in TCA_ROOT_COUNT >+ * >+ */ >+#define TCA_FLAG_LARGE_DUMP_ON(1 << 0) > > /* New extended info filters for IFLA_EXT_MASK */ > #define RTEXT_FILTER_VF (1 << 0) >diff --git a/net/sched/act_api.c b/net/sched/act_api.c >index 400eb6e..935dc19 100644 >--- a/net/sched/act_api.c >+++ b/net/sched/act_api.c >@@ -110,6 +110,7 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, >struct sk_buff *skb, > struct netlink_callback *cb) > { > int err = 0, index = -1, i = 0, s_i = 0, n_i = 0; >+ u32 act_flags = cb->args[2]; > struct nlattr *nest; > > spin_lock_bh(>lock); >@@ -138,14 +139,18 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, >struct sk_buff *skb, > } > nla_nest_end(skb, nest); > n_i++; >- if (n_i >= TCA_ACT_MAX_PRIO) >+ if (!(act_flags & TCA_FLAG_LARGE_DUMP_ON) && >+ n_i >= TCA_ACT_MAX_PRIO) > goto done; > } > } > done: > spin_unlock_bh(>lock); >- if (n_i) >+ if (n_i) { > cb->args[0] += n_i; >+ if (act_flags & TCA_FLAG_LARGE_DUMP_ON) >+ cb->args[1] = n_i; >+ } > return n_i; > > nla_put_failure: >@@ -1068,11 +1073,17 @@ static int tcf_action_add(struct net *net, struct >nlattr *nla, > return tcf_add_notify(net, n, , portid); > } > >+static u32 allowed_flags = TCA_FLAG_LARGE_DUMP_ON; An empty line would be nice. Also, since this is outside a function, some context prefix would be nice: static u32 tcaa_root_flags_allowed = TCA_FLAG_LARGE_DUMP_ON; >+static const struct nla_policy tcaa_policy[TCA_ROOT_MAX + 1] = { >+ [TCA_ROOT_FLAGS] = { .type = NLA_FLAG_BITS, >+ .validation_data = _flags }, >+}; >+ > static int tc_ctl_action(struct sk_buff *skb, struct nlmsghdr *n, >struct netlink_ext_ack *extack) > { > struct net *net = sock_net(skb->sk); >- struct nlattr *tca[TCAA_MAX + 1]; >+ struct nlattr *tca[TCA_ROOT_MAX + 1]; > u32
[net 2/6] net/mlx5: Continue health polling until it is explicitly stopped
From: Mohamad Haj YahiaThe issue is that when we get an assert we will stop polling the health and thus we cant enter error state when we have a real health issue. Fixes: fd76ee4da55a ('net/mlx5_core: Fix internal error detection conditions') Signed-off-by: Mohamad Haj Yahia Reviewed-by: Daniel Jurgens Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/health.c | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c index 44f59b1d6f0f..f27f84ffbc85 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/health.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c @@ -275,10 +275,8 @@ static void poll_health(unsigned long data) struct mlx5_core_health *health = >priv.health; u32 count; - if (dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR) { - mod_timer(>timer, get_next_poll_jiffies()); - return; - } + if (dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR) + goto out; count = ioread32be(health->health_counter); if (count == health->prev) @@ -290,8 +288,6 @@ static void poll_health(unsigned long data) if (health->miss_counter == MAX_MISSES) { dev_err(>pdev->dev, "device's health compromised - reached miss count\n"); print_health_info(dev); - } else { - mod_timer(>timer, get_next_poll_jiffies()); } if (in_fatal(dev) && !health->sick) { @@ -305,6 +301,9 @@ static void poll_health(unsigned long data) "new health works are not permitted at this stage\n"); spin_unlock(>wq_lock); } + +out: + mod_timer(>timer, get_next_poll_jiffies()); } void mlx5_start_health_poll(struct mlx5_core_dev *dev) -- 2.11.0
[net 1/6] net/mlx5: Fix create vport flow table flow
From: Mohamad Haj YahiaSend vport number to the create flow table inner method instead of ignoring the vport argument and sending always 0. Fixes: b3ba51498bdd ('net/mlx5: Refactor create flow table method to accept underlay QP') Signed-off-by: Mohamad Haj Yahia Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c index 0e487e8ca634..8f5125ccd8d4 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c @@ -862,7 +862,7 @@ struct mlx5_flow_table *mlx5_create_vport_flow_table(struct mlx5_flow_namespace ft_attr.level = level; ft_attr.prio= prio; - return __mlx5_create_flow_table(ns, _attr, FS_FT_OP_MOD_NORMAL, 0); + return __mlx5_create_flow_table(ns, _attr, FS_FT_OP_MOD_NORMAL, vport); } struct mlx5_flow_table* -- 2.11.0
[pull request][net 0/6] Mellanox mlx5 fixes 2017-06-11
Hi Dave, This series contains some fixes for the mlx5 core and netdev driver. Please pull and let me know if there's any problem. For -stable: ("net/mlx5e: Added BW check for DIM decision mechanism") kernels >= 4.9 ("net/mlx5e: Fix wrong indications in DIM due to counter wraparound") kernels >= 4.9 ("net/mlx5: Remove several module events out of ethtool stats") kernels >= 4.10 ("net/mlx5: Enable 4K UAR only when page size is bigger than 4K") kernels >= 4.11 *all patches apply with no issue on their -stable. BTW I never asked you if this way of marking patches to -stable is good for you ? if you prefer another format please let me know. Thanks, Saeed. --- The following changes since commit b87fa0fafef4b16495740432f4eb8262efa500d0: Merge branch 'mvpp2-fixes' (2017-06-10 18:22:56 -0400) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-fixes-2017-06-11 for you to fetch changes up to 91828bd89940e8145f91751a015bc11bc486aad0: net/mlx5: Enable 4K UAR only when page size is bigger than 4K (2017-06-11 13:10:36 +0300) mlx5-fixes-2017-06-11 Huy Nguyen (1): net/mlx5: Remove several module events out of ethtool stats Majd Dibbiny (1): net/mlx5: Enable 4K UAR only when page size is bigger than 4K Mohamad Haj Yahia (2): net/mlx5: Fix create vport flow table flow net/mlx5: Continue health polling until it is explicitly stopped Tal Gilboa (2): net/mlx5e: Added BW check for DIM decision mechanism net/mlx5e: Fix wrong indications in DIM due to counter wraparound drivers/net/ethernet/mellanox/mlx5/core/en.h | 8 ++-- drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 45 +- drivers/net/ethernet/mellanox/mlx5/core/en_stats.h | 11 +- drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/health.c | 11 +++--- drivers/net/ethernet/mellanox/mlx5/core/main.c | 6 ++- 6 files changed, 43 insertions(+), 40 deletions(-)
[net 4/6] net/mlx5e: Added BW check for DIM decision mechanism
From: Tal GilboaDIM (Dynamically-tuned Interrupt Moderation) is a mechanism designed for changing the channel interrupt moderation values in order to reduce CPU overhead for all traffic types. Until now only interrupt and packet rate were sampled. We found a scenario on which we get a false indication since a change in DIM caused more aggregation and reduced packet rate while increasing BW. We now regard a change as succesfull iff: current_BW > (prev_BW + threshold) or current_BW ~= prev_BW and current_PR > (prev_PR + threshold) or current_BW ~= prev_BW and current_PR ~= prev_PR and current_IR < (prev_IR - threshold) Where BW = Bandwidth, PR = Packet rate and IR = Interrupt rate Improvements (ConnectX-4Lx 25GbE, single RX queue, LRO off) -- packet size | before[Mb/s] | after[Mb/s] | gain | 2B | 343.4| 359.4 | 4.5% | 16B | 2739.7 | 2814.8 | 2.7% | 64B | 9739 | 10185.3 | 4.5% | Fixes: cb3c7fd4f839 ("net/mlx5e: Support adaptive RX coalescing") Signed-off-by: Tal Gilboa Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 2 ++ drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 37 -- 2 files changed, 22 insertions(+), 17 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 2fd044b23875..429da21a583d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -458,12 +458,14 @@ struct mlx5e_mpw_info { struct mlx5e_rx_am_stats { int ppms; /* packets per msec */ + int bpms; /* bytes per msec */ int epms; /* events per msec */ }; struct mlx5e_rx_am_sample { ktime_t time; unsigned intpkt_ctr; + unsigned intbyte_ctr; u16 event_ctr; }; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c index 02dd3a95ed8f..54cdda957f9a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c @@ -183,28 +183,27 @@ static void mlx5e_am_exit_parking(struct mlx5e_rx_am *am) mlx5e_am_step(am); } +#define IS_SIGNIFICANT_DIFF(val, ref) \ + (((100 * abs((val) - (ref))) / (ref)) > 10) /* more than 10% difference */ + static int mlx5e_am_stats_compare(struct mlx5e_rx_am_stats *curr, struct mlx5e_rx_am_stats *prev) { - int diff; - - if (!prev->ppms) - return curr->ppms ? MLX5E_AM_STATS_BETTER : + if (!prev->bpms) + return curr->bpms ? MLX5E_AM_STATS_BETTER : MLX5E_AM_STATS_SAME; - diff = curr->ppms - prev->ppms; - if (((100 * abs(diff)) / prev->ppms) > 10) /* more than 10% diff */ - return (diff > 0) ? MLX5E_AM_STATS_BETTER : - MLX5E_AM_STATS_WORSE; + if (IS_SIGNIFICANT_DIFF(curr->bpms, prev->bpms)) + return (curr->bpms > prev->bpms) ? MLX5E_AM_STATS_BETTER : + MLX5E_AM_STATS_WORSE; - if (!prev->epms) - return curr->epms ? MLX5E_AM_STATS_WORSE : - MLX5E_AM_STATS_SAME; + if (IS_SIGNIFICANT_DIFF(curr->ppms, prev->ppms)) + return (curr->ppms > prev->ppms) ? MLX5E_AM_STATS_BETTER : + MLX5E_AM_STATS_WORSE; - diff = curr->epms - prev->epms; - if (((100 * abs(diff)) / prev->epms) > 10) /* more than 10% diff */ - return (diff < 0) ? MLX5E_AM_STATS_BETTER : - MLX5E_AM_STATS_WORSE; + if (IS_SIGNIFICANT_DIFF(curr->epms, prev->epms)) + return (curr->epms < prev->epms) ? MLX5E_AM_STATS_BETTER : + MLX5E_AM_STATS_WORSE; return MLX5E_AM_STATS_SAME; } @@ -266,6 +265,7 @@ static void mlx5e_am_sample(struct mlx5e_rq *rq, { s->time = ktime_get(); s->pkt_ctr = rq->stats.packets; + s->byte_ctr = rq->stats.bytes; s->event_ctr = rq->cq.event_ctr; } @@ -278,12 +278,15 @@ static void mlx5e_am_calc_stats(struct mlx5e_rx_am_sample *start, /* u32 holds up to 71 minutes, should be enough */ u32 delta_us = ktime_us_delta(end->time, start->time); unsigned int npkts = end->pkt_ctr - start->pkt_ctr; + unsigned int nbytes = end->byte_ctr - start->byte_ctr; if (!delta_us) return; - curr_stats->ppms =(npkts * USEC_PER_MSEC) / delta_us; - curr_stats->epms = (MLX5E_AM_NEVENTS * USEC_PER_MSEC) / delta_us; + curr_stats->ppms =
[net 3/6] net/mlx5: Remove several module events out of ethtool stats
From: Huy NguyenRemove the following module event counters out of ethtool stats. The reason for removing these event counters is that these events do not occur without techinician's intervention. module_pwr_budget_exd module_long_range module_no_eeprom module_enforce_part module_unknown_id module_unknown_status module_plug Fixes: bedb7c909c19 ("net/mlx5e: Add port module event counters to ethtool stats") Signed-off-by: Huy Nguyen Reviewed by: Gal Pressman Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en_stats.h | 11 ++- 1 file changed, 2 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h index 53e4992d6511..f81c3aa60b46 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h @@ -417,20 +417,13 @@ struct mlx5e_stats { }; static const struct counter_desc mlx5e_pme_status_desc[] = { - { "module_plug", 0 }, { "module_unplug", 8 }, }; static const struct counter_desc mlx5e_pme_error_desc[] = { - { "module_pwr_budget_exd", 0 }, /* power budget exceed */ - { "module_long_range", 8 }, /* long range for non MLNX cable */ - { "module_bus_stuck", 16 }, /* bus stuck (I2C or data shorted) */ - { "module_no_eeprom", 24 }, /* no eeprom/retry time out */ - { "module_enforce_part", 32 }, /* enforce part number list */ - { "module_unknown_id", 40 }, /* unknown identifier */ - { "module_high_temp", 48 }, /* high temperature */ + { "module_bus_stuck", 16 }, /* bus stuck (I2C or data shorted) */ + { "module_high_temp", 48 }, /* high temperature */ { "module_bad_shorted", 56 },/* bad or shorted cable/module */ - { "module_unknown_status", 64 }, }; #endif /* __MLX5_EN_STATS_H__ */ -- 2.11.0
[net 6/6] net/mlx5: Enable 4K UAR only when page size is bigger than 4K
From: Majd DibbinyWhen the page size isn't bigger than 4K, there is no added value of enabling 4K UAR feature in the Firmware. Modified the condition of enabling the 4K UAR accordingly. Fixes: f502d834950a ("net/mlx5: Activate support for 4K UARs") Signed-off-by: Majd Dibbiny Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/main.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index af945edfee19..4f577a5abf88 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -537,8 +537,10 @@ static int handle_hca_cap(struct mlx5_core_dev *dev) /* disable cmdif checksum */ MLX5_SET(cmd_hca_cap, set_hca_cap, cmdif_checksum, 0); - /* If the HCA supports 4K UARs use it */ - if (MLX5_CAP_GEN_MAX(dev, uar_4k)) + /* Enable 4K UAR only when HCA supports it and page size is bigger +* than 4K. +*/ + if (MLX5_CAP_GEN_MAX(dev, uar_4k) && PAGE_SIZE > 4096) MLX5_SET(cmd_hca_cap, set_hca_cap, uar_4k, 1); MLX5_SET(cmd_hca_cap, set_hca_cap, log_uar_page_sz, PAGE_SHIFT - 12); -- 2.11.0
[net 5/6] net/mlx5e: Fix wrong indications in DIM due to counter wraparound
From: Tal GilboaDIM (Dynamically-tuned Interrupt Moderation) is a mechanism designed for changing the channel interrupt moderation values in order to reduce CPU overhead for all traffic types. Each iteration of the algorithm, DIM calculates the difference in throughput, packet rate and interrupt rate from last iteration in order to make a decision. DIM relies on counters for each metric. When these counters get to their type's max value they wraparound. In this case the delta between 'end' and 'start' samples is negative and when translated to unsigned integers - very high. This results in a false indication to the algorithm and might result in a wrong decision. The fix calculates the 'distance' between 'end' and 'start' samples in a cyclic way around the relevant type's max value. It can also be viewed as an absolute value around the type's max value instead of around 0. Testing show higher stability in DIM profile selection and no wraparound issues. Fixes: cb3c7fd4f839 ("net/mlx5e: Support adaptive RX coalescing") Signed-off-by: Tal Gilboa Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 8 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 10 +++--- 2 files changed, 11 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 429da21a583d..944fc1742464 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -463,10 +463,10 @@ struct mlx5e_rx_am_stats { }; struct mlx5e_rx_am_sample { - ktime_t time; - unsigned intpkt_ctr; - unsigned intbyte_ctr; - u16 event_ctr; + ktime_t time; + u32 pkt_ctr; + u32 byte_ctr; + u16 event_ctr; }; struct mlx5e_rx_am { /* Adaptive Moderation */ diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c index 54cdda957f9a..acf32fe952cd 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c @@ -270,6 +270,8 @@ static void mlx5e_am_sample(struct mlx5e_rq *rq, } #define MLX5E_AM_NEVENTS 64 +#define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE) +#define BIT_GAP(bits, end, start) end) - (start)) + BIT_ULL(bits)) & (BIT_ULL(bits) - 1)) static void mlx5e_am_calc_stats(struct mlx5e_rx_am_sample *start, struct mlx5e_rx_am_sample *end, @@ -277,8 +279,9 @@ static void mlx5e_am_calc_stats(struct mlx5e_rx_am_sample *start, { /* u32 holds up to 71 minutes, should be enough */ u32 delta_us = ktime_us_delta(end->time, start->time); - unsigned int npkts = end->pkt_ctr - start->pkt_ctr; - unsigned int nbytes = end->byte_ctr - start->byte_ctr; + u32 npkts = BIT_GAP(BITS_PER_TYPE(u32), end->pkt_ctr, start->pkt_ctr); + u32 nbytes = BIT_GAP(BITS_PER_TYPE(u32), end->byte_ctr, +start->byte_ctr); if (!delta_us) return; @@ -311,7 +314,8 @@ void mlx5e_rx_am(struct mlx5e_rq *rq) switch (am->state) { case MLX5E_AM_MEASURE_IN_PROGRESS: - nevents = rq->cq.event_ctr - am->start_sample.event_ctr; + nevents = BIT_GAP(BITS_PER_TYPE(u16), rq->cq.event_ctr, + am->start_sample.event_ctr); if (nevents < MLX5E_AM_NEVENTS) break; mlx5e_am_sample(rq, _sample); -- 2.11.0
Re: [PATCH net-next v10 2/4] net sched actions: Use proper root attribute table for actions
Sun, Jun 11, 2017 at 01:53:44PM CEST, j...@mojatatu.com wrote: >From: Jamal Hadi Salim> >Bug fix for an issue which has been around for about a decade. >We got away with it because the enumeration was larger than needed. > >Fixes: 7ba699c604ab ("[NET_SCHED]: Convert actions from rtnetlink to new >netlink API") >Suggested-by: Jiri Pirko >Reviewed-by: Simon Horman >Signed-off-by: Jamal Hadi Salim Reviewed-by: Jiri Pirko
Re: [PATCH v2] brcmfmac: Fix glom_skb leak in brcmf_sdiod_recv_chain
On Sat, Jun 10, 2017 at 10:27 PM, Arend van Sprielwrote: > On 03-06-17 17:36, Andy Shevchenko wrote: >> On Sat, Jun 3, 2017 at 1:29 AM, Peter S. Housel wrote: The following looks good to me. Feel free to add Reviewed-by: Andy Shevchenko > --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c > +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c > @@ -705,7 +705,7 @@ int brcmf_sdiod_recv_pkt(struct brcmf_sdio_dev > *sdiodev, struct sk_buff *pkt) > int brcmf_sdiod_recv_chain(struct brcmf_sdio_dev *sdiodev, >struct sk_buff_head *pktq, uint totlen) > { > - struct sk_buff *glom_skb; > + struct sk_buff *glom_skb = NULL; > struct sk_buff *skb; > u32 addr = sdiodev->sbwad; > int err = 0; > @@ -726,10 +726,8 @@ int brcmf_sdiod_recv_chain(struct brcmf_sdio_dev > *sdiodev, > return -ENOMEM; > err = brcmf_sdiod_buffrw(sdiodev, SDIO_FUNC_2, false, addr, > glom_skb); > - if (err) { > - brcmu_pkt_buf_free_skb(glom_skb); > + if (err) > goto done; > - } > > skb_queue_walk(pktq, skb) { > memcpy(skb->data, glom_skb->data, skb->len); > @@ -740,6 +738,7 @@ int brcmf_sdiod_recv_chain(struct brcmf_sdio_dev > *sdiodev, > pktq); > > done: > + brcmu_pkt_buf_free_skb(glom_skb); > return err; > } > -- With Best Regards, Andy Shevchenko
Re: [PATCH net-next v10 1/4] net netlink: Add new type NLA_FLAG_BITS
Sun, Jun 11, 2017 at 01:53:43PM CEST, j...@mojatatu.com wrote: >From: Jamal Hadi Salim> >Generic bitflags attribute content sent to the kernel by user. >With this type the user can either set or unset a flag in the >kernel. > >The nla_flag_values is a bitmap that defines the values being set >The nla_flag_selector is a bitmask that defines which value is legit. > >A check is made to ensure the rules that a kernel subsystem always >conforms to bitflags the kernel already knows about. i.e >if the user tries to set a bit flag that is not understood then >the _it will be rejected_. > >In the most basic form, the user specifies the attribute policy as: >[ATTR_GOO] = { .type = NLA_FLAG_BITS, .validation_data = }, > >where myvalidflags is the bit mask of the flags the kernel understands. > >If the user _does not_ provide myvalidflags then the attribute will >also be rejected. > >Examples: >nla_flag_values = 0x0, and nla_flag_selector = 0x1 >implies we are selecting bit 1 and we want to set its value to 0. > >nla_flag_values = 0x2, and nla_flag_selector = 0x2 >implies we are selecting bit 2 and we want to set its value to 1. > >This patch also provides an extra feature: a validation callback >that could be speaciliazed for other types. s/speaciliazed/speciliazed/ >This feature is intended to be used by a kernel subsystem to check >for a combination of bits being present. Example "bit x is valid >only if bit y and z are present". > >So a kernel subsystem could specify validation rules of the following >nature: > >[ATTR_GOO] = { .type = MYTYPE, > .validation_data = _data, > .validate_content = mycontent_validator }, Indent is wrong. (Does not matter really in desc, but anyway) > >With validator callback looking like: > >int mycontent_validator(const struct nlattr *nla, void *valid_data) >{ > const struct myattribute *user_data = nla_data(nla); > struct myvalidation_struct *valid_data_constraint = valid_data; > > ... return appropriate error code etc ... >} > > >Signed-off-by: Jamal Hadi Salim Suggested-by: Jiri Pirko >--- > include/net/netlink.h | 11 +++ > include/uapi/linux/rtnetlink.h | 17 + > lib/nlattr.c | 25 + > 3 files changed, 53 insertions(+) > >diff --git a/include/net/netlink.h b/include/net/netlink.h >index 0170917..8ab9784 100644 >--- a/include/net/netlink.h >+++ b/include/net/netlink.h >@@ -6,6 +6,11 @@ > #include > #include > >+struct nla_bit_flags { >+ u32 nla_flag_values; >+ u32 nla_flag_selector; >+}; I don't understand why you redefine the struct here. You already have it defined in the uapi: struct __nla_bit_flags Just move this (struct nla_bit_flags) to the uapi and remove __nla_bit_flags ? >+ > /* > * Netlink Messages and Attributes Interface (As Seen On TV) > * >@@ -178,6 +183,7 @@ enum { > NLA_S16, > NLA_S32, > NLA_S64, >+ NLA_FLAG_BITS, > __NLA_TYPE_MAX, > }; > >@@ -206,6 +212,7 @@ enum { > *NLA_MSECSLeaving the length field zero will verify the > * given type fits, using it verifies minimum length > * just like "All other" >+ *NLA_FLAG_BITSA bitmap/bitselector attribute > *All otherMinimum length of attribute payload > * > * Example: >@@ -213,11 +220,15 @@ enum { > *[ATTR_FOO] = { .type = NLA_U16 }, > *[ATTR_BAR] = { .type = NLA_STRING, .len = BARSIZ }, > *[ATTR_BAZ] = { .len = sizeof(struct mystruct) }, >+ *[ATTR_GOO] = { .type = NLA_FLAG_BITS, .validation_data = >}, > * }; > */ > struct nla_policy { > u16 type; > u16 len; >+ void*validation_data; >+ int (*validate_content)(const struct nlattr *nla, >+ const void *validation_data); > }; > > /** >diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h >index 564790e..8f07957 100644 >--- a/include/uapi/linux/rtnetlink.h >+++ b/include/uapi/linux/rtnetlink.h >@@ -179,6 +179,23 @@ struct rtattr { > #define RTA_DATA(rta) ((void*)(((char*)(rta)) + RTA_LENGTH(0))) > #define RTA_PAYLOAD(rta) ((int)((rta)->rta_len) - RTA_LENGTH(0)) > >+/* Generic bitflags attribute content sent to the kernel. >+ * >+ * The nla_flag_values is a bitmap that defines the values being set >+ * The nla_flag_selector is a bitmask that defines which value is legit >+ * >+ * Examples: >+ * nla_flag_values = 0x0, and nla_flag_selector = 0x1 >+ * implies we are selecting bit 1 and we want to set its value to 0. >+ * >+ * nla_flag_values = 0x2, and nla_flag_selector = 0x2 >+ * implies we are selecting bit 2 and we want to set its
Re: ARM GLX Khadas VIM Pro - Ethernet detected as only 10Mbps and stalled after some traffic
Hi Andrew On Sat, Jun 10, 2017 at 5:27 PM, Andrew Lunnwrote: >> Also what Martin Blumenstingl wrote is following which is also crucial >> for fixing the issue: >> Amlogic has given their ethernet PHY driver some updates [2], it now >> includes wake-on-lan, and they now have an internal_phy_read_status >> which uses reset_internal_phy if there's a link and some error counter >> exceeds some magic value. > > Hi Crow > > You could probably just drop the Amlogic driver into mainline and see > if it works better. If that solves your problem, we can look at > merging the changes. > > Andrew Please ignore my previus email as I used wrong kernel (not the patched one, see the modinfo). Thank your for the suggestion, and thanks Martin to explaining me over IRC what actually I should do. I recompiled mainline kernel 4.12-rc4 with the Amlogic driver: replaced drivers/net/phy/meson-gxl.c with https://github.com/khadas/linux/blob/ubuntu-4.9/drivers/amlogic/ethernet/phy/amlogic.c But this did not solve the issue. As soon as i start git clone i lose network connection to device (no session timeout/disconnect this time, but I am unable to reconnect over SSH or to get OK ping replay back). Linux khadasvimpro 4.12.0-rc4-4-ARCH #1 SMP Sun Jun 11 14:33:40 CEST 2017 aarch64 GNU/Linux # modinfo meson_gxl filename: /lib/modules/4.12.0-rc4-4-ARCH/kernel/drivers/net/phy/meson-gxl.ko.gz license:GPL author: Yizhou Jiang description:amlogic internal ethernet phy driver alias: mdio:0001100101000100 depends: intree: Y vermagic: 4.12.0-rc4-4-ARCH SMP mod_unload aarch64 # # mii-tool -vvv eth0 Using SIOCGMIIPHY=0x8947 eth0: negotiated 1000baseT-HD flow-control, link ok registers for MII PHY 8: 1000 782d 0181 4400 01e1 c1e1 000f 2001 0040 0082 40e8 5400 0d80 1000 a900 fff0 130a 1407 00ca 105a product info: vendor 00:60:51, model 0 rev 0 basic mode: autonegotiation enabled basic status: autonegotiation complete, link ok capabilities: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD advertising: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD link partner: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD # over SSH startet following but it stall already at 1%: $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git Cloning into 'linux-stable'... remote: Counting objects: 5948690, done. remote: Compressing objects: 100% (124799/124799), done. Receiving objects: 1% (110361/5948690), 48.02 MiB | 6.54 MiB/s journalctl shows timeout while trying to sync with NTP server: systemd-timesyncd[292]: Timed out waiting for reply from 91.206.8.36:123 (2.at.pool.ntp.org). systemd-timesyncd[292]: Timed out waiting for reply from 131.130.251.107:123 (2.at.pool.ntp.org). ping to this device from other client: 100% packet loss dumping register after stall state # mii-tool -vvv eth0 Using SIOCGMIIPHY=0x8947 eth0: negotiated 1000baseT-HD flow-control, link ok registers for MII PHY 8: 1000 782d 0181 4400 01e1 c1e1 000d 2001 0040 0082 40e8 5400 0d80 1000 a900 fff0 130a 1407 105a product info: vendor 00:60:51, model 0 rev 0 basic mode: autonegotiation enabled basic status: autonegotiation complete, link ok capabilities: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD advertising: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD link partner: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD # after taking eth0 down and up, I am able to login to device over SSH again, and ping from other devices in network to this device are all OK. # ifconfig eth0 down && ifconfig eth0 up # dmesg -T | tail -n 10 [Sun Jun 11 15:03:02 2017] internal phy init [Sun Jun 11 15:03:02 2017] internal phy init [Sun Jun 11 15:03:02 2017] amlogic internal phy 0.e40908ff:08: attached PHY driver [amlogic internal phy] (mii_bus:phy_addr=0.e40908ff:08, irq=-1) [Sun Jun 11 15:03:02 2017] meson8b-dwmac c941.ethernet eth0: PTP not supported by HW [Sun Jun 11 15:03:03 2017] wol_reg12[12]==0, error [Sun Jun 11 15:03:04 2017] meson8b-dwmac c941.ethernet eth0: Link is Up - 100Mbps/Full - flow control off [Sun Jun 11 15:10:38 2017] internal phy init [Sun Jun 11 15:10:38 2017] internal phy init [Sun Jun 11 15:10:38 2017] amlogic internal phy 0.e40908ff:08: attached PHY driver [amlogic internal phy] (mii_bus:phy_addr=0.e40908ff:08, irq=-1) [Sun Jun 11 15:10:38 2017] meson8b-dwmac c941.ethernet eth0: PTP not supported by HW [Sun Jun 11 15:10:39 2017] wol_reg12[12]==0, error [Sun Jun 11 15:10:40 2017] meson8b-dwmac c941.ethernet eth0: Link is Up - 100Mbps/Full - flow control off then I took eth0 and wlan0 up (same ArchLinuxArm
Re: [PATCH net-next 9/9] net: hns3: Add HNS3 driver to kernel build framework & MAINTAINERS
Hi Salil, [auto build test ERROR on net-next/master] url: https://github.com/0day-ci/linux/commits/Salil-Mehta/Hisilicon-Network-Subsystem-3-Ethernet-Driver/20170611-204908 config: alpha-allyesconfig (attached as .config) compiler: alpha-linux-gnu-gcc (Debian 6.3.0-18) 6.3.0 20170516 reproduce: wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=alpha All errors (new ones prefixed by >>): drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c: In function 'hns3_nic_poll_controller': >> drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c:61:26: error: 'struct >> hnae3_knic_private_info' has no member named 'num_tqp_vectors'; did you mean >> 'num_tqps'? for (i = 0; i < h->kinfo.num_tqp_vectors; i++) ^ >> drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c:62:26: error: 'struct >> hnae3_knic_private_info' has no member named 'tqp_vectors' napi_schedule(>kinfo.tqp_vectors[i].napi); ^ drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c: At top level: >> drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c:1275:19: error: >> initialization from incompatible pointer type >> [-Werror=incompatible-pointer-types] .ndo_setup_tc = hns3_nic_setup_tc, ^ drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c:1275:19: note: (near initialization for 'hns3_nic_netdev_ops.ndo_setup_tc') cc1: some warnings being treated as errors vim +61 drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c b7d58d74 Salil Mehta 2017-06-10 55 struct hns3_nic_priv *priv = netdev_priv(ndev); b7d58d74 Salil Mehta 2017-06-10 56 struct hnae3_handle *h = priv->ae_handle; b7d58d74 Salil Mehta 2017-06-10 57 unsigned long flag; b7d58d74 Salil Mehta 2017-06-10 58 int i; b7d58d74 Salil Mehta 2017-06-10 59 b7d58d74 Salil Mehta 2017-06-10 60 local_irq_save(flag); b7d58d74 Salil Mehta 2017-06-10 @61 for (i = 0; i < h->kinfo.num_tqp_vectors; i++) b7d58d74 Salil Mehta 2017-06-10 @62 napi_schedule(>kinfo.tqp_vectors[i].napi); b7d58d74 Salil Mehta 2017-06-10 63 local_irq_restore(flag); b7d58d74 Salil Mehta 2017-06-10 64 } b7d58d74 Salil Mehta 2017-06-10 65 #endif :: The code at line 61 was first introduced by commit :: b7d58d744ff1224947a4cf8373ee9855e5a29af5 net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC :: TO: Salil Mehta <salil.me...@huawei.com> :: CC: 0day robot <fengguang...@intel.com> --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH net-next 9/9] net: hns3: Add HNS3 driver to kernel build framework & MAINTAINERS
Hi Salil, [auto build test WARNING on net-next/master] url: https://github.com/0day-ci/linux/commits/Salil-Mehta/Hisilicon-Network-Subsystem-3-Ethernet-Driver/20170611-204908 config: parisc-allyesconfig (attached as .config) compiler: hppa-linux-gnu-gcc (Debian 6.3.0-18) 6.3.0 20170516 reproduce: wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=parisc All warnings (new ones prefixed by >>): drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c: In function 'hclge_update_speed_duplex': >> drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c:2266:1: warning: the >> frame size of 1440 bytes is larger than 1024 bytes [-Wframe-larger-than=] } ^ vim +2266 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 14965015 Salil Mehta 2017-06-10 2250 if (ret) { 14965015 Salil Mehta 2017-06-10 2251 dev_err(>pdev->dev, 14965015 Salil Mehta 2017-06-10 2252 "mac autoneg/speed/duplex query failed %d\n", ret); 14965015 Salil Mehta 2017-06-10 2253 return ret; 14965015 Salil Mehta 2017-06-10 2254 } 14965015 Salil Mehta 2017-06-10 2255 14965015 Salil Mehta 2017-06-10 2256 if ((mac.speed != speed) || (mac.duplex != duplex)) { 14965015 Salil Mehta 2017-06-10 2257 ret = hclge_cfg_mac_speed_dup(hdev, speed, duplex); 14965015 Salil Mehta 2017-06-10 2258 if (ret) { 14965015 Salil Mehta 2017-06-10 2259 dev_err(>pdev->dev, 14965015 Salil Mehta 2017-06-10 2260 "mac speed/duplex config failed %d\n", ret); 14965015 Salil Mehta 2017-06-10 2261 return ret; 14965015 Salil Mehta 2017-06-10 2262 } 14965015 Salil Mehta 2017-06-10 2263 } 14965015 Salil Mehta 2017-06-10 2264 14965015 Salil Mehta 2017-06-10 2265 return 0; 14965015 Salil Mehta 2017-06-10 @2266 } 14965015 Salil Mehta 2017-06-10 2267 14965015 Salil Mehta 2017-06-10 2268 static int hclge_update_speed_duplex_h(struct hnae3_handle *handle) 14965015 Salil Mehta 2017-06-10 2269 { 14965015 Salil Mehta 2017-06-10 2270 struct hclge_vport *vport = hclge_get_vport(handle); 14965015 Salil Mehta 2017-06-10 2271 struct hclge_dev *hdev = vport->back; 14965015 Salil Mehta 2017-06-10 2272 14965015 Salil Mehta 2017-06-10 2273 return hclge_update_speed_duplex(hdev); 14965015 Salil Mehta 2017-06-10 2274 } :: The code at line 2266 was first introduced by commit :: 1496501507394557c091dfe9d27fe0124802fbee net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support :: TO: Salil Mehta <salil.me...@huawei.com> :: CC: 0day robot <fengguang...@intel.com> --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH 0/6] Constant Time Memory Comparisons Are Important
On Sun, Jun 11, 2017 at 1:13 AM, Kalle Valowrote: > "Jason A. Donenfeld" writes: > >> Whenever you're comparing two MACs, it's important to do this using >> crypto_memneq instead of memcmp. With memcmp, you leak timing information, >> which could then be used to iteratively forge a MAC. > > Do you have any pointers where I could learn more about this? While not using C specifically, this talks about the problem generally: https://www.chosenplaintext.ca/articles/beginners-guide-constant-time-cryptography.html -Kees -- Kees Cook Pixel Security
[PATCH net 9/9] net: ena: update ena driver to version 1.1.7
From: Netanel BelgazalSigned-off-by: Netanel Belgazal --- drivers/net/ethernet/amazon/ena/ena_netdev.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h index 88b5e5612338..a4d3d5e21068 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.h +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h @@ -45,7 +45,7 @@ #define DRV_MODULE_VER_MAJOR 1 #define DRV_MODULE_VER_MINOR 1 -#define DRV_MODULE_VER_SUBMINOR 2 +#define DRV_MODULE_VER_SUBMINOR 7 #define DRV_MODULE_NAME"ena" #ifndef DRV_MODULE_VERSION -- 2.7.4
[PATCH net 7/9] net: ena: disable admin msix while working in polling mode
From: Netanel BelgazalFixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Netanel Belgazal --- drivers/net/ethernet/amazon/ena/ena_com.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c index ea60b9e67acb..f5b237e0bd60 100644 --- a/drivers/net/ethernet/amazon/ena/ena_com.c +++ b/drivers/net/ethernet/amazon/ena/ena_com.c @@ -61,6 +61,8 @@ #define ENA_MMIO_READ_TIMEOUT 0x +#define ENA_REGS_ADMIN_INTR_MASK 1 + /*/ /*/ /*/ @@ -1454,6 +1456,12 @@ void ena_com_admin_destroy(struct ena_com_dev *ena_dev) void ena_com_set_admin_polling_mode(struct ena_com_dev *ena_dev, bool polling) { + u32 mask_value = 0; + + if (polling) + mask_value = ENA_REGS_ADMIN_INTR_MASK; + + writel(mask_value, ena_dev->reg_bar + ENA_REGS_INTR_MASK_OFF); ena_dev->admin_queue.polling = polling; } -- 2.7.4
[PATCH net 5/9] net: ena: add missing unmap bars on device removal
From: Netanel BelgazalThis patch also change the mapping functions to devm_ functions Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Netanel Belgazal --- drivers/net/ethernet/amazon/ena/ena_netdev.c | 15 +++ 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c index 1e71e89e1e18..4e9fbddd3b47 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c @@ -2853,6 +2853,11 @@ static void ena_release_bars(struct ena_com_dev *ena_dev, struct pci_dev *pdev) { int release_bars; + if (ena_dev->mem_bar) + devm_iounmap(>dev, ena_dev->mem_bar); + + devm_iounmap(>dev, ena_dev->reg_bar); + release_bars = pci_select_bars(pdev, IORESOURCE_MEM) & ENA_BAR_MASK; pci_release_selected_regions(pdev, release_bars); } @@ -2940,8 +2945,9 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent) goto err_free_ena_dev; } - ena_dev->reg_bar = ioremap(pci_resource_start(pdev, ENA_REG_BAR), - pci_resource_len(pdev, ENA_REG_BAR)); + ena_dev->reg_bar = devm_ioremap(>dev, + pci_resource_start(pdev, ENA_REG_BAR), + pci_resource_len(pdev, ENA_REG_BAR)); if (!ena_dev->reg_bar) { dev_err(>dev, "failed to remap regs bar\n"); rc = -EFAULT; @@ -2961,8 +2967,9 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent) ena_set_push_mode(pdev, ena_dev, _feat_ctx); if (ena_dev->tx_mem_queue_type == ENA_ADMIN_PLACEMENT_POLICY_DEV) { - ena_dev->mem_bar = ioremap_wc(pci_resource_start(pdev, ENA_MEM_BAR), - pci_resource_len(pdev, ENA_MEM_BAR)); + ena_dev->mem_bar = devm_ioremap_wc(>dev, + pci_resource_start(pdev, ENA_MEM_BAR), + pci_resource_len(pdev, ENA_MEM_BAR)); if (!ena_dev->mem_bar) { rc = -EFAULT; goto err_device_destroy; -- 2.7.4
[PATCH net 6/9] net: ena: fix theoretical Rx hang on low memory systems
From: Netanel BelgazalFor the rare case where the device runs out of free rx buffer descriptors (in case of pressure on kernel memory), and the napi handler continuously fail to refill new Rx descriptors until device rx queue totally runs out of all free rx buffers to post incoming packet, leading to a deadlock: * The device won't send interrupts since all the new Rx packets will be dropped. * The napi handler won't try to allocate new Rx descriptors since allocation is part of NAPI that's not being invoked any more The fix involves detecting this scenario and rescheduling NAPI (to refill buffers) by the keepalive/watchdog task. Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Netanel Belgazal --- drivers/net/ethernet/amazon/ena/ena_ethtool.c | 1 + drivers/net/ethernet/amazon/ena/ena_netdev.c | 55 +++ drivers/net/ethernet/amazon/ena/ena_netdev.h | 2 + 3 files changed, 58 insertions(+) diff --git a/drivers/net/ethernet/amazon/ena/ena_ethtool.c b/drivers/net/ethernet/amazon/ena/ena_ethtool.c index 67b2338f8fb3..533b2fbdeef1 100644 --- a/drivers/net/ethernet/amazon/ena/ena_ethtool.c +++ b/drivers/net/ethernet/amazon/ena/ena_ethtool.c @@ -94,6 +94,7 @@ static const struct ena_stats ena_stats_rx_strings[] = { ENA_STAT_RX_ENTRY(dma_mapping_err), ENA_STAT_RX_ENTRY(bad_desc_num), ENA_STAT_RX_ENTRY(rx_copybreak_pkt), + ENA_STAT_RX_ENTRY(empty_rx_ring), }; static const struct ena_stats ena_stats_ena_com_strings[] = { diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c index 4e9fbddd3b47..3c366bfbbab1 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c @@ -190,6 +190,7 @@ static void ena_init_io_rings(struct ena_adapter *adapter) rxr->sgl_size = adapter->max_rx_sgl_size; rxr->smoothed_interval = ena_com_get_nonadaptive_moderation_interval_rx(ena_dev); + rxr->empty_rx_queue = 0; } } @@ -2619,6 +2620,58 @@ static void check_for_missing_tx_completions(struct ena_adapter *adapter) adapter->last_monitored_tx_qid = i % adapter->num_queues; } +/* trigger napi schedule after 2 consecutive detections */ +#define EMPTY_RX_REFILL 2 +/* For the rare case where the device runs out of Rx descriptors and the + * napi handler failed to refill new Rx descriptors (due to a lack of memory + * for example). + * This case will lead to a deadlock: + * The device won't send interrupts since all the new Rx packets will be dropped + * The napi handler won't allocate new Rx descriptors so the device will be + * able to send new packets. + * + * This scenario can happen when the kernel's vm.min_free_kbytes is too small. + * It is recommended to have at least 512MB, with a minimum of 128MB for + * constrained environment). + * + * When such a situation is detected - Reschedule napi + */ +static void check_for_empty_rx_ring(struct ena_adapter *adapter) +{ + struct ena_ring *rx_ring; + int i, refill_required; + + if (!test_bit(ENA_FLAG_DEV_UP, >flags)) + return; + + if (test_bit(ENA_FLAG_TRIGGER_RESET, >flags)) + return; + + for (i = 0; i < adapter->num_queues; i++) { + rx_ring = >rx_ring[i]; + + refill_required = + ena_com_sq_empty_space(rx_ring->ena_com_io_sq); + if (unlikely(refill_required == (rx_ring->ring_size - 1))) { + rx_ring->empty_rx_queue++; + + if (rx_ring->empty_rx_queue >= EMPTY_RX_REFILL) { + u64_stats_update_begin(_ring->syncp); + rx_ring->rx_stats.empty_rx_ring++; + u64_stats_update_end(_ring->syncp); + + netif_err(adapter, drv, adapter->netdev, + "trigger refill for ring %d\n", i); + + napi_schedule(rx_ring->napi); + rx_ring->empty_rx_queue = 0; + } + } else { + rx_ring->empty_rx_queue = 0; + } + } +} + /* Check for keep alive expiration */ static void check_for_missing_keep_alive(struct ena_adapter *adapter) { @@ -2673,6 +2726,8 @@ static void ena_timer_service(unsigned long data) check_for_missing_tx_completions(adapter); + check_for_empty_rx_ring(adapter); + if (debug_area) ena_dump_stats_to_buf(adapter, debug_area); diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h index 0e22bce6239d..8828f1d6dd22 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.h +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h @@
[PATCH net 8/9] net: ena: bug fix in lost tx packets detection mechanism
From: Netanel Belgazalcheck_for_missing_tx_completions() is called from a timer task and looking for lost tx packets. The old implementation accumulate all the lost tx packets and did not check if those packets were retrieved on a later stage. This cause to a situation where the driver reset the device for no reason. Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Netanel Belgazal --- drivers/net/ethernet/amazon/ena/ena_ethtool.c | 1 - drivers/net/ethernet/amazon/ena/ena_netdev.c | 66 +++ drivers/net/ethernet/amazon/ena/ena_netdev.h | 14 +- 3 files changed, 50 insertions(+), 31 deletions(-) diff --git a/drivers/net/ethernet/amazon/ena/ena_ethtool.c b/drivers/net/ethernet/amazon/ena/ena_ethtool.c index 533b2fbdeef1..3ee55e2fd694 100644 --- a/drivers/net/ethernet/amazon/ena/ena_ethtool.c +++ b/drivers/net/ethernet/amazon/ena/ena_ethtool.c @@ -80,7 +80,6 @@ static const struct ena_stats ena_stats_tx_strings[] = { ENA_STAT_TX_ENTRY(tx_poll), ENA_STAT_TX_ENTRY(doorbells), ENA_STAT_TX_ENTRY(prepare_ctx_err), - ENA_STAT_TX_ENTRY(missing_tx_comp), ENA_STAT_TX_ENTRY(bad_req_id), }; diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c index 3c366bfbbab1..4f16ed38bcf3 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c @@ -1995,6 +1995,7 @@ static netdev_tx_t ena_start_xmit(struct sk_buff *skb, struct net_device *dev) tx_info->tx_descs = nb_hw_desc; tx_info->last_jiffies = jiffies; + tx_info->print_once = 0; tx_ring->next_to_use = ENA_TX_RING_IDX_NEXT(next_to_use, tx_ring->ring_size); @@ -2564,13 +2565,44 @@ static void ena_fw_reset_device(struct work_struct *work) "Reset attempt failed. Can not reset the device\n"); } -static void check_for_missing_tx_completions(struct ena_adapter *adapter) +static int check_missing_comp_in_queue(struct ena_adapter *adapter, + struct ena_ring *tx_ring) { struct ena_tx_buffer *tx_buf; unsigned long last_jiffies; + u32 missed_tx = 0; + int i; + + for (i = 0; i < tx_ring->ring_size; i++) { + tx_buf = _ring->tx_buffer_info[i]; + last_jiffies = tx_buf->last_jiffies; + if (unlikely(last_jiffies && +time_is_before_jiffies(last_jiffies + TX_TIMEOUT))) { + if (!tx_buf->print_once) + netif_notice(adapter, tx_err, adapter->netdev, +"Found a Tx that wasn't completed on time, qid %d, index %d.\n", +tx_ring->qid, i); + + tx_buf->print_once = 1; + missed_tx++; + + if (unlikely(missed_tx > MAX_NUM_OF_TIMEOUTED_PACKETS)) { + netif_err(adapter, tx_err, adapter->netdev, + "The number of lost tx completions is above the threshold (%d > %d). Reset the device\n", + missed_tx, MAX_NUM_OF_TIMEOUTED_PACKETS); + set_bit(ENA_FLAG_TRIGGER_RESET, >flags); + return -EIO; + } + } + } + + return 0; +} + +static void check_for_missing_tx_completions(struct ena_adapter *adapter) +{ struct ena_ring *tx_ring; - int i, j, budget; - u32 missed_tx; + int i, budget, rc; /* Make sure the driver doesn't turn the device in other process */ smp_rmb(); @@ -2586,31 +2618,9 @@ static void check_for_missing_tx_completions(struct ena_adapter *adapter) for (i = adapter->last_monitored_tx_qid; i < adapter->num_queues; i++) { tx_ring = >tx_ring[i]; - for (j = 0; j < tx_ring->ring_size; j++) { - tx_buf = _ring->tx_buffer_info[j]; - last_jiffies = tx_buf->last_jiffies; - if (unlikely(last_jiffies && time_is_before_jiffies(last_jiffies + TX_TIMEOUT))) { - netif_notice(adapter, tx_err, adapter->netdev, -"Found a Tx that wasn't completed on time, qid %d, index %d.\n", -tx_ring->qid, j); - - u64_stats_update_begin(_ring->syncp); - missed_tx = tx_ring->tx_stats.missing_tx_comp++; - u64_stats_update_end(_ring->syncp); - - /* Clear last jiffies so the lost buffer won't -* be counted twice. -
[PATCH net 2/9] net: ena: fix bug that might cause hang after consecutive open/close interface.
From: Netanel BelgazalFixing a bug that the driver does not unmask the IO interrupts in ndo_open(): occasionally, the MSI-X interrupt (for one or more IO queues) can be masked when ndo_close() was called. If that is followed by ndo open(), then the MSI-X will be still masked so no interrupt will be received by the driver. Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Netanel Belgazal --- drivers/net/ethernet/amazon/ena/ena_netdev.c | 41 ++-- 1 file changed, 26 insertions(+), 15 deletions(-) diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c index 7c1214d78855..0e3c60c7eccf 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c @@ -1078,6 +1078,26 @@ inline void ena_adjust_intr_moderation(struct ena_ring *rx_ring, rx_ring->per_napi_bytes = 0; } +static inline void ena_unmask_interrupt(struct ena_ring *tx_ring, + struct ena_ring *rx_ring) +{ + struct ena_eth_io_intr_reg intr_reg; + + /* Update intr register: rx intr delay, +* tx intr delay and interrupt unmask +*/ + ena_com_update_intr_reg(_reg, + rx_ring->smoothed_interval, + tx_ring->smoothed_interval, + true); + + /* It is a shared MSI-X. +* Tx and Rx CQ have pointer to it. +* So we use one of them to reach the intr reg +*/ + ena_com_unmask_intr(rx_ring->ena_com_io_cq, _reg); +} + static inline void ena_update_ring_numa_node(struct ena_ring *tx_ring, struct ena_ring *rx_ring) { @@ -1108,7 +1128,6 @@ static int ena_io_poll(struct napi_struct *napi, int budget) { struct ena_napi *ena_napi = container_of(napi, struct ena_napi, napi); struct ena_ring *tx_ring, *rx_ring; - struct ena_eth_io_intr_reg intr_reg; u32 tx_work_done; u32 rx_work_done; @@ -1149,22 +1168,9 @@ static int ena_io_poll(struct napi_struct *napi, int budget) if (ena_com_get_adaptive_moderation_enabled(rx_ring->ena_dev)) ena_adjust_intr_moderation(rx_ring, tx_ring); - /* Update intr register: rx intr delay, -* tx intr delay and interrupt unmask -*/ - ena_com_update_intr_reg(_reg, - rx_ring->smoothed_interval, - tx_ring->smoothed_interval, - true); - - /* It is a shared MSI-X. -* Tx and Rx CQ have pointer to it. -* So we use one of them to reach the intr reg -*/ - ena_com_unmask_intr(rx_ring->ena_com_io_cq, _reg); + ena_unmask_interrupt(tx_ring, rx_ring); } - ena_update_ring_numa_node(tx_ring, rx_ring); ret = rx_work_done; @@ -1485,6 +1491,11 @@ static int ena_up_complete(struct ena_adapter *adapter) ena_napi_enable_all(adapter); + /* Enable completion queues interrupt */ + for (i = 0; i < adapter->num_queues; i++) + ena_unmask_interrupt(>tx_ring[i], +>rx_ring[i]); + /* schedule napi in case we had pending packets * from the last time we disable napi */ -- 2.7.4
[PATCH net 3/9] net: ena: add missing return when ena_com_get_io_handlers() fails
From: Netanel BelgazalFixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Netanel Belgazal --- drivers/net/ethernet/amazon/ena/ena_netdev.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c index 0e3c60c7eccf..1e71e89e1e18 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c @@ -1543,6 +1543,7 @@ static int ena_create_io_tx_queue(struct ena_adapter *adapter, int qid) "Failed to get TX queue handlers. TX queue num %d rc: %d\n", qid, rc); ena_com_destroy_io_queue(ena_dev, ena_qid); + return rc; } ena_com_update_numa_node(tx_ring->ena_com_io_cq, ctx.numa_node); @@ -1607,6 +1608,7 @@ static int ena_create_io_rx_queue(struct ena_adapter *adapter, int qid) "Failed to get RX queue handlers. RX queue num %d rc: %d\n", qid, rc); ena_com_destroy_io_queue(ena_dev, ena_qid); + return rc; } ena_com_update_numa_node(rx_ring->ena_com_io_cq, ctx.numa_node); -- 2.7.4
[PATCH net 0/9] Bugs fixes in ena ethernet driver
From: Netanel BelgazalThis patchset contains fixes for the bugs that were discovered so far. Netanel Belgazal (9): net: ena: fix rare uncompleted admin command false alarm net: ena: fix bug that might cause hang after consecutive open/close interface. net: ena: add missing return when ena_com_get_io_handlers() fails net: ena: fix race condition between submit and completion admin command net: ena: add missing unmap bars on device removal net: ena: fix theoretical Rx hang on low memory systems net: ena: disable admin msix while working in polling mode net: ena: bug fix in lost tx packets detection mechanism net: ena: update ena driver to version 1.1.7 drivers/net/ethernet/amazon/ena/ena_com.c | 35 +++-- drivers/net/ethernet/amazon/ena/ena_ethtool.c | 2 +- drivers/net/ethernet/amazon/ena/ena_netdev.c | 179 +++--- drivers/net/ethernet/amazon/ena/ena_netdev.h | 18 ++- 4 files changed, 169 insertions(+), 65 deletions(-) -- 2.7.4
[PATCH net 4/9] net: ena: fix race condition between submit and completion admin command
From: Netanel BelgazalBug: "Completion context is occupied" error printout will be noticed in dmesg. This error will cause the admin command to fail, which will lead to an ena_probe() failure or a watchdog reset (depends on which admin command failed). Root cause: __ena_com_submit_admin_cmd() is the function that submits new entries to the admin queue. The function have a check that makes sure the queue is not full and the function does not override any outstanding command. It uses head and tail indexes for this check. The head is increased by ena_com_handle_admin_completion() which runs from interrupt context, and the tail index is increased by the submit function (the function is running under ->q_lock, so there is no risk of multithread increment). Each command is associated with a completion context. This context allocated before call to __ena_com_submit_admin_cmd() and freed by ena_com_wait_and_process_admin_cq_interrupts(), right after the command was completed. This can lead to a state where the head was increased, the check passed, but the completion context is still in use. Solution: Use the atomic variable ->outstanding_cmds instead of using the head and the tail indexes. This variable is safe for use since it is bumped in get_comp_ctx() in __ena_com_submit_admin_cmd() and is freed by comp_ctxt_release() Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Netanel Belgazal --- drivers/net/ethernet/amazon/ena/ena_com.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c index e1c2fab6292f..ea60b9e67acb 100644 --- a/drivers/net/ethernet/amazon/ena/ena_com.c +++ b/drivers/net/ethernet/amazon/ena/ena_com.c @@ -232,11 +232,9 @@ static struct ena_comp_ctx *__ena_com_submit_admin_cmd(struct ena_com_admin_queu tail_masked = admin_queue->sq.tail & queue_size_mask; /* In case of queue FULL */ - cnt = admin_queue->sq.tail - admin_queue->sq.head; + cnt = atomic_read(_queue->outstanding_cmds); if (cnt >= admin_queue->q_depth) { - pr_debug("admin queue is FULL (tail %d head %d depth: %d)\n", -admin_queue->sq.tail, admin_queue->sq.head, -admin_queue->q_depth); + pr_debug("admin queue is full.\n"); admin_queue->stats.out_of_space++; return ERR_PTR(-ENOSPC); } -- 2.7.4
[PATCH net 1/9] net: ena: fix rare uncompleted admin command false alarm
From: Netanel BelgazalThe current flow to detect admin completion is: while (command_not_completed) { if (timeout) error check_for_completion() sleep() } So in case the sleep took more than the timeout (in case the thread/workqueue was not scheduled due to higher priority task or prolonged VMexit), the driver can detect a stall even if the completion is present. The fix changes the order of this function to first check for completion and only after that check if the timeout expired. Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Netanel Belgazal --- drivers/net/ethernet/amazon/ena/ena_com.c | 21 +++-- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c index 08d11cede9c9..e1c2fab6292f 100644 --- a/drivers/net/ethernet/amazon/ena/ena_com.c +++ b/drivers/net/ethernet/amazon/ena/ena_com.c @@ -508,15 +508,20 @@ static int ena_com_comp_status_to_errno(u8 comp_status) static int ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx *comp_ctx, struct ena_com_admin_queue *admin_queue) { - unsigned long flags; - u32 start_time; + unsigned long flags, timeout; int ret; - start_time = ((u32)jiffies_to_usecs(jiffies)); + timeout = jiffies + ADMIN_CMD_TIMEOUT_US; + + while (1) { + spin_lock_irqsave(_queue->q_lock, flags); + ena_com_handle_admin_completion(admin_queue); + spin_unlock_irqrestore(_queue->q_lock, flags); - while (comp_ctx->status == ENA_CMD_SUBMITTED) { - if u32)jiffies_to_usecs(jiffies)) - start_time) > - ADMIN_CMD_TIMEOUT_US) { + if (comp_ctx->status != ENA_CMD_SUBMITTED) + break; + + if (time_is_before_jiffies(timeout)) { pr_err("Wait for completion (polling) timeout\n"); /* ENA didn't have any completion */ spin_lock_irqsave(_queue->q_lock, flags); @@ -528,10 +533,6 @@ static int ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx *comp_c goto err; } - spin_lock_irqsave(_queue->q_lock, flags); - ena_com_handle_admin_completion(admin_queue); - spin_unlock_irqrestore(_queue->q_lock, flags); - msleep(100); } -- 2.7.4
Re: [PATCH net-next 0/8] Bug fixes in ena ethernet driver
Ack. resubmitting to net branch and adding "Fixes" mark. From: David MillerSent: Saturday, June 10, 2017 11:11 PM To: f.faine...@gmail.com Cc: Belgazal, Netanel; netdev@vger.kernel.org; Woodhouse, David; Machulsky, Zorik; Matushevsky, Alexander; BSHARA, Said; Wilson, Matt; Liguori, Anthony; Bshara, Nafea; Schmeilin, Evgeny Subject: Re: [PATCH net-next 0/8] Bug fixes in ena ethernet driver From: Florian Fainelli Date: Fri, 9 Jun 2017 15:19:54 -0700 > On 06/09/2017 03:13 PM, neta...@amazon.com wrote: >> From: Netanel Belgazal >> >> This patchset contains fixes for the bugs that were discovered so far. > > If these are all fixes you should submit them against the "net" tree. > net-next is for features [1]. > > Since these are fixes, you may also want to provide a Fixes: 12-digit > commit ("commit subject") [2] such that David can queue these patches > for stable trees and this can be retrofitted into kernel distributions. > > [1]: > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/netdev-FAQ.txt#n25 > > [2]: > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst#n183 Yeah I agree. If they are genuine bug fixes they should be submitted against 'net'. And yes, Fixes: tags are quite desirable as well.
Re: [PATCH net-next v10 0/4] net sched actions: improve dump performance
Attached iproute2 used for testing these features.. cheers, jamal diff --git a/Makefile b/Makefile index 18de7dc..ed3c10a 100644 --- a/Makefile +++ b/Makefile @@ -45,7 +45,7 @@ HOSTCC ?= $(CC) DEFINES += -D_GNU_SOURCE # Turn on transparent support for LFS DEFINES += -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -CCOPTS = -O2 +CCOPTS = -g WFLAGS := -Wall -Wstrict-prototypes -Wmissing-prototypes WFLAGS += -Wmissing-declarations -Wold-style-definition -Wformat=2 diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h index a96db83..988ebc2 100644 --- a/include/linux/rtnetlink.h +++ b/include/linux/rtnetlink.h @@ -1,5 +1,5 @@ -#ifndef __LINUX_RTNETLINK_H -#define __LINUX_RTNETLINK_H +#ifndef _UAPI__LINUX_RTNETLINK_H +#define _UAPI__LINUX_RTNETLINK_H #include #include @@ -179,6 +179,23 @@ struct rtattr { #define RTA_DATA(rta) ((void*)(((char*)(rta)) + RTA_LENGTH(0))) #define RTA_PAYLOAD(rta) ((int)((rta)->rta_len) - RTA_LENGTH(0)) +/* Generic bitflags attribute content sent to the kernel. + * + * The nla_flag_values is a bitmap that defines the values being set + * The nla_flag_selector is a bitmask that defines which value is legit + * + * Examples: + * nla_flag_values = 0x0, and nla_flag_selector = 0x1 + * implies we are selecting bit 1 and we want to set its value to 0. + * + * nla_flag_values = 0x2, and nla_flag_selector = 0x2 + * implies we are selecting bit 2 and we want to set its value to 1. + * + */ +struct __nla_bit_flags { + __u32 nla_flag_values; + __u32 nla_flag_selector; +}; @@ -278,6 +295,7 @@ enum rt_scope_t { #define RTM_F_EQUALIZE 0x400 /* Multipath equalizer: NI */ #define RTM_F_PREFIX 0x800 /* Prefix addresses */ #define RTM_F_LOOKUP_TABLE 0x1000 /* set rtm_table to FIB lookup result */ +#define RTM_F_FIB_MATCH0x2000 /* return full fib lookup match */ /* Reserved table identifiers */ @@ -549,6 +567,7 @@ enum { TCA_STAB, TCA_PAD, TCA_DUMP_INVISIBLE, + TCA_CHAIN, __TCA_MAX }; @@ -581,6 +600,7 @@ enum { #define NDUSEROPT_MAX (__NDUSEROPT_MAX - 1) +#ifndef __KERNEL__ /* RTnetlink multicast groups - backwards compatibility for userspace */ #define RTMGRP_LINK1 #define RTMGRP_NOTIFY 2 @@ -601,6 +621,7 @@ enum { #define RTMGRP_DECnet_ROUTE 0x4000 #define RTMGRP_IPV6_PREFIX 0x2 +#endif /* RTnetlink multicast groups */ enum rtnetlink_groups { @@ -672,10 +693,29 @@ struct tcamsg { unsigned char tca__pad1; unsigned short tca__pad2; }; + +enum { + TCA_ROOT_UNSPEC, + TCA_ROOT_TAB, +#define TCA_ACT_TAB TCA_ROOT_TAB +#define TCAA_MAX TCA_ROOT_TAB + TCA_ROOT_FLAGS, + TCA_ROOT_COUNT, + TCA_ROOT_TIME_DELTA, /* in msecs */ + __TCA_ROOT_MAX, +#defineTCA_ROOT_MAX (__TCA_ROOT_MAX - 1) +}; + #define TA_RTA(r) ((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct tcamsg #define TA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct tcamsg)) -#define TCA_ACT_TAB 1 /* attr type must be >=1 */ -#define TCAA_MAX 1 +/* tcamsg flags stored in attribute TCA_ROOT_FLAGS + * + * TCA_FLAG_LARGE_DUMP_ON user->kernel to request for larger than TCA_ACT_MAX_PRIO + * actions in a dump. All dump responses will contain the number of actions + * being dumped stored in for user app's consumption in TCA_ROOT_COUNT + * + */ +#define TCA_FLAG_LARGE_DUMP_ON (1 << 0) /* New extended info filters for IFLA_EXT_MASK */ #define RTEXT_FILTER_VF(1 << 0) @@ -687,4 +727,4 @@ struct tcamsg { -#endif /* __LINUX_RTNETLINK_H */ +#endif /* _UAPI__LINUX_RTNETLINK_H */ diff --git a/tc/f_basic.c b/tc/f_basic.c index d663668..8370ea6 100644 --- a/tc/f_basic.c +++ b/tc/f_basic.c @@ -135,7 +135,7 @@ static int basic_print_opt(struct filter_util *qu, FILE *f, } if (tb[TCA_BASIC_ACT]) { - tc_print_action(f, tb[TCA_BASIC_ACT]); + tc_print_action(f, tb[TCA_BASIC_ACT], 0); } return 0; diff --git a/tc/f_bpf.c b/tc/f_bpf.c index 75c44c0..6581293 100644 --- a/tc/f_bpf.c +++ b/tc/f_bpf.c @@ -236,7 +236,7 @@ static int bpf_print_opt(struct filter_util *qu, FILE *f, } if (tb[TCA_BPF_ACT]) - tc_print_action(f, tb[TCA_BPF_ACT]); + tc_print_action(f, tb[TCA_BPF_ACT], 0); return 0; } diff --git a/tc/f_cgroup.c b/tc/f_cgroup.c index ecf9909..633700e 100644 --- a/tc/f_cgroup.c +++ b/tc/f_cgroup.c @@ -102,7 +102,7 @@ static int cgroup_print_opt(struct filter_util *qu, FILE *f, } if (tb[TCA_CGROUP_ACT]) - tc_print_action(f, tb[TCA_CGROUP_ACT]); + tc_print_action(f, tb[TCA_CGROUP_ACT], 0); return 0; } diff --git a/tc/f_flow.c b/tc/f_flow.c index 09ddcaa..b157104 100644 --- a/tc/f_flow.c +++ b/tc/f_flow.c @@ -347,7 +347,7 @@ static int flow_print_opt(struct
Re: [PATCH net-next 1/1] net: reflect mark on tcp syn ack packets
On 17-06-10 07:02 PM, David Miller wrote: @@ -173,7 +173,8 @@ int ip_build_and_send_pkt(struct sk_buff *skb, const struct sock *sk, } skb->priority = sk->sk_priority; - skb->mark = sk->sk_mark; + if (!skb->mark) + skb->mark = sk->sk_mark; Maybe this should both be "inet_request_mark()"? Challenge is making of a synack requires a new allocated skb; and sk is a listening socket - which should/has a mark of 0 meaning at ip_build_and_send_pkt() it overrides the value already set on the skb->mark. cheers, jamal Also, Lorenzo, please review.
[PATCH net-next v10 2/4] net sched actions: Use proper root attribute table for actions
From: Jamal Hadi SalimBug fix for an issue which has been around for about a decade. We got away with it because the enumeration was larger than needed. Fixes: 7ba699c604ab ("[NET_SCHED]: Convert actions from rtnetlink to new netlink API") Suggested-by: Jiri Pirko Reviewed-by: Simon Horman Signed-off-by: Jamal Hadi Salim --- net/sched/act_api.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/sched/act_api.c b/net/sched/act_api.c index aed6cf2..400eb6e 100644 --- a/net/sched/act_api.c +++ b/net/sched/act_api.c @@ -1072,7 +1072,7 @@ static int tc_ctl_action(struct sk_buff *skb, struct nlmsghdr *n, struct netlink_ext_ack *extack) { struct net *net = sock_net(skb->sk); - struct nlattr *tca[TCA_ACT_MAX + 1]; + struct nlattr *tca[TCAA_MAX + 1]; u32 portid = skb ? NETLINK_CB(skb).portid : 0; int ret = 0, ovr = 0; @@ -1080,7 +1080,7 @@ static int tc_ctl_action(struct sk_buff *skb, struct nlmsghdr *n, !netlink_capable(skb, CAP_NET_ADMIN)) return -EPERM; - ret = nlmsg_parse(n, sizeof(struct tcamsg), tca, TCA_ACT_MAX, NULL, + ret = nlmsg_parse(n, sizeof(struct tcamsg), tca, TCAA_MAX, NULL, extack); if (ret < 0) return ret; -- 1.9.1
[PATCH net-next v10 4/4] net sched actions: add time filter for action dumping
From: Jamal Hadi SalimThis adds support for filtering based on time since last used. When we are dumping a large number of actions it is useful to have the option of filtering based on when the action was last used to reduce the amount of data crossing to user space. With this patch the user space app sets the TCA_ROOT_TIME_DELTA attribute with the value in milliseconds with "time of interest since now". The kernel converts this to jiffies and does the filtering comparison matching entries that have seen activity since then and returns them to user space. Old kernels and old tc continue to work in legacy mode since they dont specify this attribute. Some example (we have 400 actions bound to 400 filters); at installation time using hacked tc which sets the time of interest to 120 seconds: prompt$ hackedtc actions ls action gact | grep index | wc -l 400 go get some coffee and wait for > 120 seconds and try again: prompt$ hackedtc actions ls action gact | grep index | wc -l 0 Lets see a filter bound to one of these actions: .. filter pref 10 u32 filter pref 10 u32 fh 800: ht divisor 1 filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10 (rule hit 2 success 1) match 7f02/ at 12 (success 1 ) action order 1: gact action pass random type none pass val 0 index 23 ref 2 bind 1 installed 1145 sec used 802 sec Action statistics: Sent 84 bytes 1 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 that coffee took long, no? It was good. Now lets ping -c 1 127.0.0.2, then run the actions again: prompt$ hackedtc actions ls action gact | grep index | wc -l 1 More details please: prompt$ hackedtc -s actions ls action gact action order 0: gact action pass random type none pass val 0 index 23 ref 2 bind 1 installed 1270 sec used 30 sec Action statistics: Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 And the filter? filter pref 10 u32 filter pref 10 u32 fh 800: ht divisor 1 filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10 (rule hit 4 success 2) match 7f02/ at 12 (success 2 ) action order 1: gact action pass random type none pass val 0 index 23 ref 2 bind 1 installed 1324 sec used 84 sec Action statistics: Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 Signed-off-by: Jamal Hadi Salim --- include/uapi/linux/rtnetlink.h | 1 + net/sched/act_api.c| 20 +++- 2 files changed, 20 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h index 7d2030f..988ebc2 100644 --- a/include/uapi/linux/rtnetlink.h +++ b/include/uapi/linux/rtnetlink.h @@ -701,6 +701,7 @@ enum { #define TCAA_MAX TCA_ROOT_TAB TCA_ROOT_FLAGS, TCA_ROOT_COUNT, + TCA_ROOT_TIME_DELTA, /* in msecs */ __TCA_ROOT_MAX, #defineTCA_ROOT_MAX (__TCA_ROOT_MAX - 1) }; diff --git a/net/sched/act_api.c b/net/sched/act_api.c index 935dc19..aa9549f 100644 --- a/net/sched/act_api.c +++ b/net/sched/act_api.c @@ -111,6 +111,7 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, struct sk_buff *skb, { int err = 0, index = -1, i = 0, s_i = 0, n_i = 0; u32 act_flags = cb->args[2]; + unsigned long jiffy_since = cb->args[3]; struct nlattr *nest; spin_lock_bh(>lock); @@ -128,6 +129,11 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, struct sk_buff *skb, if (index < s_i) continue; + if (jiffy_since && + time_after(jiffy_since, + (unsigned long)p->tcfa_tm.lastuse)) + continue; + nest = nla_nest_start(skb, n_i); if (nest == NULL) goto nla_put_failure; @@ -145,9 +151,11 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, struct sk_buff *skb, } } done: + if (index >= 0) + cb->args[0] = index + 1; + spin_unlock_bh(>lock); if (n_i) { - cb->args[0] += n_i; if (act_flags & TCA_FLAG_LARGE_DUMP_ON) cb->args[1] = n_i; } @@ -1077,6 +1085,7 @@ static int tcf_action_add(struct net *net, struct nlattr *nla, static const struct nla_policy tcaa_policy[TCA_ROOT_MAX + 1] = { [TCA_ROOT_FLAGS] = { .type = NLA_FLAG_BITS, .validation_data = _flags }, + [TCA_ROOT_TIME_DELTA] = { .type = NLA_U32 }, }; static int tc_ctl_action(struct sk_buff *skb, struct nlmsghdr *n, @@ -1167,7 +1176,9 @@ static int tc_dump_action(struct sk_buff *skb, struct netlink_callback *cb) struct __nla_bit_flags select_flags;
[PATCH net-next v10 3/4] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
From: Jamal Hadi SalimWhen you dump hundreds of thousands of actions, getting only 32 per dump batch even when the socket buffer and memory allocations allow is inefficient. With this change, the user will get as many as possibly fitting within the given constraints available to the kernel. The top level action TLV space is extended. An attribute TCA_ROOT_FLAGS is used to carry flags; flag TCA_FLAG_LARGE_DUMP_ON is set by the user indicating the user is capable of processing these large dumps. Older user space which doesnt set this flag doesnt get the large (than 32) batches. The kernel uses the TCA_ROOT_COUNT attribute to tell the user how many actions are put in a single batch. As such user space app knows how long to iterate (independent of the type of action being dumped) instead of hardcoded maximum of 32 thus maintaining backward compat. Some results dumping 1.5M actions below: first an unpatched tc which doesnt understand these features... prompt$ time -p tc actions ls action gact | grep index | wc -l 150 real 1388.43 user 2.07 sys 1386.79 Now lets see a patched tc which sets the correct flags when requesting a dump: prompt$ time -p updatedtc actions ls action gact | grep index | wc -l 150 real 178.13 user 2.02 sys 176.96 That is about 8x performance improvement for tc app which sets its receive buffer to about 32K. Signed-off-by: Jamal Hadi Salim --- include/uapi/linux/rtnetlink.h | 22 +-- net/sched/act_api.c| 50 +- 2 files changed, 60 insertions(+), 12 deletions(-) diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h index 8f07957..7d2030f 100644 --- a/include/uapi/linux/rtnetlink.h +++ b/include/uapi/linux/rtnetlink.h @@ -693,10 +693,28 @@ struct tcamsg { unsigned char tca__pad1; unsigned short tca__pad2; }; + +enum { + TCA_ROOT_UNSPEC, + TCA_ROOT_TAB, +#define TCA_ACT_TAB TCA_ROOT_TAB +#define TCAA_MAX TCA_ROOT_TAB + TCA_ROOT_FLAGS, + TCA_ROOT_COUNT, + __TCA_ROOT_MAX, +#defineTCA_ROOT_MAX (__TCA_ROOT_MAX - 1) +}; + #define TA_RTA(r) ((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct tcamsg #define TA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct tcamsg)) -#define TCA_ACT_TAB 1 /* attr type must be >=1 */ -#define TCAA_MAX 1 +/* tcamsg flags stored in attribute TCA_ROOT_FLAGS + * + * TCA_FLAG_LARGE_DUMP_ON user->kernel to request for larger than TCA_ACT_MAX_PRIO + * actions in a dump. All dump responses will contain the number of actions + * being dumped stored in for user app's consumption in TCA_ROOT_COUNT + * + */ +#define TCA_FLAG_LARGE_DUMP_ON (1 << 0) /* New extended info filters for IFLA_EXT_MASK */ #define RTEXT_FILTER_VF(1 << 0) diff --git a/net/sched/act_api.c b/net/sched/act_api.c index 400eb6e..935dc19 100644 --- a/net/sched/act_api.c +++ b/net/sched/act_api.c @@ -110,6 +110,7 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, struct sk_buff *skb, struct netlink_callback *cb) { int err = 0, index = -1, i = 0, s_i = 0, n_i = 0; + u32 act_flags = cb->args[2]; struct nlattr *nest; spin_lock_bh(>lock); @@ -138,14 +139,18 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, struct sk_buff *skb, } nla_nest_end(skb, nest); n_i++; - if (n_i >= TCA_ACT_MAX_PRIO) + if (!(act_flags & TCA_FLAG_LARGE_DUMP_ON) && + n_i >= TCA_ACT_MAX_PRIO) goto done; } } done: spin_unlock_bh(>lock); - if (n_i) + if (n_i) { cb->args[0] += n_i; + if (act_flags & TCA_FLAG_LARGE_DUMP_ON) + cb->args[1] = n_i; + } return n_i; nla_put_failure: @@ -1068,11 +1073,17 @@ static int tcf_action_add(struct net *net, struct nlattr *nla, return tcf_add_notify(net, n, , portid); } +static u32 allowed_flags = TCA_FLAG_LARGE_DUMP_ON; +static const struct nla_policy tcaa_policy[TCA_ROOT_MAX + 1] = { + [TCA_ROOT_FLAGS] = { .type = NLA_FLAG_BITS, +.validation_data = _flags }, +}; + static int tc_ctl_action(struct sk_buff *skb, struct nlmsghdr *n, struct netlink_ext_ack *extack) { struct net *net = sock_net(skb->sk); - struct nlattr *tca[TCAA_MAX + 1]; + struct nlattr *tca[TCA_ROOT_MAX + 1]; u32 portid = skb ? NETLINK_CB(skb).portid : 0; int ret = 0, ovr = 0; @@ -1080,7 +1091,7 @@ static int tc_ctl_action(struct sk_buff *skb, struct nlmsghdr *n, !netlink_capable(skb, CAP_NET_ADMIN)) return -EPERM; - ret = nlmsg_parse(n, sizeof(struct tcamsg), tca, TCAA_MAX, NULL, + ret =
[PATCH net-next v10 1/4] net netlink: Add new type NLA_FLAG_BITS
From: Jamal Hadi SalimGeneric bitflags attribute content sent to the kernel by user. With this type the user can either set or unset a flag in the kernel. The nla_flag_values is a bitmap that defines the values being set The nla_flag_selector is a bitmask that defines which value is legit. A check is made to ensure the rules that a kernel subsystem always conforms to bitflags the kernel already knows about. i.e if the user tries to set a bit flag that is not understood then the _it will be rejected_. In the most basic form, the user specifies the attribute policy as: [ATTR_GOO] = { .type = NLA_FLAG_BITS, .validation_data = }, where myvalidflags is the bit mask of the flags the kernel understands. If the user _does not_ provide myvalidflags then the attribute will also be rejected. Examples: nla_flag_values = 0x0, and nla_flag_selector = 0x1 implies we are selecting bit 1 and we want to set its value to 0. nla_flag_values = 0x2, and nla_flag_selector = 0x2 implies we are selecting bit 2 and we want to set its value to 1. This patch also provides an extra feature: a validation callback that could be speaciliazed for other types. This feature is intended to be used by a kernel subsystem to check for a combination of bits being present. Example "bit x is valid only if bit y and z are present". So a kernel subsystem could specify validation rules of the following nature: [ATTR_GOO] = { .type = MYTYPE, .validation_data = _data, .validate_content = mycontent_validator }, With validator callback looking like: int mycontent_validator(const struct nlattr *nla, void *valid_data) { const struct myattribute *user_data = nla_data(nla); struct myvalidation_struct *valid_data_constraint = valid_data; ... return appropriate error code etc ... } Signed-off-by: Jamal Hadi Salim --- include/net/netlink.h | 11 +++ include/uapi/linux/rtnetlink.h | 17 + lib/nlattr.c | 25 + 3 files changed, 53 insertions(+) diff --git a/include/net/netlink.h b/include/net/netlink.h index 0170917..8ab9784 100644 --- a/include/net/netlink.h +++ b/include/net/netlink.h @@ -6,6 +6,11 @@ #include #include +struct nla_bit_flags { + u32 nla_flag_values; + u32 nla_flag_selector; +}; + /* * Netlink Messages and Attributes Interface (As Seen On TV) * @@ -178,6 +183,7 @@ enum { NLA_S16, NLA_S32, NLA_S64, + NLA_FLAG_BITS, __NLA_TYPE_MAX, }; @@ -206,6 +212,7 @@ enum { *NLA_MSECSLeaving the length field zero will verify the * given type fits, using it verifies minimum length * just like "All other" + *NLA_FLAG_BITSA bitmap/bitselector attribute *All otherMinimum length of attribute payload * * Example: @@ -213,11 +220,15 @@ enum { * [ATTR_FOO] = { .type = NLA_U16 }, * [ATTR_BAR] = { .type = NLA_STRING, .len = BARSIZ }, * [ATTR_BAZ] = { .len = sizeof(struct mystruct) }, + * [ATTR_GOO] = { .type = NLA_FLAG_BITS, .validation_data = }, * }; */ struct nla_policy { u16 type; u16 len; + void*validation_data; + int (*validate_content)(const struct nlattr *nla, + const void *validation_data); }; /** diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h index 564790e..8f07957 100644 --- a/include/uapi/linux/rtnetlink.h +++ b/include/uapi/linux/rtnetlink.h @@ -179,6 +179,23 @@ struct rtattr { #define RTA_DATA(rta) ((void*)(((char*)(rta)) + RTA_LENGTH(0))) #define RTA_PAYLOAD(rta) ((int)((rta)->rta_len) - RTA_LENGTH(0)) +/* Generic bitflags attribute content sent to the kernel. + * + * The nla_flag_values is a bitmap that defines the values being set + * The nla_flag_selector is a bitmask that defines which value is legit + * + * Examples: + * nla_flag_values = 0x0, and nla_flag_selector = 0x1 + * implies we are selecting bit 1 and we want to set its value to 0. + * + * nla_flag_values = 0x2, and nla_flag_selector = 0x2 + * implies we are selecting bit 2 and we want to set its value to 1. + * + */ +struct __nla_bit_flags { + __u32 nla_flag_values; + __u32 nla_flag_selector; +}; diff --git a/lib/nlattr.c b/lib/nlattr.c index a7e0b16..78fed43 100644 --- a/lib/nlattr.c +++ b/lib/nlattr.c @@ -27,6 +27,21 @@ [NLA_S64] = sizeof(s64), }; +static int validate_nla_bit_flags(const struct nlattr *nla, void *valid_data) +{ + const struct nla_bit_flags *nbf = nla_data(nla); + u32 *valid_flags_mask = valid_data; + + if (!valid_data) +
[PATCH net-next v10 0/4] net sched actions: improve dump performance
From: Jamal Hadi SalimChanges since v9: - 1) General consensus: - remove again the use of BIT() to maintain uapi consistency ;-> 1) Jiri: - Add a new netlink type NLA_FLAG_BITS to check for valid bits and use it instead of inline vetting. Changes since v8: - 1) Jiri: - Add back the use of BIT(). Eventually fix iproute2 instead - Rename VALID_TCA_FLAGS to VALID_TCA_ROOT_FLAGS Changes since v7: - Jamal: No changes. Patch 1 went out twice. Resend without two copies of patch 1 changes since v6: - 1) DaveM: New rules for netlink messages. From now on we are going to start checking for bits that are not used and rejecting anything we dont understand. In the future this is going to require major changes to user space code (tc etc). This is just a start. To quote, David: " Again, bits you aren't using now, make sure userspace doesn't set them. And if it does, reject. " Added checks for ensuring things work as above. 2) Jiri: a)Fix the commit message to properly use "Fixes" description b)Align assignments for nla_policy Changes since v5: 0) Remove use of BIT() because it is kernel specific. Requires a separate patch (Jiri can submit that in his cleanups) 1)To paraphrase Eric D. "memcpy(nla_data(count_attr), >args[1], sizeof(u32)); wont work on 64bit BE machines because cb->args[1] (which is 64 bit is larger in size than sizeof(u32))" Fixed 2) Jiri Pirko i) Spotted a bug fix mixed in the patch for wrong TLV fix. Add patch 1/3 to address this. Make part of this series because of dependencies. ii) Rename ACT_LARGE_DUMP_ON -> TCA_FLAG_LARGE_DUMP_ON iii) Satisfy Jiri's obsession against the noun "tcaa" a)Rename struct nlattr *tcaa --> struct nlattr *tb b)Rename TCAA_ACT_XXX -> TCA_ROOT_XXX Changes since v4: - 1) Eric D. pointed out that when all skb space is used up by the dump there will be no space to insert the TCAA_ACT_COUNT attribute. 2) Jiri: i) Change: enum { TCAA_UNSPEC, TCAA_ACT_TAB, TCAA_ACT_FLAGS, TCAA_ACT_COUNT, TCAA_ACT_TIME_FILTER, __TCAA_MAX }; #define TCAA_MAX (__TCAA_MAX - 1) #define ACT_LARGE_DUMP_ON (1 << 0) to: enum { TCAA_UNSPEC, TCAA_ACT_TAB, #define TCA_ACT_TAB TCAA_ACT_TAB TCAA_ACT_FLAGS, TCAA_ACT_COUNT, __TCAA_MAX, #defineTCAA_MAX (__TCAA_MAX - 1) }; #define ACT_LARGE_DUMP_ON BIT(0) Jiri plans to followup with the rest of the code to make the style consistent. ii) Rename attribute TCAA_ACT_TIME_FILTER --> TCAA_ACT_TIME_DELTA iii) Rename variable jiffy_filter --> jiffy_since iv) Rename msecs_filter --> msecs_since v) get rid of unused cb->args[0] and rename cb->args[4] to cb->args[0] Earlier Changes - Jiri mostly on names of things. Jamal Hadi Salim (4): net netlink: Add new type NLA_FLAG_BITS net sched actions: Use proper root attribute table for actions net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch net sched actions: add time filter for action dumping include/net/netlink.h | 11 +++ include/uapi/linux/rtnetlink.h | 40 ++-- lib/nlattr.c | 25 +++ net/sched/act_api.c| 70 +++--- 4 files changed, 133 insertions(+), 13 deletions(-) -- 1.9.1
Re: [PATCHv3 net] xfrm: move xfrm_garbage_collect out of xfrm_policy_flush
On Sun, Jun 11, 2017 at 9:44 AM, Hangbin Liuwrote: > Now we will force to do garbage collection if any policy removed in > xfrm_policy_flush(). But during xfrm_net_exit(). We call flow_cache_fini() > first and set set fc->percpu to NULL. Then after we call xfrm_policy_fini() > -> frxm_policy_flush() -> flow_cache_flush(), we will get NULL pointer > dereference when check percpu_empty. The code path looks like: > > flow_cache_fini() > - fc->percpu = NULL > xfrm_policy_fini() > - xfrm_policy_flush() > - xfrm_garbage_collect() > - flow_cache_flush() > - flow_cache_percpu_empty() > - fcp = per_cpu_ptr(fc->percpu, cpu) > > To reproduce, just add ipsec in netns and then remove the netns. > > v2: > As Xin Long suggested, since only two other places need to call it. move > xfrm_garbage_collect() outside xfrm_policy_flush(). > > v3: > Fix subject mismatch after v2 fix. > > Fixes: 35db06912189 ("xfrm: do the garbage collection after flushing policy") > Signed-off-by: Hangbin Liu > --- > net/key/af_key.c | 2 ++ > net/xfrm/xfrm_policy.c | 4 > net/xfrm/xfrm_user.c | 1 + > 3 files changed, 3 insertions(+), 4 deletions(-) > > diff --git a/net/key/af_key.c b/net/key/af_key.c > index 512dc43..5103f92 100644 > --- a/net/key/af_key.c > +++ b/net/key/af_key.c > @@ -2755,6 +2755,8 @@ static int pfkey_spdflush(struct sock *sk, struct > sk_buff *skb, const struct sad > int err, err2; > > err = xfrm_policy_flush(net, XFRM_POLICY_TYPE_MAIN, true); > + if (!err) > + xfrm_garbage_collect(net); > err2 = unicast_flush_resp(sk, hdr); > if (err || err2) { > if (err == -ESRCH) /* empty table - old silent behavior */ > diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c > index ed4e52d..643a18f 100644 > --- a/net/xfrm/xfrm_policy.c > +++ b/net/xfrm/xfrm_policy.c > @@ -1006,10 +1006,6 @@ int xfrm_policy_flush(struct net *net, u8 type, bool > task_valid) > err = -ESRCH; > out: > spin_unlock_bh(>xfrm.xfrm_policy_lock); > - > - if (cnt) > - xfrm_garbage_collect(net); > - > return err; > } > EXPORT_SYMBOL(xfrm_policy_flush); > diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c > index 38614df..86116e9 100644 > --- a/net/xfrm/xfrm_user.c > +++ b/net/xfrm/xfrm_user.c > @@ -2027,6 +2027,7 @@ static int xfrm_flush_policy(struct sk_buff *skb, > struct nlmsghdr *nlh, > return 0; > return err; > } > + xfrm_garbage_collect(net); > > c.data.type = type; > c.event = nlh->nlmsg_type; > -- > 2.5.5 > Reviewed-by: Xin Long though I could still see some typo err in changelog.
ipmr: MFC routes when VIF deleted
I have been looking into some weird behavior, and I am not sure whether it is a bug or a feature. When a VIF with index v gets deleted, the MFC routes does not get updated, which means that there can be routes pointing to that VIF. On datapath, when packet hits that route, the VIF validity will be checked and will not be sent to that device (but still, the route does not get updated). Now, if the user creates another VIF with the same index v but different underlay device, the same route will forward the traffic to that device. It is relevant to mention that when user adds a MFC route, only the active VIFs are used, so the flow of adding a route with dummy VIF indices and then connecting those VIF indices to real device is not supported. The only way to create a MFC route that has non existing VIFs is to create one with existing VIFs and then delete them. Do we really want to support that? To me, it looks like a buggy flow and I suggest that upon VIF deletion, the MFC routes will be updated to not point to any non existing VIF indices.
Re: [PATCH 0/6] Constant Time Memory Comparisons Are Important
"Jason A. Donenfeld"writes: > Whenever you're comparing two MACs, it's important to do this using > crypto_memneq instead of memcmp. With memcmp, you leak timing information, > which could then be used to iteratively forge a MAC. Do you have any pointers where I could learn more about this? -- Kalle Valo
Re: [PATCH v2] brcmfmac: Fix glom_skb leak in brcmf_sdiod_recv_chain
On 11-06-17 02:18, Peter Housel wrote: > >> On Jun 10, 2017, at 12:27 PM, Arend van Spriel >>wrote: >> >> On 03-06-17 17:36, Andy Shevchenko wrote: >>> On Sat, Jun 3, 2017 at 1:29 AM, Peter S. Housel wrote: An earlier change to this function (3bdae810721b) fixed a leak in the case of an unsuccessful call to brcmf_sdiod_buffrw(). However, the glom_skb buffer, used for emulating a scattering read, is never used or referenced after its contents are copied into the destination buffers, and therefore always needs to be freed by the end of the function. >> >> [snip] >> + skb_queue_walk(pktq, skb) { + memcpy(skb->data, glom_skb->data, skb->len); + skb_pull(glom_skb, skb->len); + } } >>> + brcmu_pkt_buf_free_skb(glom_skb); >>> >>> Can we just add this one line instead or I'm missing something? >> >> I guess. We don't want to walk the packet queue if glom_skb is not >> carrying data due to brcmf_sdiod_buffrw() failure. >> >> So I would go with the patch below as brcmu_pkt_buf_free_skb() simply >> ignores null pointer. > > I’m fine with this, or indeed most of the other proposed solutions. The > important thing is that the leak is fixed; in the driver's current state I > was able to run our wearable device out of memory in just over 20 seconds > running iperf. Sure. The reason behind the suggestion from Franky was to get rid of the label inside branch and I agree with that. To address Andy's comment I think my proposal should tackle that. Just out of curiosity, we added the broken-sg-support thing for OMAP platform. So what platform/mmc-host are you using. I try to keep an overview where this workaround is needed. Regards, Arend
Re: ARM GLX Khadas VIM Pro - Ethernet detected as only 10Mbps and stalled after some traffic
Hi Andrew, On Sat, Jun 10, 2017 at 5:27 PM, Andrew Lunnwrote: >> Also what Martin Blumenstingl wrote is following which is also crucial >> for fixing the issue: >> Amlogic has given their ethernet PHY driver some updates [2], it now >> includes wake-on-lan, and they now have an internal_phy_read_status >> which uses reset_internal_phy if there's a link and some error counter >> exceeds some magic value. > > Hi Crow > > You could probably just drop the Amlogic driver into mainline and see > if it works better. If that solves your problem, we can look at > merging the changes. > > Andrew Thank your for the suggestion, and thanks Martin to explaining me over IRC what actually I should do. I recompiled mainline kernel 4.12-rc4 with the Amlogic driver: replaced drivers/net/phy/meson-gxl.c with https://github.com/khadas/linux/blob/ubuntu-4.9/drivers/amlogic/ethernet/phy/amlogic.c But this did not solve the issue. As soon as i start git clone i lose network connection to device (no session timeout/disconnect this time, but I am unable to reconnect over SSH or to get OK ping replay back). Here are the tests: Linux khadasvimpro 4.12.0-rc4-4-ARCH #1 SMP Sun Jun 11 03:39:21 CEST 2017 aarch64 GNU/Linux # modinfo meson_gxl filename: /lib/modules/4.12.0-rc4-4-ARCH/kernel/drivers/net/phy/meson-gxl.ko.gz license:GPL author: Neil Armstrong author: Baoqi wang description:Amlogic Meson GXL Internal PHY driver alias: mdio:0001100101000100 depends: intree: Y vermagic: 4.12.0-rc4-4-ARCH SMP mod_unload aarch64 # # mii-tool -vvv eth0 Using SIOCGMIIPHY=0x8947 eth0: negotiated 1000baseT-HD flow-control, link ok registers for MII PHY 8: 1000 782d 0181 4400 01e1 c1e1 000f 2001 0040 0002 40e8 5400 1c1c fff0 040a 1407 004a 105a product info: vendor 00:60:51, model 0 rev 0 basic mode: autonegotiation enabled basic status: autonegotiation complete, link ok capabilities: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD advertising: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD link partner: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD $ over SSH startet following but it stall already at 0%: $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git Cloning into 'linux-stable'... remote: Counting objects: 5948690, done. remote: Compressing objects: 100% (124799/124799), done. Receiving objects: 0% (11668/5948690), 2.27 MiB | 4.52 MiB/s shows timeout while trying to sync with NTP server: # journalctl -f systemd-timesyncd[299]: Timed out waiting for reply from 83.68.137.76:123 (2.at.pool.ntp.org). systemd-timesyncd[299]: Timed out waiting for reply from 86.59.113.114:123 (2.at.pool.ntp.org). while still not working dump the register: # mii-tool -vvv eth0 Using SIOCGMIIPHY=0x8947 eth0: negotiated 1000baseT-HD flow-control, link ok registers for MII PHY 8: 1000 782d 0181 4400 01e1 c1e1 000d 2001 0040 0002 40e8 5400 1c1c fff0 040a 1407 105a product info: vendor 00:60:51, model 0 rev 0 basic mode: autonegotiation enabled basic status: autonegotiation complete, link ok capabilities: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD advertising: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD link partner: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD # # ifconfig eth0 down && ifconfig eth0 up # dmesg -T | tail -n 10 [Sun Jun 11 07:40:34 2017] meson8b-dwmac c941.ethernet eth0: device MAC address 00:15:18:01:81:31 [Sun Jun 11 07:40:34 2017] random: crng init done [Sun Jun 11 07:40:34 2017] Meson GXL Internal PHY 0.e40908ff:08: attached PHY driver [Meson GXL Internal PHY] (mii_bus:phy_addr=0.e40908ff:08, irq=-1) [Sun Jun 11 07:40:34 2017] meson8b-dwmac c941.ethernet eth0: PTP not supported by HW [Sun Jun 11 07:40:36 2017] brcmfmac: brcmf_c_preinit_dcmds: Firmware version = wl0: Mar 1 2015 07:29:38 version 7.45.18 (r538002) FWID 01-6a2c8ad4 [Sun Jun 11 07:40:36 2017] meson8b-dwmac c941.ethernet eth0: Link is Up - 100Mbps/Full - flow control off [Sun Jun 11 07:41:23 2017] EXT4-fs (mmcblk1p1): mounted filesystem with ordered data mode. Opts: (null) [Sun Jun 11 07:54:28 2017] Meson GXL Internal PHY 0.e40908ff:08: attached PHY driver [Meson GXL Internal PHY] (mii_bus:phy_addr=0.e40908ff:08, irq=-1) [Sun Jun 11 07:54:28 2017] meson8b-dwmac c941.ethernet eth0: PTP not supported by HW [Sun Jun 11 07:54:30 2017] meson8b-dwmac c941.ethernet eth0: Link is Up - 100Mbps/Full - flow control off # then I took eth0 and wlan0 up (same ArchLinuxArm with same mainline kernel same place where the files are stored eMMC) and this
Re: [PATCH net v3] net: ipmr: Fix some mroute forwarding issues in vrf's
On 06/10/2017 11:30 PM, Donald Sharp wrote: > This patch fixes two issues: > > 1) When forwarding on *,G mroutes that are in a vrf, the > kernel was dropping information about the actual incoming > interface when calling ip_mr_forward from ip_mr_input. > This caused ip_mr_forward to send the multicast packet > back out the incoming interface. Fix this by > modifying ip_mr_forward to be handed the correctly > resolved dev. > > 2) When a unresolved cache entry is created we store > the incoming skb on the unresolved cache entry and > upon mroute resolution from the user space daemon, > we attempt to forward the packet. Again we were > not resolving to the correct incoming device for > a vrf scenario, before calling ip_mr_forward. > Fix this by resolving to the correct interface > and calling ip_mr_forward with the result. > > Fixes: e58e41596811 ("net: Enable support for VRF with ipv4 multicast") > Signed-off-by: Donald Sharp> --- > v2: Fixed title > v3: Addressed Review comments by Andrew Lunn and David Ahern > > net/ipv4/ipmr.c | 32 +++- > 1 file changed, 15 insertions(+), 17 deletions(-) > > diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c > index 551de4d..09368a1 100644 > --- a/net/ipv4/ipmr.c > +++ b/net/ipv4/ipmr.c > @@ -101,8 +101,8 @@ static struct mr_table *ipmr_new_table(struct net *net, > u32 id); > static void ipmr_free_table(struct mr_table *mrt); > > static void ip_mr_forward(struct net *net, struct mr_table *mrt, > - struct sk_buff *skb, struct mfc_cache *cache, > - int local); > + struct net_device *dev, struct sk_buff *skb, > + struct mfc_cache *cache, int local); > static int ipmr_cache_report(struct mr_table *mrt, >struct sk_buff *pkt, vifi_t vifi, int assert); > static int __ipmr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb, > @@ -988,7 +988,7 @@ static void ipmr_cache_resolve(struct net *net, struct > mr_table *mrt, > > rtnl_unicast(skb, net, NETLINK_CB(skb).portid); > } else { > - ip_mr_forward(net, mrt, skb, c, 0); > + ip_mr_forward(net, mrt, skb->dev, skb, c, 0); > } > } > } > @@ -1073,7 +1073,7 @@ static int ipmr_cache_report(struct mr_table *mrt, > > /* Queue a packet for resolution. It gets locked cache entry! */ > static int ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi, > - struct sk_buff *skb) > + struct sk_buff *skb, struct net_device *dev) > { > const struct iphdr *iph = ip_hdr(skb); > struct mfc_cache *c; > @@ -1130,6 +1130,10 @@ static int ipmr_cache_unresolved(struct mr_table *mrt, > vifi_t vifi, > kfree_skb(skb); > err = -ENOBUFS; > } else { > + if (dev) { > + skb->dev = dev; > + skb->skb_iif = dev->ifindex; > + } > skb_queue_tail(>mfc_un.unres.unresolved, skb); > err = 0; > } > @@ -1828,10 +1832,10 @@ static int ipmr_find_vif(struct mr_table *mrt, struct > net_device *dev) > > /* "local" means that we should preserve one skb (for local delivery) */ > static void ip_mr_forward(struct net *net, struct mr_table *mrt, > - struct sk_buff *skb, struct mfc_cache *cache, > - int local) > + struct net_device *dev, struct sk_buff *skb, > + struct mfc_cache *cache, int local) > { > - int true_vifi = ipmr_find_vif(mrt, skb->dev); > + int true_vifi = ipmr_find_vif(mrt, dev); > int psend = -1; > int vif, ct; > > @@ -1853,13 +1857,7 @@ static void ip_mr_forward(struct net *net, struct > mr_table *mrt, > } > > /* Wrong interface: drop packet and (maybe) send PIM assert. */ > - if (mrt->vif_table[vif].dev != skb->dev) { > - struct net_device *mdev; > - > - mdev = l3mdev_master_dev_rcu(mrt->vif_table[vif].dev); > - if (mdev == skb->dev) > - goto forward; > - > + if (mrt->vif_table[vif].dev != dev) { > if (rt_is_output_route(skb_rtable(skb))) { > /* It is our own packet, looped back. >* Very complicated situation... > @@ -2053,7 +2051,7 @@ int ip_mr_input(struct sk_buff *skb) > read_lock(_lock); > vif = ipmr_find_vif(mrt, dev); > if (vif >= 0) { > - int err2 = ipmr_cache_unresolved(mrt, vif, skb); > + int err2 = ipmr_cache_unresolved(mrt, vif, skb, dev); > read_unlock(_lock); > > return err2; > @@ -2064,7 +2062,7 @@ int ip_mr_input(struct sk_buff *skb) > } > > read_lock(_lock); > -