Re: [PATCH 0/6] Constant Time Memory Comparisons Are Important

2017-06-11 Thread Emmanuel Grumbach
On Mon, Jun 12, 2017 at 12:30 AM, Emil Lenngren  wrote:
> 2017-06-11 22:48 GMT+02:00 Emmanuel Grumbach :
>> On Sun, Jun 11, 2017 at 4:36 PM, Kees Cook  wrote:
>>>
>>> On Sun, Jun 11, 2017 at 1:13 AM, Kalle Valo  wrote:
>>> > "Jason A. Donenfeld"  writes:
>>> >
>>> >> Whenever you're comparing two MACs, it's important to do this using
>>> >> crypto_memneq instead of memcmp. With memcmp, you leak timing 
>>> >> information,
>>> >> which could then be used to iteratively forge a MAC.
>>> >
>>> > Do you have any pointers where I could learn more about this?
>>>
>>> While not using C specifically, this talks about the problem generally:
>>> https://www.chosenplaintext.ca/articles/beginners-guide-constant-time-cryptography.html
>>>
>>
>> Sorry for the stupid question, but the MAC address is in plaintext in
>> the air anyway or easily accessible via user space tools. I fail to
>> see what it is so secret about a MAC address in that code where that
>> same MAC address is accessible via myriads of ways.
>
> I think you're mixing up Media Access Control (MAC) addresses with
> Message Authentication Code (MAC). The second one is a cryptographic
> signature of a message.

Obviously... Sorry for the noise.


[iproute2][PATCH] ip: Remove unneed header

2017-06-11 Thread Changhyeok Bae
Fix redefinition of struct ethhdr with a suitably patched musl libc
that suppresses the kernel if_ether.h.

Signed-off-by: Changhyeok Bae 
---
 ip/iplink_bridge.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
index cccdec1..f065b22 100644
--- a/ip/iplink_bridge.c
+++ b/ip/iplink_bridge.c
@@ -13,7 +13,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
-- 
2.7.4



I am waiting for your response to my numerous un-replied emails to you concerning your family inheritance fund ($7.5 million dollars). I seek your assistance and I assured of your capability to champi

2017-06-11 Thread Fadi Ayele



Re: [PATCH net v3] net: ipmr: Fix some mroute forwarding issues in vrf's

2017-06-11 Thread David Ahern
On 6/11/17 4:15 PM, David Miller wrote:
> 
> Applied and queued up for -stable, thank you.
> 

The backport will need bcfc7d33110b; that commit should have had the
same Fixes tag -- e58e41596811 ("net: Enable support for VRF with ipv4
multicast")


Re: [PATCH v2] arm: eBPF JIT compiler

2017-06-11 Thread Kees Cook
On Tue, Jun 6, 2017 at 12:47 PM, Shubham Bansal
 wrote:
> Hi Russell, Alexei, David, Daniel, kees,
>
> Any update on this patch moving forward?

Since this has gotten testing by various people and passes the
existing self-tests, I think this can probably go in via the ARM patch
tracker? Russell does that sound okay to you?

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH v2 1/2] tcp: md5: add an address prefix for key lookup

2017-06-11 Thread kbuild test robot
Hi Ivan,

[auto build test WARNING on net/master]
[also build test WARNING on v4.12-rc4 next-20170609]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Ivan-Delalande/tcp-md5-add-an-address-prefix-for-key-lookup/20170611-184237
config: x86_64-randconfig-s2-06120830 (attached as .config)
compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   net//ipv4/tcp_ipv4.c: In function 'tcp_md5_do_lookup':
>> net//ipv4/tcp_ipv4.c:908: warning: unused variable 'size'

vim +/size +908 net//ipv4/tcp_ipv4.c

^1da177e4 Linus Torvalds   2005-04-16  892  }
^1da177e4 Linus Torvalds   2005-04-16  893  
cfb6eeb4c YOSHIFUJI Hideaki2006-11-14  894  #ifdef CONFIG_TCP_MD5SIG
cfb6eeb4c YOSHIFUJI Hideaki2006-11-14  895  /*
cfb6eeb4c YOSHIFUJI Hideaki2006-11-14  896   * RFC2385 MD5 checksumming 
requires a mapping of
cfb6eeb4c YOSHIFUJI Hideaki2006-11-14  897   * IP address->MD5 Key.
cfb6eeb4c YOSHIFUJI Hideaki2006-11-14  898   * We need to maintain these in 
the sk structure.
cfb6eeb4c YOSHIFUJI Hideaki2006-11-14  899   */
cfb6eeb4c YOSHIFUJI Hideaki2006-11-14  900  
cfb6eeb4c YOSHIFUJI Hideaki2006-11-14  901  /* Find the Key structure for 
an address.  */
b83e3deb9 Eric Dumazet 2015-09-25  902  struct tcp_md5sig_key 
*tcp_md5_do_lookup(const struct sock *sk,
a915da9b6 Eric Dumazet 2012-01-31  903  
 const union tcp_md5_addr *addr,
a915da9b6 Eric Dumazet 2012-01-31  904  
 int family)
cfb6eeb4c YOSHIFUJI Hideaki2006-11-14  905  {
fd3a154a0 Eric Dumazet 2015-03-24  906  const struct tcp_sock 
*tp = tcp_sk(sk);
a915da9b6 Eric Dumazet 2012-01-31  907  struct tcp_md5sig_key 
*key;
a915da9b6 Eric Dumazet 2012-01-31 @908  unsigned int size = 
sizeof(struct in_addr);
fd3a154a0 Eric Dumazet 2015-03-24  909  const struct 
tcp_md5sig_info *md5sig;
4b9e1d2ec Ivan Delalande   2017-06-09  910  __be32 mask;
4b9e1d2ec Ivan Delalande   2017-06-09  911  struct tcp_md5sig_key 
*best_match = NULL;
4b9e1d2ec Ivan Delalande   2017-06-09  912  bool match;
cfb6eeb4c YOSHIFUJI Hideaki2006-11-14  913  
a8afca032 Eric Dumazet 2012-01-31  914  /* caller either holds 
rcu_read_lock() or socket lock */
a8afca032 Eric Dumazet 2012-01-31  915  md5sig = 
rcu_dereference_check(tp->md5sig_info,
1e1d04e67 Hannes Frederic Sowa 2016-04-05  916  
   lockdep_sock_is_held(sk));

:: The code at line 908 was first introduced by commit
:: a915da9b69273815527ccb3789421cb7027b545b tcp: md5: rcu conversion

:: TO: Eric Dumazet <eric.duma...@gmail.com>
:: CC: David S. Miller <da...@davemloft.net>

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: ipmr: MFC routes when VIF deleted

2017-06-11 Thread Donald Sharp
I would argue that this is just an unintended side effect of the
original implementation.  Shuffling interface vif's seems like a good
way to churn a significant number of mroutes to me.  If you want to
use a interface it probably is going to be already be configured with
it's own vif, and as such it's just better to figure out the new
proper RPF and insert that mroute.  If it isn't, then I need to bring
up that new vif ( and all it's associated pim information, like
establishing a new neighbor relationship ) when I do need to shuffle
interface vif's, In that time I'm dropping a signficant number of
packets while this is happening and all our end users are going to be
screaming at us to fix that hole.


donald



On Sun, Jun 11, 2017 at 12:34 PM, Nikolay Aleksandrov
 wrote:
> On 11/06/17 11:55, Yotam Gigi wrote:
>> I have been looking into some weird behavior, and I am not sure whether it is
>> a bug or a feature.
>>
>> When a VIF with index v gets deleted, the MFC routes does not get updated, 
>> which
>> means that there can be routes pointing to that VIF. On datapath, when packet
>> hits that route, the VIF validity will be checked and will not be sent to 
>> that
>> device (but still, the route does not get updated).  Now, if the user creates
>> another VIF with the same index v but different underlay device, the same 
>> route
>> will forward the traffic to that device.
>>
>> It is relevant to mention that when user adds a MFC route, only the active 
>> VIFs
>> are used, so the flow of adding a route with dummy VIF indices and then
>> connecting those VIF indices to real device is not supported. The only way to
>> create a MFC route that has non existing VIFs is to create one with existing
>> VIFs and then delete them.
>>
>> Do we really want to support that?  To me, it looks like a buggy flow and I
>> suggest that upon VIF deletion, the MFC routes will be updated to not point 
>> to
>> any non existing VIF indices.
>>
>
> Hi Yotam,
> I'm not strongly against such change but my feeling is that we shouldn't 
> change it.
> I think we shouldn't change it because we cannot guarantee that we won't 
> break some
> user-space app that relies on this behaviour or uses it as an optimization.
> User-space ipmr apps work in sync with the kernel and are usually the only 
> ones
> doing such changes so their internal state will be always valid, and I'd guess
> they already deal with this one way or another.
> My second argument is a minor one and is about performance. There are some 
> apps
> (e.g. pimd) which use interface add/del on interface state change (up/down) 
> and
> this could make these ops slower on large setups.
>
> Again I see how this could be helpful and should've probably been like that 
> from the
> start, so if other people feel confident we won't break anything then I 
> wouldn't
> mind the change.
>
> Thanks,
>  Nik
>
>
>


[PATCH net-next v2] vxlan: dont migrate permanent fdb entries during learn

2017-06-11 Thread Roopa Prabhu
From: Roopa Prabhu 

This patch fixes vxlan_snoop to not move permanent fdb entries
on learn events. This is consistent with the bridge fdb
handling of permanent entries.

Fixes: 26a41ae60438 ("vxlan: only migrate dynamic FDB entries")
Signed-off-by: Roopa Prabhu 
---
v2 - added Fixes tag

 drivers/net/vxlan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 7cb21a0..e045c34 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -970,7 +970,7 @@ static bool vxlan_snoop(struct net_device *dev,
return false;
 
/* Don't migrate static entries, drop packets */
-   if (f->state & NUD_NOARP)
+   if (f->state & (NUD_PERMANENT | NUD_NOARP))
return true;
 
if (net_ratelimit())
-- 
1.9.1



Re: [PATCH net-next] vxlan: dont migrate permanent fdb entries during learn

2017-06-11 Thread Roopa Prabhu
On Sun, Jun 11, 2017 at 4:04 PM, David Miller  wrote:
> From: Roopa Prabhu 
> Date: Sun, 11 Jun 2017 15:51:22 -0700
>
>> From: Roopa Prabhu 
>>
>> This patch fixes vxlan_snoop to not move permanent fdb entries
>> on learn events. This is consistent with the bridge fdb
>> handling of permanent entries.
>>
>> Signed-off-by: Roopa Prabhu 
>
> Is there an appropriate Fixes: tag by chance?

Just checked history. It has been that way since beginning of time.
Initial vxlan implementation (year 2012) allowed migration of any
entry via learn. A 2013 commit 26a41ae60438 ("vxlan: only migrate
dynamic FDB entries") tried to fix it, but it added a check for static
entries only (NUD_NOARP) and missed permanent entries (NUD_PERMANENT).

I think we can add:
Fixes: 26a41ae60438 ("vxlan: only migrate dynamic FDB entries")

I can re-spin with this Fixes tag.


Re: [PATCH net-next] vxlan: dont migrate permanent fdb entries during learn

2017-06-11 Thread David Miller
From: Roopa Prabhu 
Date: Sun, 11 Jun 2017 15:51:22 -0700

> From: Roopa Prabhu 
> 
> This patch fixes vxlan_snoop to not move permanent fdb entries
> on learn events. This is consistent with the bridge fdb
> handling of permanent entries.
> 
> Signed-off-by: Roopa Prabhu 

Is there an appropriate Fixes: tag by chance?


[PATCH net-next] vxlan: dont migrate permanent fdb entries during learn

2017-06-11 Thread Roopa Prabhu
From: Roopa Prabhu 

This patch fixes vxlan_snoop to not move permanent fdb entries
on learn events. This is consistent with the bridge fdb
handling of permanent entries.

Signed-off-by: Roopa Prabhu 
---
 drivers/net/vxlan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 7cb21a0..e045c34 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -970,7 +970,7 @@ static bool vxlan_snoop(struct net_device *dev,
return false;
 
/* Don't migrate static entries, drop packets */
-   if (f->state & NUD_NOARP)
+   if (f->state & (NUD_PERMANENT | NUD_NOARP))
return true;
 
if (net_ratelimit())
-- 
1.9.1



Re: [PATCH net-next] bpf, arm64: take advantage of stack_depth tracking

2017-06-11 Thread David Miller
From: Daniel Borkmann 
Date: Sun, 11 Jun 2017 03:55:27 +0200

> Make use of recently implemented stack_depth tracking for arm64 JIT,
> so that stack usage can be reduced heavily for programs not using
> tail calls at least.
> 
> Signed-off-by: Daniel Borkmann 
> Acked-by: Alexei Starovoitov 

Applied, thanks Daniel.


Re: [PATCH net v3] net: ipmr: Fix some mroute forwarding issues in vrf's

2017-06-11 Thread David Miller
From: Donald Sharp 
Date: Sat, 10 Jun 2017 16:30:17 -0400

> This patch fixes two issues:
> 
> 1) When forwarding on *,G mroutes that are in a vrf, the
> kernel was dropping information about the actual incoming
> interface when calling ip_mr_forward from ip_mr_input.
> This caused ip_mr_forward to send the multicast packet
> back out the incoming interface.  Fix this by
> modifying ip_mr_forward to be handed the correctly
> resolved dev.
> 
> 2) When a unresolved cache entry is created we store
> the incoming skb on the unresolved cache entry and
> upon mroute resolution from the user space daemon,
> we attempt to forward the packet.  Again we were
> not resolving to the correct incoming device for
> a vrf scenario, before calling ip_mr_forward.
> Fix this by resolving to the correct interface
> and calling ip_mr_forward with the result.
> 
> Fixes: e58e41596811 ("net: Enable support for VRF with ipv4 multicast")
> Signed-off-by: Donald Sharp 
> ---
> v2: Fixed title
> v3: Addressed Review comments by Andrew Lunn and David Ahern

Applied and queued up for -stable, thank you.


Re: [pull request][net-next 0/9] Mellanox mlx5 updates 2017-06-11

2017-06-11 Thread David Miller
From: Saeed Mahameed 
Date: Sun, 11 Jun 2017 17:55:44 +0300

> This series provides updates to mlx5 header rewrite feature, from Or Gerlitz.
> and three more small updates From Maor and Eran.
> 
> For more details please see below.

Pulled, thanks.

> Please pull and let me know if there's any problem.
> *This series doesn't introduce any conflict with the ongoing net
> pull request.

Ok, thanks for letting me know.


Re: [PATCH 0/6] Constant Time Memory Comparisons Are Important

2017-06-11 Thread Emil Lenngren
2017-06-11 22:48 GMT+02:00 Emmanuel Grumbach :
> On Sun, Jun 11, 2017 at 4:36 PM, Kees Cook  wrote:
>>
>> On Sun, Jun 11, 2017 at 1:13 AM, Kalle Valo  wrote:
>> > "Jason A. Donenfeld"  writes:
>> >
>> >> Whenever you're comparing two MACs, it's important to do this using
>> >> crypto_memneq instead of memcmp. With memcmp, you leak timing information,
>> >> which could then be used to iteratively forge a MAC.
>> >
>> > Do you have any pointers where I could learn more about this?
>>
>> While not using C specifically, this talks about the problem generally:
>> https://www.chosenplaintext.ca/articles/beginners-guide-constant-time-cryptography.html
>>
>
> Sorry for the stupid question, but the MAC address is in plaintext in
> the air anyway or easily accessible via user space tools. I fail to
> see what it is so secret about a MAC address in that code where that
> same MAC address is accessible via myriads of ways.

I think you're mixing up Media Access Control (MAC) addresses with
Message Authentication Code (MAC). The second one is a cryptographic
signature of a message.


Re: [PATCH 0/6] Constant Time Memory Comparisons Are Important

2017-06-11 Thread Jason A. Donenfeld
Hi Stephan,

On Sun, Jun 11, 2017 at 11:06 PM, Stephan Müller  wrote:
> Are you planning to send an update to your patch set? If yes, there is another
> one which should be converted too: crypto/rsa-pkcs1pad.c.

I just sent an update to this thread patching that, per your
suggestion. Since these issues are expected to be cherry picked by
their respective committer, I figure we can just pile on the patches
here, listing the 0/6 intro email as each patch's parent.

Jason


Re: [PATCH 0/6] Constant Time Memory Comparisons Are Important

2017-06-11 Thread Stephan Müller
Am Samstag, 10. Juni 2017, 04:59:06 CEST schrieb Jason A. Donenfeld:

Hi Jason,

> Whenever you're comparing two MACs, it's important to do this using
> crypto_memneq instead of memcmp. With memcmp, you leak timing information,
> which could then be used to iteratively forge a MAC. This is far too basic
> of a mistake for us to have so pervasively in the year 2017, so let's begin
> cleaning this stuff up. The following 6 locations were found with some
> simple regex greps, but I'm sure more lurk below the surface. If you
> maintain some code or know somebody who maintains some code that deals
> with MACs, tell them to double check which comparison function they're
> using.

Are you planning to send an update to your patch set? If yes, there is another 
one which should be converted too: crypto/rsa-pkcs1pad.c.

Otherwise, I will send a patch converting this one.

Thanks.

Ciao
Stephan


Re: [PATCH 0/6] Constant Time Memory Comparisons Are Important

2017-06-11 Thread Emmanuel Grumbach
On Sun, Jun 11, 2017 at 4:36 PM, Kees Cook  wrote:
>
> On Sun, Jun 11, 2017 at 1:13 AM, Kalle Valo  wrote:
> > "Jason A. Donenfeld"  writes:
> >
> >> Whenever you're comparing two MACs, it's important to do this using
> >> crypto_memneq instead of memcmp. With memcmp, you leak timing information,
> >> which could then be used to iteratively forge a MAC.
> >
> > Do you have any pointers where I could learn more about this?
>
> While not using C specifically, this talks about the problem generally:
> https://www.chosenplaintext.ca/articles/beginners-guide-constant-time-cryptography.html
>

Sorry for the stupid question, but the MAC address is in plaintext in
the air anyway or easily accessible via user space tools. I fail to
see what it is so secret about a MAC address in that code where that
same MAC address is accessible via myriads of ways.


Re: [pull request][net 0/6] Mellanox mlx5 fixes 2017-06-11

2017-06-11 Thread David Miller
From: Saeed Mahameed 
Date: Sun, 11 Jun 2017 17:00:22 +0300

> This series contains some fixes for the mlx5 core and netdev driver.
> 
> Please pull and let me know if there's any problem.

Pulled, thanks.

> For -stable:
> ("net/mlx5e: Added BW check for DIM decision mechanism")  kernels 
> >= 4.9
> ("net/mlx5e: Fix wrong indications in DIM due to counter wraparound") kernels 
> >= 4.9
> ("net/mlx5: Remove several module events out of ethtool stats")   kernels 
> >= 4.10
> ("net/mlx5: Enable 4K UAR only when page size is bigger than 4K") kernels 
> >= 4.11
> 
> *all patches apply with no issue on their -stable.
> 
> BTW I never asked you if this way of marking patches to -stable is good for 
> you ?
> if you prefer another format please let me know.

It works fine for now, I'll let you know if it causes problems :)


Re: [PATCH net 0/9] Bugs fixes in ena ethernet driver

2017-06-11 Thread David Miller
From: 
Date: Sun, 11 Jun 2017 15:42:42 +0300

> This patchset contains fixes for the bugs that were discovered so far.

Series applied.


Re: [PATCH net-next v10 1/4] net netlink: Add new type NLA_FLAG_BITS

2017-06-11 Thread Jamal Hadi Salim

On 17-06-11 01:38 PM, Jamal Hadi Salim wrote:

On 17-06-11 09:49 AM, Jiri Pirko wrote:

Sun, Jun 11, 2017 at 01:53:43PM CEST, j...@mojatatu.com wrote:

From: Jamal Hadi Salim 




This patch also provides an extra feature: a validation callback
that could be speaciliazed for other types.


s/speaciliazed/speciliazed/



Will fix.




[ATTR_GOO] = { .type = MYTYPE,
   .validation_data = _data,
   .validate_content = mycontent_validator },


Indent is wrong. (Does not matter really in desc, but anyway)



I cant find out how it got indented that way; my source
or email dont show it as such (but really doesnt matter).



Suggested-by: Jiri Pirko 



Will add.




---
include/net/netlink.h  | 11 +++
include/uapi/linux/rtnetlink.h | 17 +
lib/nlattr.c   | 25 +
3 files changed, 53 insertions(+)

diff --git a/include/net/netlink.h b/include/net/netlink.h
index 0170917..8ab9784 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -6,6 +6,11 @@
#include 
#include 

+struct nla_bit_flags {
+u32 nla_flag_values;
+u32 nla_flag_selector;
+};


I don't understand why you redefine the struct here. You already have it
defined in the uapi: struct __nla_bit_flags

Just move this (struct nla_bit_flags) to the uapi and remove
__nla_bit_flags ?



I am not sure that will compile since the type is defined in netlink.h
Also, note: uapi uses _u32 and kernel uses u32 as types i.e it is pretty
common approach; i will try to move it to uapi and keep that uapi
format. If it doesnt compile without acrobatics I will keep it as is.



It doesnt compile - I could move it to linux/netlink.h but it seems
so out of place.
so i will keep things as is for now unless you can think of something
else.

cheers,
jamal


Re: [PATCH net-next v10 3/4] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch

2017-06-11 Thread Jamal Hadi Salim

On 17-06-11 10:13 AM, Jiri Pirko wrote:

Sun, Jun 11, 2017 at 01:53:45PM CEST, j...@mojatatu.com wrote:

From: Jamal Hadi Salim 

When you dump hundreds of thousands of actions, getting only 32 per
dump batch even when the socket buffer and memory allocations allow
is inefficient.

With this change, the user will get as many as possibly fitting
within the given constraints available to the kernel.

The top level action TLV space is extended. An attribute
TCA_ROOT_FLAGS is used to carry flags; flag TCA_FLAG_LARGE_DUMP_ON
is set by the user indicating the user is capable of processing
these large dumps. Older user space which doesnt set this flag
doesnt get the large (than 32) batches.
The kernel uses the TCA_ROOT_COUNT attribute to tell the user how many
actions are put in a single batch. As such user space app knows how long
to iterate (independent of the type of action being dumped)
instead of hardcoded maximum of 32 thus maintaining backward compat.

Some results dumping 1.5M actions below:
first an unpatched tc which doesnt understand these features...

prompt$ time -p tc actions ls action gact | grep index | wc -l
150
real 1388.43
user 2.07
sys 1386.79

Now lets see a patched tc which sets the correct flags when requesting
a dump:

prompt$ time -p updatedtc actions ls action gact | grep index | wc -l
150
real 178.13
user 2.02
sys 176.96

That is about 8x performance improvement for tc app which sets its
receive buffer to about 32K.

Signed-off-by: Jamal Hadi Salim 
---
include/uapi/linux/rtnetlink.h | 22 +--
net/sched/act_api.c| 50 +-
2 files changed, 60 insertions(+), 12 deletions(-)

diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 8f07957..7d2030f 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -693,10 +693,28 @@ struct tcamsg {
unsigned char   tca__pad1;
unsigned short  tca__pad2;
};
+
+enum {
+   TCA_ROOT_UNSPEC,
+   TCA_ROOT_TAB,
+#define TCA_ACT_TAB TCA_ROOT_TAB
+#define TCAA_MAX TCA_ROOT_TAB
+   TCA_ROOT_FLAGS,
+   TCA_ROOT_COUNT,
+   __TCA_ROOT_MAX,
+#defineTCA_ROOT_MAX (__TCA_ROOT_MAX - 1)
+};
+
#define TA_RTA(r)  ((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct 
tcamsg
#define TA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct tcamsg))
-#define TCA_ACT_TAB 1 /* attr type must be >=1 */   
-#define TCAA_MAX 1
+/* tcamsg flags stored in attribute TCA_ROOT_FLAGS
+ *
+ * TCA_FLAG_LARGE_DUMP_ON user->kernel to request for larger than 
TCA_ACT_MAX_PRIO
+ * actions in a dump. All dump responses will contain the number of actions
+ * being dumped stored in for user app's consumption in TCA_ROOT_COUNT
+ *
+ */
+#define TCA_FLAG_LARGE_DUMP_ON (1 << 0)

/* New extended info filters for IFLA_EXT_MASK */
#define RTEXT_FILTER_VF (1 << 0)
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 400eb6e..935dc19 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -110,6 +110,7 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, 
struct sk_buff *skb,
   struct netlink_callback *cb)
{
int err = 0, index = -1, i = 0, s_i = 0, n_i = 0;
+   u32 act_flags = cb->args[2];
struct nlattr *nest;

spin_lock_bh(>lock);
@@ -138,14 +139,18 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, 
struct sk_buff *skb,
}
nla_nest_end(skb, nest);
n_i++;
-   if (n_i >= TCA_ACT_MAX_PRIO)
+   if (!(act_flags & TCA_FLAG_LARGE_DUMP_ON) &&
+   n_i >= TCA_ACT_MAX_PRIO)
goto done;
}
}
done:
spin_unlock_bh(>lock);
-   if (n_i)
+   if (n_i) {
cb->args[0] += n_i;
+   if (act_flags & TCA_FLAG_LARGE_DUMP_ON)
+   cb->args[1] = n_i;
+   }
return n_i;

nla_put_failure:
@@ -1068,11 +1073,17 @@ static int tcf_action_add(struct net *net, struct 
nlattr *nla,
return tcf_add_notify(net, n, , portid);
}

+static u32 allowed_flags = TCA_FLAG_LARGE_DUMP_ON;


An empty line would be nice. Also, since this is outside a function, some
context prefix would be nice:
static u32 tcaa_root_flags_allowed = TCA_FLAG_LARGE_DUMP_ON;



You are going to make me exceed the 80 char limit? ;->



+   if (tb[TCA_ROOT_FLAGS])
+   nla_memcpy(_flags, tb[TCA_ROOT_FLAGS],
+  sizeof(select_flags));


Please introduce a helper for this attr type in patch 1:

u32 select_flags;

select_flags = nla_get_flag_bits_values(tb[TCA_ROOT_FLAGS])



Sure.

cheers,
jamal


Re: [PATCH net-next v10 1/4] net netlink: Add new type NLA_FLAG_BITS

2017-06-11 Thread Jamal Hadi Salim

On 17-06-11 09:49 AM, Jiri Pirko wrote:

Sun, Jun 11, 2017 at 01:53:43PM CEST, j...@mojatatu.com wrote:

From: Jamal Hadi Salim 




This patch also provides an extra feature: a validation callback
that could be speaciliazed for other types.


s/speaciliazed/speciliazed/



Will fix.




[ATTR_GOO] = { .type = MYTYPE,
   .validation_data = _data,
   .validate_content = mycontent_validator },


Indent is wrong. (Does not matter really in desc, but anyway)



I cant find out how it got indented that way; my source
or email dont show it as such (but really doesnt matter).



Suggested-by: Jiri Pirko 



Will add.




---
include/net/netlink.h  | 11 +++
include/uapi/linux/rtnetlink.h | 17 +
lib/nlattr.c   | 25 +
3 files changed, 53 insertions(+)

diff --git a/include/net/netlink.h b/include/net/netlink.h
index 0170917..8ab9784 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -6,6 +6,11 @@
#include 
#include 

+struct nla_bit_flags {
+   u32 nla_flag_values;
+   u32 nla_flag_selector;
+};


I don't understand why you redefine the struct here. You already have it
defined in the uapi: struct __nla_bit_flags

Just move this (struct nla_bit_flags) to the uapi and remove
__nla_bit_flags ?



I am not sure that will compile since the type is defined in netlink.h
Also, note: uapi uses _u32 and kernel uses u32 as types i.e it is pretty
common approach; i will try to move it to uapi and keep that uapi
format. If it doesnt compile without acrobatics I will keep it as is.





+





diff --git a/lib/nlattr.c b/lib/nlattr.c
index a7e0b16..78fed43 100644
--- a/lib/nlattr.c
+++ b/lib/nlattr.c
@@ -27,6 +27,21 @@
[NLA_S64]   = sizeof(s64),
};

+static int validate_nla_bit_flags(const struct nlattr *nla, void *valid_data)
+{
+   const struct nla_bit_flags *nbf = nla_data(nla);
+   u32 *valid_flags_mask = valid_data;
+
+   if (!valid_data)
+   return -EINVAL;
+
+


Avoid one empty line here (you have 2)



Will fix.




+   if (nbf->nla_flag_values & ~*valid_flags_mask)
+   return -EINVAL;
+
+   return 0;
+}
+
static int validate_nla(const struct nlattr *nla, int maxtype,
const struct nla_policy *policy)
{
@@ -46,6 +61,13 @@ static int validate_nla(const struct nlattr *nla, int 
maxtype,
return -ERANGE;
break;

+   case NLA_FLAG_BITS:
+   if (attrlen != 8) /* 2 x 32 bits */


sizeof(struct nla_bit_flags) instead of 8 please, you can skip the
comment then.



Good point.




+   return -ERANGE;
+
+   return validate_nla_bit_flags(nla, pt->validation_data);
+   break;
+
case NLA_NUL_STRING:
if (pt->len)
minlen = min_t(int, attrlen, pt->len + 1);
@@ -103,6 +125,9 @@ static int validate_nla(const struct nlattr *nla, int 
maxtype,
return -ERANGE;
}

+   if (pt->validate_content)
+   return pt->validate_content(nla, pt->validation_data);


This validation mechanism is completely independent from the added NLA_FLAG_BITS
attr as it could be used with other attribute types. Please have it as a
separate patch.




Ok - so one more patch then ;->

cheers,
jamal


Re:

2017-06-11 Thread Sai al
I need your partnership in a transaction that will benefit you,
details will disclose to you once i receive your reply.

Kind regards
Saif.


I Would love to discuse something with you.

2017-06-11 Thread Dave Dawes
 


Re: ARM GLX Khadas VIM Pro - Ethernet detected as only 10Mbps and stalled after some traffic

2017-06-11 Thread crow
Hi Andrew,

On Sun, Jun 11, 2017 at 5:21 PM, Andrew Lunn  wrote:
>> Thank your for the suggestion, and thanks Martin to explaining me over
>> IRC what actually I should do.
>>
>> I recompiled mainline kernel 4.12-rc4 with the Amlogic driver:
>> replaced drivers/net/phy/meson-gxl.c with
>> https://github.com/khadas/linux/blob/ubuntu-4.9/drivers/amlogic/ethernet/phy/amlogic.c
>>
>> But this did not solve the issue. As soon as i start git clone i lose
>> network connection to device (no session timeout/disconnect this time,
>> but I am unable to reconnect over SSH or to get OK ping replay back).
>

1) First problem reported I can't reproduce anymore, every reboot/cold
boot with mainline kernel the Ethernet speed is detected as
"100Mbps/Full" , but as seen in first post there was this issue.

2) see below

> So this shows it is more than a PHY problem. The Ethernet MAC driver
> is probably also part of the problem.

There are some stmmac fixes [1] in soon to be rc5, compiled current
master (without amlogic.c) with the fixes but for me the issue still
persist. I will compile once released rc5 with amlogic.c and report
back.

> Are there any mainline kernels which work O.K?

Khadas VIM support was added in 4.12-rc1. And I did test all four rc's
but without success.

> Andrew

[1] 
https://github.com/torvalds/linux/commit/426849e6611f2092553f8d53372ae310818a6292


Re: ipmr: MFC routes when VIF deleted

2017-06-11 Thread Nikolay Aleksandrov
On 11/06/17 11:55, Yotam Gigi wrote:
> I have been looking into some weird behavior, and I am not sure whether it is
> a bug or a feature.
> 
> When a VIF with index v gets deleted, the MFC routes does not get updated, 
> which
> means that there can be routes pointing to that VIF. On datapath, when packet
> hits that route, the VIF validity will be checked and will not be sent to that
> device (but still, the route does not get updated).  Now, if the user creates
> another VIF with the same index v but different underlay device, the same 
> route
> will forward the traffic to that device.
> 
> It is relevant to mention that when user adds a MFC route, only the active 
> VIFs
> are used, so the flow of adding a route with dummy VIF indices and then
> connecting those VIF indices to real device is not supported. The only way to
> create a MFC route that has non existing VIFs is to create one with existing
> VIFs and then delete them.
> 
> Do we really want to support that?  To me, it looks like a buggy flow and I
> suggest that upon VIF deletion, the MFC routes will be updated to not point to
> any non existing VIF indices.
> 

Hi Yotam,
I'm not strongly against such change but my feeling is that we shouldn't change 
it.
I think we shouldn't change it because we cannot guarantee that we won't break 
some
user-space app that relies on this behaviour or uses it as an optimization.
User-space ipmr apps work in sync with the kernel and are usually the only ones
doing such changes so their internal state will be always valid, and I'd guess
they already deal with this one way or another.
My second argument is a minor one and is about performance. There are some apps
(e.g. pimd) which use interface add/del on interface state change (up/down) and
this could make these ops slower on large setups.

Again I see how this could be helpful and should've probably been like that 
from the
start, so if other people feel confident we won't break anything then I wouldn't
mind the change.

Thanks,
 Nik





Re: ARM GLX Khadas VIM Pro - Ethernet detected as only 10Mbps and stalled after some traffic

2017-06-11 Thread Andrew Lunn
> Thank your for the suggestion, and thanks Martin to explaining me over
> IRC what actually I should do.
> 
> I recompiled mainline kernel 4.12-rc4 with the Amlogic driver:
> replaced drivers/net/phy/meson-gxl.c with
> https://github.com/khadas/linux/blob/ubuntu-4.9/drivers/amlogic/ethernet/phy/amlogic.c
> 
> But this did not solve the issue. As soon as i start git clone i lose
> network connection to device (no session timeout/disconnect this time,
> but I am unable to reconnect over SSH or to get OK ping replay back).

So this shows it is more than a PHY problem. The Ethernet MAC driver
is probably also part of the problem.

Are there any mainline kernels which work O.K?

Andrew


[net-next 4/9] net/mlx5e: Use modify header ID cache for offloaded TC E-Switch flows

2017-06-11 Thread Saeed Mahameed
From: Or Gerlitz 

Use the modify header ID cache for the header re-write part of offloading
TC eswitch flows.

Signed-off-by: Or Gerlitz 
Reviewed-by: Paul Blakey 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index e1217c2279ab..4625a0e226da 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -344,10 +344,7 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
}
 
if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) {
-   err = mlx5_modify_header_alloc(priv->mdev, 
MLX5_FLOW_NAMESPACE_FDB,
-  parse_attr->num_mod_hdr_actions,
-  parse_attr->mod_hdr_actions,
-  >mod_hdr_id);
+   err = mlx5e_attach_mod_hdr(priv, flow, parse_attr);
kfree(parse_attr->mod_hdr_actions);
if (err) {
rule = ERR_PTR(err);
@@ -363,8 +360,7 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
 
 err_add_rule:
if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
-   mlx5_modify_header_dealloc(priv->mdev,
-  attr->mod_hdr_id);
+   mlx5e_detach_mod_hdr(priv, flow);
 err_mod_hdr:
mlx5_eswitch_del_vlan_action(esw, attr);
 err_add_vlan:
@@ -392,8 +388,7 @@ static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv,
}
 
if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
-   mlx5_modify_header_dealloc(priv->mdev,
-  attr->mod_hdr_id);
+   mlx5e_detach_mod_hdr(priv, flow);
 }
 
 void mlx5e_tc_encap_flows_add(struct mlx5e_priv *priv,
-- 
2.11.0



[net-next 2/9] net/mlx5e: Use short attribute form when adding/deleting offloaded TC flows

2017-06-11 Thread Saeed Mahameed
From: Or Gerlitz 

Instead of going through flow->nic/esw_attr for each usage, assign
an attr pointer per the context (nic or esw) and use that.

This patch doesn't add any functionality.

Signed-off-by: Or Gerlitz 
Reviewed-by: Roi Dayan 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index cfb32fe5129d..a9feddc31667 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -177,6 +177,7 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
 static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv,
  struct mlx5e_tc_flow *flow)
 {
+   struct mlx5_nic_flow_attr *attr = flow->nic_attr;
struct mlx5_fc *counter = NULL;
 
counter = mlx5_flow_rule_counter(flow->rule);
@@ -188,9 +189,9 @@ static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv,
priv->fs.tc.t = NULL;
}
 
-   if (flow->nic_attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
+   if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
mlx5_modify_header_dealloc(priv->mdev,
-  flow->nic_attr->mod_hdr_id);
+  attr->mod_hdr_id);
 }
 
 static void mlx5e_detach_encap(struct mlx5e_priv *priv,
@@ -231,7 +232,7 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
return rule;
 
 err_add_rule:
-   if (flow->esw_attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
+   if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
mlx5_modify_header_dealloc(priv->mdev,
   attr->mod_hdr_id);
 err_mod_hdr:
@@ -250,17 +251,17 @@ static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv,
 
if (flow->flags & MLX5E_TC_FLOW_OFFLOADED) {
flow->flags &= ~MLX5E_TC_FLOW_OFFLOADED;
-   mlx5_eswitch_del_offloaded_rule(esw, flow->rule, 
flow->esw_attr);
+   mlx5_eswitch_del_offloaded_rule(esw, flow->rule, attr);
}
 
-   mlx5_eswitch_del_vlan_action(esw, flow->esw_attr);
+   mlx5_eswitch_del_vlan_action(esw, attr);
 
-   if (flow->esw_attr->action & MLX5_FLOW_CONTEXT_ACTION_ENCAP) {
+   if (attr->action & MLX5_FLOW_CONTEXT_ACTION_ENCAP) {
mlx5e_detach_encap(priv, flow);
-   kvfree(flow->esw_attr->parse_attr);
+   kvfree(attr->parse_attr);
}
 
-   if (flow->esw_attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
+   if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
mlx5_modify_header_dealloc(priv->mdev,
   attr->mod_hdr_id);
 }
-- 
2.11.0



[net-next 9/9] net/mlx5e: Fill advertised and supported port data from Hardware info

2017-06-11 Thread Saeed Mahameed
From: Eran Ben Elisha 

Translate hardware port connector type data into link mode supported and
advertised info instead of caching it in driver.

Signed-off-by: Eran Ben Elisha 
Signed-off-by: Saeed Mahameed 
---
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 90 ++
 1 file changed, 74 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index f03c2d088d0c..b4514f247402 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -723,24 +723,81 @@ static void ptys2ethtool_adver_link(unsigned long 
*advertising_modes,
  __ETHTOOL_LINK_MODE_MASK_NBITS);
 }
 
-static void ptys2ethtool_supported_port(struct ethtool_link_ksettings 
*link_ksettings,
-   u32 eth_proto_cap)
+static void ptys2ethtool_supported_advertised_port(struct 
ethtool_link_ksettings *link_ksettings,
+  u32 eth_proto_cap,
+  u8 connector_type)
 {
-   if (eth_proto_cap & (MLX5E_PROT_MASK(MLX5E_10GBASE_CR)
-  | MLX5E_PROT_MASK(MLX5E_10GBASE_SR)
-  | MLX5E_PROT_MASK(MLX5E_40GBASE_CR4)
-  | MLX5E_PROT_MASK(MLX5E_40GBASE_SR4)
-  | MLX5E_PROT_MASK(MLX5E_100GBASE_SR4)
-  | MLX5E_PROT_MASK(MLX5E_1000BASE_CX_SGMII))) {
-   ethtool_link_ksettings_add_link_mode(link_ksettings, supported, 
FIBRE);
+   if (!connector_type || connector_type >= MLX5E_CONNECTOR_TYPE_NUMBER) {
+   if (eth_proto_cap & (MLX5E_PROT_MASK(MLX5E_10GBASE_CR)
+  | MLX5E_PROT_MASK(MLX5E_10GBASE_SR)
+  | MLX5E_PROT_MASK(MLX5E_40GBASE_CR4)
+  | MLX5E_PROT_MASK(MLX5E_40GBASE_SR4)
+  | MLX5E_PROT_MASK(MLX5E_100GBASE_SR4)
+  | MLX5E_PROT_MASK(MLX5E_1000BASE_CX_SGMII))) 
{
+   ethtool_link_ksettings_add_link_mode(link_ksettings,
+supported,
+FIBRE);
+   ethtool_link_ksettings_add_link_mode(link_ksettings,
+advertising,
+FIBRE);
+   }
+
+   if (eth_proto_cap & (MLX5E_PROT_MASK(MLX5E_100GBASE_KR4)
+  | MLX5E_PROT_MASK(MLX5E_40GBASE_KR4)
+  | MLX5E_PROT_MASK(MLX5E_10GBASE_KR)
+  | MLX5E_PROT_MASK(MLX5E_10GBASE_KX4)
+  | MLX5E_PROT_MASK(MLX5E_1000BASE_KX))) {
+   ethtool_link_ksettings_add_link_mode(link_ksettings,
+supported,
+Backplane);
+   ethtool_link_ksettings_add_link_mode(link_ksettings,
+advertising,
+Backplane);
+   }
+   return;
}
 
-   if (eth_proto_cap & (MLX5E_PROT_MASK(MLX5E_100GBASE_KR4)
-  | MLX5E_PROT_MASK(MLX5E_40GBASE_KR4)
-  | MLX5E_PROT_MASK(MLX5E_10GBASE_KR)
-  | MLX5E_PROT_MASK(MLX5E_10GBASE_KX4)
-  | MLX5E_PROT_MASK(MLX5E_1000BASE_KX))) {
-   ethtool_link_ksettings_add_link_mode(link_ksettings, supported, 
Backplane);
+   switch (connector_type) {
+   case MLX5E_PORT_TP:
+   ethtool_link_ksettings_add_link_mode(link_ksettings,
+supported, TP);
+   ethtool_link_ksettings_add_link_mode(link_ksettings,
+advertising, TP);
+   break;
+   case MLX5E_PORT_AUI:
+   ethtool_link_ksettings_add_link_mode(link_ksettings,
+supported, AUI);
+   ethtool_link_ksettings_add_link_mode(link_ksettings,
+advertising, AUI);
+   break;
+   case MLX5E_PORT_BNC:
+   ethtool_link_ksettings_add_link_mode(link_ksettings,
+supported, BNC);
+   ethtool_link_ksettings_add_link_mode(link_ksettings,
+advertising, BNC);
+  

[net-next 8/9] net/mlx5e: Add support for reading connector type from PTYS

2017-06-11 Thread Saeed Mahameed
From: Eran Ben Elisha 

Read port connector type from the firmware instead of caching it in the
driver metadata.

Signed-off-by: Eran Ben Elisha 
Signed-off-by: Saeed Mahameed 
---
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 22 --
 include/linux/mlx5/mlx5_ifc.h  |  7 +--
 include/linux/mlx5/port.h  | 13 +
 3 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index e9e33fd68279..f03c2d088d0c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -809,8 +809,23 @@ static void get_advertising(u32 eth_proto_cap, u8 tx_pause,
ethtool_link_ksettings_add_link_mode(link_ksettings, 
advertising, Asym_Pause);
 }
 
-static u8 get_connector_port(u32 eth_proto)
+static int ptys2connector_type[MLX5E_CONNECTOR_TYPE_NUMBER] = {
+   [MLX5E_PORT_UNKNOWN]= PORT_OTHER,
+   [MLX5E_PORT_NONE]   = PORT_NONE,
+   [MLX5E_PORT_TP] = PORT_TP,
+   [MLX5E_PORT_AUI]= PORT_AUI,
+   [MLX5E_PORT_BNC]= PORT_BNC,
+   [MLX5E_PORT_MII]= PORT_MII,
+   [MLX5E_PORT_FIBRE]  = PORT_FIBRE,
+   [MLX5E_PORT_DA] = PORT_DA,
+   [MLX5E_PORT_OTHER]  = PORT_OTHER,
+   };
+
+static u8 get_connector_port(u32 eth_proto, u8 connector_type)
 {
+   if (connector_type && connector_type < MLX5E_CONNECTOR_TYPE_NUMBER)
+   return ptys2connector_type[connector_type];
+
if (eth_proto & (MLX5E_PROT_MASK(MLX5E_10GBASE_SR)
 | MLX5E_PROT_MASK(MLX5E_40GBASE_SR4)
 | MLX5E_PROT_MASK(MLX5E_100GBASE_SR4)
@@ -856,6 +871,7 @@ static int mlx5e_get_link_ksettings(struct net_device 
*netdev,
u32 eth_proto_oper;
u8 an_disable_admin;
u8 an_status;
+   u8 connector_type;
int err;
 
err = mlx5_query_port_ptys(mdev, out, sizeof(out), MLX5_PTYS_EN, 1);
@@ -871,6 +887,7 @@ static int mlx5e_get_link_ksettings(struct net_device 
*netdev,
eth_proto_lp = MLX5_GET(ptys_reg, out, eth_proto_lp_advertise);
an_disable_admin = MLX5_GET(ptys_reg, out, an_disable_admin);
an_status= MLX5_GET(ptys_reg, out, an_status);
+   connector_type   = MLX5_GET(ptys_reg, out, connector_type);
 
mlx5_query_port_pause(mdev, _pause, _pause);
 
@@ -883,7 +900,8 @@ static int mlx5e_get_link_ksettings(struct net_device 
*netdev,
 
eth_proto_oper = eth_proto_oper ? eth_proto_oper : eth_proto_cap;
 
-   link_ksettings->base.port = get_connector_port(eth_proto_oper);
+   link_ksettings->base.port = get_connector_port(eth_proto_oper,
+  connector_type);
get_lp_advertising(eth_proto_lp, link_ksettings);
 
if (an_status == MLX5_AN_COMPLETE)
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index ec308657af3b..32b044e953d2 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -7295,7 +7295,8 @@ struct mlx5_ifc_ptys_reg_bits {
u8 ib_link_width_oper[0x10];
u8 ib_proto_oper[0x10];
 
-   u8 reserved_at_160[0x20];
+   u8 reserved_at_160[0x1c];
+   u8 connector_type[0x4];
 
u8 eth_proto_lp_advertise[0x20];
 
@@ -7698,8 +7699,10 @@ struct mlx5_ifc_peir_reg_bits {
 };
 
 struct mlx5_ifc_pcam_enhanced_features_bits {
-   u8 reserved_at_0[0x7e];
+   u8 reserved_at_0[0x7c];
 
+   u8 ptys_connector_type[0x1];
+   u8 reserved_at_7d[0x1];
u8 ppcnt_discard_group[0x1];
u8 ppcnt_statistical_group[0x1];
 };
diff --git a/include/linux/mlx5/port.h b/include/linux/mlx5/port.h
index e527732fb31b..c57d4b7de3a8 100644
--- a/include/linux/mlx5/port.h
+++ b/include/linux/mlx5/port.h
@@ -92,6 +92,19 @@ enum mlx5e_link_mode {
MLX5E_LINK_MODES_NUMBER,
 };
 
+enum mlx5e_connector_type {
+   MLX5E_PORT_UNKNOWN  = 0,
+   MLX5E_PORT_NONE = 1,
+   MLX5E_PORT_TP   = 2,
+   MLX5E_PORT_AUI  = 3,
+   MLX5E_PORT_BNC  = 4,
+   MLX5E_PORT_MII  = 5,
+   MLX5E_PORT_FIBRE= 6,
+   MLX5E_PORT_DA   = 7,
+   MLX5E_PORT_OTHER= 8,
+   MLX5E_CONNECTOR_TYPE_NUMBER,
+};
+
 #define MLX5E_PROT_MASK(link_mode) (1 << link_mode)
 
 #define PORT_MODULE_EVENT_MODULE_STATUS_MASK 0xF
-- 
2.11.0



[net-next 6/9] net/mlx5e: Support header re-write of partial fields in TC pedit offload

2017-06-11 Thread Saeed Mahameed
From: Or Gerlitz 

Using a per field mask field, the TC pedit action supports modifying
partial fields. E.g if using the TC tool, the following example would
make the kernel to only re-write two bytes of the src ip address:

tc filter add dev enp1s0 protocol ip parent : prio 30
flower skip_sw ip_proto udp dst_port 8001
action pedit ex munge ip src set 10.1.0.0 retain 0x

We add driver support for offload these partial re-writes, by setting
the per FW action offset-in-field and length-from-offset attributes.

The 1st bit set in the mask specifies both the offset and the right
shift to apply on the value such that the 1st bit which needs to be
set will reside in bit 0 of the FW data field.

Signed-off-by: Or Gerlitz 
Reviewed-by: Paul Blakey 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 28 +
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 0c7a0872b22b..f5afacfbe914 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -1091,12 +1091,14 @@ static int offload_pedit_fields(struct pedit_headers 
*masks,
struct mlx5e_tc_flow_parse_attr *parse_attr)
 {
struct pedit_headers *set_masks, *add_masks, *set_vals, *add_vals;
-   int i, action_size, nactions, max_actions, first, last, first_z;
+   int i, action_size, nactions, max_actions, first, last, next_z;
void *s_masks_p, *a_masks_p, *vals_p;
struct mlx5_fields *f;
u8 cmd, field_bsize;
u32 s_mask, a_mask;
unsigned long mask;
+   __be32 mask_be32;
+   __be16 mask_be16;
void *action;
 
set_masks = [TCA_PEDIT_KEY_EX_CMD_SET];
@@ -1150,11 +1152,19 @@ static int offload_pedit_fields(struct pedit_headers 
*masks,
 
field_bsize = f->size * BITS_PER_BYTE;
 
-   first_z = find_first_zero_bit(, field_bsize);
+   if (field_bsize == 32) {
+   mask_be32 = *(__be32 *)
+   mask = (__force unsigned 
long)cpu_to_le32(be32_to_cpu(mask_be32));
+   } else if (field_bsize == 16) {
+   mask_be16 = *(__be16 *)
+   mask = (__force unsigned 
long)cpu_to_le16(be16_to_cpu(mask_be16));
+   }
+
first = find_first_bit(, field_bsize);
+   next_z = find_next_zero_bit(, field_bsize, first);
last  = find_last_bit(, field_bsize);
-   if (first > 0 || last != (field_bsize - 1) || first_z < last) {
-   printk(KERN_WARNING "mlx5: partial rewrite (mask %lx) 
is currently not offloaded\n",
+   if (first < next_z && next_z < last) {
+   printk(KERN_WARNING "mlx5: rewrite of few sub-fields 
(mask %lx) isn't offloaded\n",
   mask);
return -EOPNOTSUPP;
}
@@ -1163,17 +1173,17 @@ static int offload_pedit_fields(struct pedit_headers 
*masks,
MLX5_SET(set_action_in, action, field, f->field);
 
if (cmd == MLX5_ACTION_TYPE_SET) {
-   MLX5_SET(set_action_in, action, offset, 0);
+   MLX5_SET(set_action_in, action, offset, first);
/* length is num of bits to be written, zero means 
length of 32 */
-   MLX5_SET(set_action_in, action, length, field_bsize);
+   MLX5_SET(set_action_in, action, length, (last - first + 
1));
}
 
if (field_bsize == 32)
-   MLX5_SET(set_action_in, action, data, ntohl(*(__be32 
*)vals_p));
+   MLX5_SET(set_action_in, action, data, ntohl(*(__be32 
*)vals_p) >> first);
else if (field_bsize == 16)
-   MLX5_SET(set_action_in, action, data, ntohs(*(__be16 
*)vals_p));
+   MLX5_SET(set_action_in, action, data, ntohs(*(__be16 
*)vals_p) >> first);
else if (field_bsize == 8)
-   MLX5_SET(set_action_in, action, data, *(u8 *)vals_p);
+   MLX5_SET(set_action_in, action, data, *(u8 *)vals_p >> 
first);
 
action += action_size;
nactions++;
-- 
2.11.0



[net-next 7/9] net/mlx5: Update flow table commands layout

2017-06-11 Thread Saeed Mahameed
From: Maor Gottlieb 

Update struct mlx5_ifc_create(modify)_flow_table_bits according to
the last device specification.

Signed-off-by: Maor Gottlieb 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c | 32 ++---
 include/linux/mlx5/mlx5_ifc.h| 46 +++-
 2 files changed, 40 insertions(+), 38 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
index abb44a268563..e750f07793b8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
@@ -78,28 +78,33 @@ int mlx5_cmd_create_flow_table(struct mlx5_core_dev *dev,
 MLX5_CMD_OP_CREATE_FLOW_TABLE);
 
MLX5_SET(create_flow_table_in, in, table_type, type);
-   MLX5_SET(create_flow_table_in, in, level, level);
-   MLX5_SET(create_flow_table_in, in, log_size, log_size);
+   MLX5_SET(create_flow_table_in, in, flow_table_context.level, level);
+   MLX5_SET(create_flow_table_in, in, flow_table_context.log_size, 
log_size);
if (vport) {
MLX5_SET(create_flow_table_in, in, vport_number, vport);
MLX5_SET(create_flow_table_in, in, other_vport, 1);
}
 
-   MLX5_SET(create_flow_table_in, in, decap_en, en_encap_decap);
-   MLX5_SET(create_flow_table_in, in, encap_en, en_encap_decap);
+   MLX5_SET(create_flow_table_in, in, flow_table_context.decap_en,
+en_encap_decap);
+   MLX5_SET(create_flow_table_in, in, flow_table_context.encap_en,
+en_encap_decap);
 
switch (op_mod) {
case FS_FT_OP_MOD_NORMAL:
if (next_ft) {
-   MLX5_SET(create_flow_table_in, in, table_miss_mode, 1);
-   MLX5_SET(create_flow_table_in, in, table_miss_id, 
next_ft->id);
+   MLX5_SET(create_flow_table_in, in,
+flow_table_context.table_miss_action, 1);
+   MLX5_SET(create_flow_table_in, in,
+flow_table_context.table_miss_id, next_ft->id);
}
break;
 
case FS_FT_OP_MOD_LAG_DEMUX:
MLX5_SET(create_flow_table_in, in, op_mod, 0x1);
if (next_ft)
-   MLX5_SET(create_flow_table_in, in, 
lag_master_next_table_id,
+   MLX5_SET(create_flow_table_in, in,
+flow_table_context.lag_master_next_table_id,
 next_ft->id);
break;
}
@@ -146,10 +151,10 @@ int mlx5_cmd_modify_flow_table(struct mlx5_core_dev *dev,
 MLX5_MODIFY_FLOW_TABLE_LAG_NEXT_TABLE_ID);
if (next_ft) {
MLX5_SET(modify_flow_table_in, in,
-lag_master_next_table_id, next_ft->id);
+flow_table_context.lag_master_next_table_id, 
next_ft->id);
} else {
MLX5_SET(modify_flow_table_in, in,
-lag_master_next_table_id, 0);
+flow_table_context.lag_master_next_table_id, 
0);
}
} else {
if (ft->vport) {
@@ -160,11 +165,14 @@ int mlx5_cmd_modify_flow_table(struct mlx5_core_dev *dev,
MLX5_SET(modify_flow_table_in, in, modify_field_select,
 MLX5_MODIFY_FLOW_TABLE_MISS_TABLE_ID);
if (next_ft) {
-   MLX5_SET(modify_flow_table_in, in, table_miss_mode, 1);
-   MLX5_SET(modify_flow_table_in, in, table_miss_id,
+   MLX5_SET(modify_flow_table_in, in,
+flow_table_context.table_miss_action, 1);
+   MLX5_SET(modify_flow_table_in, in,
+flow_table_context.table_miss_id,
 next_ft->id);
} else {
-   MLX5_SET(modify_flow_table_in, in, table_miss_mode, 0);
+   MLX5_SET(modify_flow_table_in, in,
+flow_table_context.table_miss_action, 0);
}
}
 
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 56e96f6a0a45..ec308657af3b 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -6627,6 +6627,24 @@ struct mlx5_ifc_create_flow_table_out_bits {
u8 reserved_at_60[0x20];
 };
 
+struct mlx5_ifc_flow_table_context_bits {
+   u8 encap_en[0x1];
+   u8 decap_en[0x1];
+   u8 reserved_at_2[0x2];
+   u8 table_miss_action[0x4];
+   u8 level[0x8];
+   u8 reserved_at_10[0x8];
+   u8 

[net-next 1/9] net/mlx5e: Remove limitation of single NIC offloaded TC action per rule

2017-06-11 Thread Saeed Mahameed
From: Or Gerlitz 

Remove the limitation that offloaded NIC filters can have only one
action. This allows us for example to provide flow tag as a note
to upper layers / apps that that HW header re-write was applied.

Signed-off-by: Or Gerlitz 
Reviewed-by: Paul Blakey 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 8ec13f9be660..cfb32fe5129d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -1194,10 +1194,6 @@ static int parse_tc_nic_actions(struct mlx5e_priv *priv, 
struct tcf_exts *exts,
 
tcf_exts_to_list(exts, );
list_for_each_entry(a, , list) {
-   /* Only support a single action per rule */
-   if (attr->action)
-   return -EINVAL;
-
if (is_tcf_gact_shot(a)) {
attr->action |= MLX5_FLOW_CONTEXT_ACTION_DROP;
if (MLX5_CAP_FLOWTABLE(priv->mdev,
-- 
2.11.0



[net-next 5/9] net/mlx5e: Use modify header ID cache for offloaded TC NIC flows

2017-06-11 Thread Saeed Mahameed
From: Or Gerlitz 

Use the modify header ID cache for the header re-write part of offloading
TC NIC flows.

Signed-off-by: Or Gerlitz 
Reviewed-by: Paul Blakey 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 4625a0e226da..0c7a0872b22b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -251,10 +251,7 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
}
 
if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) {
-   err = mlx5_modify_header_alloc(dev, MLX5_FLOW_NAMESPACE_KERNEL,
-  parse_attr->num_mod_hdr_actions,
-  parse_attr->mod_hdr_actions,
-  >mod_hdr_id);
+   err = mlx5e_attach_mod_hdr(priv, flow, parse_attr);
flow_act.modify_id = attr->mod_hdr_id;
kfree(parse_attr->mod_hdr_actions);
if (err) {
@@ -296,8 +293,7 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
}
 err_create_ft:
if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
-   mlx5_modify_header_dealloc(priv->mdev,
-  attr->mod_hdr_id);
+   mlx5e_detach_mod_hdr(priv, flow);
 err_create_mod_hdr_id:
mlx5_fc_destroy(dev, counter);
 
@@ -320,8 +316,7 @@ static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv,
}
 
if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
-   mlx5_modify_header_dealloc(priv->mdev,
-  attr->mod_hdr_id);
+   mlx5e_detach_mod_hdr(priv, flow);
 }
 
 static void mlx5e_detach_encap(struct mlx5e_priv *priv,
-- 
2.11.0



[net-next 3/9] net/mlx5e: Add cache for HW modify header IDs

2017-06-11 Thread Saeed Mahameed
From: Or Gerlitz 

Packets belonging to flows which are different by matching may still need
to go through the same header re-write. Add a cache for header re-write IDs
keyed by the binary chain of modify header actions.

The caching is supported for both eswitch and NIC use-cases, where the
actual conversion of the code to use caching comes in next patches, one
per use-case.

Signed-off-by: Or Gerlitz 
Reviewed-by: Paul Blakey 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |   2 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c   | 134 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h |   1 +
 4 files changed, 137 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 2fd044b23875..f4b95dbd1c7f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -623,6 +623,8 @@ struct mlx5e_tc_table {
 
struct rhashtable_paramsht_params;
struct rhashtable   ht;
+
+   DECLARE_HASHTABLE(mod_hdr_tbl, 8);
 };
 
 struct mlx5e_vlan_table {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index a9feddc31667..e1217c2279ab 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -69,7 +69,8 @@ struct mlx5e_tc_flow {
u64 cookie;
u8  flags;
struct mlx5_flow_handle *rule;
-   struct list_headencap; /* flows sharing the same encap */
+   struct list_headencap;   /* flows sharing the same encap ID */
+   struct list_headmod_hdr; /* flows sharing the same mod hdr ID */
union {
struct mlx5_esw_flow_attr esw_attr[0];
struct mlx5_nic_flow_attr nic_attr[0];
@@ -90,6 +91,135 @@ enum {
 #define MLX5E_TC_TABLE_NUM_ENTRIES 1024
 #define MLX5E_TC_TABLE_NUM_GROUPS 4
 
+struct mod_hdr_key {
+   int num_actions;
+   void *actions;
+};
+
+struct mlx5e_mod_hdr_entry {
+   /* a node of a hash table which keeps all the mod_hdr entries */
+   struct hlist_node mod_hdr_hlist;
+
+   /* flows sharing the same mod_hdr entry */
+   struct list_head flows;
+
+   struct mod_hdr_key key;
+
+   u32 mod_hdr_id;
+};
+
+#define MLX5_MH_ACT_SZ MLX5_UN_SZ_BYTES(set_action_in_add_action_in_auto)
+
+static inline u32 hash_mod_hdr_info(struct mod_hdr_key *key)
+{
+   return jhash(key->actions,
+key->num_actions * MLX5_MH_ACT_SZ, 0);
+}
+
+static inline int cmp_mod_hdr_info(struct mod_hdr_key *a,
+  struct mod_hdr_key *b)
+{
+   if (a->num_actions != b->num_actions)
+   return 1;
+
+   return memcmp(a->actions, b->actions, a->num_actions * MLX5_MH_ACT_SZ);
+}
+
+static int mlx5e_attach_mod_hdr(struct mlx5e_priv *priv,
+   struct mlx5e_tc_flow *flow,
+   struct mlx5e_tc_flow_parse_attr *parse_attr)
+{
+   struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+   int num_actions, actions_size, namespace, err;
+   struct mlx5e_mod_hdr_entry *mh;
+   struct mod_hdr_key key;
+   bool found = false;
+   u32 hash_key;
+
+   num_actions  = parse_attr->num_mod_hdr_actions;
+   actions_size = MLX5_MH_ACT_SZ * num_actions;
+
+   key.actions = parse_attr->mod_hdr_actions;
+   key.num_actions = num_actions;
+
+   hash_key = hash_mod_hdr_info();
+
+   if (flow->flags & MLX5E_TC_FLOW_ESWITCH) {
+   namespace = MLX5_FLOW_NAMESPACE_FDB;
+   hash_for_each_possible(esw->offloads.mod_hdr_tbl, mh,
+  mod_hdr_hlist, hash_key) {
+   if (!cmp_mod_hdr_info(>key, )) {
+   found = true;
+   break;
+   }
+   }
+   } else {
+   namespace = MLX5_FLOW_NAMESPACE_KERNEL;
+   hash_for_each_possible(priv->fs.tc.mod_hdr_tbl, mh,
+  mod_hdr_hlist, hash_key) {
+   if (!cmp_mod_hdr_info(>key, )) {
+   found = true;
+   break;
+   }
+   }
+   }
+
+   if (found)
+   goto attach_flow;
+
+   mh = kzalloc(sizeof(*mh) + actions_size, GFP_KERNEL);
+   if (!mh)
+   return -ENOMEM;
+
+   mh->key.actions = (void *)mh + sizeof(*mh);
+   memcpy(mh->key.actions, key.actions, actions_size);
+   mh->key.num_actions = num_actions;
+   INIT_LIST_HEAD(>flows);
+
+   err = 

[pull request][net-next 0/9] Mellanox mlx5 updates 2017-06-11

2017-06-11 Thread Saeed Mahameed
Hi Dave,

This series provides updates to mlx5 header rewrite feature, from Or Gerlitz.
and three more small updates From Maor and Eran.

For more details please see below.

Please pull and let me know if there's any problem.
*This series doesn't introduce any conflict with the ongoing net
pull request.

Thanks,
Saeed.

---

The following changes since commit 50dffe7fad6c156c2928e45c19ff7b86eb951f4c:

  Merge branch 'mlx4-drivers-version-update' (2017-06-07 15:33:02 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git 
tags/mlx5-updates-2017-06-11

for you to fetch changes up to 46e9d0b61e27a3a9286002311f349f0c33dcb18f:

  net/mlx5e: Fill advertised and supported port data from Hardware info 
(2017-06-08 14:12:00 +0300)


mlx5-updates-2017-06-11

This series provides updates to mlx5 header rewrite feature, from Or Gerlitz.
and three more small updates From maor and eran.

---
Or says:

Packets belonging to flows which are different by matching may still need
to go through the same header re-writes (e.g set the current routing hop
MACs and issue TTL decrement).  To minimize the number of modify header
IDs, we add a cache for header re-write IDs which is keyed by the binary
chain of modify header actions.

The caching is supported for both eswitch and NIC use-cases, where the
actual conversion of the code to use caching comes in separate patches,
one per use-case.

Using a per field mask field, the TC pedit action supports modifying
partial fields. The last patch enables offloading that.
---

>From Maor, update flow table commands layout to the latest HW spec.
>From Eran, ethtool connector type reporting updates.

Thanks,
Saeed.


Eran Ben Elisha (2):
  net/mlx5e: Add support for reading connector type from PTYS
  net/mlx5e: Fill advertised and supported port data from Hardware info

Maor Gottlieb (1):
  net/mlx5: Update flow table commands layout

Or Gerlitz (6):
  net/mlx5e: Remove limitation of single NIC offloaded TC action per rule
  net/mlx5e: Use short attribute form when adding/deleting offloaded TC 
flows
  net/mlx5e: Add cache for HW modify header IDs
  net/mlx5e: Use modify header ID cache for offloaded TC E-Switch flows
  net/mlx5e: Use modify header ID cache for offloaded TC NIC flows
  net/mlx5e: Support header re-write of partial fields in TC pedit offload

 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   2 +
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 112 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c| 203 +
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c   |  32 ++--
 include/linux/mlx5/mlx5_ifc.h  |  53 +++---
 include/linux/mlx5/port.h  |  13 ++
 8 files changed, 322 insertions(+), 95 deletions(-)


Re: [PATCH net-next v10 3/4] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch

2017-06-11 Thread Jiri Pirko
Sun, Jun 11, 2017 at 01:53:45PM CEST, j...@mojatatu.com wrote:
>From: Jamal Hadi Salim 
>
>When you dump hundreds of thousands of actions, getting only 32 per
>dump batch even when the socket buffer and memory allocations allow
>is inefficient.
>
>With this change, the user will get as many as possibly fitting
>within the given constraints available to the kernel.
>
>The top level action TLV space is extended. An attribute
>TCA_ROOT_FLAGS is used to carry flags; flag TCA_FLAG_LARGE_DUMP_ON
>is set by the user indicating the user is capable of processing
>these large dumps. Older user space which doesnt set this flag
>doesnt get the large (than 32) batches.
>The kernel uses the TCA_ROOT_COUNT attribute to tell the user how many
>actions are put in a single batch. As such user space app knows how long
>to iterate (independent of the type of action being dumped)
>instead of hardcoded maximum of 32 thus maintaining backward compat.
>
>Some results dumping 1.5M actions below:
>first an unpatched tc which doesnt understand these features...
>
>prompt$ time -p tc actions ls action gact | grep index | wc -l
>150
>real 1388.43
>user 2.07
>sys 1386.79
>
>Now lets see a patched tc which sets the correct flags when requesting
>a dump:
>
>prompt$ time -p updatedtc actions ls action gact | grep index | wc -l
>150
>real 178.13
>user 2.02
>sys 176.96
>
>That is about 8x performance improvement for tc app which sets its
>receive buffer to about 32K.
>
>Signed-off-by: Jamal Hadi Salim 
>---
> include/uapi/linux/rtnetlink.h | 22 +--
> net/sched/act_api.c| 50 +-
> 2 files changed, 60 insertions(+), 12 deletions(-)
>
>diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
>index 8f07957..7d2030f 100644
>--- a/include/uapi/linux/rtnetlink.h
>+++ b/include/uapi/linux/rtnetlink.h
>@@ -693,10 +693,28 @@ struct tcamsg {
>   unsigned char   tca__pad1;
>   unsigned short  tca__pad2;
> };
>+
>+enum {
>+  TCA_ROOT_UNSPEC,
>+  TCA_ROOT_TAB,
>+#define TCA_ACT_TAB TCA_ROOT_TAB
>+#define TCAA_MAX TCA_ROOT_TAB
>+  TCA_ROOT_FLAGS,
>+  TCA_ROOT_COUNT,
>+  __TCA_ROOT_MAX,
>+#define   TCA_ROOT_MAX (__TCA_ROOT_MAX - 1)
>+};
>+
> #define TA_RTA(r)  ((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct 
> tcamsg
> #define TA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct tcamsg))
>-#define TCA_ACT_TAB 1 /* attr type must be >=1 */ 
>-#define TCAA_MAX 1
>+/* tcamsg flags stored in attribute TCA_ROOT_FLAGS
>+ *
>+ * TCA_FLAG_LARGE_DUMP_ON user->kernel to request for larger than 
>TCA_ACT_MAX_PRIO
>+ * actions in a dump. All dump responses will contain the number of actions
>+ * being dumped stored in for user app's consumption in TCA_ROOT_COUNT
>+ *
>+ */
>+#define TCA_FLAG_LARGE_DUMP_ON(1 << 0)
> 
> /* New extended info filters for IFLA_EXT_MASK */
> #define RTEXT_FILTER_VF   (1 << 0)
>diff --git a/net/sched/act_api.c b/net/sched/act_api.c
>index 400eb6e..935dc19 100644
>--- a/net/sched/act_api.c
>+++ b/net/sched/act_api.c
>@@ -110,6 +110,7 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, 
>struct sk_buff *skb,
>  struct netlink_callback *cb)
> {
>   int err = 0, index = -1, i = 0, s_i = 0, n_i = 0;
>+  u32 act_flags = cb->args[2];
>   struct nlattr *nest;
> 
>   spin_lock_bh(>lock);
>@@ -138,14 +139,18 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, 
>struct sk_buff *skb,
>   }
>   nla_nest_end(skb, nest);
>   n_i++;
>-  if (n_i >= TCA_ACT_MAX_PRIO)
>+  if (!(act_flags & TCA_FLAG_LARGE_DUMP_ON) &&
>+  n_i >= TCA_ACT_MAX_PRIO)
>   goto done;
>   }
>   }
> done:
>   spin_unlock_bh(>lock);
>-  if (n_i)
>+  if (n_i) {
>   cb->args[0] += n_i;
>+  if (act_flags & TCA_FLAG_LARGE_DUMP_ON)
>+  cb->args[1] = n_i;
>+  }
>   return n_i;
> 
> nla_put_failure:
>@@ -1068,11 +1073,17 @@ static int tcf_action_add(struct net *net, struct 
>nlattr *nla,
>   return tcf_add_notify(net, n, , portid);
> }
> 
>+static u32 allowed_flags = TCA_FLAG_LARGE_DUMP_ON;

An empty line would be nice. Also, since this is outside a function, some
context prefix would be nice:
static u32 tcaa_root_flags_allowed = TCA_FLAG_LARGE_DUMP_ON;


>+static const struct nla_policy tcaa_policy[TCA_ROOT_MAX + 1] = {
>+  [TCA_ROOT_FLAGS] = { .type = NLA_FLAG_BITS,
>+   .validation_data = _flags },
>+};
>+
> static int tc_ctl_action(struct sk_buff *skb, struct nlmsghdr *n,
>struct netlink_ext_ack *extack)
> {
>   struct net *net = sock_net(skb->sk);
>-  struct nlattr *tca[TCAA_MAX + 1];
>+  struct nlattr *tca[TCA_ROOT_MAX + 1];
>   u32 

[net 2/6] net/mlx5: Continue health polling until it is explicitly stopped

2017-06-11 Thread Saeed Mahameed
From: Mohamad Haj Yahia 

The issue is that when we get an assert we will stop polling the health
and thus we cant enter error state when we have a real health issue.

Fixes: fd76ee4da55a ('net/mlx5_core: Fix internal error detection conditions')
Signed-off-by: Mohamad Haj Yahia 
Reviewed-by: Daniel Jurgens 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/health.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c 
b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index 44f59b1d6f0f..f27f84ffbc85 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -275,10 +275,8 @@ static void poll_health(unsigned long data)
struct mlx5_core_health *health = >priv.health;
u32 count;
 
-   if (dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR) {
-   mod_timer(>timer, get_next_poll_jiffies());
-   return;
-   }
+   if (dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR)
+   goto out;
 
count = ioread32be(health->health_counter);
if (count == health->prev)
@@ -290,8 +288,6 @@ static void poll_health(unsigned long data)
if (health->miss_counter == MAX_MISSES) {
dev_err(>pdev->dev, "device's health compromised - reached 
miss count\n");
print_health_info(dev);
-   } else {
-   mod_timer(>timer, get_next_poll_jiffies());
}
 
if (in_fatal(dev) && !health->sick) {
@@ -305,6 +301,9 @@ static void poll_health(unsigned long data)
"new health works are not permitted at this 
stage\n");
spin_unlock(>wq_lock);
}
+
+out:
+   mod_timer(>timer, get_next_poll_jiffies());
 }
 
 void mlx5_start_health_poll(struct mlx5_core_dev *dev)
-- 
2.11.0



[net 1/6] net/mlx5: Fix create vport flow table flow

2017-06-11 Thread Saeed Mahameed
From: Mohamad Haj Yahia 

Send vport number to the create flow table inner method instead of
ignoring the vport argument and sending always 0.

Fixes: b3ba51498bdd ('net/mlx5: Refactor create flow table method to accept 
underlay QP')
Signed-off-by: Mohamad Haj Yahia 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 0e487e8ca634..8f5125ccd8d4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -862,7 +862,7 @@ struct mlx5_flow_table *mlx5_create_vport_flow_table(struct 
mlx5_flow_namespace
ft_attr.level   = level;
ft_attr.prio= prio;
 
-   return __mlx5_create_flow_table(ns, _attr, FS_FT_OP_MOD_NORMAL, 0);
+   return __mlx5_create_flow_table(ns, _attr, FS_FT_OP_MOD_NORMAL, 
vport);
 }
 
 struct mlx5_flow_table*
-- 
2.11.0



[pull request][net 0/6] Mellanox mlx5 fixes 2017-06-11

2017-06-11 Thread Saeed Mahameed
Hi Dave,

This series contains some fixes for the mlx5 core and netdev driver.

Please pull and let me know if there's any problem.

For -stable:
("net/mlx5e: Added BW check for DIM decision mechanism")  kernels 
>= 4.9
("net/mlx5e: Fix wrong indications in DIM due to counter wraparound") kernels 
>= 4.9
("net/mlx5: Remove several module events out of ethtool stats")   kernels 
>= 4.10
("net/mlx5: Enable 4K UAR only when page size is bigger than 4K") kernels 
>= 4.11

*all patches apply with no issue on their -stable.

BTW I never asked you if this way of marking patches to -stable is good for you 
?
if you prefer another format please let me know.

Thanks,
Saeed.
---

The following changes since commit b87fa0fafef4b16495740432f4eb8262efa500d0:

  Merge branch 'mvpp2-fixes' (2017-06-10 18:22:56 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git 
tags/mlx5-fixes-2017-06-11

for you to fetch changes up to 91828bd89940e8145f91751a015bc11bc486aad0:

  net/mlx5: Enable 4K UAR only when page size is bigger than 4K (2017-06-11 
13:10:36 +0300)


mlx5-fixes-2017-06-11


Huy Nguyen (1):
  net/mlx5: Remove several module events out of ethtool stats

Majd Dibbiny (1):
  net/mlx5: Enable 4K UAR only when page size is bigger than 4K

Mohamad Haj Yahia (2):
  net/mlx5: Fix create vport flow table flow
  net/mlx5: Continue health polling until it is explicitly stopped

Tal Gilboa (2):
  net/mlx5e: Added BW check for DIM decision mechanism
  net/mlx5e: Fix wrong indications in DIM due to counter wraparound

 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  8 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 45 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h | 11 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/health.c   | 11 +++---
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  6 ++-
 6 files changed, 43 insertions(+), 40 deletions(-)


[net 4/6] net/mlx5e: Added BW check for DIM decision mechanism

2017-06-11 Thread Saeed Mahameed
From: Tal Gilboa 

DIM (Dynamically-tuned Interrupt Moderation) is a mechanism designed for
changing the channel interrupt moderation values in order to reduce CPU
overhead for all traffic types.
Until now only interrupt and packet rate were sampled.
We found a scenario on which we get a false indication since a change in
DIM caused more aggregation and reduced packet rate while increasing BW.

We now regard a change as succesfull iff:
current_BW > (prev_BW + threshold) or
current_BW ~= prev_BW and current_PR > (prev_PR + threshold) or
current_BW ~= prev_BW and current_PR ~= prev_PR and
current_IR < (prev_IR - threshold)
Where BW = Bandwidth, PR = Packet rate and IR = Interrupt rate

Improvements (ConnectX-4Lx 25GbE, single RX queue, LRO off)
--
packet size | before[Mb/s] | after[Mb/s] | gain  |
2B  | 343.4| 359.4   |  4.5% |
16B | 2739.7   | 2814.8  |  2.7% |
64B | 9739 | 10185.3 |  4.5% |

Fixes: cb3c7fd4f839 ("net/mlx5e: Support adaptive RX coalescing")
Signed-off-by: Tal Gilboa 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  2 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 37 --
 2 files changed, 22 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 2fd044b23875..429da21a583d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -458,12 +458,14 @@ struct mlx5e_mpw_info {
 
 struct mlx5e_rx_am_stats {
int ppms; /* packets per msec */
+   int bpms; /* bytes per msec */
int epms; /* events per msec */
 };
 
 struct mlx5e_rx_am_sample {
ktime_t time;
unsigned intpkt_ctr;
+   unsigned intbyte_ctr;
u16 event_ctr;
 };
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
index 02dd3a95ed8f..54cdda957f9a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
@@ -183,28 +183,27 @@ static void mlx5e_am_exit_parking(struct mlx5e_rx_am *am)
mlx5e_am_step(am);
 }
 
+#define IS_SIGNIFICANT_DIFF(val, ref) \
+   (((100 * abs((val) - (ref))) / (ref)) > 10) /* more than 10% difference 
*/
+
 static int mlx5e_am_stats_compare(struct mlx5e_rx_am_stats *curr,
  struct mlx5e_rx_am_stats *prev)
 {
-   int diff;
-
-   if (!prev->ppms)
-   return curr->ppms ? MLX5E_AM_STATS_BETTER :
+   if (!prev->bpms)
+   return curr->bpms ? MLX5E_AM_STATS_BETTER :
MLX5E_AM_STATS_SAME;
 
-   diff = curr->ppms - prev->ppms;
-   if (((100 * abs(diff)) / prev->ppms) > 10) /* more than 10% diff */
-   return (diff > 0) ? MLX5E_AM_STATS_BETTER :
-   MLX5E_AM_STATS_WORSE;
+   if (IS_SIGNIFICANT_DIFF(curr->bpms, prev->bpms))
+   return (curr->bpms > prev->bpms) ? MLX5E_AM_STATS_BETTER :
+  MLX5E_AM_STATS_WORSE;
 
-   if (!prev->epms)
-   return curr->epms ? MLX5E_AM_STATS_WORSE :
-   MLX5E_AM_STATS_SAME;
+   if (IS_SIGNIFICANT_DIFF(curr->ppms, prev->ppms))
+   return (curr->ppms > prev->ppms) ? MLX5E_AM_STATS_BETTER :
+  MLX5E_AM_STATS_WORSE;
 
-   diff = curr->epms - prev->epms;
-   if (((100 * abs(diff)) / prev->epms) > 10) /* more than 10% diff */
-   return (diff < 0) ? MLX5E_AM_STATS_BETTER :
-   MLX5E_AM_STATS_WORSE;
+   if (IS_SIGNIFICANT_DIFF(curr->epms, prev->epms))
+   return (curr->epms < prev->epms) ? MLX5E_AM_STATS_BETTER :
+  MLX5E_AM_STATS_WORSE;
 
return MLX5E_AM_STATS_SAME;
 }
@@ -266,6 +265,7 @@ static void mlx5e_am_sample(struct mlx5e_rq *rq,
 {
s->time  = ktime_get();
s->pkt_ctr   = rq->stats.packets;
+   s->byte_ctr  = rq->stats.bytes;
s->event_ctr = rq->cq.event_ctr;
 }
 
@@ -278,12 +278,15 @@ static void mlx5e_am_calc_stats(struct mlx5e_rx_am_sample 
*start,
/* u32 holds up to 71 minutes, should be enough */
u32 delta_us = ktime_us_delta(end->time, start->time);
unsigned int npkts = end->pkt_ctr - start->pkt_ctr;
+   unsigned int nbytes = end->byte_ctr - start->byte_ctr;
 
if (!delta_us)
return;
 
-   curr_stats->ppms =(npkts * USEC_PER_MSEC) / delta_us;
-   curr_stats->epms = (MLX5E_AM_NEVENTS * USEC_PER_MSEC) / delta_us;
+   curr_stats->ppms = 

[net 3/6] net/mlx5: Remove several module events out of ethtool stats

2017-06-11 Thread Saeed Mahameed
From: Huy Nguyen 

Remove the following module event counters out of ethtool stats. The
reason for removing these event counters is that these events do not
occur without techinician's intervention.
  module_pwr_budget_exd
  module_long_range
  module_no_eeprom
  module_enforce_part
  module_unknown_id
  module_unknown_status
  module_plug

Fixes: bedb7c909c19 ("net/mlx5e: Add port module event counters to ethtool 
stats")
Signed-off-by: Huy Nguyen 
Reviewed by: Gal Pressman 

Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
index 53e4992d6511..f81c3aa60b46 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
@@ -417,20 +417,13 @@ struct mlx5e_stats {
 };
 
 static const struct counter_desc mlx5e_pme_status_desc[] = {
-   { "module_plug", 0 },
{ "module_unplug", 8 },
 };
 
 static const struct counter_desc mlx5e_pme_error_desc[] = {
-   { "module_pwr_budget_exd", 0 },  /* power budget exceed */
-   { "module_long_range", 8 },  /* long range for non MLNX cable */
-   { "module_bus_stuck", 16 },  /* bus stuck (I2C or data shorted) */
-   { "module_no_eeprom", 24 },  /* no eeprom/retry time out */
-   { "module_enforce_part", 32 },   /* enforce part number list */
-   { "module_unknown_id", 40 }, /* unknown identifier */
-   { "module_high_temp", 48 },  /* high temperature */
+   { "module_bus_stuck", 16 },   /* bus stuck (I2C or data shorted) */
+   { "module_high_temp", 48 },   /* high temperature */
{ "module_bad_shorted", 56 },/* bad or shorted cable/module */
-   { "module_unknown_status", 64 },
 };
 
 #endif /* __MLX5_EN_STATS_H__ */
-- 
2.11.0



[net 6/6] net/mlx5: Enable 4K UAR only when page size is bigger than 4K

2017-06-11 Thread Saeed Mahameed
From: Majd Dibbiny 

When the page size isn't bigger than 4K, there is no added value of enabling 4K
UAR feature in the Firmware.

Modified the condition of enabling the 4K UAR accordingly.

Fixes: f502d834950a ("net/mlx5: Activate support for 4K UARs")
Signed-off-by: Majd Dibbiny 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index af945edfee19..4f577a5abf88 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -537,8 +537,10 @@ static int handle_hca_cap(struct mlx5_core_dev *dev)
/* disable cmdif checksum */
MLX5_SET(cmd_hca_cap, set_hca_cap, cmdif_checksum, 0);
 
-   /* If the HCA supports 4K UARs use it */
-   if (MLX5_CAP_GEN_MAX(dev, uar_4k))
+   /* Enable 4K UAR only when HCA supports it and page size is bigger
+* than 4K.
+*/
+   if (MLX5_CAP_GEN_MAX(dev, uar_4k) && PAGE_SIZE > 4096)
MLX5_SET(cmd_hca_cap, set_hca_cap, uar_4k, 1);
 
MLX5_SET(cmd_hca_cap, set_hca_cap, log_uar_page_sz, PAGE_SHIFT - 12);
-- 
2.11.0



[net 5/6] net/mlx5e: Fix wrong indications in DIM due to counter wraparound

2017-06-11 Thread Saeed Mahameed
From: Tal Gilboa 

DIM (Dynamically-tuned Interrupt Moderation) is a mechanism designed for
changing the channel interrupt moderation values in order to reduce CPU
overhead for all traffic types.
Each iteration of the algorithm, DIM calculates the difference in
throughput, packet rate and interrupt rate from last iteration in order
to make a decision. DIM relies on counters for each metric. When these
counters get to their type's max value they wraparound. In this case
the delta between 'end' and 'start' samples is negative and when
translated to unsigned integers - very high. This results in a false
indication to the algorithm and might result in a wrong decision.

The fix calculates the 'distance' between 'end' and 'start' samples in a
cyclic way around the relevant type's max value. It can also be viewed as
an absolute value around the type's max value instead of around 0.

Testing show higher stability in DIM profile selection and no wraparound
issues.

Fixes: cb3c7fd4f839 ("net/mlx5e: Support adaptive RX coalescing")
Signed-off-by: Tal Gilboa 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  8 
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 10 +++---
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 429da21a583d..944fc1742464 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -463,10 +463,10 @@ struct mlx5e_rx_am_stats {
 };
 
 struct mlx5e_rx_am_sample {
-   ktime_t time;
-   unsigned intpkt_ctr;
-   unsigned intbyte_ctr;
-   u16 event_ctr;
+   ktime_t time;
+   u32 pkt_ctr;
+   u32 byte_ctr;
+   u16 event_ctr;
 };
 
 struct mlx5e_rx_am { /* Adaptive Moderation */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
index 54cdda957f9a..acf32fe952cd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
@@ -270,6 +270,8 @@ static void mlx5e_am_sample(struct mlx5e_rq *rq,
 }
 
 #define MLX5E_AM_NEVENTS 64
+#define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE)
+#define BIT_GAP(bits, end, start) end) - (start)) + BIT_ULL(bits)) & 
(BIT_ULL(bits) - 1))
 
 static void mlx5e_am_calc_stats(struct mlx5e_rx_am_sample *start,
struct mlx5e_rx_am_sample *end,
@@ -277,8 +279,9 @@ static void mlx5e_am_calc_stats(struct mlx5e_rx_am_sample 
*start,
 {
/* u32 holds up to 71 minutes, should be enough */
u32 delta_us = ktime_us_delta(end->time, start->time);
-   unsigned int npkts = end->pkt_ctr - start->pkt_ctr;
-   unsigned int nbytes = end->byte_ctr - start->byte_ctr;
+   u32 npkts = BIT_GAP(BITS_PER_TYPE(u32), end->pkt_ctr, start->pkt_ctr);
+   u32 nbytes = BIT_GAP(BITS_PER_TYPE(u32), end->byte_ctr,
+start->byte_ctr);
 
if (!delta_us)
return;
@@ -311,7 +314,8 @@ void mlx5e_rx_am(struct mlx5e_rq *rq)
 
switch (am->state) {
case MLX5E_AM_MEASURE_IN_PROGRESS:
-   nevents = rq->cq.event_ctr - am->start_sample.event_ctr;
+   nevents = BIT_GAP(BITS_PER_TYPE(u16), rq->cq.event_ctr,
+ am->start_sample.event_ctr);
if (nevents < MLX5E_AM_NEVENTS)
break;
mlx5e_am_sample(rq, _sample);
-- 
2.11.0



Re: [PATCH net-next v10 2/4] net sched actions: Use proper root attribute table for actions

2017-06-11 Thread Jiri Pirko
Sun, Jun 11, 2017 at 01:53:44PM CEST, j...@mojatatu.com wrote:
>From: Jamal Hadi Salim 
>
>Bug fix for an issue which has been around for about a decade.
>We got away with it because the enumeration was larger than needed.
>
>Fixes: 7ba699c604ab ("[NET_SCHED]: Convert actions from rtnetlink to new 
>netlink API")
>Suggested-by: Jiri Pirko 
>Reviewed-by: Simon Horman 
>Signed-off-by: Jamal Hadi Salim 

Reviewed-by: Jiri Pirko 


Re: [PATCH v2] brcmfmac: Fix glom_skb leak in brcmf_sdiod_recv_chain

2017-06-11 Thread Andy Shevchenko
On Sat, Jun 10, 2017 at 10:27 PM, Arend van Spriel
 wrote:
> On 03-06-17 17:36, Andy Shevchenko wrote:
>> On Sat, Jun 3, 2017 at 1:29 AM, Peter S. Housel  wrote:

The following looks good to me.
Feel free to add
Reviewed-by: Andy Shevchenko 

> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
> @@ -705,7 +705,7 @@ int brcmf_sdiod_recv_pkt(struct brcmf_sdio_dev
> *sdiodev, struct sk_buff *pkt)
>  int brcmf_sdiod_recv_chain(struct brcmf_sdio_dev *sdiodev,
>struct sk_buff_head *pktq, uint totlen)
>  {
> -   struct sk_buff *glom_skb;
> +   struct sk_buff *glom_skb = NULL;
> struct sk_buff *skb;
> u32 addr = sdiodev->sbwad;
> int err = 0;
> @@ -726,10 +726,8 @@ int brcmf_sdiod_recv_chain(struct brcmf_sdio_dev
> *sdiodev,
> return -ENOMEM;
> err = brcmf_sdiod_buffrw(sdiodev, SDIO_FUNC_2, false, addr,
>  glom_skb);
> -   if (err) {
> -   brcmu_pkt_buf_free_skb(glom_skb);
> +   if (err)
> goto done;
> -   }
>
> skb_queue_walk(pktq, skb) {
> memcpy(skb->data, glom_skb->data, skb->len);
> @@ -740,6 +738,7 @@ int brcmf_sdiod_recv_chain(struct brcmf_sdio_dev
> *sdiodev,
> pktq);
>
>  done:
> +   brcmu_pkt_buf_free_skb(glom_skb);
> return err;
>  }
>



-- 
With Best Regards,
Andy Shevchenko


Re: [PATCH net-next v10 1/4] net netlink: Add new type NLA_FLAG_BITS

2017-06-11 Thread Jiri Pirko
Sun, Jun 11, 2017 at 01:53:43PM CEST, j...@mojatatu.com wrote:
>From: Jamal Hadi Salim 
>
>Generic bitflags attribute content sent to the kernel by user.
>With this type the user can either set or unset a flag in the
>kernel.
>
>The nla_flag_values is a bitmap that defines the values being set
>The nla_flag_selector is a bitmask that defines which value is legit.
>
>A check is made to ensure the rules that a kernel subsystem always
>conforms to bitflags the kernel already knows about. i.e
>if the user tries to set a bit flag that is not understood then
>the _it will be rejected_.
>
>In the most basic form, the user specifies the attribute policy as:
>[ATTR_GOO] = { .type = NLA_FLAG_BITS, .validation_data =  },
>
>where myvalidflags is the bit mask of the flags the kernel understands.
>
>If the user _does not_ provide myvalidflags then the attribute will
>also be rejected.
>
>Examples:
>nla_flag_values = 0x0, and nla_flag_selector = 0x1
>implies we are selecting bit 1 and we want to set its value to 0.
>
>nla_flag_values = 0x2, and nla_flag_selector = 0x2
>implies we are selecting bit 2 and we want to set its value to 1.
>
>This patch also provides an extra feature: a validation callback
>that could be speaciliazed for other types.

s/speaciliazed/speciliazed/


>This feature is intended to be used by a kernel subsystem to check
>for a combination of bits being present. Example "bit x is valid
>only if bit y and z are present".
>
>So a kernel subsystem could specify validation rules of the following
>nature:
>
>[ATTR_GOO] = { .type = MYTYPE,
>  .validation_data = _data,
>   .validate_content = mycontent_validator },

Indent is wrong. (Does not matter really in desc, but anyway)


>
>With validator callback looking like:
>
>int mycontent_validator(const struct nlattr *nla, void *valid_data)
>{
>   const struct myattribute *user_data = nla_data(nla);
>   struct myvalidation_struct *valid_data_constraint = valid_data;
>
>  ... return appropriate error code etc ...
>}
>
>
>Signed-off-by: Jamal Hadi Salim 

Suggested-by: Jiri Pirko 


>---
> include/net/netlink.h  | 11 +++
> include/uapi/linux/rtnetlink.h | 17 +
> lib/nlattr.c   | 25 +
> 3 files changed, 53 insertions(+)
>
>diff --git a/include/net/netlink.h b/include/net/netlink.h
>index 0170917..8ab9784 100644
>--- a/include/net/netlink.h
>+++ b/include/net/netlink.h
>@@ -6,6 +6,11 @@
> #include 
> #include 
> 
>+struct nla_bit_flags {
>+  u32 nla_flag_values;
>+  u32 nla_flag_selector;
>+};

I don't understand why you redefine the struct here. You already have it
defined in the uapi: struct __nla_bit_flags

Just move this (struct nla_bit_flags) to the uapi and remove
__nla_bit_flags ?



>+
> /* 
>  * Netlink Messages and Attributes Interface (As Seen On TV)
>  * 
>@@ -178,6 +183,7 @@ enum {
>   NLA_S16,
>   NLA_S32,
>   NLA_S64,
>+  NLA_FLAG_BITS,
>   __NLA_TYPE_MAX,
> };
> 
>@@ -206,6 +212,7 @@ enum {
>  *NLA_MSECSLeaving the length field zero will verify the
>  * given type fits, using it verifies minimum length
>  * just like "All other"
>+ *NLA_FLAG_BITSA bitmap/bitselector attribute
>  *All otherMinimum length of attribute payload
>  *
>  * Example:
>@@ -213,11 +220,15 @@ enum {
>  *[ATTR_FOO] = { .type = NLA_U16 },
>  *[ATTR_BAR] = { .type = NLA_STRING, .len = BARSIZ },
>  *[ATTR_BAZ] = { .len = sizeof(struct mystruct) },
>+ *[ATTR_GOO] = { .type = NLA_FLAG_BITS, .validation_data =  
>},
>  * };
>  */
> struct nla_policy {
>   u16 type;
>   u16 len;
>+  void*validation_data;
>+  int (*validate_content)(const struct nlattr *nla,
>+  const void *validation_data);
> };
> 
> /**
>diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
>index 564790e..8f07957 100644
>--- a/include/uapi/linux/rtnetlink.h
>+++ b/include/uapi/linux/rtnetlink.h
>@@ -179,6 +179,23 @@ struct rtattr {
> #define RTA_DATA(rta)   ((void*)(((char*)(rta)) + RTA_LENGTH(0)))
> #define RTA_PAYLOAD(rta) ((int)((rta)->rta_len) - RTA_LENGTH(0))
> 
>+/* Generic bitflags attribute content sent to the kernel.
>+ *
>+ * The nla_flag_values is a bitmap that defines the values being set
>+ * The nla_flag_selector is a bitmask that defines which value is legit
>+ *
>+ * Examples:
>+ *  nla_flag_values = 0x0, and nla_flag_selector = 0x1
>+ *  implies we are selecting bit 1 and we want to set its value to 0.
>+ *
>+ *  nla_flag_values = 0x2, and nla_flag_selector = 0x2
>+ *  implies we are selecting bit 2 and we want to set its 

Re: ARM GLX Khadas VIM Pro - Ethernet detected as only 10Mbps and stalled after some traffic

2017-06-11 Thread crow
Hi Andrew

On Sat, Jun 10, 2017 at 5:27 PM, Andrew Lunn  wrote:
>> Also what Martin Blumenstingl wrote is following which is also crucial
>> for fixing the issue:
>> Amlogic has given their ethernet PHY driver some updates [2], it now
>> includes wake-on-lan, and they now have an internal_phy_read_status
>> which uses reset_internal_phy if there's a link and some error counter
>> exceeds some magic value.
>
> Hi Crow
>
> You could probably just drop the Amlogic driver into mainline and see
> if it works better. If that solves your problem, we can look at
> merging the changes.
>
> Andrew


Please ignore my previus email as I used wrong kernel (not the patched
one, see the modinfo).

Thank your for the suggestion, and thanks Martin to explaining me over
IRC what actually I should do.

I recompiled mainline kernel 4.12-rc4 with the Amlogic driver:
replaced drivers/net/phy/meson-gxl.c with
https://github.com/khadas/linux/blob/ubuntu-4.9/drivers/amlogic/ethernet/phy/amlogic.c

But this did not solve the issue. As soon as i start git clone i lose
network connection to device (no session timeout/disconnect this time,
but I am unable to reconnect over SSH or to get OK ping replay back).

Linux khadasvimpro 4.12.0-rc4-4-ARCH #1 SMP Sun Jun 11 14:33:40 CEST
2017 aarch64 GNU/Linux

# modinfo meson_gxl
filename:
/lib/modules/4.12.0-rc4-4-ARCH/kernel/drivers/net/phy/meson-gxl.ko.gz
license:GPL
author: Yizhou Jiang
description:amlogic internal ethernet phy driver
alias:  mdio:0001100101000100
depends:
intree: Y
vermagic:   4.12.0-rc4-4-ARCH SMP mod_unload aarch64
#

# mii-tool -vvv eth0
Using SIOCGMIIPHY=0x8947
eth0: negotiated 1000baseT-HD flow-control, link ok
  registers for MII PHY 8:
1000 782d 0181 4400 01e1 c1e1 000f 2001
       
0040 0082 40e8 5400 0d80 1000  a900
fff0   130a 1407 00ca  105a
  product info: vendor 00:60:51, model 0 rev 0
  basic mode:   autonegotiation enabled
  basic status: autonegotiation complete, link ok
  capabilities: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD
10baseT-FD 10baseT-HD
  advertising:  1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD
10baseT-FD 10baseT-HD
  link partner: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD
10baseT-FD 10baseT-HD
#


over SSH startet following but it stall already at 1%:

$ git clone 
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Cloning into 'linux-stable'...
remote: Counting objects: 5948690, done.
remote: Compressing objects: 100% (124799/124799), done.
Receiving objects:   1% (110361/5948690), 48.02 MiB | 6.54 MiB/s


journalctl shows timeout while trying to sync with NTP server:
systemd-timesyncd[292]: Timed out waiting for reply from
91.206.8.36:123 (2.at.pool.ntp.org).
systemd-timesyncd[292]: Timed out waiting for reply from
131.130.251.107:123 (2.at.pool.ntp.org).

ping to this device from other client: 100% packet loss

dumping register after stall state
# mii-tool -vvv eth0
Using SIOCGMIIPHY=0x8947
eth0: negotiated 1000baseT-HD flow-control, link ok
  registers for MII PHY 8:
1000 782d 0181 4400 01e1 c1e1 000d 2001
       
0040 0082 40e8 5400 0d80 1000  a900
fff0   130a 1407   105a
  product info: vendor 00:60:51, model 0 rev 0
  basic mode:   autonegotiation enabled
  basic status: autonegotiation complete, link ok
  capabilities: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD
10baseT-FD 10baseT-HD
  advertising:  1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD
10baseT-FD 10baseT-HD
  link partner: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD
10baseT-FD 10baseT-HD
#

after taking eth0 down and up, I am able to login to device over SSH
again, and ping from other devices in network to this device are all
OK.

# ifconfig eth0 down && ifconfig eth0 up
# dmesg -T | tail -n 10
[Sun Jun 11 15:03:02 2017] internal phy init
[Sun Jun 11 15:03:02 2017] internal phy init
[Sun Jun 11 15:03:02 2017] amlogic internal phy 0.e40908ff:08:
attached PHY driver [amlogic internal phy]
(mii_bus:phy_addr=0.e40908ff:08, irq=-1)
[Sun Jun 11 15:03:02 2017] meson8b-dwmac c941.ethernet eth0: PTP
not supported by HW
[Sun Jun 11 15:03:03 2017] wol_reg12[12]==0, error
[Sun Jun 11 15:03:04 2017] meson8b-dwmac c941.ethernet eth0: Link
is Up - 100Mbps/Full - flow control off

[Sun Jun 11 15:10:38 2017] internal phy init
[Sun Jun 11 15:10:38 2017] internal phy init
[Sun Jun 11 15:10:38 2017] amlogic internal phy 0.e40908ff:08:
attached PHY driver [amlogic internal phy]
(mii_bus:phy_addr=0.e40908ff:08, irq=-1)
[Sun Jun 11 15:10:38 2017] meson8b-dwmac c941.ethernet eth0: PTP
not supported by HW
[Sun Jun 11 15:10:39 2017] wol_reg12[12]==0, error
[Sun Jun 11 15:10:40 2017] meson8b-dwmac c941.ethernet eth0: Link
is Up - 100Mbps/Full - flow control off

then I took eth0 and wlan0 up (same ArchLinuxArm 

Re: [PATCH net-next 9/9] net: hns3: Add HNS3 driver to kernel build framework & MAINTAINERS

2017-06-11 Thread kbuild test robot
Hi Salil,

[auto build test ERROR on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Salil-Mehta/Hisilicon-Network-Subsystem-3-Ethernet-Driver/20170611-204908
config: alpha-allyesconfig (attached as .config)
compiler: alpha-linux-gnu-gcc (Debian 6.3.0-18) 6.3.0 20170516
reproduce:
wget 
https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=alpha 

All errors (new ones prefixed by >>):

   drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c: In function 
'hns3_nic_poll_controller':
>> drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c:61:26: error: 'struct 
>> hnae3_knic_private_info' has no member named 'num_tqp_vectors'; did you mean 
>> 'num_tqps'?
 for (i = 0; i < h->kinfo.num_tqp_vectors; i++)
 ^
>> drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c:62:26: error: 'struct 
>> hnae3_knic_private_info' has no member named 'tqp_vectors'
  napi_schedule(>kinfo.tqp_vectors[i].napi);
 ^
   drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c: At top level:
>> drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c:1275:19: error: 
>> initialization from incompatible pointer type 
>> [-Werror=incompatible-pointer-types]
 .ndo_setup_tc  = hns3_nic_setup_tc,
  ^
   drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c:1275:19: note: (near 
initialization for 'hns3_nic_netdev_ops.ndo_setup_tc')
   cc1: some warnings being treated as errors

vim +61 drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c

b7d58d74 Salil Mehta 2017-06-10  55 struct hns3_nic_priv *priv = 
netdev_priv(ndev);
b7d58d74 Salil Mehta 2017-06-10  56 struct hnae3_handle *h = 
priv->ae_handle;
b7d58d74 Salil Mehta 2017-06-10  57 unsigned long flag;
b7d58d74 Salil Mehta 2017-06-10  58 int i;
b7d58d74 Salil Mehta 2017-06-10  59  
b7d58d74 Salil Mehta 2017-06-10  60 local_irq_save(flag);
b7d58d74 Salil Mehta 2017-06-10 @61 for (i = 0; i < 
h->kinfo.num_tqp_vectors; i++)
b7d58d74 Salil Mehta 2017-06-10 @62 
napi_schedule(>kinfo.tqp_vectors[i].napi);
b7d58d74 Salil Mehta 2017-06-10  63 local_irq_restore(flag);
b7d58d74 Salil Mehta 2017-06-10  64  }
b7d58d74 Salil Mehta 2017-06-10  65  #endif

:: The code at line 61 was first introduced by commit
:: b7d58d744ff1224947a4cf8373ee9855e5a29af5 net: hns3: Add support of HNS3 
Ethernet Driver for hip08 SoC

:: TO: Salil Mehta <salil.me...@huawei.com>
:: CC: 0day robot <fengguang...@intel.com>

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH net-next 9/9] net: hns3: Add HNS3 driver to kernel build framework & MAINTAINERS

2017-06-11 Thread kbuild test robot
Hi Salil,

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Salil-Mehta/Hisilicon-Network-Subsystem-3-Ethernet-Driver/20170611-204908
config: parisc-allyesconfig (attached as .config)
compiler: hppa-linux-gnu-gcc (Debian 6.3.0-18) 6.3.0 20170516
reproduce:
wget 
https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=parisc 

All warnings (new ones prefixed by >>):

   drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c: In function 
'hclge_update_speed_duplex':
>> drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c:2266:1: warning: the 
>> frame size of 1440 bytes is larger than 1024 bytes [-Wframe-larger-than=]
}
^

vim +2266 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c

14965015 Salil Mehta 2017-06-10  2250   if (ret) {
14965015 Salil Mehta 2017-06-10  2251   dev_err(>pdev->dev,
14965015 Salil Mehta 2017-06-10  2252   "mac 
autoneg/speed/duplex query failed %d\n", ret);
14965015 Salil Mehta 2017-06-10  2253   return ret;
14965015 Salil Mehta 2017-06-10  2254   }
14965015 Salil Mehta 2017-06-10  2255  
14965015 Salil Mehta 2017-06-10  2256   if ((mac.speed != speed) || (mac.duplex 
!= duplex)) {
14965015 Salil Mehta 2017-06-10  2257   ret = 
hclge_cfg_mac_speed_dup(hdev, speed, duplex);
14965015 Salil Mehta 2017-06-10  2258   if (ret) {
14965015 Salil Mehta 2017-06-10  2259   
dev_err(>pdev->dev,
14965015 Salil Mehta 2017-06-10  2260   "mac 
speed/duplex config failed %d\n", ret);
14965015 Salil Mehta 2017-06-10  2261   return ret;
14965015 Salil Mehta 2017-06-10  2262   }
14965015 Salil Mehta 2017-06-10  2263   }
14965015 Salil Mehta 2017-06-10  2264  
14965015 Salil Mehta 2017-06-10  2265   return 0;
14965015 Salil Mehta 2017-06-10 @2266  }
14965015 Salil Mehta 2017-06-10  2267  
14965015 Salil Mehta 2017-06-10  2268  static int 
hclge_update_speed_duplex_h(struct hnae3_handle *handle)
14965015 Salil Mehta 2017-06-10  2269  {
14965015 Salil Mehta 2017-06-10  2270   struct hclge_vport *vport = 
hclge_get_vport(handle);
14965015 Salil Mehta 2017-06-10  2271   struct hclge_dev *hdev = vport->back;
14965015 Salil Mehta 2017-06-10  2272  
14965015 Salil Mehta 2017-06-10  2273   return hclge_update_speed_duplex(hdev);
14965015 Salil Mehta 2017-06-10  2274  }

:: The code at line 2266 was first introduced by commit
:: 1496501507394557c091dfe9d27fe0124802fbee net: hns3: Add HNS3 
Acceleration Engine & Compatibility Layer Support

:: TO: Salil Mehta <salil.me...@huawei.com>
:: CC: 0day robot <fengguang...@intel.com>

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH 0/6] Constant Time Memory Comparisons Are Important

2017-06-11 Thread Kees Cook
On Sun, Jun 11, 2017 at 1:13 AM, Kalle Valo  wrote:
> "Jason A. Donenfeld"  writes:
>
>> Whenever you're comparing two MACs, it's important to do this using
>> crypto_memneq instead of memcmp. With memcmp, you leak timing information,
>> which could then be used to iteratively forge a MAC.
>
> Do you have any pointers where I could learn more about this?

While not using C specifically, this talks about the problem generally:
https://www.chosenplaintext.ca/articles/beginners-guide-constant-time-cryptography.html

-Kees

-- 
Kees Cook
Pixel Security


[PATCH net 9/9] net: ena: update ena driver to version 1.1.7

2017-06-11 Thread netanel
From: Netanel Belgazal 

Signed-off-by: Netanel Belgazal 
---
 drivers/net/ethernet/amazon/ena/ena_netdev.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h 
b/drivers/net/ethernet/amazon/ena/ena_netdev.h
index 88b5e5612338..a4d3d5e21068 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.h
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h
@@ -45,7 +45,7 @@
 
 #define DRV_MODULE_VER_MAJOR   1
 #define DRV_MODULE_VER_MINOR   1
-#define DRV_MODULE_VER_SUBMINOR 2
+#define DRV_MODULE_VER_SUBMINOR 7
 
 #define DRV_MODULE_NAME"ena"
 #ifndef DRV_MODULE_VERSION
-- 
2.7.4



[PATCH net 7/9] net: ena: disable admin msix while working in polling mode

2017-06-11 Thread netanel
From: Netanel Belgazal 

Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Netanel Belgazal 
---
 drivers/net/ethernet/amazon/ena/ena_com.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c 
b/drivers/net/ethernet/amazon/ena/ena_com.c
index ea60b9e67acb..f5b237e0bd60 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_com.c
@@ -61,6 +61,8 @@
 
 #define ENA_MMIO_READ_TIMEOUT 0x
 
+#define ENA_REGS_ADMIN_INTR_MASK 1
+
 /*/
 /*/
 /*/
@@ -1454,6 +1456,12 @@ void ena_com_admin_destroy(struct ena_com_dev *ena_dev)
 
 void ena_com_set_admin_polling_mode(struct ena_com_dev *ena_dev, bool polling)
 {
+   u32 mask_value = 0;
+
+   if (polling)
+   mask_value = ENA_REGS_ADMIN_INTR_MASK;
+
+   writel(mask_value, ena_dev->reg_bar + ENA_REGS_INTR_MASK_OFF);
ena_dev->admin_queue.polling = polling;
 }
 
-- 
2.7.4



[PATCH net 5/9] net: ena: add missing unmap bars on device removal

2017-06-11 Thread netanel
From: Netanel Belgazal 

This patch also change the mapping functions to devm_ functions

Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Netanel Belgazal 
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 1e71e89e1e18..4e9fbddd3b47 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -2853,6 +2853,11 @@ static void ena_release_bars(struct ena_com_dev 
*ena_dev, struct pci_dev *pdev)
 {
int release_bars;
 
+   if (ena_dev->mem_bar)
+   devm_iounmap(>dev, ena_dev->mem_bar);
+
+   devm_iounmap(>dev, ena_dev->reg_bar);
+
release_bars = pci_select_bars(pdev, IORESOURCE_MEM) & ENA_BAR_MASK;
pci_release_selected_regions(pdev, release_bars);
 }
@@ -2940,8 +2945,9 @@ static int ena_probe(struct pci_dev *pdev, const struct 
pci_device_id *ent)
goto err_free_ena_dev;
}
 
-   ena_dev->reg_bar = ioremap(pci_resource_start(pdev, ENA_REG_BAR),
-  pci_resource_len(pdev, ENA_REG_BAR));
+   ena_dev->reg_bar = devm_ioremap(>dev,
+   pci_resource_start(pdev, ENA_REG_BAR),
+   pci_resource_len(pdev, ENA_REG_BAR));
if (!ena_dev->reg_bar) {
dev_err(>dev, "failed to remap regs bar\n");
rc = -EFAULT;
@@ -2961,8 +2967,9 @@ static int ena_probe(struct pci_dev *pdev, const struct 
pci_device_id *ent)
ena_set_push_mode(pdev, ena_dev, _feat_ctx);
 
if (ena_dev->tx_mem_queue_type == ENA_ADMIN_PLACEMENT_POLICY_DEV) {
-   ena_dev->mem_bar = ioremap_wc(pci_resource_start(pdev, 
ENA_MEM_BAR),
- pci_resource_len(pdev, 
ENA_MEM_BAR));
+   ena_dev->mem_bar = devm_ioremap_wc(>dev,
+  pci_resource_start(pdev, 
ENA_MEM_BAR),
+  pci_resource_len(pdev, 
ENA_MEM_BAR));
if (!ena_dev->mem_bar) {
rc = -EFAULT;
goto err_device_destroy;
-- 
2.7.4



[PATCH net 6/9] net: ena: fix theoretical Rx hang on low memory systems

2017-06-11 Thread netanel
From: Netanel Belgazal 

For the rare case where the device runs out of free rx buffer
descriptors (in case of pressure on kernel  memory),
and the napi handler continuously fail to refill new Rx descriptors
until device rx queue totally runs out of all free rx buffers
to post incoming packet, leading to a deadlock:
* The device won't send interrupts since all the new
Rx packets will be dropped.
* The napi handler won't try to allocate new Rx descriptors
since allocation is part of NAPI that's not being invoked any more

The fix involves detecting this scenario and rescheduling NAPI
(to refill buffers) by the keepalive/watchdog task.

Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Netanel Belgazal 
---
 drivers/net/ethernet/amazon/ena/ena_ethtool.c |  1 +
 drivers/net/ethernet/amazon/ena/ena_netdev.c  | 55 +++
 drivers/net/ethernet/amazon/ena/ena_netdev.h  |  2 +
 3 files changed, 58 insertions(+)

diff --git a/drivers/net/ethernet/amazon/ena/ena_ethtool.c 
b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
index 67b2338f8fb3..533b2fbdeef1 100644
--- a/drivers/net/ethernet/amazon/ena/ena_ethtool.c
+++ b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
@@ -94,6 +94,7 @@ static const struct ena_stats ena_stats_rx_strings[] = {
ENA_STAT_RX_ENTRY(dma_mapping_err),
ENA_STAT_RX_ENTRY(bad_desc_num),
ENA_STAT_RX_ENTRY(rx_copybreak_pkt),
+   ENA_STAT_RX_ENTRY(empty_rx_ring),
 };
 
 static const struct ena_stats ena_stats_ena_com_strings[] = {
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 4e9fbddd3b47..3c366bfbbab1 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -190,6 +190,7 @@ static void ena_init_io_rings(struct ena_adapter *adapter)
rxr->sgl_size = adapter->max_rx_sgl_size;
rxr->smoothed_interval =
ena_com_get_nonadaptive_moderation_interval_rx(ena_dev);
+   rxr->empty_rx_queue = 0;
}
 }
 
@@ -2619,6 +2620,58 @@ static void check_for_missing_tx_completions(struct 
ena_adapter *adapter)
adapter->last_monitored_tx_qid = i % adapter->num_queues;
 }
 
+/* trigger napi schedule after 2 consecutive detections */
+#define EMPTY_RX_REFILL 2
+/* For the rare case where the device runs out of Rx descriptors and the
+ * napi handler failed to refill new Rx descriptors (due to a lack of memory
+ * for example).
+ * This case will lead to a deadlock:
+ * The device won't send interrupts since all the new Rx packets will be 
dropped
+ * The napi handler won't allocate new Rx descriptors so the device will be
+ * able to send new packets.
+ *
+ * This scenario can happen when the kernel's vm.min_free_kbytes is too small.
+ * It is recommended to have at least 512MB, with a minimum of 128MB for
+ * constrained environment).
+ *
+ * When such a situation is detected - Reschedule napi
+ */
+static void check_for_empty_rx_ring(struct ena_adapter *adapter)
+{
+   struct ena_ring *rx_ring;
+   int i, refill_required;
+
+   if (!test_bit(ENA_FLAG_DEV_UP, >flags))
+   return;
+
+   if (test_bit(ENA_FLAG_TRIGGER_RESET, >flags))
+   return;
+
+   for (i = 0; i < adapter->num_queues; i++) {
+   rx_ring = >rx_ring[i];
+
+   refill_required =
+   ena_com_sq_empty_space(rx_ring->ena_com_io_sq);
+   if (unlikely(refill_required == (rx_ring->ring_size - 1))) {
+   rx_ring->empty_rx_queue++;
+
+   if (rx_ring->empty_rx_queue >= EMPTY_RX_REFILL) {
+   u64_stats_update_begin(_ring->syncp);
+   rx_ring->rx_stats.empty_rx_ring++;
+   u64_stats_update_end(_ring->syncp);
+
+   netif_err(adapter, drv, adapter->netdev,
+ "trigger refill for ring %d\n", i);
+
+   napi_schedule(rx_ring->napi);
+   rx_ring->empty_rx_queue = 0;
+   }
+   } else {
+   rx_ring->empty_rx_queue = 0;
+   }
+   }
+}
+
 /* Check for keep alive expiration */
 static void check_for_missing_keep_alive(struct ena_adapter *adapter)
 {
@@ -2673,6 +2726,8 @@ static void ena_timer_service(unsigned long data)
 
check_for_missing_tx_completions(adapter);
 
+   check_for_empty_rx_ring(adapter);
+
if (debug_area)
ena_dump_stats_to_buf(adapter, debug_area);
 
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h 
b/drivers/net/ethernet/amazon/ena/ena_netdev.h
index 0e22bce6239d..8828f1d6dd22 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.h
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h
@@ 

[PATCH net 8/9] net: ena: bug fix in lost tx packets detection mechanism

2017-06-11 Thread netanel
From: Netanel Belgazal 

check_for_missing_tx_completions() is called from a timer
task and looking for lost tx packets.
The old implementation accumulate all the lost tx packets
and did not check if those packets were retrieved on a later stage.
This cause to a situation where the driver reset
the device for no reason.

Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Netanel Belgazal 
---
 drivers/net/ethernet/amazon/ena/ena_ethtool.c |  1 -
 drivers/net/ethernet/amazon/ena/ena_netdev.c  | 66 +++
 drivers/net/ethernet/amazon/ena/ena_netdev.h  | 14 +-
 3 files changed, 50 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_ethtool.c 
b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
index 533b2fbdeef1..3ee55e2fd694 100644
--- a/drivers/net/ethernet/amazon/ena/ena_ethtool.c
+++ b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
@@ -80,7 +80,6 @@ static const struct ena_stats ena_stats_tx_strings[] = {
ENA_STAT_TX_ENTRY(tx_poll),
ENA_STAT_TX_ENTRY(doorbells),
ENA_STAT_TX_ENTRY(prepare_ctx_err),
-   ENA_STAT_TX_ENTRY(missing_tx_comp),
ENA_STAT_TX_ENTRY(bad_req_id),
 };
 
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 3c366bfbbab1..4f16ed38bcf3 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -1995,6 +1995,7 @@ static netdev_tx_t ena_start_xmit(struct sk_buff *skb, 
struct net_device *dev)
 
tx_info->tx_descs = nb_hw_desc;
tx_info->last_jiffies = jiffies;
+   tx_info->print_once = 0;
 
tx_ring->next_to_use = ENA_TX_RING_IDX_NEXT(next_to_use,
tx_ring->ring_size);
@@ -2564,13 +2565,44 @@ static void ena_fw_reset_device(struct work_struct 
*work)
"Reset attempt failed. Can not reset the device\n");
 }
 
-static void check_for_missing_tx_completions(struct ena_adapter *adapter)
+static int check_missing_comp_in_queue(struct ena_adapter *adapter,
+  struct ena_ring *tx_ring)
 {
struct ena_tx_buffer *tx_buf;
unsigned long last_jiffies;
+   u32 missed_tx = 0;
+   int i;
+
+   for (i = 0; i < tx_ring->ring_size; i++) {
+   tx_buf = _ring->tx_buffer_info[i];
+   last_jiffies = tx_buf->last_jiffies;
+   if (unlikely(last_jiffies &&
+time_is_before_jiffies(last_jiffies + 
TX_TIMEOUT))) {
+   if (!tx_buf->print_once)
+   netif_notice(adapter, tx_err, adapter->netdev,
+"Found a Tx that wasn't completed 
on time, qid %d, index %d.\n",
+tx_ring->qid, i);
+
+   tx_buf->print_once = 1;
+   missed_tx++;
+
+   if (unlikely(missed_tx > MAX_NUM_OF_TIMEOUTED_PACKETS)) 
{
+   netif_err(adapter, tx_err, adapter->netdev,
+ "The number of lost tx completions is 
above the threshold (%d > %d). Reset the device\n",
+ missed_tx, 
MAX_NUM_OF_TIMEOUTED_PACKETS);
+   set_bit(ENA_FLAG_TRIGGER_RESET, 
>flags);
+   return -EIO;
+   }
+   }
+   }
+
+   return 0;
+}
+
+static void check_for_missing_tx_completions(struct ena_adapter *adapter)
+{
struct ena_ring *tx_ring;
-   int i, j, budget;
-   u32 missed_tx;
+   int i, budget, rc;
 
/* Make sure the driver doesn't turn the device in other process */
smp_rmb();
@@ -2586,31 +2618,9 @@ static void check_for_missing_tx_completions(struct 
ena_adapter *adapter)
for (i = adapter->last_monitored_tx_qid; i < adapter->num_queues; i++) {
tx_ring = >tx_ring[i];
 
-   for (j = 0; j < tx_ring->ring_size; j++) {
-   tx_buf = _ring->tx_buffer_info[j];
-   last_jiffies = tx_buf->last_jiffies;
-   if (unlikely(last_jiffies && 
time_is_before_jiffies(last_jiffies + TX_TIMEOUT))) {
-   netif_notice(adapter, tx_err, adapter->netdev,
-"Found a Tx that wasn't completed 
on time, qid %d, index %d.\n",
-tx_ring->qid, j);
-
-   u64_stats_update_begin(_ring->syncp);
-   missed_tx = tx_ring->tx_stats.missing_tx_comp++;
-   u64_stats_update_end(_ring->syncp);
-
-   /* Clear last jiffies so the lost buffer won't
-* be counted twice.
-

[PATCH net 2/9] net: ena: fix bug that might cause hang after consecutive open/close interface.

2017-06-11 Thread netanel
From: Netanel Belgazal 

Fixing a bug that the driver does not unmask the IO interrupts
in ndo_open():
occasionally, the MSI-X interrupt (for one or more IO queues)
can be masked when ndo_close() was called.
If that is followed by ndo open(),
then the MSI-X will be still masked so no interrupt
will be received by the driver.

Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Netanel Belgazal 
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 41 ++--
 1 file changed, 26 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 7c1214d78855..0e3c60c7eccf 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -1078,6 +1078,26 @@ inline void ena_adjust_intr_moderation(struct ena_ring 
*rx_ring,
rx_ring->per_napi_bytes = 0;
 }
 
+static inline void ena_unmask_interrupt(struct ena_ring *tx_ring,
+   struct ena_ring *rx_ring)
+{
+   struct ena_eth_io_intr_reg intr_reg;
+
+   /* Update intr register: rx intr delay,
+* tx intr delay and interrupt unmask
+*/
+   ena_com_update_intr_reg(_reg,
+   rx_ring->smoothed_interval,
+   tx_ring->smoothed_interval,
+   true);
+
+   /* It is a shared MSI-X.
+* Tx and Rx CQ have pointer to it.
+* So we use one of them to reach the intr reg
+*/
+   ena_com_unmask_intr(rx_ring->ena_com_io_cq, _reg);
+}
+
 static inline void ena_update_ring_numa_node(struct ena_ring *tx_ring,
 struct ena_ring *rx_ring)
 {
@@ -1108,7 +1128,6 @@ static int ena_io_poll(struct napi_struct *napi, int 
budget)
 {
struct ena_napi *ena_napi = container_of(napi, struct ena_napi, napi);
struct ena_ring *tx_ring, *rx_ring;
-   struct ena_eth_io_intr_reg intr_reg;
 
u32 tx_work_done;
u32 rx_work_done;
@@ -1149,22 +1168,9 @@ static int ena_io_poll(struct napi_struct *napi, int 
budget)
if 
(ena_com_get_adaptive_moderation_enabled(rx_ring->ena_dev))
ena_adjust_intr_moderation(rx_ring, tx_ring);
 
-   /* Update intr register: rx intr delay,
-* tx intr delay and interrupt unmask
-*/
-   ena_com_update_intr_reg(_reg,
-   rx_ring->smoothed_interval,
-   tx_ring->smoothed_interval,
-   true);
-
-   /* It is a shared MSI-X.
-* Tx and Rx CQ have pointer to it.
-* So we use one of them to reach the intr reg
-*/
-   ena_com_unmask_intr(rx_ring->ena_com_io_cq, _reg);
+   ena_unmask_interrupt(tx_ring, rx_ring);
}
 
-
ena_update_ring_numa_node(tx_ring, rx_ring);
 
ret = rx_work_done;
@@ -1485,6 +1491,11 @@ static int ena_up_complete(struct ena_adapter *adapter)
 
ena_napi_enable_all(adapter);
 
+   /* Enable completion queues interrupt */
+   for (i = 0; i < adapter->num_queues; i++)
+   ena_unmask_interrupt(>tx_ring[i],
+>rx_ring[i]);
+
/* schedule napi in case we had pending packets
 * from the last time we disable napi
 */
-- 
2.7.4



[PATCH net 3/9] net: ena: add missing return when ena_com_get_io_handlers() fails

2017-06-11 Thread netanel
From: Netanel Belgazal 

Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Netanel Belgazal 
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 0e3c60c7eccf..1e71e89e1e18 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -1543,6 +1543,7 @@ static int ena_create_io_tx_queue(struct ena_adapter 
*adapter, int qid)
  "Failed to get TX queue handlers. TX queue num %d rc: 
%d\n",
  qid, rc);
ena_com_destroy_io_queue(ena_dev, ena_qid);
+   return rc;
}
 
ena_com_update_numa_node(tx_ring->ena_com_io_cq, ctx.numa_node);
@@ -1607,6 +1608,7 @@ static int ena_create_io_rx_queue(struct ena_adapter 
*adapter, int qid)
  "Failed to get RX queue handlers. RX queue num %d rc: 
%d\n",
  qid, rc);
ena_com_destroy_io_queue(ena_dev, ena_qid);
+   return rc;
}
 
ena_com_update_numa_node(rx_ring->ena_com_io_cq, ctx.numa_node);
-- 
2.7.4



[PATCH net 0/9] Bugs fixes in ena ethernet driver

2017-06-11 Thread netanel
From: Netanel Belgazal 

This patchset contains fixes for the bugs that were discovered so far.

Netanel Belgazal (9):
  net: ena: fix rare uncompleted admin command false alarm
  net: ena: fix bug that might cause hang after consecutive open/close
interface.
  net: ena: add missing return when ena_com_get_io_handlers() fails
  net: ena: fix race condition between submit and completion admin
command
  net: ena: add missing unmap bars on device removal
  net: ena: fix theoretical Rx hang on low memory systems
  net: ena: disable admin msix while working in polling mode
  net: ena: bug fix in lost tx packets detection mechanism
  net: ena: update ena driver to version 1.1.7

 drivers/net/ethernet/amazon/ena/ena_com.c |  35 +++--
 drivers/net/ethernet/amazon/ena/ena_ethtool.c |   2 +-
 drivers/net/ethernet/amazon/ena/ena_netdev.c  | 179 +++---
 drivers/net/ethernet/amazon/ena/ena_netdev.h  |  18 ++-
 4 files changed, 169 insertions(+), 65 deletions(-)

-- 
2.7.4



[PATCH net 4/9] net: ena: fix race condition between submit and completion admin command

2017-06-11 Thread netanel
From: Netanel Belgazal 

Bug:
"Completion context is occupied" error printout will be noticed in
dmesg.
This error will cause the admin command to fail, which will lead to
an ena_probe() failure or a watchdog reset (depends on which admin
command failed).

Root cause:
__ena_com_submit_admin_cmd() is the function that submits new entries to
the admin queue.
The function have a check that makes sure the queue is not full and the
function does not override any outstanding command.
It uses head and tail indexes for this check.
The head is increased by ena_com_handle_admin_completion() which runs
from interrupt context, and the tail index is increased by the submit
function (the function is running under ->q_lock, so there is no risk
of multithread increment).
Each command is associated with a completion context. This context
allocated before call to __ena_com_submit_admin_cmd() and freed by
ena_com_wait_and_process_admin_cq_interrupts(), right after the command
was completed.

This can lead to a state where the head was increased, the check passed,
but the completion context is still in use.

Solution:
Use the atomic variable ->outstanding_cmds instead of using the head and
the tail indexes.
This variable is safe for use since it is bumped in get_comp_ctx() in
__ena_com_submit_admin_cmd() and is freed by comp_ctxt_release()

Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Netanel Belgazal 
---
 drivers/net/ethernet/amazon/ena/ena_com.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c 
b/drivers/net/ethernet/amazon/ena/ena_com.c
index e1c2fab6292f..ea60b9e67acb 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_com.c
@@ -232,11 +232,9 @@ static struct ena_comp_ctx 
*__ena_com_submit_admin_cmd(struct ena_com_admin_queu
tail_masked = admin_queue->sq.tail & queue_size_mask;
 
/* In case of queue FULL */
-   cnt = admin_queue->sq.tail - admin_queue->sq.head;
+   cnt = atomic_read(_queue->outstanding_cmds);
if (cnt >= admin_queue->q_depth) {
-   pr_debug("admin queue is FULL (tail %d head %d depth: %d)\n",
-admin_queue->sq.tail, admin_queue->sq.head,
-admin_queue->q_depth);
+   pr_debug("admin queue is full.\n");
admin_queue->stats.out_of_space++;
return ERR_PTR(-ENOSPC);
}
-- 
2.7.4



[PATCH net 1/9] net: ena: fix rare uncompleted admin command false alarm

2017-06-11 Thread netanel
From: Netanel Belgazal 

The current flow to detect admin completion is:
while (command_not_completed) {
if (timeout)
error

check_for_completion()
sleep()
   }
So in case the sleep took more than the timeout
(in case the thread/workqueue was not scheduled due to higher priority
task or prolonged VMexit), the driver can detect a stall even if
the completion is present.

The fix changes the order of this function to first check for
completion and only after that check if the timeout expired.

Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Netanel Belgazal 
---
 drivers/net/ethernet/amazon/ena/ena_com.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c 
b/drivers/net/ethernet/amazon/ena/ena_com.c
index 08d11cede9c9..e1c2fab6292f 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_com.c
@@ -508,15 +508,20 @@ static int ena_com_comp_status_to_errno(u8 comp_status)
 static int ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx 
*comp_ctx,
 struct ena_com_admin_queue 
*admin_queue)
 {
-   unsigned long flags;
-   u32 start_time;
+   unsigned long flags, timeout;
int ret;
 
-   start_time = ((u32)jiffies_to_usecs(jiffies));
+   timeout = jiffies + ADMIN_CMD_TIMEOUT_US;
+
+   while (1) {
+   spin_lock_irqsave(_queue->q_lock, flags);
+   ena_com_handle_admin_completion(admin_queue);
+   spin_unlock_irqrestore(_queue->q_lock, flags);
 
-   while (comp_ctx->status == ENA_CMD_SUBMITTED) {
-   if u32)jiffies_to_usecs(jiffies)) - start_time) >
-   ADMIN_CMD_TIMEOUT_US) {
+   if (comp_ctx->status != ENA_CMD_SUBMITTED)
+   break;
+
+   if (time_is_before_jiffies(timeout)) {
pr_err("Wait for completion (polling) timeout\n");
/* ENA didn't have any completion */
spin_lock_irqsave(_queue->q_lock, flags);
@@ -528,10 +533,6 @@ static int 
ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx *comp_c
goto err;
}
 
-   spin_lock_irqsave(_queue->q_lock, flags);
-   ena_com_handle_admin_completion(admin_queue);
-   spin_unlock_irqrestore(_queue->q_lock, flags);
-
msleep(100);
}
 
-- 
2.7.4



Re: [PATCH net-next 0/8] Bug fixes in ena ethernet driver

2017-06-11 Thread Belgazal, Netanel
Ack.
resubmitting to net branch and adding "Fixes" mark.

From: David Miller 
Sent: Saturday, June 10, 2017 11:11 PM
To: f.faine...@gmail.com
Cc: Belgazal, Netanel; netdev@vger.kernel.org; Woodhouse, David; Machulsky, 
Zorik; Matushevsky, Alexander; BSHARA, Said; Wilson, Matt; Liguori, Anthony; 
Bshara, Nafea; Schmeilin, Evgeny
Subject: Re: [PATCH net-next 0/8] Bug fixes in ena ethernet driver

From: Florian Fainelli 
Date: Fri, 9 Jun 2017 15:19:54 -0700

> On 06/09/2017 03:13 PM, neta...@amazon.com wrote:
>> From: Netanel Belgazal 
>>
>> This patchset contains fixes for the bugs that were discovered so far.
>
> If these are all fixes you should submit them against the "net" tree.
> net-next is for features [1].
>
> Since these are fixes, you may also want to provide a Fixes: 12-digit
> commit ("commit subject") [2] such that David can queue these patches
> for stable trees and this can be retrofitted into kernel distributions.
>
> [1]:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/netdev-FAQ.txt#n25
>
> [2]:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst#n183

Yeah I agree.  If they are genuine bug fixes they should be submitted
against 'net'.  And yes, Fixes: tags are quite desirable as well.



Re: [PATCH net-next v10 0/4] net sched actions: improve dump performance

2017-06-11 Thread Jamal Hadi Salim


Attached iproute2 used for testing these features..

cheers,
jamal
diff --git a/Makefile b/Makefile
index 18de7dc..ed3c10a 100644
--- a/Makefile
+++ b/Makefile
@@ -45,7 +45,7 @@ HOSTCC ?= $(CC)
 DEFINES += -D_GNU_SOURCE
 # Turn on transparent support for LFS
 DEFINES += -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE
-CCOPTS = -O2
+CCOPTS = -g
 WFLAGS := -Wall -Wstrict-prototypes  -Wmissing-prototypes
 WFLAGS += -Wmissing-declarations -Wold-style-definition -Wformat=2
 
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index a96db83..988ebc2 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -1,5 +1,5 @@
-#ifndef __LINUX_RTNETLINK_H
-#define __LINUX_RTNETLINK_H
+#ifndef _UAPI__LINUX_RTNETLINK_H
+#define _UAPI__LINUX_RTNETLINK_H
 
 #include 
 #include 
@@ -179,6 +179,23 @@ struct rtattr {
 #define RTA_DATA(rta)   ((void*)(((char*)(rta)) + RTA_LENGTH(0)))
 #define RTA_PAYLOAD(rta) ((int)((rta)->rta_len) - RTA_LENGTH(0))
 
+/* Generic bitflags attribute content sent to the kernel.
+ *
+ * The nla_flag_values is a bitmap that defines the values being set
+ * The nla_flag_selector is a bitmask that defines which value is legit
+ *
+ * Examples:
+ *  nla_flag_values = 0x0, and nla_flag_selector = 0x1
+ *  implies we are selecting bit 1 and we want to set its value to 0.
+ *
+ *  nla_flag_values = 0x2, and nla_flag_selector = 0x2
+ *  implies we are selecting bit 2 and we want to set its value to 1.
+ *
+ */
+struct __nla_bit_flags {
+   __u32 nla_flag_values;
+   __u32 nla_flag_selector;
+};
 
 
 
@@ -278,6 +295,7 @@ enum rt_scope_t {
 #define RTM_F_EQUALIZE 0x400   /* Multipath equalizer: NI  */
 #define RTM_F_PREFIX   0x800   /* Prefix addresses */
 #define RTM_F_LOOKUP_TABLE 0x1000  /* set rtm_table to FIB lookup result */
+#define RTM_F_FIB_MATCH0x2000  /* return full fib lookup match 
*/
 
 /* Reserved table identifiers */
 
@@ -549,6 +567,7 @@ enum {
TCA_STAB,
TCA_PAD,
TCA_DUMP_INVISIBLE,
+   TCA_CHAIN,
__TCA_MAX
 };
 
@@ -581,6 +600,7 @@ enum {
 
 #define NDUSEROPT_MAX  (__NDUSEROPT_MAX - 1)
 
+#ifndef __KERNEL__
 /* RTnetlink multicast groups - backwards compatibility for userspace */
 #define RTMGRP_LINK1
 #define RTMGRP_NOTIFY  2
@@ -601,6 +621,7 @@ enum {
 #define RTMGRP_DECnet_ROUTE 0x4000
 
 #define RTMGRP_IPV6_PREFIX 0x2
+#endif
 
 /* RTnetlink multicast groups */
 enum rtnetlink_groups {
@@ -672,10 +693,29 @@ struct tcamsg {
unsigned char   tca__pad1;
unsigned short  tca__pad2;
 };
+
+enum {
+   TCA_ROOT_UNSPEC,
+   TCA_ROOT_TAB,
+#define TCA_ACT_TAB TCA_ROOT_TAB
+#define TCAA_MAX TCA_ROOT_TAB
+   TCA_ROOT_FLAGS,
+   TCA_ROOT_COUNT,
+   TCA_ROOT_TIME_DELTA, /* in msecs */
+   __TCA_ROOT_MAX,
+#defineTCA_ROOT_MAX (__TCA_ROOT_MAX - 1)
+};
+
 #define TA_RTA(r)  ((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct 
tcamsg
 #define TA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct tcamsg))
-#define TCA_ACT_TAB 1 /* attr type must be >=1 */  
-#define TCAA_MAX 1
+/* tcamsg flags stored in attribute TCA_ROOT_FLAGS
+ *
+ * TCA_FLAG_LARGE_DUMP_ON user->kernel to request for larger than 
TCA_ACT_MAX_PRIO
+ * actions in a dump. All dump responses will contain the number of actions
+ * being dumped stored in for user app's consumption in TCA_ROOT_COUNT
+ *
+ */
+#define TCA_FLAG_LARGE_DUMP_ON (1 << 0)
 
 /* New extended info filters for IFLA_EXT_MASK */
 #define RTEXT_FILTER_VF(1 << 0)
@@ -687,4 +727,4 @@ struct tcamsg {
 
 
 
-#endif /* __LINUX_RTNETLINK_H */
+#endif /* _UAPI__LINUX_RTNETLINK_H */
diff --git a/tc/f_basic.c b/tc/f_basic.c
index d663668..8370ea6 100644
--- a/tc/f_basic.c
+++ b/tc/f_basic.c
@@ -135,7 +135,7 @@ static int basic_print_opt(struct filter_util *qu, FILE *f,
}
 
if (tb[TCA_BASIC_ACT]) {
-   tc_print_action(f, tb[TCA_BASIC_ACT]);
+   tc_print_action(f, tb[TCA_BASIC_ACT], 0);
}
 
return 0;
diff --git a/tc/f_bpf.c b/tc/f_bpf.c
index 75c44c0..6581293 100644
--- a/tc/f_bpf.c
+++ b/tc/f_bpf.c
@@ -236,7 +236,7 @@ static int bpf_print_opt(struct filter_util *qu, FILE *f,
}
 
if (tb[TCA_BPF_ACT])
-   tc_print_action(f, tb[TCA_BPF_ACT]);
+   tc_print_action(f, tb[TCA_BPF_ACT], 0);
 
return 0;
 }
diff --git a/tc/f_cgroup.c b/tc/f_cgroup.c
index ecf9909..633700e 100644
--- a/tc/f_cgroup.c
+++ b/tc/f_cgroup.c
@@ -102,7 +102,7 @@ static int cgroup_print_opt(struct filter_util *qu, FILE *f,
}
 
if (tb[TCA_CGROUP_ACT])
-   tc_print_action(f, tb[TCA_CGROUP_ACT]);
+   tc_print_action(f, tb[TCA_CGROUP_ACT], 0);
 
return 0;
 }
diff --git a/tc/f_flow.c b/tc/f_flow.c
index 09ddcaa..b157104 100644
--- a/tc/f_flow.c
+++ b/tc/f_flow.c
@@ -347,7 +347,7 @@ static int flow_print_opt(struct 

Re: [PATCH net-next 1/1] net: reflect mark on tcp syn ack packets

2017-06-11 Thread Jamal Hadi Salim

On 17-06-10 07:02 PM, David Miller wrote:


@@ -173,7 +173,8 @@ int ip_build_and_send_pkt(struct sk_buff *skb, const struct 
sock *sk,
}
  
  	skb->priority = sk->sk_priority;

-   skb->mark = sk->sk_mark;
+   if (!skb->mark)
+   skb->mark = sk->sk_mark;


Maybe this should both be "inet_request_mark()"?



Challenge is making of a synack requires a new allocated skb;
and sk is a listening socket - which should/has a mark of
0 meaning at ip_build_and_send_pkt() it overrides
the value already set on the skb->mark.

cheers,
jamal


Also, Lorenzo, please review.





[PATCH net-next v10 2/4] net sched actions: Use proper root attribute table for actions

2017-06-11 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

Bug fix for an issue which has been around for about a decade.
We got away with it because the enumeration was larger than needed.

Fixes: 7ba699c604ab ("[NET_SCHED]: Convert actions from rtnetlink to new 
netlink API")
Suggested-by: Jiri Pirko 
Reviewed-by: Simon Horman 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/act_api.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index aed6cf2..400eb6e 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -1072,7 +1072,7 @@ static int tc_ctl_action(struct sk_buff *skb, struct 
nlmsghdr *n,
 struct netlink_ext_ack *extack)
 {
struct net *net = sock_net(skb->sk);
-   struct nlattr *tca[TCA_ACT_MAX + 1];
+   struct nlattr *tca[TCAA_MAX + 1];
u32 portid = skb ? NETLINK_CB(skb).portid : 0;
int ret = 0, ovr = 0;
 
@@ -1080,7 +1080,7 @@ static int tc_ctl_action(struct sk_buff *skb, struct 
nlmsghdr *n,
!netlink_capable(skb, CAP_NET_ADMIN))
return -EPERM;
 
-   ret = nlmsg_parse(n, sizeof(struct tcamsg), tca, TCA_ACT_MAX, NULL,
+   ret = nlmsg_parse(n, sizeof(struct tcamsg), tca, TCAA_MAX, NULL,
  extack);
if (ret < 0)
return ret;
-- 
1.9.1



[PATCH net-next v10 4/4] net sched actions: add time filter for action dumping

2017-06-11 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

This adds support for filtering based on time since last used.
When we are dumping a large number of actions it is useful to
have the option of filtering based on when the action was last
used to reduce the amount of data crossing to user space.

With this patch the user space app sets the TCA_ROOT_TIME_DELTA
attribute with the value in milliseconds with "time of interest
since now".  The kernel converts this to jiffies and does the
filtering comparison matching entries that have seen activity
since then and returns them to user space.
Old kernels and old tc continue to work in legacy mode since
they dont specify this attribute.

Some example (we have 400 actions bound to 400 filters); at
installation time using  hacked tc which sets the time of
interest to 120 seconds:
prompt$ hackedtc actions ls action gact | grep index | wc -l
400

go get some coffee and wait for > 120 seconds and try again:

prompt$ hackedtc actions ls action gact | grep index | wc -l
0

Lets see a filter bound to one of these actions:
..
filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10  (rule 
hit 2 success 1)
  match 7f02/ at 12 (success 1 )
action order 1: gact action pass
 random type none pass val 0
 index 23 ref 2 bind 1 installed 1145 sec used 802 sec
Action statistics:
Sent 84 bytes 1 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0


that coffee took long, no? It was good.

Now lets ping -c 1 127.0.0.2, then run the actions again:
prompt$ hackedtc actions ls action gact | grep index | wc -l
1

More details please:

prompt$ hackedtc -s actions ls action gact

action order 0: gact action pass
 random type none pass val 0
 index 23 ref 2 bind 1 installed 1270 sec used 30 sec
Action statistics:
Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

And the filter?

filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10  (rule 
hit 4 success 2)
  match 7f02/ at 12 (success 2 )
action order 1: gact action pass
 random type none pass val 0
 index 23 ref 2 bind 1 installed 1324 sec used 84 sec
Action statistics:
Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

Signed-off-by: Jamal Hadi Salim 
---
 include/uapi/linux/rtnetlink.h |  1 +
 net/sched/act_api.c| 20 +++-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 7d2030f..988ebc2 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -701,6 +701,7 @@ enum {
 #define TCAA_MAX TCA_ROOT_TAB
TCA_ROOT_FLAGS,
TCA_ROOT_COUNT,
+   TCA_ROOT_TIME_DELTA, /* in msecs */
__TCA_ROOT_MAX,
 #defineTCA_ROOT_MAX (__TCA_ROOT_MAX - 1)
 };
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 935dc19..aa9549f 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -111,6 +111,7 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, 
struct sk_buff *skb,
 {
int err = 0, index = -1, i = 0, s_i = 0, n_i = 0;
u32 act_flags = cb->args[2];
+   unsigned long jiffy_since = cb->args[3];
struct nlattr *nest;
 
spin_lock_bh(>lock);
@@ -128,6 +129,11 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, 
struct sk_buff *skb,
if (index < s_i)
continue;
 
+   if (jiffy_since &&
+   time_after(jiffy_since,
+  (unsigned long)p->tcfa_tm.lastuse))
+   continue;
+
nest = nla_nest_start(skb, n_i);
if (nest == NULL)
goto nla_put_failure;
@@ -145,9 +151,11 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, 
struct sk_buff *skb,
}
}
 done:
+   if (index >= 0)
+   cb->args[0] = index + 1;
+
spin_unlock_bh(>lock);
if (n_i) {
-   cb->args[0] += n_i;
if (act_flags & TCA_FLAG_LARGE_DUMP_ON)
cb->args[1] = n_i;
}
@@ -1077,6 +1085,7 @@ static int tcf_action_add(struct net *net, struct nlattr 
*nla,
 static const struct nla_policy tcaa_policy[TCA_ROOT_MAX + 1] = {
[TCA_ROOT_FLAGS] = { .type = NLA_FLAG_BITS,
 .validation_data = _flags },
+   [TCA_ROOT_TIME_DELTA]  = { .type = NLA_U32 },
 };
 
 static int tc_ctl_action(struct sk_buff *skb, struct nlmsghdr *n,
@@ -1167,7 +1176,9 @@ static int tc_dump_action(struct sk_buff *skb, struct 
netlink_callback *cb)
struct __nla_bit_flags select_flags;

[PATCH net-next v10 3/4] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch

2017-06-11 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

When you dump hundreds of thousands of actions, getting only 32 per
dump batch even when the socket buffer and memory allocations allow
is inefficient.

With this change, the user will get as many as possibly fitting
within the given constraints available to the kernel.

The top level action TLV space is extended. An attribute
TCA_ROOT_FLAGS is used to carry flags; flag TCA_FLAG_LARGE_DUMP_ON
is set by the user indicating the user is capable of processing
these large dumps. Older user space which doesnt set this flag
doesnt get the large (than 32) batches.
The kernel uses the TCA_ROOT_COUNT attribute to tell the user how many
actions are put in a single batch. As such user space app knows how long
to iterate (independent of the type of action being dumped)
instead of hardcoded maximum of 32 thus maintaining backward compat.

Some results dumping 1.5M actions below:
first an unpatched tc which doesnt understand these features...

prompt$ time -p tc actions ls action gact | grep index | wc -l
150
real 1388.43
user 2.07
sys 1386.79

Now lets see a patched tc which sets the correct flags when requesting
a dump:

prompt$ time -p updatedtc actions ls action gact | grep index | wc -l
150
real 178.13
user 2.02
sys 176.96

That is about 8x performance improvement for tc app which sets its
receive buffer to about 32K.

Signed-off-by: Jamal Hadi Salim 
---
 include/uapi/linux/rtnetlink.h | 22 +--
 net/sched/act_api.c| 50 +-
 2 files changed, 60 insertions(+), 12 deletions(-)

diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 8f07957..7d2030f 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -693,10 +693,28 @@ struct tcamsg {
unsigned char   tca__pad1;
unsigned short  tca__pad2;
 };
+
+enum {
+   TCA_ROOT_UNSPEC,
+   TCA_ROOT_TAB,
+#define TCA_ACT_TAB TCA_ROOT_TAB
+#define TCAA_MAX TCA_ROOT_TAB
+   TCA_ROOT_FLAGS,
+   TCA_ROOT_COUNT,
+   __TCA_ROOT_MAX,
+#defineTCA_ROOT_MAX (__TCA_ROOT_MAX - 1)
+};
+
 #define TA_RTA(r)  ((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct 
tcamsg
 #define TA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct tcamsg))
-#define TCA_ACT_TAB 1 /* attr type must be >=1 */  
-#define TCAA_MAX 1
+/* tcamsg flags stored in attribute TCA_ROOT_FLAGS
+ *
+ * TCA_FLAG_LARGE_DUMP_ON user->kernel to request for larger than 
TCA_ACT_MAX_PRIO
+ * actions in a dump. All dump responses will contain the number of actions
+ * being dumped stored in for user app's consumption in TCA_ROOT_COUNT
+ *
+ */
+#define TCA_FLAG_LARGE_DUMP_ON (1 << 0)
 
 /* New extended info filters for IFLA_EXT_MASK */
 #define RTEXT_FILTER_VF(1 << 0)
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 400eb6e..935dc19 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -110,6 +110,7 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, 
struct sk_buff *skb,
   struct netlink_callback *cb)
 {
int err = 0, index = -1, i = 0, s_i = 0, n_i = 0;
+   u32 act_flags = cb->args[2];
struct nlattr *nest;
 
spin_lock_bh(>lock);
@@ -138,14 +139,18 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, 
struct sk_buff *skb,
}
nla_nest_end(skb, nest);
n_i++;
-   if (n_i >= TCA_ACT_MAX_PRIO)
+   if (!(act_flags & TCA_FLAG_LARGE_DUMP_ON) &&
+   n_i >= TCA_ACT_MAX_PRIO)
goto done;
}
}
 done:
spin_unlock_bh(>lock);
-   if (n_i)
+   if (n_i) {
cb->args[0] += n_i;
+   if (act_flags & TCA_FLAG_LARGE_DUMP_ON)
+   cb->args[1] = n_i;
+   }
return n_i;
 
 nla_put_failure:
@@ -1068,11 +1073,17 @@ static int tcf_action_add(struct net *net, struct 
nlattr *nla,
return tcf_add_notify(net, n, , portid);
 }
 
+static u32 allowed_flags = TCA_FLAG_LARGE_DUMP_ON;
+static const struct nla_policy tcaa_policy[TCA_ROOT_MAX + 1] = {
+   [TCA_ROOT_FLAGS] = { .type = NLA_FLAG_BITS,
+.validation_data = _flags },
+};
+
 static int tc_ctl_action(struct sk_buff *skb, struct nlmsghdr *n,
 struct netlink_ext_ack *extack)
 {
struct net *net = sock_net(skb->sk);
-   struct nlattr *tca[TCAA_MAX + 1];
+   struct nlattr *tca[TCA_ROOT_MAX + 1];
u32 portid = skb ? NETLINK_CB(skb).portid : 0;
int ret = 0, ovr = 0;
 
@@ -1080,7 +1091,7 @@ static int tc_ctl_action(struct sk_buff *skb, struct 
nlmsghdr *n,
!netlink_capable(skb, CAP_NET_ADMIN))
return -EPERM;
 
-   ret = nlmsg_parse(n, sizeof(struct tcamsg), tca, TCAA_MAX, NULL,
+   ret = 

[PATCH net-next v10 1/4] net netlink: Add new type NLA_FLAG_BITS

2017-06-11 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

Generic bitflags attribute content sent to the kernel by user.
With this type the user can either set or unset a flag in the
kernel.

The nla_flag_values is a bitmap that defines the values being set
The nla_flag_selector is a bitmask that defines which value is legit.

A check is made to ensure the rules that a kernel subsystem always
conforms to bitflags the kernel already knows about. i.e
if the user tries to set a bit flag that is not understood then
the _it will be rejected_.

In the most basic form, the user specifies the attribute policy as:
[ATTR_GOO] = { .type = NLA_FLAG_BITS, .validation_data =  },

where myvalidflags is the bit mask of the flags the kernel understands.

If the user _does not_ provide myvalidflags then the attribute will
also be rejected.

Examples:
nla_flag_values = 0x0, and nla_flag_selector = 0x1
implies we are selecting bit 1 and we want to set its value to 0.

nla_flag_values = 0x2, and nla_flag_selector = 0x2
implies we are selecting bit 2 and we want to set its value to 1.

This patch also provides an extra feature: a validation callback
that could be speaciliazed for other types.
This feature is intended to be used by a kernel subsystem to check
for a combination of bits being present. Example "bit x is valid
only if bit y and z are present".

So a kernel subsystem could specify validation rules of the following
nature:

[ATTR_GOO] = { .type = MYTYPE,
   .validation_data = _data,
   .validate_content = mycontent_validator },

With validator callback looking like:

int mycontent_validator(const struct nlattr *nla, void *valid_data)
{
   const struct myattribute *user_data = nla_data(nla);
   struct myvalidation_struct *valid_data_constraint = valid_data;

  ... return appropriate error code etc ...
}


Signed-off-by: Jamal Hadi Salim 
---
 include/net/netlink.h  | 11 +++
 include/uapi/linux/rtnetlink.h | 17 +
 lib/nlattr.c   | 25 +
 3 files changed, 53 insertions(+)

diff --git a/include/net/netlink.h b/include/net/netlink.h
index 0170917..8ab9784 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -6,6 +6,11 @@
 #include 
 #include 
 
+struct nla_bit_flags {
+   u32 nla_flag_values;
+   u32 nla_flag_selector;
+};
+
 /* 
  * Netlink Messages and Attributes Interface (As Seen On TV)
  * 
@@ -178,6 +183,7 @@ enum {
NLA_S16,
NLA_S32,
NLA_S64,
+   NLA_FLAG_BITS,
__NLA_TYPE_MAX,
 };
 
@@ -206,6 +212,7 @@ enum {
  *NLA_MSECSLeaving the length field zero will verify the
  * given type fits, using it verifies minimum length
  * just like "All other"
+ *NLA_FLAG_BITSA bitmap/bitselector attribute
  *All otherMinimum length of attribute payload
  *
  * Example:
@@ -213,11 +220,15 @@ enum {
  * [ATTR_FOO] = { .type = NLA_U16 },
  * [ATTR_BAR] = { .type = NLA_STRING, .len = BARSIZ },
  * [ATTR_BAZ] = { .len = sizeof(struct mystruct) },
+ * [ATTR_GOO] = { .type = NLA_FLAG_BITS, .validation_data =  
},
  * };
  */
 struct nla_policy {
u16 type;
u16 len;
+   void*validation_data;
+   int (*validate_content)(const struct nlattr *nla,
+   const void *validation_data);
 };
 
 /**
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 564790e..8f07957 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -179,6 +179,23 @@ struct rtattr {
 #define RTA_DATA(rta)   ((void*)(((char*)(rta)) + RTA_LENGTH(0)))
 #define RTA_PAYLOAD(rta) ((int)((rta)->rta_len) - RTA_LENGTH(0))
 
+/* Generic bitflags attribute content sent to the kernel.
+ *
+ * The nla_flag_values is a bitmap that defines the values being set
+ * The nla_flag_selector is a bitmask that defines which value is legit
+ *
+ * Examples:
+ *  nla_flag_values = 0x0, and nla_flag_selector = 0x1
+ *  implies we are selecting bit 1 and we want to set its value to 0.
+ *
+ *  nla_flag_values = 0x2, and nla_flag_selector = 0x2
+ *  implies we are selecting bit 2 and we want to set its value to 1.
+ *
+ */
+struct __nla_bit_flags {
+   __u32 nla_flag_values;
+   __u32 nla_flag_selector;
+};
 
 
 
diff --git a/lib/nlattr.c b/lib/nlattr.c
index a7e0b16..78fed43 100644
--- a/lib/nlattr.c
+++ b/lib/nlattr.c
@@ -27,6 +27,21 @@
[NLA_S64]   = sizeof(s64),
 };
 
+static int validate_nla_bit_flags(const struct nlattr *nla, void *valid_data)
+{
+   const struct nla_bit_flags *nbf = nla_data(nla);
+   u32 *valid_flags_mask = valid_data;
+
+   if (!valid_data)
+   

[PATCH net-next v10 0/4] net sched actions: improve dump performance

2017-06-11 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

Changes since v9:
-

1) General consensus:
- remove again the use of BIT() to maintain uapi consistency ;->

1) Jiri:
- Add a new netlink type NLA_FLAG_BITS to check for valid bits 
  and use it instead of inline vetting.
  
Changes since v8:
-

1) Jiri:
- Add back the use of BIT(). Eventually fix iproute2 instead
- Rename VALID_TCA_FLAGS to VALID_TCA_ROOT_FLAGS

Changes since v7:
-

Jamal:
No changes.
Patch 1 went out twice. Resend without two copies of patch 1

changes since v6:
-

1) DaveM:
New rules for netlink messages. From now on we are going to start
checking for bits that are not used and rejecting anything we dont
understand. In the future this is going to require major changes
to user space code (tc etc). This is just a start.

To quote, David:
"
 Again, bits you aren't using now, make sure userspace doesn't
   set them.  And if it does, reject.
"
Added checks for ensuring things work as above.

2) Jiri:
a)Fix the commit message to properly use "Fixes" description
b)Align assignments for nla_policy

Changes since v5:


0)
Remove use of BIT() because it is kernel specific. Requires a separate
patch (Jiri can submit that in his cleanups)

1)To paraphrase Eric D.

"memcpy(nla_data(count_attr), >args[1], sizeof(u32));
wont work on 64bit BE machines because cb->args[1]
(which is 64 bit is larger in size than sizeof(u32))"

Fixed

2) Jiri Pirko

i) Spotted a bug fix mixed in the patch for wrong TLV
fix. Add patch 1/3 to address this. Make part of this
series because of dependencies.

ii) Rename ACT_LARGE_DUMP_ON -> TCA_FLAG_LARGE_DUMP_ON

iii) Satisfy Jiri's obsession against the noun "tcaa"
a)Rename struct nlattr *tcaa --> struct nlattr *tb
b)Rename TCAA_ACT_XXX -> TCA_ROOT_XXX

Changes since v4:
-

1) Eric D.

pointed out that when all skb space is used up by the dump
there will be no space to insert the TCAA_ACT_COUNT attribute.

2) Jiri:

i) Change:

enum {
TCAA_UNSPEC,
TCAA_ACT_TAB,
TCAA_ACT_FLAGS,
TCAA_ACT_COUNT,
TCAA_ACT_TIME_FILTER,
__TCAA_MAX
};

#define TCAA_MAX (__TCAA_MAX - 1)
#define ACT_LARGE_DUMP_ON   (1 << 0)

to:
enum {
   TCAA_UNSPEC,
   TCAA_ACT_TAB,
#define TCA_ACT_TAB TCAA_ACT_TAB
   TCAA_ACT_FLAGS,
   TCAA_ACT_COUNT,
   __TCAA_MAX,
#defineTCAA_MAX (__TCAA_MAX - 1)
};

#define ACT_LARGE_DUMP_ON  BIT(0)

Jiri plans to followup with the rest of the code to make the
style consistent.

ii) Rename attribute TCAA_ACT_TIME_FILTER --> TCAA_ACT_TIME_DELTA

iii) Rename variable jiffy_filter --> jiffy_since
iv) Rename msecs_filter --> msecs_since
v) get rid of unused cb->args[0] and rename cb->args[4] to cb->args[0]

Earlier Changes

- Jiri mostly on names of things.

Jamal Hadi Salim (4):
  net netlink: Add new type NLA_FLAG_BITS
  net sched actions: Use proper root attribute table for actions
  net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
  net sched actions: add time filter for action dumping

 include/net/netlink.h  | 11 +++
 include/uapi/linux/rtnetlink.h | 40 ++--
 lib/nlattr.c   | 25 +++
 net/sched/act_api.c| 70 +++---
 4 files changed, 133 insertions(+), 13 deletions(-)

-- 
1.9.1



Re: [PATCHv3 net] xfrm: move xfrm_garbage_collect out of xfrm_policy_flush

2017-06-11 Thread Xin Long
On Sun, Jun 11, 2017 at 9:44 AM, Hangbin Liu  wrote:
> Now we will force to do garbage collection if any policy removed in
> xfrm_policy_flush(). But during xfrm_net_exit(). We call flow_cache_fini()
> first and set set fc->percpu to NULL. Then after we call xfrm_policy_fini()
> -> frxm_policy_flush() -> flow_cache_flush(), we will get NULL pointer
> dereference when check percpu_empty. The code path looks like:
>
> flow_cache_fini()
>   - fc->percpu = NULL
> xfrm_policy_fini()
>   - xfrm_policy_flush()
> - xfrm_garbage_collect()
>   - flow_cache_flush()
> - flow_cache_percpu_empty()
>   - fcp = per_cpu_ptr(fc->percpu, cpu)
>
> To reproduce, just add ipsec in netns and then remove the netns.
>
> v2:
> As Xin Long suggested, since only two other places need to call it. move
> xfrm_garbage_collect() outside xfrm_policy_flush().
>
> v3:
> Fix subject mismatch after v2 fix.
>
> Fixes: 35db06912189 ("xfrm: do the garbage collection after flushing policy")
> Signed-off-by: Hangbin Liu 
> ---
>  net/key/af_key.c   | 2 ++
>  net/xfrm/xfrm_policy.c | 4 
>  net/xfrm/xfrm_user.c   | 1 +
>  3 files changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/net/key/af_key.c b/net/key/af_key.c
> index 512dc43..5103f92 100644
> --- a/net/key/af_key.c
> +++ b/net/key/af_key.c
> @@ -2755,6 +2755,8 @@ static int pfkey_spdflush(struct sock *sk, struct 
> sk_buff *skb, const struct sad
> int err, err2;
>
> err = xfrm_policy_flush(net, XFRM_POLICY_TYPE_MAIN, true);
> +   if (!err)
> +   xfrm_garbage_collect(net);
> err2 = unicast_flush_resp(sk, hdr);
> if (err || err2) {
> if (err == -ESRCH) /* empty table - old silent behavior */
> diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
> index ed4e52d..643a18f 100644
> --- a/net/xfrm/xfrm_policy.c
> +++ b/net/xfrm/xfrm_policy.c
> @@ -1006,10 +1006,6 @@ int xfrm_policy_flush(struct net *net, u8 type, bool 
> task_valid)
> err = -ESRCH;
>  out:
> spin_unlock_bh(>xfrm.xfrm_policy_lock);
> -
> -   if (cnt)
> -   xfrm_garbage_collect(net);
> -
> return err;
>  }
>  EXPORT_SYMBOL(xfrm_policy_flush);
> diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
> index 38614df..86116e9 100644
> --- a/net/xfrm/xfrm_user.c
> +++ b/net/xfrm/xfrm_user.c
> @@ -2027,6 +2027,7 @@ static int xfrm_flush_policy(struct sk_buff *skb, 
> struct nlmsghdr *nlh,
> return 0;
> return err;
> }
> +   xfrm_garbage_collect(net);
>
> c.data.type = type;
> c.event = nlh->nlmsg_type;
> --
> 2.5.5
>
Reviewed-by: Xin Long 

though I could still see some typo err in changelog.


ipmr: MFC routes when VIF deleted

2017-06-11 Thread Yotam Gigi
I have been looking into some weird behavior, and I am not sure whether it is
a bug or a feature.

When a VIF with index v gets deleted, the MFC routes does not get updated, which
means that there can be routes pointing to that VIF. On datapath, when packet
hits that route, the VIF validity will be checked and will not be sent to that
device (but still, the route does not get updated).  Now, if the user creates
another VIF with the same index v but different underlay device, the same route
will forward the traffic to that device.

It is relevant to mention that when user adds a MFC route, only the active VIFs
are used, so the flow of adding a route with dummy VIF indices and then
connecting those VIF indices to real device is not supported. The only way to
create a MFC route that has non existing VIFs is to create one with existing
VIFs and then delete them.

Do we really want to support that?  To me, it looks like a buggy flow and I
suggest that upon VIF deletion, the MFC routes will be updated to not point to
any non existing VIF indices.



Re: [PATCH 0/6] Constant Time Memory Comparisons Are Important

2017-06-11 Thread Kalle Valo
"Jason A. Donenfeld"  writes:

> Whenever you're comparing two MACs, it's important to do this using
> crypto_memneq instead of memcmp. With memcmp, you leak timing information,
> which could then be used to iteratively forge a MAC.

Do you have any pointers where I could learn more about this?

-- 
Kalle Valo


Re: [PATCH v2] brcmfmac: Fix glom_skb leak in brcmf_sdiod_recv_chain

2017-06-11 Thread Arend van Spriel
On 11-06-17 02:18, Peter Housel wrote:
> 
>> On Jun 10, 2017, at 12:27 PM, Arend van Spriel 
>>  wrote:
>>
>> On 03-06-17 17:36, Andy Shevchenko wrote:
>>> On Sat, Jun 3, 2017 at 1:29 AM, Peter S. Housel  wrote:
 An earlier change to this function (3bdae810721b) fixed a leak in the
 case of an unsuccessful call to brcmf_sdiod_buffrw(). However, the
 glom_skb buffer, used for emulating a scattering read, is never used
 or referenced after its contents are copied into the destination
 buffers, and therefore always needs to be freed by the end of the
 function.
>>
>> [snip]
>>
 +   skb_queue_walk(pktq, skb) {
 +   memcpy(skb->data, glom_skb->data, 
 skb->len);
 +   skb_pull(glom_skb, skb->len);
 +   }
}
>>>
 +   brcmu_pkt_buf_free_skb(glom_skb);
>>>
>>> Can we just add this one line instead or I'm missing something?
>>
>> I guess. We don't want to walk the packet queue if glom_skb is not
>> carrying data due to brcmf_sdiod_buffrw() failure.
>>
>> So I would go with the patch below as brcmu_pkt_buf_free_skb() simply
>> ignores null pointer.
> 
> I’m fine with this, or indeed most of the other proposed solutions. The 
> important thing is that the leak is fixed; in the driver's current state I 
> was able to run our wearable device out of memory in just over 20 seconds 
> running iperf.

Sure. The reason behind the suggestion from Franky was to get rid of the
label inside branch and I agree with that. To address Andy's comment I
think my proposal should tackle that.

Just out of curiosity, we added the broken-sg-support thing for OMAP
platform. So what platform/mmc-host are you using. I try to keep an
overview where this workaround is needed.

Regards,
Arend


Re: ARM GLX Khadas VIM Pro - Ethernet detected as only 10Mbps and stalled after some traffic

2017-06-11 Thread crow
Hi Andrew,

On Sat, Jun 10, 2017 at 5:27 PM, Andrew Lunn  wrote:
>> Also what Martin Blumenstingl wrote is following which is also crucial
>> for fixing the issue:
>> Amlogic has given their ethernet PHY driver some updates [2], it now
>> includes wake-on-lan, and they now have an internal_phy_read_status
>> which uses reset_internal_phy if there's a link and some error counter
>> exceeds some magic value.
>
> Hi Crow
>
> You could probably just drop the Amlogic driver into mainline and see
> if it works better. If that solves your problem, we can look at
> merging the changes.
>
> Andrew

Thank your for the suggestion, and thanks Martin to explaining me over
IRC what actually I should do.

I recompiled mainline kernel 4.12-rc4 with the Amlogic driver:
replaced drivers/net/phy/meson-gxl.c with
https://github.com/khadas/linux/blob/ubuntu-4.9/drivers/amlogic/ethernet/phy/amlogic.c

But this did not solve the issue. As soon as i start git clone i lose
network connection to device (no session timeout/disconnect this time,
but I am unable to reconnect over SSH or to get OK ping replay back).

Here are the tests:
Linux khadasvimpro 4.12.0-rc4-4-ARCH #1 SMP Sun Jun 11 03:39:21 CEST
2017 aarch64 GNU/Linux

# modinfo meson_gxl
filename:
/lib/modules/4.12.0-rc4-4-ARCH/kernel/drivers/net/phy/meson-gxl.ko.gz
license:GPL
author: Neil Armstrong 
author: Baoqi wang
description:Amlogic Meson GXL Internal PHY driver
alias:  mdio:0001100101000100
depends:
intree: Y
vermagic:   4.12.0-rc4-4-ARCH SMP mod_unload aarch64
#
# mii-tool -vvv eth0
Using SIOCGMIIPHY=0x8947
eth0: negotiated 1000baseT-HD flow-control, link ok
  registers for MII PHY 8:
1000 782d 0181 4400 01e1 c1e1 000f 2001
       
0040 0002 40e8 5400 1c1c   
fff0   040a 1407 004a  105a
  product info: vendor 00:60:51, model 0 rev 0
  basic mode:   autonegotiation enabled
  basic status: autonegotiation complete, link ok
  capabilities: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD
10baseT-FD 10baseT-HD
  advertising:  1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD
10baseT-FD 10baseT-HD
  link partner: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD
10baseT-FD 10baseT-HD
$

over SSH startet following but it stall already at 0%:

$ git clone 
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Cloning into 'linux-stable'...
remote: Counting objects: 5948690, done.
remote: Compressing objects: 100% (124799/124799), done.
Receiving objects:   0% (11668/5948690), 2.27 MiB | 4.52 MiB/s

shows timeout while trying to sync with NTP server:

# journalctl -f
systemd-timesyncd[299]: Timed out waiting for reply from
83.68.137.76:123 (2.at.pool.ntp.org).
systemd-timesyncd[299]: Timed out waiting for reply from
86.59.113.114:123 (2.at.pool.ntp.org).

while still not working dump the register:
# mii-tool -vvv eth0
Using SIOCGMIIPHY=0x8947
eth0: negotiated 1000baseT-HD flow-control, link ok
  registers for MII PHY 8:
1000 782d 0181 4400 01e1 c1e1 000d 2001
       
0040 0002 40e8 5400 1c1c   
fff0   040a 1407   105a
  product info: vendor 00:60:51, model 0 rev 0
  basic mode:   autonegotiation enabled
  basic status: autonegotiation complete, link ok
  capabilities: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD
10baseT-FD 10baseT-HD
  advertising:  1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD
10baseT-FD 10baseT-HD
  link partner: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD
10baseT-FD 10baseT-HD
#

# ifconfig eth0 down && ifconfig eth0 up
# dmesg -T | tail -n 10
[Sun Jun 11 07:40:34 2017] meson8b-dwmac c941.ethernet eth0:
device MAC address 00:15:18:01:81:31
[Sun Jun 11 07:40:34 2017] random: crng init done
[Sun Jun 11 07:40:34 2017] Meson GXL Internal PHY 0.e40908ff:08:
attached PHY driver [Meson GXL Internal PHY]
(mii_bus:phy_addr=0.e40908ff:08, irq=-1)
[Sun Jun 11 07:40:34 2017] meson8b-dwmac c941.ethernet eth0: PTP
not supported by HW
[Sun Jun 11 07:40:36 2017] brcmfmac: brcmf_c_preinit_dcmds: Firmware
version = wl0: Mar  1 2015 07:29:38 version 7.45.18 (r538002) FWID
01-6a2c8ad4
[Sun Jun 11 07:40:36 2017] meson8b-dwmac c941.ethernet eth0: Link
is Up - 100Mbps/Full - flow control off
[Sun Jun 11 07:41:23 2017] EXT4-fs (mmcblk1p1): mounted filesystem
with ordered data mode. Opts: (null)
[Sun Jun 11 07:54:28 2017] Meson GXL Internal PHY 0.e40908ff:08:
attached PHY driver [Meson GXL Internal PHY]
(mii_bus:phy_addr=0.e40908ff:08, irq=-1)
[Sun Jun 11 07:54:28 2017] meson8b-dwmac c941.ethernet eth0: PTP
not supported by HW
[Sun Jun 11 07:54:30 2017] meson8b-dwmac c941.ethernet eth0: Link
is Up - 100Mbps/Full - flow control off
#

then I took eth0 and wlan0 up (same ArchLinuxArm with same mainline
kernel same place where the files are stored eMMC) and this 

Re: [PATCH net v3] net: ipmr: Fix some mroute forwarding issues in vrf's

2017-06-11 Thread Yotam Gigi
On 06/10/2017 11:30 PM, Donald Sharp wrote:
> This patch fixes two issues:
>
> 1) When forwarding on *,G mroutes that are in a vrf, the
> kernel was dropping information about the actual incoming
> interface when calling ip_mr_forward from ip_mr_input.
> This caused ip_mr_forward to send the multicast packet
> back out the incoming interface.  Fix this by
> modifying ip_mr_forward to be handed the correctly
> resolved dev.
>
> 2) When a unresolved cache entry is created we store
> the incoming skb on the unresolved cache entry and
> upon mroute resolution from the user space daemon,
> we attempt to forward the packet.  Again we were
> not resolving to the correct incoming device for
> a vrf scenario, before calling ip_mr_forward.
> Fix this by resolving to the correct interface
> and calling ip_mr_forward with the result.
>
> Fixes: e58e41596811 ("net: Enable support for VRF with ipv4 multicast")
> Signed-off-by: Donald Sharp 
> ---
> v2: Fixed title
> v3: Addressed Review comments by Andrew Lunn and David Ahern
>
>  net/ipv4/ipmr.c | 32 +++-
>  1 file changed, 15 insertions(+), 17 deletions(-)
>
> diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
> index 551de4d..09368a1 100644
> --- a/net/ipv4/ipmr.c
> +++ b/net/ipv4/ipmr.c
> @@ -101,8 +101,8 @@ static struct mr_table *ipmr_new_table(struct net *net, 
> u32 id);
>  static void ipmr_free_table(struct mr_table *mrt);
>  
>  static void ip_mr_forward(struct net *net, struct mr_table *mrt,
> -   struct sk_buff *skb, struct mfc_cache *cache,
> -   int local);
> +   struct net_device *dev, struct sk_buff *skb,
> +   struct mfc_cache *cache, int local);
>  static int ipmr_cache_report(struct mr_table *mrt,
>struct sk_buff *pkt, vifi_t vifi, int assert);
>  static int __ipmr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb,
> @@ -988,7 +988,7 @@ static void ipmr_cache_resolve(struct net *net, struct 
> mr_table *mrt,
>  
>   rtnl_unicast(skb, net, NETLINK_CB(skb).portid);
>   } else {
> - ip_mr_forward(net, mrt, skb, c, 0);
> + ip_mr_forward(net, mrt, skb->dev, skb, c, 0);
>   }
>   }
>  }
> @@ -1073,7 +1073,7 @@ static int ipmr_cache_report(struct mr_table *mrt,
>  
>  /* Queue a packet for resolution. It gets locked cache entry! */
>  static int ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi,
> -  struct sk_buff *skb)
> +  struct sk_buff *skb, struct net_device *dev)
>  {
>   const struct iphdr *iph = ip_hdr(skb);
>   struct mfc_cache *c;
> @@ -1130,6 +1130,10 @@ static int ipmr_cache_unresolved(struct mr_table *mrt, 
> vifi_t vifi,
>   kfree_skb(skb);
>   err = -ENOBUFS;
>   } else {
> + if (dev) {
> + skb->dev = dev;
> + skb->skb_iif = dev->ifindex;
> + }
>   skb_queue_tail(>mfc_un.unres.unresolved, skb);
>   err = 0;
>   }
> @@ -1828,10 +1832,10 @@ static int ipmr_find_vif(struct mr_table *mrt, struct 
> net_device *dev)
>  
>  /* "local" means that we should preserve one skb (for local delivery) */
>  static void ip_mr_forward(struct net *net, struct mr_table *mrt,
> -   struct sk_buff *skb, struct mfc_cache *cache,
> -   int local)
> +   struct net_device *dev, struct sk_buff *skb,
> +   struct mfc_cache *cache, int local)
>  {
> - int true_vifi = ipmr_find_vif(mrt, skb->dev);
> + int true_vifi = ipmr_find_vif(mrt, dev);
>   int psend = -1;
>   int vif, ct;
>  
> @@ -1853,13 +1857,7 @@ static void ip_mr_forward(struct net *net, struct 
> mr_table *mrt,
>   }
>  
>   /* Wrong interface: drop packet and (maybe) send PIM assert. */
> - if (mrt->vif_table[vif].dev != skb->dev) {
> - struct net_device *mdev;
> -
> - mdev = l3mdev_master_dev_rcu(mrt->vif_table[vif].dev);
> - if (mdev == skb->dev)
> - goto forward;
> -
> + if (mrt->vif_table[vif].dev != dev) {
>   if (rt_is_output_route(skb_rtable(skb))) {
>   /* It is our own packet, looped back.
>* Very complicated situation...
> @@ -2053,7 +2051,7 @@ int ip_mr_input(struct sk_buff *skb)
>   read_lock(_lock);
>   vif = ipmr_find_vif(mrt, dev);
>   if (vif >= 0) {
> - int err2 = ipmr_cache_unresolved(mrt, vif, skb);
> + int err2 = ipmr_cache_unresolved(mrt, vif, skb, dev);
>   read_unlock(_lock);
>  
>   return err2;
> @@ -2064,7 +2062,7 @@ int ip_mr_input(struct sk_buff *skb)
>   }
>  
>   read_lock(_lock);
> -