[PATCHv2 iproute2 net-next] ip neigh: allow flush FAILED neighbour entry

2017-06-15 Thread Hangbin Liu
After upstream commit 5071034e4af7 ('neigh: Really delete an arp/neigh entry
on "ip neigh delete" or "arp -d"'), we can now delete a single FAILED neighbour
entry. But `ip neigh flush` still skips FAILED entries.

Move the filter to after the first flush round, so that FAILED entries can be
flushed on fixed kernels while old kernels do not keep retrying.

Signed-off-by: Hangbin Liu 
---
 ip/ipneigh.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ip/ipneigh.c b/ip/ipneigh.c
index 4d8fc85..9c38a60 100644
--- a/ip/ipneigh.c
+++ b/ip/ipneigh.c
@@ -445,7 +445,6 @@ static int do_show_or_flush(int argc, char **argv, int flush)
filter.flushb = flushb;
filter.flushp = 0;
filter.flushe = sizeof(flushb);
-   filter.state &= ~NUD_FAILED;
 
while (round < MAX_ROUNDS) {
if (rtnl_dump_request_n(, ) < 0) {
@@ -474,6 +473,7 @@ static int do_show_or_flush(int argc, char **argv, int flush)
printf("\n*** Round %d, deleting %d entries ***\n", round, filter.flushed);
fflush(stdout);
}
+   filter.state &= ~NUD_FAILED;
}
printf("*** Flush not complete bailing out after %d rounds\n",
MAX_ROUNDS);
-- 
2.5.5
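The effect of moving the filter is easiest to see in a small model. The sketch below is a simplified, hypothetical simulation, not iproute2 code: `struct entry`, `kernel_delete()`, `flush_table()` and the `ST_*` bits are stand-ins, and a plain loop replaces the netlink dump/delete exchange. The first round matches every state, including FAILED; later rounds mask FAILED out, mirroring where the patch places `filter.state &= ~NUD_FAILED`.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for neighbour state bits (not the real NUD_* header). */
#define ST_REACHABLE 0x02u
#define ST_FAILED    0x20u
#define MAX_ROUNDS   10

struct entry {
	unsigned int state;
	bool present;
};

/* Model a delete request: an old kernel silently keeps FAILED entries. */
static void kernel_delete(struct entry *e, bool fixed_kernel)
{
	if (e->state == ST_FAILED && !fixed_kernel)
		return;
	e->present = false;
}

/* Dump/delete rounds in the style of do_show_or_flush(): each round counts
 * the entries matching the filter and requests their deletion; the flush is
 * complete when a round finds nothing. Returns the round on which it
 * finished, or -1 when it bails out after MAX_ROUNDS. */
static int flush_table(struct entry *tbl, int n, bool fixed_kernel)
{
	unsigned int filter_state = ~0u;	/* round 0 matches every state */
	int round;

	assert(n >= 0);
	for (round = 0; round < MAX_ROUNDS; round++) {
		int flushed = 0;
		int i;

		for (i = 0; i < n; i++) {
			if (!tbl[i].present || !(tbl[i].state & filter_state))
				continue;
			flushed++;
			kernel_delete(&tbl[i], fixed_kernel);
		}
		if (flushed == 0)
			return round;
		/* Stop matching FAILED after the first round, so an old kernel
		 * that ignores the delete cannot keep the loop spinning. */
		filter_state &= ~ST_FAILED;
	}
	return -1;
}
```

With a two-entry table (one REACHABLE, one FAILED), both kernels finish on round 1; the old kernel simply leaves the FAILED entry behind instead of exhausting MAX_ROUNDS. With the pre-patch placement (FAILED masked out before the loop), round 0 would never request the FAILED entry's deletion, even on a fixed kernel.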



RE: [PATCH net-next v2 0/2] r8152: adjust runtime suspend/resume

2017-06-15 Thread Hayes Wang
David Miller [mailto:da...@davemloft.net]
> Sent: Wednesday, June 14, 2017 1:02 AM
> > v2:
> > For #1, replace GFP_KERNEL with GFP_NOIO for usb_submit_urb().
> >
> > v1:
> > Improve the flow about runtime suspend/resume and make the code
> > easy to read.
> 
> Series applied.

Excuse me. I don't see these patches in the net-next repository. Where can I find
them?

Best Regards,
Hayes



Re: [PATCH net-next 00/10] mlx4 XDP performance improvements

2017-06-15 Thread David Miller
From: Tariq Toukan 
Date: Thu, 15 Jun 2017 14:35:30 +0300

> This patchset contains data-path improvements, mainly for XDP_DROP
> and XDP_TX cases.
> 
> Main patches:
> * Patch 2 by Saeed allows enabling optimized A0 RX steering (in HW) when
>   setting a single RX ring.
>   With this configuration, HW packet-rate dramatically improves,
>   reaching 28.1 Mpps in XDP_DROP case for both IPv4 (37% gain)
>   and IPv6 (53% gain).
> * Patch 6 enhances the XDP xmit function. Among other changes, now we
>   ring one doorbell per NAPI. Patch gives 17% gain in XDP_TX case.
> * Patch 7 obsoletes the NAPI of XDP_TX completion queue and integrates its
>   poll into the respective RX NAPI. Patch gives 15% gain in XDP_TX case.
> 
> Series generated against net-next commit:
> f7aec129a356 rxrpc: Cache the congestion window setting

Series applied, thanks.


Re: [pull request][net 0/6] Mellanox mlx5 fixes 2017-06-14

2017-06-15 Thread David Miller
From: Saeed Mahameed 
Date: Thu, 15 Jun 2017 23:55:24 +0300

> This series contains some fixes for the mlx5 core and netdev driver.
> 
> Please pull and let me know if there's any problem.

Pulled.

> For -stable:
> ("net/mlx5: Wait for FW readiness before initializing command interface") kernels >= 4.4
> ("net/mlx5e: Fix timestamping capabilities reporting") kernels >= 4.5
> ("net/mlx5e: Avoid doing a cleanup call if the profile doesn't have it") kernels >= 4.9
> ("net/mlx5e: Fix min inline value for VF rep SQs") kernels >= 4.11
> 
> The "net/mlx5e: Fix min inline .." (a one-liner patch) doesn't apply cleanly
> to 4.11; it hits a contextual conflict that can easily be resolved by adding:
> +   mlx5_query_min_inline(mdev, >params.tx_min_inline_mode);
> to the end of mlx5e_build_rep_netdev_priv. Note that the 2nd parameter of
> mlx5_query_min_inline is slightly different from the original one.

Queued up, thanks!


Re: [PATCH] Convert multiple netdev_info messages to netdev_dbg

2017-06-15 Thread Joe Perches
On Thu, 2017-06-15 at 18:49 -0700, Jay Vosburgh wrote:
> Joe Perches  wrote:
> 
> > On Thu, 2017-06-15 at 19:14 +0100, Michael J Dilmore wrote:
> > > Multiple netdev_info messages clutter kernel output. Also add netdev_dbg 
> > > for packets per slave.
> > 
> > []
> > > diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
> > 
> > []
> > > @@ -9,6 +9,8 @@
> > >   * (at your option) any later version.
> > >   */
> > >   
> > > +#define DEBUG 1
> > 
> > Is defining DEBUG really worthwhile?

The question was really whether it's worthwhile
to have that logging always emitted at debug level,
or whether it's only useful when debugging.

I generally think smaller object code is better,
and if it's not necessary, debugging output is
better left neither enabled nor compiled into the kernel.

>   I don't believe so, since if CONFIG_DYNAMIC_DEBUG is not
> enabled, having #define DEBUG will enable all of the netdev_dbg messages
> unconditionally, which is the opposite of the stated purpose of the
> patch.  If DYNAMIC_DEBUG is enabled, having DEBUG doesn't do anything
> that I can see.

With CONFIG_DYNAMIC_DEBUG, having #define DEBUG
means that the dynamic_debug output is enabled by
default in the control file; otherwise it's not
emitted unless a user specifically enables it.

include/linux/dynamic_debug.h:#ifdef DEBUG
include/linux/dynamic_debug.h-#define DEFINE_DYNAMIC_DEBUG_METADATA(name, fmt) \
include/linux/dynamic_debug.h-  DEFINE_DYNAMIC_DEBUG_METADATA_KEY(name, fmt, .key.dd_key_true, \
include/linux/dynamic_debug.h-  (STATIC_KEY_TRUE_INIT))

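The disagreement comes down to how `#define DEBUG` interacts with `CONFIG_DYNAMIC_DEBUG`. The toy model below summarizes Joe's reading of the header excerpt above; `struct dbg_site` and `classify()` are invented for illustration and are not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (not kernel code) of a netdev_dbg()/pr_debug() call site. */
struct dbg_site {
	bool compiled_in;        /* does the call site exist in the object? */
	bool enabled_by_default; /* printed without any user action? */
	bool runtime_toggle;     /* controllable through the dynamic_debug control file? */
};

static struct dbg_site classify(bool config_dynamic_debug, bool debug_defined)
{
	struct dbg_site s;

	if (config_dynamic_debug) {
		/* Always compiled in; DEBUG only flips the initial state of the
		 * call site's static key (dd_key_true / STATIC_KEY_TRUE_INIT). */
		s.compiled_in = true;
		s.enabled_by_default = debug_defined;
		s.runtime_toggle = true;
	} else {
		/* Without dynamic debug, DEBUG decides at compile time: the
		 * message is either always printed or compiled out. */
		s.compiled_in = debug_defined;
		s.enabled_by_default = debug_defined;
		s.runtime_toggle = false;
	}
	return s;
}
```

In this model DEBUG changes the default in both configurations; the remaining difference is that only a dynamic-debug build lets a user turn the message off (or on) again at runtime.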

Re: [PATCH] Convert multiple netdev_info messages to netdev_dbg

2017-06-15 Thread Jay Vosburgh
Joe Perches  wrote:

>On Thu, 2017-06-15 at 19:14 +0100, Michael J Dilmore wrote:
>> Multiple netdev_info messages clutter kernel output. Also add netdev_dbg for 
>> packets per slave.
>[]
>> diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
>[]
>> @@ -9,6 +9,8 @@
>>   * (at your option) any later version.
>>   */
>>   
>> +#define DEBUG 1
>
>Is defining DEBUG really worthwhile?

I don't believe so, since if CONFIG_DYNAMIC_DEBUG is not
enabled, having #define DEBUG will enable all of the netdev_dbg messages
unconditionally, which is the opposite of the stated purpose of the
patch.  If DYNAMIC_DEBUG is enabled, having DEBUG doesn't do anything
that I can see.

-J

>As well, it's almost always just
>#define DEBUG
>without any level value unless the
>level value is used in the code.
>
>> +
>>  #include 
>>  #include 
>>  #include 
>> @@ -719,13 +721,13 @@ static int bond_option_mode_set(struct bonding *bond,
>>  const struct bond_opt_value *newval)
>>  {
>>  if (!bond_mode_uses_arp(newval->value) && bond->params.arp_interval) {
>> -netdev_info(bond->dev, "%s mode is incompatible with arp monitoring, start mii monitoring\n",
>> +netdev_dbg(bond->dev, "%s mode is incompatible with arp monitoring, start mii monitoring\n",
>>  newval->string);
>
>Please realign any multiple line arguments to the
>open parenthesis at the same time.
>
>>  /* disable arp monitoring */
>>  bond->params.arp_interval = 0;
>>  /* set miimon to default value */
>>  bond->params.miimon = BOND_DEFAULT_MIIMON;
>> -netdev_info(bond->dev, "Setting MII monitoring interval to %d\n",
>> +netdev_dbg(bond->dev, "Setting MII monitoring interval to %d\n",
>>  bond->params.miimon);
>
>etc...
>

---
-Jay Vosburgh, jay.vosbu...@canonical.com


Re: [PATCH v4 2/3] PCI: Enable PCIe Relaxed Ordering if supported

2017-06-15 Thread Ding Tianhong


On 2017/6/13 5:28, Alexander Duyck wrote:
> On Mon, Jun 12, 2017 at 4:05 AM, Ding Tianhong  wrote:
...
>>  /**
>> + * pcie_clear_relaxed_ordering - clear PCI Express relaxed ordering bit
>> + * @dev: PCI device to query
>> + *
>> + * If possible clear relaxed ordering
>> + */
>> +int pcie_clear_relaxed_ordering(struct pci_dev *dev)
>> +{
>> +   return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
>> + PCI_EXP_DEVCTL_RELAX_EN);
>> +}
>> +EXPORT_SYMBOL(pcie_clear_relaxed_ordering);
>> +
>> +/**
>> + * pcie_relaxed_ordering_supported - Probe for PCIe relaxed ordering support
>> + * @dev: PCI device to query
>> + *
>> + * Returns true if the device supports the relaxed ordering attribute.
>> + */
>> +bool pcie_relaxed_ordering_supported(struct pci_dev *dev)
>> +{
>> +   bool ro_supported = false;
>> +   u16 v;
>> +
>> +   pcie_capability_read_word(dev, PCI_EXP_DEVCTL, );
>> +   if ((v & PCI_EXP_DEVCTL_RELAX_EN) >> 4)
>> +   ro_supported = true;
> 
> Instead of "return ro_supported", why not just "return !!(v &
> PCI_EXP_DEVCTL_RELAX_EN)"? That cuts out the extra steps, and the
> shift by 4 shouldn't even really be needed since you are just
> testing for a bit anyway.
> 

OK.
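The suggested simplification can be checked in isolation. The sketch below works on a raw 16-bit Device Control value rather than a `struct pci_dev`, and the helper name is hypothetical; the mask value is the one defined in the kernel's `pci_regs.h` (bit 4, which is why the original code shifted by 4).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* "Enable relaxed ordering" bit of the PCIe Device Control register,
 * as defined in include/uapi/linux/pci_regs.h. */
#define PCI_EXP_DEVCTL_RELAX_EN 0x0010

/* Hypothetical helper: test the bit and normalise to bool with !!,
 * with no intermediate variable and no shift. */
static bool relaxed_ordering_enabled(uint16_t devctl)
{
	return !!(devctl & PCI_EXP_DEVCTL_RELAX_EN);
}
```

Because the function returns `bool`, C's conversion to `_Bool` would already map any non-zero value to 1; the `!!` idiom just makes the normalization explicit, as kernel code commonly does.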

>> +
>> +   return ro_supported;
>> +}
>> +EXPORT_SYMBOL(pcie_relaxed_ordering_supported);
>> +
>> +/**
>>   * pcie_get_minimum_link - determine minimum link settings of a PCI device
>>   * @dev: PCI device to query
>>   * @speed: storage for minimum speed
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index 19c8950..ed1f717 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -1701,6 +1701,46 @@ static void pci_configure_extended_tags(struct pci_dev *dev)
>>  PCI_EXP_DEVCTL_EXT_TAG);
>>  }
>>
>> +/**
>> + * pci_dev_should_disable_relaxed_ordering - check if the PCI device
>> + * should disable the relaxed ordering attribute.
>> + * @dev: PCI device
>> + *
>> + * Return true if any of the PCI devices above us do not support
>> + * relaxed ordering.
>> + */
>> +static bool pci_dev_should_disable_relaxed_ordering(struct pci_dev *dev)
>> +{
>> +   bool ro_disabled = false;
>> +
>> +   while (dev) {
>> +   if (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING) {
>> +   ro_disabled = true;
>> +   break;
>> +   }
>> +   dev = dev->bus->self;
>> +   }
>> +
>> +   return ro_disabled;
> 
> Same thing here. I would suggest just returning either true or false
> and dropping the ro_disabled value. It will reduce the lines of code and
> make things a bit more direct.
> 

OK.

>> +}
>> +
>> +static void pci_configure_relaxed_ordering(struct pci_dev *dev)
>> +{
>> +   struct pci_dev *bridge = pci_upstream_bridge(dev);
>> +
>> +   if (!pci_is_pcie(dev) || !bridge || !pci_is_pcie(bridge))
>> +   return;
> 
> The pci_is_pcie check is actually redundant based on the
> pcie_relaxed_ordering_supported check using pcie_capability_read_word.
>

Yes, pcie_capability_read_word already checks it, thanks.


> Also I am not sure what the point of the pci_upstream_bridge()
> check is; it seems like you should be able to catch all the same stuff
> in your pci_dev_should_disable_relaxed_ordering() call. Though it did
> give me a thought. I don't think we can alter this for a VF, so you
> might want to add a check for dev->is_virtfn to the list of checks and,
> if it is a virtual function, just return, since I don't think there are
> any VFs that would let you alter this bit anyway.
> 
If the upstream device is NULL, does that mean it is a device in a guest OS?
Maybe I am missing something.
Also, I will check dev->is_virtfn to avoid trying to change the
configuration space of a VF.

Another question: since Casey seems to be too busy these days, should we
delay the cxgb4 modification and update ixgbe instead? What do you
think about it? :)

Thanks.
Ding
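Both review points for `pci_configure_relaxed_ordering()`, returning directly from the upstream walk and bailing out early for VFs, fit in a few lines. This is a simplified, hypothetical model: `struct fake_pci_dev` and its fields stand in for `struct pci_dev`, `dev->bus->self`, `PCI_DEV_FLAGS_NO_RELAXED_ORDERING` and the DEVCTL enable bit, and no config space is touched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev with only the fields needed here. */
struct fake_pci_dev {
	bool no_relaxed_ordering;    /* models PCI_DEV_FLAGS_NO_RELAXED_ORDERING */
	bool is_virtfn;              /* models dev->is_virtfn */
	bool ro_enabled;             /* models the DEVCTL relaxed ordering bit */
	struct fake_pci_dev *parent; /* models dev->bus->self */
};

/* Walk towards the root and return as soon as a quirked device is found,
 * instead of carrying a ro_disabled flag through the loop. */
static bool should_disable_relaxed_ordering(const struct fake_pci_dev *dev)
{
	for (; dev; dev = dev->parent)
		if (dev->no_relaxed_ordering)
			return true;
	return false;
}

static void configure_relaxed_ordering(struct fake_pci_dev *dev)
{
	assert(dev != NULL);
	/* Leave VFs alone: the reviewer does not expect VFs to allow
	 * changing this bit. Also nothing to do if the bit is clear. */
	if (dev->is_virtfn || !dev->ro_enabled)
		return;
	if (should_disable_relaxed_ordering(dev))
		dev->ro_enabled = false;
}
```

As in the patch, the walk starts at the device itself, so a quirk on the endpoint also disables Relaxed Ordering for it.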

>> +   /* If the relaxed ordering enable bit is not set, do nothing. */
>> +   if (!pcie_relaxed_ordering_supported(dev))
>> +   return;
>> +
>> +   if (pci_dev_should_disable_relaxed_ordering(dev)) {
>> +   pcie_clear_relaxed_ordering(dev);
>> +   dev_info(>dev, "Disable Relaxed Ordering\n");
>> +   }
>> +}
>> +
>>  static void pci_configure_device(struct pci_dev *dev)
>>  {
>> struct hotplug_params hpp;
>> @@ -1708,6 +1748,7 @@ static void pci_configure_device(struct pci_dev *dev)
>>
>> pci_configure_mps(dev);
>> pci_configure_extended_tags(dev);
>> +   pci_configure_relaxed_ordering(dev);
>>
>> memset(, 0, sizeof(hpp));
>> ret = pci_get_hp_params(dev, );
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index e1e8428..9870781 100644
>> --- 

[PATCH v3 1/2] tcp: md5: add an address prefix for key lookup

2017-06-15 Thread Ivan Delalande
This allows the keys used for TCP MD5 signatures to apply to a whole
range of addresses, specified with a prefix length, instead of only one
address as is currently the case.

Signed-off-by: Bob Gilligan 
Signed-off-by: Eric Mowat 
Signed-off-by: Ivan Delalande 
---
 include/net/tcp.h   |  6 +++--
 net/ipv4/tcp_ipv4.c | 68 ++---
 net/ipv6/tcp_ipv6.c | 12 ++
 3 files changed, 70 insertions(+), 16 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 38a7427ae902..2b68023ab095 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1395,6 +1395,7 @@ struct tcp_md5sig_key {
u8  keylen;
u8  family; /* AF_INET or AF_INET6 */
union tcp_md5_addr  addr;
+   u8  prefixlen;
u8  key[TCP_MD5SIG_MAXKEYLEN];
struct rcu_head rcu;
 };
@@ -1438,9 +1439,10 @@ struct tcp_md5sig_pool {
 int tcp_v4_md5_hash_skb(char *md5_hash, const struct tcp_md5sig_key *key,
const struct sock *sk, const struct sk_buff *skb);
 int tcp_md5_do_add(struct sock *sk, const union tcp_md5_addr *addr,
-  int family, const u8 *newkey, u8 newkeylen, gfp_t gfp);
+  int family, u8 prefixlen, const u8 *newkey, u8 newkeylen,
+  gfp_t gfp);
 int tcp_md5_do_del(struct sock *sk, const union tcp_md5_addr *addr,
-  int family);
+  int family, u8 prefixlen);
 struct tcp_md5sig_key *tcp_v4_md5_lookup(const struct sock *sk,
 const struct sock *addr_sk);
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 5ab2aac5ca19..51ca3bd5a8a3 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -80,6 +80,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -906,6 +907,9 @@ struct tcp_md5sig_key *tcp_md5_do_lookup(const struct sock *sk,
struct tcp_md5sig_key *key;
unsigned int size = sizeof(struct in_addr);
const struct tcp_md5sig_info *md5sig;
+   __be32 mask;
+   struct tcp_md5sig_key *best_match = NULL;
+   bool match;
 
/* caller either holds rcu_read_lock() or socket lock */
md5sig = rcu_dereference_check(tp->md5sig_info,
@@ -919,12 +923,55 @@ struct tcp_md5sig_key *tcp_md5_do_lookup(const struct sock *sk,
hlist_for_each_entry_rcu(key, >head, node) {
if (key->family != family)
continue;
-   if (!memcmp(>addr, addr, size))
+
+   if (family == AF_INET) {
+   mask = inet_make_mask(key->prefixlen);
+   match = (key->addr.a4.s_addr & mask) ==
+   (addr->a4.s_addr & mask);
+#if IS_ENABLED(CONFIG_IPV6)
+   } else if (family == AF_INET6) {
+   match = ipv6_prefix_equal(>addr.a6, >a6,
+ key->prefixlen);
+#endif
+   } else {
+   match = false;
+   }
+
+   if (match && (!best_match ||
+ key->prefixlen > best_match->prefixlen))
+   best_match = key;
+   }
+   return best_match;
+}
+EXPORT_SYMBOL(tcp_md5_do_lookup);
+
+struct tcp_md5sig_key *tcp_md5_do_lookup_exact(const struct sock *sk,
+  const union tcp_md5_addr *addr,
+  int family, u8 prefixlen)
+{
+   const struct tcp_sock *tp = tcp_sk(sk);
+   struct tcp_md5sig_key *key;
+   unsigned int size = sizeof(struct in_addr);
+   const struct tcp_md5sig_info *md5sig;
+
+   /* caller either holds rcu_read_lock() or socket lock */
+   md5sig = rcu_dereference_check(tp->md5sig_info,
+  lockdep_sock_is_held(sk));
+   if (!md5sig)
+   return NULL;
+#if IS_ENABLED(CONFIG_IPV6)
+   if (family == AF_INET6)
+   size = sizeof(struct in6_addr);
+#endif
+   hlist_for_each_entry_rcu(key, >head, node) {
+   if (key->family != family)
+   continue;
+   if (!memcmp(>addr, addr, size) &&
+   key->prefixlen == prefixlen)
return key;
}
return NULL;
 }
-EXPORT_SYMBOL(tcp_md5_do_lookup);
 
 struct tcp_md5sig_key *tcp_v4_md5_lookup(const struct sock *sk,
 const struct sock *addr_sk)
@@ -938,14 +985,15 @@ EXPORT_SYMBOL(tcp_v4_md5_lookup);
 
 /* This can be called on a newly created socket, from other files */
 int tcp_md5_do_add(struct sock *sk, const union tcp_md5_addr *addr,
-  int family, const u8 *newkey, u8 newkeylen, gfp_t gfp)
+  int family, u8 prefixlen, const u8 *newkey, u8 

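The matching rule added to `tcp_md5_do_lookup()` is a longest-prefix match: every key whose masked address equals the masked peer address is a candidate, and the one with the largest `prefixlen` wins. The IPv4-only sketch below restates that rule in plain C; `struct md5_key`, `prefix_mask()` and `lookup()` are hypothetical, and addresses are kept in host byte order, unlike the `__be32` values the kernel compares.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-order analogue of inet_make_mask(): prefixlen leading one bits.
 * A 32-bit shift by 32 is undefined in C, hence the special case for 0. */
static uint32_t prefix_mask(uint8_t prefixlen)
{
	assert(prefixlen <= 32);
	return prefixlen ? UINT32_MAX << (32 - prefixlen) : 0;
}

struct md5_key {          /* hypothetical IPv4-only key entry */
	uint32_t addr;        /* host byte order for readability */
	uint8_t prefixlen;
	int id;
};

/* Longest-prefix match in the style of the patched tcp_md5_do_lookup(). */
static const struct md5_key *lookup(const struct md5_key *keys, size_t n,
				    uint32_t addr)
{
	const struct md5_key *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		uint32_t mask = prefix_mask(keys[i].prefixlen);

		if ((keys[i].addr & mask) != (addr & mask))
			continue;
		if (!best || keys[i].prefixlen > best->prefixlen)
			best = &keys[i];
	}
	return best;
}
```

With keys for 10.0.0.0/8, 10.1.0.0/16 and 10.1.2.3/32, a peer at 10.1.9.9 selects the /16 key and 10.1.2.3 selects the /32 key. The patch also adds `tcp_md5_do_lookup_exact()`, which instead requires both the address and the prefix length to match exactly.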
[PATCH v3 2/2] tcp: md5: add TCP_MD5SIG_EXT socket option to set a key address prefix

2017-06-15 Thread Ivan Delalande
Replace the first padding field in struct tcp_md5sig with a new flags field
and an address prefix length, so that a prefix can be specified when
configuring a new key for TCP MD5 signatures. The tcpm_flags field is only
used if the socket option is TCP_MD5SIG_EXT, to avoid breaking existing
programs, and tcpm_prefixlen only when the TCP_MD5SIG_FLAG_PREFIX flag is set.

Signed-off-by: Bob Gilligan 
Signed-off-by: Eric Mowat 
Signed-off-by: Ivan Delalande 
---
 include/net/tcp.h|  1 +
 include/uapi/linux/tcp.h |  9 +++--
 net/ipv4/tcp.c   |  3 ++-
 net/ipv4/tcp_ipv4.c  | 16 
 net/ipv6/tcp_ipv6.c  | 25 ++---
 5 files changed, 40 insertions(+), 14 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 2b68023ab095..575f95cb8275 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1802,6 +1802,7 @@ struct tcp_sock_af_ops {
 const struct sock *sk,
 const struct sk_buff *skb);
int (*md5_parse)(struct sock *sk,
+int optname,
 char __user *optval,
 int optlen);
 #endif
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 38a2b07afdff..9870b7f08f4f 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -117,6 +117,7 @@ enum {
 #define TCP_SAVED_SYN  28  /* Get SYN headers recorded for connection */
 #define TCP_REPAIR_WINDOW  29  /* Get/set window parameters */
 #define TCP_FASTOPEN_CONNECT   30  /* Attempt FastOpen with connect */
+#define TCP_MD5SIG_EXT 31  /* TCP MD5 Signature with extensions */
 
 struct tcp_repair_opt {
__u32   opt_code;
@@ -234,11 +235,15 @@ enum {
 /* for TCP_MD5SIG socket option */
 #define TCP_MD5SIG_MAXKEYLEN   80
 
+/* tcp_md5sig extension flags for TCP_MD5SIG_EXT */
+#define TCP_MD5SIG_FLAG_PREFIX 1   /* address prefix length */
+
 struct tcp_md5sig {
struct __kernel_sockaddr_storage tcpm_addr; /* address associated */
-   __u16   __tcpm_pad1;/* zero */
+   __u8    tcpm_flags; /* extension flags */
+   __u8    tcpm_prefixlen; /* address prefix */
__u16   tcpm_keylen;/* key length */
-   __u32   __tcpm_pad2;/* zero */
+   __u32   __tcpm_pad; /* zero */
__u8    tcpm_key[TCP_MD5SIG_MAXKEYLEN]; /* key (binary) */
 };
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 1e4c76d2b827..2a68221d2e55 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2666,8 +2666,9 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 
 #ifdef CONFIG_TCP_MD5SIG
case TCP_MD5SIG:
+   case TCP_MD5SIG_EXT:
/* Read the IP->Key mappings from userspace */
-   err = tp->af_specific->md5_parse(sk, optval, optlen);
+   err = tp->af_specific->md5_parse(sk, optname, optval, optlen);
break;
 #endif
case TCP_USER_TIMEOUT:
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 51ca3bd5a8a3..81d6c16aecdc 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1064,11 +1064,12 @@ static void tcp_clear_md5_list(struct sock *sk)
}
 }
 
-static int tcp_v4_parse_md5_keys(struct sock *sk, char __user *optval,
-int optlen)
+static int tcp_v4_parse_md5_keys(struct sock *sk, int optname,
+char __user *optval, int optlen)
 {
struct tcp_md5sig cmd;
struct sockaddr_in *sin = (struct sockaddr_in *)_addr;
+   u8 prefixlen = 32;
 
if (optlen < sizeof(cmd))
return -EINVAL;
@@ -1079,15 +1080,22 @@ static int tcp_v4_parse_md5_keys(struct sock *sk, char __user *optval,
if (sin->sin_family != AF_INET)
return -EINVAL;
 
+   if (optname == TCP_MD5SIG_EXT &&
+   cmd.tcpm_flags & TCP_MD5SIG_FLAG_PREFIX) {
+   prefixlen = cmd.tcpm_prefixlen;
+   if (prefixlen > 32)
+   return -EINVAL;
+   }
+
if (!cmd.tcpm_keylen)
return tcp_md5_do_del(sk, (union tcp_md5_addr *)>sin_addr.s_addr,
- AF_INET, 32);
+ AF_INET, prefixlen);
 
if (cmd.tcpm_keylen > TCP_MD5SIG_MAXKEYLEN)
return -EINVAL;
 
return tcp_md5_do_add(sk, (union tcp_md5_addr *)>sin_addr.s_addr,
- AF_INET, 32, cmd.tcpm_key, cmd.tcpm_keylen,
+ AF_INET, prefixlen, cmd.tcpm_key, cmd.tcpm_keylen,
  GFP_KERNEL);
 }
 
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c

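Two properties of this change can be checked with a compile-time assertion and a tiny function: the UAPI struct keeps its size and field offsets, and the prefix is only honored for `TCP_MD5SIG_EXT` with `TCP_MD5SIG_FLAG_PREFIX` set. In the sketch below the structs are local mirrors (using `struct sockaddr_storage` in place of `__kernel_sockaddr_storage`), the `SKETCH_*` constants copy the values from the patch, and `effective_v4_prefixlen()` is a hypothetical restatement of the check in `tcp_v4_parse_md5_keys()`, not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

/* Values copied from include/uapi/linux/tcp.h as modified by the patch. */
#define SKETCH_TCP_MD5SIG        14
#define SKETCH_TCP_MD5SIG_EXT    31
#define SKETCH_FLAG_PREFIX       1

/* Local mirrors of the old and new struct tcp_md5sig layouts. */
struct md5sig_old {
	struct sockaddr_storage addr;
	uint16_t pad1;
	uint16_t keylen;
	uint32_t pad2;
	uint8_t key[80];
};

struct md5sig_new {
	struct sockaddr_storage addr;
	uint8_t flags;
	uint8_t prefixlen;
	uint16_t keylen;
	uint32_t pad;
	uint8_t key[80];
};

/* Splitting the u16 pad into two u8 fields keeps the ABI intact. */
_Static_assert(sizeof(struct md5sig_old) == sizeof(struct md5sig_new), "size changed");
_Static_assert(offsetof(struct md5sig_old, keylen) == offsetof(struct md5sig_new, keylen), "keylen moved");
_Static_assert(offsetof(struct md5sig_old, key) == offsetof(struct md5sig_new, key), "key moved");

/* Flags are honoured only for TCP_MD5SIG_EXT; returns the IPv4 prefix
 * length to use, or -EINVAL for an out-of-range prefix. */
static int effective_v4_prefixlen(int optname, const struct md5sig_new *cmd)
{
	int prefixlen = 32;

	if (optname == SKETCH_TCP_MD5SIG_EXT &&
	    (cmd->flags & SKETCH_FLAG_PREFIX)) {
		prefixlen = cmd->prefixlen;
		if (prefixlen > 32)
			return -EINVAL;
	}
	return prefixlen;
}
```

A userspace caller would fill `tcpm_addr` and the key as before, set `tcpm_flags = TCP_MD5SIG_FLAG_PREFIX` and `tcpm_prefixlen`, and pass `TCP_MD5SIG_EXT` as the option name to `setsockopt()`; a program still using `TCP_MD5SIG` gets a /32 key regardless of what those bytes contain.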
Re: [PATCH v3 net-next 8/9] bpf: nfp: Report bpf_prog ID during XDP_QUERY_PROG

2017-06-15 Thread Jakub Kicinski
On Thu, 15 Jun 2017 17:29:16 -0700, Martin KaFai Lau wrote:
> Add support to nfp to report bpf_prog ID during XDP_QUERY_PROG.
> 
> Signed-off-by: Martin KaFai Lau 
> Cc: Jakub Kicinski 
> Acked-by: Alexei Starovoitov 
> Acked-by: Daniel Borkmann 

Acked-by: Jakub Kicinski 

Thanks!


Re: [PATCH net-next] net: dsa: add cross-chip multicast support

2017-06-15 Thread Florian Fainelli
On 06/15/2017 01:14 PM, Vivien Didelot wrote:
> Similarly to how cross-chip VLAN works, define a bitmap of multicast
> group members for a switch, now including its DSA ports, so that
> multicast traffic can be sent to all switches of the fabric.
> 
> A switch may drop the frames if no user port is a member.
> 
> This brings support for multicast in a multi-chip environment.
> As of now, all switches of the fabric must support the multicast
> operations in order to program a single fabric port.
> 
> Reported-by: Jason Cobham 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Florian Fainelli 
-- 
Florian
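The bitmap described above can be modeled per switch: the user port is a member only on the switch that owns it, while every DSA (inter-switch) port is always a member so the traffic can reach the rest of the fabric. This is a hypothetical sketch of that description, not the code from the patch; `struct sketch_switch`, `mdb_member_bitmap()` and `MAX_PORTS` are invented, and a `uint32_t` replaces the kernel bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_PORTS 16   /* hypothetical upper bound for the sketch */

/* Hypothetical switch description: its index in the fabric and which of
 * its ports are DSA links to other switches. */
struct sketch_switch {
	int index;
	int num_ports;
	bool is_dsa_port[MAX_PORTS];
};

/* Members on switch ds for a group joined on (target_sw, target_port):
 * the user port only on the target switch, plus every DSA port. */
static uint32_t mdb_member_bitmap(const struct sketch_switch *ds,
				  int target_sw, int target_port)
{
	uint32_t group = 0;
	int port;

	assert(ds->num_ports <= MAX_PORTS);
	if (ds->index == target_sw)
		group |= 1u << target_port;
	for (port = 0; port < ds->num_ports; port++)
		if (ds->is_dsa_port[port])
			group |= 1u << port;
	return group;
}
```

For a group joined on port 2 of switch 1, switch 0's bitmap contains only its DSA port; none of its own user ports are members.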


[PATCH v3 net-next 9/9] bpf: qede: Report bpf_prog ID during XDP_QUERY_PROG

2017-06-15 Thread Martin KaFai Lau
Add support to qede to report bpf_prog ID during XDP_QUERY_PROG.

Signed-off-by: Martin KaFai Lau 
Cc: Mintz Yuval 
Acked-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 drivers/net/ethernet/qlogic/qede/qede_filter.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/qlogic/qede/qede_filter.c b/drivers/net/ethernet/qlogic/qede/qede_filter.c
index 13955a3bd3b3..f939db5bac5f 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_filter.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_filter.c
@@ -1037,6 +1037,7 @@ int qede_xdp(struct net_device *dev, struct netdev_xdp *xdp)
return qede_xdp_set(edev, xdp->prog);
case XDP_QUERY_PROG:
xdp->prog_attached = !!edev->xdp_prog;
+   xdp->prog_id = edev->xdp_prog ? edev->xdp_prog->aux->id : 0;
return 0;
default:
return -EINVAL;
-- 
2.9.3
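Every driver patch in this series follows the same pattern for `XDP_QUERY_PROG`: report the attached program's `aux->id`, or 0 when no program is attached, and keep `prog_attached` alongside it. The standalone sketch below restates that pattern; the `sketch_*` types are hypothetical stand-ins for `struct bpf_prog`, `struct bpf_prog_aux` and the query fields of `struct netdev_xdp`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: only the fields the query path touches. */
struct sketch_prog_aux { uint32_t id; };
struct sketch_prog { struct sketch_prog_aux *aux; };

struct sketch_query {
	bool prog_attached;
	uint32_t prog_id;
};

/* Same shape as the XDP_QUERY_PROG hunks: ID 0 means "no program". */
static void xdp_query(const struct sketch_prog *prog, struct sketch_query *q)
{
	assert(q != NULL);
	q->prog_attached = !!prog;
	q->prog_id = prog ? prog->aux->id : 0;
}
```

Using 0 as the "no program" value is what lets the mlx4, mlx5e and virtio_net patches derive `prog_attached` as `!!prog_id`.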



[PATCH v3 net-next 6/9] bpf: thunderx: Report bpf_prog ID during XDP_QUERY_PROG

2017-06-15 Thread Martin KaFai Lau
Add support to thunderx to report bpf_prog ID during XDP_QUERY_PROG.

Signed-off-by: Martin KaFai Lau 
Cc: Sunil Goutham 
Acked-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 drivers/net/ethernet/cavium/thunder/nicvf_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index d6477af88085..573755b0a51b 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -1763,6 +1763,7 @@ static int nicvf_xdp(struct net_device *netdev, struct netdev_xdp *xdp)
return nicvf_xdp_setup(nic, xdp->prog);
case XDP_QUERY_PROG:
xdp->prog_attached = !!nic->xdp_prog;
+   xdp->prog_id = nic->xdp_prog ? nic->xdp_prog->aux->id : 0;
return 0;
default:
return -EINVAL;
-- 
2.9.3



[PATCH v3 net-next 3/9] bpf: mlx5e: Report bpf_prog ID during XDP_QUERY_PROG

2017-06-15 Thread Martin KaFai Lau
Add support to mlx5e to report bpf_prog ID during XDP_QUERY_PROG.

Signed-off-by: Martin KaFai Lau 
Cc: Tariq Toukan 
Cc: Saeed Mahameed 
Acked-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
Acked-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 5afec0f4a658..c8f3aefe735d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3599,11 +3599,19 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
return err;
 }
 
-static bool mlx5e_xdp_attached(struct net_device *dev)
+static u32 mlx5e_xdp_query(struct net_device *dev)
 {
struct mlx5e_priv *priv = netdev_priv(dev);
+   const struct bpf_prog *xdp_prog;
+   u32 prog_id = 0;
 
-   return !!priv->channels.params.xdp_prog;
+   mutex_lock(>state_lock);
+   xdp_prog = priv->channels.params.xdp_prog;
+   if (xdp_prog)
+   prog_id = xdp_prog->aux->id;
+   mutex_unlock(>state_lock);
+
+   return prog_id;
 }
 
 static int mlx5e_xdp(struct net_device *dev, struct netdev_xdp *xdp)
@@ -3612,7 +3620,8 @@ static int mlx5e_xdp(struct net_device *dev, struct netdev_xdp *xdp)
case XDP_SETUP_PROG:
return mlx5e_xdp_set(dev, xdp->prog);
case XDP_QUERY_PROG:
-   xdp->prog_attached = mlx5e_xdp_attached(dev);
+   xdp->prog_id = mlx5e_xdp_query(dev);
+   xdp->prog_attached = !!xdp->prog_id;
return 0;
default:
return -EINVAL;
-- 
2.9.3
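Unlike the one-line ternaries in other drivers, mlx5e reads the program pointer and its ID while holding `state_lock`. A userspace analogue with a pthread mutex is sketched below; the types are hypothetical, and the assumption (not stated in the patch) is that the setup path replaces `xdp_prog` under the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_prog { uint32_t id; };   /* hypothetical: aux->id flattened */

struct sketch_priv {
	pthread_mutex_t state_lock;
	struct sketch_prog *xdp_prog;  /* assumed to be swapped under state_lock */
};

/* Read the pointer and its ID under the same lock the (assumed) setup path
 * uses, so the program cannot be swapped between the NULL check and the
 * ->id load. Returns 0 when no program is attached. */
static uint32_t sketch_xdp_query(struct sketch_priv *priv)
{
	uint32_t prog_id = 0;

	assert(priv != NULL);
	pthread_mutex_lock(&priv->state_lock);
	if (priv->xdp_prog)
		prog_id = priv->xdp_prog->id;
	pthread_mutex_unlock(&priv->state_lock);
	return prog_id;
}
```

The caller then sets `prog_attached` from `!!prog_id`, as in the `mlx5e_xdp()` hunk above.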



[PATCH v3 net-next 4/9] bpf: virtio_net: Report bpf_prog ID during XDP_QUERY_PROG

2017-06-15 Thread Martin KaFai Lau
Add support to virtio_net to report bpf_prog ID during XDP_QUERY_PROG.

Signed-off-by: Martin KaFai Lau 
Cc: John Fastabend 
Cc: Jason Wang 
Acked-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 drivers/net/virtio_net.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 1f8c15cb63b0..4f49c3dab124 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1955,16 +1955,18 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
return err;
 }
 
-static bool virtnet_xdp_query(struct net_device *dev)
+static u32 virtnet_xdp_query(struct net_device *dev)
 {
struct virtnet_info *vi = netdev_priv(dev);
+   const struct bpf_prog *xdp_prog;
int i;
 
for (i = 0; i < vi->max_queue_pairs; i++) {
-   if (vi->rq[i].xdp_prog)
-   return true;
+   xdp_prog = rtnl_dereference(vi->rq[i].xdp_prog);
+   if (xdp_prog)
+   return xdp_prog->aux->id;
}
-   return false;
+   return 0;
 }
 
 static int virtnet_xdp(struct net_device *dev, struct netdev_xdp *xdp)
@@ -1973,7 +1975,8 @@ static int virtnet_xdp(struct net_device *dev, struct netdev_xdp *xdp)
case XDP_SETUP_PROG:
return virtnet_xdp_set(dev, xdp->prog, xdp->extack);
case XDP_QUERY_PROG:
-   xdp->prog_attached = virtnet_xdp_query(dev);
+   xdp->prog_id = virtnet_xdp_query(dev);
+   xdp->prog_attached = !!xdp->prog_id;
return 0;
default:
return -EINVAL;
-- 
2.9.3
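virtio_net keeps an XDP program pointer per receive queue, so the query walks the queues and reports the first program it finds. The sketch below models that loop with hypothetical types and plain pointer reads; the driver itself reads each pointer with `rtnl_dereference()` because the pointers are RCU-managed and the caller holds RTNL. Returning the first ID assumes the same program is installed on every queue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_prog { uint32_t id; };              /* hypothetical */
struct sketch_rq { struct sketch_prog *xdp_prog; };

/* Return the ID of the first program found on any queue, or 0 if none. */
static uint32_t sketch_query(const struct sketch_rq *rq, int nqueues)
{
	int i;

	assert(nqueues >= 0);
	for (i = 0; i < nqueues; i++)
		if (rq[i].xdp_prog)
			return rq[i].xdp_prog->id;
	return 0;
}
```

Changing the return type from `bool` to `u32` is what allows the same function to answer both questions: `prog_id` directly, and `prog_attached` as `!!prog_id`.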



[PATCH v3 net-next 7/9] bpf: ixgbe: Report bpf_prog ID during XDP_QUERY_PROG

2017-06-15 Thread Martin KaFai Lau
Add support to ixgbe to report bpf_prog ID during XDP_QUERY_PROG.

Signed-off-by: Martin KaFai Lau 
Cc: Alexander Duyck 
Cc: John Fastabend 
Acked-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index f3dc5dea9300..f1dbdf26d8e1 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -9815,6 +9815,8 @@ static int ixgbe_xdp(struct net_device *dev, struct netdev_xdp *xdp)
return ixgbe_xdp_setup(dev, xdp->prog);
case XDP_QUERY_PROG:
xdp->prog_attached = !!(adapter->xdp_prog);
+   xdp->prog_id = adapter->xdp_prog ?
+   adapter->xdp_prog->aux->id : 0;
return 0;
default:
return -EINVAL;
-- 
2.9.3



[PATCH v3 net-next 0/9] bpf: xdp: Report bpf_prog ID in IFLA_XDP

2017-06-15 Thread Martin KaFai Lau
This is the first usage of the new bpf_prog ID.  It is for
reporting the ID of an xdp_prog through netlink.

It rides on the existing IFLA_XDP.  This patch adds IFLA_XDP_PROG_ID
for the bpf_prog ID reporting.

It starts by changing generic_xdp.  After that,
the hardware drivers are changed one by one.  Jakub Kicinski mentioned
that he will soon introduce XDP_ATTACHED_HW (on top of the existing
XDP_ATTACHED_DRV and XDP_ATTACHED_SKB)
and that he is going to reuse prog_attached for this purpose.
Hence, this patch set keeps prog_attached even though
a non-zero prog_id also implies that an xdp_prog is attached.

I have tested with generic_xdp, mlx4 and mlx5.

v3:
1. Replace 'if' with '?' when checking the xdp_prog pointer
   as suggested by Jakub Kicinski (thanks!)

v2:
1. Remove READ_ONCE since it is already under rtnl lock
2. Keep prog_attached in 'struct netdev_xdp' as
   requested by Jakub Kicinski.  The existing prog_attached
   and the new prog_id are put under a struct for XDP_QUERY_PROG.

Martin KaFai Lau (9):
  net: Add IFLA_XDP_PROG_ID
  bpf: mlx4: Report bpf_prog ID during XDP_QUERY_PROG
  bpf: mlx5e: Report bpf_prog ID during XDP_QUERY_PROG
  bpf: virtio_net: Report bpf_prog ID during XDP_QUERY_PROG
  bpf: bnxt: Report bpf_prog ID during XDP_QUERY_PROG
  bpf: thunderx: Report bpf_prog ID during XDP_QUERY_PROG
  bpf: ixgbe: Report bpf_prog ID during XDP_QUERY_PROG
  bpf: nfp: Report bpf_prog ID during XDP_QUERY_PROG
  bpf: qede: Report bpf_prog ID during XDP_QUERY_PROG

 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c  |  1 +
 drivers/net/ethernet/cavium/thunder/nicvf_main.c   |  1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  |  2 ++
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 21 ++---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 15 +---
 .../net/ethernet/netronome/nfp/nfp_net_common.c|  1 +
 drivers/net/ethernet/qlogic/qede/qede_filter.c |  1 +
 drivers/net/virtio_net.c   | 13 +++
 include/linux/netdevice.h  |  7 --
 include/uapi/linux/if_link.h   |  1 +
 net/core/dev.c | 19 ---
 net/core/rtnetlink.c   | 27 +-
 12 files changed, 82 insertions(+), 27 deletions(-)

-- 
2.9.3



[PATCH v3 net-next 2/9] bpf: mlx4: Report bpf_prog ID during XDP_QUERY_PROG

2017-06-15 Thread Martin KaFai Lau
Add support to mlx4 to report bpf_prog ID during XDP_QUERY_PROG.

Signed-off-by: Martin KaFai Lau 
Cc: Tariq Toukan 
Cc: Saeed Mahameed 
Acked-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index c1de75fc399a..ad908a6e49cd 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2821,11 +2821,25 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
return err;
 }
 
-static bool mlx4_xdp_attached(struct net_device *dev)
+static u32 mlx4_xdp_query(struct net_device *dev)
 {
struct mlx4_en_priv *priv = netdev_priv(dev);
+   struct mlx4_en_dev *mdev = priv->mdev;
+   const struct bpf_prog *xdp_prog;
+   u32 prog_id = 0;
+
+   if (!priv->tx_ring_num[TX_XDP])
+   return prog_id;
+
+   mutex_lock(>state_lock);
+   xdp_prog = rcu_dereference_protected(
+   priv->rx_ring[0]->xdp_prog,
+   lockdep_is_held(>state_lock));
+   if (xdp_prog)
+   prog_id = xdp_prog->aux->id;
+   mutex_unlock(>state_lock);
 
-   return !!priv->tx_ring_num[TX_XDP];
+   return prog_id;
 }
 
 static int mlx4_xdp(struct net_device *dev, struct netdev_xdp *xdp)
@@ -2834,7 +2848,8 @@ static int mlx4_xdp(struct net_device *dev, struct netdev_xdp *xdp)
case XDP_SETUP_PROG:
return mlx4_xdp_set(dev, xdp->prog);
case XDP_QUERY_PROG:
-   xdp->prog_attached = mlx4_xdp_attached(dev);
+   xdp->prog_id = mlx4_xdp_query(dev);
+   xdp->prog_attached = !!xdp->prog_id;
return 0;
default:
return -EINVAL;
-- 
2.9.3



[PATCH v3 net-next 5/9] bpf: bnxt: Report bpf_prog ID during XDP_QUERY_PROG

2017-06-15 Thread Martin KaFai Lau
Add support to bnxt to report bpf_prog ID during XDP_QUERY_PROG.

Signed-off-by: Martin KaFai Lau 
Cc: Michael Chan 
Acked-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index 8ce793a0d030..7d67552e70d7 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -218,6 +218,7 @@ int bnxt_xdp(struct net_device *dev, struct netdev_xdp *xdp)
break;
case XDP_QUERY_PROG:
xdp->prog_attached = !!bp->xdp_prog;
+   xdp->prog_id = bp->xdp_prog ? bp->xdp_prog->aux->id : 0;
rc = 0;
break;
default:
-- 
2.9.3



[PATCH v3 net-next 8/9] bpf: nfp: Report bpf_prog ID during XDP_QUERY_PROG

2017-06-15 Thread Martin KaFai Lau
Add support to nfp to report bpf_prog ID during XDP_QUERY_PROG.

Signed-off-by: Martin KaFai Lau 
Cc: Jakub Kicinski 
Acked-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 49d1756d6a8e..378512dec80d 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -3256,6 +3256,7 @@ static int nfp_net_xdp(struct net_device *netdev, struct netdev_xdp *xdp)
return nfp_net_xdp_setup(nn, xdp);
case XDP_QUERY_PROG:
xdp->prog_attached = !!nn->dp.xdp_prog;
+   xdp->prog_id = nn->dp.xdp_prog ? nn->dp.xdp_prog->aux->id : 0;
return 0;
default:
return -EINVAL;
-- 
2.9.3



[PATCH v3 net-next 1/9] net: Add IFLA_XDP_PROG_ID

2017-06-15 Thread Martin KaFai Lau
Expose prog_id through IFLA_XDP_PROG_ID.  This patch
makes the modification to generic_xdp; later patches will
modify the other XDP-supported drivers.

prog_id is added to struct netdev_xdp.

An iproute2 patch will follow. Here is how the 'ip link' output
will look:
> ip link show eth0
3: eth0:  mtu 1500 xdp(prog_id:1) qdisc fq_codel state UP mode DEFAULT group default qlen 1000

Signed-off-by: Martin KaFai Lau 
Acked-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 include/linux/netdevice.h|  7 +--
 include/uapi/linux/if_link.h |  1 +
 net/core/dev.c   | 19 +++
 net/core/rtnetlink.c | 27 +--
 4 files changed, 38 insertions(+), 16 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ad98a83f1332..7c7118b3bd69 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -824,7 +824,10 @@ struct netdev_xdp {
struct netlink_ext_ack *extack;
};
/* XDP_QUERY_PROG */
-   bool prog_attached;
+   struct {
+   bool prog_attached;
+   u32 prog_id;
+   };
};
 };
 
@@ -3302,7 +3305,7 @@ struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 typedef int (*xdp_op_t)(struct net_device *dev, struct netdev_xdp *xdp);
 int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
  int fd, u32 flags);
-bool __dev_xdp_attached(struct net_device *dev, xdp_op_t xdp_op);
+bool __dev_xdp_attached(struct net_device *dev, xdp_op_t xdp_op, u32 *prog_id);
 
 int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
 int dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 8ed679fe603f..dd88375a6580 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -907,6 +907,7 @@ enum {
IFLA_XDP_FD,
IFLA_XDP_ATTACHED,
IFLA_XDP_FLAGS,
+   IFLA_XDP_PROG_ID,
__IFLA_XDP_MAX,
 };
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 8658074ecad6..b8d6dd9e8b5c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4342,13 +4342,12 @@ static struct static_key generic_xdp_needed __read_mostly;
 
 static int generic_xdp_install(struct net_device *dev, struct netdev_xdp *xdp)
 {
+   struct bpf_prog *old = rtnl_dereference(dev->xdp_prog);
struct bpf_prog *new = xdp->prog;
int ret = 0;
 
switch (xdp->command) {
-   case XDP_SETUP_PROG: {
-   struct bpf_prog *old = rtnl_dereference(dev->xdp_prog);
-
+   case XDP_SETUP_PROG:
rcu_assign_pointer(dev->xdp_prog, new);
if (old)
bpf_prog_put(old);
@@ -4360,10 +4359,10 @@ static int generic_xdp_install(struct net_device *dev, struct netdev_xdp *xdp)
dev_disable_lro(dev);
}
break;
-   }
 
case XDP_QUERY_PROG:
-   xdp->prog_attached = !!rcu_access_pointer(dev->xdp_prog);
+   xdp->prog_attached = !!old;
+   xdp->prog_id = old ? old->aux->id : 0;
break;
 
default:
@@ -6937,7 +6936,8 @@ int dev_change_proto_down(struct net_device *dev, bool proto_down)
 }
 EXPORT_SYMBOL(dev_change_proto_down);
 
-bool __dev_xdp_attached(struct net_device *dev, xdp_op_t xdp_op)
+bool __dev_xdp_attached(struct net_device *dev, xdp_op_t xdp_op,
+   u32 *prog_id)
 {
struct netdev_xdp xdp;
 
@@ -6946,6 +6946,9 @@ bool __dev_xdp_attached(struct net_device *dev, xdp_op_t xdp_op)
 
/* Query must always succeed. */
WARN_ON(xdp_op(dev, &xdp) < 0);
+   if (prog_id)
+   *prog_id = xdp.prog_id;
+
return xdp.prog_attached;
 }
 
@@ -6991,10 +6994,10 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
xdp_chk = generic_xdp_install;
 
if (fd >= 0) {
-   if (xdp_chk && __dev_xdp_attached(dev, xdp_chk))
+   if (xdp_chk && __dev_xdp_attached(dev, xdp_chk, NULL))
return -EEXIST;
if ((flags & XDP_FLAGS_UPDATE_IF_NOEXIST) &&
-   __dev_xdp_attached(dev, xdp_op))
+   __dev_xdp_attached(dev, xdp_op, NULL))
return -EBUSY;
 
prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_XDP);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 2769ad9834d1..3aa57848a895 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -899,7 +900,8 @@ static size_t rtnl_port_size(const struct net_device *dev,
 static size_t rtnl_xdp_size(void)
 {
size_t 

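The new optional out-parameter on __dev_xdp_attached() follows a common C pattern: callers that only want a yes/no answer pass NULL, as dev_change_xdp_fd() does in this patch, while others also receive the program ID. A minimal user-space sketch, with hypothetical types standing in for struct netdev_xdp and xdp_op_t:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the XDP_QUERY_PROG part of struct netdev_xdp. */
struct query_result {
	bool prog_attached;
	uint32_t prog_id;
};

/* Hypothetical stand-in for xdp_op_t: the driver fills in the result. */
typedef int (*query_op_t)(void *dev, struct query_result *res);

/*
 * Same shape as __dev_xdp_attached() after the patch: callers that only
 * need a yes/no answer pass NULL for prog_id.
 */
static bool query_attached(void *dev, query_op_t op, uint32_t *prog_id)
{
	struct query_result res = { false, 0 };

	/* __dev_xdp_attached() WARN_ON()s a failure here; the sketch ignores it. */
	(void)op(dev, &res);
	if (prog_id)
		*prog_id = res.prog_id;
	return res.prog_attached;
}

/* Example callback: dev points at the attached program's ID (0 = none). */
static int demo_op(void *dev, struct query_result *res)
{
	uint32_t id = *(const uint32_t *)dev;

	res->prog_id = id;
	res->prog_attached = id != 0;
	return 0;
}
```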
Re: [PATCH 31/44] hexagon: remove arch-specific dma_supported implementation

2017-06-15 Thread Richard Kuo
On Thu, Jun 08, 2017 at 03:25:56PM +0200, Christoph Hellwig wrote:
> This implementation is simply bogus - hexagon only has a simple
> direct mapped DMA implementation and thus doesn't care about the
> address.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/hexagon/include/asm/dma-mapping.h | 2 --
>  arch/hexagon/kernel/dma.c  | 9 -
>  2 files changed, 11 deletions(-)
> 

Acked-by: Richard Kuo 

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, 
a Linux Foundation Collaborative Project


Re: [PATCH 17/44] hexagon: switch to use ->mapping_error for error reporting

2017-06-15 Thread Richard Kuo
On Thu, Jun 08, 2017 at 03:25:42PM +0200, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/hexagon/include/asm/dma-mapping.h |  2 --
>  arch/hexagon/kernel/dma.c  | 12 +---
>  arch/hexagon/kernel/hexagon_ksyms.c|  1 -
>  3 files changed, 9 insertions(+), 6 deletions(-)
> 

Acked-by: Richard Kuo 

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, 
a Linux Foundation Collaborative Project


RE: [PATCH net-next] net: dsa: add cross-chip multicast support

2017-06-15 Thread Jason Cobham
On 06/15/2017 1:15 PM, Vivien Didelot wrote:
> Similarly to how cross-chip VLAN works, define a bitmap of multicast group
> members for a switch, now including its DSA ports, so that multicast traffic
> can be sent to all switches of the fabric.
> 
> A switch may drop the frames if no user port is a member.
> 
> This brings support for multicast in a multi-chip environment.
> As of now, all switches of the fabric must support the multicast operations in
> order to program a single fabric port.
> 
> Reported-by: Jason Cobham 
> Signed-off-by: Vivien Didelot 

Thanks for this fix:

Tested-by: Jason Cobham 
--
Jason

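As a rough illustration of the membership rule in the description (a group's members on a switch now include its DSA ports), assuming each switch is modelled by two per-switch port bitmaps. The struct and function names are hypothetical and are not the DSA API:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-switch model: bit n set means port n (up to 32 ports). */
struct sw_ports {
	uint32_t user_ports;   /* ports facing hosts */
	uint32_t dsa_ports;    /* ports linking this switch to others in the fabric */
};

/*
 * Multicast group members programmed on one switch: the requested user
 * ports that exist on this switch, plus all of its DSA ports so the traffic
 * can reach the rest of the fabric.
 */
static uint32_t mc_group_members(const struct sw_ports *sw, uint32_t requested)
{
	return (requested & sw->user_ports) | sw->dsa_ports;
}
```

On a switch that hosts none of the requested ports, the result contains only the DSA ports.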

Re: [net-next 01/15] net/mlx5: Update eqe_type_str() event names

2017-06-15 Thread Joe Perches
On Fri, 2017-06-16 at 00:42 +0300, Saeed Mahameed wrote:
> From: Eli Cohen 
> 
> Add missing NIC_VPORT_CHANGE event.
> 
> Signed-off-by: Eli Cohen 
> Signed-off-by: Saeed Mahameed 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/eq.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> index 0ed8e90ba54f..23048247d827 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> @@ -157,6 +157,8 @@ static const char *eqe_type_str(u8 type)
>   return "MLX5_EVENT_TYPE_PAGE_FAULT";
>   case MLX5_EVENT_TYPE_PPS_EVENT:
>   return "MLX5_EVENT_TYPE_PPS_EVENT";
> + case MLX5_EVENT_TYPE_NIC_VPORT_CHANGE:
> + return "MLX5_EVENT_TYPE_NIC_VPORT_CHANGE";
>   case MLX5_EVENT_TYPE_FPGA_ERROR:
>   return "MLX5_EVENT_TYPE_FPGA_ERROR";
>   default:

Maybe one day convert this to use a macro
to reduce the case/string duplication.

Maybe what ath9k uses:
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 74 ++--
 1 file changed, 26 insertions(+), 48 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 0ed8e90ba54f..3e79de07c3ee 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -110,55 +110,33 @@ static struct mlx5_eqe *next_eqe_sw(struct mlx5_eq *eq)
 
 static const char *eqe_type_str(u8 type)
 {
+#define case_rtn_string(val) case val: return #val
switch (type) {
-   case MLX5_EVENT_TYPE_COMP:
-   return "MLX5_EVENT_TYPE_COMP";
-   case MLX5_EVENT_TYPE_PATH_MIG:
-   return "MLX5_EVENT_TYPE_PATH_MIG";
-   case MLX5_EVENT_TYPE_COMM_EST:
-   return "MLX5_EVENT_TYPE_COMM_EST";
-   case MLX5_EVENT_TYPE_SQ_DRAINED:
-   return "MLX5_EVENT_TYPE_SQ_DRAINED";
-   case MLX5_EVENT_TYPE_SRQ_LAST_WQE:
-   return "MLX5_EVENT_TYPE_SRQ_LAST_WQE";
-   case MLX5_EVENT_TYPE_SRQ_RQ_LIMIT:
-   return "MLX5_EVENT_TYPE_SRQ_RQ_LIMIT";
-   case MLX5_EVENT_TYPE_CQ_ERROR:
-   return "MLX5_EVENT_TYPE_CQ_ERROR";
-   case MLX5_EVENT_TYPE_WQ_CATAS_ERROR:
-   return "MLX5_EVENT_TYPE_WQ_CATAS_ERROR";
-   case MLX5_EVENT_TYPE_PATH_MIG_FAILED:
-   return "MLX5_EVENT_TYPE_PATH_MIG_FAILED";
-   case MLX5_EVENT_TYPE_WQ_INVAL_REQ_ERROR:
-   return "MLX5_EVENT_TYPE_WQ_INVAL_REQ_ERROR";
-   case MLX5_EVENT_TYPE_WQ_ACCESS_ERROR:
-   return "MLX5_EVENT_TYPE_WQ_ACCESS_ERROR";
-   case MLX5_EVENT_TYPE_SRQ_CATAS_ERROR:
-   return "MLX5_EVENT_TYPE_SRQ_CATAS_ERROR";
-   case MLX5_EVENT_TYPE_INTERNAL_ERROR:
-   return "MLX5_EVENT_TYPE_INTERNAL_ERROR";
-   case MLX5_EVENT_TYPE_PORT_CHANGE:
-   return "MLX5_EVENT_TYPE_PORT_CHANGE";
-   case MLX5_EVENT_TYPE_GPIO_EVENT:
-   return "MLX5_EVENT_TYPE_GPIO_EVENT";
-   case MLX5_EVENT_TYPE_PORT_MODULE_EVENT:
-   return "MLX5_EVENT_TYPE_PORT_MODULE_EVENT";
-   case MLX5_EVENT_TYPE_REMOTE_CONFIG:
-   return "MLX5_EVENT_TYPE_REMOTE_CONFIG";
-   case MLX5_EVENT_TYPE_DB_BF_CONGESTION:
-   return "MLX5_EVENT_TYPE_DB_BF_CONGESTION";
-   case MLX5_EVENT_TYPE_STALL_EVENT:
-   return "MLX5_EVENT_TYPE_STALL_EVENT";
-   case MLX5_EVENT_TYPE_CMD:
-   return "MLX5_EVENT_TYPE_CMD";
-   case MLX5_EVENT_TYPE_PAGE_REQUEST:
-   return "MLX5_EVENT_TYPE_PAGE_REQUEST";
-   case MLX5_EVENT_TYPE_PAGE_FAULT:
-   return "MLX5_EVENT_TYPE_PAGE_FAULT";
-   case MLX5_EVENT_TYPE_PPS_EVENT:
-   return "MLX5_EVENT_TYPE_PPS_EVENT";
-   case MLX5_EVENT_TYPE_FPGA_ERROR:
-   return "MLX5_EVENT_TYPE_FPGA_ERROR";
+   case_rtn_string(MLX5_EVENT_TYPE_COMP);
+   case_rtn_string(MLX5_EVENT_TYPE_PATH_MIG);
+   case_rtn_string(MLX5_EVENT_TYPE_COMM_EST);
+   case_rtn_string(MLX5_EVENT_TYPE_SQ_DRAINED);
+   case_rtn_string(MLX5_EVENT_TYPE_SRQ_LAST_WQE);
+   case_rtn_string(MLX5_EVENT_TYPE_SRQ_RQ_LIMIT);
+   case_rtn_string(MLX5_EVENT_TYPE_CQ_ERROR);
+   case_rtn_string(MLX5_EVENT_TYPE_WQ_CATAS_ERROR);
+   case_rtn_string(MLX5_EVENT_TYPE_PATH_MIG_FAILED);
+   case_rtn_string(MLX5_EVENT_TYPE_WQ_INVAL_REQ_ERROR);
+   case_rtn_string(MLX5_EVENT_TYPE_WQ_ACCESS_ERROR);
+   case_rtn_string(MLX5_EVENT_TYPE_SRQ_CATAS_ERROR);
+   case_rtn_string(MLX5_EVENT_TYPE_INTERNAL_ERROR);
+   case_rtn_string(MLX5_EVENT_TYPE_PORT_CHANGE);
+   case_rtn_string(MLX5_EVENT_TYPE_GPIO_EVENT);
+   case_rtn_string(MLX5_EVENT_TYPE_PORT_MODULE_EVENT);
+   case_rtn_string(MLX5_EVENT_TYPE_REMOTE_CONFIG);
+   

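The case_rtn_string() suggestion relies on the preprocessor's # operator, which turns the macro argument into a string literal, so a case label and the name it returns can no longer drift apart. A self-contained sketch with a hypothetical enum (the values are arbitrary, not the MLX5_EVENT_TYPE_* values):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical event types; values are arbitrary for the sketch. */
enum demo_event {
	DEMO_EVENT_COMP = 1,
	DEMO_EVENT_PATH_MIG = 2,
	DEMO_EVENT_NIC_VPORT_CHANGE = 3,
};

static const char *demo_event_str(unsigned char type)
{
/* #val stringifies the argument, so each case returns its own name. */
#define case_rtn_string(val) case val: return #val
	switch (type) {
	case_rtn_string(DEMO_EVENT_COMP);
	case_rtn_string(DEMO_EVENT_PATH_MIG);
	case_rtn_string(DEMO_EVENT_NIC_VPORT_CHANGE);
	default:
		return "Unrecognized event";
	}
#undef case_rtn_string
}
```

The semicolon after each case_rtn_string(...) use just adds an empty statement after the return, which is harmless.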
[PATCH net-next repost] nfp: add VLAN filtering support

2017-06-15 Thread Jakub Kicinski
From: Pablo Cascón 

Add a general-use per-vNIC mailbox area and use it for VLAN filtering
support.  Initially the protocol is hard-coded to 802.1Q.

Signed-off-by: Pablo Cascón 
Signed-off-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 74 +-
 drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h  | 27 
 2 files changed, 100 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 97c4e53e1e7a..08446465c054 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -280,6 +280,30 @@ int nfp_net_reconfig(struct nfp_net *nn, u32 update)
return ret;
 }
 
+/**
+ * nfp_net_reconfig_mbox() - Reconfigure the firmware via the mailbox
+ * @nn:NFP Net device to reconfigure
+ * @mbox_cmd:  The value for the mailbox command
+ *
+ * Helper function for mailbox updates
+ *
+ * Return: Negative errno on error, 0 on success
+ */
+static int nfp_net_reconfig_mbox(struct nfp_net *nn, u32 mbox_cmd)
+{
+   int ret;
+
+   nn_writeq(nn, NFP_NET_CFG_MBOX_CMD, mbox_cmd);
+
+   ret = nfp_net_reconfig(nn, NFP_NET_CFG_UPDATE_MBOX);
+   if (ret) {
+   nn_err(nn, "Mailbox update error\n");
+   return ret;
+   }
+
+   return -nn_readl(nn, NFP_NET_CFG_MBOX_RET);
+}
+
 /* Interrupt configuration and handling
  */
 
@@ -2960,6 +2984,40 @@ static int nfp_net_change_mtu(struct net_device *netdev, int new_mtu)
return nfp_net_ring_reconfig(nn, dp, NULL);
 }
 
+static int
+nfp_net_vlan_rx_add_vid(struct net_device *netdev, __be16 proto, u16 vid)
+{
+   struct nfp_net *nn = netdev_priv(netdev);
+
+   /* Priority tagged packets with vlan id 0 are processed by the
+* NFP as untagged packets
+*/
+   if (!vid)
+   return 0;
+
+   nn_writew(nn, NFP_NET_CFG_VLAN_FILTER_VID, vid);
+   nn_writew(nn, NFP_NET_CFG_VLAN_FILTER_PROTO, ETH_P_8021Q);
+
+   return nfp_net_reconfig_mbox(nn, NFP_NET_CFG_MBOX_CMD_CTAG_FILTER_ADD);
+}
+
+static int
+nfp_net_vlan_rx_kill_vid(struct net_device *netdev, __be16 proto, u16 vid)
+{
+   struct nfp_net *nn = netdev_priv(netdev);
+
+   /* Priority tagged packets with vlan id 0 are processed by the
+* NFP as untagged packets
+*/
+   if (!vid)
+   return 0;
+
+   nn_writew(nn, NFP_NET_CFG_VLAN_FILTER_VID, vid);
+   nn_writew(nn, NFP_NET_CFG_VLAN_FILTER_PROTO, ETH_P_8021Q);
+
+   return nfp_net_reconfig_mbox(nn, NFP_NET_CFG_MBOX_CMD_CTAG_FILTER_KILL);
+}
+
 static void nfp_net_stat64(struct net_device *netdev,
   struct rtnl_link_stats64 *stats)
 {
@@ -3053,6 +3111,13 @@ static int nfp_net_set_features(struct net_device *netdev,
new_ctrl &= ~NFP_NET_CFG_CTRL_TXVLAN;
}
 
+   if (changed & NETIF_F_HW_VLAN_CTAG_FILTER) {
+   if (features & NETIF_F_HW_VLAN_CTAG_FILTER)
+   new_ctrl |= NFP_NET_CFG_CTRL_CTAG_FILTER;
+   else
+   new_ctrl &= ~NFP_NET_CFG_CTRL_CTAG_FILTER;
+   }
+
if (changed & NETIF_F_SG) {
if (features & NETIF_F_SG)
new_ctrl |= NFP_NET_CFG_CTRL_GATHER;
@@ -3295,6 +3360,8 @@ const struct net_device_ops nfp_net_netdev_ops = {
.ndo_stop   = nfp_net_netdev_close,
.ndo_start_xmit = nfp_net_tx,
.ndo_get_stats64= nfp_net_stat64,
+   .ndo_vlan_rx_add_vid= nfp_net_vlan_rx_add_vid,
+   .ndo_vlan_rx_kill_vid   = nfp_net_vlan_rx_kill_vid,
.ndo_setup_tc   = nfp_net_setup_tc,
.ndo_tx_timeout = nfp_net_tx_timeout,
.ndo_set_rx_mode= nfp_net_set_rx_mode,
@@ -3322,7 +3389,7 @@ void nfp_net_info(struct nfp_net *nn)
nn->fw_ver.resv, nn->fw_ver.class,
nn->fw_ver.major, nn->fw_ver.minor,
nn->max_mtu);
-   nn_info(nn, "CAP: %#x %s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
+   nn_info(nn, "CAP: %#x %s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
nn->cap,
nn->cap & NFP_NET_CFG_CTRL_PROMISC  ? "PROMISC "  : "",
nn->cap & NFP_NET_CFG_CTRL_L2BC ? "L2BCFILT " : "",
@@ -3337,6 +3404,7 @@ void nfp_net_info(struct nfp_net *nn)
nn->cap & NFP_NET_CFG_CTRL_LSO2 ? "TSO2 " : "",
nn->cap & NFP_NET_CFG_CTRL_RSS  ? "RSS1 " : "",
nn->cap & NFP_NET_CFG_CTRL_RSS2 ? "RSS2 " : "",
+   nn->cap & NFP_NET_CFG_CTRL_CTAG_FILTER ? "CTAG_FILTER " : "",
nn->cap & NFP_NET_CFG_CTRL_L2SWITCH ? "L2SWITCH " : "",
nn->cap & NFP_NET_CFG_CTRL_MSIXAUTO ? "AUTOMASK " : "",
nn->cap & 

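The mailbox flow in this patch works like this: write the command word, trigger a reconfig with NFP_NET_CFG_UPDATE_MBOX, then return the negated value of NFP_NET_CFG_MBOX_RET. VID 0 is skipped because priority-tagged frames are treated as untagged. A user-space sketch of that control flow follows. It uses a hypothetical in-memory register file instead of nn_writeq()/nn_readl() and assumes the firmware stores a positive errno in the return register. The set_features toggle pattern is included for completeness.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical in-memory stand-in for the vNIC control BAR. */
struct fake_nn {
	uint64_t mbox_cmd;     /* cf. NFP_NET_CFG_MBOX_CMD */
	uint16_t vlan_vid;     /* cf. NFP_NET_CFG_VLAN_FILTER_VID */
	uint16_t vlan_proto;   /* cf. NFP_NET_CFG_VLAN_FILTER_PROTO */
	uint32_t mbox_ret;     /* cf. NFP_NET_CFG_MBOX_RET; assumed positive errno */
};

enum { FAKE_MBOX_CTAG_ADD = 1, FAKE_MBOX_CTAG_KILL = 2 }; /* arbitrary values */
#define FAKE_CTRL_CTAG_FILTER (1u << 5)                  /* arbitrary bit */

/* Stand-in for nfp_net_reconfig(): this sketch always succeeds. */
static int fake_reconfig(struct fake_nn *nn)
{
	(void)nn;
	return 0;
}

static int fake_reconfig_mbox(struct fake_nn *nn, uint32_t cmd)
{
	int ret;

	nn->mbox_cmd = cmd;
	ret = fake_reconfig(nn);
	if (ret)
		return ret;
	/* Negate the firmware's return value to get a kernel-style -errno. */
	return -(int)nn->mbox_ret;
}

static int fake_vlan_add(struct fake_nn *nn, uint16_t vid)
{
	/* VID 0 (priority tagged) is treated as untagged: nothing to program. */
	if (!vid)
		return 0;
	nn->vlan_vid = vid;
	nn->vlan_proto = 0x8100;   /* ETH_P_8021Q */
	return fake_reconfig_mbox(nn, FAKE_MBOX_CTAG_ADD);
}

/* Toggle pattern from nfp_net_set_features(): touch the bit only if changed. */
static uint32_t toggle_ctrl(uint32_t ctrl, int changed, int enabled, uint32_t bit)
{
	if (changed)
		ctrl = enabled ? (ctrl | bit) : (ctrl & ~bit);
	return ctrl;
}
```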
Re: [PATCH] skbuff: make skb_put_zero() return void

2017-06-15 Thread Joe Perches
On Fri, 2017-06-16 at 00:17 +0200, Johannes Berg wrote:
> On Wed, 2017-06-14 at 14:18 -0700, Joe Perches wrote:
> > On Wed, 2017-06-14 at 22:40 +0200, Johannes Berg wrote:
> > > On Wed, 2017-06-14 at 13:36 -0700, Joe Perches wrote:
> > > > 
> > > > Given you are adding a lot of these, it might be better
> > > > to add an exported function that duplicates most of
> > > > skb_put with a memset at the end.
> > > 
> > > Yeah, could be done. I'm not sure why you'd want to duplicate it
> > > rather
> > > than call it though? To make it about as fast?
> > 
> > Yeah, that and reduced stack use.
> > 
> > Dunno how performance sensitive these uses really are
> > but it seems some might be for slow cpu wireless APs in
> > both the rx and tx paths.
> 
> I haven't really checked now, but the wireless (mac80211) ones I saw
> weren't in the data TX/RX, only for management SKBs which are pretty
> much a slowpath.
> 
> Anyway, I guess you know how to propose a patch with this :-)

I'll wait as I don't want to cause patch conflicts.

> However, I think in that case there should be something like
> skb_pull_inline, so that the skb_put code here isn't all copied around,
> but just lives in a single place that gets inlined into skb_put() and
> skb_put_zero().

Seems sensible.

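The suggestion of a single inline body shared by skb_put() and skb_put_zero(), instead of a copied function, might look like this on a hypothetical fixed-size buffer. The mini_buf type and its helpers are illustrative only, not the sk_buff API:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, minimal buffer: data[0..len) is in use, tail room follows. */
struct mini_buf {
	unsigned char data[64];
	size_t len;
};

/* The single shared body: reserve len bytes at the tail and return them. */
static inline void *mini_put_inline(struct mini_buf *b, size_t len)
{
	void *tail = b->data + b->len;

	/* skb_put() would call skb_over_panic() here instead. */
	assert(b->len + len <= sizeof(b->data));
	b->len += len;
	return tail;
}

static void *mini_put(struct mini_buf *b, size_t len)
{
	return mini_put_inline(b, len);
}

static void *mini_put_zero(struct mini_buf *b, size_t len)
{
	void *tail = mini_put_inline(b, len);

	memset(tail, 0, len);
	return tail;
}
```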


Re: [PATCH] skbuff: make skb_put_zero() return void

2017-06-15 Thread Joe Perches
On Fri, 2017-06-16 at 00:23 +0200, Johannes Berg wrote:
> On Thu, 2017-06-15 at 15:17 -0700, Joe Perches wrote:
> 
> > Here's a script that does the conversion.
> > 
> > $ /usr/bin/git grep -P --name-only \
> > "\*\s*skb_put\s*\(\s*([\w\.\[\]\>\-]+)\s*,\s*1\s*\)\s*=\s*([^;]+);" | \
> >   xargs perl -p -i -e 's/\*\s*skb_put\s*\(\s*([\w\.\[\]\>\-]+)\s*,\s*1\s*\)\s*=\s*([^;]+);/skb_put_char(\1, \2);/'
> 
> Btw, this is incomplete - you have "\*\s*" at the beginning, but there
> are cases like
> 
>  *(skb_put(skb, 1)) = c;
> 
> where you have extra parentheses. By just adding them to the spatch, it
> finds both cases trivially.
> 
> I'm much more comfortable using spatch to do things like this.

Knock yourself out.
Whatever floats your boat.
Have at it.
Go get 'em.

etc...

There are also some uses like:

memcpy(skb_put(h5->rx_skb, 1), byte, 1);

that could also be converted.

cheers, Joe

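The three spellings discussed in this thread all append one byte: the open-coded `*(u8 *)skb_put(skb, 1) = c`, the memcpy() form, and a dedicated helper. Below is a sketch on a hypothetical buffer type. bb_put_char() only illustrates the proposed skb_put_char() and is not an existing kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer type for the sketch. */
struct byte_buf {
	unsigned char data[16];
	size_t len;
};

/* Minimal stand-in for skb_put(): append len bytes, return their start. */
static void *bb_put(struct byte_buf *b, size_t len)
{
	void *tail = b->data + b->len;

	b->len += len;   /* no overflow check in this sketch */
	return tail;
}

/* Hypothetical helper in the spirit of the proposed skb_put_char(). */
static void bb_put_char(struct byte_buf *b, unsigned char c)
{
	*(unsigned char *)bb_put(b, 1) = c;
}
```

All three forms below leave the buffer in the same state: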

Re: ath9k_htc - Division by zero in kernel (as well as firmware panic)

2017-06-15 Thread Tobias Diedrich
Yeah, this is basically mostly copy-pasted from the sboot code,
would need some cleaning up.
I've been playing a little more with other bits of the hardware,
writing some test fw from scratch, mostly without using the builtin
rom (except for interrupts).

Oleksij Rempel wrote:
> Am 08.06.2017 um 00:39 schrieb Tobias Diedrich:
> > Oleksij Rempel wrote:
> >> Am 07.06.2017 um 02:12 schrieb Tobias Diedrich:
> >>> Oleksij Rempel wrote:
>  Yes, this is a "normal" problem. The firmware has no error handler for PCI
>  bus related exceptions. So if we failed to read from the PCI bus the first time, we
>  have the choice to Ooops and stall or Ooops and reboot ASAP. So we reboot
>  and provide a kernel "firmware panic!" message.
>  Anyone who can or wants to fix this is welcome.
> 
> > *
> > Jun 02 14:55:30 computer kernel: usb 1-1.1: ath: firmware panic!
> > exccause: 0x000d; pc: 0x0090ae81; badvaddr: 0x10ff4038.
> >>> [...]
> >>>
>  memdmp 50ae78 50ae88
> >>>
> >>> 50ae78: 6c10 0412 6aa2 0c02 0088 20c0 2008 1940  l...j..@
> >>>
> >>> [...copy to bin...]
> >>> $ bin/objdump -b binary -m xtensa  -D /tmp/memdump.bin 
> >>> [..]
> >>>0:   6c1004  entry   a1, 32
> >>>3:   126aa2  l32r    a2, 0xfffdaa8c
> >>>6:   0c0200  memw
> >>>9:   8820    l32i.n  a8, a2, 0  <-- Exception cause, PC still points at load
> >>>b:   c020    movi.n  a2, 0
> >>>d:   081940  extui   a9, a8, 1, 1
> >>>
> >>> Judging from that it should be fairly simple to at least implement
> >>> some sort of retry, possibly after triggering a PCIe link retrain?
> >>
> >> I assume, yes.
> >>
> >>> There are some related PCIe root complex registers that may point to
> >>> what exactly failed if they were dumped.
> >>>
> >>> The root complex registers live at 0x00040000 and I think match the
> >>> registers described for the root complex in the AR9344 datasheet.
> >>
> >> Suddenly I don't have ar7010 docs to tell..
> >>
> >>> PCIE_INT_MASK would map to 0x40050 and has a bit for SYS_ERR:
> >>> "A system error. The RC Core asserts CFG_SYS_ERR_RC if any device in
> >>> the hierarchy reports any of the following errors and the associated
> >>> enable bit is set in the Root Control register: ERR_COR, ERR_FATAL,
> >>> ERR_NONFATAL."
> >>>
> >>> AFAICS link retrain can be done by setting bit3 (INIT_RST,
> >>> "Application request to initiate a training reset") in
> >>> PCIE_APP (0x40000).
> >>>
> >>> See sboot/magpie_1_1/sboot/cmnos/eeprom/src/cmnos_eeprom.c (which
> >>> flips some bits in the RC to enable the PCIe bus for reading the
> >>> EEPROM).
> >>>
> >>> The root complex pci configuration space is at 0x20000 which could
> >>> have further error details:
>  memdmp 20000 20200
> >>>
> >>> 02: a02a 168c 0010 0006  0001 0001   .*..
> >>> 020010:          
> >>> 020020:          
> >>> 020030:    0040    01ff  ...@
> >>> 020040: 5bc3 5001        [.P.
> >>> 020050: 0080 7005        ..p.
> >>> 020060:          
> >>> 020070: 0042 0010  8701  2010 0013 4411  .BD.
> >>> 020080: 3011    00c0 03c0    0...
> >>> 020090:    0010      
> >>> 0200a0:          
> >>> 0200b0:          
> >>> 0200c0:          
> >>> 0200d0:          
> >>> 0200e0:          
> >>> 0200f0:          
> >>> 020100: 1401 0001     0006 2030  ...0
> >>> 020110:    2000  00a0    
> >>> 020120:          
> >>> 020130:          
> >>> 020140: 0001 0002        
> >>> 020150:   8000 00ff      
> >>> 020160:          
> >>> 020170:          
> >>> 020180:          
> >>> 020190:          
> >>> 0201a0:          
> >>> 0201b0:          
> >>> 0201c0:          
> >>> 0201d0:          
> >>> 0201e0:    

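The suggestion to retry the failed load after requesting a link training reset could be structured roughly as below. This is a simulation only. It assumes the bus error can be observed as a return value, whereas the real firmware takes an exception. The struct and function names are hypothetical, and only the INIT_RST bit position (bit 3 of PCIE_APP) comes from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 3 of PCIE_APP: INIT_RST, "application request to initiate a training reset". */
#define PCIE_APP_INIT_RST (1u << 3)

/* Hypothetical simulation of the root complex state. */
struct sim_rc {
	uint32_t pcie_app;      /* only the INIT_RST bit is modelled */
	bool link_up;           /* whether a bus read currently succeeds */
	unsigned int retrains;  /* how many training resets were requested */
};

/* Simulated bus read: reports failure instead of raising an exception. */
static bool sim_bus_read(const struct sim_rc *rc, uint32_t *val)
{
	if (!rc->link_up)
		return false;
	*val = 0x12345678u;     /* arbitrary data */
	return true;
}

/* Request a training reset with a read-modify-write of PCIE_APP. */
static void sim_request_retrain(struct sim_rc *rc)
{
	rc->pcie_app |= PCIE_APP_INIT_RST;
	rc->retrains++;
	rc->link_up = true;     /* simulation: the retrain always recovers the link */
}

/* Try the read up to max_tries times, requesting a retrain after each failure. */
static bool read_with_retry(struct sim_rc *rc, uint32_t *val, unsigned int max_tries)
{
	for (unsigned int i = 0; i < max_tries; i++) {
		if (sim_bus_read(rc, val))
			return true;
		sim_request_retrain(rc);
	}
	return false;
}
```

The sketch leaves INIT_RST set. Whether the hardware clears it on its own is not covered in the thread.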
[PATCH RFC 1/3] arm64: Gate inclusion of asm/sysreg.h by __EMITTING_BPF__

2017-06-15 Thread David Daney
Compilation to eBPF chokes on the inline asm in asm/sysreg.h, so don't
include it when compiling to a BPF target.

Signed-off-by: David Daney 
---
 arch/arm64/include/asm/sysreg.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 15c142ce991c..faa8f853e369 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -20,6 +20,8 @@
 #ifndef __ASM_SYSREG_H
 #define __ASM_SYSREG_H
 
+#ifndef __EMITTING_BPF__
+
 #include 
 
 /*
@@ -502,5 +504,5 @@ static inline void config_sctlr_el1(u32 clear, u32 set)
 }
 
 #endif
-
+#endif  /* __EMITTING_BPF__ */
 #endif /* __ASM_SYSREG_H */
-- 
2.11.0



[PATCH RFC 2/3] samples/bpf: Add define __EMITTING_BPF__ when building BPF

2017-06-15 Thread David Daney
... this allows gating of inline assembly code that causes llvm to
fail when emitting BPF.

Signed-off-by: David Daney 
---
 samples/bpf/Makefile | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index a0561dc762fe..4979e6b56662 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -193,12 +193,12 @@ $(src)/*.c: verify_target_bpf
 
 $(obj)/tracex5_kern.o: $(obj)/syscall_nrs.h
 
-# asm/sysreg.h - inline assembly used by it is incompatible with llvm.
-# But, there is no easy way to fix it, so just exclude it since it is
-# useless for BPF samples.
+# __EMITTING_BPF__ used to exclude inline assembly, which cannot be
+# emitted in BPF code.
 $(obj)/%.o: $(src)/%.c
$(CLANG) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) \
-   -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
+   -D__KERNEL__ -D__EMITTING_BPF__ \
+   -Wno-unused-value -Wno-pointer-sign \
-Wno-compare-distinct-pointer-types \
-Wno-gnu-variable-sized-type-not-at-end \
-Wno-address-of-packed-member -Wno-tautological-compare \
-- 
2.11.0



[PATCH RFC 0/3] bpf/arm64/mips: Avoid inline asm in BPF

2017-06-15 Thread David Daney
To build samples/bpf on MIPS we need to avoid some inline asm that
causes llvm to fail.

Looking at the code, it seems that arm64 had the same problem and
avoided it by defining the header guard symbol.  This approach does
not scale, so I invented a preprocessor define to identify the case of
building with a BPF target that can be used instead.

It is an RFC at this point as I haven't yet tested the arm64 change,
and I wanted to see if others think this is the proper way to handle
avoidance of inline asm.

David Daney (3):
  arm64: Gate inclusion of asm/sysreg.h by __EMITTING_BPF__
  samples/bpf: Add define __EMITTING_BPF__ when building BPF
  MIPS: Include file changes to enable building BPF code with llvm

 arch/arm64/include/asm/sysreg.h   | 4 +++-
 arch/mips/Makefile| 1 +
 arch/mips/cavium-octeon/Platform  | 3 +++
 arch/mips/include/asm/checksum.h  | 2 +-
 arch/mips/include/uapi/asm/swab.h | 2 +-
 samples/bpf/Makefile  | 8 
 6 files changed, 13 insertions(+), 7 deletions(-)

-- 
2.11.0



[PATCH RFC 3/3] MIPS: Include file changes to enable building BPF code with llvm

2017-06-15 Thread David Daney
When building for the eBPF target architecture, inline asm cannot be
used, as MIPS instructions are fundamentally incompatible with eBPF
bytecode.  The preprocessor symbol __EMITTING_BPF__ is used to gate
the inclusion of inline asm in constructs used by the BPF
programs.

Also make the Makefile symbol LINUXINCLUDE contain the
asm/mach-MACHINE directory so that the BPF compilation process can
pull in the necessary include files.

Signed-off-by: David Daney 
---
 arch/mips/Makefile| 1 +
 arch/mips/cavium-octeon/Platform  | 3 +++
 arch/mips/include/asm/checksum.h  | 2 +-
 arch/mips/include/uapi/asm/swab.h | 2 +-
 4 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/mips/Makefile b/arch/mips/Makefile
index 02a1787c888c..ca968415597f 100644
--- a/arch/mips/Makefile
+++ b/arch/mips/Makefile
@@ -247,6 +247,7 @@ entry-y = 0x$(shell $(NM) vmlinux 2>/dev/null \
| grep "\bkernel_entry\b" | cut -f1 -d \ )
 
 cflags-y   += -I$(srctree)/arch/mips/include/asm/mach-generic
+LINUXINCLUDE   += -I$(srctree)/arch/mips/include/asm/mach-generic
 drivers-$(CONFIG_PCI)  += arch/mips/pci/
 
 #
diff --git a/arch/mips/cavium-octeon/Platform b/arch/mips/cavium-octeon/Platform
index 45be853700e6..9ef3e4074099 100644
--- a/arch/mips/cavium-octeon/Platform
+++ b/arch/mips/cavium-octeon/Platform
@@ -4,4 +4,7 @@
 platform-$(CONFIG_CAVIUM_OCTEON_SOC)   += cavium-octeon/
 cflags-$(CONFIG_CAVIUM_OCTEON_SOC) +=  \
-I$(srctree)/arch/mips/include/asm/mach-cavium-octeon
+ifdef CONFIG_CAVIUM_OCTEON_SOC
+LINUXINCLUDE   += -I$(srctree)/arch/mips/include/asm/mach-cavium-octeon
+endif
 load-$(CONFIG_CAVIUM_OCTEON_SOC)   += 0x8110
diff --git a/arch/mips/include/asm/checksum.h b/arch/mips/include/asm/checksum.h
index 77cad232a1c6..f8fff2ced216 100644
--- a/arch/mips/include/asm/checksum.h
+++ b/arch/mips/include/asm/checksum.h
@@ -12,7 +12,7 @@
 #ifndef _ASM_CHECKSUM_H
 #define _ASM_CHECKSUM_H
 
-#ifdef CONFIG_GENERIC_CSUM
+#if defined(CONFIG_GENERIC_CSUM) || defined(__EMITTING_BPF__)
 #include 
 #else
 
diff --git a/arch/mips/include/uapi/asm/swab.h b/arch/mips/include/uapi/asm/swab.h
index 23cd9b118c9e..42ed70015c70 100644
--- a/arch/mips/include/uapi/asm/swab.h
+++ b/arch/mips/include/uapi/asm/swab.h
@@ -13,7 +13,7 @@
 
 #define __SWAB_64_THRU_32__
 
-#if !defined(__mips16) &&  \
+#if !defined(__mips16) && !defined(__EMITTING_BPF__) && \
((defined(__mips_isa_rev) && (__mips_isa_rev >= 2)) ||  \
 defined(_MIPS_ARCH_LOONGSON3A))
 
-- 
2.11.0


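With __EMITTING_BPF__ defined, checksum.h pulls in the asm-generic implementation and swab.h skips its inline-asm variants, so plain C is compiled instead. As an illustration of what such fallbacks compute, here is a portable 32-bit byte swap and a checksum fold written in the style of the generic helpers. The names portable_swab32(), portable_csum_fold(), demo_swab32(), HAVE_ASM_SWAB32 and arch_swab32() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Portable byte swap: no inline asm, so any target (including BPF) can use it. */
static inline uint32_t portable_swab32(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) |
	       ((x & 0x0000ff00u) << 8)  |
	       ((x & 0x00ff0000u) >> 8)  |
	       ((x & 0xff000000u) >> 24);
}

/* Fold a 32-bit partial checksum to 16 bits and complement it. */
static inline uint16_t portable_csum_fold(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);   /* absorb a carry from the first add */
	return (uint16_t)~sum;
}

#if !defined(__EMITTING_BPF__) && defined(HAVE_ASM_SWAB32)
/* An arch-specific inline-asm version would be selected here. */
#define demo_swab32(x) arch_swab32(x)
#else
#define demo_swab32(x) portable_swab32(x)
#endif
```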

Re: [PATCH] skbuff: make skb_put_zero() return void

2017-06-15 Thread Joe Perches
On Fri, 2017-06-16 at 00:21 +0200, Johannes Berg wrote:
> On Thu, 2017-06-15 at 15:17 -0700, Joe Perches wrote:
> 
> > I suggest changing those to skb_put_char(skb, char)
> 
> That might be something to think of, but you can't really know for sure
> that they're not using len > 1 and don't yet care about the other bytes
> or something. That'd probably be another bug, but ... dunno
> 
> And anyway, I think
> 
> *(u8 *)skb_put(skb, 1) = c;
> 
> isn't really that bad. Obviously that could be converted further to
> skb_put_char(), using a simple spatch:
> 
> @@
> expression SKB, C;
> @@
> - *(u8 *)skb_put(SKB, 1) = C;
> + skb_put_char(SKB, C);
> 
> 
> > Here's a script that does the conversion.
> > 
> > $ /usr/bin/git grep -P --name-only \
> > "\*\s*skb_put\s*\(\s*([\w\.\[\]\>\-]+)\s*,\s*1\s*\)\s*=\s*([^;]+);" | \
> >   xargs perl -p -i -e 's/\*\s*skb_put\s*\(\s*([\w\.\[\]\>\-]+)\s*,\s*1\s*\)\s*=\s*([^;]+);/skb_put_char(\1, \2);/'
> 
> Uh, I think you're using the wrong tool for the job :-)

I'm familiar with both.

It depends on how much you want to wait.

The thing I wrote finished in about 2 seconds
on my little laptop.

cheers, Joe


Re: [PATCH] skbuff: make skb_put_zero() return void

2017-06-15 Thread Johannes Berg
On Thu, 2017-06-15 at 15:17 -0700, Joe Perches wrote:

> Here's a script that does the conversion.
> 
> $ /usr/bin/git grep -P --name-only \
> "\*\s*skb_put\s*\(\s*([\w\.\[\]\>\-]+)\s*,\s*1\s*\)\s*=\s*([^;]+);" | \
>   xargs perl -p -i -e 's/\*\s*skb_put\s*\(\s*([\w\.\[\]\>\-]+)\s*,\s*1\s*\)\s*=\s*([^;]+);/skb_put_char(\1, \2);/'

Btw, this is incomplete - you have "\*\s*" at the beginning, but there
are cases like

 *(skb_put(skb, 1)) = c;

where you have extra parentheses. By just adding them to the spatch, it
finds both cases trivially.

I'm much more comfortable using spatch to do things like this.

johannes


Re: [PATCH] skbuff: make skb_put_zero() return void

2017-06-15 Thread Johannes Berg
On Thu, 2017-06-15 at 15:17 -0700, Joe Perches wrote:

> I suggest changing those to skb_put_char(skb, char)

That might be something to think of, but you can't really know for sure
that they're not using len > 1 and don't yet care about the other bytes
or something. That'd probably be another bug, but ... dunno

And anyway, I think

*(u8 *)skb_put(skb, 1) = c;

isn't really that bad. Obviously that could be converted further to
skb_put_char(), using a simple spatch:

@@
expression SKB, C;
@@
- *(u8 *)skb_put(SKB, 1) = C;
+ skb_put_char(SKB, C);


> Here's a script that does the conversion.
> 
> $ /usr/bin/git grep -P --name-only \
> "\*\s*skb_put\s*\(\s*([\w\.\[\]\>\-]+)\s*,\s*1\s*\)\s*=\s*([^;]+);" | \
>   xargs perl -p -i -e 's/\*\s*skb_put\s*\(\s*([\w\.\[\]\>\-]+)\s*,\s*1\s*\)\s*=\s*([^;]+);/skb_put_char(\1, \2);/'

Uh, I think you're using the wrong tool for the job :-)

johannes


Re: [PATCH] skbuff: make skb_put_zero() return void

2017-06-15 Thread Johannes Berg
On Wed, 2017-06-14 at 14:18 -0700, Joe Perches wrote:
> On Wed, 2017-06-14 at 22:40 +0200, Johannes Berg wrote:
> > On Wed, 2017-06-14 at 13:36 -0700, Joe Perches wrote:
> > > 
> > > Given you are adding a lot of these, it might be better
> > > to add an exported function that duplicates most of
> > > skb_put with a memset at the end.
> > 
> > Yeah, could be done. I'm not sure why you'd want to duplicate it
> > rather
> > than call it though? To make it about as fast?
> 
> Yeah, that and reduced stack use.
> 
> Dunno how performance sensitive these uses really are
> but it seems some might be for slow cpu wireless APs in
> both the rx and tx paths.

I haven't really checked now, but the wireless (mac80211) ones I saw
weren't in the data TX/RX, only for management SKBs which are pretty
much a slowpath.

Anyway, I guess you know how to propose a patch with this :-)

However, I think in that case there should be something like
skb_pull_inline, so that the skb_put code here isn't all copied around,
but just lives in a single place that gets inlined into skb_put() and
skb_put_zero().

johannes


Re: [PATCH] skbuff: make skb_put_zero() return void

2017-06-15 Thread Joe Perches
On Thu, 2017-06-15 at 23:28 +0200, Johannes Berg wrote:
> On Thu, 2017-06-15 at 17:26 -0400, David Miller wrote:
> > 
> > > *skb_put(skb, 1) = 'x';
> > >  
> > > Seems pretty unlikely we have that, and in any case the compiler would
> > > warn (error?) there if skb_put() becomes void.
> > 
> > Actually I am pretty sure I've seen a pattern like that somewhere. :-)
> 
> Yeah, there are actually a ton of them, and oddly enough my spatch is
> failing to catch _one_ of them?? Still refining it :)

I suggest changing those to skb_put_char(skb, char)
in a first pass and then doing the other bits later.

Here's a script that does the conversion.

$ /usr/bin/git grep -P --name-only "\*\s*skb_put\s*\(\s*([\w\.\[\]\>\-]+)\s*,\s*1\s*\)\s*=\s*([^;]+);" | \
  xargs perl -p -i -e 's/\*\s*skb_put\s*\(\s*([\w\.\[\]\>\-]+)\s*,\s*1\s*\)\s*=\s*([^;]+);/skb_put_char(\1, \2);/'




[RFC 3/3] networking: make skb_push & __skb_push return void pointers

2017-06-15 Thread Johannes Berg
From: Johannes Berg 

It seems like a historic accident that these return unsigned char *,
and in many places that means casts are required, more often than not.

Make these functions return void * and remove all the casts across
the tree, adding a (u8 *) cast only where the unsigned char pointer
was used directly, all done with the following spatch:

@@
expression SKB, LEN;
typedef u8;
identifier fn = { skb_push, __skb_push, skb_push_rcsum };
@@
- *(fn(SKB, LEN))
+ *(u8 *)fn(SKB, LEN)

@@
expression E, SKB, LEN;
identifier fn = { skb_push, __skb_push, skb_push_rcsum };
type T;
@@
- E = ((T *)(fn(SKB, LEN)))
+ E = fn(SKB, LEN)

@@
expression SKB, LEN;
identifier fn = { skb_push, __skb_push, skb_push_rcsum };
@@
- fn(SKB, LEN)[0]
+ *(u8 *)fn(SKB, LEN)

Note that the last part there converts from push(...)[0] to the
more idiomatic *(u8 *)push(...).
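To illustrate what the return-type change buys, here is a hedged userspace sketch. struct pbuf, pbuf_init(), pbuf_push() and struct demo_hdr are hypothetical and only model the headroom handling of skb_push(). With a void * return, a header pointer is assigned without a cast, and a single byte is written with the *(u8 *) form produced by the last rule above instead of push(...)[0]:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an skb's linear area: storage is the whole
 * buffer, data the current start of the packet. The gap between
 * them is the headroom that push consumes. */
struct pbuf {
	unsigned char storage[64];
	unsigned char *data;
	size_t len;
};

/* Hypothetical two-byte header, for illustration only. */
struct demo_hdr {
	uint8_t type;
	uint8_t flags;
};

static void pbuf_init(struct pbuf *b, size_t headroom)
{
	assert(headroom <= sizeof(b->storage));
	b->data = b->storage + headroom;
	b->len = 0;
}

/* Models skb_push(): prepend len bytes and return the new start of
 * the packet as void *, so callers need no cast. */
static void *pbuf_push(struct pbuf *b, size_t len)
{
	/* The kernel calls skb_under_panic() here instead. */
	assert(len <= (size_t)(b->data - b->storage));
	b->data -= len;
	b->len += len;
	return b->data;
}
```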

Signed-off-by: Johannes Berg 
---
 drivers/atm/solos-pci.c|  2 +-
 drivers/bluetooth/bpa10x.c |  2 +-
 drivers/firewire/net.c |  8 +++---
 drivers/infiniband/hw/cxgb3/iwch_cm.c  |  6 ++---
 drivers/infiniband/hw/cxgb4/cm.c   |  2 +-
 drivers/infiniband/ulp/ipoib/ipoib_main.c  |  4 +--
 drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.c   |  2 +-
 drivers/infiniband/ulp/opa_vnic/opa_vnic_netdev.c  |  2 +-
 drivers/isdn/i4l/isdn_ppp.c|  2 +-
 drivers/net/arcnet/arc-rawmode.c   |  2 +-
 drivers/net/arcnet/capmode.c   |  2 +-
 drivers/net/arcnet/rfc1051.c   |  2 +-
 drivers/net/arcnet/rfc1201.c   |  2 +-
 drivers/net/ethernet/broadcom/bcmsysport.c |  2 +-
 drivers/net/ethernet/chelsio/cxgb/sge.c|  4 +--
 drivers/net/ethernet/freescale/gianfar.c   |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en_selftest.c  |  2 +-
 drivers/net/ethernet/sun/niu.c |  2 +-
 drivers/net/ethernet/toshiba/ps3_gelic_net.c   |  2 +-
 drivers/net/geneve.c   |  3 +--
 drivers/net/gtp.c  |  4 +--
 drivers/net/hippi/rrunner.c|  2 +-
 drivers/net/macsec.c   |  2 +-
 drivers/net/ppp/ppp_async.c|  2 +-
 drivers/net/ppp/ppp_generic.c  |  6 ++---
 drivers/net/ppp/ppp_synctty.c  |  2 +-
 drivers/net/ppp/pptp.c |  2 +-
 drivers/net/usb/gl620a.c   |  2 +-
 drivers/net/usb/int51x1.c  |  2 +-
 drivers/net/usb/kaweth.c   |  2 +-
 drivers/net/usb/lg-vl600.c |  2 +-
 drivers/net/usb/net1080.c  |  2 +-
 drivers/net/usb/qmi_wwan.c |  2 +-
 drivers/net/usb/rndis_host.c   |  2 +-
 drivers/net/vrf.c  |  2 +-
 drivers/net/vxlan.c|  2 +-
 drivers/net/wimax/i2400m/netdev.c  |  2 +-
 drivers/net/wireless/admtek/adm8211.c  |  2 +-
 drivers/net/wireless/ath/ar5523/ar5523.c   |  4 +--
 drivers/net/wireless/ath/ath6kl/htc_pipe.c |  3 +--
 drivers/net/wireless/ath/ath9k/hif_usb.c   |  2 +-
 drivers/net/wireless/ath/ath9k/htc_hst.c   |  3 +--
 drivers/net/wireless/ath/ath9k/wmi.c   |  2 +-
 drivers/net/wireless/ath/carl9170/tx.c |  2 +-
 drivers/net/wireless/ath/wil6210/txrx.c|  2 +-
 .../net/wireless/intersil/hostap/hostap_80211_rx.c |  8 +++---
 drivers/net/wireless/intersil/orinoco/main.c   |  7 +++--
 drivers/net/wireless/intersil/p54/txrx.c   |  4 +--
 drivers/net/wireless/intersil/prism54/islpci_eth.c |  5 +---
 drivers/net/wireless/mac80211_hwsim.c  |  4 +--
 drivers/net/wireless/marvell/libertas/rx.c |  2 +-
 drivers/net/wireless/marvell/libertas_tf/main.c|  2 +-
 drivers/net/wireless/mediatek/mt7601u/tx.c |  2 +-
 drivers/net/wireless/realtek/rtl818x/rtl8187/dev.c |  6 ++---
 .../net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c  |  2 +-
 .../net/wireless/realtek/rtlwifi/rtl8192cu/trx.c   |  2 +-
 drivers/net/wireless/st/cw1200/txrx.c  |  2 +-
 drivers/net/wireless/ti/wl1251/tx.c|  3 +--
 drivers/net/wireless/ti/wlcore/cmd.c   |  2 +-
 drivers/net/wireless/ti/wlcore/tx.c|  3 +--
 drivers/net/wireless/zydas/zd1211rw/zd_mac.c   |  3 +--
 drivers/nfc/fdp/i2c.c  |  4 +--
 drivers/nfc/microread/i2c.c|  2 +-
 drivers/nfc/microread/microread.c  |  4 +--
 drivers/nfc/nfcmrvl/main.c |  2 +-
 drivers/nfc/pn533/pn533.c 

[RFC 0/3] make skb accessors return void pointers

2017-06-15 Thread Johannes Berg
Hi,

After some more fun with spatch, I've come up with these three patches.

I couldn't figure out why spatch didn't convert one skb_put() place,
and there was one inside a macro it didn't find. Otherwise, it's
pretty much just spatch and changing the functions/prototypes.

I've compiled x86 allyesconfig with this and the skb_put_data() and
more skb_put_zero() conversions, but I'm going to wait for the 0-day
kbuild bot to tell me it succeeded on my branch (pushed all of this
to mac80211-next on the skb-access-cleanups branch) before I submit
all five patches properly.

There's, obviously, no way I'd even have attempted this before having
coccinelle :-)

johannes



[RFC 2/3] networking: make skb_pull & friends return void pointers

2017-06-15 Thread Johannes Berg
From: Johannes Berg 

It seems like a historic accident that these return unsigned char *,
and in many places that means casts are required, more often than not.

Make these functions return void * and remove all the casts across
the tree, adding a (u8 *) cast only where the unsigned char pointer
was used directly, all done with the following spatch:

@@
expression SKB, LEN;
typedef u8;
identifier fn = {
skb_pull,
__skb_pull,
skb_pull_inline,
__pskb_pull_tail,
__pskb_pull,
pskb_pull
};
@@
- *(fn(SKB, LEN))
+ *(u8 *)fn(SKB, LEN)

@@
expression E, SKB, LEN;
identifier fn = {
skb_pull,
__skb_pull,
skb_pull_inline,
__pskb_pull_tail,
__pskb_pull,
pskb_pull
};
type T;
@@
- E = ((T *)(fn(SKB, LEN)))
+ E = fn(SKB, LEN)

Signed-off-by: Johannes Berg 
---
 drivers/bluetooth/hci_nokia.c  |  4 ++--
 drivers/isdn/i4l/isdn_ppp.c    |  2 +-
 drivers/net/wan/hdlc_ppp.c     |  2 +-
 drivers/nfc/nxp-nci/firmware.c |  3 +--
 drivers/scsi/fnic/fnic_fcs.c   |  2 +-
 drivers/scsi/qedf/qedf_main.c  |  2 +-
 include/linux/skbuff.h         | 14 +++---
 net/bluetooth/a2mp.c           |  4 ++--
 net/core/skbuff.c              |  6 +++---
 net/ipv4/ipmr.c                |  6 --
 net/ipv4/xfrm4_mode_beet.c     |  3 +--
 net/ipv6/ip6mr.c               |  6 --
 net/ipv6/xfrm6_mode_beet.c     |  2 +-
 13 files changed, 29 insertions(+), 27 deletions(-)

diff --git a/drivers/bluetooth/hci_nokia.c b/drivers/bluetooth/hci_nokia.c
index c1b081725b2c..072a77a61e67 100644
--- a/drivers/bluetooth/hci_nokia.c
+++ b/drivers/bluetooth/hci_nokia.c
@@ -557,7 +557,7 @@ static int nokia_recv_negotiation_packet(struct hci_dev *hdev,
goto finish_neg;
}
 
-   evt = (struct hci_nokia_neg_evt *)skb_pull(skb, sizeof(*hdr));
+   evt = skb_pull(skb, sizeof(*hdr));
 
if (evt->ack != NOKIA_NEG_ACK) {
dev_err(dev, "Negotiation received: wrong reply");
@@ -595,7 +595,7 @@ static int nokia_recv_alive_packet(struct hci_dev *hdev, struct sk_buff *skb)
goto finish_alive;
}
 
-   pkt = (struct hci_nokia_alive_pkt *)skb_pull(skb, sizeof(*hdr));
+   pkt = skb_pull(skb, sizeof(*hdr));
 
if (pkt->mid != NOKIA_ALIVE_RESP) {
dev_err(dev, "Alive received: invalid response: 0x%02x!",
diff --git a/drivers/isdn/i4l/isdn_ppp.c b/drivers/isdn/i4l/isdn_ppp.c
index 9ce23cf3d7d2..e26cae9baf17 100644
--- a/drivers/isdn/i4l/isdn_ppp.c
+++ b/drivers/isdn/i4l/isdn_ppp.c
@@ -1509,7 +1509,7 @@ int isdn_ppp_autodial_filter(struct sk_buff *skb, isdn_net_local *lp)
 * temporarily remove part of the fake header stuck on
 * earlier.
 */
-   *skb_pull(skb, IPPP_MAX_HEADER - 4) = 1; /* indicate outbound */
+   *(u8 *)skb_pull(skb, IPPP_MAX_HEADER - 4) = 1; /* indicate outbound */
 
{
__be16 *p = (__be16 *)skb->data;
diff --git a/drivers/net/wan/hdlc_ppp.c b/drivers/net/wan/hdlc_ppp.c
index fa3460a0dbbe..0d2e00ece804 100644
--- a/drivers/net/wan/hdlc_ppp.c
+++ b/drivers/net/wan/hdlc_ppp.c
@@ -448,7 +448,7 @@ static int ppp_rx(struct sk_buff *skb)
/* Check HDLC header */
if (skb->len < sizeof(struct hdlc_header))
goto rx_error;
-   cp = (struct cp_header*)skb_pull(skb, sizeof(struct hdlc_header));
+   cp = skb_pull(skb, sizeof(struct hdlc_header));
if (hdr->address != HDLC_ADDR_ALLSTATIONS ||
hdr->control != HDLC_CTRL_UI)
goto rx_error;
diff --git a/drivers/nfc/nxp-nci/firmware.c b/drivers/nfc/nxp-nci/firmware.c
index 99ffee1dfd1e..e50c6f67bb39 100644
--- a/drivers/nfc/nxp-nci/firmware.c
+++ b/drivers/nfc/nxp-nci/firmware.c
@@ -311,8 +311,7 @@ void nxp_nci_fw_recv_frame(struct nci_dev *ndev, struct sk_buff *skb)
if (nxp_nci_fw_check_crc(skb) != 0x00)
fw_info->cmd_result = -EBADMSG;
else
-   fw_info->cmd_result = nxp_nci_fw_read_status(
-   *skb_pull(skb, NXP_NCI_FW_HDR_LEN));
+   fw_info->cmd_result = nxp_nci_fw_read_status(*(u8 *)skb_pull(skb, NXP_NCI_FW_HDR_LEN));
kfree_skb(skb);
} else {
fw_info->cmd_result = -EIO;
diff --git a/drivers/scsi/fnic/fnic_fcs.c b/drivers/scsi/fnic/fnic_fcs.c
index 245dcd95e11f..e3b964b7235a 100644
--- a/drivers/scsi/fnic/fnic_fcs.c
+++ b/drivers/scsi/fnic/fnic_fcs.c
@@ -640,7 +640,7 @@ static inline int fnic_import_rq_eth_pkt(struct fnic *fnic, struct sk_buff *skb)
eh = (struct ethhdr *)skb->data;
if (eh->h_proto == htons(ETH_P_8021Q)) {
memmove((u8 *)eh + VLAN_HLEN, eh, ETH_ALEN * 2);
-   eh = (struct ethhdr *)skb_pull(skb, VLAN_HLEN);
+  

Re: [PATCH] Convert multiple netdev_info messages to netdev_dbg

2017-06-15 Thread Joe Perches
On Thu, 2017-06-15 at 19:14 +0100, Michael J Dilmore wrote:
> Multiple netdev_info messages clutter kernel output. Also add netdev_dbg for 
> packets per slave.
[]
> diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
[]
> @@ -9,6 +9,8 @@
>   * (at your option) any later version.
>   */
>   
> +#define DEBUG 1

Is defining DEBUG really worthwhile?

As well, it's almost always just
#define DEBUG
without any level value unless the
level value is used in the code.

> +
>  #include 
>  #include 
>  #include 
> @@ -719,13 +721,13 @@ static int bond_option_mode_set(struct bonding *bond,
>   const struct bond_opt_value *newval)
>  {
>   if (!bond_mode_uses_arp(newval->value) && bond->params.arp_interval) {
> - netdev_info(bond->dev, "%s mode is incompatible with arp monitoring, start mii monitoring\n",
> + netdev_dbg(bond->dev, "%s mode is incompatible with arp monitoring, start mii monitoring\n",
>   newval->string);

Please realign any multiple line arguments to the
open parenthesis at the same time.

>   /* disable arp monitoring */
>   bond->params.arp_interval = 0;
>   /* set miimon to default value */
>   bond->params.miimon = BOND_DEFAULT_MIIMON;
> - netdev_info(bond->dev, "Setting MII monitoring interval to %d\n",
> + netdev_dbg(bond->dev, "Setting MII monitoring interval to %d\n",
>   bond->params.miimon);

etc...
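A small userspace sketch of Joe's point, with a hypothetical demo_dbg() macro standing in for netdev_dbg(): the guard only asks whether DEBUG is defined, so a bare #define DEBUG is enough and a value such as 1 is never read. This models a build without dynamic debug; with CONFIG_DYNAMIC_DEBUG the kernel's behaviour also depends on the dynamic debug settings, which the sketch ignores.

```c
#include <stdio.h>

/* Only the presence of DEBUG matters: the #ifdef below never reads
 * a value, so "#define DEBUG 1" would behave exactly the same. */
#define DEBUG

/* Counts messages that were actually emitted (for illustration). */
static int demo_dbg_emitted;

#ifdef DEBUG
/* Hypothetical stand-in for netdev_dbg() in a non-dynamic-debug build. */
#define demo_dbg(...) (demo_dbg_emitted++, printf(__VA_ARGS__))
#else
#define demo_dbg(...) ((void)0)
#endif
```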



[net-next 03/15] net/mlx5: Avoid using multiple blank lines

2017-06-15 Thread Saeed Mahameed
From: Or Gerlitz 

Fixed a bunch of these checkpatch complaints:

 CHECK: Please don't use multiple blank lines

Signed-off-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/alloc.c     | 1 -
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c       | 2 --
 drivers/net/ethernet/mellanox/mlx5/core/debugfs.c   | 3 ---
 drivers/net/ethernet/mellanox/mlx5/core/en_common.c | 1 -
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c     | 1 -
 drivers/net/ethernet/mellanox/mlx5/core/eq.c        | 2 --
 drivers/net/ethernet/mellanox/mlx5/core/lag.c       | 2 --
 drivers/net/ethernet/mellanox/mlx5/core/main.c      | 1 -
 drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c | 1 -
 drivers/net/ethernet/mellanox/mlx5/core/qp.c        | 1 -
 10 files changed, 15 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/alloc.c b/drivers/net/ethernet/mellanox/mlx5/core/alloc.c
index 66bd213f35ce..3c95f7f53802 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/alloc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/alloc.c
@@ -274,7 +274,6 @@ void mlx5_db_free(struct mlx5_core_dev *dev, struct mlx5_db *db)
 }
 EXPORT_SYMBOL_GPL(mlx5_db_free);
 
-
 void mlx5_fill_page_array(struct mlx5_buf *buf, __be64 *pas)
 {
u64 addr;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 46efaab9da46..e283095b69c3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -217,7 +217,6 @@ static void free_cmd(struct mlx5_cmd_work_ent *ent)
kfree(ent);
 }
 
-
 static int verify_signature(struct mlx5_cmd_work_ent *ent)
 {
struct mlx5_cmd_mailbox *next = ent->out->next;
@@ -1001,7 +1000,6 @@ static ssize_t dbg_write(struct file *filp, const char __user *buf,
return err ? err : count;
 }
 
-
 static const struct file_operations fops = {
.owner  = THIS_MODULE,
.open   = simple_open,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c b/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c
index de40b6cfee95..7ecadb501743 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c
@@ -168,7 +168,6 @@ static ssize_t average_read(struct file *filp, char __user *buf, size_t count,
return ret;
 }
 
-
 static ssize_t average_write(struct file *filp, const char __user *buf,
 size_t count, loff_t *pos)
 {
@@ -466,7 +465,6 @@ static ssize_t dbg_read(struct file *filp, char __user *buf, size_t count,
return -EINVAL;
}
 
-
if (is_str)
ret = snprintf(tbuf, sizeof(tbuf), "%s\n", (const char *)(unsigned long)field);
else
@@ -562,7 +560,6 @@ void mlx5_debug_qp_remove(struct mlx5_core_dev *dev, struct mlx5_core_qp *qp)
rem_res_tree(qp->dbg);
 }
 
-
 int mlx5_debug_eq_add(struct mlx5_core_dev *dev, struct mlx5_eq *eq)
 {
int err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
index 46e56ec4c26f..ece3fb147e3e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
@@ -145,7 +145,6 @@ int mlx5e_refresh_tirs(struct mlx5e_priv *priv, bool enable_uc_lb)
int inlen;
void *in;
 
-
inlen = MLX5_ST_SZ_BYTES(modify_tir_in);
in = kvzalloc(inlen, GFP_KERNEL);
if (!in)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
index 7acc4fba7ece..dfccb5305e9c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
@@ -170,7 +170,6 @@ static int __mlx5e_add_vlan_rule(struct mlx5e_priv *priv,
 
spec->match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
 
-
switch (rule_type) {
case MLX5E_VLAN_RULE_TYPE_UNTAGGED:
rule_p = &priv->fs.vlan.untagged_rule;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 23048247d827..43b279b01e07 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -677,7 +677,6 @@ int mlx5_eq_init(struct mlx5_core_dev *dev)
return err;
 }
 
-
 void mlx5_eq_cleanup(struct mlx5_core_dev *dev)
 {
mlx5_eq_debugfs_cleanup(dev);
@@ -689,7 +688,6 @@ int mlx5_start_eqs(struct mlx5_core_dev *dev)
u64 async_event_mask = MLX5_ASYNC_EVENT_MASK;
int err;
 
-
if (MLX5_CAP_GEN(dev, port_type) == MLX5_CAP_PORT_TYPE_ETH &&
MLX5_CAP_GEN(dev, vport_group_manager) &&
mlx5_core_is_pf(dev))
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag.c
index b5d5519542e8..b6993c4e4823 

[net-next 06/15] net/mlx5: Avoid space after casting

2017-06-15 Thread Saeed Mahameed
From: Or Gerlitz 

Fix checkpatch complaints on that:

 CHECK: No space is necessary after a cast

Signed-off-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c               | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 43b279b01e07..af51a5d2b912 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -191,7 +191,7 @@ static void eq_update_ci(struct mlx5_eq *eq, int arm)
 {
__be32 __iomem *addr = eq->doorbell + (arm ? 0 : 2);
u32 val = (eq->cons_index & 0xff) | (eq->eqn << 24);
-   __raw_writel((__force u32) cpu_to_be32(val), addr);
+   __raw_writel((__force u32)cpu_to_be32(val), addr);
/* We still want ordering, just not swabbing, so add a barrier */
mb();
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 877395f34e89..b8030b5707a5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -1093,7 +1093,7 @@ int mlx5_devlink_eswitch_encap_mode_set(struct devlink *devlink, u8 encap)
if (err) {
esw_warn(esw->dev, "Failed re-creating fast FDB table, err %d\n", err);
esw->offloads.encap = !encap;
-   (void) esw_create_offloads_fast_fdb_table(esw);
+   (void)esw_create_offloads_fast_fdb_table(esw);
}
return err;
 }
-- 
2.11.0



[net-next 10/15] net/mlx5e: Rename physical symbol errors counter

2017-06-15 Thread Saeed Mahameed
From: Gal Pressman 

Rename rx_symbol_errors_phy to rx_pcs_symbol_err_phy, in order to
prevent confusion with rx_symbol_err_phy counter.

The rx_pcs_symbol_err_phy counter counts symbol errors detected on the
PCS (regardless of traffic) that were not corrected, either because the
FEC algorithm could not correct them or because FEC was not active on
this interface.

rx_symbol_err_phy, by contrast, counts errors at the packet level
(physical errors during packet reception).

Fixes: 5db0a4f64c04 ("net/mlx5e: Expose physical layer statistical...")
Signed-off-by: Gal Pressman 
Signed-off-by: Saeed Mahameed 
Cc: kernel-t...@fb.com
---
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
index f81c3aa60b46..fda247587ff6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
@@ -268,7 +268,7 @@ static const struct counter_desc pport_2819_stats_desc[] = {
 };
 
 static const struct counter_desc pport_phy_statistical_stats_desc[] = {
-   { "rx_symbol_errors_phy", PPORT_PHY_STATISTICAL_OFF(phy_symbol_errors) },
+   { "rx_pcs_symbol_err_phy", PPORT_PHY_STATISTICAL_OFF(phy_symbol_errors) },
{ "rx_corrected_bits_phy", PPORT_PHY_STATISTICAL_OFF(phy_corrected_bits) },
 };
 
-- 
2.11.0



[net-next 09/15] net/mlx5e: Fix typo in warning if CQ moderation is not supported

2017-06-15 Thread Saeed Mahameed
From: Itay Aveksis 

Signed-off-by: Itay Aveksis 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 9dad80f32314..51f686d737cb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3718,7 +3718,7 @@ static int mlx5e_check_required_hca_cap(struct mlx5_core_dev *mdev)
if (!MLX5_CAP_ETH(mdev, self_lb_en_modifiable))
mlx5_core_warn(mdev, "Self loop back prevention is not supported\n");
if (!MLX5_CAP_GEN(mdev, cq_moderation))
-   mlx5_core_warn(mdev, "CQ modiration is not supported\n");
+   mlx5_core_warn(mdev, "CQ moderation is not supported\n");
 
return 0;
 }
-- 
2.11.0



[net-next 15/15] net/mlx5: Add fast unload support in shutdown flow

2017-06-15 Thread Saeed Mahameed
From: Majd Dibbiny 

Add support for flushing all HW resources with a single FW command and
skipping the driver's heavy unload flows on kernel shutdown. There is
no need to free the SW context, since a fresh kernel will be loaded
afterwards.

The FW resources, however, must still be closed; otherwise they leak
inside the FW. To speed this up, we issue one command at the start that
tells the FW the driver is not going to close any FW resources itself
and asks the FW to clean up everything. Once that command completes,
it is safe to close the PCI resources and finish the routine.

Signed-off-by: Majd Dibbiny 
Signed-off-by: Maor Gottlieb 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/fw.c   | 28 +++
 drivers/net/ethernet/mellanox/mlx5/core/health.c   |  4 +--
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 32 --
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|  3 +-
 include/linux/mlx5/mlx5_ifc.h  | 14 --
 5 files changed, 73 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
index 1bc14d0fded8..e9489e8d08bb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
@@ -195,3 +195,31 @@ int mlx5_cmd_teardown_hca(struct mlx5_core_dev *dev)
MLX5_SET(teardown_hca_in, in, opcode, MLX5_CMD_OP_TEARDOWN_HCA);
return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
 }
+
+int mlx5_cmd_force_teardown_hca(struct mlx5_core_dev *dev)
+{
+   u32 out[MLX5_ST_SZ_DW(teardown_hca_out)] = {0};
+   u32 in[MLX5_ST_SZ_DW(teardown_hca_in)] = {0};
+   int force_state;
+   int ret;
+
+   if (!MLX5_CAP_GEN(dev, force_teardown)) {
+   mlx5_core_dbg(dev, "force teardown is not supported in the firmware\n");
+   return -EOPNOTSUPP;
+   }
+
+   MLX5_SET(teardown_hca_in, in, opcode, MLX5_CMD_OP_TEARDOWN_HCA);
+   MLX5_SET(teardown_hca_in, in, profile, MLX5_TEARDOWN_HCA_IN_PROFILE_FORCE_CLOSE);
+
+   ret = mlx5_cmd_exec_polling(dev, in, sizeof(in), out, sizeof(out));
+   if (ret)
+   return ret;
+
+   force_state = MLX5_GET(teardown_hca_out, out, force_state);
+   if (force_state == MLX5_TEARDOWN_HCA_OUT_FORCE_STATE_FAIL) {
+   mlx5_core_err(dev, "teardown with force mode failed\n");
+   return -EIO;
+   }
+
+   return 0;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index c6679b21884e..0648a659b21d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -111,14 +111,14 @@ static int in_fatal(struct mlx5_core_dev *dev)
return 0;
 }
 
-void mlx5_enter_error_state(struct mlx5_core_dev *dev)
+void mlx5_enter_error_state(struct mlx5_core_dev *dev, bool force)
 {
mutex_lock(&dev->intf_state_mutex);
if (dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR)
goto unlock;
 
mlx5_core_err(dev, "start\n");
-   if (pci_channel_offline(dev->pdev) || in_fatal(dev)) {
+   if (pci_channel_offline(dev->pdev) || in_fatal(dev) || force) {
dev->state = MLX5_DEVICE_STATE_INTERNAL_ERROR;
trigger_cmd_completions(dev);
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 39e7e523a0dd..715eeab5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1418,7 +1418,7 @@ static pci_ers_result_t mlx5_pci_err_detected(struct pci_dev *pdev,
 
dev_info(&pdev->dev, "%s was called\n", __func__);
 
-   mlx5_enter_error_state(dev);
+   mlx5_enter_error_state(dev, false);
mlx5_unload_one(dev, priv, false);
/* In case of kernel call drain the health wq */
if (state) {
@@ -1505,15 +1505,43 @@ static const struct pci_error_handlers mlx5_err_handler = {
.resume = mlx5_pci_resume
 };
 
+static int mlx5_try_fast_unload(struct mlx5_core_dev *dev)
+{
+   int ret;
+
+   if (!MLX5_CAP_GEN(dev, force_teardown)) {
+   mlx5_core_dbg(dev, "force teardown is not supported in the firmware\n");
+   return -EOPNOTSUPP;
+   }
+
+   if (dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR) {
+   mlx5_core_dbg(dev, "Device in internal error state, giving up\n");
+   return -EAGAIN;
+   }
+
+   ret = mlx5_cmd_force_teardown_hca(dev);
+   if (ret) {
+   mlx5_core_dbg(dev, "Firmware couldn't do fast unload error: %d\n", ret);
+   return ret;
+   }
+
+   mlx5_enter_error_state(dev, true);
+
+   return 0;
+}
+
 
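The fast-unload decision can be sketched in userspace as follows. This is an assumption-laden model, not the driver: struct demo_dev and the demo_* functions are hypothetical, the error codes mirror those returned by mlx5_try_fast_unload() above, and the fallback to a full unload is an assumption about how the (truncated) shutdown path would use such a helper:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device model: the fields stand in for the capability
 * bit, the device state and the firmware's answer to the teardown. */
struct demo_dev {
	bool cap_force_teardown;
	bool in_internal_error;
	int fw_teardown_result;	/* 0 means the FW cleaned up everything */
	bool full_unload_ran;
};

static int demo_try_fast_unload(struct demo_dev *dev)
{
	if (!dev->cap_force_teardown)
		return -EOPNOTSUPP;
	if (dev->in_internal_error)
		return -EAGAIN;
	if (dev->fw_teardown_result)
		return dev->fw_teardown_result;
	/* Like mlx5_enter_error_state(dev, true): mark the device dead so
	 * pending commands complete instead of waiting on the FW. */
	dev->in_internal_error = true;
	return 0;
}

/* The slow path that releases every resource one by one. */
static void demo_full_unload(struct demo_dev *dev)
{
	dev->full_unload_ran = true;
}

static void demo_shutdown(struct demo_dev *dev)
{
	if (demo_try_fast_unload(dev))
		demo_full_unload(dev);
}
```

Keeping the full unload as the fallback means older firmware without the force_teardown capability still shuts down cleanly, only more slowly.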

[net-next 01/15] net/mlx5: Update eqe_type_str() event names

2017-06-15 Thread Saeed Mahameed
From: Eli Cohen 

Add missing NIC_VPORT_CHANGE event.

Signed-off-by: Eli Cohen 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 0ed8e90ba54f..23048247d827 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -157,6 +157,8 @@ static const char *eqe_type_str(u8 type)
return "MLX5_EVENT_TYPE_PAGE_FAULT";
case MLX5_EVENT_TYPE_PPS_EVENT:
return "MLX5_EVENT_TYPE_PPS_EVENT";
+   case MLX5_EVENT_TYPE_NIC_VPORT_CHANGE:
+   return "MLX5_EVENT_TYPE_NIC_VPORT_CHANGE";
case MLX5_EVENT_TYPE_FPGA_ERROR:
return "MLX5_EVENT_TYPE_FPGA_ERROR";
default:
-- 
2.11.0



Re: [PATCH] atm: solos-pci: remove useless variable assignments

2017-06-15 Thread Gustavo A. R. Silva

Hi David,

Quoting David Miller :


> From: "Gustavo A. R. Silva"
> Date: Thu, 15 Jun 2017 14:56:21 -0500
>
>> Value assigned to variable _data32_ at lines 1254 and 1257 is
>> overwritten at line 1260 before it can be used. This makes
>> such variable assignments useless.
>>
>> Addresses-Coverity-ID: 1227049
>> Signed-off-by: Gustavo A. R. Silva
>
> Applied, thanks.


Absolutely, glad to help.

Regards
--
Gustavo A. R. Silva






[net-next 08/15] net/mlx5e: Use function to map aRFS into traffic type

2017-06-15 Thread Saeed Mahameed
From: Tariq Toukan 

For better code reuse and readability, use the existing
function arfs_get_tt() to map arfs_type into mlx5e_traffic_types,
instead of duplicating the switch-case logic.

Signed-off-by: Tariq Toukan 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c | 21 +++--
 1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
index f4017c06ddd2..12d3ced61114 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
@@ -178,6 +178,7 @@ static int arfs_add_default_rule(struct mlx5e_priv *priv,
struct mlx5_flow_destination dest;
MLX5_DECLARE_FLOW_ACT(flow_act);
struct mlx5_flow_spec *spec;
+   enum mlx5e_traffic_types tt;
int err = 0;
 
spec = kvzalloc(sizeof(*spec), GFP_KERNEL);
@@ -187,24 +188,16 @@ static int arfs_add_default_rule(struct mlx5e_priv *priv,
}
 
dest.type = MLX5_FLOW_DESTINATION_TYPE_TIR;
-   switch (type) {
-   case ARFS_IPV4_TCP:
-   dest.tir_num = tir[MLX5E_TT_IPV4_TCP].tirn;
-   break;
-   case ARFS_IPV4_UDP:
-   dest.tir_num = tir[MLX5E_TT_IPV4_UDP].tirn;
-   break;
-   case ARFS_IPV6_TCP:
-   dest.tir_num = tir[MLX5E_TT_IPV6_TCP].tirn;
-   break;
-   case ARFS_IPV6_UDP:
-   dest.tir_num = tir[MLX5E_TT_IPV6_UDP].tirn;
-   break;
-   default:
+   tt = arfs_get_tt(type);
+   if (tt == -EINVAL) {
+   netdev_err(priv->netdev, "%s: bad arfs_type: %d\n",
+  __func__, type);
err = -EINVAL;
goto out;
}
 
+   dest.tir_num = tir[tt].tirn;
+
arfs_t->default_rule = mlx5_add_flow_rules(arfs_t->ft.t, spec,
   &flow_act,
   &dest, 1);
-- 
2.11.0
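A hedged sketch of the pattern the patch moves to, with hypothetical enums and names (demo_arfs_get_tt(), struct demo_tir) rather than the driver's real definitions: one function owns the mapping and every caller indexes with its result instead of repeating the switch. Returning a plain int keeps the negative error check unambiguous without depending on the signedness the compiler picks for an enum:

```c
#include <errno.h>

/* Hypothetical, reduced mirrors of the driver's enums. The orders
 * differ on purpose, so the mapping is not the identity. */
enum demo_arfs_type {
	DEMO_ARFS_IPV4_TCP,
	DEMO_ARFS_IPV4_UDP,
	DEMO_ARFS_IPV6_TCP,
	DEMO_ARFS_IPV6_UDP,
	DEMO_ARFS_NUM_TYPES,
};

enum demo_traffic_type {
	DEMO_TT_IPV4_TCP,
	DEMO_TT_IPV6_TCP,
	DEMO_TT_IPV4_UDP,
	DEMO_TT_IPV6_UDP,
	DEMO_TT_NUM,
};

struct demo_tir {
	unsigned int tirn;
};

/* The single place that knows the mapping; -EINVAL for anything else. */
static int demo_arfs_get_tt(enum demo_arfs_type type)
{
	switch (type) {
	case DEMO_ARFS_IPV4_TCP:
		return DEMO_TT_IPV4_TCP;
	case DEMO_ARFS_IPV4_UDP:
		return DEMO_TT_IPV4_UDP;
	case DEMO_ARFS_IPV6_TCP:
		return DEMO_TT_IPV6_TCP;
	case DEMO_ARFS_IPV6_UDP:
		return DEMO_TT_IPV6_UDP;
	default:
		return -EINVAL;
	}
}

/* A caller reuses the mapping instead of repeating the switch. */
static int demo_default_rule_tirn(const struct demo_tir *tir,
				  enum demo_arfs_type type,
				  unsigned int *tirn)
{
	int tt = demo_arfs_get_tt(type);

	if (tt < 0)
		return tt;
	*tirn = tir[tt].tirn;
	return 0;
}
```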



[net-next 11/15] net/mlx5e: Reduce number of heap allocated buffers for update stats

2017-06-15 Thread Saeed Mahameed
From: Gal Pressman 

Allocating buffers on the heap every 200ms is something we should avoid;
use buffers located on the stack instead.

Signed-off-by: Gal Pressman 
Reviewed-by: Eran Ben Elisha 
Signed-off-by: Saeed Mahameed 
Cc: kernel-t...@fb.com
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 16 ++--
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 51f686d737cb..50184021624e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -248,14 +248,10 @@ static void mlx5e_update_pport_counters(struct mlx5e_priv *priv)
 {
struct mlx5e_pport_stats *pstats = &priv->stats.pport;
struct mlx5_core_dev *mdev = priv->mdev;
+   u32 in[MLX5_ST_SZ_DW(ppcnt_reg)] = {0};
int sz = MLX5_ST_SZ_BYTES(ppcnt_reg);
int prio;
void *out;
-   u32 *in;
-
-   in = kvzalloc(sz, GFP_KERNEL);
-   if (!in)
-   return;
 
MLX5_SET(ppcnt_reg, in, local_port, 1);
 
@@ -288,8 +284,6 @@ static void mlx5e_update_pport_counters(struct mlx5e_priv *priv)
mlx5_core_access_reg(mdev, in, sz, out, sz,
 MLX5_REG_PPCNT, 0, 0);
}
-
-   kvfree(in);
 }
 
 static void mlx5e_update_q_counter(struct mlx5e_priv *priv)
@@ -307,22 +301,16 @@ static void mlx5e_update_pcie_counters(struct mlx5e_priv *priv)
 {
struct mlx5e_pcie_stats *pcie_stats = &priv->stats.pcie;
struct mlx5_core_dev *mdev = priv->mdev;
+   u32 in[MLX5_ST_SZ_DW(mpcnt_reg)] = {0};
int sz = MLX5_ST_SZ_BYTES(mpcnt_reg);
void *out;
-   u32 *in;
 
if (!MLX5_CAP_MCAM_FEATURE(mdev, pcie_performance_group))
return;
 
-   in = kvzalloc(sz, GFP_KERNEL);
-   if (!in)
-   return;
-
out = pcie_stats->pcie_perf_counters;
MLX5_SET(mpcnt_reg, in, grp, MLX5_PCIE_PERFORMANCE_COUNTERS_GROUP);
mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_MPCNT, 0, 0);
-
-   kvfree(in);
 }
 
 void mlx5e_update_stats(struct mlx5e_priv *priv)
-- 
2.11.0
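A userspace sketch of the stack-buffer approach, under stated assumptions: DEMO_REG_DW, demo_access_reg(), demo_update_counters() and the field placement are hypothetical. A fixed-size array initialised with {0} is fully zeroed, so it replaces the kvzalloc()/kvfree() pair and removes the allocation-failure path:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register size in dwords, in the spirit of MLX5_ST_SZ_DW(). */
#define DEMO_REG_DW 16

/* Hypothetical stand-in for a register access: echoes the request. */
static int demo_access_reg(const uint32_t *in, size_t sz, uint32_t *out)
{
	memcpy(out, in, sz);
	return 0;
}

static int demo_update_counters(uint32_t *out, uint32_t group)
{
	/* "= {0}" zeroes every element, which is what kvzalloc() provided,
	 * but there is no failure path to handle and nothing to free. */
	uint32_t in[DEMO_REG_DW] = {0};

	in[1] = group;	/* hypothetical field position */
	return demo_access_reg(in, sizeof(in), out);
}
```

This trade-off only makes sense for small, compile-time-sized buffers; kernel stacks are small, so a large register layout would still belong on the heap.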



[pull request][net-next 00/15] Mellanox mlx5 updates and cleanups 2017-06-16

2017-06-15 Thread Saeed Mahameed
Hi Dave,

This series provides updates and cleanups to mlx5 driver.

For more details please see tag log below.

Please pull and let me know if there's any problem.
*This series doesn't introduce any conflict with the ongoing net
pull request.

Thanks,
Saeed.

---

The following changes since commit 3715c47bcda8bb56f7e2be27276282a2d0d48c09:

  Merge branch 'r8152-support-new-chips' (2017-06-15 14:31:56 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2017-06-16

for you to fetch changes up to 8812c24d28f4972c4f2b9998bf30b1f2a1b62adf:

  net/mlx5: Add fast unload support in shutdown flow (2017-06-16 00:19:44 +0300)


mlx5-updates-2017-06-16

This series provides some updates and cleanups for the mlx5 core and
netdevice drivers.

From Eli Cohen, add a missing event string.
From Or Gerlitz, some checkpatch cleanups.
From Moni, disable HW-level LAG when SRIOV is enabled.
From Tariq, a code reuse cleanup in the aRFS flow.
From Itay Aveksis, a typo fix.
From Gal Pressman, ethtool statistics updates and "update stats" deferred work optimizations.
From Majd Dibbiny, fast unload support on kernel shutdown.


Eli Cohen (1):
  net/mlx5: Update eqe_type_str() event names

Gal Pressman (4):
  net/mlx5e: Rename physical symbol errors counter
  net/mlx5e: Reduce number of heap allocated buffers for update stats
  net/mlx5e: Move and optimize query out of buffer function
  net/mlx5e: Optimize update stats work

Itay Aveksis (1):
  net/mlx5e: Fix typo in warning if CQ moderation is not supported

Majd Dibbiny (2):
  net/mlx5: Expose command polling interface
  net/mlx5: Add fast unload support in shutdown flow

Moni Shoua (1):
  net/mlx5: Undo LAG upon request to create virtual functions

Or Gerlitz (5):
  net/mlx5: Fix some spelling mistakes
  net/mlx5: Avoid using multiple blank lines
  net/mlx5: Avoid blank lines before/after closing/opening braces
  net/mlx5: Align to match opening parenthesis
  net/mlx5: Avoid space after casting

Tariq Toukan (1):
  net/mlx5e: Use function to map aRFS into traffic type

 drivers/infiniband/hw/mlx5/main.c  |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/alloc.c|  1 -
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c  | 36 +++
 drivers/net/ethernet/mellanox/mlx5/core/debugfs.c  |  3 -
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c  | 21 +++
 .../net/ethernet/mellanox/mlx5/core/en_common.c|  1 -
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c|  1 -
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 52 
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |  1 -
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c|  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/eq.c   |  6 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |  1 -
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c |  4 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/fw.c   | 28 +
 drivers/net/ethernet/mellanox/mlx5/core/health.c   |  4 +-
 drivers/net/ethernet/mellanox/mlx5/core/lag.c  | 71 +++---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 43 ++---
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|  6 +-
 .../net/ethernet/mellanox/mlx5/core/pagealloc.c|  1 -
 drivers/net/ethernet/mellanox/mlx5/core/qp.c   | 21 ---
 drivers/net/ethernet/mellanox/mlx5/core/sriov.c| 15 +++--
 include/linux/mlx5/driver.h|  3 +
 include/linux/mlx5/mlx5_ifc.h  | 18 --
 include/linux/mlx5/qp.h|  2 -
 28 files changed, 227 insertions(+), 124 deletions(-)


[net-next 04/15] net/mlx5: Avoid blank lines before/after closing/opening braces

2017-06-15 Thread Saeed Mahameed
From: Or Gerlitz 

Fixed checkpatch complaints about:

 CHECK: Blank lines aren't necessary before a close brace '}'
 CHECK: Blank lines aren't necessary after an open brace '{'

and one about a missing blank line.

Signed-off-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 1 -
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c  | 1 -
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 1 -
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 2 +-
 4 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 5afec0f4a658..c803a533ec3c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3067,7 +3067,6 @@ mlx5e_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats)
 */
stats->multicast =
VPORT_COUNTER_GET(vstats, received_eth_multicast.packets);
-
 }
 
 static void mlx5e_set_rx_mode(struct net_device *dev)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 70c2b8d020bd..01798e1ab667 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -1019,7 +1019,6 @@ mlx5e_vport_rep_load(struct mlx5_eswitch *esw, struct mlx5_eswitch_rep *rep)
mlx5e_destroy_netdev(netdev_priv(netdev));
kfree(rpriv);
return err;
-
 }
 
 static void
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 37927156f258..89bfda419efe 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1217,7 +1217,6 @@ static int esw_vport_ingress_config(struct mlx5_eswitch *esw,
   "vport[%d] configure ingress rules failed, illegal mac with spoofchk\n",
   vport->vport);
return -EPERM;
-
}
 
esw_vport_cleanup_ingress_rules(esw, vport);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 6380c2db355a..e8690fe46bf2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -104,6 +104,7 @@ struct node_caps {
size_t  arr_sz;
long*caps;
 };
+
 static struct init_tree_node {
enum fs_node_type   type;
struct init_tree_node *children;
@@ -1858,7 +1859,6 @@ static int create_anchor_flow_table(struct mlx5_flow_steering *steering)
 
 static int init_root_ns(struct mlx5_flow_steering *steering)
 {
-
steering->root_ns = create_root_ns(steering, FS_FT_NIC_RX);
if (!steering->root_ns)
goto cleanup;
-- 
2.11.0



[net-next 12/15] net/mlx5e: Move and optimize query out of buffer function

2017-06-15 Thread Saeed Mahameed
From: Gal Pressman 

Move the "query queue counter out of buffer" helper function out of
qp.c into en_main.c, since the mlx5e netdev driver is its only user.

Also allocate the output buffer on the stack instead of the heap, to reduce
the number of heap allocations in the update_stats work.

Signed-off-by: Gal Pressman 
Signed-off-by: Saeed Mahameed 
Cc: kernel-t...@fb.com
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |  9 +++--
 drivers/net/ethernet/mellanox/mlx5/core/qp.c  | 20 
 include/linux/mlx5/qp.h   |  2 --
 3 files changed, 7 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 50184021624e..8bb0241df069 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -289,12 +289,17 @@ static void mlx5e_update_pport_counters(struct mlx5e_priv *priv)
 static void mlx5e_update_q_counter(struct mlx5e_priv *priv)
 {
struct mlx5e_qcounter_stats *qcnt = &priv->stats.qcnt;
+   u32 out[MLX5_ST_SZ_DW(query_q_counter_out)];
+   int err;
 
if (!priv->q_counter)
return;
 
-   mlx5_core_query_out_of_buffer(priv->mdev, priv->q_counter,
- &qcnt->rx_out_of_buffer);
+   err = mlx5_core_query_q_counter(priv->mdev, priv->q_counter, 0, out, sizeof(out));
+   if (err)
+   return;
+
+   qcnt->rx_out_of_buffer = MLX5_GET(query_q_counter_out, out, out_of_buffer);
 }
 
 static void mlx5e_update_pcie_counters(struct mlx5e_priv *priv)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/qp.c b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
index da0f18f93616..340f281c9801 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/qp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
@@ -518,23 +518,3 @@ int mlx5_core_query_q_counter(struct mlx5_core_dev *dev, u16 counter_id,
return mlx5_cmd_exec(dev, in, sizeof(in), out, out_size);
 }
 EXPORT_SYMBOL_GPL(mlx5_core_query_q_counter);
-
-int mlx5_core_query_out_of_buffer(struct mlx5_core_dev *dev, u16 counter_id,
- u32 *out_of_buffer)
-{
-   int outlen = MLX5_ST_SZ_BYTES(query_q_counter_out);
-   void *out;
-   int err;
-
-   out = kvzalloc(outlen, GFP_KERNEL);
-   if (!out)
-   return -ENOMEM;
-
-   err = mlx5_core_query_q_counter(dev, counter_id, 0, out, outlen);
-   if (!err)
-   *out_of_buffer = MLX5_GET(query_q_counter_out, out,
- out_of_buffer);
-
-   kfree(out);
-   return err;
-}
diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h
index bef80d0a0e30..1f637f4d1265 100644
--- a/include/linux/mlx5/qp.h
+++ b/include/linux/mlx5/qp.h
@@ -569,8 +569,6 @@ int mlx5_core_alloc_q_counter(struct mlx5_core_dev *dev, u16 *counter_id);
 int mlx5_core_dealloc_q_counter(struct mlx5_core_dev *dev, u16 counter_id);
 int mlx5_core_query_q_counter(struct mlx5_core_dev *dev, u16 counter_id,
  int reset, void *out, int out_size);
-int mlx5_core_query_out_of_buffer(struct mlx5_core_dev *dev, u16 counter_id,
- u32 *out_of_buffer);
 
 static inline const char *mlx5_qp_type_str(int type)
 {
-- 
2.11.0
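The stack-versus-heap change in this patch can be illustrated outside the kernel. The following is a minimal userspace sketch, not the driver code: fake_query_q_counter(), QUERY_Q_COUNTER_OUT_DW, OUT_OF_BUFFER_DW and the heap_allocs counter are hypothetical stand-ins for the firmware command, MLX5_ST_SZ_DW(query_q_counter_out) and the out_of_buffer field offset. It assumes, as the patch does, that the reply is small and fixed-size, so a stack array is safe.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reply size in 32-bit words (stand-in for
 * MLX5_ST_SZ_DW(query_q_counter_out)); the value is illustrative. */
#define QUERY_Q_COUNTER_OUT_DW 16
/* Hypothetical word offset of the out_of_buffer field in the reply. */
#define OUT_OF_BUFFER_DW 3

/* Counts heap allocations made by the old-style helper. */
static unsigned long heap_allocs;

/* Hypothetical firmware command: fills 'out' with a fake reply. */
static int fake_query_q_counter(uint16_t counter_id, uint32_t *out,
				size_t out_size)
{
	assert(out != NULL);
	if (out_size < QUERY_Q_COUNTER_OUT_DW * sizeof(uint32_t))
		return -1;
	memset(out, 0, out_size);
	out[OUT_OF_BUFFER_DW] = 1000u + counter_id;
	return 0;
}

/* Old style: allocate and free the reply buffer on every call. */
static int query_out_of_buffer_heap(uint16_t counter_id, uint32_t *val)
{
	size_t outlen = QUERY_Q_COUNTER_OUT_DW * sizeof(uint32_t);
	uint32_t *out = calloc(1, outlen);
	int err;

	if (!out)
		return -1;
	heap_allocs++;
	err = fake_query_q_counter(counter_id, out, outlen);
	if (!err)
		*val = out[OUT_OF_BUFFER_DW];
	free(out);
	return err;
}

/* New style: the reply is small and fixed-size, so it lives on the stack. */
static int query_out_of_buffer_stack(uint16_t counter_id, uint32_t *val)
{
	uint32_t out[QUERY_Q_COUNTER_OUT_DW];
	int err = fake_query_q_counter(counter_id, out, sizeof(out));

	if (!err)
		*val = out[OUT_OF_BUFFER_DW];
	return err;
}
```

Both helpers return the same value; only the heap version touches the allocator, which is the per-iteration cost the patch removes from the periodic update_stats work.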



[net-next 13/15] net/mlx5e: Optimize update stats work

2017-06-15 Thread Saeed Mahameed
From: Gal Pressman 

Unlike ethtool stats, the get_stats ndo reports counters cached by the
update_stats work, which runs in the background; the ndo itself does not
update them. We cannot update all counters inside the ndo because some
updates require firmware commands that cannot be issued under a
spinlock.

The update_stats work does not need to update ALL counters, since
ndo_get_stats needs only some of them.
This patch allows a minimal run of update_stats through an extra
'full' parameter, which updates only the necessary counters and cuts 13
firmware commands from each iteration of the work.

Work duration before this patch: ~4200us.
Work duration after this patch: ~700us (17% of the original time).

Signed-off-by: Gal Pressman 
Reviewed-by: Eran Ben Elisha 
Signed-off-by: Saeed Mahameed 
Cc: kernel-t...@fb.com
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c| 19 ++-
 3 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index a0516b0a5273..8094e78292de 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -822,7 +822,7 @@ void mlx5e_rx_am(struct mlx5e_rq *rq);
 void mlx5e_rx_am_work(struct work_struct *work);
 struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode);
 
-void mlx5e_update_stats(struct mlx5e_priv *priv);
+void mlx5e_update_stats(struct mlx5e_priv *priv, bool full);
 
 int mlx5e_create_flow_steering(struct mlx5e_priv *priv);
 void mlx5e_destroy_flow_steering(struct mlx5e_priv *priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index b4514f247402..216752070391 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -311,7 +311,7 @@ static void mlx5e_get_ethtool_stats(struct net_device *dev,
 
mutex_lock(&priv->state_lock);
if (test_bit(MLX5E_STATE_OPENED, &priv->state))
-   mlx5e_update_stats(priv);
+   mlx5e_update_stats(priv, true);
channels = &priv->channels;
mutex_unlock(>state_lock);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 8bb0241df069..2e9a4187a533 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -244,7 +244,7 @@ static void mlx5e_update_vport_counters(struct mlx5e_priv *priv)
mlx5_cmd_exec(mdev, in, sizeof(in), out, outlen);
 }
 
-static void mlx5e_update_pport_counters(struct mlx5e_priv *priv)
+static void mlx5e_update_pport_counters(struct mlx5e_priv *priv, bool full)
 {
struct mlx5e_pport_stats *pstats = &priv->stats.pport;
struct mlx5_core_dev *mdev = priv->mdev;
@@ -259,6 +259,9 @@ static void mlx5e_update_pport_counters(struct mlx5e_priv *priv)
MLX5_SET(ppcnt_reg, in, grp, MLX5_IEEE_802_3_COUNTERS_GROUP);
mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_PPCNT, 0, 0);
 
+   if (!full)
+   return;
+
out = pstats->RFC_2863_counters;
MLX5_SET(ppcnt_reg, in, grp, MLX5_RFC_2863_COUNTERS_GROUP);
mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_PPCNT, 0, 0);
@@ -318,15 +321,21 @@ static void mlx5e_update_pcie_counters(struct mlx5e_priv *priv)
mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_MPCNT, 0, 0);
 }
 
-void mlx5e_update_stats(struct mlx5e_priv *priv)
+void mlx5e_update_stats(struct mlx5e_priv *priv, bool full)
 {
-   mlx5e_update_pcie_counters(priv);
-   mlx5e_update_pport_counters(priv);
+   if (full)
+   mlx5e_update_pcie_counters(priv);
+   mlx5e_update_pport_counters(priv, full);
mlx5e_update_vport_counters(priv);
mlx5e_update_q_counter(priv);
mlx5e_update_sw_counters(priv);
 }
 
+static void mlx5e_update_ndo_stats(struct mlx5e_priv *priv)
+{
+   mlx5e_update_stats(priv, false);
+}
+
 void mlx5e_update_stats_work(struct work_struct *work)
 {
struct delayed_work *dwork = to_delayed_work(work);
@@ -4195,7 +4204,7 @@ static const struct mlx5e_profile mlx5e_nic_profile = {
.cleanup_tx= mlx5e_cleanup_nic_tx,
.enable= mlx5e_nic_enable,
.disable   = mlx5e_nic_disable,
-   .update_stats  = mlx5e_update_stats,
+   .update_stats  = mlx5e_update_ndo_stats,
.max_nch   = mlx5e_get_max_num_channels,
.rx_handlers.handle_rx_cqe   = mlx5e_handle_rx_cqe,
.rx_handlers.handle_rx_cqe_mpwqe = mlx5e_handle_rx_cqe_mpwrq,
-- 
2.11.0
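The split between a full refresh (ethtool) and a minimal refresh (the periodic work behind ndo_get_stats) can be sketched in plain C. This is an illustrative model under stated assumptions, not the driver: each fw_query() call stands in for one firmware command, software counters are left out because they need no firmware access, and EXTRA_PPORT_GROUPS is a hypothetical count, so the totals below do not reproduce the 13 commands saved in the real driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical number of PPCNT groups read only on a full refresh. */
#define EXTRA_PPORT_GROUPS 4

/* Counts simulated firmware commands so the two modes can be compared. */
struct stats_ctx {
	unsigned int fw_cmds;
};

/* Stand-in for one firmware register access (e.g. one PPCNT group). */
static void fw_query(struct stats_ctx *ctx)
{
	assert(ctx != NULL);
	ctx->fw_cmds++;
}

/* PCIe counters are not needed by ndo_get_stats. */
static void update_pcie_counters(struct stats_ctx *ctx)
{
	fw_query(ctx);
}

/* The IEEE 802.3 group is always read; other groups only when 'full'. */
static void update_pport_counters(struct stats_ctx *ctx, bool full)
{
	fw_query(ctx);
	if (!full)
		return;
	for (int i = 0; i < EXTRA_PPORT_GROUPS; i++)
		fw_query(ctx);
}

static void update_vport_counters(struct stats_ctx *ctx)
{
	fw_query(ctx);
}

static void update_q_counter(struct stats_ctx *ctx)
{
	fw_query(ctx);
}

/* Same shape as mlx5e_update_stats(priv, full) in the patch. */
static void update_stats(struct stats_ctx *ctx, bool full)
{
	if (full)
		update_pcie_counters(ctx);
	update_pport_counters(ctx, full);
	update_vport_counters(ctx);
	update_q_counter(ctx);
}

/* What the background work calls: only what ndo_get_stats reports. */
static void update_ndo_stats(struct stats_ctx *ctx)
{
	update_stats(ctx, false);
}
```

With these made-up counts a full refresh issues 8 simulated commands and the ndo path issues 3; the design point is that the expensive groups are gated by a single flag, so ethtool keeps its complete view while the background work stays cheap.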



[net-next 05/15] net/mlx5: Align to match opening parenthesis

2017-06-15 Thread Saeed Mahameed
From: Or Gerlitz 

Fixed checkpatch complaints of the form:

 CHECK: Alignment should match open parenthesis

Signed-off-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c  | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 5 +++--
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c| 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 2 +-
 4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index e283095b69c3..c1d8b0bcde75 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -1151,7 +1151,7 @@ static struct mlx5_cmd_msg *mlx5_alloc_cmd_msg(struct mlx5_core_dev *dev,
 }
 
 static void mlx5_free_cmd_msg(struct mlx5_core_dev *dev,
- struct mlx5_cmd_msg *msg)
+ struct mlx5_cmd_msg *msg)
 {
struct mlx5_cmd_mailbox *head = msg->next;
struct mlx5_cmd_mailbox *next;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index c803a533ec3c..9dad80f32314 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -124,7 +124,8 @@ static void mlx5e_update_carrier(struct mlx5e_priv *priv)
u8 port_state;
 
port_state = mlx5_query_vport_state(mdev,
-   MLX5_QUERY_VPORT_STATE_IN_OP_MOD_VNIC_VPORT, 0);
+   MLX5_QUERY_VPORT_STATE_IN_OP_MOD_VNIC_VPORT,
+   0);
 
if (port_state == VPORT_STATE_UP) {
netdev_info(priv->netdev, "Link up\n");
@@ -3850,7 +3851,7 @@ void mlx5e_build_nic_params(struct mlx5_core_dev *mdev,
/* set CQE compression */
params->rx_cqe_compress_def = false;
if (MLX5_CAP_GEN(mdev, cqe_compression) &&
-MLX5_CAP_GEN(mdev, vport_group_manager))
+   MLX5_CAP_GEN(mdev, vport_group_manager))
params->rx_cqe_compress_def = cqe_compress_heuristic(link_speed, pci_bw);
 
MLX5E_SET_PFLAG(params, MLX5E_PFLAG_RX_CQE_COMPRESS, params->rx_cqe_compress_def);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index ab3bb026ff9e..ef3e1918d8cc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -245,7 +245,7 @@ mlx5e_txwqe_build_dsegs(struct mlx5e_txqsq *sq, struct sk_buff *skb,
int fsz = skb_frag_size(frag);
 
dma_addr = skb_frag_dma_map(sq->pdev, frag, 0, fsz,
-DMA_TO_DEVICE);
+   DMA_TO_DEVICE);
if (unlikely(dma_mapping_error(sq->pdev, dma_addr)))
return -ENOMEM;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 3795943ef2d1..877395f34e89 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -691,7 +691,7 @@ mlx5_eswitch_create_vport_rx_rule(struct mlx5_eswitch *esw, int vport, u32 tirn)
 
flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
flow_rule = mlx5_add_flow_rules(esw->offloads.ft_offloads, spec,
-  &flow_act, &dest, 1);
+   &flow_act, &dest, 1);
if (IS_ERR(flow_rule)) {
esw_warn(esw->dev, "fs offloads: Failed to add vport rx rule err %ld\n", PTR_ERR(flow_rule));
goto out;
-- 
2.11.0



[net-next 07/15] net/mlx5: Undo LAG upon request to create virtual functions

2017-06-15 Thread Saeed Mahameed
From: Moni Shoua 

LAG cannot work if virtual functions are present. Therefore, if LAG is
configured, the attempt to create virtual functions will fail. This gives
precedence to LAG over SRIOV, which is not the desired behavior, as users
who want to use the bonding/teaming driver might also want to work with
SRIOV. In that case we don't want to force an order of actions (first
create virtual functions and only then configure a bonding/teaming net
device). To fix this, if LAG is configured during a request to create
virtual functions, remove it and continue.

We ignore ENODEV when trying to forbid lag. This makes sense
because "No such device" means that lag is forbidden anyway.

Signed-off-by: Moni Shoua 
Reviewed-by: Aviv Heller 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/lag.c  | 69 +++---
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|  3 +
 drivers/net/ethernet/mellanox/mlx5/core/sriov.c| 15 +++--
 3 files changed, 74 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag.c
index b6993c4e4823..a3a836bdcfd2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag.c
@@ -61,6 +61,11 @@ struct mlx5_lag {
struct lag_trackertracker;
struct delayed_work   bond_work;
struct notifier_block nb;
+
+   /* Admin state. Allow lag only if allowed is true
+* even if network conditions for lag were met
+*/
+   bool  allowed;
 };
 
 /* General purpose, use for short periods of time.
@@ -214,6 +219,7 @@ static void mlx5_do_bond(struct mlx5_lag *ldev)
struct lag_tracker tracker;
u8 v2p_port1, v2p_port2;
int i, err;
+   bool do_bond;
 
if (!dev0 || !dev1)
return;
@@ -222,13 +228,9 @@ static void mlx5_do_bond(struct mlx5_lag *ldev)
tracker = ldev->tracker;
mutex_unlock(&lag_mutex);
 
-   if (tracker.is_bonded && !mlx5_lag_is_bonded(ldev)) {
-   if (mlx5_sriov_is_enabled(dev0) ||
-   mlx5_sriov_is_enabled(dev1)) {
-   mlx5_core_warn(dev0, "LAG is not supported with SRIOV");
-   return;
-   }
+   do_bond = tracker.is_bonded && ldev->allowed;
 
+   if (do_bond && !mlx5_lag_is_bonded(ldev)) {
for (i = 0; i < MLX5_MAX_PORTS; i++)
mlx5_remove_dev_by_protocol(ldev->pf[i].dev,
MLX5_INTERFACE_PROTOCOL_IB);
@@ -237,7 +239,7 @@ static void mlx5_do_bond(struct mlx5_lag *ldev)
 
mlx5_add_dev_by_protocol(dev0, MLX5_INTERFACE_PROTOCOL_IB);
mlx5_nic_vport_enable_roce(dev1);
-   } else if (tracker.is_bonded && mlx5_lag_is_bonded(ldev)) {
+   } else if (do_bond && mlx5_lag_is_bonded(ldev)) {
mlx5_infer_tx_affinity_mapping(&tracker, &v2p_port1,
   &v2p_port2);
 
@@ -252,7 +254,7 @@ static void mlx5_do_bond(struct mlx5_lag *ldev)
  "Failed to modify LAG (%d)\n",
  err);
}
-   } else if (!tracker.is_bonded && mlx5_lag_is_bonded(ldev)) {
+   } else if (!do_bond && mlx5_lag_is_bonded(ldev)) {
mlx5_remove_dev_by_protocol(dev0, MLX5_INTERFACE_PROTOCOL_IB);
mlx5_nic_vport_disable_roce(dev1);
 
@@ -411,6 +413,15 @@ static int mlx5_lag_netdev_event(struct notifier_block *this,
return NOTIFY_DONE;
 }
 
+static bool mlx5_lag_check_prereq(struct mlx5_lag *ldev)
+{
+   if ((ldev->pf[0].dev && mlx5_sriov_is_enabled(ldev->pf[0].dev)) ||
+   (ldev->pf[1].dev && mlx5_sriov_is_enabled(ldev->pf[1].dev)))
+   return false;
+   else
+   return true;
+}
+
 static struct mlx5_lag *mlx5_lag_dev_alloc(void)
 {
struct mlx5_lag *ldev;
@@ -420,6 +431,7 @@ static struct mlx5_lag *mlx5_lag_dev_alloc(void)
return NULL;
 
INIT_DELAYED_WORK(&ldev->bond_work, mlx5_do_bond_work);
+   ldev->allowed = mlx5_lag_check_prereq(ldev);
 
return ldev;
 }
@@ -444,7 +456,9 @@ static void mlx5_lag_dev_add_pf(struct mlx5_lag *ldev,
ldev->tracker.netdev_state[fn].link_up = 0;
ldev->tracker.netdev_state[fn].tx_enabled = 0;
 
+   ldev->allowed = mlx5_lag_check_prereq(ldev);
dev->priv.lag = ldev;
+
mutex_unlock(&lag_mutex);
 }
 
@@ -464,6 +478,7 @@ static void mlx5_lag_dev_remove_pf(struct mlx5_lag *ldev,
memset(&ldev->pf[i], 0, sizeof(*ldev->pf));
 
dev->priv.lag = NULL;
+   ldev->allowed = mlx5_lag_check_prereq(ldev);
mutex_unlock(&lag_mutex);
 }
 
@@ -542,6 +557,44 @@ bool mlx5_lag_is_active(struct mlx5_core_dev *dev)
 }
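The admin "allowed" flag turns the bonding decision in mlx5_do_bond() into a small state machine. Below is a hedged sketch modelling only what is visible in this (truncated) patch: struct pf, struct lag, lag_check_prereq() and lag_decide() are simplified hypothetical stand-ins for ldev->pf[], mlx5_lag_check_prereq() and the branch logic in mlx5_do_bond(); the sriov.c side of the change is not shown here and is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-ins for the per-PF state (hypothetical, not driver structs). */
struct pf {
	bool present;
	bool sriov_enabled;
};

struct lag {
	struct pf pf[2];
	bool bonded;  /* hardware LAG currently active */
	bool allowed; /* admin state, recomputed when PFs or SRIOV change */
};

/* Mirrors mlx5_lag_check_prereq(): LAG is allowed only if no PF has SRIOV. */
static bool lag_check_prereq(const struct lag *ldev)
{
	assert(ldev != NULL);
	for (size_t i = 0; i < 2; i++)
		if (ldev->pf[i].present && ldev->pf[i].sriov_enabled)
			return false;
	return true;
}

enum lag_action { LAG_NONE, LAG_ACTIVATE, LAG_MODIFY, LAG_DEACTIVATE };

/* Mirrors the branches in mlx5_do_bond(): bonding at the netdev level is
 * necessary but not sufficient; the admin 'allowed' flag must also be set. */
static enum lag_action lag_decide(const struct lag *ldev, bool tracker_is_bonded)
{
	bool do_bond = tracker_is_bonded && ldev->allowed;

	if (do_bond && !ldev->bonded)
		return LAG_ACTIVATE;
	if (do_bond && ldev->bonded)
		return LAG_MODIFY;
	if (!do_bond && ldev->bonded)
		return LAG_DEACTIVATE;
	return LAG_NONE;
}
```

Enabling SRIOV on either PF clears "allowed", so an already active LAG falls into the deactivate branch even though the bond device itself is still up.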
 

[net-next 02/15] net/mlx5: Fix some spelling mistakes

2017-06-15 Thread Saeed Mahameed
From: Or Gerlitz 

Fixed a few places where "endianness" was misspelled, and
one spot where "output" was:

CHECK: 'endianess' may be misspelled - perhaps 'endianness'?
CHECK: 'ouput' may be misspelled - perhaps 'output'?

Signed-off-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/infiniband/hw/mlx5/main.c  |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c  |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 10 +-
 include/linux/mlx5/mlx5_ifc.h  |  4 ++--
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 852a6a75db98..2ab505d1e8e3 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -439,7 +439,7 @@ static void get_atomic_caps(struct mlx5_ib_dev *dev,
u8 atomic_operations = MLX5_CAP_ATOMIC(dev->mdev, atomic_operations);
u8 atomic_size_qp = MLX5_CAP_ATOMIC(dev->mdev, atomic_size_qp);
u8 atomic_req_8B_endianness_mode =
-   MLX5_CAP_ATOMIC(dev->mdev, atomic_req_8B_endianess_mode);
+   MLX5_CAP_ATOMIC(dev->mdev, atomic_req_8B_endianness_mode);
 
/* Check if HW supports 8 bytes standard atomic operations and capable
 * of host endianness respond
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 10d282841f5b..46efaab9da46 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -874,7 +874,7 @@ static const char *deliv_status_to_str(u8 status)
case MLX5_CMD_DELIVERY_STAT_IN_LENGTH_ERR:
return "command input length error";
case MLX5_CMD_DELIVERY_STAT_OUT_LENGTH_ERR:
-   return "command ouput length error";
+   return "command output length error";
case MLX5_CMD_DELIVERY_STAT_RES_FLD_NOT_CLR_ERR:
return "reserved fields not cleared";
case MLX5_CMD_DELIVERY_STAT_CMD_DESCR_ERR:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index dc890944c4ea..3319968ed789 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -356,7 +356,7 @@ static void mlx5_disable_msix(struct mlx5_core_dev *dev)
kfree(priv->msix_arr);
 }
 
-struct mlx5_reg_host_endianess {
+struct mlx5_reg_host_endianness {
u8  he;
u8  rsvd[15];
 };
@@ -475,7 +475,7 @@ static int handle_hca_cap_atomic(struct mlx5_core_dev *dev)
 
req_endianness =
MLX5_CAP_ATOMIC(dev,
-   supported_atomic_req_8B_endianess_mode_1);
+   supported_atomic_req_8B_endianness_mode_1);
 
if (req_endianness != MLX5_ATOMIC_REQ_MODE_HOST_ENDIANNESS)
return 0;
@@ -487,7 +487,7 @@ static int handle_hca_cap_atomic(struct mlx5_core_dev *dev)
set_hca_cap = MLX5_ADDR_OF(set_hca_cap_in, set_ctx, capability);
 
/* Set requestor to host endianness */
-   MLX5_SET(atomic_caps, set_hca_cap, atomic_req_8B_endianess_mode,
+   MLX5_SET(atomic_caps, set_hca_cap, atomic_req_8B_endianness_mode,
 MLX5_ATOMIC_REQ_MODE_HOST_ENDIANNESS);
 
err = set_caps(dev, set_ctx, set_sz, MLX5_SET_HCA_CAP_OP_MOD_ATOMIC);
@@ -562,8 +562,8 @@ static int handle_hca_cap(struct mlx5_core_dev *dev)
 
 static int set_hca_ctrl(struct mlx5_core_dev *dev)
 {
-   struct mlx5_reg_host_endianess he_in;
-   struct mlx5_reg_host_endianess he_out;
+   struct mlx5_reg_host_endianness he_in;
+   struct mlx5_reg_host_endianness he_out;
int err;
 
if (!mlx5_core_is_pf(dev))
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 32b044e953d2..1fd144662491 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -661,9 +661,9 @@ enum {
 struct mlx5_ifc_atomic_caps_bits {
u8 reserved_at_0[0x40];
 
-   u8 atomic_req_8B_endianess_mode[0x2];
+   u8 atomic_req_8B_endianness_mode[0x2];
u8 reserved_at_42[0x4];
-   u8 supported_atomic_req_8B_endianess_mode_1[0x1];
+   u8 supported_atomic_req_8B_endianness_mode_1[0x1];
 
u8 reserved_at_47[0x19];
 
-- 
2.11.0



Re: [PATCH net-next 5/7] qed*: Set rdma generic functions prefix

2017-06-15 Thread kbuild test robot
Hi Michal,

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Yuval-Mintz/qed-RDMA-and-infrastructure-for-iWARP/20170616-043925
config: i386-randconfig-x076-06120530 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All error/warnings (new ones prefixed by >>):


vim +80 include/linux/qed/qede_rdma.h

b4d583a63 include/linux/qed/qede_rdma.h Michal Kalderon 2017-06-15  73  void qede_rdma_dev_event_open(struct qede_dev *dev);
b4d583a63 include/linux/qed/qede_rdma.h Michal Kalderon 2017-06-15  74  void qede_rdma_dev_event_close(struct qede_dev *dev);
b4d583a63 include/linux/qed/qede_rdma.h Michal Kalderon 2017-06-15  75  void qede_rdma_dev_remove(struct qede_dev *dev);
b4d583a63 include/linux/qed/qede_rdma.h Michal Kalderon 2017-06-15  76  void qede_rdma_event_changeaddr(struct qede_dev *edr);
b4d583a63 include/linux/qed/qede_rdma.h Michal Kalderon 2017-06-15  77  
cee9fbd8e include/linux/qed/qede_roce.h Ram Amrani  2016-10-01  78  #else
b4d583a63 include/linux/qed/qede_rdma.h Michal Kalderon 2017-06-15 @79  static inline int qede_rdma_dev_add(struct qede_dev *dev);
cee9fbd8e include/linux/qed/qede_roce.h Ram Amrani  2016-10-01 @80  {
cee9fbd8e include/linux/qed/qede_roce.h Ram Amrani  2016-10-01  81  	return 0;
cee9fbd8e include/linux/qed/qede_roce.h Ram Amrani  2016-10-01  82  }
cee9fbd8e include/linux/qed/qede_roce.h Ram Amrani  2016-10-01  83  

:: The code at line 80 was first introduced by commit
:: cee9fbd8e2e9e713cd8bf227c6492fd8854de74b qede: Add qedr framework

:: TO: Ram Amrani 
:: CC: David S. Miller 

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip
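The line flagged at 80 follows a stub on line 79 that ends in ';', which makes line 79 a bare prototype and leaves the '{' on line 80 with no function to belong to. A minimal sketch of what the #else stub presumably intends, assuming a hypothetical EXAMPLE_RDMA_ENABLED switch in place of the real config symbol (the report itself does not state the fix):

```c
#include <assert.h>
#include <stddef.h>

struct qede_dev; /* opaque here; only pointers are passed around */

/* EXAMPLE_RDMA_ENABLED is a hypothetical stand-in for the real config symbol. */
#ifdef EXAMPLE_RDMA_ENABLED
int qede_rdma_dev_add(struct qede_dev *dev);
#else
/* No-op stub for builds without RDMA support. There must be no ';' after
 * the parameter list: with one, the line is only a declaration and the
 * following '{' starts a stray block at file scope, which does not compile. */
static inline int qede_rdma_dev_add(struct qede_dev *dev)
{
	(void)dev;
	return 0;
}
#endif
```

With the semicolon removed, callers in non-RDMA builds get an inlined "return 0" and need no #ifdef of their own.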


[net-next 14/15] net/mlx5: Expose command polling interface

2017-06-15 Thread Saeed Mahameed
From: Majd Dibbiny 

Add a new interface for command execution that allows the
caller to wait for the command's completion in a busy-wait
loop (polling mode).

This is useful if we want to execute a command in polling mode
while the driver is working in events mode for the rest of
the commands.
This interface will be used in the following patches.

Signed-off-by: Majd Dibbiny 
Signed-off-by: Maor Gottlieb 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 30 ---
 include/linux/mlx5/driver.h   |  3 +++
 2 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index c1d8b0bcde75..4d5bd01f1ebb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -785,6 +785,8 @@ static void cmd_work_handler(struct work_struct *work)
struct mlx5_cmd_layout *lay;
struct semaphore *sem;
unsigned long flags;
+   bool poll_cmd = ent->polling;
+
 
sem = ent->page_queue ? &cmd->pages_sem : &cmd->sem;
down(sem);
@@ -845,7 +847,7 @@ static void cmd_work_handler(struct work_struct *work)
iowrite32be(1 << ent->idx, &dev->iseg->cmd_dbell);
mmiowb();
/* if not in polling don't use ent after this point */
-   if (cmd->mode == CMD_MODE_POLLING) {
+   if (cmd->mode == CMD_MODE_POLLING || poll_cmd) {
poll_timeout(ent);
/* make sure we read the descriptor after ownership is SW */
rmb();
@@ -889,7 +891,7 @@ static int wait_func(struct mlx5_core_dev *dev, struct mlx5_cmd_work_ent *ent)
struct mlx5_cmd *cmd = &dev->cmd;
int err;
 
-   if (cmd->mode == CMD_MODE_POLLING) {
+   if (cmd->mode == CMD_MODE_POLLING || ent->polling) {
wait_for_completion(&ent->done);
} else if (!wait_for_completion_timeout(&ent->done, timeout)) {
ent->ret = -ETIMEDOUT;
@@ -917,7 +919,7 @@ static int mlx5_cmd_invoke(struct mlx5_core_dev *dev, struct mlx5_cmd_msg *in,
   struct mlx5_cmd_msg *out, void *uout, int uout_size,
   mlx5_cmd_cbk_t callback,
   void *context, int page_queue, u8 *status,
-  u8 token)
+  u8 token, bool force_polling)
 {
struct mlx5_cmd *cmd = &dev->cmd;
struct mlx5_cmd_work_ent *ent;
@@ -935,6 +937,7 @@ static int mlx5_cmd_invoke(struct mlx5_core_dev *dev, struct mlx5_cmd_msg *in,
return PTR_ERR(ent);
 
ent->token = token;
+   ent->polling = force_polling;
 
if (!callback)
init_completion(&ent->done);
@@ -1535,7 +1538,8 @@ static int is_manage_pages(void *in)
 }
 
 static int cmd_exec(struct mlx5_core_dev *dev, void *in, int in_size, void *out,
-   int out_size, mlx5_cmd_cbk_t callback, void *context)
+   int out_size, mlx5_cmd_cbk_t callback, void *context,
+   bool force_polling)
 {
struct mlx5_cmd_msg *inb;
struct mlx5_cmd_msg *outb;
@@ -1580,7 +1584,7 @@ static int cmd_exec(struct mlx5_core_dev *dev, void *in, int in_size, void *out,
}
 
err = mlx5_cmd_invoke(dev, inb, outb, out, out_size, callback, context,
- pages_queue, &status, token);
+ pages_queue, &status, token, force_polling);
if (err)
goto out_out;
 
@@ -1608,7 +1612,7 @@ int mlx5_cmd_exec(struct mlx5_core_dev *dev, void *in, int in_size, void *out,
 {
int err;
 
-   err = cmd_exec(dev, in, in_size, out, out_size, NULL, NULL);
+   err = cmd_exec(dev, in, in_size, out, out_size, NULL, NULL, false);
return err ? : mlx5_cmd_check(dev, in, out);
 }
 EXPORT_SYMBOL(mlx5_cmd_exec);
@@ -1617,10 +1621,22 @@ int mlx5_cmd_exec_cb(struct mlx5_core_dev *dev, void *in, int in_size,
 void *out, int out_size, mlx5_cmd_cbk_t callback,
 void *context)
 {
-   return cmd_exec(dev, in, in_size, out, out_size, callback, context);
+   return cmd_exec(dev, in, in_size, out, out_size, callback, context,
+   false);
 }
 EXPORT_SYMBOL(mlx5_cmd_exec_cb);
 
+int mlx5_cmd_exec_polling(struct mlx5_core_dev *dev, void *in, int in_size,
+ void *out, int out_size)
+{
+   int err;
+
+   err = cmd_exec(dev, in, in_size, out, out_size, NULL, NULL, true);
+
+   return err ? : mlx5_cmd_check(dev, in, out);
+}
+EXPORT_SYMBOL(mlx5_cmd_exec_polling);
+
 static void destroy_msg_cache(struct mlx5_core_dev *dev)
 {
struct cmd_msg_cache *ch;
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 6ea2f5734e37..bf15e87da8fa 100644
--- a/include/linux/mlx5/driver.h
+++ 
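The per-command override added to cmd_work_handler() and wait_func() can be modelled in a few lines. A hedged userspace sketch: struct cmd_ent, use_polling(), struct fake_hw and poll_completion() are hypothetical, and the iteration budget is a stand-in for the driver's real timeout rather than a copy of poll_timeout().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum cmd_mode { CMD_MODE_POLLING, CMD_MODE_EVENTS };

/* Per-command entry; 'polling' mirrors the force_polling argument. */
struct cmd_ent {
	bool polling;
};

/* Poll if the whole interface is in polling mode or this command asks for it. */
static bool use_polling(enum cmd_mode mode, const struct cmd_ent *ent)
{
	assert(ent != NULL);
	return mode == CMD_MODE_POLLING || ent->polling;
}

/* Hypothetical device: keeps ownership of the descriptor for a few reads. */
struct fake_hw {
	int reads_until_done;
};

static bool hw_owned(struct fake_hw *hw)
{
	if (hw->reads_until_done > 0) {
		hw->reads_until_done--;
		return true;
	}
	return false;
}

/* Busy-wait until the device releases the descriptor, giving up after
 * max_iters reads. Returns 0 on completion, -1 on timeout. */
static int poll_completion(struct fake_hw *hw, int max_iters)
{
	for (int i = 0; i < max_iters; i++)
		if (!hw_owned(hw))
			return 0;
	return -1;
}
```

The point of the per-entry flag is that one command (for example during shutdown, as the next patch in the series does) can bypass the event path without switching the whole command interface out of events mode.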

Re: [PATCH net-next 4/7] qed*: Rename qede_roce.[ch]

2017-06-15 Thread kbuild test robot
Hi Michal,

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Yuval-Mintz/qed-RDMA-and-infrastructure-for-iWARP/20170616-043925
config: tile-allmodconfig (attached as .config)
compiler: tilegx-linux-gcc (GCC) 4.6.2
reproduce:
wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=tile 

All warnings (new ones prefixed by >>):

   In file included from drivers/net//ethernet/qlogic/qede/qede.h:43:0,
from drivers/net//ethernet/qlogic/qede/qede_ptp.h:38,
from drivers/net//ethernet/qlogic/qede/qede_fp.c:43:
>> include/linux/qed/qede_rdma.h:57:12: warning: 'struct pci_dev' declared inside parameter list [enabled by default]
>> include/linux/qed/qede_rdma.h:57:12: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default]

vim +57 include/linux/qed/qede_rdma.h

cee9fbd8 include/linux/qed/qede_roce.h Ram Amrani 2016-10-01  41  	QEDE_DOWN,
cee9fbd8 include/linux/qed/qede_roce.h Ram Amrani 2016-10-01  42  	QEDE_CHANGE_ADDR,
cee9fbd8 include/linux/qed/qede_roce.h Ram Amrani 2016-10-01  43  	QEDE_CLOSE
cee9fbd8 include/linux/qed/qede_roce.h Ram Amrani 2016-10-01  44  };
cee9fbd8 include/linux/qed/qede_roce.h Ram Amrani 2016-10-01  45  
cee9fbd8 include/linux/qed/qede_roce.h Ram Amrani 2016-10-01  46  struct qede_roce_event_work {
cee9fbd8 include/linux/qed/qede_roce.h Ram Amrani 2016-10-01  47  	struct list_head list;
cee9fbd8 include/linux/qed/qede_roce.h Ram Amrani 2016-10-01  48  	struct work_struct work;
cee9fbd8 include/linux/qed/qede_roce.h Ram Amrani 2016-10-01  49  	void *ptr;
cee9fbd8 include/linux/qed/qede_roce.h Ram Amrani 2016-10-01  50  	enum qede_roce_event event;
cee9fbd8 include/linux/qed/qede_roce.h Ram Amrani 2016-10-01  51  };
cee9fbd8 include/linux/qed/qede_roce.h Ram Amrani 2016-10-01  52  
cee9fbd8 include/linux/qed/qede_roce.h Ram Amrani 2016-10-01  53  struct qedr_driver {
cee9fbd8 include/linux/qed/qede_roce.h Ram Amrani 2016-10-01  54  	unsigned char name[32];
cee9fbd8 include/linux/qed/qede_roce.h Ram Amrani 2016-10-01  55  
cee9fbd8 include/linux/qed/qede_roce.h Ram Amrani 2016-10-01  56  	struct qedr_dev* (*add)(struct qed_dev *, struct pci_dev *,
cee9fbd8 include/linux/qed/qede_roce.h Ram Amrani 2016-10-01 @57  				struct net_device *);
cee9fbd8 include/linux/qed/qede_roce.h Ram Amrani 2016-10-01  58  
cee9fbd8 include/linux/qed/qede_roce.h Ram Amrani 2016-10-01  59  	void (*remove)(struct qedr_dev *);
cee9fbd8 include/linux/qed/qede_roce.h Ram Amrani 2016-10-01  60  	void (*notify)(struct qedr_dev *, enum qede_roce_event);
cee9fbd8 include/linux/qed/qede_roce.h Ram Amrani 2016-10-01  61  };
cee9fbd8 include/linux/qed/qede_roce.h Ram Amrani 2016-10-01  62  
cee9fbd8 include/linux/qed/qede_roce.h Ram Amrani 2016-10-01  63  /* APIs for RoCE driver to register callback handlers,
cee9fbd8 include/linux/qed/qede_roce.h Ram Amrani 2016-10-01  64   * which will be invoked when device is added, removed, ifup, ifdown
cee9fbd8 include/linux/qed/qede_roce.h Ram Amrani 2016-10-01  65   */

:: The code at line 57 was first introduced by commit
:: cee9fbd8e2e9e713cd8bf227c6492fd8854de74b qede: Add qedr framework

:: TO: Ram Amrani 
:: CC: David S. Miller 
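The warning reflects a general C scoping rule rather than anything qed-specific: a struct tag first mentioned inside a prototype's parameter list has prototype scope only, so it names a different type from any struct pci_dev declared later. The usual remedy is a file-scope forward declaration before first use. A minimal sketch, with hypothetical names (demo_rdma_driver, demo_add) standing in for qedr_driver and its add hook; it illustrates the rule and is not necessarily how the qede header was actually fixed:

```c
#include <stddef.h>

/*
 * File-scope forward declarations. Without them, the first mention of
 * "struct pci_dev" would be inside the parameter list below, where the
 * tag has prototype scope only; GCC then warns that it is "declared
 * inside parameter list".
 */
struct pci_dev;
struct net_device;

/* Hypothetical stand-in for an ops table such as qedr_driver. */
struct demo_rdma_driver {
	unsigned char name[32];
	void *(*add)(struct pci_dev *pdev, struct net_device *ndev);
};

/* The complete definitions can come later, or from another header. */
struct pci_dev {
	int devfn;
};

struct net_device {
	int ifindex;
};

static void *demo_add(struct pci_dev *pdev, struct net_device *ndev)
{
	(void)ndev;
	return pdev;
}

/* Compatible types: both declarations name the same file-scope tags. */
static struct demo_rdma_driver demo_drv = {
	.name = "demo",
	.add = demo_add,
};
```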

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH] atm: solos-pci: remove useless variable assignments

2017-06-15 Thread David Miller
From: "Gustavo A. R. Silva" 
Date: Thu, 15 Jun 2017 14:56:21 -0500

> Value assigned to variable _data32_ at lines 1254 and 1257 is
> overwritten at line 1260 before it can be used. This makes
> such variable assignments useless.
> 
> Addresses-Coverity-ID: 1227049
> Signed-off-by: Gustavo A. R. Silva 

Applied, thanks.


Re: [PATCH] skbuff: make skb_put_zero() return void

2017-06-15 Thread Johannes Berg
On Thu, 2017-06-15 at 17:26 -0400, David Miller wrote:
> 
> > *skb_put(skb, 1) = 'x';
> > 
> > Seems pretty unlikely we have that, and in any case the compiler would
> > warn (error?) there if skb_put() becomes void.
> 
> Actually I am pretty sure I've seen a pattern like that somewhere. :-)

Yeah, there are actually a ton of them, and oddly enough my spatch is
failing to catch _one_ of them?? Still refining it :)

johannes


Re: [PATCH] skbuff: make skb_put_zero() return void

2017-06-15 Thread David Miller
From: Johannes Berg 
Date: Thu, 15 Jun 2017 21:28:32 +0200

> On Thu, 2017-06-15 at 12:18 -0400, David Miller wrote:
> 
>> Although a bit disruptive, it might be nice to convert all of the
>> other "char *" related data pointers in skbuff based interfaces.
> 
> I think it'd actually be pretty easy, since there are very few cases
> where you need non-void, e.g.
> 
> *skb_put(skb, 1) = 'x';
> 
> Seems pretty unlikely we have that, and in any case the compiler would
> warn (error?) there if skb_put() becomes void.

Actually I am pretty sure I've seen a pattern like that somewhere. :-)
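The trade-off discussed above can be made concrete, assuming the proposal is for these helpers to return void * instead of unsigned char *. Dereferencing the result of a function that returns void * is a compile error in C, so the *skb_put(skb, 1) = 'x' pattern must be rewritten, while the more common memcpy() pattern keeps compiling unchanged. The buffer type and helpers below are hypothetical stand-ins, not the real skbuff API:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal buffer standing in for an sk_buff. */
struct demo_buf {
	unsigned char data[64];
	size_t len;
};

/*
 * Reserve len bytes at the tail and return a pointer to them. A void *
 * result converts to any object pointer without a cast, but it cannot be
 * dereferenced directly. This sketch simply asserts on overflow.
 */
static void *demo_put(struct demo_buf *b, size_t len)
{
	void *tmp = b->data + b->len;

	assert(len <= sizeof(b->data) - b->len);
	b->len += len;
	return tmp;
}

static void demo_append_byte(struct demo_buf *b, unsigned char c)
{
	/* "*demo_put(b, 1) = c;" would not compile: it dereferences a void *. */
	unsigned char *p = demo_put(b, 1);

	*p = c;
}

static void demo_append(struct demo_buf *b, const void *src, size_t len)
{
	/* The common memcpy() pattern is unaffected by the return type. */
	memcpy(demo_put(b, len), src, len);
}
```

Callers that write a single byte need a typed local pointer (or a cast), which is exactly the kind of site a semantic patch has to find.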



Re: [PATCH net-next] net: dsa: assign default CPU port to all ports

2017-06-15 Thread David Miller
From: Vivien Didelot 
Date: Thu, 15 Jun 2017 15:06:54 -0400

> The current code only assigns the default cpu_dp to all user ports of
> the switch to which the CPU port belongs. The user ports of the other
> switches of the fabric thus don't have a default CPU port.
> 
> This patch fixes this by assigning the cpu_dp of all user ports of all
> switches of the fabric when the tree is fully parsed.
> 
> Fixes: a29342e73911 ("net: dsa: Associate slave network device with CPU port")
> Signed-off-by: Vivien Didelot 

Applied, thanks.
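The fix described above amounts to a second pass over the whole fabric once parsing is complete. A sketch of that pass with hypothetical, heavily simplified structures (the real DSA tree, switch and port types carry much more state):

```c
#include <stddef.h>

#define DEMO_MAX_SWITCHES 4
#define DEMO_MAX_PORTS    8

enum demo_port_type { DEMO_PORT_UNUSED, DEMO_PORT_USER, DEMO_PORT_CPU, DEMO_PORT_DSA };

struct demo_port {
	enum demo_port_type type;
	struct demo_port *cpu_dp;	/* default CPU port; meaningful for user ports */
};

struct demo_switch {
	struct demo_port ports[DEMO_MAX_PORTS];
};

struct demo_tree {
	struct demo_switch *switches[DEMO_MAX_SWITCHES];	/* NULL if absent */
	struct demo_port *cpu_dp;	/* the fabric's CPU port, found while parsing */
};

/*
 * Run after the whole tree is parsed: every user port of every switch
 * gets the default CPU port, not only the ports of the switch that
 * hosts the CPU port.
 */
static void demo_tree_assign_default_cpu(struct demo_tree *dst)
{
	for (size_t i = 0; i < DEMO_MAX_SWITCHES; i++) {
		struct demo_switch *ds = dst->switches[i];

		if (!ds)
			continue;
		for (size_t p = 0; p < DEMO_MAX_PORTS; p++)
			if (ds->ports[p].type == DEMO_PORT_USER)
				ds->ports[p].cpu_dp = dst->cpu_dp;
	}
}
```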


[net-next:master 1119/1146] ERROR: "tcp_rate_check_app_limited" [net/tls/tls.ko] undefined!

2017-06-15 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   3715c47bcda8bb56f7e2be27276282a2d0d48c09
commit: 3c4d7559159bfe1e3b94df3a657b2cda3a34e218 [1119/1146] tls: kernel TLS support
config: x86_64-randconfig-u0-06160346 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
git checkout 3c4d7559159bfe1e3b94df3a657b2cda3a34e218
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   ERROR: "tcp_prot" [net/tls/tls.ko] undefined!
>> ERROR: "tcp_rate_check_app_limited" [net/tls/tls.ko] undefined!
   ERROR: "tcp_register_ulp" [net/tls/tls.ko] undefined!
   ERROR: "tcp_unregister_ulp" [net/tls/tls.ko] undefined!
   ERROR: "do_tcp_sendpages" [net/tls/tls.ko] undefined!

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


.config.gz
Description: application/gzip


Re: [Intel-wired-lan] [PATCH] intel: i40e: virtchnl: fix incorrect variable assignment

2017-06-15 Thread Gustavo A. R. Silva

Hi Jesse,

Quoting Jesse Brandeburg :

> On Wed, 14 Jun 2017 21:38:26 -0500
> "Gustavo A. R. Silva"  wrote:
>
>> Fix incorrect variable assignment.
>> Based on line 1511: aq_ret = I40_ERR_PARAM; the correct variable to be
>> used in this instance is aq_ret instead of ret. Also, variable ret is
>> updated at line 1602 just before return, so assigning a value to this
>> variable in this code block is useless.
>>
>> Addresses-Coverity-ID: 1397693
>> Signed-off-by: Gustavo A. R. Silva 
>
> Thanks for the fix, looks reasonable.
> Acked-by: Jesse Brandeburg 

Absolutely, glad to help.

Regards
--
Gustavo A. R. Silva


[net 6/6] net/mlx5e: Avoid doing a cleanup call if the profile doesn't have it

2017-06-15 Thread Saeed Mahameed
From: Or Gerlitz 

The error flow of mlx5e_create_netdev calls the cleanup callback
of the given profile without checking that it exists; fix that.

Currently the VF reps don't register that callback, so we crash
when entering the error flow. This can be reproduced by the user
pressing Ctrl-C while attempting to change the SR-IOV mode from
legacy to switchdev.
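The fix applies the usual rule for optional hooks in an ops table: test the function pointer before calling it. A minimal user-space sketch of that error path; the profile layout and all demo_* names are hypothetical, not the mlx5e definitions:

```c
#include <errno.h>
#include <stddef.h>

struct demo_priv {
	int cleaned_up;
};

/* Hypothetical profile: init and attach are mandatory, cleanup is optional. */
struct demo_profile {
	void (*init)(struct demo_priv *priv);
	int (*attach)(struct demo_priv *priv);
	void (*cleanup)(struct demo_priv *priv);	/* may be NULL */
};

static int demo_create(const struct demo_profile *profile,
		       struct demo_priv *priv)
{
	int err;

	profile->init(priv);
	err = profile->attach(priv);
	if (err)
		goto err_cleanup;
	return 0;

err_cleanup:
	/* Guard the optional hook instead of calling it unconditionally. */
	if (profile->cleanup)
		profile->cleanup(priv);
	return err;
}

/* Sample callbacks for the usage example; attach always fails here. */
static void demo_init(struct demo_priv *priv) { priv->cleaned_up = 0; }
static int demo_attach_fail(struct demo_priv *priv) { (void)priv; return -EIO; }
static void demo_cleanup(struct demo_priv *priv) { priv->cleaned_up = 1; }

static const struct demo_profile demo_nic_profile = {
	.init = demo_init, .attach = demo_attach_fail, .cleanup = demo_cleanup,
};

/* Like the VF representor profile described above, no cleanup is set. */
static const struct demo_profile demo_rep_profile = {
	.init = demo_init, .attach = demo_attach_fail,
};
```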

Fixes: 26e59d8077a3 ('net/mlx5e: Implement mlx5e interface attach/detach callbacks')
Signed-off-by: Or Gerlitz 
Reported-by: Sabrina Dubroca 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 41cd22a223dc..277f4de30375 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -4241,7 +4241,8 @@ struct net_device *mlx5e_create_netdev(struct mlx5_core_dev *mdev,
return netdev;
 
 err_cleanup_nic:
-   profile->cleanup(priv);
+   if (profile->cleanup)
+   profile->cleanup(priv);
free_netdev(netdev);
 
return NULL;
-- 
2.11.0



[net 3/6] net/mlx5e: Fix min inline value for VF rep SQs

2017-06-15 Thread Saeed Mahameed
From: Chris Mi 

The offending commit only changed the code path for PF/VF, but it
didn't take care of VF representors. As a result, since
params->tx_min_inline_mode for VF representors is kzalloc'ed to 0
(MLX5_INLINE_MODE_NONE), all VF rep SQs were set to that mode.

This happens to work on CX5 by default, but it broke CX4. Fix that by
querying the min inline mode in the VF rep build-up code.
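The failure mode is easy to reproduce in miniature: zeroing allocators such as kzalloc() (or calloc() in user space) leave every field at 0, and when 0 is also the encoding of a meaningful mode, a field nobody sets silently selects that mode. A minimal user-space sketch; the enum, struct and query function are hypothetical, not the mlx5 definitions:

```c
#include <stdlib.h>

enum demo_inline_mode {
	DEMO_INLINE_MODE_NONE = 0,	/* the value a zeroed field silently holds */
	DEMO_INLINE_MODE_L2,
	DEMO_INLINE_MODE_IP,
};

struct demo_params {
	unsigned int tx_max_inline;
	unsigned char tx_min_inline_mode;
};

/* Hypothetical device query: report the minimum inline mode the HW needs. */
static void demo_query_min_inline(int hw_needs_l2_inline, unsigned char *mode)
{
	*mode = hw_needs_l2_inline ? DEMO_INLINE_MODE_L2 : DEMO_INLINE_MODE_NONE;
}

static struct demo_params *demo_build_rep_params(int hw_needs_l2_inline)
{
	/* calloc(), like kzalloc(), returns zeroed memory. */
	struct demo_params *params = calloc(1, sizeof(*params));

	if (!params)
		return NULL;
	params->tx_max_inline = 256;	/* arbitrary example value */
	/* Without this call, tx_min_inline_mode would stay DEMO_INLINE_MODE_NONE. */
	demo_query_min_inline(hw_needs_l2_inline, &params->tx_min_inline_mode);
	return params;
}
```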

Fixes: a6f402e49901 ("net/mlx5e: Tx, no inline copy on ConnectX-5")
Signed-off-by: Chris Mi 
Reviewed-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 79462c0368a0..46984a52a94b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -791,6 +791,8 @@ static void mlx5e_build_rep_params(struct mlx5_core_dev *mdev,
params->tx_max_inline = mlx5e_get_max_inline_cap(mdev);
params->num_tc= 1;
params->lro_wqe_sz= MLX5E_PARAMS_DEFAULT_LRO_WQE_SZ;
+
+   mlx5_query_min_inline(mdev, &params->tx_min_inline_mode);
 }
 
 static void mlx5e_build_rep_netdev(struct net_device *netdev)
-- 
2.11.0



[net 4/6] net/mlx5: Properly check applicability of devlink eswitch commands

2017-06-15 Thread Saeed Mahameed
From: Or Gerlitz 

Currently we don't check that the link type is Eth, and hence we crash
on IB ports when attempting to dereference esw->xxx; fix that.

To avoid repeating this check over and over, put the existing
checks and the one on link type in a single helper.
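The refactoring follows a common kernel pattern: one helper runs every applicability check in a fixed order, and each entry point returns its error unchanged. Putting the link-type test first is what keeps IB ports away from the eswitch dereference. A user-space sketch with hypothetical types and names (demo_dev, demo_eswitch_check); the real helper reads mlx5 capability bits instead:

```c
#include <errno.h>
#include <stddef.h>

enum demo_link_type { DEMO_LINK_IB, DEMO_LINK_ETH };
enum demo_sriov_mode { DEMO_SRIOV_NONE, DEMO_SRIOV_LEGACY, DEMO_SRIOV_OFFLOADS };

struct demo_eswitch {
	enum demo_sriov_mode mode;
};

struct demo_dev {
	enum demo_link_type link_type;
	int vport_group_manager;
	struct demo_eswitch *eswitch;	/* NULL on IB ports here, to model the crash */
};

/* Shared by every eswitch devlink op. */
static int demo_eswitch_check(const struct demo_dev *dev)
{
	/* Checked first, so IB ports never reach the eswitch dereference. */
	if (dev->link_type != DEMO_LINK_ETH)
		return -EOPNOTSUPP;
	if (!dev->vport_group_manager)
		return -EOPNOTSUPP;
	if (dev->eswitch->mode == DEMO_SRIOV_NONE)
		return -EOPNOTSUPP;
	return 0;
}

static int demo_eswitch_mode_get(const struct demo_dev *dev,
				 enum demo_sriov_mode *mode)
{
	int err = demo_eswitch_check(dev);

	if (err)
		return err;
	*mode = dev->eswitch->mode;
	return 0;
}
```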

Fixes: 7768d1971de6 ('net/mlx5: E-Switch, Add control for encapsulation')
Signed-off-by: Or Gerlitz 
Reported-by: Mohamad Badarnah 
Reviewed-by: Roi Dayan 
Signed-off-by: Saeed Mahameed 
---
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 77 +++---
 1 file changed, 40 insertions(+), 37 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index f991f669047e..a53e982a6863 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -906,21 +906,34 @@ static int esw_inline_mode_to_devlink(u8 mlx5_mode, u8 *mode)
return 0;
 }
 
-int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode)
+static int mlx5_devlink_eswitch_check(struct devlink *devlink)
 {
-   struct mlx5_core_dev *dev;
-   u16 cur_mlx5_mode, mlx5_mode = 0;
+   struct mlx5_core_dev *dev = devlink_priv(devlink);
 
-   dev = devlink_priv(devlink);
+   if (MLX5_CAP_GEN(dev, port_type) != MLX5_CAP_PORT_TYPE_ETH)
+   return -EOPNOTSUPP;
 
if (!MLX5_CAP_GEN(dev, vport_group_manager))
return -EOPNOTSUPP;
 
-   cur_mlx5_mode = dev->priv.eswitch->mode;
-
-   if (cur_mlx5_mode == SRIOV_NONE)
+   if (dev->priv.eswitch->mode == SRIOV_NONE)
return -EOPNOTSUPP;
 
+   return 0;
+}
+
+int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode)
+{
+   struct mlx5_core_dev *dev = devlink_priv(devlink);
+   u16 cur_mlx5_mode, mlx5_mode = 0;
+   int err;
+
+   err = mlx5_devlink_eswitch_check(devlink);
+   if (err)
+   return err;
+
+   cur_mlx5_mode = dev->priv.eswitch->mode;
+
if (esw_mode_from_devlink(mode, &mlx5_mode))
return -EINVAL;
 
@@ -937,15 +950,12 @@ int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode)
 
 int mlx5_devlink_eswitch_mode_get(struct devlink *devlink, u16 *mode)
 {
-   struct mlx5_core_dev *dev;
-
-   dev = devlink_priv(devlink);
-
-   if (!MLX5_CAP_GEN(dev, vport_group_manager))
-   return -EOPNOTSUPP;
+   struct mlx5_core_dev *dev = devlink_priv(devlink);
+   int err;
 
-   if (dev->priv.eswitch->mode == SRIOV_NONE)
-   return -EOPNOTSUPP;
+   err = mlx5_devlink_eswitch_check(devlink);
+   if (err)
+   return err;
 
return esw_mode_to_devlink(dev->priv.eswitch->mode, mode);
 }
@@ -954,15 +964,12 @@ int mlx5_devlink_eswitch_inline_mode_set(struct devlink *devlink, u8 mode)
 {
struct mlx5_core_dev *dev = devlink_priv(devlink);
struct mlx5_eswitch *esw = dev->priv.eswitch;
-   int num_vports = esw->enabled_vports;
int err, vport;
u8 mlx5_mode;
 
-   if (!MLX5_CAP_GEN(dev, vport_group_manager))
-   return -EOPNOTSUPP;
-
-   if (esw->mode == SRIOV_NONE)
-   return -EOPNOTSUPP;
+   err = mlx5_devlink_eswitch_check(devlink);
+   if (err)
+   return err;
 
switch (MLX5_CAP_ETH(dev, wqe_inline_mode)) {
case MLX5_CAP_INLINE_MODE_NOT_REQUIRED:
@@ -985,7 +992,7 @@ int mlx5_devlink_eswitch_inline_mode_set(struct devlink *devlink, u8 mode)
if (err)
goto out;
 
-   for (vport = 1; vport < num_vports; vport++) {
+   for (vport = 1; vport < esw->enabled_vports; vport++) {
err = mlx5_modify_nic_vport_min_inline(dev, vport, mlx5_mode);
if (err) {
esw_warn(dev, "Failed to set min inline on vport %d\n",
@@ -1010,12 +1017,11 @@ int mlx5_devlink_eswitch_inline_mode_get(struct devlink *devlink, u8 *mode)
 {
struct mlx5_core_dev *dev = devlink_priv(devlink);
struct mlx5_eswitch *esw = dev->priv.eswitch;
+   int err;
 
-   if (!MLX5_CAP_GEN(dev, vport_group_manager))
-   return -EOPNOTSUPP;
-
-   if (esw->mode == SRIOV_NONE)
-   return -EOPNOTSUPP;
+   err = mlx5_devlink_eswitch_check(devlink);
+   if (err)
+   return err;
 
return esw_inline_mode_to_devlink(esw->offloads.inline_mode, mode);
 }
@@ -1062,11 +1068,9 @@ int mlx5_devlink_eswitch_encap_mode_set(struct devlink *devlink, u8 encap)
struct mlx5_eswitch *esw = dev->priv.eswitch;
int err;
 
-   if (!MLX5_CAP_GEN(dev, vport_group_manager))
-   return -EOPNOTSUPP;
-
-   if (esw->mode == SRIOV_NONE)
-   return -EOPNOTSUPP;
+   err = mlx5_devlink_eswitch_check(devlink);
+   if 

[net 5/6] net/mlx5e: Remove TC header re-write offloading of ip tos

2017-06-15 Thread Saeed Mahameed
From: Or Gerlitz 

Currently the firmware API is partial: it allows offloading only the
DSCP part of the TOS field, and IPv6 support isn't there yet.

As such, remove the option to offload IPv4 DSCP rewrites until the FW
APIs are more comprehensive.
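For context, the IPv4 TOS byte is split into a 6-bit DSCP field (upper bits) and a 2-bit ECN field (lower bits), so a rewrite of the whole tos byte cannot, in general, be expressed by an action that only sets DSCP. The helpers below are a hypothetical illustration of that check, not mlx5 or tc code; the "changed bits" mask is this sketch's own convention:

```c
#include <stdbool.h>
#include <stdint.h>

/* IPv4 TOS byte layout: DSCP in bits 7..2, ECN in bits 1..0. */
#define DEMO_TOS_DSCP_MASK 0xfcu
#define DEMO_TOS_ECN_MASK  0x03u

static uint8_t demo_tos_to_dscp(uint8_t tos)
{
	return (uint8_t)((tos & DEMO_TOS_DSCP_MASK) >> 2);
}

/*
 * changed_bits marks which TOS bits a header rewrite modifies. A
 * DSCP-only firmware action can express the rewrite only if no ECN bit
 * is touched.
 */
static bool demo_tos_rewrite_fits_dscp(uint8_t changed_bits)
{
	return (changed_bits & DEMO_TOS_ECN_MASK) == 0;
}
```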

Fixes: d79b6df6b10a ('net/mlx5e: Add parsing of TC pedit actions to HW format')
Signed-off-by: Or Gerlitz 
Reviewed-by: Paul Blakey 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index ec63158ab643..9df9fc0d26f5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -895,7 +895,6 @@ static struct mlx5_fields fields[] = {
{MLX5_ACTION_IN_FIELD_OUT_SMAC_15_0,  2, offsetof(struct pedit_headers, eth.h_source[4])},
{MLX5_ACTION_IN_FIELD_OUT_ETHERTYPE,  2, offsetof(struct pedit_headers, eth.h_proto)},
 
-   {MLX5_ACTION_IN_FIELD_OUT_IP_DSCP, 1, offsetof(struct pedit_headers, ip4.tos)},
{MLX5_ACTION_IN_FIELD_OUT_IP_TTL,  1, offsetof(struct pedit_headers, ip4.ttl)},
{MLX5_ACTION_IN_FIELD_OUT_SIPV4,   4, offsetof(struct pedit_headers, ip4.saddr)},
{MLX5_ACTION_IN_FIELD_OUT_DIPV4,   4, offsetof(struct pedit_headers, ip4.daddr)},
-- 
2.11.0



[net 2/6] net/mlx5e: Fix timestamping capabilities reporting

2017-06-15 Thread Saeed Mahameed
From: Maor Dickman 

Misuse of the BIT() macro caused wrong flags to be reported for
"Hardware Transmit Timestamp Modes" and "Hardware Receive
Filter Modes".
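The bug is easy to see with concrete values. BIT(nr) already expands to a single bit shifted to position nr, so (BIT(1) << n) sets bit n + 1 and every advertised mode moves up by one. A user-space sketch with local copies of BIT() and of the four enum values involved (HWTSTAMP_TX_OFF and HWTSTAMP_FILTER_NONE are 0, HWTSTAMP_TX_ON and HWTSTAMP_FILTER_ALL are 1); the demo_* functions are hypothetical:

```c
/* Local copies of the kernel macro and enum values, for user space. */
#define BIT(nr) (1UL << (nr))

enum { HWTSTAMP_TX_OFF = 0, HWTSTAMP_TX_ON = 1 };
enum { HWTSTAMP_FILTER_NONE = 0, HWTSTAMP_FILTER_ALL = 1 };

/* Before the fix: BIT(1) is 1UL << 1, so each flag lands one bit too high. */
static unsigned long demo_tx_types_buggy(void)
{
	return (BIT(1) << HWTSTAMP_TX_OFF) | (BIT(1) << HWTSTAMP_TX_ON);
}

/* After the fix: BIT(n) sets exactly bit n. */
static unsigned long demo_tx_types_fixed(void)
{
	return BIT(HWTSTAMP_TX_OFF) | BIT(HWTSTAMP_TX_ON);
}

static unsigned long demo_rx_filters_fixed(void)
{
	return BIT(HWTSTAMP_FILTER_NONE) | BIT(HWTSTAMP_FILTER_ALL);
}
```

With these values the buggy expression yields 0x6 (bits 1 and 2) instead of 0x3 (bits 0 and 1).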

Fixes: ef9814deafd0 ('net/mlx5e: Add HW timestamping (TS) support')
Signed-off-by: Maor Dickman 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 8209affa75c3..16486dff1493 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -1242,11 +1242,11 @@ static int mlx5e_get_ts_info(struct net_device *dev,
 SOF_TIMESTAMPING_RX_HARDWARE |
 SOF_TIMESTAMPING_RAW_HARDWARE;
 
-   info->tx_types = (BIT(1) << HWTSTAMP_TX_OFF) |
-(BIT(1) << HWTSTAMP_TX_ON);
+   info->tx_types = BIT(HWTSTAMP_TX_OFF) |
+BIT(HWTSTAMP_TX_ON);
 
-   info->rx_filters = (BIT(1) << HWTSTAMP_FILTER_NONE) |
-  (BIT(1) << HWTSTAMP_FILTER_ALL);
+   info->rx_filters = BIT(HWTSTAMP_FILTER_NONE) |
+  BIT(HWTSTAMP_FILTER_ALL);
 
return 0;
 }
-- 
2.11.0



[pull request][net 0/6] Mellanox mlx5 fixes 2017-06-14

2017-06-15 Thread Saeed Mahameed
Hi Dave,

This series contains some fixes for the mlx5 core and netdev driver.

Please pull and let me know if there's any problem.

For -stable:
("net/mlx5: Wait for FW readiness before initializing command interface") kernels >= 4.4
("net/mlx5e: Fix timestamping capabilities reporting") kernels >= 4.5
("net/mlx5e: Avoid doing a cleanup call if the profile doesn't have it") kernels >= 4.9
("net/mlx5e: Fix min inline value for VF rep SQs") kernels >= 4.11

The "net/mlx5e: Fix min inline .." (a one-liner patch) doesn't apply cleanly
to 4.11: it hits a contextual conflict, which can easily be resolved by adding
+   mlx5_query_min_inline(mdev, >params.tx_min_inline_mode);
to the end of mlx5e_build_rep_netdev_priv. Note that the 2nd parameter of
mlx5_query_min_inline is slightly different from the original one.

Thanks,
Saeed.

---

The following changes since commit 3b1bbafbfd14474fee61487552c9916ec1b25c58:

  Doc: net: dsa: b53: update location of referenced dsa.txt (2017-06-15 15:02:40 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-fixes-2017-06-14

for you to fetch changes up to 31ac93386d135a6c96de9c8bab406f5ccabf5a4d:

  net/mlx5e: Avoid doing a cleanup call if the profile doesn't have it (2017-06-15 23:27:46 +0300)


mlx5-fixes-2017-06-14


Chris Mi (1):
  net/mlx5e: Fix min inline value for VF rep SQs

Eli Cohen (1):
  net/mlx5: Wait for FW readiness before initializing command interface

Maor Dickman (1):
  net/mlx5e: Fix timestamping capabilities reporting

Or Gerlitz (3):
  net/mlx5: Properly check applicability of devlink eswitch commands
  net/mlx5e: Remove TC header re-write offloading of ip tos
  net/mlx5e: Avoid doing a cleanup call if the profile doesn't have it

 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  8 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  3 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |  2 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c|  1 -
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 77 +++---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 14 +++-
 6 files changed, 60 insertions(+), 45 deletions(-)


[net 1/6] net/mlx5: Wait for FW readiness before initializing command interface

2017-06-15 Thread Saeed Mahameed
From: Eli Cohen 

Before attempting to initialize the command interface we must wait until
the fw_initializing bit is clear.

If we fail to meet this condition, the hardware will drop our
configuration, specifically the descriptors page address. This scenario
can happen when the firmware is still executing an FLR flow that has not
finished yet, so the driver needs to wait for it to complete.
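The fix follows a common poll-with-timeout pattern. The sketch below is a user-space approximation under stated assumptions: the "still initializing" check and the sleep are injected as hypothetical callbacks (in the driver the check reads the fw_initializing bit mentioned above), and elapsed time is counted in poll intervals rather than read from a clock. It is not the driver's wait_fw_init():

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical hooks: "is firmware still initializing?" and "sleep ms". */
typedef bool (*demo_fw_initializing_fn)(void *ctx);
typedef void (*demo_sleep_ms_fn)(unsigned int ms);

/* Poll until firmware leaves the initializing state or max_wait_ms passes. */
static int demo_wait_fw_init(demo_fw_initializing_fn initializing, void *ctx,
			     demo_sleep_ms_fn sleep_ms,
			     unsigned int max_wait_ms, unsigned int wait_ms)
{
	unsigned int waited = 0;

	while (initializing(ctx)) {
		if (waited >= max_wait_ms)
			return -EBUSY;
		sleep_ms(wait_ms);
		waited += wait_ms;
	}
	return 0;
}

/* Fake firmware for the usage example: busy for a fixed number of polls. */
struct demo_fake_fw {
	unsigned int busy_polls;
};

static bool demo_fake_initializing(void *ctx)
{
	struct demo_fake_fw *fw = ctx;

	if (fw->busy_polls == 0)
		return false;
	fw->busy_polls--;
	return true;
}

static void demo_no_sleep(unsigned int ms)
{
	(void)ms;
}
```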

Fixes: e3297246c2c8 ('net/mlx5_core: Wait for FW readiness on startup')
Signed-off-by: Eli Cohen 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 4f577a5abf88..13be264587f1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -175,8 +175,9 @@ static struct mlx5_profile profile[] = {
},
 };
 
-#define FW_INIT_TIMEOUT_MILI   2000
-#define FW_INIT_WAIT_MS        2
+#define FW_INIT_TIMEOUT_MILI       2000
+#define FW_INIT_WAIT_MS            2
+#define FW_PRE_INIT_TIMEOUT_MILI   1
 
 static int wait_fw_init(struct mlx5_core_dev *dev, u32 max_wait_mili)
 {
@@ -1013,6 +1014,15 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
 */
dev->state = MLX5_DEVICE_STATE_UP;
 
+   /* wait for firmware to accept initialization segments configurations
+*/
+   err = wait_fw_init(dev, FW_PRE_INIT_TIMEOUT_MILI);
+   if (err) {
+   dev_err(&dev->pdev->dev, "Firmware over %d MS in pre-initializing state, aborting\n",
+   FW_PRE_INIT_TIMEOUT_MILI);
+   goto out;
+   }
+
err = mlx5_cmd_init(dev);
if (err) {
dev_err(>dev, "Failed initializing command interface, aborting\n");
-- 
2.11.0



Re: [Intel-wired-lan] [PATCH] intel: i40e: virtchnl: fix incorrect variable assignment

2017-06-15 Thread Jesse Brandeburg
On Wed, 14 Jun 2017 21:38:26 -0500
"Gustavo A. R. Silva"  wrote:

> Fix incorrect variable assignment.
> Based on line 1511: aq_ret = I40_ERR_PARAM; the correct variable to be
> used in this instance is aq_ret instead of ret. Also, variable ret is
> updated at line 1602 just before return, so assigning a value to this
> variable in this code block is useless.
> 
> Addresses-Coverity-ID: 1397693
> Signed-off-by: Gustavo A. R. Silva 

Thanks for the fix, looks reasonable.
Acked-by: Jesse Brandeburg 


[PATCH net-next 6/7] qed: Wait for resources before FUNC_CLOSE

2017-06-15 Thread Yuval Mintz
From: Michal Kalderon 

The driver needs to wait for all resources to be returned from FW before
it can send the FUNC_CLOSE ramrod.
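The loop being moved in this patch is a bounded wait: poll until the real_cid_map bitmap is empty, give up after a fixed number of sleeps, and carry on regardless. A user-space sketch with a portable stand-in for the kernel's bitmap_weight() and a hypothetical poll hook that lets pending completions release ids; it is illustrative only, not qed code:

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Portable stand-in for the kernel's bitmap_weight(). */
static unsigned int demo_bitmap_weight(const unsigned long *map, size_t nbits)
{
	unsigned int w = 0;

	for (size_t i = 0; i < nbits; i++)
		if (map[i / DEMO_BITS_PER_LONG] & (1UL << (i % DEMO_BITS_PER_LONG)))
			w++;
	return w;
}

/*
 * Wait for asynchronous completions to release every allocated id.
 * Returns true if the map drained, false if we gave up after max_polls.
 */
static bool demo_wait_ids_released(unsigned long *map, size_t nbits,
				   void (*poll)(unsigned long *map),
				   unsigned int max_polls)
{
	unsigned int count = 0;

	while (demo_bitmap_weight(map, nbits)) {
		if (count++ >= max_polls)
			return false;
		poll(map);	/* stands in for msleep() while completions arrive */
	}
	return true;
}

/* Fake completion for the usage example: releases one id per poll. */
static void demo_release_one(unsigned long *map)
{
	map[0] &= map[0] - 1;	/* clear the lowest set bit of the first word */
}
```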

Signed-off-by: Michal Kalderon 
Signed-off-by: Yuval Mintz 
---
 drivers/net/ethernet/qlogic/qed/qed_roce.c | 35 +-
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_roce.c b/drivers/net/ethernet/qlogic/qed/qed_roce.c
index 8419dcc..7482905 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_roce.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_roce.c
@@ -372,22 +372,7 @@ static void qed_rdma_bmap_free(struct qed_hwfn *p_hwfn,
 
 static void qed_rdma_resc_free(struct qed_hwfn *p_hwfn)
 {
-   struct qed_bmap *rcid_map = &p_hwfn->p_rdma_info->real_cid_map;
struct qed_rdma_info *p_rdma_info = p_hwfn->p_rdma_info;
-   int wait_count = 0;
-
-   /* when destroying a_RoCE QP the control is returned to the user after
-* the synchronous part. The asynchronous part may take a little longer.
-* We delay for a short while if an async destroy QP is still expected.
-* Beyond the added delay we clear the bitmap anyway.
-*/
-   while (bitmap_weight(rcid_map->bitmap, rcid_map->max_count)) {
-   msleep(100);
-   if (wait_count++ > 20) {
-   DP_NOTICE(p_hwfn, "cid bitmap wait timed out\n");
-   break;
-   }
-   }
 
qed_rdma_bmap_free(p_hwfn, &p_hwfn->p_rdma_info->cid_map, 1);
qed_rdma_bmap_free(p_hwfn, &p_hwfn->p_rdma_info->pd_map, 1);
@@ -704,6 +689,25 @@ static int qed_rdma_setup(struct qed_hwfn *p_hwfn,
return qed_rdma_start_fw(p_hwfn, params, p_ptt);
 }
 
+void qed_roce_stop(struct qed_hwfn *p_hwfn)
+{
+   struct qed_bmap *rcid_map = &p_hwfn->p_rdma_info->real_cid_map;
+   int wait_count = 0;
+
+   /* when destroying a_RoCE QP the control is returned to the user after
+* the synchronous part. The asynchronous part may take a little longer.
+* We delay for a short while if an async destroy QP is still expected.
+* Beyond the added delay we clear the bitmap anyway.
+*/
+   while (bitmap_weight(rcid_map->bitmap, rcid_map->max_count)) {
+   msleep(100);
+   if (wait_count++ > 20) {
+   DP_NOTICE(p_hwfn, "cid bitmap wait timed out\n");
+   break;
+   }
+   }
+}
+
 static int qed_rdma_stop(void *rdma_cxt)
 {
struct qed_hwfn *p_hwfn = (struct qed_hwfn *)rdma_cxt;
@@ -733,6 +737,7 @@ static int qed_rdma_stop(void *rdma_cxt)
qed_wr(p_hwfn, p_ptt, PRS_REG_LIGHT_L2_ETHERTYPE_EN,
   (ll2_ethertype_en & 0xFFFE));
 
+   qed_roce_stop(p_hwfn);
qed_ptt_release(p_hwfn, p_ptt);
 
/* Get SPQ entry */
-- 
2.9.4



[PATCH net-next 7/7] qed: SPQ async callback registration

2017-06-15 Thread Yuval Mintz
From: Michal Kalderon 

Whenever firmware indicates that there's an async indication it needs
to handle, there's a switch-case where the right functionality is called
based on function's personality and information.

Before iWARP is added [as yet another client], switch over the SPQ into
a callback-registered mechanism, allowing registration of the relevant
event-processing logic based on the function's personality. This allows
us to tidy the code by removing protocol-specifics from a common file.
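The registration scheme described above amounts to a per-protocol table of function pointers that common code dispatches through, replacing the switch-case. A minimal sketch; the protocol ids, event payload and function names are hypothetical and simplified (the qed callbacks also take echo and fw_return_code arguments, omitted here):

```c
#include <errno.h>
#include <stddef.h>

enum demo_protocol_id { DEMO_PROTO_ETH, DEMO_PROTO_ISCSI, DEMO_PROTO_ROCE, DEMO_PROTO_MAX };

union demo_event_data {
	unsigned int cid;
};

typedef int (*demo_async_cb)(void *hwfn, unsigned char fw_event_code,
			     union demo_event_data *data);

struct demo_spq {
	demo_async_cb async_cb[DEMO_PROTO_MAX];
};

static void demo_spq_register_async_cb(struct demo_spq *spq,
				       enum demo_protocol_id id, demo_async_cb cb)
{
	if (id < DEMO_PROTO_MAX)
		spq->async_cb[id] = cb;
}

static void demo_spq_unregister_async_cb(struct demo_spq *spq,
					 enum demo_protocol_id id)
{
	if (id < DEMO_PROTO_MAX)
		spq->async_cb[id] = NULL;
}

/* Common code dispatches by protocol id without knowing any protocol. */
static int demo_spq_async_comp(struct demo_spq *spq, void *hwfn,
			       enum demo_protocol_id id, unsigned char code,
			       union demo_event_data *data)
{
	if (id >= DEMO_PROTO_MAX || !spq->async_cb[id])
		return -EINVAL;
	return spq->async_cb[id](hwfn, code, data);
}

/* Example client callback, as a RoCE-like personality might register. */
static int demo_roce_async_event(void *hwfn, unsigned char code,
				 union demo_event_data *data)
{
	(void)hwfn;
	return code == 1 ? (int)data->cid : 0;
}
```

Each protocol registers when its function starts and unregisters when it stops, mirroring the qed_spq_register_async_cb()/qed_spq_unregister_async_cb() calls in the patch.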

Signed-off-by: Michal Kalderon 
Signed-off-by: Yuval Mintz 
---
 drivers/net/ethernet/qlogic/qed/qed_iscsi.c | 24 -
 drivers/net/ethernet/qlogic/qed/qed_roce.c  | 16 ++---
 drivers/net/ethernet/qlogic/qed/qed_roce.h  |  6 
 drivers/net/ethernet/qlogic/qed/qed_sp.h| 17 +
 drivers/net/ethernet/qlogic/qed/qed_spq.c   | 54 -
 drivers/net/ethernet/qlogic/qed/qed_sriov.c | 16 +++--
 drivers/net/ethernet/qlogic/qed/qed_sriov.h | 18 --
 7 files changed, 96 insertions(+), 55 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_iscsi.c b/drivers/net/ethernet/qlogic/qed/qed_iscsi.c
index 5a1ed05..813c77c 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_iscsi.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_iscsi.c
@@ -62,6 +62,22 @@
 #include "qed_sriov.h"
 #include "qed_reg_addr.h"
 
+static int
+qed_iscsi_async_event(struct qed_hwfn *p_hwfn,
+ u8 fw_event_code,
+ u16 echo, union event_ring_data *data, u8 fw_return_code)
+{
+   if (p_hwfn->p_iscsi_info->event_cb) {
+   struct qed_iscsi_info *p_iscsi = p_hwfn->p_iscsi_info;
+
+   return p_iscsi->event_cb(p_iscsi->event_context,
+fw_event_code, data);
+   } else {
+   DP_NOTICE(p_hwfn, "iSCSI async completion is not set\n");
+   return -EINVAL;
+   }
+}
+
 struct qed_iscsi_conn {
struct list_head list_entry;
bool free_on_delete;
@@ -265,6 +281,9 @@ qed_sp_iscsi_func_start(struct qed_hwfn *p_hwfn,
p_hwfn->p_iscsi_info->event_context = event_context;
p_hwfn->p_iscsi_info->event_cb = async_event_cb;
 
+   qed_spq_register_async_cb(p_hwfn, PROTOCOLID_ISCSI,
+ qed_iscsi_async_event);
+
return qed_spq_post(p_hwfn, p_ent, NULL);
 }
 
@@ -631,7 +650,10 @@ static int qed_sp_iscsi_func_stop(struct qed_hwfn *p_hwfn,
p_ramrod = &p_ent->ramrod.iscsi_destroy;
p_ramrod->hdr.op_code = ISCSI_RAMROD_CMD_ID_DESTROY_FUNC;
 
-   return qed_spq_post(p_hwfn, p_ent, NULL);
+   rc = qed_spq_post(p_hwfn, p_ent, NULL);
+
+   qed_spq_unregister_async_cb(p_hwfn, PROTOCOLID_ISCSI);
+   return rc;
 }
 
 static void __iomem *qed_iscsi_get_db_addr(struct qed_hwfn *p_hwfn, u32 cid)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_roce.c b/drivers/net/ethernet/qlogic/qed/qed_roce.c
index 7482905..673f80a 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_roce.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_roce.c
@@ -68,12 +68,14 @@
 
 static void qed_roce_free_real_icid(struct qed_hwfn *p_hwfn, u16 icid);
 
-void qed_roce_async_event(struct qed_hwfn *p_hwfn,
- u8 fw_event_code, union rdma_eqe_data *rdma_data)
+static int
+qed_roce_async_event(struct qed_hwfn *p_hwfn,
+u8 fw_event_code,
+u16 echo, union event_ring_data *data, u8 fw_return_code)
 {
if (fw_event_code == ROCE_ASYNC_EVENT_DESTROY_QP_DONE) {
u16 icid =
-   (u16)le32_to_cpu(rdma_data->rdma_destroy_qp_data.cid);
+   (u16)le32_to_cpu(data->rdma_data.rdma_destroy_qp_data.cid);
 
/* icid release in this async event can occur only if the icid
 * was offloaded to the FW. In case it wasn't offloaded this is
@@ -85,8 +87,10 @@ void qed_roce_async_event(struct qed_hwfn *p_hwfn,
 
events->affiliated_event(p_hwfn->p_rdma_info->events.context,
 fw_event_code,
-&rdma_data->async_handle);
+(void *)&data->rdma_data.async_handle);
}
+
+   return 0;
 }
 
 static int qed_rdma_bmap_alloc(struct qed_hwfn *p_hwfn,
@@ -686,6 +690,9 @@ static int qed_rdma_setup(struct qed_hwfn *p_hwfn,
if (rc)
return rc;
 
+   qed_spq_register_async_cb(p_hwfn, PROTOCOLID_ROCE,
+ qed_roce_async_event);
+
return qed_rdma_start_fw(p_hwfn, params, p_ptt);
 }
 
@@ -706,6 +713,7 @@ void qed_roce_stop(struct qed_hwfn *p_hwfn)
break;
}
}
+   qed_spq_unregister_async_cb(p_hwfn, PROTOCOLID_ROCE);
 }
 
 static int qed_rdma_stop(void *rdma_cxt)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_roce.h 

[PATCH net-next 5/7] qed*: Set rdma generic functions prefix

2017-06-15 Thread Yuval Mintz
From: Michal Kalderon 

Rename the functions common to both iWARP and RoCE to have a prefix of
_rdma_ instead of _roce_.

Signed-off-by: Michal Kalderon 
Signed-off-by: Yuval Mintz 
---
 drivers/infiniband/hw/qedr/main.c|   6 +-
 drivers/net/ethernet/qlogic/qede/qede.h  |   4 +-
 drivers/net/ethernet/qlogic/qede/qede_main.c |  12 +--
 drivers/net/ethernet/qlogic/qede/qede_rdma.c | 142 +--
 include/linux/qed/qede_rdma.h|  37 +++
 5 files changed, 101 insertions(+), 100 deletions(-)

diff --git a/drivers/infiniband/hw/qedr/main.c b/drivers/infiniband/hw/qedr/main.c
index 714eb0c..b5851fd 100644
--- a/drivers/infiniband/hw/qedr/main.c
+++ b/drivers/infiniband/hw/qedr/main.c
@@ -902,7 +902,7 @@ static void qedr_mac_address_change(struct qedr_dev *dev)
  * initialization done before RoCE driver notifies
  * event to stack.
  */
-static void qedr_notify(struct qedr_dev *dev, enum qede_roce_event event)
+static void qedr_notify(struct qedr_dev *dev, enum qede_rdma_event event)
 {
switch (event) {
case QEDE_UP:
@@ -931,12 +931,12 @@ static struct qedr_driver qedr_drv = {
 
 static int __init qedr_init_module(void)
 {
-   return qede_roce_register_driver(&qedr_drv);
+   return qede_rdma_register_driver(&qedr_drv);
 }
 
 static void __exit qedr_exit_module(void)
 {
-   qede_roce_unregister_driver(&qedr_drv);
+   qede_rdma_unregister_driver(&qedr_drv);
 }
 
 module_init(qedr_init_module);
diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
index 2d6b30c..4dfb238 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -154,8 +154,8 @@ struct qede_vlan {
 struct qede_rdma_dev {
struct qedr_dev *qedr_dev;
struct list_head entry;
-   struct list_head roce_event_list;
-   struct workqueue_struct *roce_wq;
+   struct list_head rdma_event_list;
+   struct workqueue_struct *rdma_wq;
 };
 
 struct qede_ptp;
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index e9eaa38..06ca13d 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -262,7 +262,7 @@ static int qede_netdev_event(struct notifier_block *this, unsigned long event,
break;
case NETDEV_CHANGEADDR:
edev = netdev_priv(ndev);
-   qede_roce_event_changeaddr(edev);
+   qede_rdma_event_changeaddr(edev);
break;
}
 
@@ -977,7 +977,7 @@ static int __qede_probe(struct pci_dev *pdev, u32 dp_module, u8 dp_level,
 
qede_init_ndev(edev);
 
-   rc = qede_roce_dev_add(edev);
+   rc = qede_rdma_dev_add(edev);
if (rc)
goto err3;
 
@@ -1013,7 +1013,7 @@ static int __qede_probe(struct pci_dev *pdev, u32 dp_module, u8 dp_level,
return 0;
 
 err4:
-   qede_roce_dev_remove(edev);
+   qede_rdma_dev_remove(edev);
 err3:
free_netdev(edev->ndev);
 err2:
@@ -1064,7 +1064,7 @@ static void __qede_remove(struct pci_dev *pdev, enum qede_remove_mode mode)
 
qede_ptp_disable(edev);
 
-   qede_roce_dev_remove(edev);
+   qede_rdma_dev_remove(edev);
 
edev->ops->common->set_power_state(cdev, PCI_D0);
 
@@ -1964,7 +1964,7 @@ static void qede_unload(struct qede_dev *edev, enum qede_unload_mode mode,
 
edev->state = QEDE_STATE_CLOSED;
 
-   qede_roce_dev_event_close(edev);
+   qede_rdma_dev_event_close(edev);
 
/* Close OS Tx */
netif_tx_disable(edev->ndev);
@@ -2069,7 +2069,7 @@ static int qede_load(struct qede_dev *edev, enum qede_load_mode mode,
link_params.link_up = true;
edev->ops->common->set_link(edev->cdev, &link_params);
 
-   qede_roce_dev_event_open(edev);
+   qede_rdma_dev_event_open(edev);
 
edev->state = QEDE_STATE_OPEN;
 
diff --git a/drivers/net/ethernet/qlogic/qede/qede_rdma.c b/drivers/net/ethernet/qlogic/qede/qede_rdma.c
index 9837ee2..50b142f 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_rdma.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_rdma.c
@@ -40,12 +40,12 @@ static struct qedr_driver *qedr_drv;
 static LIST_HEAD(qedr_dev_list);
 static DEFINE_MUTEX(qedr_dev_list_lock);
 
-bool qede_roce_supported(struct qede_dev *dev)
+bool qede_rdma_supported(struct qede_dev *dev)
 {
return dev->dev_info.common.rdma_supported;
 }
 
-static void _qede_roce_dev_add(struct qede_dev *edev)
+static void _qede_rdma_dev_add(struct qede_dev *edev)
 {
if (!qedr_drv)
return;
@@ -54,11 +54,11 @@ static void _qede_roce_dev_add(struct qede_dev *edev)
 edev->ndev);
 }
 
-static int qede_roce_create_wq(struct qede_dev *edev)
+static int qede_rdma_create_wq(struct qede_dev *edev)
 {
-   

[PATCH net-next 4/7] qed*: Rename qede_roce.[ch]

2017-06-15 Thread Yuval Mintz
From: Michal Kalderon 

Once we have iWARP support, the qede portion of the qedr<->qede interface
will serve all the RDMA protocols, so rename the file to reflect its
function.

Signed-off-by: Michal Kalderon 
Signed-off-by: Yuval Mintz 
---
 drivers/infiniband/hw/qedr/main.c | 2 +-
 drivers/infiniband/hw/qedr/qedr.h | 2 +-
 drivers/net/ethernet/qlogic/qede/Makefile | 2 +-
 drivers/net/ethernet/qlogic/qede/qede.h   | 1 +
 drivers/net/ethernet/qlogic/qede/qede_main.c  | 1 -
 drivers/net/ethernet/qlogic/qede/{qede_roce.c => qede_rdma.c} | 2 +-
 include/linux/qed/{qede_roce.h => qede_rdma.h}| 0
 7 files changed, 5 insertions(+), 5 deletions(-)
 rename drivers/net/ethernet/qlogic/qede/{qede_roce.c => qede_rdma.c} (99%)
 rename include/linux/qed/{qede_roce.h => qede_rdma.h} (100%)

diff --git a/drivers/infiniband/hw/qedr/main.c b/drivers/infiniband/hw/qedr/main.c
index 5a32b80..714eb0c 100644
--- a/drivers/infiniband/hw/qedr/main.c
+++ b/drivers/infiniband/hw/qedr/main.c
@@ -37,7 +37,7 @@
 #include 
 #include 
 #include 
-#include 
+
 #include 
 #include 
 #include "qedr.h"
diff --git a/drivers/infiniband/hw/qedr/qedr.h b/drivers/infiniband/hw/qedr/qedr.h
index 80333ec..2376019 100644
--- a/drivers/infiniband/hw/qedr/qedr.h
+++ b/drivers/infiniband/hw/qedr/qedr.h
@@ -37,7 +37,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include "qedr_hsi_rdma.h"
 
diff --git a/drivers/net/ethernet/qlogic/qede/Makefile b/drivers/net/ethernet/qlogic/qede/Makefile
index bc5f7c3..75408fb 100644
--- a/drivers/net/ethernet/qlogic/qede/Makefile
+++ b/drivers/net/ethernet/qlogic/qede/Makefile
@@ -2,4 +2,4 @@ obj-$(CONFIG_QEDE) := qede.o
 
 qede-y := qede_main.o qede_fp.o qede_filter.o qede_ethtool.o qede_ptp.o
 qede-$(CONFIG_DCB) += qede_dcbnl.o
-qede-$(CONFIG_QED_RDMA) += qede_roce.o
+qede-$(CONFIG_QED_RDMA) += qede_rdma.o
diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
index 694c09b..2d6b30c 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #ifdef CONFIG_RFS_ACCEL
 #include 
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 37ad799..e9eaa38 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -60,7 +60,6 @@
 #include 
 #include 
 #include 
-#include 
 #include "qede.h"
 #include "qede_ptp.h"
 
diff --git a/drivers/net/ethernet/qlogic/qede/qede_roce.c b/drivers/net/ethernet/qlogic/qede/qede_rdma.c
similarity index 99%
rename from drivers/net/ethernet/qlogic/qede/qede_roce.c
rename to drivers/net/ethernet/qlogic/qede/qede_rdma.c
index c0030fb..9837ee2 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_roce.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_rdma.c
@@ -33,7 +33,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include "qede.h"
 
 static struct qedr_driver *qedr_drv;
diff --git a/include/linux/qed/qede_roce.h b/include/linux/qed/qede_rdma.h
similarity index 100%
rename from include/linux/qed/qede_roce.h
rename to include/linux/qed/qede_rdma.h
-- 
2.9.4



[PATCH net-next 3/7] qed: Disable RoCE dpm when DCBx change occurs

2017-06-15 Thread Yuval Mintz
If a DCBx update occurs while QPs are open, stop sending EDPMs until all
QPs are closed.
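The rule here (keep EDPM disabled while any QP that predates the DCBx update is still open) reduces to an emptiness test on the CID bitmap. The sketch below is a userspace approximation, assuming a plain `unsigned long` array in place of `struct qed_bmap` and a hand-written scan in place of the kernel's `find_first_bit()`; the locking done by `qed_rdma_allocated_qps()` is omitted, and `dcbx_no_edpm()` is a hypothetical helper name.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BMAP_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Userspace stand-in for the kernel's find_first_bit(): index of the first
 * set bit, or nbits if none is set. */
static size_t first_set_bit(const unsigned long *map, size_t nbits)
{
	for (size_t i = 0; i < nbits; i++)
		if (map[i / BMAP_BITS_PER_LONG] & (1UL << (i % BMAP_BITS_PER_LONG)))
			return i;
	return nbits;
}

/* Same test as qed_bmap_is_empty(): empty iff the scan reaches max_count. */
static bool bmap_is_empty(const unsigned long *map, size_t max_count)
{
	return first_set_bit(map, max_count) == max_count;
}

/* Hypothetical helper: EDPM stays disabled if any CID (QP) is allocated at
 * the time of the DCBx update. A NULL map means RDMA state was never set up,
 * so there are no QPs. Locking is omitted in this sketch. */
static bool dcbx_no_edpm(const unsigned long *cid_map, size_t max_count)
{
	if (!cid_map)
		return false;
	return !bmap_is_empty(cid_map, max_count);
}
```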

Signed-off-by: Yuval Mintz 
---
 drivers/net/ethernet/qlogic/qed/qed_dcbx.c |  8 +++
 drivers/net/ethernet/qlogic/qed/qed_roce.c | 36 ++
 drivers/net/ethernet/qlogic/qed/qed_roce.h |  5 +
 3 files changed, 49 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_dcbx.c b/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
index 15b516a..f888045 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
@@ -44,6 +44,7 @@
 #include "qed_hsi.h"
 #include "qed_sp.h"
 #include "qed_sriov.h"
+#include "qed_roce.h"
 #ifdef CONFIG_DCB
 #include 
 #endif
@@ -892,6 +893,13 @@ qed_dcbx_mib_update_event(struct qed_hwfn *p_hwfn,
 
/* update storm FW with negotiation results */
qed_sp_pf_update(p_hwfn);
+
+   /* for roce PFs, we may want to enable/disable DPM
+* when DCBx change occurs
+*/
+   if (p_hwfn->hw_info.personality ==
+   QED_PCI_ETH_ROCE)
+   qed_roce_dpm_dcbx(p_hwfn, p_ptt);
}
}
 
diff --git a/drivers/net/ethernet/qlogic/qed/qed_roce.c b/drivers/net/ethernet/qlogic/qed/qed_roce.c
index 4bc2f6c..8419dcc 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_roce.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_roce.c
@@ -162,6 +162,11 @@ static int qed_bmap_test_id(struct qed_hwfn *p_hwfn,
return test_bit(id_num, bmap->bitmap);
 }
 
+static bool qed_bmap_is_empty(struct qed_bmap *bmap)
+{
+   return bmap->max_count == find_first_bit(bmap->bitmap, bmap->max_count);
+}
+
 static u32 qed_rdma_get_sb_id(void *p_hwfn, u32 rel_sb_id)
 {
/* First sb id for RoCE is after all the l2 sb */
@@ -2638,6 +2643,23 @@ static void *qed_rdma_get_rdma_ctx(struct qed_dev *cdev)
return QED_LEADING_HWFN(cdev);
 }
 
+static bool qed_rdma_allocated_qps(struct qed_hwfn *p_hwfn)
+{
+   bool result;
+
+   /* if rdma info has not been allocated, naturally there are no qps */
+   if (!p_hwfn->p_rdma_info)
+   return false;
+
+   spin_lock_bh(&p_hwfn->p_rdma_info->lock);
+   if (!p_hwfn->p_rdma_info->cid_map.bitmap)
+   result = false;
+   else
+   result = !qed_bmap_is_empty(&p_hwfn->p_rdma_info->cid_map);
+   spin_unlock_bh(&p_hwfn->p_rdma_info->lock);
+   return result;
+}
+
 static void qed_rdma_dpm_conf(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt)
 {
u32 val;
@@ -2650,6 +2672,20 @@ static void qed_rdma_dpm_conf(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt)
   val, p_hwfn->dcbx_no_edpm, p_hwfn->db_bar_no_edpm);
 }
 
+void qed_roce_dpm_dcbx(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt)
+{
+   u8 val;
+
+   /* if any QPs are already active, we want to disable DPM, since their
+* context information contains information from before the latest DCBx
+* update. Otherwise enable it.
+*/
+   val = qed_rdma_allocated_qps(p_hwfn) ? true : false;
+   p_hwfn->dcbx_no_edpm = (u8)val;
+
+   qed_rdma_dpm_conf(p_hwfn, p_ptt);
+}
+
 void qed_rdma_dpm_bar(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt)
 {
p_hwfn->db_bar_no_edpm = true;
diff --git a/drivers/net/ethernet/qlogic/qed/qed_roce.h b/drivers/net/ethernet/qlogic/qed/qed_roce.h
index 94be3b5..ddd7761 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_roce.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_roce.h
@@ -168,10 +168,15 @@ struct qed_rdma_qp {
 
 #if IS_ENABLED(CONFIG_QED_RDMA)
 void qed_rdma_dpm_bar(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt);
+void qed_roce_dpm_dcbx(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt);
 void qed_roce_async_event(struct qed_hwfn *p_hwfn,
  u8 fw_event_code, union rdma_eqe_data *rdma_data);
 #else
static inline void qed_rdma_dpm_bar(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt) {}
+
+static inline void qed_roce_dpm_dcbx(struct qed_hwfn *p_hwfn,
+struct qed_ptt *p_ptt) {}
+
 static inline void qed_roce_async_event(struct qed_hwfn *p_hwfn,
u8 fw_event_code,
union rdma_eqe_data *rdma_data) {}
-- 
2.9.4



[PATCH net-next 1/7] qed: Chain support for external PBL

2017-06-15 Thread Yuval Mintz
iWARP would require the chains to allocate/free their PBL memory
independently, so add the infrastructure to provide it externally.
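A minimal model of the ownership rule this patch introduces: when the caller supplies the PBL, the chain borrows it and must not free it on teardown. Everything here is hypothetical and simplified: `malloc()`/`free()` stand in for `dma_alloc_coherent()`/`dma_free_coherent()`, and `struct ext_pbl`/`struct chain` are illustrative types, not the real `struct qed_chain_ext_pbl`/`struct qed_chain` layouts.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical caller-provided PBL descriptor. */
struct ext_pbl {
	void *virt_table;   /* memory owned by the caller */
};

/* Hypothetical, heavily reduced chain. */
struct chain {
	void *pbl_table;
	bool external_pbl;  /* plays the role of the patch's b_external_pbl */
};

static int chain_alloc_pbl(struct chain *c, size_t size, struct ext_pbl *ext)
{
	if (ext) {
		c->pbl_table = ext->virt_table;  /* borrow, do not allocate */
		c->external_pbl = true;
	} else {
		c->pbl_table = malloc(size);     /* stands in for dma_alloc_coherent() */
		if (!c->pbl_table)
			return -1;
		c->external_pbl = false;
	}
	return 0;
}

static void chain_free_pbl(struct chain *c)
{
	if (!c->external_pbl)
		free(c->pbl_table);  /* only free what the chain allocated itself */
	c->pbl_table = NULL;         /* avoid a dangling pointer, as the patch does */
}
```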

Signed-off-by: Yuval Mintz 
---
 drivers/infiniband/hw/qedr/main.c |  2 +-
 drivers/infiniband/hw/qedr/verbs.c|  6 ++---
 drivers/net/ethernet/qlogic/qed/qed_dev.c | 35 ---
 drivers/net/ethernet/qlogic/qed/qed_dev_api.h |  5 +++-
 drivers/net/ethernet/qlogic/qed/qed_iscsi.c   |  6 ++---
 drivers/net/ethernet/qlogic/qed/qed_ll2.c |  6 ++---
 drivers/net/ethernet/qlogic/qed/qed_spq.c |  6 ++---
 drivers/net/ethernet/qlogic/qede/qede_main.c  |  8 +++---
 include/linux/qed/qed_chain.h |  7 ++
 include/linux/qed/qed_if.h|  3 ++-
 10 files changed, 56 insertions(+), 28 deletions(-)

diff --git a/drivers/infiniband/hw/qedr/main.c b/drivers/infiniband/hw/qedr/main.c
index 485c1fe..5a32b80 100644
--- a/drivers/infiniband/hw/qedr/main.c
+++ b/drivers/infiniband/hw/qedr/main.c
@@ -276,7 +276,7 @@ static int qedr_alloc_resources(struct qedr_dev *dev)
   QED_CHAIN_CNT_TYPE_U16,
   n_entries,
   sizeof(struct regpair *),
-  >pbl);
+  >pbl, NULL);
if (rc)
goto err4;
 
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index 17685cf..80df89b 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -925,7 +925,7 @@ struct ib_cq *qedr_create_cq(struct ib_device *ibdev,
   QED_CHAIN_CNT_TYPE_U32,
   chain_entries,
   sizeof(union rdma_cqe),
-  >pbl);
+  >pbl, NULL);
if (rc)
goto err1;
 
@@ -1413,7 +1413,7 @@ qedr_roce_create_kernel_qp(struct qedr_dev *dev,
   QED_CHAIN_CNT_TYPE_U32,
   n_sq_elems,
   QEDR_SQE_ELEMENT_SIZE,
-  >sq.pbl);
+  >sq.pbl, NULL);
 
if (rc)
return rc;
@@ -1427,7 +1427,7 @@ qedr_roce_create_kernel_qp(struct qedr_dev *dev,
   QED_CHAIN_CNT_TYPE_U32,
   n_rq_elems,
   QEDR_RQE_ELEMENT_SIZE,
-  >rq.pbl);
+  >rq.pbl, NULL);
if (rc)
return rc;
 
diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c
index 65fe494..8b14054 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dev.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c
@@ -3075,12 +3075,15 @@ static void qed_chain_free_pbl(struct qed_dev *cdev, struct qed_chain *p_chain)
}
 
pbl_size = page_cnt * QED_CHAIN_PBL_ENTRY_SIZE;
-   dma_free_coherent(&cdev->pdev->dev,
- pbl_size,
- p_chain->pbl_sp.p_virt_table,
- p_chain->pbl_sp.p_phys_table);
+
+   if (!p_chain->b_external_pbl)
+   dma_free_coherent(&cdev->pdev->dev,
+ pbl_size,
+ p_chain->pbl_sp.p_virt_table,
+ p_chain->pbl_sp.p_phys_table);
 out:
vfree(p_chain->pbl.pp_virt_addr_tbl);
+   p_chain->pbl.pp_virt_addr_tbl = NULL;
 }
 
 void qed_chain_free(struct qed_dev *cdev, struct qed_chain *p_chain)
@@ -3174,7 +3177,10 @@ qed_chain_alloc_single(struct qed_dev *cdev, struct qed_chain *p_chain)
return 0;
 }
 
-static int qed_chain_alloc_pbl(struct qed_dev *cdev, struct qed_chain *p_chain)
+static int
+qed_chain_alloc_pbl(struct qed_dev *cdev,
+   struct qed_chain *p_chain,
+   struct qed_chain_ext_pbl *ext_pbl)
 {
u32 page_cnt = p_chain->page_cnt, size, i;
dma_addr_t p_phys = 0, p_pbl_phys = 0;
@@ -3194,8 +3200,16 @@ static int qed_chain_alloc_pbl(struct qed_dev *cdev, struct qed_chain *p_chain)
 * should be saved to allow its freeing during the error flow.
 */
size = page_cnt * QED_CHAIN_PBL_ENTRY_SIZE;
-   p_pbl_virt = dma_alloc_coherent(&cdev->pdev->dev,
-   size, &p_pbl_phys, GFP_KERNEL);
+
+   if (!ext_pbl) {
+   p_pbl_virt = dma_alloc_coherent(&cdev->pdev->dev,
+   size, &p_pbl_phys, 

[PATCH net-next 2/7] qed: RoCE EDPM to honor PFC

2017-06-15 Thread Yuval Mintz
Configure the device according to the DCBx results so that EDPMs
made by RoCE honor flow control.
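As a worked example of the value written to `NIG_REG_TX_EDPM_CTRL`: per the defines added to `qed_reg_addr.h`, bit 0 enables EDPM and the 8-bit per-TC mask starts at bit 1. If RoCE and RoCE v2 were both negotiated onto TC 3, the value is `((1 << 3) << 1) | 1 = 0x11`. The helper `edpm_ctrl_val()` is a hypothetical standalone restatement of the computation in `qed_dcbx_mib_update_event()`, not driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout copied from the qed_reg_addr.h hunk in this patch. */
#define NIG_REG_TX_EDPM_CTRL_TX_EDPM_EN          (0x1 << 0)
#define NIG_REG_TX_EDPM_CTRL_TX_EDPM_TC_EN_SHIFT 1

/* Bit 0 enables EDPM; bits 1..8 hold a per-TC mask with one bit for the TC
 * negotiated for RoCE and one for RoCE v2 (the same bit if they share a TC). */
static uint16_t edpm_ctrl_val(unsigned int roce_tc, unsigned int roce_v2_tc)
{
	uint16_t val;

	assert(roce_tc < 8 && roce_v2_tc < 8);  /* TC mask is 0xff << 1 */
	val = (uint16_t)((1u << roce_tc) | (1u << roce_v2_tc));
	val <<= NIG_REG_TX_EDPM_CTRL_TX_EDPM_TC_EN_SHIFT;
	val |= NIG_REG_TX_EDPM_CTRL_TX_EDPM_EN;
	return val;
}
```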

Signed-off-by: Yuval Mintz 
---
 drivers/net/ethernet/qlogic/qed/qed_dcbx.c | 16 
 drivers/net/ethernet/qlogic/qed/qed_reg_addr.h |  6 ++
 2 files changed, 22 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_dcbx.c b/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
index e2a62c0..15b516a 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
@@ -896,6 +896,22 @@ qed_dcbx_mib_update_event(struct qed_hwfn *p_hwfn,
}
 
qed_dcbx_get_params(p_hwfn, &p_hwfn->p_dcbx_info->get, type);
+
+   if (type == QED_DCBX_OPERATIONAL_MIB) {
+   struct qed_dcbx_results *p_data;
+   u16 val;
+
+   /* Configure in NIG which protocols support EDPM and should
+* honor PFC.
+*/
+   p_data = &p_hwfn->p_dcbx_info->results;
+   val = (0x1 << p_data->arr[DCBX_PROTOCOL_ROCE].tc) |
+ (0x1 << p_data->arr[DCBX_PROTOCOL_ROCE_V2].tc);
+   val <<= NIG_REG_TX_EDPM_CTRL_TX_EDPM_TC_EN_SHIFT;
+   val |= NIG_REG_TX_EDPM_CTRL_TX_EDPM_EN;
+   qed_wr(p_hwfn, p_ptt, NIG_REG_TX_EDPM_CTRL, val);
+   }
+
qed_dcbx_aen(p_hwfn, type);
 
return rc;
diff --git a/drivers/net/ethernet/qlogic/qed/qed_reg_addr.h b/drivers/net/ethernet/qlogic/qed/qed_reg_addr.h
index 7e4639c..0cdb433 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_reg_addr.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_reg_addr.h
@@ -1564,6 +1564,12 @@
 #define NIG_REG_TSGEN_FREECNT_UPDATE_K2 0x509008UL
 #define CNIG_REG_NIG_PORT0_CONF_K2 0x218200UL
 
+#define NIG_REG_TX_EDPM_CTRL 0x501f0cUL
+#define NIG_REG_TX_EDPM_CTRL_TX_EDPM_EN (0x1 << 0)
+#define NIG_REG_TX_EDPM_CTRL_TX_EDPM_EN_SHIFT 0
+#define NIG_REG_TX_EDPM_CTRL_TX_EDPM_TC_EN (0xff << 1)
+#define NIG_REG_TX_EDPM_CTRL_TX_EDPM_TC_EN_SHIFT 1
+
 #define PRS_REG_SEARCH_GFT 0x1f11bcUL
 #define PRS_REG_CM_HDR_GFT 0x1f11c8UL
 #define PRS_REG_GFT_CAM 0x1f1100UL
-- 
2.9.4



[PATCH net-next 0/7] qed*: RDMA and infrastructure for iWARP

2017-06-15 Thread Yuval Mintz
This series focuses on RDMA in general with emphasis on required changes
toward adding iWARP support. The vast majority of the changes introduced
are in qed/qede, with a couple of small changes to qedr
[mentioned below].

The infrastructure changes:
 - Patch #1 adds the ability to pass PBL memory externally for a newly
created chain.
 - Patches #4, #5 rename qede_roce.[ch] to qede_rdma.[ch] and change
prefixes from _roce_ to _rdma_, as the API between qede and qedr is
agnostic to the variant of the RDMA protocol used. These patches also
touch qedr [basically to align it with the renaming, nothing more].
 - Patch #7 converts the current SPQ async mechanism to serve
registered callbacks [before adding iWARP, which would add another client
in need of this sort of functionality].
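The callback-registration idea behind Patch #7 can be sketched as a per-protocol table of function pointers, so a new client such as iWARP only registers a handler instead of extending a hard-coded dispatch. This is a hypothetical userspace sketch: the protocol IDs, the `async_cb` signature, `roce_event_cb()` and all other names are assumptions for illustration, not the qed SPQ API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical protocol IDs and callback signature. */
enum proto_id { PROTO_ROCE, PROTO_ISCSI, PROTO_IWARP, PROTO_MAX };

typedef int (*async_cb)(uint8_t fw_event_code, void *data);

static async_cb async_cbs[PROTO_MAX];

static int register_async_cb(enum proto_id id, async_cb cb)
{
	if (id >= PROTO_MAX || async_cbs[id])
		return -1;  /* unknown protocol or slot already taken */
	async_cbs[id] = cb;
	return 0;
}

static void unregister_async_cb(enum proto_id id)
{
	if (id < PROTO_MAX)
		async_cbs[id] = NULL;
}

/* Route an async event to whichever client registered for its protocol. */
static int dispatch_async(enum proto_id id, uint8_t fw_event_code, void *data)
{
	if (id >= PROTO_MAX || !async_cbs[id])
		return -1;
	return async_cbs[id](fw_event_code, data);
}

/* Example client: records the last event code it was handed. */
static int roce_event_cb(uint8_t fw_event_code, void *data)
{
	int *seen = data;

	*seen = fw_event_code;
	return 0;
}
```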

The non-infrastructure changes:
 - Patches #2, #3 contain DCB-related changes to better align RDMA with
configured DCB.
 - Patch #6 contains a minor [mostly theoretical] fix to the release flow.

Dave,

Please consider applying this series to `net-next'.

Thanks,
Yuval

Michal Kalderon (4):
  qed*: Rename qede_roce.[ch]
  qed*: Set rdma generic functions prefix
  qed: Wait for resources before FUNC_CLOSE
  qed: SPQ async callback registration

Yuval Mintz (3):
  qed: Chain support for external PBL
  qed: RoCE EDPM to honor PFC
  qed: Disable RoCE dpm when DCBx change occurs

 drivers/infiniband/hw/qedr/main.c  |  10 +-
 drivers/infiniband/hw/qedr/qedr.h  |   2 +-
 drivers/infiniband/hw/qedr/verbs.c |   6 +-
 drivers/net/ethernet/qlogic/qed/qed_dcbx.c |  24 
 drivers/net/ethernet/qlogic/qed/qed_dev.c  |  35 +++--
 drivers/net/ethernet/qlogic/qed/qed_dev_api.h  |   5 +-
 drivers/net/ethernet/qlogic/qed/qed_iscsi.c|  30 -
 drivers/net/ethernet/qlogic/qed/qed_ll2.c  |   6 +-
 drivers/net/ethernet/qlogic/qed/qed_reg_addr.h |   6 +
 drivers/net/ethernet/qlogic/qed/qed_roce.c |  87 ++---
 drivers/net/ethernet/qlogic/qed/qed_roce.h |   9 +-
 drivers/net/ethernet/qlogic/qed/qed_sp.h   |  17 +++
 drivers/net/ethernet/qlogic/qed/qed_spq.c  |  60 +
 drivers/net/ethernet/qlogic/qed/qed_sriov.c|  16 ++-
 drivers/net/ethernet/qlogic/qed/qed_sriov.h|  18 ---
 drivers/net/ethernet/qlogic/qede/Makefile  |   2 +-
 drivers/net/ethernet/qlogic/qede/qede.h|   5 +-
 drivers/net/ethernet/qlogic/qede/qede_main.c   |  21 ++-
 .../qlogic/qede/{qede_roce.c => qede_rdma.c}   | 144 ++---
 include/linux/qed/qed_chain.h  |   7 +
 include/linux/qed/qed_if.h |   3 +-
 include/linux/qed/{qede_roce.h => qede_rdma.h} |  37 +++---
 22 files changed, 348 insertions(+), 202 deletions(-)
 rename drivers/net/ethernet/qlogic/qede/{qede_roce.c => qede_rdma.c} (59%)
 rename include/linux/qed/{qede_roce.h => qede_rdma.h} (67%)

-- 
2.9.4



[PATCH net-next] net: dsa: add cross-chip multicast support

2017-06-15 Thread Vivien Didelot
Similarly to how cross-chip VLAN works, define a bitmap of multicast
group members for a switch, now including its DSA ports, so that
multicast traffic can be sent to all switches of the fabric.

A switch may drop the frames if no user port is a member.

This brings support for multicast in a multi-chip environment.
As of now, all switches of the fabric must support the multicast
operations in order to program a single fabric port.
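The membership mask built in `dsa_switch_mdb_add()` can be modelled with a 32-bit mask per switch. In this simplified sketch (the `struct sw` type and `mdb_group()` are hypothetical, and at most 32 ports are assumed), the joined port is set only on the switch that owns it, while CPU and DSA ports are set on every switch so that frames can traverse the fabric.

```c
#include <stdint.h>

/* Simplified model of one switch: each port is a user, CPU or DSA port. */
enum port_type { PORT_USER, PORT_CPU, PORT_DSA };

struct sw {
	int index;
	int num_ports;                  /* at most 32 in this sketch */
	const enum port_type *types;
};

/* Target port only on its own switch, plus every CPU and DSA port on every
 * switch, mirroring the bitmap built in dsa_switch_mdb_add(). */
static uint32_t mdb_group(const struct sw *ds, int target_sw, int target_port)
{
	uint32_t group = 0;

	if (ds->index == target_sw)
		group |= 1u << target_port;
	for (int port = 0; port < ds->num_ports; port++)
		if (ds->types[port] == PORT_CPU || ds->types[port] == PORT_DSA)
			group |= 1u << port;
	return group;
}
```

For example, if switch 1 has its DSA link on port 3 and the group is joined on its port 1, it programs mask 0xA (ports 1 and 3); a neighbouring switch 0 with CPU port 2 and DSA port 3 programs 0xC.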

Reported-by: Jason Cobham 
Signed-off-by: Vivien Didelot 
---
 net/dsa/switch.c | 30 --
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/net/dsa/switch.c b/net/dsa/switch.c
index f1029a8d0e20..97e2e9c8cf3f 100644
--- a/net/dsa/switch.c
+++ b/net/dsa/switch.c
@@ -122,19 +122,30 @@ static int dsa_switch_mdb_add(struct dsa_switch *ds,
 {
const struct switchdev_obj_port_mdb *mdb = info->mdb;
struct switchdev_trans *trans = info->trans;
+   DECLARE_BITMAP(group, ds->num_ports);
+   int port, err;
 
-   /* Do not care yet about other switch chips of the fabric */
-   if (ds->index != info->sw_index)
-   return 0;
+   /* Build a mask of Multicast group members */
+   bitmap_zero(group, ds->num_ports);
+   if (ds->index == info->sw_index)
+   set_bit(info->port, group);
+   for (port = 0; port < ds->num_ports; port++)
+   if (dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port))
+   set_bit(port, group);
 
if (switchdev_trans_ph_prepare(trans)) {
if (!ds->ops->port_mdb_prepare || !ds->ops->port_mdb_add)
return -EOPNOTSUPP;
 
-   return ds->ops->port_mdb_prepare(ds, info->port, mdb, trans);
+   for_each_set_bit(port, group, ds->num_ports) {
+   err = ds->ops->port_mdb_prepare(ds, port, mdb, trans);
+   if (err)
+   return err;
+   }
}
 
-   ds->ops->port_mdb_add(ds, info->port, mdb, trans);
+   for_each_set_bit(port, group, ds->num_ports)
+   ds->ops->port_mdb_add(ds, port, mdb, trans);
 
return 0;
 }
@@ -144,14 +155,13 @@ static int dsa_switch_mdb_del(struct dsa_switch *ds,
 {
const struct switchdev_obj_port_mdb *mdb = info->mdb;
 
-   /* Do not care yet about other switch chips of the fabric */
-   if (ds->index != info->sw_index)
-   return 0;
-
if (!ds->ops->port_mdb_del)
return -EOPNOTSUPP;
 
-   return ds->ops->port_mdb_del(ds, info->port, mdb);
+   if (ds->index == info->sw_index)
+   return ds->ops->port_mdb_del(ds, info->port, mdb);
+
+   return 0;
 }
 
 static int dsa_switch_vlan_add(struct dsa_switch *ds,
-- 
2.13.1



[RFC PATCH net-next v2 02/15] bpf: program to load socketops BPF programs

2017-06-15 Thread Lawrence Brakmo
The program tcp_bpf can be used to remove the current global sockops
program and to load/replace sockops BPF programs. There is also an option
to print the BPF trace buffer (for debugging purposes).

USAGE:
  ./tcp_bpf [-r] [-l] []
WHERE:
  -r  remove the currently loaded socketops BPF program
  not needed if loading a new program
  -l  print BPF trace buffer. Used when loading a new program
   name of BPF sockops program to load
  if  does not end in ".o", then "_kern.o" is appended
  example: using tcp_rto will load tcp_rto_kern.o
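The file-name rule from the usage text can be expressed as a small helper. This is a hedged sketch and `bpf_obj_name()` is a hypothetical name. Unlike the sample's `strcmp(argv[k] + strlen(argv[k]) - 2, ".o")`, it checks the length first, so a one-character name cannot index before the start of the string, and it bounds the write with `snprintf()` rather than relying on a separate length check.

```c
#include <stdio.h>
#include <string.h>

/* Names ending in ".o" are used as-is; otherwise "_kern.o" is appended.
 * Returns 0 on success, -1 if the result does not fit in buf. */
static int bpf_obj_name(char *buf, size_t len, const char *name)
{
	size_t n = strlen(name);
	int w;

	if (n >= 2 && strcmp(name + n - 2, ".o") == 0)
		w = snprintf(buf, len, "%s", name);
	else
		w = snprintf(buf, len, "%s_kern.o", name);
	return (w < 0 || (size_t)w >= len) ? -1 : 0;
}
```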

Signed-off-by: Lawrence Brakmo 
---
 samples/bpf/Makefile  |  3 ++
 samples/bpf/tcp_bpf.c | 81 +++
 2 files changed, 84 insertions(+)
 create mode 100644 samples/bpf/tcp_bpf.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index a0561dc..ed6bc75 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -36,6 +36,7 @@ hostprogs-y += lwt_len_hist
 hostprogs-y += xdp_tx_iptunnel
 hostprogs-y += test_map_in_map
 hostprogs-y += per_socket_stats_example
+hostprogs-y += tcp_bpf
 
 # Libbpf dependencies
 LIBBPF := ../../tools/lib/bpf/bpf.o
@@ -52,6 +53,7 @@ tracex3-objs := bpf_load.o $(LIBBPF) tracex3_user.o
 tracex4-objs := bpf_load.o $(LIBBPF) tracex4_user.o
 tracex5-objs := bpf_load.o $(LIBBPF) tracex5_user.o
 tracex6-objs := bpf_load.o $(LIBBPF) tracex6_user.o
+tcp_bpf-objs := bpf_load.o $(LIBBPF) tcp_bpf.o
 test_probe_write_user-objs := bpf_load.o $(LIBBPF) test_probe_write_user_user.o
 trace_output-objs := bpf_load.o $(LIBBPF) trace_output_user.o
 lathist-objs := bpf_load.o $(LIBBPF) lathist_user.o
@@ -130,6 +132,7 @@ HOSTLOADLIBES_tracex4 += -lelf -lrt
 HOSTLOADLIBES_tracex5 += -lelf
 HOSTLOADLIBES_tracex6 += -lelf
 HOSTLOADLIBES_test_cgrp2_sock2 += -lelf
+HOSTLOADLIBES_tcp_bpf += -lelf
 HOSTLOADLIBES_test_probe_write_user += -lelf
 HOSTLOADLIBES_trace_output += -lelf -lrt
 HOSTLOADLIBES_lathist += -lelf
diff --git a/samples/bpf/tcp_bpf.c b/samples/bpf/tcp_bpf.c
new file mode 100644
index 000..9de18ea
--- /dev/null
+++ b/samples/bpf/tcp_bpf.c
@@ -0,0 +1,81 @@
+#include 
+#include 
+#include 
+#include 
+#include "libbpf.h"
+#include "bpf_load.h"
+#include 
+#include 
+
+static void usage(char *pname)
+{
+   printf("USAGE:\n  %s [-r] [-l] \n", pname);
+   printf("WHERE:\n");
+   printf("  -r  remove the currently loaded socketops BPF program\n");
+   printf("  not needed if loading a new program\n");
+   printf("  -l  print out BPF log buffer\n");
+   printf("   name of BPF sockops program to load\n");
+   printf("  if  does not end in \".o\", then \"_kern.o\" "
+  "is appended\n");
+   printf("  example: using tcp1 will load tcp1_kern.o\n");
+   printf("\n");
+   exit(1);
+}
+
+int main(int argc, char **argv)
+{
+   union bpf_attr attr;
+   int k, logFlag = 0;
+   int error = -1;
+   char fn[500];
+
+   if (argc <= 1)
+   usage(argv[0]);
+   for (k = 1; k < argc; k++) {
+   if (!strcmp(argv[k], "-r")) {
+   /* A fd of zero is used as signal to remove the
+* current SOCKET_OPS program
+*/
+   attr.bpf_fd = 0;
+   syscall(__NR_bpf, BPF_PROG_LOAD_SOCKET_OPS, &attr,
+   sizeof(attr));
+   } else if (!strcmp(argv[k], "-l")) {
+   logFlag = 1;
+   } else if (!strcmp(argv[k], "-h")) {
+   usage(argv[0]);
+   } else if (argv[k][0] == '-') {
+   printf("Error, unknown flag: %s\n", argv[k]);
+   exit(2);
+   } else if (strlen(argv[k]) > 450) {
+   printf("Error, program name too long %d\n",
+  (int) strlen(argv[k]));
+   exit(3);
+   } else {
+   if (!strcmp(argv[k]+strlen(argv[k])-2, ".o"))
+   strcpy(fn, argv[k]);
+   else
+   sprintf(fn, "%s_kern.o", argv[k]);
+   if (logFlag)
+   printf("loading bpf file:%s\n", fn);
+   if (load_bpf_file(fn)) {
+   printf("%s", bpf_log_buf);
+   return 1;
+   }
+   if (logFlag) {
+   printf("TCP BPF Loaded %s\n", fn);
+   printf("%s\n", bpf_log_buf);
+   }
+   attr.bpf_fd = prog_fd[0];
+   error = syscall(__NR_bpf, BPF_PROG_LOAD_SOCKET_OPS,
+   &attr, sizeof(attr));
+   if (error) {
+   printf("ERROR: syscall(BPF_PROG_SOCKET_OPS: %d\n",
+   

[RFC PATCH net-next v2 00/15] bpf: BPF support for socket ops

2017-06-15 Thread Lawrence Brakmo
Created a new BPF program type, BPF_PROG_TYPE_SOCKET_OPS, and a corresponding
struct that allows BPF programs of this type to access some of the
socket's fields (such as IP addresses, ports, etc.) and setting
connection parameters such as buffer sizes, initial window, SYN/SYN-ACK
RTOs, etc.

Unlike current BPF program types that expect to be called at a particular
place in the network stack code, SOCKET_OPS programs can be called at
different places and use an "op" field to indicate the context. There
are currently two types of operations, those whose effect is through
their return value and those whose effect is through the new
bpf_setsocketop BPF helper function.

Example operations of the first type are:
  BPF_SOCKET_OPS_TIMEOUT_INIT
  BPF_SOCKET_OPS_RWND_INIT
  BPF_SOCKET_OPS_NEEDS_ECN

Example operations of the second type are:
  BPF_SOCKET_OPS_TCP_CONNECT_CB
  BPF_SOCKET_OPS_ACTIVE_ESTABLISHED_CB
  BPF_SOCKET_OPS_PASSIVE_ESTABLISHED_CB
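A userspace model of how one program can serve several call sites through the `op` field. The enumerator names in the comments come from this cover letter; the numeric values, the `struct conn` type, the port-5001 policy and the 1 MiB buffer are made up for illustration, and writing `c->sndbuf` merely stands in for calling a helper such as the proposed `bpf_setsocketop`. It is not a real BPF program.

```c
#include <stdint.h>

/* Illustrative values only; the real enum lives in include/uapi/linux/bpf.h. */
enum sock_op {
	OP_VOID,
	OP_TIMEOUT_INIT,    /* BPF_SOCKET_OPS_TIMEOUT_INIT */
	OP_RWND_INIT,       /* BPF_SOCKET_OPS_RWND_INIT */
	OP_NEEDS_ECN,       /* BPF_SOCKET_OPS_NEEDS_ECN */
	OP_TCP_CONNECT_CB,  /* BPF_SOCKET_OPS_TCP_CONNECT_CB */
};

/* Hypothetical connection state visible to the "program". */
struct conn {
	uint16_t remote_port;
	uint32_t sndbuf;
};

/* First kind of op: the result is the return value (-1 = use the default).
 * Second kind: the program changes state itself and the caller ignores the
 * return value. */
static int sock_ops_prog(enum sock_op op, struct conn *c)
{
	switch (op) {
	case OP_TIMEOUT_INIT:
		return c->remote_port == 5001 ? 10 : -1;
	case OP_NEEDS_ECN:
		return 1;
	case OP_TCP_CONNECT_CB:
		c->sndbuf = 1u << 20;  /* stands in for a setsockopt-style helper */
		return 0;
	default:
		return -1;             /* op not handled by this program */
	}
}
```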

Current operations are only called during connection establishment, so
there should not be any BPF overhead after connection establishment. The
main idea is to use connection information from both hosts, such as IP
addresses and ports, to allow setting of per-connection parameters to
optimize the connection's performance.

Although there are already 3 mechanisms to set parameters (sysctls,
route metrics and setsockopts), this new mechanism provides some
distinct advantages. Unlike sysctls, it can set parameters per
connection. In contrast to route metrics, it can also use port numbers
and information provided by a user level program. In addition, it could
set parameters probabilistically for evaluation purposes (e.g. do
something different on 10% of the flows and compare results with the
other 90% of the flows). Also, in cases where IPv6 addresses contain
geographic information, the rules to make changes based on the distance
(or RTT) between the hosts are much easier than route metric rules and
can be global. Finally, unlike setsockopt, it does not require
application changes and it can be updated easily at any time.
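Choosing "10% of the flows" in a way that stays stable across every op invoked for the same connection can be done by hashing the 4-tuple. This sketch assumes FNV-1a as the hash and a hypothetical `in_experiment()` helper; any stable hash would work, and a real sockops program would read the tuple from its context struct.

```c
#include <stdbool.h>
#include <stdint.h>

/* 32-bit FNV-1a over the bytes of n 32-bit words. */
static uint32_t fnv1a_u32(const uint32_t *words, int n)
{
	uint32_t h = 2166136261u;

	for (int i = 0; i < n; i++)
		for (int b = 0; b < 4; b++) {
			h ^= (words[i] >> (8 * b)) & 0xff;
			h *= 16777619u;
		}
	return h;
}

/* Put a flow in the experiment group when its 4-tuple hashes into the first
 * `percent` of 100 buckets. Hashing instead of drawing a random number per
 * call keeps the decision identical for every op on the same connection. */
static bool in_experiment(uint32_t saddr, uint32_t daddr,
			  uint16_t sport, uint16_t dport, unsigned int percent)
{
	uint32_t tuple[3] = { saddr, daddr, ((uint32_t)sport << 16) | dport };

	return fnv1a_u32(tuple, 3) % 100 < percent;
}
```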

Currently there is functionality to load one global BPF program of this
type but I plan to add support for loading per cgroup socket ops BPF
programs in the near future. When that is done, the global program could
be called when a cgroup has no program associated with it.

One question is whether I should add this functionality into David Ahern's
BPF_PROG_TYPE_CGROUP_SOCK or create a new cgroup bpf type. Whereas the
current cgroup_sock type expects to be called only once during a connection's
lifetime, the new socket_ops type could be called multiple times. My preference
is to define a new cgroup BPF program type (BPF_PROG_TYPE_CGROUP_SOCKET_OPS).

This patch set also includes sample BPF programs to demonstrate the different
features.

v2: Formatting changes, rebased to latest net-next

Consists of the following patches:

[RFC PATCH net-next v2 01/15] bpf: BPF support for socket ops
[RFC PATCH net-next v2 02/15] bpf: program to load socketops BPF
[RFC PATCH net-next v2 03/15] bpf: Support for per connection 
[RFC PATCH net-next v2 04/15] bpf: Sample bpf program to set
[RFC PATCH net-next v2 05/15] bpf: Support for setting initial
[RFC PATCH net-next v2 06/15] bpf: Sample bpf program to set initial
[RFC PATCH net-next v2 07/15] bpf: Add setsockopt helper function to
[RFC PATCH net-next v2 08/15] bpf: Add TCP connection BPF callbacks
[RFC PATCH net-next v2 09/15] bpf: Sample BPF program to set buffer
[RFC PATCH net-next v2 10/15] bpf: Add support for changing
[RFC PATCH net-next v2 11/15] bpf: Sample BPF program to set
[RFC PATCH net-next v2 12/15] bpf: Adds support for setting initial
[RFC PATCH net-next v2 13/15] bpf: Sample BPF program to set initial
[RFC PATCH net-next v2 14/15] bpf: Adds support for setting sndcwnd
[RFC PATCH net-next v2 15/15] bpf: Sample bpf program to set sndcwnd

 include/linux/bpf.h   |   6 ++
 include/linux/bpf_types.h |   1 +
 include/linux/filter.h|  10 ++
 include/net/tcp.h |  57 ++-
 include/uapi/linux/bpf.h  |  66 -
 kernel/bpf/syscall.c  |   2 +
 net/core/Makefile |   3 +-
 net/core/filter.c | 258 ++
 net/core/sock_bpfops.c|  67 +
 net/ipv4/tcp.c|   2 +-
 net/ipv4/tcp_cong.c   |  15 ++-
 net/ipv4/tcp_fastopen.c   |   1 +
 net/ipv4/tcp_input.c  |  10 +-
 net/ipv4/tcp_minisocks.c  |   9 +-
 net/ipv4/tcp_output.c |  18 +++-
 samples/bpf/Makefile  |   9 ++
 samples/bpf/bpf_helpers.h |   3 +
 samples/bpf/bpf_load.c|  13 ++-
 samples/bpf/tcp_bpf.c |  81 
 samples/bpf/tcp_bufs_kern.c   |  71 ++
 samples/bpf/tcp_clamp_kern.c  |  88 +
 samples/bpf/tcp_cong_kern.c   |  68 +
 samples/bpf/tcp_iw_kern.c |  73 ++
 samples/bpf/tcp_rwnd_kern.c   |  55 

[RFC PATCH net-next v2 01/15] bpf: BPF support for socket ops

2017-06-15 Thread Lawrence Brakmo
Created a new BPF program type, BPF_PROG_TYPE_SOCKET_OPS, and a corresponding
struct that allows BPF programs of this type to access some of the
socket's fields (such as IP addresses, ports, etc.). Currently there is
functionality to load one global BPF program of this type which can be
called at appropriate times to set relevant connection parameters such
as buffer sizes, SYN and SYN-ACK RTOs, etc., based on connection
information such as IP addresses, port numbers, etc.

Although there are already 3 mechanisms to set parameters (sysctls,
route metrics and setsockopts), this new mechanism provides some
distinct advantages. Unlike sysctls, it can set parameters per
connection. In contrast to route metrics, it can also use port numbers
and information provided by a user level program. In addition, it could
set parameters probabilistically for evaluation purposes (e.g. do
something different on 10% of the flows and compare results with the
other 90% of the flows). Also, in cases where IPv6 addresses contain
geographic information, the rules to make changes based on the distance
(or RTT) between the hosts are much easier than route metric rules and
can be global. Finally, unlike setsockopt, it does not require
application changes and it can be updated easily at any time.

I plan to add support for loading per cgroup socket ops BPF programs in
the near future. One question is whether I should add this functionality
into David Ahern's BPF_PROG_TYPE_CGROUP_SOCK or create a new cgroup bpf
type. Whereas the current cgroup_sock type expects to be called only once
during a connection's lifetime, the new socket_ops type could be called
multiple times. For example, before sending SYN and SYN-ACKs to set an
appropriate timeout, when the connection is established to set
congestion control, etc. As a result it has an "op" field to specify the
type of operation requested.

The purpose of this new program type is to simplify setting connection
parameters, such as buffer sizes, TCP's SYN RTO, etc. For example, it is
easy to use Facebook's internal IPv6 addresses to determine if both hosts
of a connection are in the same datacenter. Therefore, it is easy to
write a BPF program to choose a small SYN RTO value when both hosts are
in the same datacenter.
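A sketch of the same-datacenter check, assuming (hypothetically) that the datacenter is encoded in the first 48 bits of the IPv6 address. The prefix length, the 10-jiffy RTO and the helper names are illustrative only; in a sockops program the addresses would come from the `remote_ip6`/`local_ip6` context fields rather than byte arrays.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: the datacenter is identified by the first 6 bytes (/48). */
#define DC_PREFIX_BYTES 6

static bool same_datacenter(const uint8_t a[16], const uint8_t b[16])
{
	return memcmp(a, b, DC_PREFIX_BYTES) == 0;
}

/* Small SYN RTO for intra-datacenter peers; -1 asks the caller to keep its
 * default RTO. */
static int syn_rto_for(const uint8_t local[16], const uint8_t remote[16])
{
	return same_datacenter(local, remote) ? 10 : -1;
}
```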

This patch only contains the framework to support the new BPF program
type; following patches add the functionality to set various connection
parameters.

This patch defines a new BPF program type: BPF_PROG_TYPE_SOCKET_OPS
and a new bpf syscall command to load a new program of this type:
BPF_PROG_LOAD_SOCKET_OPS.

Two new corresponding structs (one for the kernel, one for the user/BPF
program):

/* kernel version */
struct bpf_socket_ops_kern {
	struct sock *sk;
	__u32  is_req_sock:1;
	__u32  op;
	union {
		__u32 reply;
		__u32 replylong[4];
	};
};

/* user version */
struct bpf_socket_ops {
	__u32 op;
	union {
		__u32 reply;
		__u32 replylong[4];
	};
	__u32 family;
	__u32 remote_ip4;
	__u32 local_ip4;
	__u32 remote_ip6[4];
	__u32 local_ip6[4];
	__u32 remote_port;
	__u32 local_port;
};

Currently there are two types of ops. The first type expects the BPF
program to return a value which is then used by the caller (or a
negative value to indicate the operation is not supported). The second
type expects state changes to be done by the BPF program, for example
through a setsockopt BPF helper function, and they ignore the return
value.

The reply fields of the bpf_socket_ops struct are there in case a BPF
program needs to return a value larger than an integer.
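To make the user-visible layout concrete, the struct above can be compiled on its own and its offsets checked. This assumes the RFC layout exactly as quoted, with `uint32_t` standing in for `__u32`; later revisions of the series may change it. Because `reply` and `replylong` share storage at offset 4, a program can hand back up to 16 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* User-visible context as quoted in this RFC; uint32_t replaces __u32. */
struct bpf_socket_ops {
	uint32_t op;
	union {
		uint32_t reply;
		uint32_t replylong[4];  /* overlaps reply: up to 16 bytes back */
	};
	uint32_t family;
	uint32_t remote_ip4;
	uint32_t local_ip4;
	uint32_t remote_ip6[4];
	uint32_t local_ip6[4];
	uint32_t remote_port;
	uint32_t local_port;
};
```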

Signed-off-by: Lawrence Brakmo 
---
 include/linux/bpf.h   |   6 ++
 include/linux/bpf_types.h |   1 +
 include/linux/filter.h|  10 +++
 include/net/tcp.h |  27 
 include/uapi/linux/bpf.h  |  28 +
 kernel/bpf/syscall.c  |   2 +
 net/core/Makefile |   3 +-
 net/core/filter.c | 157 ++
 net/core/sock_bpfops.c|  67 
 samples/bpf/bpf_load.c|  13 +++-
 10 files changed, 310 insertions(+), 4 deletions(-)
 create mode 100644 net/core/sock_bpfops.c

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 1bcbf0a..e164f94 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -362,4 +362,10 @@ extern const struct bpf_func_proto bpf_get_stackid_proto;
 void bpf_user_rnd_init_once(void);
 u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
 
+/* socket_ops related */
+struct sock;
+struct bpf_socket_ops_kern;
+
+int bpf_socket_ops_set_prog(int fd);
+int bpf_socket_ops_call(struct bpf_socket_ops_kern *bpf_socket);
 #endif /* _LINUX_BPF_H */
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 03bf223..ca69d10 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -10,6 +10,7 @@ 

[RFC PATCH net-next v2 03/15] bpf: Support for per connection SYN/SYN-ACK RTOs

2017-06-15 Thread Lawrence Brakmo
This patch adds support for setting per-connection SYN and
SYN-ACK RTOs from within a BPF_SOCKET_OPS program, for example
to set small RTOs when both hosts are known to be within the same
datacenter.
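The fallback rule in `tcp_timeout_init()` (any non-positive answer means "use `TCP_TIMEOUT_INIT`") can be exercised in userspace. In this sketch `HZ` is assumed to be 1000, `TCP_TIMEOUT_INIT` is modelled on the kernel's `1 * HZ` definition, `call_prog()` models `tcp_call_bpf()` under the assumption that it returns -1 when no program is attached, and `small_rto_prog()` is a hypothetical policy returning a 10-jiffy RTO.

```c
#include <stddef.h>

#define HZ 1000                                  /* assumed tick rate */
#define TCP_TIMEOUT_INIT ((unsigned)(1 * HZ))    /* 1 second in jiffies */

enum { OP_VOID, OP_TIMEOUT_INIT };

typedef int (*sockops_prog)(int op);

/* Models tcp_call_bpf(): assumed to return -1 when no program is attached. */
static int call_prog(sockops_prog prog, int op)
{
	return prog ? prog(op) : -1;
}

/* Same fallback as tcp_timeout_init() in this patch. */
static unsigned int timeout_init(sockops_prog prog)
{
	int timeout = call_prog(prog, OP_TIMEOUT_INIT);

	if (timeout <= 0)
		timeout = TCP_TIMEOUT_INIT;
	return (unsigned int)timeout;
}

/* Hypothetical policy: 10-jiffy SYN RTO, e.g. for same-datacenter peers. */
static int small_rto_prog(int op)
{
	return op == OP_TIMEOUT_INIT ? 10 : -1;
}
```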

Signed-off-by: Lawrence Brakmo 
---
 include/net/tcp.h| 11 +++
 include/uapi/linux/bpf.h |  3 +++
 net/ipv4/tcp_input.c |  3 ++-
 net/ipv4/tcp_output.c|  2 +-
 4 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 9ad0d80..a726486 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2018,4 +2018,15 @@ static inline int tcp_call_bpf(struct sock *sk, bool is_req_sock, int op)
 }
 #endif
 
+static inline u32 tcp_timeout_init(struct sock *sk, bool is_req_sock)
+{
+   int timeout;
+
+   timeout = tcp_call_bpf(sk, is_req_sock, BPF_SOCKET_OPS_TIMEOUT_INIT);
+
+   if (timeout <= 0)
+   timeout = TCP_TIMEOUT_INIT;
+   return timeout;
+}
+
 #endif /* _TCP_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 1540336..039f327 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -746,6 +746,9 @@ struct bpf_socket_ops {
  */
 enum {
BPF_SOCKET_OPS_VOID,
+   BPF_SOCKET_OPS_TIMEOUT_INIT,/* Should return SYN-RTO value to use or
+* -1 if default value should be used
+*/
 };
 
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 2ab7e2f..0867b05 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6406,7 +6406,8 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
} else {
tcp_rsk(req)->tfo_listener = false;
if (!want_cookie)
-   inet_csk_reqsk_queue_hash_add(sk, req, TCP_TIMEOUT_INIT);
+   inet_csk_reqsk_queue_hash_add(sk, req,
+   tcp_timeout_init((struct sock *)req, true));
af_ops->send_synack(sk, dst, &fl, req, &foc,
!want_cookie ? TCP_SYNACK_NORMAL :
   TCP_SYNACK_COOKIE);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 9a9c395..5e478a1 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3327,7 +3327,7 @@ static void tcp_connect_init(struct sock *sk)
tp->rcv_wup = tp->rcv_nxt;
tp->copied_seq = tp->rcv_nxt;
 
-   inet_csk(sk)->icsk_rto = TCP_TIMEOUT_INIT;
+   inet_csk(sk)->icsk_rto = tcp_timeout_init(sk, false);
inet_csk(sk)->icsk_retransmits = 0;
tcp_clear_retrans(tp);
 }
-- 
2.9.3
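tcp_timeout_init() passes the program's return value straight through as the initial SYN/SYN-ACK RTO and falls back to TCP_TIMEOUT_INIT when the value is 0 or negative. Because TCP_TIMEOUT_INIT is defined as 1*HZ, the value is in jiffies, so a program that returns 10 (as the tcp_synrto_kern.c sample does) gets 10ms only when HZ is 1000. The userspace sketch below models that fallback and a milliseconds-to-jiffies conversion. HZ_ASSUMED, model_timeout_init() and ms_to_jiffies_model() are hypothetical names, and this is not kernel code.

```c
/* Hypothetical stand-in for the kernel's HZ (CONFIG_HZ is commonly
 * 100, 250, 300 or 1000). */
#define HZ_ASSUMED 1000u

/* Mirrors TCP_TIMEOUT_INIT, defined as 1*HZ: one second, in jiffies. */
#define TIMEOUT_INIT_JIFFIES (1u * HZ_ASSUMED)

/* Models tcp_timeout_init(): a BPF return value <= 0 selects the
 * default; any positive value is used as-is (in jiffies). */
static unsigned int model_timeout_init(int bpf_ret)
{
	if (bpf_ret <= 0)
		return TIMEOUT_INIT_JIFFIES;
	return (unsigned int)bpf_ret;
}

/* Hypothetical helper: converts milliseconds to jiffies for a given HZ,
 * rounding up so that a non-zero timeout never becomes 0 jiffies. */
static unsigned int ms_to_jiffies_model(unsigned int ms, unsigned int hz)
{
	return (ms * hz + 999u) / 1000u;
}
```

For example, model_timeout_init(-1) yields the one-second default, while ms_to_jiffies_model(10, 250) yields 3 jiffies (12ms), showing why a fixed return value of 10 is HZ-dependent.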



[RFC PATCH net-next v2 04/15] bpf: Sample bpf program to set SYN/SYN-ACK RTOs

2017-06-15 Thread Lawrence Brakmo
The sample BPF program, tcp_synrto_kern.c, sets the SYN and SYN-ACK
RTOs to 10ms when both hosts are within the same datacenter (i.e.
small RTTs) in an environment where common IPv6 prefixes indicate
both hosts are in the same data center.

Signed-off-by: Lawrence Brakmo 
---
 samples/bpf/Makefile  |  1 +
 samples/bpf/tcp_synrto_kern.c | 54 +++
 2 files changed, 55 insertions(+)
 create mode 100644 samples/bpf/tcp_synrto_kern.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index ed6bc75..21cb016 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -113,6 +113,7 @@ always += lwt_len_hist_kern.o
 always += xdp_tx_iptunnel_kern.o
 always += test_map_in_map_kern.o
 always += cookie_uid_helper_example.o
+always += tcp_synrto_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
diff --git a/samples/bpf/tcp_synrto_kern.c b/samples/bpf/tcp_synrto_kern.c
new file mode 100644
index 000..03579cb
--- /dev/null
+++ b/samples/bpf/tcp_synrto_kern.c
@@ -0,0 +1,54 @@
+/*
+ * BPF program to set SYN and SYN-ACK RTOs to 10ms when using IPv6 addresses
+ * and the first 5.5 bytes of the IPv6 addresses are the same (in this example
+ * that means both hosts are in the same datacenter).
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+
+#define DEBUG 1
+
+SEC("sockops")
+int bpf_synrto(struct bpf_socket_ops *skops)
+{
+   char fmt1[] = "BPF command: %d\n";
+   char fmt2[] = "  Returning %d\n";
+   int rv = -1;
+   int op;
+
+   /* For testing purposes, only execute rest of BPF program
+* if one of the port numbers is 55601
+*/
+   if (skops->remote_port != 55601 && skops->local_port != 55601)
+   return -1;
+
+   op = (int) skops->op;
+
+#ifdef DEBUG
+   bpf_trace_printk(fmt1, sizeof(fmt1), op);
+#endif
+
+   /* Check for TIMEOUT_INIT operation and IPv6 addresses */
+   if (op == BPF_SOCKET_OPS_TIMEOUT_INIT &&
+   skops->family == AF_INET6) {
+
+   /* If the first 5.5 bytes of the IPv6 address are the same
+* then both hosts are in the same datacenter
+* so use an RTO of 10ms
+*/
+   if (skops->local_ip6[0] == skops->remote_ip6[0] &&
+   (skops->local_ip6[1] & 0xfff0) ==
+   (skops->remote_ip6[1] & 0xfff0))
+   rv = 10;
+   }
+#ifdef DEBUG
+   bpf_trace_printk(fmt2, sizeof(fmt2), rv);
+#endif
+   return rv;
+}
+char _license[] SEC("license") = "GPL";
-- 
2.9.3
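The samples treat a match on the first 5.5 bytes (44 bits) of the IPv6 addresses as "same datacenter". local_ip6[] and remote_ip6[] hold the raw address words in network byte order, as in struct in6_addr, so the bits selected by a mask such as 0xfff0 on local_ip6[1] depend on the host's endianness. Below is a userspace sketch of a byte-order-independent 44-bit comparison. The function names are hypothetical, and the code works on struct in6_addr rather than on struct bpf_socket_ops.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Byte-wise check: true when the first 44 bits (5.5 bytes) match. */
static bool same_prefix_44(const struct in6_addr *a, const struct in6_addr *b)
{
	/* Bytes 0..4 (40 bits) must match exactly. */
	if (memcmp(a->s6_addr, b->s6_addr, 5) != 0)
		return false;
	/* Then only the high nibble of byte 5 (4 more bits). */
	return (a->s6_addr[5] & 0xf0) == (b->s6_addr[5] & 0xf0);
}

/* Word-based equivalent: convert the second word to host order before
 * masking, so 0xfff00000 always selects address bytes 4 and 5-high. */
static bool same_prefix_44_words(const struct in6_addr *a,
				 const struct in6_addr *b)
{
	uint32_t a0, a1, b0, b1;

	memcpy(&a0, &a->s6_addr[0], 4);
	memcpy(&a1, &a->s6_addr[4], 4);
	memcpy(&b0, &b->s6_addr[0], 4);
	memcpy(&b1, &b->s6_addr[4], 4);
	return a0 == b0 &&
	       (ntohl(a1) & 0xfff00000u) == (ntohl(b1) & 0xfff00000u);
}
```

With inet_pton(AF_INET6, ...), 2001:db8:abcd:1234::1 and 2001:db8:abc5:ffff::2 share the 44-bit prefix, while 2001:db8:ab0d::1 does not. Both functions agree on every input regardless of host byte order.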



[RFC PATCH net-next v2 11/15] bpf: Sample BPF program to set congestion control

2017-06-15 Thread Lawrence Brakmo
Sample BPF program that sets congestion control to dctcp when both hosts
are within the same datacenter. In this example, that is assumed to be
the case when the first 5.5 bytes of their IPv6 addresses are the same.

Signed-off-by: Lawrence Brakmo 
---
 samples/bpf/Makefile|  1 +
 samples/bpf/tcp_cong_kern.c | 68 +
 2 files changed, 69 insertions(+)
 create mode 100644 samples/bpf/tcp_cong_kern.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 942c7c7..eb324e0 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -116,6 +116,7 @@ always += cookie_uid_helper_example.o
 always += tcp_synrto_kern.o
 always += tcp_rwnd_kern.o
 always += tcp_bufs_kern.o
+always += tcp_cong_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
diff --git a/samples/bpf/tcp_cong_kern.c b/samples/bpf/tcp_cong_kern.c
new file mode 100644
index 000..24a3bc4
--- /dev/null
+++ b/samples/bpf/tcp_cong_kern.c
@@ -0,0 +1,68 @@
+/*
+ * BPF program to set congestion control to dctcp when both hosts are
+ * in the same datacenter (as determined by IPv6 prefix).
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+
+#define DEBUG 1
+
+SEC("sockops")
+int bpf_cong(struct bpf_socket_ops *skops)
+{
+   char fmt1[] = "BPF command: %d\n";
+   char fmt2[] = "  Returning %d\n";
+   char cong[] = "dctcp";
+   int rv = 0;
+   int op;
+
+   /* For testing purposes, only execute rest of BPF program
+* if one of the port numbers is 55601
+*/
+   if (skops->remote_port != 55601 && skops->local_port != 55601)
+   return -1;
+
+   op = (int) skops->op;
+
+#ifdef DEBUG
+   bpf_trace_printk(fmt1, sizeof(fmt1), op);
+#endif
+
+   /* Check if both hosts are in the same datacenter. For this
+* example they are if the 1st 5.5 bytes in the IPv6 address
+* are the same.
+*/
+   if (skops->family == AF_INET6 &&
+   skops->local_ip6[0] == skops->remote_ip6[0] &&
+   (skops->local_ip6[1] & 0xfff0) ==
+   (skops->remote_ip6[1] & 0xfff0)) {
+   switch (op) {
+   case BPF_SOCKET_OPS_NEEDS_ECN:
+   rv = 1;
+   break;
+   case BPF_SOCKET_OPS_ACTIVE_ESTABLISHED_CB:
+   rv = bpf_setsockopt(skops, SOL_TCP, TCP_CONGESTION,
+   cong, sizeof(cong));
+   break;
+   case BPF_SOCKET_OPS_PASSIVE_ESTABLISHED_CB:
+   rv = bpf_setsockopt(skops, SOL_TCP, TCP_CONGESTION,
+   cong, sizeof(cong));
+   break;
+   default:
+   rv = -1;
+   }
+   } else {
+   rv = -1;
+   }
+#ifdef DEBUG
+   bpf_trace_printk(fmt2, sizeof(fmt2), rv);
+#endif
+   return rv;
+}
+char _license[] SEC("license") = "GPL";
-- 
2.9.3
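This sample applies dctcp from the ESTABLISHED callbacks through bpf_setsockopt(). For comparison, an application that already knows its peer is nearby can request a congestion control on its own socket with the standard TCP_CONGESTION option. Below is a minimal userspace sketch, assuming Linux. set_and_check_cc() is a hypothetical helper. Setting "dctcp" only succeeds if that algorithm is available and permitted for the caller. An unknown name is rejected.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

/* Requests congestion control `name` on TCP socket `fd`, then reads it
 * back. Returns 0 if the kernel now reports `name`, -1 otherwise
 * (errno is set by setsockopt/getsockopt on failure). */
static int set_and_check_cc(int fd, const char *name)
{
	char cur[16]; /* the kernel's TCP_CA_NAME_MAX is 16 */
	socklen_t len = sizeof(cur);

	if (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, name,
		       (socklen_t)strlen(name)) < 0)
		return -1;
	memset(cur, 0, sizeof(cur));
	if (getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, cur, &len) < 0)
		return -1;
	return strncmp(cur, name, sizeof(cur)) == 0 ? 0 : -1;
}
```

Typical use is set_and_check_cc(fd, "dctcp") on a socket before connect(). The BPF approach in this series makes the same choice per connection without modifying the application.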



[RFC PATCH net-next v2 15/15] bpf: Sample bpf program to set sndcwnd clamp

2017-06-15 Thread Lawrence Brakmo
Sample BPF program, tcp_clamp_kern.c, to demonstrate setting the
sndcwnd clamp. This program assumes that if the first 5.5 bytes of
the hosts' IPv6 addresses are the same, then the hosts are in the
same datacenter, and in that case sets the sndcwnd clamp to 100
packets, the SYN and SYN-ACK RTOs to 10ms, and the send/receive
buffer sizes to 150KB.

Signed-off-by: Lawrence Brakmo 
---
 samples/bpf/Makefile |  1 +
 samples/bpf/tcp_clamp_kern.c | 88 
 2 files changed, 89 insertions(+)
 create mode 100644 samples/bpf/tcp_clamp_kern.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 3ec96a0..59975c3 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -118,6 +118,7 @@ always += tcp_rwnd_kern.o
 always += tcp_bufs_kern.o
 always += tcp_cong_kern.o
 always += tcp_iw_kern.o
+always += tcp_clamp_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
diff --git a/samples/bpf/tcp_clamp_kern.c b/samples/bpf/tcp_clamp_kern.c
new file mode 100644
index 000..e96eadd
--- /dev/null
+++ b/samples/bpf/tcp_clamp_kern.c
@@ -0,0 +1,88 @@
+/*
+ * Sample BPF program to set send and receive buffers to 150KB, sndcwnd clamp
+ * to 100 packets and SYN and SYN_ACK RTOs to 10ms when both hosts are within
+ * the same datacenter. For this example, we assume they are within the same
+ * datacenter when the first 5.5 bytes of their IPv6 addresses are the same.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+
+#define DEBUG 1
+
+SEC("sockops")
+int bpf_clamp(struct bpf_socket_ops *skops)
+{
+   char fmt1[] = "BPF command: %d\n";
+   char fmt2[] = "  Returning %d\n";
+   int bufsize = 150000;
+   int to_init = 10;
+   int clamp = 100;
+   int rv = 0;
+   int op;
+
+   /* For testing purposes, only execute rest of BPF program
+* if one of the port numbers is 55601
+*/
+   if (skops->remote_port != 55601 && skops->local_port != 55601)
+   return -1;
+
+   op = (int) skops->op;
+
+#ifdef DEBUG
+   bpf_trace_printk(fmt1, sizeof(fmt1), op);
+#endif
+
+   /* Check that both hosts are within same datacenter. For this example
+* it is the case when the first 5.5 bytes of their IPv6 addresses are
+* the same.
+*/
+   if (skops->family == AF_INET6 &&
+   skops->local_ip6[0] == skops->remote_ip6[0] &&
+   (skops->local_ip6[1] & 0xfff0) ==
+   (skops->remote_ip6[1] & 0xfff0)) {
+   switch (op) {
+   case BPF_SOCKET_OPS_TIMEOUT_INIT:
+   rv = to_init;
+   break;
+   case BPF_SOCKET_OPS_TCP_CONNECT_CB:
+   /* Set sndbuf and rcvbuf of active connections */
+   rv = bpf_setsockopt(skops, SOL_SOCKET, SO_SNDBUF,
+   &bufsize, sizeof(bufsize));
+   rv = -rv*100 + bpf_setsockopt(skops, SOL_SOCKET,
+ SO_RCVBUF, &bufsize,
+ sizeof(bufsize));
+   break;
+   case BPF_SOCKET_OPS_ACTIVE_ESTABLISHED_CB:
+   rv = bpf_setsockopt(skops, SOL_TCP,
+   TCP_BPF_SNDCWND_CLAMP,
+   &clamp, sizeof(clamp));
+   break;
+   case BPF_SOCKET_OPS_PASSIVE_ESTABLISHED_CB:
+   /* Set sndcwnd clamp, sndbuf and rcvbuf of passive connections */
+   rv = bpf_setsockopt(skops, SOL_TCP,
+   TCP_BPF_SNDCWND_CLAMP,
+   &clamp, sizeof(clamp));
+   rv = -rv*100 + bpf_setsockopt(skops, SOL_SOCKET,
+ SO_SNDBUF, &bufsize,
+ sizeof(bufsize));
+   rv = -rv*200 + bpf_setsockopt(skops, SOL_SOCKET,
+ SO_RCVBUF, &bufsize,
+ sizeof(bufsize));
+   break;
+   default:
+   rv = -1;
+   }
+   } else {
+   rv = -1;
+   }
+#ifdef DEBUG
+   bpf_trace_printk(fmt2, sizeof(fmt2), rv);
+#endif
+   return rv;
+}
+char _license[] SEC("license") = "GPL";
-- 
2.9.3



[RFC PATCH net-next v2 09/15] bpf: Sample BPF program to set buffer sizes

2017-06-15 Thread Lawrence Brakmo
This patch contains a BPF program to set initial receive window to
40 packets and send and receive buffers to 1.5MB. This would usually
be done after doing appropriate checks that indicate the hosts are
far enough away (i.e. large RTT).

Signed-off-by: Lawrence Brakmo 
---
 samples/bpf/Makefile|  1 +
 samples/bpf/tcp_bufs_kern.c | 71 +
 2 files changed, 72 insertions(+)
 create mode 100644 samples/bpf/tcp_bufs_kern.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 9aca209..942c7c7 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -115,6 +115,7 @@ always += test_map_in_map_kern.o
 always += cookie_uid_helper_example.o
 always += tcp_synrto_kern.o
 always += tcp_rwnd_kern.o
+always += tcp_bufs_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
diff --git a/samples/bpf/tcp_bufs_kern.c b/samples/bpf/tcp_bufs_kern.c
new file mode 100644
index 000..37b2923
--- /dev/null
+++ b/samples/bpf/tcp_bufs_kern.c
@@ -0,0 +1,71 @@
+/*
+ * BPF program to set initial receive window to 40 packets and send
+ * and receive buffers to 1.5MB. This would usually be done after
+ * doing appropriate checks that indicate the hosts are far enough
+ * away (i.e. large RTT).
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+
+#define DEBUG 1
+
+SEC("sockops")
+int bpf_bufs(struct bpf_socket_ops *skops)
+{
+   char fmt1[] = "BPF command: %d\n";
+   char fmt2[] = "  Returning %d\n";
+   int bufsize = 1500000;
+   int rwnd_init = 40;
+   int rv = 0;
+   int op;
+
+   /* For testing purposes, only execute rest of BPF program
+* if one of the port numbers is 55601
+*/
+   if (skops->remote_port != 55601 && skops->local_port != 55601)
+   return -1;
+
+   op = (int) skops->op;
+
+#ifdef DEBUG
+   bpf_trace_printk(fmt1, sizeof(fmt1), op);
+#endif
+
+   /* Usually there would be a check to ensure the hosts are far
+* from each other so it makes sense to increase buffer sizes
+*/
+   switch (op) {
+   case BPF_SOCKET_OPS_RWND_INIT:
+   rv = rwnd_init;
+   break;
+   case BPF_SOCKET_OPS_TCP_CONNECT_CB:
+   /* Set sndbuf and rcvbuf of active connections */
+   rv = bpf_setsockopt(skops, SOL_SOCKET, SO_SNDBUF, &bufsize,
+   sizeof(bufsize));
+   rv = -rv*100 + bpf_setsockopt(skops, SOL_SOCKET, SO_RCVBUF,
+ &bufsize, sizeof(bufsize));
+   break;
+   case BPF_SOCKET_OPS_ACTIVE_ESTABLISHED_CB:
+   /* Nothing to do */
+   break;
+   case BPF_SOCKET_OPS_PASSIVE_ESTABLISHED_CB:
+   /* Set sndbuf and rcvbuf of passive connections */
+   rv = bpf_setsockopt(skops, SOL_SOCKET, SO_SNDBUF, &bufsize,
+   sizeof(bufsize));
+   rv = -rv*100 + bpf_setsockopt(skops, SOL_SOCKET, SO_RCVBUF,
+ &bufsize, sizeof(bufsize));
+   break;
+   default:
+   rv = -1;
+   }
+#ifdef DEBUG
+   bpf_trace_printk(fmt2, sizeof(fmt2), rv);
+#endif
+   return rv;
+}
+char _license[] SEC("license") = "GPL";
-- 
2.9.3
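The bpf_setsockopt() helper added in patch 07/15 stores twice the requested SO_SNDBUF/SO_RCVBUF value. The regular setsockopt() path on Linux doubles the value as well; socket(7) describes the extra space as room for bookkeeping overhead. However, the regular path first caps the request at net.core.wmem_max or rmem_max, and the helper as posted does not apply that cap. Below is a userspace sketch that sets the option and reads it back. set_sndbuf_and_read_back() is a hypothetical helper.

```c
#include <sys/socket.h>

/* Sets SO_SNDBUF to `requested` bytes and returns the value the kernel
 * reports back, or -1 on error. On Linux the reported value is normally
 * twice the (possibly wmem_max-capped) request. */
static int set_sndbuf_and_read_back(int fd, int requested)
{
	int actual = 0;
	socklen_t len = sizeof(actual);

	if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &requested,
		       sizeof(requested)) < 0)
		return -1;
	if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &actual, &len) < 0)
		return -1;
	return actual;
}
```

On a default Linux configuration, requesting 16384 bytes typically reads back as 32768. With typical defaults, a request above wmem_max, such as the 1.5MB in this sample, would be capped on the regular path.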



[RFC PATCH net-next v2 07/15] bpf: Add setsockopt helper function to bpf

2017-06-15 Thread Lawrence Brakmo
Added support for calling a subset of socket setsockopt options from
BPF_PROG_TYPE_SOCKET_OPS programs. The code was duplicated rather than
changed to call the socket setsockopt function, because the changes
required for that would have been larger.

The ops supported are:
  SO_RCVBUF
  SO_SNDBUF
  SO_MAX_PACING_RATE
  SO_PRIORITY
  SO_RCVLOWAT
  SO_MARK

Signed-off-by: Lawrence Brakmo 
---
 include/uapi/linux/bpf.h  | 14 -
 net/core/filter.c | 77 ++-
 samples/bpf/bpf_helpers.h |  3 ++
 3 files changed, 92 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d945336..8accb4d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -520,6 +520,17 @@ union bpf_attr {
  * Set full skb->hash.
  * @skb: pointer to skb
  * @hash: hash to set
+ *
+ * int bpf_setsockopt(bpf_socket, level, optname, optval, optlen)
+ * Calls setsockopt. Not all opts are available, only those with
+ * integer optvals plus TCP_CONGESTION.
+ * Supported levels: SOL_SOCKET and IPPROTO_TCP
+ * @bpf_socket: pointer to bpf_socket
+ * @level: SOL_SOCKET or IPPROTO_TCP
+ * @optname: option name
+ * @optval: pointer to option value
+ * @optlen: length of optval in bytes
+ * Return: 0 or negative error
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -570,7 +581,8 @@ union bpf_attr {
FN(probe_read_str), \
FN(get_socket_cookie),  \
FN(get_socket_uid), \
-   FN(set_hash),
+   FN(set_hash),   \
+   FN(setsockopt),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/net/core/filter.c b/net/core/filter.c
index 7466f55..9ff567c 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -54,6 +54,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /**
  * sk_filter_trim_cap - run a packet through a socket filter
@@ -2671,6 +2672,69 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = {
.arg1_type  = ARG_PTR_TO_CTX,
 };
 
+BPF_CALL_5(bpf_setsockopt, struct bpf_socket_ops_kern *, bpf_socket,
+  int, level, int, optname, char *, optval, int, optlen)
+{
+   struct sock *sk = bpf_socket->sk;
+   int ret = 0;
+   int val;
+
+   if (bpf_socket->is_req_sock)
+   return -EINVAL;
+
+   if (level == SOL_SOCKET) {
+   /* Only some socketops are supported */
+   val = *((int *)optval);
+
+   switch (optname) {
+   case SO_RCVBUF:
+   sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
+   sk->sk_rcvbuf = max_t(int, val * 2, SOCK_MIN_RCVBUF);
+   break;
+   case SO_SNDBUF:
+   sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
+   sk->sk_sndbuf = max_t(int, val * 2, SOCK_MIN_SNDBUF);
+   break;
+   case SO_MAX_PACING_RATE:
+   sk->sk_max_pacing_rate = val;
+   sk->sk_pacing_rate = min(sk->sk_pacing_rate,
+sk->sk_max_pacing_rate);
+   break;
+   case SO_PRIORITY:
+   sk->sk_priority = val;
+   break;
+   case SO_RCVLOWAT:
+   if (val < 0)
+   val = INT_MAX;
+   sk->sk_rcvlowat = val ? : 1;
+   break;
+   case SO_MARK:
+   sk->sk_mark = val;
+   break;
+   default:
+   ret = -EINVAL;
+   }
+   } else if (level == SOL_TCP &&
+  bpf_socket->sk->sk_prot->setsockopt == tcp_setsockopt) {
+   /* Place holder */
+   ret = -EINVAL;
+   } else {
+   ret = -EINVAL;
+   }
+   return ret;
+}
+
+static const struct bpf_func_proto bpf_setsockopt_proto = {
+   .func   = bpf_setsockopt,
+   .gpl_only   = true,
+   .ret_type   = RET_INTEGER,
+   .arg1_type  = ARG_PTR_TO_CTX,
+   .arg2_type  = ARG_ANYTHING,
+   .arg3_type  = ARG_ANYTHING,
+   .arg4_type  = ARG_PTR_TO_MEM,
+   .arg5_type  = ARG_CONST_SIZE_OR_ZERO,
+};
+
 static const struct bpf_func_proto *
 bpf_base_func_proto(enum bpf_func_id func_id)
 {
@@ -2822,6 +2886,17 @@ lwt_inout_func_proto(enum bpf_func_id func_id)
 }
 
 static const struct bpf_func_proto *
+   socket_ops_func_proto(enum bpf_func_id func_id)
+{
+   switch (func_id) {
+   case BPF_FUNC_setsockopt:
+   return &bpf_setsockopt_proto;
+   default:
+   return bpf_base_func_proto(func_id);
+   }
+}
+
+static const struct 

[RFC PATCH net-next v2 05/15] bpf: Support for setting initial receive window

2017-06-15 Thread Lawrence Brakmo
This patch adds support for setting the initial advertised window from
within a BPF_SOCKET_OPS program. This can be used to support larger
initial cwnd values in environments where it is known to be safe.

Signed-off-by: Lawrence Brakmo 
---
 include/net/tcp.h| 10 ++
 include/uapi/linux/bpf.h |  4 
 net/ipv4/tcp_minisocks.c |  9 -
 net/ipv4/tcp_output.c|  7 ++-
 4 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index a726486..29c27dc 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2029,4 +2029,14 @@ static inline u32 tcp_timeout_init(struct sock *sk, bool is_req_sock)
return timeout;
 }
 
+static inline u32 tcp_rwnd_init_bpf(struct sock *sk, bool is_req_sock)
+{
+   int rwnd;
+
+   rwnd = tcp_call_bpf(sk, is_req_sock, BPF_SOCKET_OPS_RWND_INIT);
+
+   if (rwnd < 0)
+   rwnd = 0;
+   return rwnd;
+}
 #endif /* _TCP_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 039f327..d945336 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -749,6 +749,10 @@ enum {
BPF_SOCKET_OPS_TIMEOUT_INIT,/* Should return SYN-RTO value to use or
 * -1 if default value should be used
 */
+   BPF_SOCKET_OPS_RWND_INIT,   /* Should return initial advertized
+* window (in packets) or -1 if default
+* value should be used
+*/
 };
 
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index d30ee31..bbaf3c6 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -351,6 +351,7 @@ void tcp_openreq_init_rwin(struct request_sock *req,
int full_space = tcp_full_space(sk_listener);
u32 window_clamp;
__u8 rcv_wscale;
+   u32 rcv_wnd;
int mss;
 
mss = tcp_mss_clamp(tp, dst_metric_advmss(dst));
@@ -363,6 +364,12 @@ void tcp_openreq_init_rwin(struct request_sock *req,
(req->rsk_window_clamp > full_space || req->rsk_window_clamp == 0))
req->rsk_window_clamp = full_space;
 
+   rcv_wnd = tcp_rwnd_init_bpf((struct sock *)req, true);
+   if (rcv_wnd == 0)
+   rcv_wnd = dst_metric(dst, RTAX_INITRWND);
+   else if (full_space < rcv_wnd * mss)
+   full_space = rcv_wnd * mss;
+
/* tcp_full_space because it is guaranteed to be the first packet */
tcp_select_initial_window(full_space,
mss - (ireq->tstamp_ok ? TCPOLEN_TSTAMP_ALIGNED : 0),
@@ -370,7 +377,7 @@ void tcp_openreq_init_rwin(struct request_sock *req,
&req->rsk_window_clamp,
ireq->wscale_ok,
&rcv_wscale,
-   dst_metric(dst, RTAX_INITRWND));
+   rcv_wnd);
ireq->rcv_wscale = rcv_wscale;
 }
 EXPORT_SYMBOL(tcp_openreq_init_rwin);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 5e478a1..e5f623f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3267,6 +3267,7 @@ static void tcp_connect_init(struct sock *sk)
const struct dst_entry *dst = __sk_dst_get(sk);
struct tcp_sock *tp = tcp_sk(sk);
__u8 rcv_wscale;
+   u32 rcv_wnd;
 
/* We'll fix this up when we get a response from the other end.
 * See tcp_input.c:tcp_rcv_state_process case TCP_SYN_SENT.
@@ -3300,13 +3301,17 @@ static void tcp_connect_init(struct sock *sk)
(tp->window_clamp > tcp_full_space(sk) || tp->window_clamp == 0))
tp->window_clamp = tcp_full_space(sk);
 
+   rcv_wnd = tcp_rwnd_init_bpf(sk, false);
+   if (rcv_wnd == 0)
+   rcv_wnd = dst_metric(dst, RTAX_INITRWND);
+
tcp_select_initial_window(tcp_full_space(sk),
  tp->advmss - (tp->rx_opt.ts_recent_stamp ? tp->tcp_header_len - sizeof(struct tcphdr) : 0),
  &tp->rcv_wnd,
  &tp->window_clamp,
  sock_net(sk)->ipv4.sysctl_tcp_window_scaling,
  &rcv_wscale,
- dst_metric(dst, RTAX_INITRWND));
+ rcv_wnd);
 
tp->rx_opt.rcv_wscale = rcv_wscale;
tp->rcv_ssthresh = tp->rcv_wnd;
-- 
2.9.3
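In tcp_openreq_init_rwin(), a non-zero value from BPF_SOCKET_OPS_RWND_INIT replaces the RTAX_INITRWND route metric as the initial window (in packets) passed to tcp_select_initial_window(). In addition, full_space is raised to at least rcv_wnd * mss so the requested window is not cut short by the buffer space. In tcp_connect_init(), only the metric replacement is done. The userspace sketch below models the request-socket path. The struct and function names are hypothetical, and tcp_select_initial_window() itself is not modeled.

```c
/* Result of the window-selection step modeled below. */
struct rwin_choice {
	unsigned int init_rwnd_pkts; /* last argument to tcp_select_initial_window() */
	unsigned int full_space;     /* buffer space in bytes, possibly raised */
};

/* Models the logic added to tcp_openreq_init_rwin().
 * bpf_rwnd:       result of tcp_rwnd_init_bpf() (already >= 0)
 * route_initrwnd: stands in for dst_metric(dst, RTAX_INITRWND) */
static struct rwin_choice choose_init_rwnd(unsigned int bpf_rwnd,
					   unsigned int route_initrwnd,
					   unsigned int full_space,
					   unsigned int mss)
{
	struct rwin_choice c = { bpf_rwnd, full_space };

	if (bpf_rwnd == 0)
		c.init_rwnd_pkts = route_initrwnd; /* no BPF override */
	else if (full_space < bpf_rwnd * mss)
		c.full_space = bpf_rwnd * mss;     /* room for the requested window */
	return c;
}
```

With the 40 packets used by the samples and an MSS of 1460, full_space is raised to 58400 bytes whenever it was smaller.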



[RFC PATCH net-next v2 14/15] bpf: Adds support for setting sndcwnd clamp

2017-06-15 Thread Lawrence Brakmo
Adds a new bpf_setsockopt for TCP sockets, TCP_BPF_SNDCWND_CLAMP, which
sets the sndcwnd clamp (snd_cwnd_clamp) and snd_ssthresh. It is useful to
limit the sndcwnd when the hosts are close to each other (small RTT).

Signed-off-by: Lawrence Brakmo 
---
 include/uapi/linux/bpf.h | 1 +
 net/core/filter.c| 7 +++
 2 files changed, 8 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index ecb28e2..4d033eb 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -782,5 +782,6 @@ enum {
 };
 
 #define TCP_BPF_IW 1001/* Set TCP initial congestion window */
+#define TCP_BPF_SNDCWND_CLAMP  1002/* Set sndcwnd_clamp */
 
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/net/core/filter.c b/net/core/filter.c
index a1d9214..f484fef 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2733,6 +2733,14 @@ BPF_CALL_5(bpf_setsockopt, struct bpf_socket_ops_kern *, bpf_socket,
else
tp->snd_cwnd = val;
break;
+   case TCP_BPF_SNDCWND_CLAMP:
+   if (val <= 0) {
+   ret = -EINVAL;
+   } else {
+   tp->snd_cwnd_clamp = val;
+   tp->snd_ssthresh = val;
+   }
+   break;
default:
ret = -EINVAL;
}
-- 
2.9.3
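TCP_BPF_SNDCWND_CLAMP sets both snd_cwnd_clamp and snd_ssthresh to the same value, so slow start ends at the clamp and the window is then held there. Below is a deliberately idealized sketch of that effect. It assumes the window doubles once per RTT during slow start and ignores losses, pacing and the receive window. The function name is hypothetical, and this is not the kernel's cwnd code.

```c
/* Idealized model: cwnd (in packets) doubles every RTT until it reaches
 * ssthresh. Because the option sets ssthresh == clamp, congestion
 * avoidance would only add about one packet per RTT, and the clamp
 * holds it at `clamp`, so the model simply stays there. */
static unsigned int cwnd_after_rtts(unsigned int initial_cwnd,
				    unsigned int clamp, unsigned int rtts)
{
	unsigned int ssthresh = clamp; /* the option sets both fields */
	unsigned int cwnd = initial_cwnd < clamp ? initial_cwnd : clamp;

	for (unsigned int i = 0; i < rtts; i++) {
		unsigned int next = cwnd * 2;

		if (next > ssthresh)
			next = ssthresh;
		cwnd = next;
	}
	return cwnd;
}
```

With an initial window of 10 packets and the clamp of 100 used by tcp_clamp_kern.c, the model reaches 20, 40 and 80 packets over the first three RTTs and stays at 100 from the fourth RTT on.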



[RFC PATCH net-next v2 08/15] bpf: Add TCP connection BPF callbacks

2017-06-15 Thread Lawrence Brakmo
Added calls to BPF SOCKET_OPS type programs right before an active
connection is initialized and after a passive or active connection is
established.

The following patch demonstrates how they can be used to set send and
receive buffer sizes.

Signed-off-by: Lawrence Brakmo 
---
 include/uapi/linux/bpf.h | 11 +++
 net/ipv4/tcp_fastopen.c  |  1 +
 net/ipv4/tcp_input.c |  4 +++-
 net/ipv4/tcp_output.c|  1 +
 4 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 8accb4d..c3490d3 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -765,6 +765,17 @@ enum {
 * window (in packets) or -1 if default
 * value should be used
 */
+   BPF_SOCKET_OPS_TCP_CONNECT_CB,  /* Calls BPF program right before an
+* active connection is initialized
+*/
+   BPF_SOCKET_OPS_ACTIVE_ESTABLISHED_CB,   /* Calls BPF program when an
+* active connection is
+* established
+*/
+   BPF_SOCKET_OPS_PASSIVE_ESTABLISHED_CB,  /* Calls BPF program when a
+* passive connection is
+* established
+*/
 };
 
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
index 4af82b9..c3ec4ec 100644
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -221,6 +221,7 @@ static struct sock *tcp_fastopen_create_child(struct sock *sk,
tcp_init_congestion_control(child);
tcp_mtup_init(child);
tcp_init_metrics(child);
+   tcp_call_bpf(child, false, BPF_SOCKET_OPS_PASSIVE_ESTABLISHED_CB);
tcp_init_buffer_space(child);
 
tp->rcv_nxt = TCP_SKB_CB(skb)->seq + 1;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 0867b05..e0d688a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5571,7 +5571,7 @@ void tcp_finish_connect(struct sock *sk, struct sk_buff *skb)
icsk->icsk_af_ops->rebuild_header(sk);
 
tcp_init_metrics(sk);
-
+   tcp_call_bpf(sk, false, BPF_SOCKET_OPS_ACTIVE_ESTABLISHED_CB);
tcp_init_congestion_control(sk);
 
/* Prevent spurious tcp_cwnd_restart() on first data
@@ -5977,6 +5977,8 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
} else {
/* Make sure socket is routed, for correct metrics. */
icsk->icsk_af_ops->rebuild_header(sk);
+   tcp_call_bpf(sk, false,
+BPF_SOCKET_OPS_PASSIVE_ESTABLISHED_CB);
tcp_init_congestion_control(sk);
 
tcp_mtup_init(sk);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index e5f623f..9124d3d 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3445,6 +3445,7 @@ int tcp_connect(struct sock *sk)
struct sk_buff *buff;
int err;
 
+   tcp_call_bpf(sk, false, BPF_SOCKET_OPS_TCP_CONNECT_CB);
tcp_connect_init(sk);
 
if (unlikely(tp->repair)) {
-- 
2.9.3
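As of this patch, the op values are declared in the order VOID, TIMEOUT_INIT, RWND_INIT, TCP_CONNECT_CB, ACTIVE_ESTABLISHED_CB and PASSIVE_ESTABLISHED_CB, and patch 10/15 appends BPF_SOCKET_OPS_NEEDS_ECN. The TCP_CONGESTION branch in patch 10/15 only calls tcp_reinit_congestion_control() when op > BPF_SOCKET_OPS_NEEDS_ECN, so none of the ops in this series takes that path. In tcp_finish_connect() and tcp_rcv_state_process(), the ESTABLISHED callbacks run before tcp_init_congestion_control(). In tcp_fastopen_create_child(), the PASSIVE callback is placed after it. The sketch below mirrors the enum under hypothetical MODEL_ names to check the ordering argument; it is not the uapi header.

```c
#include <stdbool.h>

/* Hypothetical mirror of the op enum as declared across this series. */
enum model_socket_ops {
	MODEL_OPS_VOID,                   /* 0 */
	MODEL_OPS_TIMEOUT_INIT,           /* 1 */
	MODEL_OPS_RWND_INIT,              /* 2 */
	MODEL_OPS_TCP_CONNECT_CB,         /* 3 */
	MODEL_OPS_ACTIVE_ESTABLISHED_CB,  /* 4 */
	MODEL_OPS_PASSIVE_ESTABLISHED_CB, /* 5 */
	MODEL_OPS_NEEDS_ECN,              /* 6, added by patch 10/15 */
};

/* Mirrors the condition checked after a successful TCP_CONGESTION
 * change: only ops declared after NEEDS_ECN trigger a reinit. */
static bool model_needs_ca_reinit(int op)
{
	return op > MODEL_OPS_NEEDS_ECN;
}
```

Any op added later after NEEDS_ECN, for callbacks that run once congestion control is already initialized, would take the reinit path without further changes.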



[RFC PATCH net-next v2 10/15] bpf: Add support for changing congestion control

2017-06-15 Thread Lawrence Brakmo
Added support for changing the congestion control from SOCKET_OPS BPF
programs through the setsockopt BPF helper function. It also adds a new
SOCKET_OPS op, BPF_SOCKET_OPS_NEEDS_ECN, which is needed by congestion
controls such as dctcp that must enable ECN in the SYN packets.

Signed-off-by: Lawrence Brakmo 
---
 include/net/tcp.h|  9 -
 include/uapi/linux/bpf.h |  3 +++
 net/core/filter.c| 11 +--
 net/ipv4/tcp.c   |  2 +-
 net/ipv4/tcp_cong.c  | 15 ++-
 net/ipv4/tcp_input.c |  3 ++-
 net/ipv4/tcp_output.c|  8 +---
 7 files changed, 38 insertions(+), 13 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 29c27dc..371b1bd 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1001,7 +1001,9 @@ void tcp_get_default_congestion_control(char *name);
 void tcp_get_available_congestion_control(char *buf, size_t len);
 void tcp_get_allowed_congestion_control(char *buf, size_t len);
 int tcp_set_allowed_congestion_control(char *allowed);
-int tcp_set_congestion_control(struct sock *sk, const char *name);
+int tcp_set_congestion_control(struct sock *sk, const char *name, bool load);
+void tcp_reinit_congestion_control(struct sock *sk,
+  const struct tcp_congestion_ops *ca);
 u32 tcp_slow_start(struct tcp_sock *tp, u32 acked);
 void tcp_cong_avoid_ai(struct tcp_sock *tp, u32 w, u32 acked);
 
@@ -2039,4 +2041,9 @@ static inline u32 tcp_rwnd_init_bpf(struct sock *sk, bool is_req_sock)
rwnd = 0;
return rwnd;
 }
+
+static inline bool tcp_bpf_ca_needs_ecn(struct sock *sk)
+{
+   return (tcp_call_bpf(sk, true, BPF_SOCKET_OPS_NEEDS_ECN) == 1);
+}
 #endif /* _TCP_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c3490d3..8d1d2b7 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -776,6 +776,9 @@ enum {
 * passive connection is
 * established
 */
+   BPF_SOCKET_OPS_NEEDS_ECN,   /* If connection's congestion control
+* needs ECN
+*/
 };
 
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/net/core/filter.c b/net/core/filter.c
index 9ff567c..4325aba 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2716,8 +2716,15 @@ BPF_CALL_5(bpf_setsockopt, struct bpf_socket_ops_kern *, bpf_socket,
}
} else if (level == SOL_TCP &&
   bpf_socket->sk->sk_prot->setsockopt == tcp_setsockopt) {
-   /* Place holder */
-   ret = -EINVAL;
+   if (optname == TCP_CONGESTION) {
+   ret = tcp_set_congestion_control(sk, optval, false);
+   if (!ret && bpf_socket->op > BPF_SOCKET_OPS_NEEDS_ECN)
+   /* replacing an existing ca */
+   tcp_reinit_congestion_control(sk,
+   inet_csk(sk)->icsk_ca_ops);
+   } else {
+   ret = -EINVAL;
+   }
} else {
ret = -EINVAL;
}
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index cc8fd8b..07984ea 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2478,7 +2478,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
name[val] = 0;
 
lock_sock(sk);
-   err = tcp_set_congestion_control(sk, name);
+   err = tcp_set_congestion_control(sk, name, true);
release_sock(sk);
return err;
}
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index 324c9bc..e6f3717 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -189,8 +189,8 @@ void tcp_init_congestion_control(struct sock *sk)
INET_ECN_dontxmit(sk);
 }
 
-static void tcp_reinit_congestion_control(struct sock *sk,
- const struct tcp_congestion_ops *ca)
+void tcp_reinit_congestion_control(struct sock *sk,
+  const struct tcp_congestion_ops *ca)
 {
struct inet_connection_sock *icsk = inet_csk(sk);
 
@@ -334,7 +334,7 @@ int tcp_set_allowed_congestion_control(char *val)
 }
 
 /* Change congestion control for socket */
-int tcp_set_congestion_control(struct sock *sk, const char *name)
+int tcp_set_congestion_control(struct sock *sk, const char *name, bool load)
 {
struct inet_connection_sock *icsk = inet_csk(sk);
const struct tcp_congestion_ops *ca;
@@ -344,7 +344,10 @@ int tcp_set_congestion_control(struct sock *sk, const char *name)
return -EPERM;
 
rcu_read_lock();
-   ca = __tcp_ca_find_autoload(name);
+   if (!load)
+   ca = tcp_ca_find(name);
+   else
+   ca = 

[RFC PATCH net-next v2 12/15] bpf: Adds support for setting initial cwnd

2017-06-15 Thread Lawrence Brakmo
Adds a new bpf_setsockopt for TCP sockets, TCP_BPF_IW, which sets the
initial congestion window. This can be used when the hosts are far
apart (large RTTs) and it is safe to start with a large initial cwnd.

Signed-off-by: Lawrence Brakmo 
---
 include/uapi/linux/bpf.h |  2 ++
 net/core/filter.c| 14 +-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 8d1d2b7..ecb28e2 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -781,4 +781,6 @@ enum {
 */
 };
 
+#define TCP_BPF_IW 1001/* Set TCP initial congestion window */
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/net/core/filter.c b/net/core/filter.c
index 4325aba..a1d9214 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2723,7 +2723,19 @@ BPF_CALL_5(bpf_setsockopt, struct bpf_socket_ops_kern *, bpf_socket,
tcp_reinit_congestion_control(sk,
inet_csk(sk)->icsk_ca_ops);
} else {
-   ret = -EINVAL;
+   struct tcp_sock *tp = tcp_sk(sk);
+
+   val = *((int *)optval);
+   switch (optname) {
+   case TCP_BPF_IW:
+   if (val <= 0 || tp->data_segs_out > 0)
+   ret = -EINVAL;
+   else
+   tp->snd_cwnd = val;
+   break;
+   default:
+   ret = -EINVAL;
+   }
}
} else {
ret = -EINVAL;
-- 
2.9.3
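The TCP_BPF_IW branch only overwrites snd_cwnd when the value is positive and no data segments have been sent yet (tp->data_segs_out == 0). Otherwise, it returns -EINVAL and leaves the window unchanged. Below is a small userspace model of that rule. It uses a hypothetical struct that holds only the two fields involved.

```c
#include <errno.h>

/* Hypothetical subset of struct tcp_sock used by the model. */
struct model_tp {
	unsigned int snd_cwnd;
	unsigned int data_segs_out;
};

/* Mirrors the TCP_BPF_IW branch: returns 0 on success or -EINVAL. */
static int model_set_iw(struct model_tp *tp, int val)
{
	if (val <= 0 || tp->data_segs_out > 0)
		return -EINVAL;
	tp->snd_cwnd = (unsigned int)val;
	return 0;
}
```

The data_segs_out check means the option only affects the true initial window; once the connection has sent data, later calls are rejected.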



[RFC PATCH net-next v2 06/15] bpf: Sample bpf program to set initial window

2017-06-15 Thread Lawrence Brakmo
The sample BPF program, tcp_rwnd_kern.c, sets the initial
advertised window to 40 packets in an environment where
different IPv6 prefixes indicate that the hosts are not
in the same datacenter.

Signed-off-by: Lawrence Brakmo 
---
 samples/bpf/Makefile|  1 +
 samples/bpf/tcp_rwnd_kern.c | 55 +
 2 files changed, 56 insertions(+)
 create mode 100644 samples/bpf/tcp_rwnd_kern.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 21cb016..9aca209 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -114,6 +114,7 @@ always += xdp_tx_iptunnel_kern.o
 always += test_map_in_map_kern.o
 always += cookie_uid_helper_example.o
 always += tcp_synrto_kern.o
+always += tcp_rwnd_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
diff --git a/samples/bpf/tcp_rwnd_kern.c b/samples/bpf/tcp_rwnd_kern.c
new file mode 100644
index 000..f01b6d4
--- /dev/null
+++ b/samples/bpf/tcp_rwnd_kern.c
@@ -0,0 +1,55 @@
+/*
+ * BPF program to set initial receive window to 40 packets when using IPv6
+ * and the first 5.5 bytes of the IPv6 addresses are not the same (in this
+ * example, that means the hosts are not in the same datacenter).
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+
+#define DEBUG 1
+
+SEC("sockops")
+int bpf_rwnd(struct bpf_socket_ops *skops)
+{
+	char fmt1[] = "BPF command: %d\n";
+	char fmt2[] = "  Returning %d\n";
+	int rv = -1;
+	int op;
+
+	/* For testing purposes, only execute the rest of the BPF program
+	 * if either port number is 55601
+	 */
+	if (skops->remote_port != 55601 && skops->local_port != 55601)
+		return -1;
+
+	op = (int) skops->op;
+
+#ifdef DEBUG
+	bpf_trace_printk(fmt1, sizeof(fmt1), op);
+#endif
+
+	/* Check for RWND_INIT operation and IPv6 addresses */
+	if (op == BPF_SOCKET_OPS_RWND_INIT &&
+	    skops->family == AF_INET6) {
+
+		/* If the first 5.5 bytes of the IPv6 address are not the same
+		 * then the hosts are not in the same datacenter,
+		 * so use a larger initial advertised window (40 packets)
+		 */
+		if (skops->local_ip6[0] != skops->remote_ip6[0] ||
+		    (skops->local_ip6[1] & 0xf000) !=
+		    (skops->remote_ip6[1] & 0xf000)) {
+			rv = 40;
+		}
+	}
+#ifdef DEBUG
+	bpf_trace_printk(fmt2, sizeof(fmt2), rv);
+#endif
+	return rv;
+}
+char _license[] SEC("license") = "GPL";
-- 
2.9.3



[RFC PATCH net-next v2 13/15] bpf: Sample BPF program to set initial cwnd

2017-06-15 Thread Lawrence Brakmo
Sample BPF program that assumes the hosts are far apart (i.e. large
RTTs) and sets the initial cwnd and initial receive window to 40
packets and the send and receive buffers to 1.5MB.

In practice there would be a test to ensure the hosts are actually
far enough apart.

Signed-off-by: Lawrence Brakmo 
---
 samples/bpf/Makefile  |  1 +
 samples/bpf/tcp_iw_kern.c | 73 +++
 2 files changed, 74 insertions(+)
 create mode 100644 samples/bpf/tcp_iw_kern.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index eb324e0..3ec96a0 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -117,6 +117,7 @@ always += tcp_synrto_kern.o
 always += tcp_rwnd_kern.o
 always += tcp_bufs_kern.o
 always += tcp_cong_kern.o
+always += tcp_iw_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
diff --git a/samples/bpf/tcp_iw_kern.c b/samples/bpf/tcp_iw_kern.c
new file mode 100644
index 000..07962c8
--- /dev/null
+++ b/samples/bpf/tcp_iw_kern.c
@@ -0,0 +1,73 @@
+/*
+ * BPF program to set initial congestion window and initial receive
+ * window to 40 packets and send and receive buffers to 1.5MB. This
+ * would usually be done after doing appropriate checks that indicate
+ * the hosts are far enough away (i.e. large RTT).
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+
+#define DEBUG 1
+
+SEC("sockops")
+int bpf_iw(struct bpf_socket_ops *skops)
+{
+	char fmt1[] = "BPF command: %d\n";
+	char fmt2[] = "  Returning %d\n";
+	int bufsize = 1500000;
+	int rwnd_init = 40;
+	int iw = 40;
+	int rv = 0;
+	int op;
+
+	/* For testing purposes, only execute the rest of the BPF program
+	 * if either port number is 55601
+	 */
+	if (skops->remote_port != 55601 && skops->local_port != 55601)
+		return -1;
+
+	op = (int) skops->op;
+
+#ifdef DEBUG
+	bpf_trace_printk(fmt1, sizeof(fmt1), op);
+#endif
+
+	/* Usually there would be a check to ensure the hosts are far
+	 * from each other so it makes sense to increase buffer sizes
+	 */
+	switch (op) {
+	case BPF_SOCKET_OPS_RWND_INIT:
+		rv = rwnd_init;
+		break;
+	case BPF_SOCKET_OPS_TCP_CONNECT_CB:
+		/* Set sndbuf and rcvbuf of active connections */
+		rv = bpf_setsockopt(skops, SOL_SOCKET, SO_SNDBUF, ,
+				    sizeof(bufsize));
+		rv = -rv*100 + bpf_setsockopt(skops, SOL_SOCKET, SO_RCVBUF,
+					     , sizeof(bufsize));
+		break;
+	case BPF_SOCKET_OPS_ACTIVE_ESTABLISHED_CB:
+		rv = bpf_setsockopt(skops, SOL_TCP, TCP_BPF_IW, ,
+				    sizeof(iw));
+		break;
+	case BPF_SOCKET_OPS_PASSIVE_ESTABLISHED_CB:
+		/* Set sndbuf and rcvbuf of passive connections */
+		rv = bpf_setsockopt(skops, SOL_SOCKET, SO_SNDBUF, ,
+				    sizeof(bufsize));
+		rv = -rv*100 + bpf_setsockopt(skops, SOL_SOCKET, SO_RCVBUF,
+					     , sizeof(bufsize));
+		break;
+	default:
+		rv = -1;
+	}
+#ifdef DEBUG
+	bpf_trace_printk(fmt2, sizeof(fmt2), rv);
+#endif
+	return rv;
+}
+char _license[] SEC("license") = "GPL";
-- 
2.9.3


